Dataset helper functions

Some commonly used functions for datasets.

Checking



check_duplicate_in_dataset

 check_duplicate_in_dataset (x, dataset)

Checks whether ‘x’ is in ‘dataset’.
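A minimal sketch of such a membership check in plain PyTorch. It assumes x is a single row and dataset is a 2D tensor of rows, and it tests for exact equality. The helper name row_in_dataset is hypothetical and is not the genQC implementation.

```python
import torch

def row_in_dataset(x: torch.Tensor, dataset: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in: True if row x equals any row of dataset."""
    # Broadcast x against every row, require all entries of a row to match,
    # then ask whether at least one row matched.
    return (dataset == x).flatten(1).all(dim=1).any()

dataset = torch.tensor([[0.11, 1, 0.5],
                        [0.70, 1, 0.5]])

found = row_in_dataset(torch.tensor([0.70, 1, 0.5]), dataset)
missing = row_in_dataset(torch.tensor([0.30, 1, 0.5]), dataset)
```

Because the comparison is exact, floating-point rows that differ only by rounding error would not count as duplicates in this sketch.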



check_duplicates_in_dataset

 check_duplicates_in_dataset (xs, dataset, return_ind=False, invert=False)

Checks whether the elements of xs are in dataset. The boolean invert selects whether we count duplicates (False) or elements that are not in dataset (True). Uses torch.vmap, which copies dataset for every element in xs, so memory use grows with len(xs) times the size of dataset.

Check if this works:

import torch

xs = torch.tensor(
    [[0.7, 1, 0.5], 
     [0.3, 1, 0.5],
     [  0, 1, 0.5]])

d = torch.tensor([
    [0.11, 1, 0.5],
    [0.70, 1, 0.5],      # duplicate of xs[0]
    [0.71, 1, 0.5],
    [0.3 , 1, 0.5]])     # duplicate of xs[1]

check_duplicates_in_dataset(xs, d, return_ind=True)
(2, tensor([0, 1]))
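The documented output can be reproduced without genQC. The sketch below uses plain broadcasting rather than torch.vmap, so it is not the library's code; the name count_duplicates is hypothetical. It has the same memory behaviour in spirit, since it builds a comparison tensor with one entry for every pair of rows from xs and dataset.

```python
import torch

def count_duplicates(xs, dataset, return_ind=False, invert=False):
    """Hypothetical re-implementation via broadcasting, not the genQC source."""
    # Compare every row of xs with every row of dataset:
    # [len(xs), len(dataset), features] -> [len(xs), len(dataset)]
    pair_equal = (xs[:, None] == dataset[None]).flatten(2).all(dim=-1)
    mask = pair_equal.any(dim=1)   # True where a row of xs occurs in dataset
    if invert:
        mask = ~mask               # count rows that are NOT in dataset instead
    num = int(mask.sum())
    if return_ind:
        return num, mask.nonzero().flatten()
    return num

xs = torch.tensor([[0.7, 1, 0.5],
                   [0.3, 1, 0.5],
                   [0.0, 1, 0.5]])
d = torch.tensor([[0.11, 1, 0.5],
                  [0.70, 1, 0.5],
                  [0.71, 1, 0.5],
                  [0.30, 1, 0.5]])

num, ind = count_duplicates(xs, d, return_ind=True)
num_new = count_duplicates(xs, d, invert=True)
```

Here num is 2 with indices 0 and 1, matching the output above, and num_new is 1 because only the last row of xs is absent from d.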

Manipulating



shuffle_tensor_dataset

 shuffle_tensor_dataset (x, y=None, *z, cpu_copy=True)

Assumes x, y and z are numpy or tensor objects of the same length.
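The point of shuffling several objects in one call is that samples and labels stay aligned. A minimal sketch of that idea, assuming tensor inputs only: the helper shuffle_together is hypothetical, and the cpu_copy option is not modelled here.

```python
import torch

def shuffle_together(x, *others):
    """Hypothetical sketch: apply one shared random permutation to all inputs."""
    perm = torch.randperm(x.shape[0])
    return tuple(t[perm] for t in (x, *others))

x = torch.arange(10).reshape(5, 2)   # rows [0, 1], [2, 3], ...
y = torch.arange(5)                  # label i belongs to row i

x_shuffled, y_shuffled = shuffle_together(x, y)

# Row i of x starts with 2*i, so alignment can be checked directly.
aligned = torch.equal(x_shuffled[:, 0], 2 * y_shuffled)
```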



get_unique_elements_indices

 get_unique_elements_indices (tensor)

Returns the indices of the unique elements in tensor.
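One way to obtain such indices in plain PyTorch is sketched below. It assumes uniqueness is taken over rows (dim 0) and that the first occurrence of each row is the one reported; which occurrence genQC returns is not stated here, and first_occurrence_indices is a hypothetical name.

```python
import torch

def first_occurrence_indices(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: index of the first occurrence of each unique row."""
    n = t.shape[0]
    # inverse[i] is the position of row i in the sorted unique rows
    uniq, inverse = torch.unique(t, dim=0, return_inverse=True)
    # Start above any valid index, then keep the smallest index per unique row
    first = torch.full((uniq.shape[0],), n, dtype=torch.long)
    first.scatter_reduce_(0, inverse, torch.arange(n), reduce="amin")
    return first

t = torch.tensor([[1, 2],
                  [3, 4],
                  [1, 2],
                  [5, 6],
                  [3, 4]])
y = torch.tensor([0, 1, 0, 2, 1])

idx = first_occurrence_indices(t)
t_unique, y_unique = t[idx], y[idx]
```

The same index tensor can be applied to any aligned label tensor, which is how a dataset can be made unique without losing the pairing between x and y.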



uniquify_tensor_dataset

 uniquify_tensor_dataset (x, y=None, *z)

x has to be a tensor; y and z are assumed to be numpy or tensor objects.



balance_tensor_dataset

 balance_tensor_dataset (x, y, *z, samples:int=None,
                         make_unique:bool=True, y_uniques=None,
                         shuffle_lables:bool=True,
                         add_balance_fn:callable=None, njobs=1)

Assumes x is a tensor and y is a tensor or numpy object.
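The page does not spell out the balancing rule, so the following is only an illustration of one common approach: cap every label at the same number of randomly chosen samples. Reading samples as a per-label cap is an assumption, balance_by_label is a hypothetical name, and make_unique, y_uniques, shuffle_lables, add_balance_fn and njobs are not modelled.

```python
import torch

def balance_by_label(x, y, samples):
    """Hypothetical sketch: keep at most `samples` random rows per label."""
    keep = []
    for label in torch.unique(y):
        idx = (y == label).nonzero().flatten()            # rows with this label
        idx = idx[torch.randperm(idx.numel())[:samples]]  # random subset, capped
        keep.append(idx)
    keep = torch.cat(keep)
    return x[keep], y[keep]

x = torch.arange(12).reshape(6, 2)      # row i is [2*i, 2*i + 1]
y = torch.tensor([0, 0, 0, 0, 1, 1])    # label 0 is over-represented

x_bal, y_bal = balance_by_label(x, y, samples=2)
counts = torch.bincount(y_bal)
```

After balancing, both labels appear twice, and each kept row of x still carries its original label.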

Copyright 2025, Florian Fürrutter
