Frozen OpenCLIP

Interface to the OpenCLIP library.
print("OpenCLIP version:", open_clip.__version__)
OpenCLIP version: 2.30.0
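
Both the architecture and the pretrained weights are referenced by their OpenCLIP names. As a quick sanity check, the standard open_clip API (not genQC-specific) lists all available architecture / pretrained-tag pairs; the defaults used below should be among them:

pairs = open_clip.list_pretrained()   # available (architecture, pretrained_tag) combinations
print(('ViT-B-32', 'datacomp_xl_s13b_b90k') in pairs)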

CLIP model


source

FrozenOpenCLIPEmbedderConfig

 FrozenOpenCLIPEmbedderConfig (arch:str, version:str, max_length:int,
                               freeze:bool, layer:str)
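
The config is a plain record of the constructor arguments, with the field names given in the signature above. A minimal sketch of building one by hand, filled with the default values of FrozenOpenCLIPEmbedder shown below:

config = FrozenOpenCLIPEmbedderConfig(arch='ViT-B-32', version='datacomp_xl_s13b_b90k',
                                      max_length=77, freeze=True, layer='penultimate')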

source

FrozenOpenCLIPEmbedder

 FrozenOpenCLIPEmbedder (arch='ViT-B-32', version='datacomp_xl_s13b_b90k',
                         max_length=77, freeze=True, layer='penultimate',
                         **kwargs)

Loads and freezes the OpenCLIP transformer encoder for text prompts.

device = infer_torch_device()
a = FrozenOpenCLIPEmbedder().to(device)
[INFO]: Cuda device has a capability of 8.9 (>= 8), allowing tf32 matmul.
p="[1, 2, 2]", "[1, 2, a 2]"
a.tokenize_and_push_to_device(p)
tensor([[49406,   314,   272,   267,   273,   267,   273,   316, 49407,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0],
        [49406,   314,   272,   267,   273,   267,   320,   273,   316, 49407,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0]], device='cuda:0')
a.tokenize_and_push_to_device("").shape
torch.Size([1, 77])
a.tokenize_and_push_to_device(["1,1,2", "2,2,2"]).shape
torch.Size([2, 77])
a.model.attn_mask.shape   # attention mask over the full 77-token context
torch.Size([77, 77])
c = a.tokenize_and_push_to_device(["1,1,2", "2,2,2"])
enc = a(c)
enc.shape, enc
(torch.Size([2, 77, 512]),
 tensor([[[-0.3819, -0.3694, -0.0712,  ...,  0.0959, -0.0834, -0.0929],
          [-0.2669,  0.1847, -0.5890,  ...,  0.7211, -1.7483,  1.2858],
          [-0.9821, -0.6650,  0.2107,  ..., -0.4223,  0.5351,  0.8494],
          ...,
          [-0.0300,  1.3871,  0.3989,  ...,  0.2657, -0.1257, -1.3758],
          [-0.0797,  1.4044,  0.3595,  ...,  0.2328, -0.0766, -1.3314],
          [ 0.1599,  1.5989,  0.2775,  ...,  0.1202, -0.1294, -1.5480]],
 
         [[-0.3819, -0.3694, -0.0712,  ...,  0.0959, -0.0834, -0.0929],
          [-1.2507,  1.4711,  0.7264,  ...,  1.1489, -0.4983,  0.4494],
          [-1.2645, -0.3412,  0.9422,  ...,  0.1529,  0.0271,  0.4574],
          ...,
          [-0.0694,  1.4021,  0.4687,  ...,  0.2277, -0.0694, -1.3635],
          [-0.1196,  1.4167,  0.4262,  ...,  0.1955, -0.0225, -1.3245],
          [ 0.1381,  1.6182,  0.3528,  ...,  0.0775, -0.0853, -1.5246]]], device='cuda:0'))
a.tokenizer.decode(c[1].tolist())
'<start_of_text>2 , 2 , 2 <end_of_text>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
open_clip.decode(c[1])
'<start_of_text>2 , 2 , 2 <end_of_text>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
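
Since freeze=True by default, the wrapped OpenCLIP weights should not receive gradients. A quick sanity check (a sketch; it assumes the embedder behaves as a regular torch.nn.Module, as the .to(device) call above suggests):

print(all(not param.requires_grad for param in a.parameters()))   # expected True when freeze=True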

Cached model

The model now also accepts (batched) scalar int values, each assigned to a unique condition, e.g. \([1, 2, 2] \mapsto 4\). If the input is such an int, the output is the cached pre-embedded tensor; if a non-int (e.g. a tokenized prompt) is passed, the embedding is computed live as usual.
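
Conceptually, the forward pass dispatches on the shape of the input: a 1-D tensor of condition indices is looked up in the cache, while a 2-D tensor of token ids goes through the live encoder. A minimal sketch of that idea (hypothetical helper, not the genQC implementation):

def forward_with_cache(c, cache, live_encode):
    # 1-D int tensors are treated as indices into the pre-computed cache
    if c.ndim == 1:
        return cache[c]
    # 2-D token-id tensors of shape (batch, max_length) are embedded live
    return live_encode(c)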


source

CachedFrozenOpenCLIPEmbedderConfig

 CachedFrozenOpenCLIPEmbedderConfig (arch:str, version:str,
                                     max_length:int, freeze:bool,
                                     layer:str,
                                     enable_cache_token_limit:bool)

source

CachedFrozenOpenCLIPEmbedder

 CachedFrozenOpenCLIPEmbedder (arch='ViT-B-32',
                               version='datacomp_xl_s13b_b90k',
                               max_length=77, freeze=True,
                               layer='penultimate',
                               enable_cache_token_limit:bool=True,
                               **kwargs)

Adds caching support to FrozenOpenCLIPEmbedder.

a = CachedFrozenOpenCLIPEmbedder(enable_cache_token_limit=True).to(device)
p = ["1,1,2", "2,2,2", "4,4,4", "6,4,7", "6,4,8", "6,4,9", "6,4,1"]

a.generate_cache(p)
[INFO]: - `generate_cache` infered a TOKEN limit of 7
[INFO]: caching trying to allocate memory (7, 77, 512) on cuda, approx. 0.001 GB
a.params_config   # note: max_length was reduced to the inferred token limit of 7
CachedFrozenOpenCLIPEmbedderConfig(arch='ViT-B-32', version='datacomp_xl_s13b_b90k', max_length=7, freeze=True, layer='penultimate', enable_cache_token_limit=True)
c_cached   = torch.tensor([0, 0, 1], device=a.device)  # cache indices into p: 0 -> "1,1,2", 1 -> "2,2,2"
c_uncached = a.tokenize_and_push_to_device(["1,1,2", "1,1,2", "2,2,2"])  # the same prompts, tokenized for the live path

enc_cached   = a(c_cached)    # served from the pre-computed cache
enc_uncached = a(c_uncached)  # embedded live

enc_cached.shape, enc_uncached.shape, torch.allclose(enc_cached, enc_uncached, atol=1e-3)
(torch.Size([3, 7, 512]), torch.Size([3, 7, 512]), False)
enc_cached.dtype, enc_uncached.dtype
(torch.float32, torch.float32)
(enc_cached[0, :4, :10] - enc_uncached[1, :4, :10]).abs().max()   # same prompt "1,1,2": cached vs. live agree to ~1e-3
tensor(0.0014, device='cuda:0')
(enc_cached[0, :4, :10] - enc_uncached[2, :4, :10]).abs().max()   # different prompts "1,1,2" vs. "2,2,2": order-one difference
tensor(1.9729, device='cuda:0')