print("OpenCLIP version:", open_clip.__version__)
OpenCLIP version: 2.32.0
API Reference
FrozenOpenCLIPEmbedderConfig (arch:str, version:str, max_length:int, freeze:bool, layer:str)
FrozenOpenCLIPEmbedder (arch='ViT-B-32', version='datacomp_xl_s13b_b90k', max_length=77, freeze=True, layer='penultimate', **kwargs)
Loads and freezes the OpenCLIP transformer encoder for text prompts.
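A minimal usage sketch (hedged: the prompt strings, variable names and device handling here are illustrative and are not necessarily the exact cell that produced the outputs below):

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# load the frozen text encoder with the defaults listed above and move it to the GPU
emb = FrozenOpenCLIPEmbedder().to(device)
# tokenize two prompts; the result is a (batch, max_length=77) tensor of token ids,
# zero-padded after the <end_of_text> token
c = emb.tokenize_and_push_to_device(["[1,2,2]", "[1,2,4,2]"])
c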
[INFO]: Cuda device has a capability of 8.6 (>= 8), allowing tf32 matmul.
tensor([[49406, 314, 272, 267, 273, 267, 273, 316, 49407, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[49406, 314, 272, 267, 273, 267, 320, 273, 316, 49407, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0')
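Calling the embedder on the token tensor returns one 512-dimensional embedding per token position (continuing the sketch above):

z = emb(c)   # shape (batch, max_length, 512)
z.shape, z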
(torch.Size([2, 77, 512]),
tensor([[[-0.3819, -0.3694, -0.0712, ..., 0.0958, -0.0834, -0.0929],
[-0.2665, 0.1840, -0.5888, ..., 0.7207, -1.7479, 1.2859],
[-0.9813, -0.6659, 0.2100, ..., -0.4228, 0.5374, 0.8488],
...,
[-0.0302, 1.3877, 0.3986, ..., 0.2663, -0.1264, -1.3759],
[-0.0793, 1.4047, 0.3585, ..., 0.2325, -0.0762, -1.3315],
[ 0.1596, 1.5992, 0.2774, ..., 0.1208, -0.1303, -1.5472]],
[[-0.3819, -0.3694, -0.0712, ..., 0.0958, -0.0834, -0.0929],
[-1.2511, 1.4713, 0.7262, ..., 1.1487, -0.4976, 0.4495],
[-1.2653, -0.3404, 0.9427, ..., 0.1537, 0.0260, 0.4574],
...,
[-0.0698, 1.4014, 0.4691, ..., 0.2275, -0.0690, -1.3637],
[-0.1190, 1.4172, 0.4266, ..., 0.1950, -0.0225, -1.3243],
[ 0.1392, 1.6179, 0.3527, ..., 0.0764, -0.0845, -1.5251]]], device='cuda:0'))
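For inspection, token ids can also be decoded back to text with the OpenCLIP tokenizer; the padding id 0 decodes to '!' in the CLIP byte-pair vocabulary, which is why a decoded padded sequence ends in a long run of exclamation marks: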
'<start_of_text>2 , 2 , 2 <end_of_text>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
The model now also accepts (batched) scalar int values that identify unique conditions, e.g. \([1,2,2] = 4\). If the input is such an int, the output is the cached, pre-embedded tensor; if a non-int (e.g. a tokenized string) is passed, the embedding is computed live as before.
CachedFrozenOpenCLIPEmbedderConfig (arch:str, version:str, max_length:int, freeze:bool, layer:str, enable_cache_token_limit:bool)
CachedFrozenOpenCLIPEmbedder (arch='ViT-B-32', version='datacomp_xl_s13b_b90k', max_length=77, freeze=True, layer='penultimate', enable_cache_token_limit:bool=True, **kwargs)
Adds caching support to FrozenOpenCLIPEmbedder.
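The caching behaviour described above can be pictured roughly as follows. This is a hedged sketch of the idea only, not the library's implementation; ToyCachedEmbedder and everything inside it are hypothetical:

import torch
import torch.nn as nn

class ToyCachedEmbedder(nn.Module):
    # hypothetical illustration of the int-indexed cache described above
    def __init__(self, base, prompts):
        super().__init__()
        self.base = base
        tokens = base.tokenize_and_push_to_device(prompts)
        with torch.no_grad():
            # pre-embed every prompt once: (n_prompts, max_length, emb_dim)
            self.register_buffer("cache", base(tokens))

    def forward(self, c):
        # a 1-d int tensor is treated as condition indices into the cache,
        # anything else (e.g. a (batch, max_length) token tensor) is embedded live
        if torch.is_tensor(c) and not torch.is_floating_point(c) and c.ndim == 1:
            return self.cache[c]
        return self.base(c)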
a = CachedFrozenOpenCLIPEmbedder(enable_cache_token_limit=True).to(device)
p = ["1,1,2", "2,2,2", "4,4,4", "6,4,7", "6,4,8", "6,4,9", "6,4,1"]
a.generate_cache(p)
[INFO]: - `generate_cache` infered a TOKEN limit of 7
[INFO]: caching trying to allocate memory (7, 77, 512) on cuda, approx. 0.001 GB
CachedFrozenOpenCLIPEmbedderConfig(arch='ViT-B-32', version='datacomp_xl_s13b_b90k', max_length=7, freeze=True, layer='penultimate', enable_cache_token_limit=True)
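Every prompt in p tokenizes to 7 ids (the start token, five content ids such as "1", ",", "1", ",", "2", and the end token), so with enable_cache_token_limit=True generate_cache infers a token limit of 7 and the returned config reports max_length=7 rather than 77; the cached encodings below therefore have 7 token positions.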
c_cached = torch.tensor([0, 0, 1], device=a.device)#.cpu()
c_uncached = a.tokenize_and_push_to_device(["1,1,2", "1,1,2", "2,2,2"])
enc_cached = a(c_cached)
enc_uncached = a(c_uncached)#.cpu()
enc_cached.shape, enc_uncached.shape, torch.allclose(enc_cached, enc_uncached, atol=1e-3)
(torch.Size([3, 7, 512]), torch.Size([3, 7, 512]), False)
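The integer conditions index into the prompt list passed to generate_cache, so c_cached = [0, 0, 1] presumably selects the cached embeddings of "1,1,2", "1,1,2" and "2,2,2", i.e. the same prompts embedded live via c_uncached; both paths return tensors of shape (3, 7, 512), and the final allclose call checks how closely the cached lookups reproduce the live embeddings.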