CirDiT - Circuit Diffusion Transformer
RotaryMultiheadAttention
RotaryMultiheadAttention
def RotaryMultiheadAttention(
in_dim:int, embed_dim:int, num_heads:int, bias:bool=True, p_rope:float=1.0, max_seq_len:int=4096,
base_rope:float=10000, enable_qk_norm:bool=False
)->None:
Multi-head attention as described in the paper Attention Is All You Need (https://arxiv.org/abs/1706.03762), extended with a rotary position encoding (RoPE).
The attention core is F.scaled_dot_product_attention from PyTorch. It could be swapped for https://github.com/Dao-AILab/flash-attention or xFormers.
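As an illustration of the rotary encoding applied to queries and keys before the attention core, here is a minimal NumPy sketch. It uses the split-half (GPT-NeoX-style) RoPE convention and assumes p_rope denotes the fraction of head dimensions that are rotated (partial RoPE); both are assumptions, not confirmed details of this implementation.

```python
import numpy as np

def rope_rotate(x, base=10000.0, p_rope=1.0):
    """Apply rotary position embedding to x of shape (seq, dim).

    Only the first int(p_rope * dim) dimensions are rotated (assumed
    'partial RoPE' convention); the rest pass through unchanged."""
    seq, dim = x.shape
    rot = int(p_rope * dim) // 2 * 2                 # even number of rotated dims
    half = rot // 2
    inv_freq = base ** (-np.arange(half) / half)     # per-pair rotation frequency
    ang = np.arange(seq)[:, None] * inv_freq[None, :]  # (seq, half) angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:rot]
    rotated = np.concatenate([x1 * cos - x2 * sin,   # 2-D rotation of each pair
                              x1 * sin + x2 * cos], axis=1)
    return np.concatenate([rotated, x[:, rot:]], axis=1)
```

Because each pair of channels is rotated by a position-dependent angle, token norms are preserved and the query-key dot product depends only on relative position.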
Transformer blocks
FeedForwardBlock
def FeedForwardBlock(
in_dim:int, hidden_dim:int, out_dim:Optional[int]=None, dropout:float=0.0
)->None:
A small dense feed-forward network as used in transformers. Assumes channels-last layout. Inspired by https://arxiv.org/pdf/2401.11605, with the SwiGLU modification from https://arxiv.org/pdf/2002.05202.
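The SwiGLU structure referenced above can be sketched in a few lines of NumPy; the weight names (w_gate, w_up, w_down) are illustrative, not the module's actual parameter names.

```python
import numpy as np

def silu(z):
    # SiLU / swish activation: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward (https://arxiv.org/abs/2002.05202), channels-last:
    a gated hidden layer instead of a plain activation."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```

The gate branch multiplicatively modulates the up-projection, which is the key difference from a standard two-layer MLP.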
SelfAttnBlock
def SelfAttnBlock(
ch:int, t_emb_size:int, num_heads:int, dropout:float=0.0, p_rope:float=1.0, base_rope:float=10000
)->None:
A self-attention block which includes the time condition t_emb, see https://arxiv.org/pdf/2312.02139.
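One common way to inject a time condition into self-attention, consistent with the "attn-concat" wording used below for the CoreTransformer, is to append the time embedding as an extra token. This single-head NumPy sketch (no projections, no RoPE) is an assumption about the mechanism, not the block's exact code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attn_with_time_token(x, t_emb):
    """x: (seq, ch) tokens; t_emb: (ch,) time embedding.
    The time embedding is appended as one extra token so every data token
    can attend to it; the extra slot is dropped from the output."""
    seq, ch = x.shape
    h = np.concatenate([x, t_emb[None, :]], axis=0)   # (seq + 1, ch)
    attn = softmax(h @ h.T / np.sqrt(ch))             # single head, no projections
    return (attn @ h)[:seq]                           # keep data tokens only
```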
AdaptiveSelfAttnBlock
def AdaptiveSelfAttnBlock(
ch:int, mod_ch:int, t_emb_size:int, num_heads:int, dropout:float=0.0, p_rope:float=1.0, base_rope:float=10000
)->None:
A self-attention block which includes the time condition t_emb and an additional modulation input of width mod_ch, see https://arxiv.org/pdf/2312.02139.
CrossAttnBlock
def CrossAttnBlock(
ch:int, t_emb_size:int, num_heads:int, dropout:float=0.0, p_rope:float=1.0, base_rope:float=10000
)->None:
A cross-attention block which includes the time condition t_emb, see https://arxiv.org/pdf/2312.02139.
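In cross-attention the queries come from one sequence and the keys/values from another; here that would be the circuit tokens attending to the condition encoding. A minimal single-head NumPy sketch (projections omitted for clarity):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attn(x, cond):
    """Queries from the tokens x (seq_x, ch); keys and values from the
    condition encoding cond (seq_c, ch)."""
    attn = softmax(x @ cond.T / np.sqrt(x.shape[1]))  # (seq_x, seq_c)
    return attn @ cond                                # (seq_x, ch)
```

Note the output length follows the query sequence, so the condition can have any length.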
Main transformer
CoreTransformer
def CoreTransformer(
ch:int, c_emb_size:int, t_emb_size:int, depth:int, num_heads:int, dropout:float=0.0, p_rope:float=1.0,
base_rope:float=10000
)->None:
The main transformer of the CirDiT model; it takes in the time embedding (concatenated into attention) and the condition encodings (via cross-attention), and applies RoPE along the time dimension.
Packing blocks
PackingTransformer
def PackingTransformer(
ch:int, t_emb_size:int, depth:int, num_heads:int, dropout:float=0.0, p_rope:float=1.0, base_rope:float=10000
)->None:
The first-stage packing transformer of the CirDiT model; it takes in the time embedding (concatenated into attention) and applies RoPE along the time dimension only, not the spatial dimension.
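The "RoPE on the time dimension only" scheme can be visualized by computing the rotation angles for a (time, space) token grid: every spatial position at the same timestep shares the same angles, so spatial order carries no positional bias. A hedged NumPy sketch of just the angle computation:

```python
import numpy as np

def rope_angles_time_only(n_time, n_space, half_dim, base=10000.0):
    """Rotation angles for a (time, space) token grid where RoPE is applied
    along the time axis only. Returns shape (n_time, n_space, half_dim);
    the space axis is a pure broadcast of the per-timestep angles."""
    inv_freq = base ** (-np.arange(half_dim) / half_dim)
    ang = np.arange(n_time)[:, None] * inv_freq[None, :]   # (time, half_dim)
    return np.broadcast_to(ang[:, None, :], (n_time, n_space, half_dim))
```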
UnpackingTransformer
def UnpackingTransformer(
ch:int, mod_ch:int, t_emb_size:int, depth:int, num_heads:int, dropout:float=0.0, p_rope:float=1.0,
base_rope:float=10000
)->None:
The first-stage unpacking transformer of the CirDiT model; it takes in the time embedding (concatenated into attention) and applies RoPE along the time dimension only, not the spatial dimension.
Time embedding
TimeEmbedding
def TimeEmbedding(
d_model:int, dropout:float=0.0, max_len:int=5000, freq_factor:float=10000.0
)->None:
A time embedding layer mapping diffusion timesteps to d_model-dimensional vectors.
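Given the d_model, max_len, and freq_factor parameters, this presumably follows the standard sinusoidal formulation; a NumPy sketch under that assumption:

```python
import numpy as np

def time_embedding(t, d_model, freq_factor=10000.0):
    """Map scalar timesteps t (shape (n,)) to (n, d_model) sinusoidal
    embeddings, Transformer-style: half sine channels, half cosine."""
    half = d_model // 2
    freqs = freq_factor ** (-np.arange(half) / half)   # geometric frequency ladder
    ang = t[:, None] * freqs[None, :]                  # (n, half)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)
```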
CirDiT architecture
CirDiTConfig
def CirDiTConfig(
clr_dim:int, ch_packing:int, ch_core:int, c_emb_size:int, t_emb_size:int, depth_packing:int, depth_core:int,
num_heads_packing:int, num_heads_core:int, dropout:float, p_rope:float, base_rope:float
)->None:
CirDiT
def CirDiT(
clr_dim:int, ch_packing:int, ch_core:int, c_emb_size:int, t_emb_size:int, depth_packing:int, depth_core:int,
num_heads_packing:int, num_heads_core:int, dropout:float=0.0, p_rope:float=1.0, base_rope:float=10000
)->None:
The proposed Circuit Diffusion Transformer (CirDiT).
UnitaryCLIPPartialNoiseCompilationCirDiT
UnitaryCLIPPartialNoiseCompilationCirDiTConfig
def UnitaryCLIPPartialNoiseCompilationCirDiTConfig(
clr_dim:int, ch_packing:int, ch_core:int, c_emb_size:int, t_emb_size:int, depth_packing:int, depth_core:int,
num_heads_packing:int, num_heads_core:int, dropout:float, p_rope:float, base_rope:float,
unitary_encoder_config:dict
)->None:
UnitaryCLIPPartialNoiseCompilationCirDiT
def UnitaryCLIPPartialNoiseCompilationCirDiT(
clr_dim:int, ch_packing:int, ch_core:int, c_emb_size:int, t_emb_size:int, depth_packing:int, depth_core:int,
num_heads_packing:int, num_heads_core:int, dropout:float=0.0, p_rope:float=1.0, base_rope:float=10000,
unitary_encoder_config:Optional=None, unitary_encoder:Optional=None
)->None:
Extends CirDiT to the multimodal unitary compilation model.