Position encodings

Implementations of special position encodings: p-RoPE, 2D p-RoPE, and a learned positional embedding.

p-RoPE

RotaryPositionalEmbedding

 RotaryPositionalEmbedding (head_dim:int, p:float=1.0,
                            max_seq_len:int=4096, base:float=10000)

This class implements Rotary Positional Embeddings (RoPE), proposed in https://arxiv.org/abs/2104.09864.

It additionally supports p-RoPE from https://openreview.net/pdf?id=GtvuNrk58a. Note: p=0 coincides with NoPE (no positional encoding), and p=1 coincides with standard RoPE.
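
To make the role of p concrete, here is a minimal sketch of a p-RoPE rotation. It is not the implementation of RotaryPositionalEmbedding: the function name p_rope_sketch is hypothetical, and the sketch assumes that p-RoPE keeps the fraction p of highest-frequency rotation pairs and leaves the remaining pairs unrotated, with channels (2i, 2i+1) forming one pair. The class above may lay out channels or truncate frequencies differently.

```python
import torch


def p_rope_sketch(x: torch.Tensor, p: float = 1.0, base: float = 10000.0) -> torch.Tensor:
    """Hypothetical helper: apply a p-RoPE rotation to x of shape [b, s, n_heads, head_dim]."""
    b, s, n_heads, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even"
    n_pairs = head_dim // 2

    # Standard RoPE frequencies theta_i = base^(-2i / head_dim), highest frequency first.
    freqs = base ** (-torch.arange(n_pairs, dtype=torch.float32) * 2.0 / head_dim)

    # Assumption: keep the first round(p * n_pairs) frequencies and zero the rest,
    # so the low-frequency channel pairs are not rotated at all.
    n_keep = int(round(p * n_pairs))
    freqs[n_keep:] = 0.0

    # Rotation angle for every (position, pair), shape [s, n_pairs].
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]

    # Assumption: interleaved layout, channels (2i, 2i+1) are rotated together.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


q = torch.ones(1, 256, 1, 32)
q_nope = p_rope_sketch(q, p=0.0)  # no pair is rotated: output equals the input
q_half = p_rope_sketch(q, p=0.5)  # only the 8 highest-frequency pairs are rotated
q_rope = p_rope_sketch(q, p=1.0)  # all 16 pairs are rotated (standard RoPE)
```

Under these assumptions, p=0 returns the input unchanged and every rotation preserves the norm of each head vector, which matches the NoPE and RoPE limits stated in the note.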

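import torch
import matplotlib.pyplot as plt

# Dummy query tensor of ones with shape [b, s, n_heads, head_dim]
b = 1
s = 256
n_heads = 1
head_dim = 32
q = torch.ones((b, s, n_heads, head_dim))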

# Compare standard RoPE (p=1) with p-RoPE at p=0.5
p1 = 1
p2 = 0.5

pe = RotaryPositionalEmbedding(head_dim, p1)
q_pe1 = pe(q).squeeze() # [s, head_dim]

pe = RotaryPositionalEmbedding(head_dim, p2)
q_pe2 = pe(q).squeeze() # [s, head_dim]

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
ax1.imshow(q_pe1.T) 
ax1.set_title(f"p={p1}")
ax1.set_xlabel("position")
ax1.set_ylabel("channel")
ax2.imshow(q_pe2.T) 
ax2.set_title(f"p={p2}")
ax2.set_xlabel("position")
ax2.set_ylabel("channel")
plt.show()

2D p-RoPE

RotaryPositionalEmbedding2D

 RotaryPositionalEmbedding2D (head_dim:int, p:float=1.0,
                              max_seq_len:int=4096, base:float=10000)

This class implements the 2D variant of p-RoPE. As in the example below, the forward call takes the input tensor together with a tensor of 2D position indices of shape [s, 2], one index pair per sequence element.
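
One common way to extend a rotary encoding to two axes is to split the channels in half and rotate each half with one coordinate. The sketch below illustrates that idea only; it is an assumption and not necessarily how RotaryPositionalEmbedding2D works. The names rotate_pairs and p_rope_2d_sketch are hypothetical, and the frequency truncation by p follows the same assumption as for 1D p-RoPE (keep the fraction p of highest frequencies).

```python
import torch


def rotate_pairs(x: torch.Tensor, pos: torch.Tensor, p: float = 1.0, base: float = 10000.0) -> torch.Tensor:
    """Hypothetical helper: p-RoPE rotation of x [b, s, h, d] at explicit positions pos [s]."""
    d = x.shape[-1]
    assert d % 2 == 0, "channel count must be even"
    n_pairs = d // 2
    freqs = base ** (-torch.arange(n_pairs, dtype=torch.float32) * 2.0 / d)
    # Assumption: zero the lowest frequencies so only a fraction p of pairs rotates.
    freqs[int(round(p * n_pairs)):] = 0.0
    angles = pos.to(torch.float32)[:, None] * freqs[None, :]  # [s, n_pairs]
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


def p_rope_2d_sketch(x: torch.Tensor, pos_idx: torch.Tensor, p: float = 1.0, base: float = 10000.0) -> torch.Tensor:
    """Hypothetical helper: first half of the channels encodes pos_idx[:, 0], second half pos_idx[:, 1]."""
    half = x.shape[-1] // 2
    first = rotate_pairs(x[..., :half], pos_idx[:, 0], p, base)
    second = rotate_pairs(x[..., half:], pos_idx[:, 1], p, base)
    return torch.cat([first, second], dim=-1)


# An 8 x 32 grid flattened to 256 tokens, one (y, x) pair per token.
nx, ny = 32, 8
px = torch.arange(nx).expand(ny, -1)
py = torch.arange(ny).unsqueeze(-1).expand(-1, nx)
pos_idx = torch.stack([py, px], dim=-1).reshape(-1, 2)  # [256, 2]

q = torch.ones(1, ny * nx, 1, 64)
q_2d = p_rope_2d_sketch(q, pos_idx, p=0.5, base=100)  # [1, 256, 1, 64]
```

With this split, two tokens in the same grid row share the first half of their encoded channels, and two tokens in the same column share the second half.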

b = 1
s = 256
n_heads = 1
head_dim = 64
q = torch.ones((b, s, n_heads, head_dim))

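# Build an ny x nx grid of 2D positions, flattened to one (y, x) pair per token
nx = 32
ny = 8
px = torch.arange(nx).expand(ny, -1)                # [ny, nx], x index of each cell
py = torch.arange(ny).unsqueeze(-1).expand(-1, nx)  # [ny, nx], y index of each cell
pos_idx = torch.stack([py, px], dim=-1).reshape(-1, 2)  # [ny * nx, 2] = [s, 2]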

p1 = 1
p2 = 0.5
base = 100

pe = RotaryPositionalEmbedding2D(head_dim, p1, base=base)
q_pe1 = pe(q, pos_idx).squeeze() # [s, head_dim]

pe = RotaryPositionalEmbedding2D(head_dim, p2, base=base)
q_pe2 = pe(q, pos_idx).squeeze() # [s, head_dim]

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
ax1.imshow(q_pe1.T) 
ax1.set_title(f"p={p1}")
ax1.set_xlabel("position")
ax1.set_ylabel("channel")
ax2.imshow(q_pe2.T) 
ax2.set_title(f"p={p2}")
ax2.set_xlabel("position")
ax2.set_ylabel("channel")
plt.show()

Learned position encoding

LearnedPositionalEmbedding

 LearnedPositionalEmbedding (dim:int, max_seq_len:int=64)

This class implements a learned positional embedding, used for example for the spatial dimension of a circuit.
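
As an illustration of the idea, a learned positional embedding can be sketched as a trainable table with one vector per position that is added to the input. This is a minimal sketch under assumptions, not the code of LearnedPositionalEmbedding: the class name LearnedPositionalEmbeddingSketch, the additive combination, the initialization scale, and the choice to index positions along the second axis of a [b, s, t, dim] input are all hypothetical.

```python
import torch
import torch.nn as nn


class LearnedPositionalEmbeddingSketch(nn.Module):
    """Hypothetical sketch: one trainable vector per position, added to the input."""

    def __init__(self, dim: int, max_seq_len: int = 64) -> None:
        super().__init__()
        # Trainable table of shape [max_seq_len, dim]; the 0.02 scale is an arbitrary choice.
        self.pos = nn.Parameter(torch.randn(max_seq_len, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [b, s, t, dim]; assumption: positions run along the s axis.
        s = x.shape[1]
        # Broadcast the first s rows of the table over the batch and t axes.
        return x + self.pos[:s][None, :, None, :]


pe_sketch = LearnedPositionalEmbeddingSketch(dim=64)
x = torch.zeros(1, 8, 1, 64)
out = pe_sketch(x)  # [1, 8, 1, 64]; with a zero input this is just the table rows
```

Because the table is an nn.Parameter, it is registered with the module and updated by the optimizer like any other weight; only the rows for positions that actually occur receive gradients.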

b = 1
s = 8
t = 1
dim = 64

pe = LearnedPositionalEmbedding(dim)

# Zero input, so the output reflects only the (untrained) positional embedding
q = torch.zeros((b, s, t, dim))
q = pe(q).squeeze() # [s, dim]

plt.figure(figsize=(15, 5))
plt.imshow(q.detach()) 
plt.xlabel("channel")
plt.ylabel("position")
plt.show()