import torch
import matplotlib.pyplot as plt

b = 1
s = 256
n_heads = 1
head_dim = 32
q = torch.ones((b, s, n_heads, head_dim))

p1 = 1
p2 = 0.5

pe = RotaryPositionalEmbedding(head_dim, p1)
q_pe1 = pe(q).squeeze() # [s, head_dim]

pe = RotaryPositionalEmbedding(head_dim, p2)
q_pe2 = pe(q).squeeze() # [s, head_dim]

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

ax1.imshow(q_pe1.T)
ax1.set_title(f"p={p1}")
ax1.set_xlabel("position")
ax1.set_ylabel("channel")

ax2.imshow(q_pe2.T)
ax2.set_title(f"p={p2}")
ax2.set_xlabel("position")
ax2.set_ylabel("channel")

plt.show()
Position encodings
p-RoPE
RotaryPositionalEmbedding
RotaryPositionalEmbedding (head_dim:int, p:float=1.0, max_seq_len:int=4096, base:float=10000)
*This class implements Rotary Positional Embeddings (RoPE), proposed in https://arxiv.org/abs/2104.09864.
Code adapted from https://github.com/pytorch/torchtune/blob/main/torchtune/modules/position_embeddings.py (Copyright (c) Meta Platforms, Inc. and affiliates. All rights reserved.)
It additionally implements p-RoPE from https://openreview.net/pdf?id=GtvuNrk58a. Note: p=0 coincides with NoPE (no positional encoding), and p=1 coincides with standard RoPE.*
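To make the role of p concrete, here is a minimal pure-Python sketch of the idea, not the class's actual implementation. It assumes that p-RoPE keeps the fraction p of highest-frequency rotation pairs and leaves the remaining pairs unrotated, and that channels are paired in interleaved order as (x[2i], x[2i+1]). The helper names `prope_inv_freqs` and `rotate` are hypothetical.

```python
import math

def prope_inv_freqs(head_dim: int, p: float = 1.0, base: float = 10000.0) -> list[float]:
    """Inverse frequencies for p-RoPE, one per channel pair.

    Assumption: the fraction p of highest frequencies is kept and the
    remaining (lowest) ones are set to 0, so those pairs are not rotated.
    """
    n_pairs = head_dim // 2
    # Standard RoPE frequencies base^(-2i / head_dim), highest frequency first.
    freqs = [base ** (-2.0 * i / head_dim) for i in range(n_pairs)]
    n_keep = round(p * n_pairs)
    return [f if i < n_keep else 0.0 for i, f in enumerate(freqs)]

def rotate(x: list[float], pos: int, inv_freqs: list[float]) -> list[float]:
    """Rotate interleaved pairs (x[2i], x[2i+1]) by the angle pos * inv_freqs[i]."""
    out = []
    for i, f in enumerate(inv_freqs):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * f), math.sin(pos * f)
        out += [a * c - b * s, a * s + b * c]
    return out

x = [1.0] * 8
print(rotate(x, 5, prope_inv_freqs(8, p=0.0)))  # unchanged: no pair is rotated (NoPE)
print(rotate(x, 5, prope_inv_freqs(8, p=0.5)))  # only the first two pairs are rotated
```

With p=1 every pair is rotated, which is standard RoPE. Under the stated assumption, intermediate values of p leave the unrotated channels free of positional information.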
2d p-RoPE
RotaryPositionalEmbedding2D
RotaryPositionalEmbedding2D (head_dim:int, p:float=1.0, max_seq_len:int=4096, base:float=10000)
b = 1
s = 256
n_heads = 1
head_dim = 64
q = torch.ones((b, s, n_heads, head_dim))

nx = 32
ny = 8
px = torch.arange(nx).expand(ny, -1)
py = torch.arange(ny).unsqueeze(-1).expand(-1, nx)
pos_idx = torch.stack([py, px], dim=-1).reshape(-1, 2)

p1 = 1
p2 = 0.5
base = 100

pe = RotaryPositionalEmbedding2D(head_dim, p1, base=base)
q_pe1 = pe(q, pos_idx).squeeze() # [s, head_dim]

pe = RotaryPositionalEmbedding2D(head_dim, p2, base=base)
q_pe2 = pe(q, pos_idx).squeeze() # [s, head_dim]

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

ax1.imshow(q_pe1.T)
ax1.set_title(f"p={p1}")
ax1.set_xlabel("position")
ax1.set_ylabel("channel")

ax2.imshow(q_pe2.T)
ax2.set_title(f"p={p2}")
ax2.set_xlabel("position")
ax2.set_ylabel("channel")

plt.show()
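The layout of `pos_idx` may be easier to see without tensor operations. The following pure-Python sketch builds the same row-major list of (y, x) pairs that the `torch.stack([py, px], dim=-1).reshape(-1, 2)` construction yields for `nx = 32` and `ny = 8`. It illustrates only the index order and does not use the class itself.

```python
nx, ny = 32, 8

# Row-major (y, x) pairs: x varies fastest, y changes every nx entries.
pos_idx = [(y, x) for y in range(ny) for x in range(nx)]

# The flat sequence index k maps to grid coordinates (k // nx, k % nx).
for k, (y, x) in enumerate(pos_idx):
    assert (y, x) == (k // nx, k % nx)

print(len(pos_idx))  # 256, equal to the sequence length s
print(pos_idx[33])   # (1, 1)
```

The grid therefore has to contain exactly `nx * ny = s` entries, one per sequence position.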
Learned position encoding
LearnedPositionalEmbedding
LearnedPositionalEmbedding (dim:int, max_seq_len:int=64)
This class implements a learned positional embedding, used e.g. for the spatial circuit dimension.
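As a rough illustration of what a learned positional embedding does, the sketch below stores one trainable vector per position and adds it to the input. This is an assumption about the general technique, not the class's actual code: the name `LearnedPositionalEmbeddingSketch`, the initialization scale, and the choice to index positions along the second axis of a `[b, s, t, dim]` input are all hypothetical.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbeddingSketch(nn.Module):
    """Hypothetical stand-in, not the library's implementation."""

    def __init__(self, dim: int, max_seq_len: int = 64) -> None:
        super().__init__()
        # One trainable vector per position (small random init is an assumption).
        self.pos = nn.Parameter(torch.randn(max_seq_len, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [b, s, t, dim]; add the first s position vectors,
        # broadcasting over the batch axis b and the axis t.
        s = x.shape[1]
        return x + self.pos[:s][None, :, None, :]

pe = LearnedPositionalEmbeddingSketch(dim=64)
out = pe(torch.zeros(1, 8, 1, 64))
print(out.shape)  # torch.Size([1, 8, 1, 64])
```

Feeding zeros, as in this usage, returns the embedding table itself, which is a convenient way to visualize it.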
b = 1
s = 8
t = 1
dim = 64

pe = LearnedPositionalEmbedding(dim)

q = torch.zeros((b, s, t, dim))
q = pe(q).squeeze() # [s, dim]

plt.figure(figsize=(15, 5))
plt.imshow(q.detach())
plt.xlabel("channel")
plt.ylabel("position")
plt.show()