Input Tokens
"The cat sat on the"
The
cat
sat
on
the
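The split above can be sketched with a toy whitespace tokenizer over a small hypothetical vocabulary (the ids below are illustrative assumptions; real models use learned subword vocabularies such as BPE):

```python
# Toy whitespace tokenizer with a small, hypothetical vocabulary.
# Real LLMs use learned subword tokenizers (e.g. BPE), not word splits.
vocab = {"The": 0, "cat": 1, "sat": 2, "on": 3, "the": 4}

def tokenize(text):
    """Split on whitespace and map each token to its vocabulary id."""
    return [vocab[tok] for tok in text.split()]

print(tokenize("The cat sat on the"))  # → [0, 1, 2, 3, 4]
```

Note that "The" and "the" get different ids here; case handling is a tokenizer design choice.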
Embedding Layer
Each token id is looked up in a learned embedding matrix and becomes a dense vector
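The lookup is just row indexing into a weight matrix. A minimal NumPy sketch, with toy sizes and random weights standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 5, 8                          # toy sizes for illustration
embedding = rng.normal(size=(vocab_size, d_model))  # learned during training

token_ids = [0, 1, 2, 3, 4]          # "The cat sat on the" from the example above
x = embedding[token_ids]             # row lookup: one dense vector per token
print(x.shape)                       # → (5, 8)
```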
Positional Encoding
Position information is added to each embedding so the model can distinguish token order
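One classic choice is the fixed sinusoidal scheme, where each position gets a unique pattern of sines and cosines that is added element-wise to the embeddings (many modern models instead learn positional parameters; this is just one concrete option):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encodings (Vaswani et al. style)."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dims: sine
    pe[:, 1::2] = np.cos(angles)                 # odd dims: cosine
    return pe

pe = positional_encoding(5, 8)
# x = x + pe   # added element-wise to the token embeddings
print(pe.shape)  # → (5, 8)
```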
Transformer Block 1
Multi-Head Self-Attention
Add & Norm
Feed-Forward Network
Add & Norm
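The four sub-steps of one block can be sketched end to end in NumPy. This is a minimal post-norm sketch (Add & Norm after each sub-layer, matching the labels above; GPT-style models typically use pre-norm instead), with causal masking and random toy weights as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def multi_head_attention(x, n_heads, Wq, Wk, Wv, Wo):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # project, then split the model dimension into heads
    q = (x @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # causal mask: each position attends only to itself and earlier tokens
    mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)
    out = softmax(scores) @ v                    # (heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model) @ Wo

def transformer_block(x, params):
    # sub-layer 1: self-attention, then residual Add & Norm
    x = layer_norm(x + multi_head_attention(x, 2, *params["attn"]))
    # sub-layer 2: position-wise feed-forward (ReLU), then Add & Norm
    W1, b1, W2, b2 = params["ffn"]
    x = layer_norm(x + np.maximum(0, x @ W1 + b1) @ W2 + b2)
    return x

d_model = 8
params = {
    "attn": [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(4)],
    "ffn": [rng.normal(scale=0.1, size=(d_model, 4 * d_model)),
            np.zeros(4 * d_model),
            rng.normal(scale=0.1, size=(4 * d_model, d_model)),
            np.zeros(d_model)],
}
x = rng.normal(size=(5, d_model))     # 5 tokens from the running example
y = transformer_block(x, params)
print(y.shape)  # → (5, 8)
```

Blocks 2 and 3 below repeat exactly this structure with their own weights, so the output of one block feeds directly into the next.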
Transformer Block 2
Multi-Head Self-Attention
Add & Norm
Feed-Forward Network
Add & Norm
Transformer Block 3
Multi-Head Self-Attention
Add & Norm
Feed-Forward Network
Add & Norm
Output Layer
A linear projection and softmax over the vocabulary turn the final hidden states into next-token probabilities
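This final step can be sketched as follows; the random weights and final hidden states are placeholders for the real trained values (in practice the output projection is often tied to the embedding matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 5, 8
id_to_token = {0: "The", 1: "cat", 2: "sat", 3: "on", 4: "the"}

W_out = rng.normal(scale=0.1, size=(d_model, vocab_size))  # output projection
h = rng.normal(size=(5, d_model))    # final hidden states after the last block

logits = h[-1] @ W_out               # only the last position predicts the next token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                 # softmax → distribution over the vocabulary
next_id = int(np.argmax(probs))      # greedy decoding; sampling is also common
print(id_to_token[next_id])
```

Greedy argmax is shown for simplicity; real decoders often sample from `probs` (with temperature, top-k, or top-p) instead.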