Input Tokens

"The cat sat on the"

The prompt is split into five tokens, each mapped to an ID in the model's vocabulary:

The
cat
sat
on
the
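
A minimal sketch of this step. Real models use a subword tokenizer (e.g. BPE); the whitespace split and the tiny vocabulary below are illustrative assumptions only:

```python
# Toy tokenization: split the prompt on whitespace and assign each distinct
# token an integer ID. Production tokenizers operate on subword pieces.
prompt = "The cat sat on the"
tokens = prompt.split()                         # ['The', 'cat', 'sat', 'on', 'the']
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[t] for t in tokens]          # note: 'The' and 'the' get different IDs
print(tokens)
print(token_ids)
```

Because the split is case-sensitive, "The" and "the" become two different vocabulary entries, which is also true of many real tokenizers.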

Embedding Layer

Each token ID is looked up in a learned embedding matrix and becomes a dense vector representation.
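
The lookup can be sketched as row indexing into a weight matrix. The vocabulary size, model width, and random weights below are illustrative assumptions:

```python
import numpy as np

# An embedding layer is a learned matrix of shape (vocab_size, d_model);
# looking up a token ID selects one row of that matrix.
vocab_size, d_model = 5, 8                      # toy sizes, not real model dimensions
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, d_model))

token_ids = [0, 1, 3, 2, 4]                     # IDs for: The cat sat on the
embeddings = embedding_matrix[token_ids]        # shape (5, 8): one vector per token
print(embeddings.shape)
```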

Positional Encoding

Position information is added to each embedding so the model can distinguish token order.
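
One common choice is the fixed sinusoidal encoding; a sketch, with toy sizes assumed to match the example above:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding: even dimensions get sin, odd dimensions get cos,
    # with wavelengths that grow geometrically across the embedding dimension.
    pos = np.arange(seq_len)[:, None]           # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]   # even dimension indices
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(5, 8)                  # one encoding vector per position
# The encoding is simply added element-wise to the token embeddings:
# x = embeddings + pe
print(pe.shape)
```

Many recent models instead learn positional embeddings or use rotary encodings, but the add-to-embeddings wiring is the same.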

Transformer Block 1

Multi-Head Self-Attention
Add & Norm
Feed-Forward Network
Add & Norm
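
The Multi-Head Self-Attention sub-layer can be sketched as below. The sizes, random weights, and the `causal_self_attention` helper are illustrative assumptions; the causal mask keeps each position from attending to later tokens, as in a decoder-only model:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, n_heads, Wq, Wk, Wv, Wo):
    # Scaled dot-product attention, split across heads, with a causal mask.
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project to queries/keys/values and reshape to (heads, seq, d_head).
    q = (x @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)     # (heads, seq, seq)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)       # block attention to future tokens
    out = softmax(scores) @ v                   # weighted sum of values
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # re-merge the heads
    return out @ Wo                             # final output projection

rng = np.random.default_rng(0)
d_model, seq_len, n_heads = 8, 5, 2             # toy sizes
x = rng.normal(size=(seq_len, d_model))
W = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
y = causal_self_attention(x, n_heads, *W)
print(y.shape)
```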

Transformer Block 2

Multi-Head Self-Attention
Add & Norm
Feed-Forward Network
Add & Norm

Transformer Block 3

Multi-Head Self-Attention
Add & Norm
Feed-Forward Network
Add & Norm
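
The three blocks share the same wiring: each sub-layer is followed by a residual add and a layer norm ("Add & Norm"), and the blocks are stacked so the output of one feeds the next. A sketch of that wiring, using a stand-in identity function for attention to keep the example short (sizes and weights are assumptions):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise FFN: expand to a wider hidden size, apply ReLU, project back.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def transformer_block(x, attn_fn, ffn_params):
    # Post-norm wiring: residual add then layer norm after each sub-layer.
    x = layer_norm(x + attn_fn(x))                      # Add & Norm (attention)
    x = layer_norm(x + feed_forward(x, *ffn_params))    # Add & Norm (FFN)
    return x

rng = np.random.default_rng(1)
d_model, d_ff, seq_len = 8, 32, 5               # toy sizes
x = rng.normal(size=(seq_len, d_model))
params = (rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff),
          rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model))
h = x
for _ in range(3):                              # three stacked blocks, as above
    h = transformer_block(h, lambda t: t, params)  # identity stands in for attention
print(h.shape)
```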

Output Layer

Next-token predictions (probabilities for the most likely continuations):

mat      45%
down     30%
floor    15%
ground    8%
table     2%
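
The final hidden state is projected to one logit per vocabulary word, and a softmax turns the logits into this probability distribution. A sketch of that last step; the logit values below are hand-picked assumptions chosen so the resulting distribution comes out close to the one shown, not real model output:

```python
import numpy as np

vocab = ["mat", "down", "floor", "ground", "table"]
logits = np.array([3.0, 2.6, 1.9, 1.3, -0.1])   # assumed values, not real output

# Softmax: exponentiate (shifted by the max for numerical stability), normalize.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{word:>7s} {p:5.1%}")
```

To generate text, the model would then pick "mat" greedily or sample from this distribution, append the chosen token, and repeat.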