Day 4 — Thu, Apr 30

The transformer architecture, drawn from scratch

  • My goal for today is a deep dive into transformer models. Nothing else.
  • Checklist: what encoder models are, what decoder models are, what the training strategies are, and what LLM objectives look like in computer vision

Transformer architecture (full diagram)

Drew the complete architecture from the original paper, "Attention Is All You Need" (Vaswani et al., 2017)

  • I want to learn the transformer from scratch again for interviews. Starting with: 1) tokenizer and BPE (see the sketch below)
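Since BPE is first on the list, here is a minimal sketch of the merge-learning loop on a toy corpus. The word frequencies and the 10-merge budget are made up for illustration; real tokenizers (e.g. the original Sennrich et al. implementation) add escaping and efficiency tricks on top of the same idea.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs across all words, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace each whole-symbol occurrence of the pair with its merged symbol.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Toy corpus: words pre-split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges = []
for _ in range(10):  # learn 10 merges
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    merges.append(best)
print(merges)  # the learned merge rules, most frequent first
```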
Encoder layer (bottom to top):

  • Inputs → Input embedding → ⊕ positional encoding
  • Multi-head attention → Add & norm
  • Feed forward → Add & norm
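To make the encoder half concrete, here is a minimal PyTorch sketch of one encoder layer plus the sinusoidal positional encoding. The sizes (d_model=512, 8 heads, d_ff=2048) are the paper's defaults; the names `EncoderLayer` and `positional_encoding` are mine, not a library API, so treat this as a sketch rather than a reference implementation.

```python
import torch
import torch.nn as nn

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from the paper: sin on even dims, cos on odd dims.
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    angles = pos / (10000 ** (i / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

class EncoderLayer(nn.Module):
    """Self-attention -> add & norm -> feed-forward -> add & norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention sublayer with residual connection ("add & norm").
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        # Position-wise feed-forward sublayer with its own residual.
        return self.norm2(x + self.drop(self.ff(x)))

x = torch.randn(2, 10, 512)           # (batch, seq_len, d_model) token embeddings
x = x + positional_encoding(10, 512)  # the ⊕ in the diagram
print(EncoderLayer()(x).shape)        # torch.Size([2, 10, 512])
```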
Decoder block (bottom to top):

  • Outputs (shifted right) → Output embedding → ⊕ positional encoding
  • Multi-head attention (masked) → Add & norm
  • Multi-head attention (cross, over the encoder output) → Add & norm
  • Feed forward → Add & norm
  • Linear → Softmax → Output
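The decoder layer differs in two ways the diagram shows: the first attention block is masked so position i cannot see positions after i, and a second (cross) attention block reads the encoder output. A hedged sketch on the same assumed sizes, reusing PyTorch's `nn.MultiheadAttention` (a boolean `attn_mask` with True meaning "do not attend"):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Masked self-attention -> cross-attention over encoder output -> feed-forward,
    each followed by add & norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, memory):
        # Causal mask: True above the diagonal marks future positions as forbidden.
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        out, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + self.drop(out))
        # Cross-attention: queries from the decoder, keys/values from the encoder output.
        out, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + self.drop(out))
        return self.norm3(x + self.drop(self.ff(x)))

memory = torch.randn(2, 10, 512)  # encoder stack output
tgt = torch.randn(2, 7, 512)      # shifted-right target embeddings (+ positional encoding)
print(DecoderLayer()(tgt, memory).shape)  # torch.Size([2, 7, 512])
```

In the full model, the decoder stack's output then goes through the final Linear → Softmax to produce next-token probabilities over the vocabulary.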