Day 4 — Thu, Apr 30
The transformer architecture, drawn from scratch
- My goal for today is to do a deep dive into transformer models, nothing else
- Checklist: what encoder models are, what decoder models are, what the training strategies are, and what LLM objectives look like in computer vision
Transformer architecture (full diagram)
Drew the complete architecture from the original paper, "Attention Is All You Need" (Vaswani et al., 2017)
- I want to learn the transformer from scratch again for interviews. Starting with: 1) Tokenizer and BPE
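To make the tokenizer step concrete, here is a minimal sketch of learning BPE merges in Python. It is illustrative only: the function name `learn_bpe` and the toy corpus are my own placeholders, not from any particular library.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merges from a list of whitespace-pretokenized words."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Merge the most frequent pair everywhere it occurs.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

if __name__ == "__main__":
    corpus = ["low", "lower", "lowest", "newer", "wider"]
    print(learn_bpe(corpus, num_merges=5))
```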
Encoder layer
Add & norm
↑
Feed forward
↑
Add & norm
↑
Multi-head attention
↑
⊕ positional encoding
↑
Input embedding
↑
Inputs
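As a sanity check on the encoder diagram above, here is a short PyTorch sketch of one encoder layer plus sinusoidal positional encoding, following the bottom-up order in the drawing (embedding ⊕ positional encoding → multi-head self-attention → add & norm → feed-forward → add & norm). This is a sketch, not a reference implementation: it uses post-layer-norm and the base-model sizes (d_model=512, 8 heads, d_ff=2048); class names and the vocab size are my own placeholders.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sin/cos positional encodings added onto the input embeddings."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class EncoderLayer(nn.Module):
    """Self-attention -> add & norm -> feed-forward -> add & norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        attn_out, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(attn_out))      # Add & norm
        x = self.norm2(x + self.drop(self.ffn(x)))   # Add & norm
        return x

if __name__ == "__main__":
    emb = nn.Embedding(32000, 512)                   # vocab size is a placeholder
    pos = SinusoidalPositionalEncoding(512)
    layer = EncoderLayer()
    tokens = torch.randint(0, 32000, (2, 10))        # (batch, seq_len)
    out = layer(pos(emb(tokens)))
    print(out.shape)                                 # torch.Size([2, 10, 512])
```

In a full encoder this layer would be stacked N times (N=6 in the base model), with the output of one layer feeding the next.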
Decoder block
Output probabilities
↑
Softmax
↑
Linear
↑
Add & norm
↑
Feed forward
↑
Add & norm
↑
Multi-head attention (cross)
↑
Add & norm
↑
Multi-head attention (masked)
↑
⊕ positional encoding
↑
Output embedding
↑
Outputs (shifted right)
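And the decoder side, again only a sketch under the same assumptions: masked self-attention, cross-attention over the encoder output, and a feed-forward network, each wrapped in add & norm, followed by the final linear + softmax over the vocabulary. It reuses nn.MultiheadAttention rather than implementing attention by hand; DecoderBlock and to_vocab are placeholder names of my own.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Masked self-attention -> add & norm -> cross-attention -> add & norm
    -> feed-forward -> add & norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, memory):
        # Causal mask: position i may only attend to positions <= i.
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1)
        sa, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + self.drop(sa))              # Add & norm
        # Cross-attention: queries from the decoder, keys/values from the encoder output.
        ca, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + self.drop(ca))              # Add & norm
        x = self.norm3(x + self.drop(self.ffn(x)))     # Add & norm
        return x

if __name__ == "__main__":
    block = DecoderBlock()
    to_vocab = nn.Linear(512, 32000)                   # final Linear before the softmax
    tgt = torch.randn(2, 7, 512)                       # embedded + position-encoded "outputs shifted right"
    memory = torch.randn(2, 10, 512)                   # encoder output
    logits = to_vocab(block(tgt, memory))
    probs = logits.softmax(dim=-1)                     # output probabilities, (2, 7, 32000)
    print(probs.shape)
```

The "shifted right" inputs and the causal mask are what keep training autoregressive: at every position the block can only see earlier target tokens, while the cross-attention lets it look at the entire source sequence.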