This is the living index for the series. It is updated as new posts are published.
New here? Start with the series introduction.
Section 1 - Deep Learning Basics
- 1.1 - What is a Neural Network? (coming soon)
- 1.2 - Activation Functions: ReLU, GELU, SiLU
- 1.3 - Loss Functions
- 1.4 - Gradient Descent and Backpropagation
- 1.5 - Optimizers: SGD, Adam, AdamW
- 1.6 - Normalization: BatchNorm vs LayerNorm
- 1.7 - Dropout and Overfitting
- 1.8 - CNN, RNN, LSTM: The Road to Transformers
- 1.9 - Seq2Seq and the Bottleneck Problem
Section 2 - Transformers
- 2.1 - Attention Mechanism
- 2.2 - Self-Attention
- 2.3 - Multi-Head Self-Attention
- 2.4 - Positional Encodings: Absolute, RoPE, ALiBi
- 2.5 - Residual Connections and Layer Norm
- 2.6 - Encoder, Decoder, Encoder-Decoder
- 2.7 - BERT vs GPT: Encoder-only vs Decoder-only
- 2.8 - KV Cache
- 2.9 - FlashAttention
- 2.10 - Context Window and How It’s Determined
- 2.11 - Transformers End to End
Section 3 - LLM Fundamentals
- 3.1 - Tokenization in Depth: BPE and SentencePiece
- 3.2 - Next Token Prediction
- 3.3 - Decoding Strategies: Greedy, Beam Search, Sampling
- 3.4 - Temperature, Top-p, Top-k
- 3.5 - Streaming
- 3.6 - Context Window vs Memory
Section 4 - Pre-training and Fine-tuning
- 4.1 - Pre-training Objective and Scaling Laws
- 4.2 - Supervised Fine-tuning (SFT)
- 4.3 - LoRA and QLoRA
- 4.4 - Dataset Curation for Fine-tuning
Section 5 - Alignment and Post-training
- 5.1 - RLHF: Reward Model and PPO
- 5.2 - DPO: Direct Preference Optimization
- 5.3 - Constitutional AI and RLAIF
- 5.4 - Quantization
- 5.5 - Distillation
Section 6 - Inference and Serving
- 6.1 - KV Cache Deep Dive
- 6.2 - Batching Strategies
- 6.3 - vLLM and PagedAttention
- 6.4 - Speculative Decoding
- 6.5 - Latency vs Throughput
Section 7 - Production Concerns
- 7.1 - Prompt Engineering and Optimization
- 7.2 - Prompt Caching
- 7.3 - Guardrails
- 7.4 - Hallucinations: Causes and Mitigation
Section 8 - RAG
- 8.1 - What is RAG and Why It Exists
- 8.2 - Embeddings
- 8.3 - Chunking Strategies
- 8.4 - Retrieval: Dense, Sparse (BM25), Hybrid
- 8.5 - Re-ranking
- 8.6 - Knowledge Graphs as Retrieval Source
- 8.7 - Agentic RAG
- 8.8 - RAG Failure Modes
Section 9 - Agents
- 9.1 - What is an Agent
- 9.2 - ReAct Pattern
- 9.3 - Planning: Chain of Thought, Tree of Thoughts
- 9.4 - Memory Types
- 9.5 - Tools and Function Calling
- 9.6 - MCP: Model Context Protocol
- 9.7 - A2A: Agent to Agent
- 9.8 - Multi-Agent Orchestration
- 9.9 - Agent Failure Modes
Section 10 - Evaluation
- 10.1 - Hallucination Detection and Metrics
- 10.2 - LLM-as-Judge
- 10.3 - RAGAS
- 10.4 - Standard Benchmarks and Their Limits
- 10.5 - Red Teaming Basics
Section 11 - LLMOps
- 11.1 - Prompt Versioning and Management
- 11.2 - Observability and Tracing
- 11.3 - Cost Optimization
- 11.4 - A/B Testing LLM Outputs
Walkthroughs
Foundations
- Implement Attention from Scratch in PyTorch
- Build a BPE Tokenizer from Scratch
Models and Fine-tuning
- Run Your First LLM Locally with Ollama
- Fine-tune Llama 3 with LoRA
- Fine-tune with DPO on a Preference Dataset
Inference and Serving
- Serve a Model with vLLM and Benchmark Throughput
- Implement KV Cache and Measure the Speedup
RAG
- Build a RAG Pipeline from Scratch
- Hybrid Search: BM25 + Dense Retrieval
- Improve RAG with Re-ranking
Agents
- Build a ReAct Agent from Scratch
- Build a Research Agent with LangGraph
- Build and Expose an MCP Server
- Multi-Agent Pipeline with LangGraph
Evaluation
- Evaluate a RAG Pipeline with RAGAS
- LLM-as-Judge: Build Your Own Eval Pipeline
LLMOps
- Add Tracing to an LLM App with Langfuse