Archives
All the articles I've archived.
-
Bias, Variance, and the Tradeoff Every Model Faces
Why models fail in two opposite ways — being too rigid or too sensitive — and how to find the sweet spot between them.
-
Dropout and Overfitting: Teaching a Network Not to Cheat
What overfitting is, why it happens, and how dropout stops a network from memorising the training data.
-
Transformer Architecture & Key Design Decisions
A deep dive into the transformer architecture, why decoder-only models won, and the key design decisions (RoPE, GQA, Flash Attention, MoE) that shape most modern LLMs.
-
Normalization: BatchNorm, LayerNorm, and Why Transformers Need a Different One
Why activations drift as they pass through deep networks, and how BatchNorm and LayerNorm fix it in different ways.
-
Optimizers: SGD, Momentum, Adam, and AdamW
Why plain gradient descent isn't enough, and how SGD, momentum, Adam, and AdamW each fix a problem the previous one had.
-
Gradient Descent and Backpropagation: How a Network Actually Learns
How gradient descent uses the loss to update weights, and how backpropagation computes the gradients that make it possible.
-
Loss Functions: How a Neural Network Knows It's Wrong
What loss functions are, how MSE and cross-entropy work, and why picking the wrong one breaks your model even if everything else is right.
-
Activation Functions: Why ReLU, GELU, and SiLU Exist
Why stacking linear layers isn't enough, and how activation functions like ReLU, GELU, and SiLU give neural networks their power.
-
What is a Neural Network?
A neural network explained from scratch (neurons, weights, layers, and the forward pass), with no ML background required.
-
GenZ to AI Enz: Series Index
Full table of contents for the GenZ to AI Enz series: every post and walkthrough, in order.
-
GenZ to AI Enz: A Roadmap for CS Grads Breaking into AI
A complete series taking CS students and early-career engineers from zero ML knowledge to building real AI systems with LLMs and agents.
-
Fine-tuning Phi-2 with DPO on the Anthropic HH Dataset
Fine-tuning Microsoft's Phi-2 using Direct Preference Optimization (DPO) on the Anthropic Helpful and Harmless dataset with LoRA and 8-bit quantization.
-
How We Cut ML Inference Latency by 40% on Kubernetes
The architecture behind our async model serving platform at Instabase — async workers, RabbitMQ, multi-level caching, and sticky routing to cut inference time by 40%.
-
GupShup: Summarizing Code-Switched Conversations
Our EMNLP 2021 paper on abstractive summarization of Hindi-English code-switched conversations — introducing the GupShup dataset.