Writing
Notes on LLM scaling, distributed training, and building research environments and teams.
-
Research Problems in Pretraining
A practitioner's map of the pretraining landscape: scaling laws, principled parameterization, optimizer frontiers, and the open mysteries that keep the field honest.
-
Your Org Has the Same Scaling Problem as a Badly Tuned Training Run
AI raised individual throughput, but coordination overhead stayed fixed. Many product-engineering orgs have flipped from compute-bound to communication-bound.
-
On Assessing the Value of a Project
A simple framework for picking the most impactful work — with practical tips for estimating chance of success, effect size, and applicable scenarios.
-
Determining Model Size and Training Horizon through Scaling Laws
A framework for optimizing model size and training token count from scaling-law coefficients — accounting for inference cost, data repetition, and system efficiency.
-
GluonNLP — Deep Learning Toolkit for Natural Language Processing
Introducing GluonNLP, an open-source toolkit that tackles the reproducibility crisis in NLP research with stable APIs, reusable components, and centralized resources.