<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Sheng Zha — Writing</title><description>Notes on LLM scaling, distributed training, and building the research environment and teams.</description><link>https://blog.szha.ai/</link><item><title>Research Problems in Pretraining</title><link>https://blog.szha.ai/blog/research-problems-in-llm-pretraining/</link><guid isPermaLink="true">https://blog.szha.ai/blog/research-problems-in-llm-pretraining/</guid><description>A practitioner&apos;s map of the pretraining landscape: scaling laws, principled parameterization, optimizer frontiers, and the open mysteries that keep the field honest.</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><category>pretraining</category><category>scaling-laws</category><category>optimization</category><category>muP</category><category>research</category></item><item><title>Your Org Has the Same Scaling Problem as a Badly Tuned Training Run</title><link>https://blog.szha.ai/blog/badly-tuned-training-run/</link><guid isPermaLink="true">https://blog.szha.ai/blog/badly-tuned-training-run/</guid><description>AI raised individual throughput but coordination overhead stayed fixed. 
For many product-engineering orgs, the bottleneck flipped from compute-bound to communication-bound.</description><pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate><category>management</category><category>ai</category><category>scaling</category><category>organizations</category></item><item><title>On Assessing the Value of a Project</title><link>https://blog.szha.ai/blog/on-assessing-the-value-of-a-project/</link><guid isPermaLink="true">https://blog.szha.ai/blog/on-assessing-the-value-of-a-project/</guid><description>A simple framework for picking the most impactful work — with practical tips for estimating chance of success, effect size, and applicable scenarios.</description><pubDate>Tue, 20 May 2025 00:00:00 GMT</pubDate><category>research</category><category>decision-making</category></item><item><title>Determining Model Size and Training Horizon through Scaling Laws</title><link>https://blog.szha.ai/blog/model-size-and-training-horizon-scaling-laws/</link><guid isPermaLink="true">https://blog.szha.ai/blog/model-size-and-training-horizon-scaling-laws/</guid><description>A framework for optimizing model size and training token count from scaling-law coefficients — accounting for inference cost, data repetition, and system efficiency.</description><pubDate>Mon, 02 Dec 2024 00:00:00 GMT</pubDate><category>scaling</category><category>training</category><category>compute-optimal</category><category>chinchilla</category></item><item><title>GluonNLP — Deep Learning Toolkit for Natural Language Processing</title><link>https://blog.szha.ai/blog/gluonnlp-deep-learning-toolkit-for-nlp/</link><guid isPermaLink="true">https://blog.szha.ai/blog/gluonnlp-deep-learning-toolkit-for-nlp/</guid><description>Introducing GluonNLP, an open-source toolkit that tackles the reproducibility crisis in NLP research with stable APIs, reusable components, and centralized resources.</description><pubDate>Tue, 24 Jul 2018 00:00:00 GMT</pubDate><category>nlp</category><category>deep-learning</category><category>mxnet</category><category>open-source</category></item></channel></rss>