A curated list of my open-source research projects on efficient LLM training and inference systems.
ZO2: Full-Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory 
A zeroth-order offloading framework that enables memory-efficient, full-parameter fine-tuning of extremely large LLMs.
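The core idea behind zeroth-order fine-tuning can be shown in a few lines. The sketch below is a toy MeZO-style SPSA update on a made-up quadratic loss, not ZO2's actual API: gradients are estimated from two forward passes with a shared random perturbation, and the perturbation is regenerated from a seed rather than stored, so no gradient or activation memory is needed.

```python
import random

# Toy stand-in for a model's loss: L(theta) = sum((theta_i - target_i)^2).
# TARGET and all names here are illustrative, not part of ZO2.
TARGET = [1.0, -2.0, 0.5]

def loss(theta):
    return sum((t, g) == () or (t - g) ** 2 for t, g in zip(theta, TARGET))

def loss(theta):  # noqa: F811 -- simple quadratic loss
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))

def zo_step(theta, lr=0.01, eps=1e-3, seed=0):
    # Regenerate the same Gaussian perturbation z from a seed instead of
    # storing it -- the memory trick used by MeZO-style methods.
    def perturb(sign):
        rng = random.Random(seed)
        for i in range(len(theta)):
            theta[i] += sign * eps * rng.gauss(0.0, 1.0)

    perturb(+1); l_plus = loss(theta)    # forward pass at theta + eps*z
    perturb(-2); l_minus = loss(theta)   # forward pass at theta - eps*z
    perturb(+1)                          # restore theta in place
    grad_scale = (l_plus - l_minus) / (2 * eps)  # estimates z . grad(L)
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] -= lr * grad_scale * rng.gauss(0.0, 1.0)

theta = [0.0, 0.0, 0.0]
for step in range(500):
    zo_step(theta, seed=step)
```

Because only forward passes and a seed are needed, parameters can live mostly on CPU/NVMe and be streamed to the GPU per step, which is what lets ZO2 fit 175B-scale models in 18 GB.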
Tinytron 
A minimal yet practical pre-training stack for GPT-style models with FA/GQA/MoE support and distributed training utilities (ZeRO-1, Sequence-Expert Joint Parallelism).
Tiny-LLM-Libs
Educational mini-replicas of major distributed training stacks, designed so their core mechanisms can be read and understood quickly.
Tiny-FSDP 
Side-by-side implementations of DDP, ZeRO-3, and FSDP for studying their communication and memory trade-offs.
Tiny-DeepSpeed 
A minimal DDP + ZeRO-1/2/3 training stack with meta initialization and overlap primitives.
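The ZeRO-1 idea that both Tinytron and Tiny-DeepSpeed implement can be sketched in a single-process toy (the names and setup below are illustrative, not either repo's actual API): each rank keeps optimizer state only for its own shard of the parameters, updates that shard, and the updated shards are then all-gathered so every rank ends up with the full parameter vector.

```python
# Toy single-process simulation of ZeRO-1 optimizer-state sharding.
# Real implementations use torch.distributed collectives; here the
# "ranks" are just loop iterations.
WORLD_SIZE = 4
N_PARAMS = 8

params = [float(i) for i in range(N_PARAMS)]   # replicated on every rank
grads = [1.0] * N_PARAMS                       # assume grads already all-reduced
# Each rank stores momentum for only 1/WORLD_SIZE of the parameters.
momentum = {r: [0.0] * (N_PARAMS // WORLD_SIZE) for r in range(WORLD_SIZE)}

def zero1_step(lr=0.1, beta=0.9):
    shard = N_PARAMS // WORLD_SIZE
    updated_shards = []
    for rank in range(WORLD_SIZE):
        lo = rank * shard
        local = []
        for j in range(shard):
            # SGD-with-momentum update using only this rank's state shard
            momentum[rank][j] = beta * momentum[rank][j] + grads[lo + j]
            local.append(params[lo + j] - lr * momentum[rank][j])
        updated_shards.append(local)
    # "All-gather": every rank receives the full updated parameter vector.
    params[:] = [p for s in updated_shards for p in s]

zero1_step()
```

ZeRO-2 additionally shards gradients, and ZeRO-3 shards the parameters themselves, trading extra communication for further memory savings.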
Tiny-Megatron 
An educational TP/DP/2D hybrid-parallel pipeline with custom modules and runtime auto-tuning.
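The basic Megatron-style TP building block is a column-parallel linear layer: the weight's output dimension is split across ranks, each rank computes its slice locally, and the slices are concatenated (an all-gather along the feature dimension). A pure-Python sketch, with made-up numbers and no real communication:

```python
def matmul(x, W):
    # x: [in_features], W: [out_features][in_features] -> [out_features]
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in W]

WORLD_SIZE = 2
# Illustrative full weight: 4 output features, 3 input features.
W = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
x = [2.0, 3.0, 4.0]

# Column parallelism: split the output rows of W across ranks.
shards = [W[:2], W[2:]]
partials = [matmul(x, shard) for shard in shards]  # each rank's local output
y_tp = [v for p in partials for v in p]            # all-gather: concat slices
y_ref = matmul(x, W)                               # single-device reference
```

Following a column-parallel layer with a row-parallel one (inputs sharded, outputs all-reduced) lets a two-layer MLP run with a single communication step per direction, which is the pattern Tiny-Megatron walks through.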