Projects

A curated list of my open-source research projects on efficient LLM training and inference systems.

ZO2: Full-Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory

A zeroth-order offloading framework that enables memory-efficient, full-parameter fine-tuning of extremely large LLMs.
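The core idea behind zeroth-order fine-tuning is to estimate gradients from two forward passes instead of backpropagation, so no activations or gradient buffers need to be kept in GPU memory. A minimal SPSA-style sketch (not ZO2's actual implementation; the quadratic loss is a stand-in for an LLM forward pass):

```python
import numpy as np

def loss(theta):
    # Toy objective standing in for a model's forward-pass loss.
    return float(np.sum((theta - 1.0) ** 2))

def zo_step(theta, eps=1e-3, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(theta.shape)          # random perturbation direction
    l_plus = loss(theta + eps * z)                # forward pass 1
    l_minus = loss(theta - eps * z)               # forward pass 2
    grad_scale = (l_plus - l_minus) / (2 * eps)   # estimate of grad(L) . z
    return theta - lr * grad_scale * z            # SGD-style update along z

theta = np.zeros(8)
for step in range(500):
    theta = zo_step(theta, seed=step)             # fresh direction each step
```

Because each step only needs two forward passes and the random seed, parameters can be streamed between CPU and GPU, which is what makes offloading at 175B scale practical.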

Tinytron

A minimal yet practical pre-training stack for GPT-style models with FA/GQA/MoE support and distributed training utilities (ZeRO-1, Sequence-Expert Joint Parallelism).
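The GQA mentioned above (grouped-query attention) reduces KV-cache memory by letting several query heads share one key/value head. A minimal NumPy sketch of the head-sharing mechanism, with hypothetical shapes (not Tinytron's actual module):

```python
import numpy as np

rng = np.random.default_rng(0)
n_q, n_kv, seq, d = 8, 2, 4, 16          # 8 query heads share 2 KV heads
q = rng.standard_normal((n_q, seq, d))
k = rng.standard_normal((n_kv, seq, d))  # KV projections are 4x smaller
v = rng.standard_normal((n_kv, seq, d))

group = n_q // n_kv                      # 4 query heads per KV head
k_rep = np.repeat(k, group, axis=0)      # broadcast KV heads to (n_q, seq, d)
v_rep = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention per head.
scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_rep                    # (n_q, seq, d)
```

With `n_kv = n_q` this reduces to multi-head attention, and with `n_kv = 1` to multi-query attention.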

Tiny-LLM-Libs

Educational mini-replicas of major distributed training stacks, designed for reading core mechanisms quickly.

Tiny-FSDP

Side-by-side implementations of DDP, ZeRO-3, and FSDP for studying their communication and memory trade-offs.
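The trade-off these implementations compare can be illustrated with a toy model in plain Python (no `torch.distributed`): DDP replicates the full parameters on every rank, while ZeRO-3/FSDP keeps only a 1/world_size shard per rank and all-gathers the rest transiently for compute.

```python
WORLD_SIZE = 4
params = list(range(16))                 # pretend these are parameter elements

# DDP: every rank persistently stores the full parameter set.
ddp_mem_per_rank = len(params)

# ZeRO-3/FSDP: each rank persistently stores only its shard.
shards = [params[r::WORLD_SIZE] for r in range(WORLD_SIZE)]
zero3_mem_per_rank = len(shards[0])

def all_gather(shards):
    # Reconstruct the full parameters transiently before forward/backward,
    # paying extra communication for the memory savings.
    full = [None] * sum(len(s) for s in shards)
    for r, shard in enumerate(shards):
        full[r::len(shards)] = shard
    return full
```

The sharded layout cuts per-rank parameter memory by `WORLD_SIZE`x at the cost of an all-gather per layer, which is the trade-off the side-by-side code makes visible.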

Tiny-DeepSpeed

Minimal DDP + ZeRO-1/2/3 training stack with meta initialization and communication-overlap primitives.

Tiny-Megatron

An educational TP/DP/2D hybrid-parallel training pipeline with custom modules and runtime auto-tuning.