Projects
A curated list of my open-source research projects on efficient LLM training and inference systems.
Canzona
Unified, asynchronous, and load-balanced matrix-based optimization for distributed training, with implementations for different sharding stacks.
Megatron integration with load-balanced DP partitioning, async TP micro-group scheduling, parameter splitting, and optimizer plugin support.
Tiny-LLM-Libs
Educational mini-replicas of major distributed training stacks, designed for reading core mechanisms quickly.
Side-by-side DDP, ZeRO-3, and FSDP implementations for studying communication and memory trade-offs.
Minimal DDP + ZeRO-1/2/3 training stack with meta initialization and overlap primitives.
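As a rough illustration of the memory side of the DDP vs. ZeRO trade-off mentioned above, the sketch below estimates per-GPU model-state memory using the standard accounting from the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state. It is not taken from these repositories; the function name `per_gpu_model_state_bytes` and its arguments are hypothetical, and activations, buffers, and fragmentation are ignored.

```python
def per_gpu_model_state_bytes(num_params, world_size, strategy,
                              optimizer_bytes_per_param=12):
    """Estimate per-GPU model-state memory in bytes (hypothetical helper).

    Assumes mixed-precision Adam: fp16 weights (2 B/param), fp16 gradients
    (2 B/param), and fp32 master weights + momentum + variance (12 B/param).
    Activations and temporary buffers are NOT counted.
    """
    param_bytes = 2 * num_params
    grad_bytes = 2 * num_params
    optim_bytes = optimizer_bytes_per_param * num_params

    if strategy == "ddp":
        # Everything is replicated on every rank.
        pass
    elif strategy == "zero1":
        # Only optimizer state is sharded across ranks.
        optim_bytes /= world_size
    elif strategy == "zero2":
        # Optimizer state and gradients are sharded.
        optim_bytes /= world_size
        grad_bytes /= world_size
    elif strategy == "zero3":
        # Parameters, gradients, and optimizer state are all sharded.
        optim_bytes /= world_size
        grad_bytes /= world_size
        param_bytes /= world_size
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return param_bytes + grad_bytes + optim_bytes


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 GPUs.
    for name in ("ddp", "zero1", "zero2", "zero3"):
        gb = per_gpu_model_state_bytes(7.5e9, 64, name) / 1e9
        print(f"{name:>5}: {gb:.1f} GB per GPU")
```

The memory savings come at a communication cost: ZeRO-3 (and FSDP in its fully sharded mode, which follows the same idea) must gather parameters before each forward and backward pass, whereas DDP only all-reduces gradients.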
