Canzona: Bringing Matrix-based Optimizers to Large-Scale Distributed Training
Canzona decouples logical optimizer assignment from physical parameter distribution to make matrix-based optimizers practical in large-scale distributed LLM training.
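The summary describes the key idea only at a high level. The sketch below illustrates why the two views can conflict, under stated assumptions; it is not Canzona's actual algorithm. It assumes a ZeRO-style physical layout, in which all parameters are concatenated and cut into equal contiguous chunks per rank, so a single weight matrix can straddle a rank boundary. A matrix-based optimizer needs the whole matrix to compute its update. The sketch therefore keeps a separate logical assignment that maps each whole matrix to one owner rank, balanced greedily by an estimated update cost. The names `Param`, `physical_shards`, and `logical_owners`, and the cost model `rows * cols * min(rows, cols)`, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Param:
    """A 2-D weight matrix (hypothetical stand-in for a model parameter)."""
    name: str
    rows: int
    cols: int

    @property
    def numel(self) -> int:
        return self.rows * self.cols


def physical_shards(params, world_size):
    """Assumed ZeRO-style flat layout: concatenate all params, cut into equal chunks.

    Returns, per rank, a list of (param name, start, end) element ranges that the
    rank stores. A matrix crossing a chunk boundary ends up split across ranks.
    """
    total = sum(p.numel for p in params)
    chunk = -(-total // world_size)  # ceiling division
    shards = [[] for _ in range(world_size)]
    offset = 0
    for p in params:
        start, end = offset, offset + p.numel
        pos = start
        while pos < end:
            rank = pos // chunk
            stop = min(end, (rank + 1) * chunk)
            shards[rank].append((p.name, pos - start, stop - start))
            pos = stop
        offset = end
    return shards


def _default_cost(p: Param) -> int:
    # Hypothetical cost model for one matrix-optimizer update.
    return p.rows * p.cols * min(p.rows, p.cols)


def logical_owners(params, world_size, cost=_default_cost):
    """Hypothetical logical assignment: each whole matrix gets exactly one owner rank.

    Uses greedy longest-processing-time scheduling: largest cost first, always
    placed on the currently least-loaded rank.
    """
    heap = [(0, r) for r in range(world_size)]  # (accumulated load, rank)
    heapq.heapify(heap)
    owners = {}
    for p in sorted(params, key=cost, reverse=True):
        load, rank = heapq.heappop(heap)
        owners[p.name] = rank
        heapq.heappush(heap, (load + cost(p), rank))
    return owners


if __name__ == "__main__":
    params = [Param("a", 4, 4), Param("b", 4, 2), Param("c", 2, 2)]
    print(physical_shards(params, 2))
    # [[('a', 0, 14)], [('a', 14, 16), ('b', 0, 8), ('c', 0, 4)]]
    print(logical_owners(params, 2))
    # {'a': 0, 'b': 1, 'c': 1}
```

In this toy example, matrix `a` is physically split between ranks 0 and 1 but logically owned by rank 0. Under such a scheme, the owner would gather the full matrix, compute the update, and return each piece to the rank that stores it. The physical layout can then stay unchanged while the optimizer sees whole matrices.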
