Sven Lüpke
Training Diffusion Transformers with Muon
How fast can we train DiTs with the Muon optimizer?