Why Muon Outperforms Adam: A Curvature Perspective (Chinese Translation)
Published:
A translation and walk-through of the curvature argument for why the Muon optimizer beats Adam — what the loss-landscape geometry says about each update rule.
Published:
A read-through and Chinese translation of the diffusion language models survey — how denoising-diffusion ideas carry over from images to discrete text generation.
Published:
Notes and translation on Cola DLM — running diffusion language modeling in a continuous latent space instead of over discrete tokens.