Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
arXiv:2602.06412v2

Abstract: Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and...
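The following is a minimal toy sketch of the generic iterative unmasking loop that the abstract describes. It is not the paper's method. It assumes a few things for illustration: confidence-based unmasking of a fixed number of tokens per step, and a random stand-in "denoiser" in place of a real model. The names `toy_model`, `unmask_decode`, and `MASK` are hypothetical. The sketch counts how many per-position evaluations land on tokens that were already unmasked, because every step runs over the full sequence.

```python
import random

MASK = None  # hypothetical convention: None marks a still-masked position


def toy_model(seq, rng):
    """Hypothetical stand-in for a denoiser: returns a (token, confidence)
    guess for EVERY position, mimicking one full forward pass."""
    return [(rng.randrange(100), rng.random()) for _ in seq]


def unmask_decode(length, tokens_per_step, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length
    steps = 0
    total_work = 0   # position evaluations across all steps
    masked_work = 0  # evaluations that fell on still-masked positions
    while any(t is MASK for t in seq):
        preds = toy_model(seq, rng)
        total_work += len(seq)  # the whole sequence is recomputed each step
        masked = [i for i, t in enumerate(seq) if t is MASK]
        masked_work += len(masked)
        # Commit the most confident predictions among still-masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = preds[i][0]
        steps += 1
    return seq, steps, total_work, masked_work


if __name__ == "__main__":
    seq, steps, total, masked_only = unmask_decode(length=8, tokens_per_step=2)
    print(steps, total, masked_only)  # 4 32 20
```

In this toy run, 12 of the 32 position evaluations are spent on positions that had already been unmasked. That is the kind of redundant recomputation the abstract points to. How the paper decides that a token has "converged" and skips its computation is not stated in this excerpt.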