Spectral Condition for $\mu$P under Width-Depth Scaling
arXiv:2603.00541v1

Abstract: Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable...