$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
arXiv:2601.17680v1 Announce Type: new
Abstract: The Mixture of Experts (MoE) architecture selects a few feed-forward networks (FFNs) per token, achieving an effective trade-off between computational cost...
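For context, the sketch below shows the standard finite top-k MoE routing that the abstract refers to (a gate picks a few FFN experts per token and mixes their outputs), not the paper's infinite-expert generalization. The class name, layer sizes, expert count, and choice of k are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Minimal sketch: route each token to its top-k experts from a finite pool of FFNs."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router producing one score per expert for each token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Pool of expert FFNs (sizes are placeholder values).
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                # (T, E)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # (T, k)
        weights = F.softmax(topk_scores, dim=-1)             # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                          # selected expert id per token
            w = weights[:, slot].unsqueeze(-1)               # (T, 1) mixing weight
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    # Only the selected experts run, giving the sparse-compute trade-off.
                    out[mask] += w[mask] * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = TopKMoE(d_model=16, d_hidden=64, num_experts=4, k=2)
    tokens = torch.randn(10, 16)
    print(moe(tokens).shape)  # torch.Size([10, 16])
```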