MergeMix: Optimizing Mid-Training Data Mixtures via Learnable Model Merging
arXiv:2601.17858v1

Abstract: Optimizing data mixtures is essential for unlocking the full potential of large language models (LLMs), yet identifying the optimal composition...