GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
arXiv:2601.18543v2 (Announce Type: replace)
Abstract: We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. Unlike unified models that face expensive training...