Closing the Distribution Gap in Adversarial Training for LLMs
arXiv:2602.15238v2
Abstract: Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant...