Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
arXiv:2504.20106v3

Abstract: Ensuring that large language models (LLMs) are both helpful and harmless is a critical challenge, as overly strict constraints can...