The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents
arXiv:2601.17344v1 (announce type: new)
Abstract: Large language model (LLM) agents with extended autonomy unlock new capabilities, but also introduce heightened challenges for LLM safety. In...