Deep SPI: Safe Policy Improvement via World Models
arXiv:2510.12312v2 (Announce Type: replace)

Abstract: Safe policy improvement (SPI) offers theoretical control over policy updates, yet existing guarantees largely concern offline, tabular reinforcement learning (RL)....