Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation
arXiv:2502.08941v3
Abstract: This paper analyzes multi-step temporal difference (TD)-learning algorithms within the "deadly triad" scenario, characterized by linear function approximation, off-policy learning,...
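
For orientation, the setting named in the abstract can be illustrated with a minimal sketch of a textbook off-policy $n$-step TD update for a linear value estimate $v(s) = w^\top \phi(s)$. This is not taken from the paper and may differ from the algorithms it analyzes; the function name `n_step_td_update`, its argument layout, and the use of a single product of importance-sampling ratios over the $n$ steps are assumptions made for this example.

```python
def n_step_td_update(w, features, rewards, rhos, gamma, alpha):
    """Apply one off-policy n-step TD update to linear weights w.

    Hypothetical helper for illustration only.
    features: n + 1 feature vectors phi(s_t), ..., phi(s_{t+n})
    rewards:  n rewards r_{t+1}, ..., r_{t+n}
    rhos:     n importance ratios pi(a|s) / b(a|s) for the actions taken
    """
    n = len(rewards)
    if len(features) != n + 1 or len(rhos) != n:
        raise ValueError("need n rewards, n ratios, and n + 1 feature vectors")

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # n-step return: discounted rewards plus a bootstrapped tail estimate.
    g = sum(gamma ** k * rewards[k] for k in range(n))
    g += gamma ** n * dot(w, features[n])

    # Product of importance-sampling ratios corrects for off-policy data.
    rho = 1.0
    for r in rhos:
        rho *= r

    # Semi-gradient step along phi(s_t), scaled by the n-step TD error.
    delta = g - dot(w, features[0])
    return [wi + alpha * rho * delta * xi for wi, xi in zip(w, features[0])]


w_new = n_step_td_update(
    w=[0.0, 0.0],
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    rewards=[1.0, 2.0],
    rhos=[1.0, 1.0],
    gamma=0.5,
    alpha=0.1,
)
```

The update combines all three ingredients of the triad: the weights `w` parameterize a linear approximation, `rhos` reweights data collected under a behavior policy, and the `gamma ** n * dot(w, features[n])` term bootstraps from the current estimate. A larger `n` shrinks the weight on the bootstrapped term by the factor `gamma ** n`.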