Divergence Results and Convergence of a Variance Reduced Version of ADAM
arXiv:2210.05607v2

Abstract: Stochastic optimization algorithms using exponential moving averages of the past gradients, such as ADAM, RMSProp and AdaGrad, have been having...