Splitting Adam

This paper effectively "splits" the Adam algorithm into two distinct components to study them: it isolates the stochastic direction (the sign of the gradient) from the adaptive step size (the relative variance).

It proposes Coupled Adam to fix this specific side effect.
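
One way to make the sign / relative-variance split concrete is the algebraic identity m / sqrt(v) = sign(m) / sqrt(1 + eta2), where eta2 = (v - m^2) / m^2. The sketch below checks that identity for a single coordinate. It rests on assumptions not stated in the text: bias correction is omitted, epsilon is set to zero, and m is nonzero. The function names and the sample moment values are hypothetical and chosen only for illustration; this is not necessarily the exact formulation the paper uses.

```python
import math


def adam_direction(m, v, eps=0.0):
    """Adam's per-coordinate update direction m / (sqrt(v) + eps).

    m and v stand for the first and second moment estimates.
    Bias correction is left out (an assumption made for brevity).
    """
    return m / (math.sqrt(v) + eps)


def split_direction(m, v):
    """The same quantity written as sign(m) * 1 / sqrt(1 + eta2).

    eta2 = (v - m^2) / m^2 plays the role of an estimated relative
    variance. Requires m != 0 and v > 0.
    """
    sign = math.copysign(1.0, m)        # stochastic direction
    eta2 = (v - m * m) / (m * m)        # relative variance estimate
    return sign / math.sqrt(1.0 + eta2)  # direction times adaptive factor


# Hypothetical (m, v) pairs, illustrative values only.
moments = [(0.5, 0.5), (-0.2, 0.13), (1.0, 1.0)]
for m, v in moments:
    # With eps = 0 the two forms agree up to floating-point error.
    assert math.isclose(adam_direction(m, v), split_direction(m, v))
```

Under these assumptions, a coordinate with zero estimated relative variance (v equal to m^2) takes a full sign step of magnitude 1, and the step shrinks as the relative variance grows.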