Tag - Generalization

The Two Phases of Gradient Descent in Deep Learning

In short, you reach different resting places with different SGD algorithms. That is, different SGDs just give you differing convergence rates due to different strategies, but we do expect that they all end up at the same results! By Carlos Perez, In...