Deep Learning in Science and Engineering

November 28, 2018
Texas A&M University
College Station, TX
Joe C. Richardson Petroleum Engineering Building (RICH) 910

Boris Hanin, Department of Mathematics

Which ReLU Net Architectures Give Rise to Exploding and Vanishing Gradients?

Abstract

Due to its compositional nature, the function computed by a deep neural net often produces gradients whose magnitude is either very close to 0 or very large. This so-called vanishing and exploding gradient problem is often already present at initialization and is a major impediment to gradient-based optimization techniques. I will give a rigorous answer to the question of which neural architectures have exploding and vanishing gradients for feed-forward neural nets with ReLU activations. The results presented will cover both independent and orthogonal weight initializations. The results are partly joint with Mihai Nica (Toronto).
