Deep Learning in Science and Engineering
- Texas A&M University
- College Station, TX
- Joe C. Richardson Petroleum Engineering Building (RICH) 910
- Boris Hanin, Department of Mathematics
- Which ReLU Net Architectures Give Rise to Exploding and Vanishing Gradients?
Abstract
Due to its compositional nature, the function computed by a deep neural net often produces gradients whose magnitude is either very close to 0 or very large. This so-called exploding and vanishing gradient problem is often already present at initialization and is a major impediment to gradient-based optimization. I will give a rigorous answer, for feed-forward neural nets with ReLU activations, to the question of which architectures give rise to exploding and vanishing gradients. The results cover both independent and orthogonal weight initializations and are joint in part with Mihai Nica (Toronto).
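For intuition, here is a minimal numerical sketch (not taken from the talk) of the phenomenon the abstract describes: a fully connected ReLU net is drawn at random and the gradient of its output with respect to its input is computed by hand. The He-style independent Gaussian initialization and the two example architectures are illustrative assumptions, chosen to contrast a wide, shallow net with a narrow, deep one at initialization.

```python
import numpy as np

def relu_net_gradient(widths, x, rng):
    """He-initialized fully connected ReLU net: forward pass, then a manual
    backward pass giving the gradient of sum(output layer) w.r.t. the input."""
    weights, masks = [], []
    h = x
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        pre = W @ h
        weights.append(W)
        masks.append((pre > 0).astype(float))  # ReLU derivative at this layer
        h = np.maximum(pre, 0.0)
    grad = np.ones(widths[-1])                 # d(sum of outputs)/d(output)
    for W, m in zip(reversed(weights), reversed(masks)):
        grad = W.T @ (grad * m)                # chain rule through ReLU and W
    return grad

rng = np.random.default_rng(0)
# Illustrative architectures: a wide, shallow net versus a narrow, deep one.
for widths in ([100] + [100] * 5, [100] + [10] * 50):
    norms = [np.linalg.norm(relu_net_gradient(widths, rng.normal(size=widths[0]), rng))
             for _ in range(200)]
    print(f"width {widths[1]}, {len(widths) - 1} layers: "
          f"median |grad| = {np.median(norms):.3f}, max |grad| = {np.max(norms):.3f}")
```

With this initialization the typical gradient scale stays of order one, so the quantity to watch in the printed summary is the spread of the gradient norm across random draws, which is where the architecture dependence discussed in the talk shows up.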