Mathematics of Deep Learning

3:00–5:00 pm

Boris Hanin (Princeton) will give a special lecture course on the mathematics of deep learning. I believe these lectures will be interesting on a purely mathematical level to every member of the department. Lectures will take place every day during the week of April 6-10. The schedule is:

Monday April 6: Eck 206 3-4pm (in lieu of the regular dynamics seminar)
Tuesday April 7: Eck 202 3-4pm
Wednesday April 8: Eck 202 4-5pm (note different time to avoid conflict with the colloquium!)
Thursday April 9: Eck 203 3:30-4:30pm (in lieu of the regular NB/GT seminar; room corrected from E206, which was apparently double-booked)
Friday April 10: Eck 202 3-4pm

https://hanin.princeton.edu/nn-notes.pdf

Titles and abstracts for the talks:


Lecture 1

Title: Neural Networks for Mathematicians

Abstract: This talk starts from first principles: what is a neural network? How are neural networks used in practice? After setting the stage with these preliminaries, I will describe several grand mathematical challenges in deep learning theory that are both theoretically interesting and relevant to practitioners. At the heart of this talk will be the idea of studying scaling limits of neural networks, i.e. obtaining a mathematical description of the possible behaviors of neural networks in the limit of diverging model size, dataset size, and compute budget.


Lecture 2

Title: Random Neural Networks

Abstract: This talk explores the basic properties of neural networks at initialization, when they have random weights and biases. We will begin with a first-principles overview of how to initialize a neural network and will discuss a well-known result that random neural networks at infinite width converge to Gaussian processes. We will then discuss how to choose a sensible learning rate in wide networks by analyzing the first step of training. Finally, we will consider finite-width corrections to the behavior of random neural networks and discuss the interplay of depth and width. Time permitting, we will conclude with a discussion of several open problems.


Lecture 3

Title: Deep Learning in the Kernel Regime

Abstract: This talk explores the simplest scaling limit of neural networks, often called the kernel or NTK regime. The hallmark of this regime is that neural networks converge at infinite width to linear models. As I will explain, this is a mathematically interesting statement but is the "wrong" scaling limit for practice, since much of the power of deep learning resides in the fact that neural networks are non-linear functions of their parameters. Nonetheless, this scaling limit will be the jumping-off point for other, more interesting scaling regimes that we will consider in the next lecture.


Lecture 4

Title: Deep Learning in the Mean-Field Regime

Abstract: This talk introduces an important scaling limit of neural networks, often called the mean-field regime. Unlike the NTK regime, the infinite-width limit of neural networks in the mean-field regime is highly non-linear. For instance, training dynamics for one-layer networks are equivalent to Wasserstein gradient flows, and for deeper models they correspond to novel PDEs on spaces of probability distributions. I will formulate a number of open questions.


Lecture 5

Title: Empirical Wonders and Open Questions

Abstract: In this final talk I will survey some mathematical problems in deep learning whose resolution could be used to influence the state of the art. These include understanding and predicting various observed empirical phenomena, designing principled optimization algorithms, and proposing new quantization schemes.


Event Type

Lectures, Workshops