Recent Advances in Deep Learning Theory

Course Overview

Despite the empirical success of deep learning, it is largely unknown why simple gradient-based local algorithms, such as stochastic gradient descent (SGD), can perform so well on the highly non-convex loss functions that arise in deep learning. This has become one of the major open problems in the area in recent years, and many studies have been conducted to understand the behavior of these algorithms (global convergence properties, algorithmic regularization, etc.). The goal of this mini-course is to introduce the audience to some recent advances in this rapidly developing area.
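
To make the object of study concrete, the sketch below runs plain mini-batch SGD on a one-hidden-layer ReLU network trained with squared error, a loss that is non-convex in the weights. It is a minimal illustration only; the data, network size, and hyperparameters are assumptions made for this sketch and are not part of the course materials.

```python
import numpy as np

# Minimal, hypothetical sketch: mini-batch SGD on a one-hidden-layer ReLU
# network f(x) = a^T ReLU(W x) with squared-error loss, which is non-convex
# in the weights (W, a). All sizes and hyperparameters are illustrative choices.

rng = np.random.default_rng(0)

n, d, m = 200, 5, 50                        # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))          # a simple non-linear target

W = rng.normal(size=(m, d)) / np.sqrt(d)    # hidden-layer weights
a = rng.normal(size=m) / np.sqrt(m)         # output weights
lr, batch, steps = 0.05, 20, 3000

for t in range(steps):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    H = np.maximum(Xb @ W.T, 0.0)           # ReLU features, shape (batch, m)
    err = H @ a - yb                        # residuals on the mini-batch
    # Gradients of the half mean-squared error on the mini-batch.
    grad_a = H.T @ err / batch
    grad_W = ((err[:, None] * (Xb @ W.T > 0) * a).T @ Xb) / batch
    a -= lr * grad_a
    W -= lr * grad_W

mse = np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2)
print(f"final training MSE: {mse:.4f}")
```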

Prerequisites

The prerequisites are rather mild. Only undergraduate calculus and linear algebra are required. We will, however, assume the audience has a certain level of mathematical maturity.

Schedule (Tentative)

* We will mainly use the blackboard in this course, so the material in the linked slides may differ (slightly) from what is covered during the lectures.

| Date (Time) | Lecture | References | Location |
| --- | --- | --- | --- |
| Feb. 21 (4:00 PM) | #1 Basics of Gradient Descent (Slides) | | East Middle Hall 1-303 |
| Mar. 7 (4:00 PM) | #2 Escaping Saddle Points; #3 Neural Tangent Kernels (Slides) | Jin et al., 2017; Du et al., 2018; Yehudai & Shamir, 2019 | East Middle Hall 2-202 |
| Mar. 21 (4:00 PM) | #3 Neural Tangent Kernels (continued); #4 Mean-Field Networks (Slides) | Santambrogio, 2015; Chizat & Bach, 2018; Pham & Nguyen, 2021 | East Middle Hall 4-202 |
| Apr. 4 (4:00 PM) | #4 Mean-Field Networks (Slides) | Santambrogio, 2015; Chizat & Bach, 2018; Pham & Nguyen, 2021 | East Lower Hall 103 |
| Apr. 18 (4:00 PM) | #5 Depth Separation (Slides) | Eldan & Shamir, 2016; Safran et al., 2019; Ren et al., 2023 | East Lower Hall 103 |
| May 9 (4:00 PM) | #6 Langevin Algorithm (lecture by Chihao Zhang) | Dalalyan, 2017 | East Lower Hall 103 |
| May 23 (4:00 PM) | #7 Diffusion Models (Slides) | Hyvärinen, 2005; Song et al., 2021; Chen et al., 2022 | East Lower Hall 103 |
| Jun. 6 (4:00 PM) | #8 Diffusion Models 2 | TBD | East Lower Hall 303 |