Recent Advances in Deep Learning Theory

Course Overview

Despite the empirical success of deep learning, it remains largely unknown why simple gradient-based local algorithms, such as stochastic gradient descent (SGD), perform so well in deep learning, where the loss function is usually highly non-convex. This has become one of the major open problems in the area in recent years, and many studies have sought to understand the behavior of these algorithms (global convergence properties, algorithmic regularization, etc.). The goal of this mini-course is to introduce the audience to some recent advances in this rapidly developing area.

Prerequisites

The prerequisites are rather mild: only undergraduate calculus and linear algebra are required. We will, however, assume the audience has a certain level of mathematical maturity.

Schedule (Tentative)

* We will mainly use the blackboard in this course. Therefore, the material in the “slides” may differ slightly from what is covered during the lectures.

Date (Time)	Lecture	References	Location
Feb. 21 (4:00 PM)	#1 Basics of Gradient Descent (Slides)		East Middle Hall 1-303
Mar. 7 (4:00 PM)	#2 Escaping Saddle Points; #3 Neural Tangent Kernels (Slides)	Jin et al., 2017; Du et al., 2018; Yehudai & Shamir, 2019	East Middle Hall 2-202
Mar. 21 (4:00 PM)	#3 Neural Tangent Kernels (continued); #4 Mean-Field Networks (Slides)	Santambrogio, 2015; Chizat & Bach, 2018; Pham & Nguyen, 2021	East Middle Hall 4-202
Apr. 4 (4:00 PM)	#4 Mean-Field Networks (Slides)	Santambrogio, 2015; Chizat & Bach, 2018; Pham & Nguyen, 2021	East Lower Hall 103
Apr. 18 (4:00 PM)	#5 Depth Separation (Slides)	Eldan & Shamir, 2016; Safran et al., 2019; Ren et al., 2023	East Lower Hall 103
May 9 (4:00 PM)	#6 Langevin Algorithm (lecture by Chihao Zhang)	Dalalyan, 2017	East Lower Hall 103
May 23 (4:00 PM)	#7 Diffusion Models (Slides)	Hyvärinen, 2005; Song et al., 2021; Chen et al., 2022	East Lower Hall 103
Jun. 6 (4:00 PM)	#8 Diffusion Models 2	TBD	East Lower Hall 303