Despite the empirical success of deep learning, it remains largely unknown why simple gradient-based local algorithms, such as stochastic gradient descent (SGD), perform so well in this setting, where the loss function is usually highly non-convex. This has become one of the major open problems in the area in recent years, and many studies have sought to understand the behavior of these algorithms (e.g., their global convergence properties and algorithmic regularization). The goal of this mini-course is to introduce the audience to some recent advances in this rapidly developing area.

The prerequisites are rather mild: only undergraduate calculus and linear algebra are required. We will, however, assume a certain level of mathematical maturity.

* We will mainly use the blackboard in this course, so the material in the “slides” may differ slightly from what is covered in the lectures.

| Date (Time) | Lecture | References | Location |
|---|---|---|---|
| Feb. 21 (4:00 PM) | #1 Basics of Gradient Descent (Slides) | | East Middle Hall 1-303 |
| Mar. 7 (4:00 PM) | #2 Escaping Saddle Points; #3 Neural Tangent Kernels (Slides) | Jin et al., 2017; Du et al., 2018; Yehudai & Shamir, 2019 | East Middle Hall 2-202 |
| Mar. 21 (4:00 PM) | #3 Neural Tangent Kernels (continued); #4 Mean-Field Networks (Slides) | Santambrogio, 2015; Chizat & Bach, 2018; Pham & Nguyen, 2021 | East Middle Hall 4-202 |
| Apr. 4 (4:00 PM) | #4 Mean-Field Networks (Slides) | Santambrogio, 2015; Chizat & Bach, 2018; Pham & Nguyen, 2021 | East Lower Hall 103 |
| Apr. 18 (4:00 PM) | #5 Depth Separation (Slides) | Eldan & Shamir, 2016; Safran et al., 2019; Ren et al., 2023 | East Lower Hall 103 |
| May 9 (4:00 PM) | #6 Langevin Algorithm (lecture by Chihao Zhang) | Dalalyan, 2017 | East Lower Hall 103 |
| May 23 (4:00 PM) | #7 Diffusion Models (Slides) | Hyvärinen, 2005; Song et al., 2021; Chen et al., 2022 | East Lower Hall 103 |
| Jun. 6 (4:00 PM) | #8 Diffusion Models 2 | TBD | East Lower Hall 303 |