This course aims to illustrate the central role of mathematics in understanding and designing modern artificial intelligence (AI) systems for data science, with a focus on AI methods based on deep neural networks. In particular, we will explore how tools from random matrix theory and optimal transport help explain the generalization properties and learning dynamics of overparameterized neural networks, focusing on their ability to overfit the training data while still generalizing well to test data. We will also analyze the convergence of the learning algorithms (gradient descent) used to train deep neural networks, using the notion of gradient flow in the Wasserstein space of probability measures together with mean-field theory. Through theoretical analyses and hands-on computational experiments (in Python notebooks), participants will examine how the challenges of AI are motivating new mathematical developments at the intersection of probability, statistics, analysis, optimization, and machine learning.
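As a small taste of the kind of computational experiment mentioned above, the following sketch (not part of the course material; all model sizes and data are illustrative) shows the interpolation-versus-generalization phenomenon in a random-features model, a simplified stand-in for an overparameterized network: once the number of features exceeds the number of training points, the minimum-norm fit drives the training error to zero while the test error remains finite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: targets are a noisy linear function of 5-dimensional inputs.
n_train, n_test, d = 40, 200, 5
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

def random_features(X, W):
    """Random ReLU features: a one-hidden-layer network with a frozen first layer."""
    return np.maximum(X @ W, 0.0)

# Sweep the number of features p: under-, critically, and over-parameterized.
for p in [10, 40, 400]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi_tr = random_features(X_train, W)
    Phi_te = random_features(X_test, W)
    # Minimum-norm least-squares fit: the solution that gradient descent,
    # initialized at zero, converges to for this linear-in-parameters model.
    theta = np.linalg.pinv(Phi_tr) @ y_train
    train_mse = np.mean((Phi_tr @ theta - y_train) ** 2)
    test_mse = np.mean((Phi_te @ theta - y_test) ** 2)
    print(f"p={p:4d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

For p larger than n_train, the model interpolates (training MSE is essentially zero) yet the test error does not blow up: a minimal instance of the benign-overfitting behavior that the course analyzes with random matrix theory.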
- Instructor: Jeremie Bigot