CSCI E-104
Advanced Deep Learning
Artificial intelligence (AI) models and applications based on deep learning have proliferated and are profoundly affecting almost every aspect of economic, social, and scientific activity.
This course equips students with skills needed to engage in advanced research and development in AI and deep learning.
We cover details of several classes of transformers, which are the basis of large language models (LLMs).
We study deep probabilistic models as the foundation of generative techniques (Stable Diffusion, text-to-speech, and flow models).
We study Bayesian models and apply them to the optimization of neural networks and problems with small datasets.
Students learn how to utilize the overlap between dynamical systems, ordinary/partial/stochastic differential equations, and physics-based neural networks.
For important classes of neural networks, we explore the fundamental mechanisms behind their operations and provide practical illustrations of their uses.
For example, we review the structure of transformer-based pre-trained LLMs, the principles of attention, and the use of these models in applications such as ChatGPT, with a focus on prompt programming and the structure of agentic applications.
For generative networks, we examine the generation of realistic representations of people, speech, paintings, and music.
For graph neural networks (GNNs), we dive into the analysis of chemical molecules, proteins, and drugs, and of quantitative structure-property relationships in physical systems.
We learn how to impose constraints that reflect the physical or geometric laws governing physical systems.
Concepts introduced in every lecture are illustrated by practical examples.
Code samples in lectures and homework assignments are written in PyTorch and occasionally in Keras 3.
Students learn how to scale training of deep learning models to clusters of two or more graphics processing units (GPUs).