Held in cooperation with the University of Chicago Department of Computer Science

TTIC 31020 - Introduction to Machine Learning

Instructor: Greg Shakhnarovich

Location: **TTIC Room 526B**

Time: **Tuesday and Thursday - 2:00-3:20pm**

Final Exam: **Room K120 - Thursday, December 13, 1:30-3:30pm**

TTIC 31150 - Mathematical Toolkit (CMSC 31150) (100 units)

Instructor: Madhur Tulsiani

Location: **TTIC Room 530**

Time: **Tuesday and Thursday - 9:30-10:50am**

Final Exam: **Room STU104 - Tuesday, December 11, 10:30am-12:30pm**

TTIC 31120 - Statistical and Computational Learning Theory (100 units)

Instructor: Nathan Srebro

Location: **TTIC Room 526**

Time: **Monday and Wednesday - 1:30-2:50pm**

Final Exam: **Room 526B - Friday, December 14 - 1:30-4:30pm**

TTIC 41000- Special Topics: Spectral Techniques for Machine Learning (50 units)

Instructor: Karl Stratos

Location: **TTIC Room 526**

Time: **Monday - 3:00-4:20pm**

TTIC 31010 - Algorithms (CMSC 37000) (100 units)

Instructor: Avrim Blum

Location: **TTIC Room 526**

Time: **Tuesday and Thursday - 9:30-10:5am**

**Tutorial meetings: TTIC Room 526 - Wednesday - 3:00-4:00pm**

TTIC 31100 Computational and Metric Geometry (CMSC 39010) (100 units)

Instructor: Yury Makarychev

Location: **TTIC Room 530**

Time: **Monday and Wednesday - 1:30-2:50pm**

TTIC 31160 - Topics in Bioinformatics (100 units)

Instructor: Jinbo Xu

Location: **TTIC Room 530**

Time: **Monday and Wednesday - 12:00-1:20pm**

TTIC 31230 - Fundamentals of Deep Learning (CMSC 31230) (100 units)

Instructor: David McAllester

Location: **TTIC Room 526B**

Time: **Tuesday and Thursday - 2:00-3:20pm**

TTIC 31220 - Unsupervised Learning and Data Analysis (100 units)

Instructor: Karen Livescu

Location: **TTIC Room 526**

Time: **Monday and Wednesday 1:30-2:50pm**

TTIC Course 31210 - Advanced Natural Language Processing (100 units)

Instructor: Kevin Gimpel

Location: **TTIC Room 526**

Time: **Monday and Wednesday - 1:30-2:50pm**

100 units

TBD

Weekly lectures and discussions by TTIC researchers introducing their research and research problems. Provides a broad view of research carried out at TTIC. Course is pass/fail credit. Satisfies one quarter of credit (of the three required) to fulfill the Research at TTIC Series Requirement. (See Academic Program Guide for details)

100 units

Chuzhoy, Julia, and Makarychev, Yury

This is a graduate level course on algorithms with the emphasis on central combinatorial optimization problems and advanced methods for algorithm design and analysis. Topics covered include greedy algorithms, dynamic programming, randomized algorithms and the probabilistic method, combinatorial optimization and approximation algorithms, linear programming, and online algorithms.

The course textbook is “Algorithm Design” by Kleinberg and Tardos

- Greedy algorithms (1 week)
- Dynamic programming (1 week)
- Online algorithms (1 week) 4-6 Max flow, min-cut, bipartite matching and their applications (3 weeks)
- Linear programming, LP-duality (1 week)
- NP-hardness (1 week)
- Approximation algorithms (1 week)
- Randomized algorithms (1 week)

Assumes familiarity with proofs and an the asymptotic notation. Some basic knowledge of the notion of NP-hardness is also required.

Expected outcomes:

- Ability to design and rigorously analyze algorithms using paradigms such as greedy or dynamic programming.
- Understand the use of linear programming in optimization. Be able to formulate problems as linear programs.
- Understand linear programming duality and applications to problems such as max- flow/min-cut. Be able to write duals for linear programs.

Prerequisites: Assumes familiarity with proofs and an the asymptotic notation. Some basic knowledge of the notion of NP-hardness is also required.

100 units

Shakhnarovich, Greg

A systematic introduction to machine learning, covering theoretical as well as practical aspects of the use of statistical methods. Topics include linear models for classification and regression, support vector machines, regularization and model selection, and introduction to structured prediction and deep learning. Application examples are taken from areas like information retrieval, natural language processing, computer vision and others.

Prerequisites: Probability, Linear Algebra, Undergraduate Algorithms.

I will focus on supervised learning and only talk about unsupervised settings when necessary (e.g., mixture models and density estimation for generative methods for classification). So, no clustering. There will be twenty 1.5 hour lectures; numbers in parentheses are estimated # of lectures per topic.

- intro to ML, motivation etc. (1)
- refresher on probability and algebra (1)
- statistical framework for learning; loss/risk; least squares regression (1)
- noise models; error decomposition; bias/variance and overfitting (1)
- estimation theory; ML/MAP, overfitting, bias/variance (1)
- model complexity, sparsity (L1/L2) in regression; stepwise methods for L0 sparsity (1)
- classification; Fisher’s LDA, logistic regression and softmax (1)
- ensemble methods, boosting (1)
- generative models, Naive Bayes, multivariate Gaussians (1)
- mixture models; EM (2)
- SVM and kernels (2)
- nonparametric methods; nearest neighbors, density esimation (1)
- multilayer neural networks and deep learning (1)
- information theory and learning; information criteria, MDL and their connections to regularization (1)
- experiment design and evaluation in ML (1)
- advanced topics TBD (1)
- wrap-up and review (1)

Prerequisites: knowledge of basic linear algebra, probability and calculus.

Expected outcomes:

- Understand the notion of fitting a model to data and concepts such as model complexity, overfitting and generalization, and bias-variance tradeoff in estimation.
- Learn and be able to apply some of the fundamental learning methods, such as logistic regression, support vector machines, boosting, decision trees, neural networks.
- Learn the basics of optimization techniques such as gradient descent and the general EM algorithm.
- Familiarity with multivariate Gaussians and mixtures of Gaussians.
- Understand fundamental concepts in information theory (entropy, KL-divergence) and their relationship to machine learning.

100 units

McAllester, David

This course covers the foundations of mathematics from a classical (nonconstructive) type-theoretic perspective, the general notion of a mathematical structure, the general notion of isomorphism, and the role of axiomatizations and constructions in mathematical definitions. The definition of the real numbers used as a fundamental example. The course also covers the notion of definability in well-typed formalisms. A primary example is the non-definability of a linear bijection between a vector space and its dual. Ontologies (types) relevant to machine learning are emphasized such as the type theory of PCA, CCA and Banach spaces (norms and dual norms).

There are again two lectures per week. Introduction and course outline.

Part I: Logic with Abstraction Barriers.

Sequent inference rules and proofs. Types and variable declarations. Free and bound variables. Structures and Isomorphism.

Isomorphism as equivalence under an abstract interface.

Part II: Case Studies in Abstraction

The natural numbers, integers, rationals and reals. Vector spaces. A formal treatment of the non-existence of a canonical basis (coordinate system), canonical inner product, or canonical isomorphism with the dual space. Coordinate-free treatment of matrix algebra.

Equivalences between matrix (operator) types. The fact that least squares regression does not require an ambient inner product but regularization does. Gradient descent. Gradient descent requires an ambient inner product. Newton’s method does not. The covariance matrix of a probability distribution on a vector space. The fact that the multivariate central limit theorem does not require an ambient inner product. Canonical correlation analysis also does not require an ambient inner product. PCA requires and ambient inner product.

Norms and Banach spaces. Dual norms. Measure spaces. Hilbert space. Differentiable manifolds. Information geometry. Jeffery’s prior. Natural Gradient Descent.

Expected outcomes:

- Be able to write rigorous proofs and reason formally about various mathematical notions.
- Understand basics of linear algebra, eigenvalues and eigenvectors from the viewpoint of operators.
- Understand various algorithms such as SVMs, PCA, CCA and gradient descent from the operator viewpoint.

100 units

McAllester, David

Introduction to deep learning for computer vision. Although deep learning based computer vision systems are evolving rapidly, this course attempts to teach material that will remain relevant and useful as the field changes. The course begins with general deep learning methods relevant to many applications and gradually focuses to a greater extent on computer vision. The course will emphasize theoretical and intuitive understanding to the extent possible.

Topics:

- Introduction to multi-layer perceptrons and backpropogation.
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs).
- General frameworks and mathematical notations for expressing neural networks.
- Implementing a framework — CUDA.
- The theory and practice of stochastic gradient descent.
- Theoretical guarantees for SGD.
- First order vs. second order methods.
- Adagrad, RMSprop, ADAM
- Vector-Hessian products
- Learning rates and momentum.

- Vanishing Gradients and exploding gradients and methods to counter these problems.
- LSTMs, GRUs, highway networks, and Resnet.

- Regularization
- L2 regularization
- Dropouts
- Early Stopping

- A survey of vision tasks and deep approaches to solving them.
- Image classification
- Object Detection
- Semantic Segmentation
- Image captioning
- Visual question answering

Expected outcomes:

- Ability to apply the methods for designing and training general deep learning architectures.
- An understanding of the general issues and phenomenon sufficient to guide architecture design.
- An understanding of the basic vision tasks to which deep methods are being applied and how deep methods approach these tasks.

Prerequisites: Introduction to machine learning.

100 units

Xu, Jinbo

This course will focus on the application of mathematical models and computer algorithms to studying molecular biology. In particular, this course will cover the following topics.

- homology search ( 1 week)
- sequence alignment and motif discovery ( 1 week)
- next-gen sequencing and genome assembly ( 1 week)
- protein sequence/structure analysis including alignment, classification, structure and function prediction ( 2 weeks).
- RNA sequence/structure analysis including alignment, classification and prediction ( 1 week)
- gene expression analysis (1 week)
- biological network analysis ( 1 week)
- phylogeny ( 1 week)

Expected outcomes:

- Ability to use popular bioinformatics tools to generate biologically meaningful results
- Ability to interpret biological results generated by a bioinformatics tool
- Ability to implement basic machine learning and optimization algorithms for important biology problems
- Application of some basic models and algorithms such as Support Vector Machines, Hidden Markov Model, Dynamic Programming to solve biology problems

Prerequisites: None

100 units

Razborov, Alexander

Part one consists of models for defining computable functions: primitive recursive functions, (general) recursive functions, and Turing machines; the Church-Turing Thesis; unsolvable problems; diagonalization; and properties of computably enumerable sets. Part two deals with Kolmogorov (resource bounded) complexity: the quantity of information in individual objects. Part three covers functions computable with time and space bounds of the Turing machine: polynomial time computability, the classes P and NP, NP- complete problems, polynomial time hierarchy, and P-space complete problems.

The tentative plan for this course is to do the following three parts.

Part I. Models for defining computable functions: primitive recursive functions, (general) recursive functions, and Turing machines; the Church-Turing Thesis; unsolvable problems; diagonalization; and properties of computably enumerable sets.

Part II. Kolmogorov (resource bounded) complexity: the quantity of information in individual objects.

Part III. Functions computable with time and space bounds of the Turing machine: polynomial time computability, the classes P and NP, NP-complete problems, polynomial time hierarchy, and P-space complete problems.

Here is the Web page of the previous version of this course: http://people.cs.uchicago.edu/~razborov/teaching/spring14.html. But I will be very willing to adjust the speed, style and content depending on the background of the students enrolled in the course.

Expected outcomes:

- Ability to identify algorithmic problems arising in various contexts.
- Application of classical recursion theory, Kolmogorov complexity and basic computational complexity to the analysis of problems arising in mathematics and beyond.
- Recognize and identify solvable and unsolvable problems, problems with high/low Kolmogorov complexity, as well as problems complete for basic complexity classes.

100 units

Srebro, Nati

The course will cover techniques in unconstrained and constrained convex optimization and a practical introduction to convex duality. The course will focus on (1) formulating and understanding convex optimization problems and studying their properties; (2) understanding and using the dual; and (3) presenting and understanding optimization approaches, including interior point methods and first order methods for non-smooth problems. Examples will be mostly from data fitting, statistics and machine learning.

Prerequisites: Linear Algebra, Multidimensional Calculus, Undergraduate Algorithms

Specific Topics:

- Formalization of optimization problems
- Smooth unconstrained optimization: Gradient Descent, Conjugate Gradient Descent, Newton and Quasi-Newton methods; Line Search methods
- Standard formulations of constrained optimization: Linear, Quadratic, Conic and Semi-Definite Programming
- KKT optimality conditions
- Lagrangian duality, constraint qualification, weak and strong duality
- Fenchel conjugacy and its relationship to Lagrangian duality
- Multi-objective optimization
- Equality constrained Newton method
- Log Barrier (Central Path) methods, and Primal-Dual optimization methods
- Overview of the simplex method as an example of an active set method.
- Overview of the first order oracle model, subgradient and separation oracles, and the Ellipsoid algorithm.
- Large-scale subgradient methods: subgradient descent, mirror descent, stochastic subgradient descent.

Expected outcomes:

- Ability to discuss and understand optimization problems in terms of required information/access, assumptions, iteration complexity and runtime
- Ability to identify convex and non-convex optimization problems
- Ability to make informed choices about the choice of optimization algorithm
- Ability to derive the dual problem, and use the dual and KKT conditions to reason about optimal solutions
- Familiarity with unconstrained optimization methods, including Gradient Descent, Conjugate Gradient Descent, Newton’s Method and Quasi-Newton Methods
- Familiarity with Interior Point methods for constrained optimization
- Familiarity with standard formulations including Linear Programming, Quadratic Programming and Semidefinite Programming
- Ability to cast abstract problems as constrained optimization problems, and in terms of standard formulations

100 units

Chuzhoy, Julia

This is a basic course on approximation algorithms, with the main focus on approximation algorithms for central combinatorial optimization problems. We will mostly focus on classical algorithmic results, but will also present some state of the art results and challenges in the area of approximation. The course will cover major algorithmic techniques, including LP-rounding, primal-dual schema, metric methods, SDP rounding and so on. While the main focus of the course is on algorithms, we will also discuss lower bounds on approximation and connections between algorithm design and lower bound proofs.

Assumes the knowledge of material covered in the Algorithms course.

Expected outcomes:

- Understand concepts such as approximation factor, polynomial time approximation schemes and hardness of approximation.
- Understand applications of linear programs (LPs) to design of approximation algorithms. Learn to analyze rounding algorithms for LPs and understand integrality gaps. Be able to apply LP duality.
- Understand semi-definite programming and its applications to approximation.

Prerequisites: Algorithms (TTIC31010 or CMSC 37000)

100 units

Livescu, Karen

Introduction to analysis of signals and linear time-invariant systems at a graduate level. Topics include: Continuous and discrete-time transforms (Fourier and others); linear filtering; sampling and aliasing; random processes and their interaction with linear systems; applications in areas such as speech and image processing and robotics.

Prerequisites: familiarity with basic linear algebra, notions of eigenvalues and eigenvectors, and (multivariate) Gaussian distributions.

Expected outcomes:

- Understand the properties and eigenfunctions of linear time-invariant (LTI) systems.
- Understand Fourier Series (discrete- and continuous-time), Fourier transform, and convolutions.
- Be able to analyze random processes, understand their stationarity properties and their interaction with LTI systems.
- Learn about sampling and aliasing, signal estimation minimizing mean square error and parameter estimation for random processes. Understand the Karhunen-Loeve transform.

100 units

Makarychev, Yury

The course covers fundamental concepts, algorithms and techniques in computational and metric geometry. Topics covered include: convex hulls, polygon triangulations, range searching, segment intersection, Voronoi diagrams, Delaunay triangulations, metric and normed spaces, low–distortion metric embeddings and their applications in approximation algorithms, padded decomposition of metric spaces, Johnson—Lindenstrauss transform and dimension reduction, approximate nearest neighbor search and locality–sensitive hashing.

The course textbook is “Computational Geometry” by M. de Berg, O. Cheong, M. van Kreveld, M. Overmars.

- Convexity: convex sets, convex hulls, vertices, supporting lines, edges, different definitions and basic properties, Caratheodory’s theorem
- Convex Hulls and Line Segment Intersections: Jarvis March, Andrew’s algorithm, sweep line algorithms, line segment intersection, Bentley—Ottmann algorithm
- Planar Graphs and Overlays: graphs, graph drawings, plane and planar graphs, Euler’s formula, data structure for plane graphs, computing overlays
- Orthogonal Range Searching (2 lectures): binary search, kd-trees, range trees
- Point Location: trapezoidal maps, randomized algorithm
- Voronoi Diagrams: Voronoi diagrams, Fortune’s algorithm
- Delaunay Triangulations (1.5 lectures): triangulations, Delaunay and locally Delaunay triangulations: definitions, existence and equivalence, duality between Delaunay triangulations and Voronoi diagrams, angle optimality
- Metric Spaces, Normed Spaces, Low Distortion Metric Embeddings (1.5 lectures): metric and normed spaces, Lipschitz maps, distortion, embeddings into Lp and lp
- Bourgain’s Theorem
- Sparsest Cut: approximation algorithm for Sparsest Cut
- Minimum Balanced Cut, Minimum Linear Arrangement, Sparsest Cut with Non-Uniform Demands, Expanders: polylog approximation algorithms for Balanced Cut and Minimum Linear Arrangement, expander graphs, integrality gap for Sparsest Cut, Sparsest Cut with non-uniform demands
- Minimum Multiway Cut, Minimum Multicut: approximation algorithms for Minimum Multiway Cut and Minimum Multicut
- Padded Decomposition, Hierarchically Separated Trees, Applications (2 lectures)
- Semidefinite Programming, Algorithm of Arora, Rao and Vazirani: semidefinite programming, ARV (high-level overview), delta separated sets, matching covers
- Dimension Reduction, Nearest Neighbor Search: dimension reduction, approximate nearest neighbor search, locality sensitive hashing
- Locality Sensitive Hashing, p–Stable Random Variables: locality sensitive hashing, p–stable random variables

Expected outcomes:

- Know standard algorithms and data structures for solving geometric problems
- Be able to design efficient algorithms and data structures for solving geometric problems
- Understand basic concepts of metric geometry such as metric and normed space, low distortion embedding, dimension reduction, nearest neighbor search.
- Understand applications of metric geometry to the field of approximation algorithms and other areas of computer science.

Prerequisites: undergraduate-level algorithms, linear algebra and probability classes; a good background in mathematical analysis/calculus

100 units

Livescu, Karen

This course will introduce techniques used in speech technologies, mainly focusing on speech recognition and related tasks. Speech recognition is one of the oldest and most complex structured sequence prediction tasks receiving significant research and commercial attention, and therefore provides a good case study for many of the techniques that are used in other areas of artificial intelligence involving sequence modeling. The course will cover core techniques in detail, including hidden Markov models, recurrent neural networks, and conditional random fields. The course will include practical homework exercises where we will build and experiment with speech processing models. Finally, it will include a sampling of the connections between speech and other modalities like images and text.

Prerequisites: a good background in basic probability, preferably also introductory machine learning.

Expected outcomes:

- Understand and apply tools for analyzing speech time series such as Fourier analysis and dynamic time warping.
- Understand and apply hidden Markov models, Gaussian mixtures, and the EM algorithm for speech problems.
- Understand and apply language models, smoothing techniques, and their application to speech recognition.
- Understand generative and discriminative structured prediction approaches for speech problems.
- Understand and apply modern deep learning tools for speech processing tasks.

100 units

Srebro, Nati

We will discuss classic results and recent advances in statistical learning theory (mostly under the agnostic PAC model), touch on computational learning theory, and also explore the relationship with stochastic optimization and online regret analysis. Our emphasis will be on concept development and on obtaining a rigorous quantitative understanding of machine learning. We will also study techniques for analyzing and proving performance guarantees for learning methods.

Pre-Requisites: Mathematical maturity, as obtain, e.g., in a rigorous analysis course. Discrete Math (specifically combinatorics and asymptotic notation) Probability Theory Introduction to Machine Learning Algorithms; Basic Complexity Theory (NP-Hardness) Familiarity with Convex Optimization, Computational Complexity and background in Statistics can be helpful, but is not required.

Specific Topics:

- The Statistical Model (Learning Based on an IID Sample):
- The PAC (Probably Approximately Correct) and Agnostic PAC models.
- Stochastic Optimization
- Cardinality Bounds
- Description Length Bounds
- PAC-Bayes
- Compression Bounds
- The Growth Function and VC Dimension
- VC Subgraph Dimension and Fat Shattering Dimension
- Tight Characterization of Learning in terms of the VC and Fat Shattering Dimensions
- Covering Numbers
- Rademacher Averages, including Local Rademacher Analysis

- Uniform Learning and No-Free Lunch Theorems
- Online Learning, Online Optimization and Online Regret
- The Perceptron Rule and Online Gradient Descent
- Experts and the Winnow Rule
- Bregman Divergence and Online Mirror Descent
- Online to Batch Conversion

- Computational Lower Bounds:
- Computational Hardness of Proper Learning
- Cryptographic Hardness of Learning

- Additional Topics
- Stability Based Analysis
- Boosting: Weak Learning and the Margin Interpretation of Boosting.

Expected outcomes:

- Ability to recognize different learning models and make rigorous statements about learning methods
- Ability to use standard techniques to prove learning guarantees
- Ability to prove lower bounds for learning problems

100 units

Shakhnarovich, Greg

Developing autonomous systems that are able to assist us in everyday’s tasks is one of the grand challenges in modern computer science. While a variety of novel sensors have been developed in the past few years, in this class we will focus on the extraction of this knowledge from visual information alone. One of the most remarkable examples of successful recognition systems is our visual system, which is able to extract high-level information from very noisy and ambiguous data. Unfortunately, despite decades of research efforts, machines are still way below human performance. In this class we will study why this is the case.

The goal of this graduate class is to understand the different visual recognition tasks as well as the techniques employed to solve them. A strong component of the course will be statistical learning as it plays a key role in almost every modern visual recognition system. We will cover all stages of the recognition pipeline: low-level (e.g., features), mid-level (e.g., segmentation) as well as high-level reasoning (e.g., scene understanding).

Knowledge of machine learning and computer vision is not required, but highly recommended. The theoretical aspects of visual recognition will be covered during the lectures. The class will have a strong practical component, as the students will build the different recognition components during the homework sessions. A list of topics includes:

- Classification: features, bag of words (BOW), similarity between images, learning features as well as hashing schemes and retrieval.
- Detection: sliding window approaches, branch and bound, structure prediction, hough voting and NN approaches, hierarchical models.
- Segmentation: classical approaches (e.g., watershading) as well as modern structure pre- diction approaches including message passing and graph cuts for inference, and CRFs and structured-SVMs for learning.
- Modern 3D geometry and 3D scene understanding: stereo, scene layout (e.g., 3D box for indoor scenes, road layout for outdoor scenes).
- Pose estimation: pictorial structures (2D) as well as 3D pose estimation including particle filter-based approaches.

100 units

Urtasun, Raquel

A graphical model is a probabilistic model, where the conditional dependencies between the random variables is specified via a graph. Graphical models provide a flexible framework for modeling large collection of variables with complex interactions, as evidenced by their wide domain of application, including for example machine learning, computer vision, speech and computational biology. This course will provide a comprehensive survey of learning and inference methods in graphical models, including variational methods, primal-dual methods and sampling techniques.

100 units

Ohannessian, Mesrob

The course is aimed at first-year graduate students and advanced undergraduates. The goal of the course is to collect and present important mathematical tools used in different areas of computer science. The course will mostly focus on linear algebra and probability.

We intend to cover the following topics and examples:

- Abstract linear algebra: vector spaces, linear transformations, Hilbert spaces, inner product, Gram-Schmidt orthogonalization, Eigenvalues and eigenvectors, SVD, least squares (under/over-constrained)
- Discrete probability: random variables, Markov, Chebyshev and Chernoff bounds.
- Gaussian variables, concentration inequalities, dimension reduction
- Martingales (time permitting)
- Stochastic Processes (time permitting)

Expected outcomes:

- Ability to write correctly typed rigorous proofs.
- Understanding of various notions of linear algebra in the context of abstract vector spaces.
- Ability to understand and analyze stochastic processes. Familiarity with discrete and continuous random variables and various concentration bounds.

Prerequisites: None

100 units

Xu, Jinbo

TTIC 31160 will focus on the application of mathematical models and computer algorithms to studying structure biology, in particular, protein, RNA and DNA molecule structures.

Here is a list of topics that I am going to cover in this course.

- Introduction to molecular structures (1 week)
- Bioinformatics for biological sequence analysis (1 week)
- Algorithms for molecule structure comparison and alignment (1 week)
- Algorithms for protein secondary structure prediction (0.5 week)
- Algorithms for protein tertiary structure prediction (1 week)
- Algorithms for RNA secondary structure prediction (1 week)
- Algorithms for RNA tertiary structure prediction (0.5 week)
- Algorithms for protein docking (1 week)
- Algorithms for protein-protein and protein-RNA interaction prediction (1 week)
- Algorithms for Chromatin structure determination ( 1 week)

There will be both homework and final projects.

Expected outcomes:

- Ability to formulate structure biology problems into a mathematical problem
- Application of advanced optimization algorithms (linear programming, semidefinite programming and graph algorithms) and machine learning models (probabilistic graphical models) to solve important problems in structure bioinformatics
- Mastery of current hot topics in structure bioinformatics
- Ability to conduct semi-independent research in structure bioinformatics

Prerequisites: None

100 units

Walter, Matthew

This course concerned with fundamental techniques in robotics and artificial intelligence (AI), with an emphasis on probabilistic inference, learning, and planning under uncertainty. The course will investigate the theoretical foundations underlying these topics as rigorous mathematical tools that enable solutions to real-world problems drawn broadly from robotics and AI. The course will cover topics that include: Bayesian filtering (Kalman filtering, particle filtering, and dynamic Bayesian networks), simultaneous localization and mapping, planning, Markov decision processes, partially observable Markov decision processes, reinforcement learning, and graphical models.

Expected outcomes:

- Understand the role of and probabilistic techniques for modeling and mitigating uncertainty in dynamic systems
- Demonstrate the ability to derive analytical and nonparametric solutions for recursive Bayesian estimation
- Formulate probabilistic models that represent the problem of robot localization and mapping, and show how these models afford the use of techniques from recursive estimation
- Understand algorithms for planning/search and decision making within deterministic and stochastic domains
- Demonstrate the ability to implement state-of-the-art algorithms for uncertainty mitigation and to apply these techniques to new problems and domains

Prerequisites: Basic familiarity with basic linear algebra; background in probability theory; basic programming experience

100 units

Walter, Matthew

Many problems in machine learning, computer vision, natural language processing, robotics, computational biology, and beyond require modeling complex interactions between large, heterogeneous collections of random variables. Graphical models combine probability theory and graph theory to provide a unifying framework for representing these relationships in a compact, structured form. Probabilistic graphical models decompose multivariate joint distributions into a set of local relationships among small subsets of random variables via a graph. These local interactions result in conditional independencies that afford efficient learning and inference algorithms. Moreover, their modular structure provides an intuitive language for expressing domain-specific knowledge, and facilitates the transfer of modeling advances to new applications.

This graduate-level course will provide a strong foundation for learning and inference with probabilistic graphical models. The course will first introduce the underlying representational power of graphical models, including Bayesian and Markov networks, and dynamic Bayesian networks. Next, the course will investigate contemporary approaches to statistical inference, both exact and approximate. The course will then survey state-of-the-art methods for learning the structure and parameters of graphical models.

- Review of probability theory
- Directed graphical models
- Undirected graphical models
- Conditional random fields
- Temporal models (hidden Markov models)
- Exponential families
- Variable elimination, sum-product
- Belief propagation
- MAP inference
- Variational inference
- Sampling-based inference
- Dual decomposition
- Multivariate Gaussians: Kalman filters and smoothing
- Learning for directed graphical models
- Learning for undirected graphical models
- Learning given unobserved data
- Learning for Gaussian models

Expected outcomes:

- Understand the representation of graphical models, including directed, undirected, and factor graph representations; factorization and Markov properties; and common spatial, temporal, hierarchical, and relational models.
- Develop a solid understanding of exponential families and the related issues of conjugate priors, ML estimation, and parameter estimation in directed and undirected graphical models.
- Demonstrate a familiarity with Gaussian graphical models, including Bayesian networks, Markov random fields, and inference algorithms under these models.
- Understand methods for exact inference, including variable elimination, belief propagation (message passing), and the junction tree algorithm.
- Understand techniques for approximate inference, including variational and Monte Carlo methods (e.g., Gibbs sampling, Rao-Blackwellization, and Metropolis-Hastings).
- Understand techniques for learning the structure and parameters of different families of graphical models both from observed and latent data.

Prerequisites: TTIC 31020 (or equivalent)

100 units

Gimpel, Kevin

This course will introduce fundamental concepts in natural language processing (NLP). NLP includes a range of research problems that involve computing with natural language. Some are user-facing applications, such as spam classification, question answering, summarization, and machine translation. Others serve supporting roles, such as part-of-speech tagging and syntactic parsing. Solutions draw from machine learning (especially deep learning), algorithms, and linguistics. There is particular interest in structured prediction in which the output structure is a sequence, tree, or sentence.

Topics include:

- words: lexical semantics, distributional representations, clusters, and word embeddings
- sequences: language modeling and smoothing, tasks such as part-of-speech tagging and named-entity recognition, model families like hidden Markov models and conditional random fields, neural sequence encoders such as recurrent and convolutional neural networks
- trees: syntactic parsing, including constituency parsing and dependency parsing, context-free grammars, parsing algorithms
- computational semantics, including compositionality, reference, and shallow semantic parsing
- word alignment and machine translation

Assignments include formal exercises as well as practical exercises involving implementing algorithms and using NLP toolkits.

Expected outcomes:

- Understand key challenges of computing with natural language
- Understand and apply solutions to standard NLP tasks, such as hidden Markov models, conditional random fields, and bidirectional LSTM taggers for sequence labeling
- Be able to implement basic neural network architectures for core NLP tasks using deep learning toolkits
- Be able to derive dynamic programming algorithms to perform inference in structured output spaces, and to analyze their computational properties
- Understand common types of syntactic and semantic analysis, and how they are used in downstream applications
- Recognize and characterize the errors made by NLP systems

Prerequisites: basic knowledge of calculus, linear algebra, and probability; basic programming experience; a machine learning course is recommended but not required.

100 units

Tulsiani, Madhur

This course is meant to serve as an introduction to some basic concepts in information theory and error-correcting codes, and some of their applications in computer science and statistics. We plan to cover the following topics:

- Introduction to entropy and source coding. Some applications of entropy to counting problems.
- Mutual information and KL-divergence. Method of types and hypothesis testing. I-projections and applications.
- Introduction to error-correcting codes. Unique and list decoding of Reed-Solomon and Reed-Muller codes.
- Applications of information theory to lower bounds in computational complexity and communication complexity.

Expected outcomes:

- Familiarity with concepts such as Entropy, Mutual information and KL-divergence.
- Familiarity with source and channel coding.
- Understanding of the method of types and ability to derive large-deviation bounds using information-theoretic concepts.
- Understanding of the notions of unique and list decoding for various codes.

Prerequisites: Discrete probability. Some knowledge of finite-field algebra is required for the part on error-correcting codes but required basics are reviewed in class.

100 units

Blum, Avrim

This course will cover some of the basic theory underlying machine learning and the process of generalizing from data. We will talk about both the PAC model for batch learning (learning over one set of data with the intent of producing a predictor that performs well on new data) and models for learning from feedback over time (online learning). We will discuss important fundamental concepts including overfitting, uniform convergence, formal notions of Occam’s razor, VC-dimension, and regularization, as well as several classic algorithms including the Perceptron algorithm, SVMs, algorithms for combining expert advice, and boosting. We will also discuss limited-feedback (bandit) algorithms, reinforcement learning, connections between learning and game theory, and formal guarantees on privacy. This will be a proof-oriented course: our focus will be on proving performance guarantees for algorithms that aim to generalize from data as well as understanding what kinds of performance guarantees we can hope to prove.

Pre-Requisites:

The main pre-requisite is comfort with a proof-oriented course, having taken some algorithms class, and comfort with basic probabilistic modeling and reasoning. For example, 1000 programs are submitted to a stock-market prediction challenge, and we find that one of those programs has correctly predicted whether the market will go up or down the next week for 10 weeks in a row; should we feel confident that the program is a good one? Comfort with thinking about points and vectors in high-dimensional space is a plus.

Specific Topics Include:

- The PAC (Probably Approximately Correct) batch learning model
- Overfitting and generalization
- Cardinality and description-length bounds: Occam’s razor
- Regularization
- The online mistake bound model
- The Perceptron algorithm
- Hinge-loss and inseparable data
- Kernel functions, SVMs
- Online to batch conversion
- VC-dimension, the growth function, and Sauer’s lemma
- Uniform convergence proofs using ghost samples
- Boosting
- Learning and Game Theory
- Bandit algorithms
- Reinforcement learning
- Differential Privacy
- Additional possible topics: Semi-Supervised Learning, Rademacher Bounds, Limitations on Learning

Expected outcomes:

- Ability to recognize different learning models and make rigorous statements about learning methods
- Ability to use standard techniques to prove learning guarantees
- Ability to think critically about new learning paradigms

100 units

Gimpel, Kevin

This course is a follow-up to TTIC 31190. It will go into more depth of the fundamentals of natural language processing (NLP) and cover a broader range of applications. Some of the class meetings will be hands-on, guided laboratory-style meetings; a laptop is strongly recommended for these class meetings, but not strictly required.

Topics include:

- grammatical formalisms (CFG, TSG, TAG, and CCG)
- exact and approximate parsing algorithms (CKY, shift-reduce, k-best parsing, cube pruning, etc.)
- logical semantics and semantic parsing
- semantic formalisms (abstract meaning representation, etc.)
- training and decoding criteria for NLP (e.g., minimum Bayes risk)
- unsupervised learning in NLP (EM for HMMs and PCFGs, topic models, Bayesian nonparametrics)
- advanced neural network methods in NLP, including recurrent, recursive, and convolutional networks, encoder-decoder architectures, and attention-based models
- the application of these techniques to important NLP applications, including: textual entailment, dialogue systems, machine translation, question answering, automatic summarization, and coreference resolution

Assignments include formal exercises as well as practical exercises involving implementing algorithms and using NLP and deep learning toolkits.

Expected outcomes:

- Be able to derive dynamic programming algorithms for inference with grammatical formalisms and other structured output spaces, and to analyze their computational properties
- Understand trade-offs of approximate inference algorithms used in NLP and be able to choose algorithms appropriately
- Be able to design generative models for textual data and derive statistical inference algorithms for quantities of interest
- Understand state-of-the-art solutions to key NLP applications, including approaches based on deep learning

Prerequisites: TTIC 31190 or permission of the instructor.

100 units

Livescu, Karen

This course will introduce concepts and techniques for analyzing and learning from unlabeled data. Unsupervised methods are used, for example, for visualization, data generation, and representation learning. The course will motivate settings in which unsupervised methods are needed or useful, and discuss relationships between unsupervised and supervised methods. Topics will include fixed feature representations such as Fourier methods and count-based features; linear and nonlinear dimensionality reduction; clustering; density estimation, mixture models, and expectation maximization; and semi-supervised/ distant-supervision settings. Linear, kernel, and deep neural network-based methods will be included. Assignments will include theoretical and practical exercises.

Prerequisites: a good background in basic probability, linear algebra, TTIC 31020, and familiarity and comfort with programming; or permission of the instructor.

Expected outcomes:

- Understand typical settings where unsupervised methods are used, including visualization, representation, analysis, and generation, and be able to choose relevant methods for a given situation
- Understand how supervised and unsupervised data and methods can be combined.
- Be able to analyze and visualize data using relevant fixed feature representations.
- Understand the motivation and application of dimensionality reduction techniques.
- Understand and be able to apply clustering and density estimation techniques.
- Understand the current state of the art and research landscape in selected areas.
- Develop proficiency in applying relevant techniques to real data in practical settings.

100 units

McAllester, David

Introduction to fundamental principles of deep learning. Although deep learning systems are evolving rapidly, this course attempts to teach material that will remain relevant and useful as the field changes. The course will emphasize theoretical and intuitive understanding to the extent possible.

Topics:

- Introduction to multi-layer perceptrons and backpropagation.
- Convolutional neural networks (CNNs) and Recurrent neural networks (RNNs).
- General frameworks and mathematical notations for expressing neural networks.
- Implementing a framework in Python.
- The theory and practice of stochastic gradient descent.
- Batch Normalization
- Vanishing Gradients and exploding gradients and methods to counter these problems.
- LSTMs, GRUs, highway networks, and Resnet.

- Regularization
- L2 regularization
- Dropouts
- Early Stopping
- Generalization Bounds

- Model Compression
- Attention and Memory
- Stack Architectures
- Generative Adversarial Networks
- Autoencoders

Expected outcomes:

- Ability to design and train novel deep learning architectures.
- An understanding of the general issues and phenomenon sufficient to guide architecture design and training.

Prerequisites: Introduction to machine learning.

100 units

Walter, Matthew

This course considers problems in perception, navigation, and control, and their systems-level integration in the context of self-driving vehicles through an open-source curriculum for autonomy education that emphasizes hands-on experience. Integral to the course, students will collaborate to implement concepts covered in lecture on a low-cost autonomous vehicle with the goal of navigating a model town complete with roads, signage, traffic lights, obstacles, and citizens. The wheeled platform is equipped with a monocular camera and a performs all processing onboard with a Raspberry Pi 3, and must: follow lanes while avoiding obstacles, pedestrians and other robots; localize within a global map; navigate a city; and coordinate with other robots to avoid collisions. The platform and environment are carefully designed to allow a sliding scale of difficulty in perception, inference, and control tasks, making it usable in a wide range of applications, from undergraduate-level education to research-level problems. For example, one solitary robot can successfully wander the environment using only line detection and reactive control, while successful point-to-point navigation requires recognizing street signs. In turn, sign detections can be “simulated” either by using fiducials affixed to each sign, or it can be implemented using “real” object detection. Realizing more complex behaviors, such as vision-based decentralized multi-robot coordination, poses research-level challenges, especially considering resource constraints. In this manner, the course is well suited to facilitate undergraduate and graduate-level education in autonomy.

The course will be taught in concurrently and in conjunction with classes at the University of Montreal and ETH Zurich, which provides opportunities for interaction and collaboration across institutions.

Topics:

The course covers fundamental techniques in perception, planning, and control for autonomous agents and their integration as part of a complex system. Specific topics include:

- camera geometry, intrinsic/extrinsic calibration;
- minimal sufficient representations for visual tasks;
- nonlinear filtering, including robust localization and mapping (localize in a given map, or create your own map);
- shared control and level 2,3 autonomy;
- complex perception pipelines: (use of) object detection (reading traffic signs) and tracking;
- safety and correctness (navigate intersections); and
- signaling and coordination for distributed robotics (reason about the intent of the other drivers).

Expected outcomes:

- An understanding of fundamental techniques in perception, planning, and control for autonomous agents and their integration as part of a complex system
- Understand how to design these subsystems together, such that performance is maximized, while (shared) resource usage is minimized.
- Familiarity with basic practices of reliable system development, including test-driven and data-driven development.
- Understanding of the tools and the dynamics of software and hardware open-source development. All homework is shared on a central Git repository.
- Familiarity with the constraints related to co-design of integrated systems.

Prerequisites: There are no formal course requirements, though having taken TTIC 31170 – Planning, Learning, and Estimation for Robotics and Artificial Intelligence is desireable. Students must be familiar with the GNU/Linux development environment and are required to have access to a laptop with Ubuntu 16.04 installed.

50 units

Stratos, Karl

A spectral technique refers to a technique that makes use of the eigenvalues of a matrix. Thanks to the rich theory underlying linear algebra and matrix perturbation, these techniques can offer a precise and satisfying understanding of a wide class of problems such as dimensionality reduction and model estimation. Moreover, the existence of powerful algorithms for computing eigenvalues makes the approach also practical, with applications ranging from robotics to natural language processing. This seminar-like course will first supply an inventory of mathematical tools to understand and derive spectral techniques used in modern machine learning. It will apply these tools to examine some of the most recent developments in the literature. In the latter part of the course, the course will “flip” and students will take turns to present a paper on recent research in this area. A student can choose any relevant work under this topic, but personal guidance and references will be provided.

Audience and prerequisites: The course is designed to encourage students to start doing research in this area. Thus the expected audience is graduate students with relevant background, for example a PhD student in machine learning theory/application who is comfortable with proofs and has some basic knowledge of linear algebra (but not familiar with the techniques considered in this course). Both TTIC and UofC students are qualified to take this course, possibly the former getting priority if necessary. Class size is 10-20.

The goals of taking this course are:

- Achieving an understanding of the foundational concepts and tools used in modern spectral methods
- Obtaining an ability to accurately evaluate new works in this area at conferences
- Finding new research projects that persist beyond this course and result in publications

This course will be pass-fail. All students who actively engage in the class will receive a pass (this will be evaluated based on attendance, asking questions, presentation, etc.).

There will be optional homeworks to reinforce understanding of the materials taught in the class. They will not be graded, but solutions will be posted after a period of time for self-grading.

Course site: http://karlstratos.com/teaching/spectral_topics/spectral_topics.html.

100 units

Original academic research conducted under guidance of an instructor (normally student’s PhD advisor), directed at making progress towards advancing the state of the art in the chosen area.

Expected outcomes:

- Familiarity with peer-reviewed literature on the chosen topic of focus representing both current state of the art and historical developments.
- Ability to develop a short-, medium- and long-term research plan.
- Improved technical skills relevant to the research topic (examples include: using relevant mathematical tools, developing, documenting and using advanced software, designing and executing user studies, designing and conducting experiments).
- Ability to communicate research progress in regular meetings with advisor and colleagues.
- Ability to describe research results in a technical document at a level and in format appropriate for a submission to a peer-reviewed venue.

100 units

Original academic research conducted under guidance of an instructor (normally student’s PhD advisor), directed at making progress towards advancing the state of the art in the chosen area.

Expected outcomes:

- Familiarity with peer-reviewed literature on the chosen topic of focus representing both current state of the art and historical developments.
- Ability to develop a short-, medium- and long-term research plan.
- Improved technical skills relevant to the research topic (examples include: using relevant mathematical tools, developing, documenting and using advanced software, designing and executing user studies, designing and conducting experiments).
- Ability to communicate research progress in regular meetings with advisor and colleagues.
- Ability to describe research results in a technical document at a level and in format appropriate for a submission to a peer-reviewed venue.

100 units

Advisor

In-depth involvement in areas of computer science in a research lab, University or business setting. Internship activities and objectives must be related to the student’s program objectives. Required enrollment for F-1 CPT internship. Advisor’s Consent Required.