High-dimensional probability and statistics for the data sciences

High-dimensional probability and statistics for the data sciences. Larry Goldstein and Mahdi Soltanolkotabi. Motivation. August 21, 2017.

Please ask questions!

Traditionally, mathematics was driven by physics... Kepler's laws of planetary motion. Newtonian mechanics. General relativity.

Today's mathematics is driven by something else... Can you guess?

What is different about the data we have today? Size? Data deluge?

Ye Olde Data Deluge

What is different about the data we have today? Variety and complexity: web data (text data, social network data, video data, image data); scientific data (remote sensing data, brain data, genomic data, sensor network data); and a variety of platforms.

Challenge. Conclusion: "Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It."

Need new Math... Mathematics Driven by Data

Probability at the heart of data science

Example I: Community detection and stochastic block models

The standard machine: (1) construct an affinity matrix $W$ between samples (a weighted graph),
$$W_{i,j} = \exp\left(-\frac{\|x_i - x_j\|_{\ell_2}^2}{2\sigma^2}\right);$$
(2) construct clusters by applying spectral clustering to $W$.

Spectral clustering. Affinity matrix $W$ ($N \times N$); degree matrix $D = \operatorname{diag}(d_i)$ with $d_i = \sum_j W_{ij}$; normalized graph Laplacian (symmetric form) $L = I - D^{-1/2} W D^{-1/2}$ ($N \times N$).

Dimensionality reduction from $N$ to $k$: (1) build $V \in \mathbb{R}^{N \times k}$ with the $k$ eigenvectors of $L$ of lowest eigenvalue as columns; (2) interpret the $i$th row of $V$ as a new data point $r_i \in \mathbb{R}^k$ representing observation $i$; (3) apply k-means clustering to the points $\{r_i\}$. Fantastic tutorial: U. von Luxburg.
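A minimal sketch of this pipeline in Python, assuming numpy and scikit-learn are available; the bandwidth sigma, the function name spectral_clustering, and the synthetic two-blob data are illustrative choices, not the instructors' code:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Gaussian affinity -> normalized Laplacian -> k-means on eigenvector rows."""
    # Affinity matrix W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Rows of the k lowest eigenvectors are the new points r_i in R^k
    _, eigvecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    V = eigvecs[:, :k]
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)

# Usage on two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
print(spectral_clustering(X, k=2, sigma=1.0))
```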

Spectral clustering: ideal case. [Figure: the graph, the eigenvectors, and the rows in $\mathbb{R}^3$.]

Spectral clustering: non-ideal case. [Figure: the graph, the eigenvectors, and the rows in $\mathbb{R}^3$.]

Adjacency matrix of random graphs. Adjacency matrix of a graph:
$$A_{ij} = \begin{cases} 1 & \text{with probability } P_{ij} \\ 0 & \text{otherwise.} \end{cases}$$
Probability matrix for the stochastic block model: $k$ clusters of size $n/k$, with within-cluster edge probability $p$ and cross-cluster edge probability $q$, where $0 \le q < p \le 1$.
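A small sketch, in Python with numpy, of sampling such an adjacency matrix; the helper name sbm_adjacency and its signature are assumptions for illustration:

```python
import numpy as np

def sbm_adjacency(n, k, p, q, seed=None):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model:
    k equal clusters of size n/k, edge prob. p within a cluster, q across."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n // k)                 # cluster of each node
    P = np.where(labels[:, None] == labels[None, :], p, q)   # probability matrix
    A = np.triu((rng.random((n, n)) < P).astype(int), 1)     # sample upper triangle
    return A + A.T, labels                                   # symmetrize; no self-loops

A, labels = sbm_adjacency(n=200, k=4, p=0.7, q=0.3, seed=0)
```

Permuting the node order (e.g., with rng.permutation) hides the block structure, which is what the "permuted version" figure below illustrates.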

Where's Waldo?

Where are the clusters? n = 200, k = 4, p = 0.7, q = 0.3.

Where are the clusters? n = 200, k = 4, p = 0.6, q = 0.4.

Where are the clusters? Permuted version n = 200, k = 4, p = 0.6, q = 0.4.

Will it work? [Figure: eigenvalues of the normalized Laplacian with n = 200, k = 4, p = 0.6, q = 0.4; eigenvalue (0 to 1.2) against index of eigenvalue (0 to 200).]

Eigenvectors. [Figure: top two eigenvectors of the normalized Laplacian, n = 200, k = 4, p = 0.6, q = 0.4; scatter of second entry against first entry ($\times 10^{-2}$).]

Will it work? [Figure: eigenvalues of the normalized Laplacian with n = 200, k = 4, p = 0.7, q = 0.3; eigenvalue (0 to 1.2) against index of eigenvalue (0 to 200).]

Eigenvectors. [Figure: top two eigenvectors of the normalized Laplacian, n = 200, k = 4, p = 0.7, q = 0.3; scatter of second entry against first entry ($\times 10^{-2}$).]
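The experiment behind these plots is easy to reproduce; a self-contained Python sketch (regenerating the stochastic block model as in the sampler above, with the same assumed parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p, q = 200, 4, 0.7, 0.3

# Sample the stochastic block model adjacency matrix
labels = np.repeat(np.arange(k), n // k)
P = np.where(labels[:, None] == labels[None, :], p, q)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T

# Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

eigvals, eigvecs = np.linalg.eigh(L)
print(eigvals[:6])   # expect a visible gap after the k = 4 smallest eigenvalues

# One point per node: entries of the two lowest eigenvectors, as in the scatter
# plots above; with p = 0.7, q = 0.3 the clusters separate more cleanly than
# with p = 0.6, q = 0.4.
r = eigvecs[:, :2]
```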

Example II: Learning models from data and uniform concentration results

Learn a model from training data. Main abstraction of machine learning: parameter estimation. Data ($n$ samples): features $x_1, x_2, \dots, x_n \in \mathbb{R}^p$; response/class $y_1, y_2, \dots, y_n$. [Diagram: input features $\to$ model with parameter $\theta$ $\to$ output (face, bicycle, guitar).]

Mathematical abstraction: empirical risk minimization. If we had infinite input/output data distributed according to a distribution $\mathcal{D}$:
$$\theta^\star = \arg\min_{\theta \in \Theta} L(\theta) := \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ell(f_\theta(x), y)\right].$$
Given a data set $(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}$, find the best function that fits this data:
$$\hat{\theta} = \arg\min_{\theta \in \Theta} \hat{L}(\theta) := \frac{1}{n}\sum_{i=1}^n \ell(f_\theta(x_i), y_i).$$
How good is $f_{\hat{\theta}}$? That is, given a new sample $x$, how well can $f_{\hat{\theta}}(x)$ estimate $y$?
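As a concrete instance, here is a short Python sketch of ERM with a linear model $f_\theta(x) = \theta^\top x$ and squared loss; the slides leave $f_\theta$ and $\ell$ generic, so this particular choice (and the synthetic data) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10

# Synthetic data from a linear ground truth plus noise
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)

# Empirical risk minimizer: argmin_theta (1/n) sum_i (x_i^T theta - y_i)^2,
# which for squared loss has the least-squares closed form
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimate the population risk L(theta_hat) on fresh samples from D
X_new = rng.standard_normal((1000, d))
y_new = X_new @ theta_star + 0.1 * rng.standard_normal(1000)
print(np.mean((X_new @ theta_hat - y_new) ** 2))
```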

Uniform concentration. For a fixed $\theta$ we know
$$\frac{1}{n}\sum_{i=1}^n \ell(f_\theta(x_i), y_i) \to \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ell(f_\theta(x), y)\right].$$
Questions: Can we guarantee this for all $\theta \in \Theta$, i.e.
$$\sup_{\theta \in \Theta}\left|\frac{1}{n}\sum_{i=1}^n \ell(f_\theta(x_i), y_i) - \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ell(f_\theta(x), y)\right]\right| \le \delta?$$
This is much more difficult than proving it for one point asymptotically (law of large numbers). How many data samples do we need, as a function of $\Theta$, $f$, and $\delta$?
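A toy Monte Carlo illustration in Python of the uniform (supremum) deviation shrinking with $n$; the loss $\ell(\theta; x) = (x - \theta)^2$ with $x \sim N(0,1)$ and the finite grid over $\Theta = [-1, 1]$ are assumptions chosen so that the population risk $L(\theta) = 1 + \theta^2$ is known exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
thetas = np.linspace(-1, 1, 201)   # finite grid standing in for Theta
pop_risk = 1 + thetas**2           # L(theta) = E(x - theta)^2 for x ~ N(0, 1)

for n in [100, 1_000, 10_000]:
    x = rng.standard_normal(n)
    emp_risk = np.mean((x[:, None] - thetas[None, :]) ** 2, axis=0)
    # Supremum over Theta of |empirical risk - population risk|
    print(n, np.max(np.abs(emp_risk - pop_risk)))  # shrinks roughly like 1/sqrt(n)
```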

Course Logistics

Goals: learn modern techniques in probability, with concentration in high dimensions geared towards applications in data science, statistics, and machine learning.

Why is this course needed? How is it distinct from other courses? Advanced probability courses, while very technical (e.g., covering measure theory), do not cover some of the most useful techniques in modern probability. We discuss high-dimensional analogues of classical low-dimensional results (law of large numbers, concentration, etc.). Over the past 5-10 years there has been tremendous progress in simplifying many proofs. We focus on the most useful techniques.

Background and disclaimer. Prerequisites: for EE 599 enrollees, EE 441 and EE 503; for MATH 605 enrollees, MATH 505a or MATH 507a. More mathy than EE 441 and EE 503, but not abstract (e.g., we don't care about measure theory, Hilbert spaces, or Banach spaces). We cover a lot of material and many applications. Do I need to know these applications? Do I need to know measure theory or Morse theory? Do I need to be a math graduate student? Do I need to be an electrical engineering student? Answer: absolutely not.

Logistics. Class: Mon, Wed 10:30-11:50 AM, VKC 256. Instructor office hours: Larry: Monday 12-1:30, Wednesday 3:30-5, KAP 406D; Mahdi: Monday and Wednesday 5:30-7 PM, EEB 422. Course website: Blackboard. Grading: 10% participation, 90% homework. The lowest homework score will be dropped. We don't care where you find the solution; just write the proof in your own language (no plagiarism). Course policy: use of sources (people, books, internet, etc.) without citation results in a failing grade.

Textbook. Required textbook: High-Dimensional Probability: An Introduction with Applications in Data Science, Roman Vershynin. Additional textbooks: Concentration Inequalities: A Nonasymptotic Theory of Independence, Stéphane Boucheron, Gábor Lugosi, and Pascal Massart; The Concentration of Measure Phenomenon, Michel Ledoux.

Why you should not take this class: probability is not your thing; eclectic topics. This class is rated P...