COMS 4771, Lecture 1
1. Course overview
2. Maximum likelihood estimation (review of some statistics)

Administrivia

This course

Topics
1. Supervised learning: core issues of statistical machine learning; algorithmic, statistical, and analytical tools; common statistical models; frameworks for developing new models and algorithms
2. Some topics in unsupervised learning

Coursework
1. Four homework assignments (theory & programming): 24%
2. Kaggle-style prediction competition: 26%
3. Two in-class exams (3/3, 4/28): 25% each
4. No late assignments accepted; no make-up exams

Prerequisites

Mathematical prerequisites
- Basic algorithms and data structures
- Linear algebra (e.g., vector spaces, orthogonality, spectral decomposition)
- Multivariate calculus (e.g., limits, Taylor expansion, gradients)
- Probability/statistics (e.g., random variables, expectation, LLN, MLE)

Computational prerequisites
- You should have regular access to, and be able to program in, MATLAB. MATLAB is available for download for SEAS students.

Resources

Course staff
- Instructor: Satyen Kale
- Teaching assistants: Gaurang Sadekar, Patanjali Vakhulabharanam, Robert Ying, Xiyue Yu
- Office hours and online forum (Piazza): see the course website. Office hour attendance is highly recommended.

Materials
- Lecture slides: posted on the course website
- Textbooks: readings from The Elements of Statistical Learning [ESL], A Course in Machine Learning [CML], and Understanding Machine Learning [UML] (all available free online; see the course website)

Machine learning overview

Algorithmic problems

Minimum spanning tree
Input: a graph G. Output: a minimum spanning tree in G.
The input/output relationship is well specified for all inputs.

Bird species recognition
Input: an image of a bird. Output: the species name of the bird.
The input/output relationship is difficult to specify.

Machine learning: use examples of input/output pairs to learn the mapping (e.g., a bird image labeled "indigo bunting").

Machine learning in context

Perspective of intelligent systems
Goal: a robust system with intelligent behavior.
Often a hard-coded solution is too complex, not robust, or sub-optimal.
How do we learn from past experiences to perform well in the future?

Perspective of algorithmic statistics
Goal: statistical analysis of large, complex data sets.
Past: 100 data points of two variables; data collection and statistical analysis done by hand/eye.
Now: several million data points and variables, collected by high-throughput automatic processes.
How can we automate statistical analysis for modern applications?

Supervised learning

Abstract problem
Data: labeled examples (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ X × Y from some population, where X is the input (feature) space and Y is the output (label, response) space.
Underlying assumption: there is a relatively simple function f : X → Y such that f(x) ≈ y for most (x, y) in the population.
Learning task: using the labeled examples, construct f̂ such that f̂ ≈ f.
At test time: use f̂ to predict y for new (previously unseen) x's.

[Diagram: past labeled examples → learning algorithm → learned predictor; new (unlabeled) example → learned predictor → predicted label]
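To make the abstract interface concrete, here is a minimal sketch, assuming NumPy; the 1-nearest-neighbor rule and all names below are illustrative, not a method prescribed by the course:

    import numpy as np

    # Illustrative learning algorithm: 1-nearest-neighbor. "fit" consumes the
    # past labeled examples and returns a learned predictor f_hat.
    def fit_nearest_neighbor(X_train, y_train):
        def f_hat(x_new):
            # Predict the label of the closest training example.
            distances = np.linalg.norm(X_train - x_new, axis=1)
            return y_train[np.argmin(distances)]
        return f_hat

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]])
    y_train = np.array(["no rain", "no rain", "rain"])

    f_hat = fit_nearest_neighbor(X_train, y_train)
    print(f_hat(np.array([3.5, 4.0])))   # "rain"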

Supervised learning: examples

Spam filtering: X = messages, Y = {spam, not spam}.
Optical character recognition: X = pixel images, Y = {A, B, ..., Z, a, b, ..., z, 0, 1, ..., 9}.
Online dating: X = pairs of user profiles, Y = [0, 1].
Machine translation: X = sequences of English words, Y = sequences of French words.

Unsupervised learning

Abstract problem
Data: (unlabeled) examples x_1, x_2, ..., x_n ∈ X from some population.
Underlying assumption: there is some interesting structure in the population to be discovered.
Learning task: using the unlabeled examples, find the interesting structure.
Uses: visualization/interpretation, pre-processing data for downstream learning, ...

Examples
- Discover sub-communities of individuals in a social network.
- Explain variability of market price movement using a few latent factors.
- Learn a useful representation of data that improves supervised learning.

What else is there in machine learning?

Advanced issues: structured output spaces, distributed learning, incomplete data, causal inference, privacy, ...
Other models of learning: semi-supervised learning, active learning, online learning, reinforcement learning, ...
Major application areas: natural language processing, speech recognition, computer vision, computational biology, information retrieval, ...
Modes of study: mathematical analysis, cross-domain evaluations, end-to-end application studies, ...

Maximum likelihood estimation

Example: binary prediction

On a sequence of 10 days, observe the weather: 1, 0, 1, 1, 0, 0, 1, 0, 1, 0 (1 = rain, 0 = no rain).
Goal: understand the distribution of the weather.
Modeling assumption: it rains on each day with a fixed probability p ∈ [0, 1].
Bernoulli distribution with parameter p: X ~ Bern(p) means X is a {0, 1}-valued random variable with Pr[X = 1] = p.
How do we estimate the probability of rain? Which Bernoulli distribution models the data best?
(a) Bern(0.1)   (b) Bern(0.5)   (c) Bern(0.8)   (d) Bern(1)
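One way to compare candidates (a)-(d) is to evaluate the probability each one assigns to the observed sequence. A minimal sketch, assuming NumPy; the data and candidate values come from the slide above:

    import numpy as np

    x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])

    # Probability of the observed sequence under Bern(p): p^(#ones) * (1-p)^(#zeros).
    def likelihood(p):
        k = x.sum()
        return p ** k * (1 - p) ** (len(x) - k)

    for p in [0.1, 0.5, 0.8, 1.0]:
        print(f"Bern({p}): likelihood = {likelihood(p):.3e}")

Bern(0.5) assigns the highest probability to the data; Bern(1) assigns probability zero, since it can never produce a 0.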

Maximum likelihood estimation for Bernoulli

Data: 1, 0, 1, 1, 0, 0, 1, 0, 1, 0.
If the data come from Bern(p), the likelihood of the data is p^5 (1 − p)^5.
The maximum likelihood estimate for p is then

    \arg\max_{p \in [0,1]} p^5 (1-p)^5

To compute the maximum, take the derivative and set it to 0. The maximum is at p = 0.5.
Generalizing to data x_1, x_2, ..., x_n ∈ {0, 1}, the MLE of the Bernoulli parameter p is

    \hat{p}_{\mathrm{ML}} := \frac{1}{n} \sum_{i=1}^n x_i
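The closed-form estimate is just the fraction of 1s, which can be sanity-checked numerically against a grid search over the likelihood. A small sketch, assuming NumPy:

    import numpy as np

    x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])

    # Closed-form MLE: the fraction of 1s.
    p_ml = x.mean()

    # Numerical check: maximize p^k (1-p)^(n-k) over a fine grid of p values.
    grid = np.linspace(0.0, 1.0, 10001)
    k, n = x.sum(), len(x)
    p_grid = grid[np.argmax(grid ** k * (1 - grid) ** (n - k))]

    print(p_ml, p_grid)   # both print 0.5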

General setup

Data: x_1, x_2, ..., x_n ∈ X.
A model P = {P_θ : θ ∈ Θ} is a set of probability distributions over X indexed by a parameter space Θ.
Example: the Bernoulli model has parameter space Θ = [0, 1], and for any θ ∈ Θ, P_θ = Bern(θ).
We also often deal with models where each P_θ has a density function p(·; θ) : X → R_++.
Example: the Gaussian density in one dimension (X = R), with Θ = R × R_++ and parameter θ = (µ, σ):

    p(x; \mu, \sigma) := \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

Maximum likelihood estimation

Setting
Given: data x_1, x_2, ..., x_n ∈ X and a parametric model P = {P_θ : θ ∈ Θ}.
The i.i.d. assumption: x_1, x_2, ..., x_n are drawn independently from the same probability distribution.
Likelihood of θ given the data (under the i.i.d. assumption):

    \prod_{i=1}^n p(x_i; \theta)

i.e., the probability mass (or density) of the data, as given by P_θ.

Maximum likelihood estimator
The maximum likelihood estimator (MLE) for the model P is

    \hat{\theta}_{\mathrm{ML}} := \arg\max_{\theta \in \Theta} \prod_{i=1}^n p(x_i; \theta)

i.e., the parameter θ ∈ Θ whose likelihood is highest given the data.

Logarithm trick

Recall that logarithms turn products into sums:

    \log\left( \prod_i f_i \right) = \sum_i \log(f_i)

Logarithms and maxima: the logarithm is monotonically increasing on R_++, so applying log does not change the location of a maximum or minimum:

    \max_y \log(g(y)) \ne \max_y g(y)                (in general, the value changes)
    \arg\max_y \log(g(y)) = \arg\max_y g(y)          (the location does not change)
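A quick numerical illustration of the trick, assuming NumPy; the positive function g below is an arbitrary example:

    import numpy as np

    g = lambda y: np.exp(-(y - 3.0) ** 2)    # positive function with maximum at y = 3
    ys = np.linspace(0.0, 6.0, 601)

    print(ys[np.argmax(g(ys))])              # 3.0: argmax of g
    print(ys[np.argmax(np.log(g(ys)))])      # 3.0: same argmax after taking log
    print(g(ys).max(), np.log(g(ys)).max())  # 1.0 vs 0.0: the maximum value differs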

MLE: maximality criterion

Likelihood and the logarithm trick:

    \hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta \in \Theta} \prod_{i=1}^n p(x_i; \theta)
                               = \arg\max_{\theta \in \Theta} \log\left( \prod_{i=1}^n p(x_i; \theta) \right)
                               = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \log p(x_i; \theta)

Maximality criterion: assuming θ is unconstrained, the log-likelihood maximizer must satisfy

    0 = \sum_{i=1}^n \nabla_\theta \log p(x_i; \theta)

For some models, we can analytically find a unique solution θ ∈ Θ.
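When no analytic solution is available, the log-likelihood can be maximized numerically instead. A sketch, assuming NumPy and SciPy, for a one-dimensional Gaussian with unknown (µ, σ); the synthetic data and parameterization are illustrative:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.5, size=500)    # synthetic i.i.d. data

    # Negative log-likelihood with theta = (mu, log_sigma); optimizing log(sigma)
    # keeps sigma positive without an explicit constraint.
    def neg_log_likelihood(theta):
        mu, log_sigma = theta
        return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

    result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

    print(mu_hat, sigma_hat)    # close to the closed-form answers:
    print(x.mean(), x.std())    # sample mean and (1/n-normalized) sample std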

Gaussian distribution

Gaussian density in one dimension (X = R):

    p(x; \mu, \sigma) := \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

Here µ is the mean, σ² is the variance, and (x − µ)/σ is the deviation from the mean in units of σ.

Gaussian density in d dimensions (X = R^d): in the density function, the quadratic function

    \frac{(x-\mu)^2}{2\sigma^2} = \frac{1}{2} (x-\mu) (\sigma^2)^{-1} (x-\mu)

is replaced by a multivariate quadratic form:

    p(x; \mu, \Sigma) := \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} \exp\left( -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu) \right)
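To make the d-dimensional formula concrete, here is a small sketch, assuming NumPy and SciPy, that evaluates the density directly and cross-checks it against SciPy's implementation; the particular µ, Σ, and x are arbitrary:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, -1.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    x = np.array([0.5, 0.0])

    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)    # (x-mu)^T Sigma^{-1} (x-mu)
    density = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

    print(density)
    print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value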

Gaussian MLE

Model: multivariate Gaussians with fixed covariance. The model P is the set of all Gaussian densities on R^d with a fixed covariance matrix Σ:

    P = { g(·; µ, Σ) : µ ∈ R^d }

where g is the Gaussian density function. The parameter space is Θ = R^d.

MLE equation: solve the following equation (from the maximality criterion) for µ:

    \sum_{i=1}^n \nabla_\mu \log g(x_i; \mu, \Sigma) = 0

Solving for µ:

    0 = \sum_{i=1}^n \nabla_\mu \left[ \log\left( \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} \exp\left( -\frac{1}{2} (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu) \right) \right) \right]
      = \sum_{i=1}^n \nabla_\mu \left[ \log \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} - \frac{1}{2} (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu) \right]
      = \sum_{i=1}^n \nabla_\mu \left[ -\frac{1}{2} (x_i-\mu)^\top \Sigma^{-1} (x_i-\mu) \right]
      = \sum_{i=1}^n \Sigma^{-1} (x_i-\mu)

Multiplying both sides by Σ gives

    0 = \sum_{i=1}^n (x_i - \mu) \quad\Longrightarrow\quad \mu = \frac{1}{n} \sum_{i=1}^n x_i

Conclusion: the maximum likelihood estimator of the Gaussian mean parameter is

    \hat{\mu}_{\mathrm{ML}} := \frac{1}{n} \sum_{i=1}^n x_i

Example: Gaussian with unknown covariance

Model: multivariate Gaussians. The model P is now

    P = { g(·; µ, Σ) : µ ∈ R^d, Σ ∈ S^{d×d}_{++} }

where S^{d×d}_{++} is the set of symmetric positive definite d × d matrices. The parameter space is Θ = R^d × S^{d×d}_{++}.

ML approach: since we have just seen that the ML estimator of µ does not depend on Σ, we can compute µ_ML first. We then estimate Σ using the criterion

    \sum_{i=1}^n \nabla_\Sigma \log g(x_i; \hat{\mu}_{\mathrm{ML}}, \Sigma) = 0

Solution: the ML estimator of Σ is

    \hat{\Sigma}_{\mathrm{ML}} := \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu}_{\mathrm{ML}})(x_i - \hat{\mu}_{\mathrm{ML}})^\top
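Both estimators are one-liners in practice. A sketch, assuming NumPy, on synthetic data whose rows are the x_i; the true parameters below are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    mu_true = np.array([1.0, -2.0])
    Sigma_true = np.array([[2.0, 0.3],
                           [0.3, 0.5]])
    X = rng.multivariate_normal(mu_true, Sigma_true, size=2000)   # rows are x_i

    mu_ml = X.mean(axis=0)                       # (1/n) sum_i x_i
    centered = X - mu_ml
    Sigma_ml = centered.T @ centered / len(X)    # (1/n) sum_i (x_i - mu)(x_i - mu)^T

    print(mu_ml)       # close to mu_true
    print(Sigma_ml)    # close to Sigma_true; equals np.cov(X.T, bias=True)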

Recap

- Maximum likelihood estimation is a principled way to find the best distribution in a model: choose the distribution that most likely generated the data.
- The MLEs for the Bernoulli and Gaussian models can be computed in closed form.
