COMS 4771 Lecture 1
1. Course overview
2. Maximum likelihood estimation (review of some statistics)
Administrivia
This course

Topics
1. Supervised learning
   - Core issues of statistical machine learning
   - Algorithmic, statistical, and analytical tools
   - Common statistical models
   - Frameworks for developing new models and algorithms
2. Some topics in unsupervised learning

Coursework
1. Four homework assignments (theory & programming): 24%
2. Kaggle-style prediction competition: 26%
3. Two in-class exams (3/3, 4/28): 25% each
4. No late assignments accepted, no make-up exams
Prerequisites

Mathematical prerequisites
- Basic algorithms and data structures
- Linear algebra (e.g., vector spaces, orthogonality, spectral decomposition)
- Multivariate calculus (e.g., limits, Taylor expansion, gradients)
- Probability/statistics (e.g., random variables, expectation, LLN, MLE)

Computational prerequisites
- You should have regular access to and be able to program in MATLAB.
- MATLAB is available for download for SEAS students.
Resources

Course staff
- Instructor: Satyen Kale
- Teaching assistants: Gaurang Sadekar, Patanjali Vakhulabharanam, Robert Ying, Xiyue Yu
- Office hours, online forum (Piazza): see course website. Office hour attendance highly recommended.

Materials
- Lecture slides: posted on course website
- Textbooks: readings from The Elements of Statistical Learning [ESL], A Course in Machine Learning [CML], and Understanding Machine Learning [UML] (available free online; see course website)
Machine learning overview
Algorithmic problems

Minimum spanning tree
- Input: a graph G. Output: a minimum spanning tree in G.
- Input/output relationship well specified for all inputs.

Bird species recognition
- Input: an image of a bird. Output: the species name of the bird (e.g., "indigo bunting").
- Input/output relationship is difficult to specify.

Machine learning: use examples of input/output pairs to learn the mapping.
Machine learning in context

Perspective of intelligent systems
- Goal: a robust system with intelligent behavior.
- Often, a hard-coded solution is too complex, not robust, and sub-optimal.
- How do we learn from past experiences to perform well in the future?

Perspective of algorithmic statistics
- Goal: statistical analysis of large, complex data sets.
- Past: 100 data points of two variables; data collection and statistical analysis done by hand/eye.
- Now: several million data points and variables, collected by high-throughput automatic processes.
- How can we automate statistical analysis for modern applications?
Supervised learning

Abstract problem
- Data: labeled examples (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) ∈ X × Y from some population, where X is the input (feature) space and Y is the output (label, response) space.
- Underlying assumption: there is a relatively simple function f : X → Y such that f(x) ≈ y for most (x, y) in the population.
- Learning task: using the labeled examples, construct f̂ such that f̂ ≈ f.
- At test time: use f̂ to predict y for new (and previously unseen) x's.

[Diagram: past labeled examples → learning algorithm → learned predictor; the learned predictor maps a new (unlabeled) example to a predicted label.]
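To make the abstract setup concrete, here is a deliberately simple sketch (not from the lecture): the "learning algorithm" builds f̂ by 1-nearest-neighbor lookup over the labeled examples, and the data are made up for illustration.

```python
# Toy supervised learning: construct a predictor fhat from labeled examples
# using 1-nearest-neighbor lookup (illustrative data, not from the lecture).

def fit_1nn(examples):
    """examples: list of (x, y) pairs with numeric x. Returns fhat: X -> Y."""
    def fhat(x_new):
        # Predict the label of the closest training input.
        x_near, y_near = min(examples, key=lambda xy: abs(xy[0] - x_new))
        return y_near
    return fhat

# Past labeled examples (x = temperature in C, y = 1 rain / 0 no rain).
data = [(5, 1), (8, 1), (20, 0), (25, 0)]
fhat = fit_1nn(data)

print(fhat(7))   # near the rainy examples -> 1
print(fhat(22))  # near the dry examples  -> 0
```

At test time f̂ is applied to inputs that were never seen during training; a real learner would of course use a richer function class than nearest-neighbor lookup.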
Supervised learning: examples

Spam filtering
- X = email messages, Y = {spam, not spam}

Optical character recognition
- X = pixel images, Y = {A, B, ..., Z, a, b, ..., z, 0, 1, ..., 9}

Online dating
- X = user profiles × user profiles, Y = [0, 1]

Machine translation
- X = sequences of English words, Y = sequences of French words
Unsupervised learning

Abstract problem
- Data: (unlabeled) examples x_1, x_2, ..., x_n ∈ X from some population.
- Underlying assumption: there is some interesting structure in the population to be discovered.
- Learning task: using the unlabeled examples, find the interesting structure.
- Uses: visualization/interpretation, pre-processing data for downstream learning, ...

Examples
- Discover sub-communities of individuals in a social network.
- Explain variability of market price movements using a few latent factors.
- Learn a useful representation of the data that improves supervised learning.
What else is there in machine learning?

Advanced issues
- Structured output spaces, distributed learning, incomplete data, causal inference, privacy, ...

Other models of learning
- Semi-supervised learning, active learning, online learning, reinforcement learning, ...

Major application areas
- Natural language processing, speech recognition, computer vision, computational biology, information retrieval, ...

Modes of study
- Mathematical analysis, cross-domain evaluations, end-to-end application studies, ...
Maximum likelihood estimation
Example: binary prediction

On a sequence of 10 days, observe the weather: 1, 0, 1, 1, 0, 0, 1, 0, 1, 0 (1 = rain, 0 = no rain).

Goal: understand the distribution of the weather.

Modeling assumption: it rains on each day independently with a fixed probability p ∈ [0, 1].

Bernoulli distribution with parameter p: X ~ Bern(p) means X is a {0, 1}-valued random variable with Pr[X = 1] = p.

How do we estimate the probability of rain? Which Bernoulli distribution models the data best?
(a) Bern(0.1)  (b) Bern(0.5)  (c) Bern(0.8)  (d) Bern(1)
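One way to answer the quiz is simply to compute the probability the observed sequence would have under each candidate parameter. A minimal sketch:

```python
# Likelihood of the observed weather sequence under each candidate Bernoulli
# parameter from the slide: which p makes the data most probable?

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

def likelihood(p, xs):
    """Probability of the i.i.d. sequence xs under Bern(p)."""
    L = 1.0
    for x in xs:
        L *= p if x == 1 else (1 - p)
    return L

for p in [0.1, 0.5, 0.8, 1.0]:
    print(f"p = {p}: likelihood = {likelihood(p, data):.3e}")
```

Bern(0.5) assigns the data the highest probability of the four candidates, and Bern(1) assigns probability 0 because the sequence contains days without rain.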
Maximum likelihood estimation for Bernoulli

Data: 1, 0, 1, 1, 0, 0, 1, 0, 1, 0

If the data come from Bern(p), the likelihood of the data is p^5 (1 − p)^5.

The maximum likelihood estimate for p is then

    arg max_{p ∈ [0, 1]} p^5 (1 − p)^5.

To compute the maximum, take the derivative and set it to 0. The maximum is at p = 0.5.

Generalize: for data x_1, x_2, ..., x_n ∈ {0, 1}, the MLE of the Bernoulli parameter p is

    p_ML := (1/n) ∑_{i=1}^n x_i.
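The closed form can be sanity-checked numerically: a grid search over [0, 1] should land on the sample mean. A small sketch:

```python
# Check numerically that the sample mean maximizes the Bernoulli likelihood
# p^k (1-p)^(n-k) for the data on the slide (here k = 5, n = 10).

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
n = len(data)
k = sum(data)  # number of rainy days

def likelihood(p):
    return p**k * (1 - p)**(n - k)

# Brute-force grid search over [0, 1].
grid = [i / 1000 for i in range(1001)]
p_best = max(grid, key=likelihood)

p_ml = k / n  # closed-form MLE: the sample mean
print(p_best, p_ml)  # both 0.5
```

For this data the two agree exactly; in general the grid answer matches the closed form up to the grid resolution.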
General setup

Data: x_1, x_2, ..., x_n ∈ X.

A model P = {P_θ : θ ∈ Θ} is a set of probability distributions over X indexed by a parameter space Θ.

Example: Bernoulli model. The parameter space is Θ = [0, 1], and for any θ ∈ Θ, P_θ = Bern(θ).

Often we also deal with models where each P_θ has a density function p(·; θ) : X → R_+.

Example: Gaussian density in one dimension (X = R), with Θ = R × R_++ and parameter θ = (μ, σ):

    p(x; μ, σ) := (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)).
Maximum likelihood estimation

Setting
- Given: data x_1, x_2, ..., x_n ∈ X and a parametric model P = {P_θ : θ ∈ Θ}.
- The i.i.d. assumption: x_1, x_2, ..., x_n are independent and identically distributed according to the same probability distribution.
- Likelihood of θ given the data (under the i.i.d. assumption): ∏_{i=1}^n p(x_i; θ), the probability mass (or density) of the data, as given by P_θ.

Maximum likelihood estimator
The maximum likelihood estimator (MLE) for the model P is

    θ_ML := arg max_{θ ∈ Θ} ∏_{i=1}^n p(x_i; θ),

i.e., the parameter θ ∈ Θ whose likelihood is highest given the data.
Logarithm trick

Recall: logarithms turn products into sums,

    log(∏_{i=1}^n f_i) = ∑_{i=1}^n log(f_i).

Logarithms and maxima
The logarithm is monotonically increasing on R_++. Consequence: applying log does not change the location of a maximum or minimum:

    max_y log(g(y)) ≠ max_y g(y)            (the value changes, in general)
    arg max_y log(g(y)) = arg max_y g(y)    (the location does not change)
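This can be checked directly on the Bernoulli likelihood from earlier: over a grid, g and log g peak at the same location but with different values. A minimal sketch:

```python
import math

# The log changes the maximum value but not its location: check on the
# Bernoulli likelihood g(p) = p^5 (1-p)^5 over a grid.

def g(p):
    return p**5 * (1 - p)**5

grid = [i / 1000 for i in range(1, 1000)]  # avoid log(0) at the endpoints

argmax_g = max(grid, key=g)
argmax_log_g = max(grid, key=lambda p: math.log(g(p)))

print(argmax_g, argmax_log_g)    # same location: 0.5 and 0.5
print(g(0.5), math.log(g(0.5)))  # different values
```

Beyond convenience, working with log g also avoids floating-point underflow when multiplying many small probabilities.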
MLE: maximality criterion

Likelihood and logarithm trick

    θ_ML = arg max_{θ ∈ Θ} ∏_{i=1}^n p(x_i; θ)
         = arg max_{θ ∈ Θ} log(∏_{i=1}^n p(x_i; θ))
         = arg max_{θ ∈ Θ} ∑_{i=1}^n log p(x_i; θ).

Maximality criterion
Assuming θ is unconstrained, the log-likelihood maximizer must satisfy

    0 = ∑_{i=1}^n ∇_θ log p(x_i; θ).

For some models, we can analytically find the unique solution θ ∈ Θ.
Gaussian distribution

Gaussian density in one dimension (X = R):

    p(x; μ, σ) := (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)),

where μ is the mean, σ² is the variance, and (x − μ)/σ is the deviation from the mean in units of σ.

Gaussian density in d dimensions (X = R^d)
In the density function, the quadratic function

    (x − μ)² / (2σ²) = (1/2)(x − μ)(σ²)⁻¹(x − μ)

is replaced by a multivariate quadratic form:

    p(x; μ, Σ) := (1 / √((2π)^d det(Σ))) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ)).
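A useful consistency check on the multivariate formula: when Σ is diagonal, det(Σ) is the product of the variances and the quadratic form decouples, so the joint density must factor into a product of one-dimensional densities. A sketch with made-up numbers:

```python
import math

# 1-d and d-dimensional Gaussian densities as on the slide. For a diagonal
# covariance the multivariate density should equal the product of 1-d densities.

def gauss1d(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def gauss_diag(x, mu, sigmas):
    """d-dim Gaussian with covariance diag(sigmas[i]^2):
    det(Sigma) = prod_i sigma_i^2 and the quadratic form decouples."""
    d = len(x)
    det = 1.0
    quad = 0.0
    for xi, mi, si in zip(x, mu, sigmas):
        det *= si**2
        quad += (xi - mi)**2 / si**2
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi)**d * det)

x, mu, sig = [1.0, -0.5], [0.0, 0.0], [1.0, 2.0]
p_joint = gauss_diag(x, mu, sig)
p_prod = gauss1d(x[0], mu[0], sig[0]) * gauss1d(x[1], mu[1], sig[1])
print(p_joint, p_prod)  # equal (up to rounding)
```

The general (non-diagonal) case needs a matrix inverse and determinant, which is where a linear-algebra library would come in; the diagonal case is enough to exercise the formula.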
Gaussian MLE

Model: multivariate Gaussians with fixed covariance
The model P is the set of all Gaussian densities on R^d with a fixed covariance matrix Σ:

    P = {g(·; μ, Σ) : μ ∈ R^d},

where g is the Gaussian density function. The parameter space is Θ = R^d.

MLE equation
Solve the following equation (from the maximality criterion) for μ:

    ∑_{i=1}^n ∇_μ log g(x_i; μ, Σ) = 0.
Gaussian MLE (derivation)

    0 = ∑_{i=1}^n ∇_μ [ log( (1 / √((2π)^d det(Σ))) exp(−(1/2)(x_i − μ)ᵀ Σ⁻¹ (x_i − μ)) ) ]
      = ∑_{i=1}^n ∇_μ [ log(1 / √((2π)^d det(Σ))) − (1/2)(x_i − μ)ᵀ Σ⁻¹ (x_i − μ) ]
      = −∑_{i=1}^n ∇_μ [ (1/2)(x_i − μ)ᵀ Σ⁻¹ (x_i − μ) ]
      = ∑_{i=1}^n Σ⁻¹ (x_i − μ).

Multiplying both sides by Σ gives

    0 = ∑_{i=1}^n (x_i − μ)   ⟹   μ = (1/n) ∑_{i=1}^n x_i.

Conclusion
The maximum likelihood estimator of the Gaussian mean parameter is

    μ_ML := (1/n) ∑_{i=1}^n x_i.
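The conclusion is easy to test empirically: draw i.i.d. Gaussian data with a known mean and check that the sample average recovers it. A sketch with simulated data (the true mean and sample size are made up for illustration):

```python
import random

# The Gaussian mean MLE is the sample average, regardless of the (fixed)
# covariance. Quick empirical check in 2-d with identity covariance.

random.seed(0)
true_mu = [2.0, -1.0]
n = 10000

# Draw n i.i.d. points from N(true_mu, I), coordinate by coordinate.
xs = [[random.gauss(m, 1.0) for m in true_mu] for _ in range(n)]

mu_ml = [sum(x[j] for x in xs) / n for j in range(2)]
print(mu_ml)  # close to [2.0, -1.0]
```

With n = 10000 samples the estimate is typically within about 0.01 of the true mean per coordinate (the standard error is 1/√n).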
Example: Gaussian with unknown covariance

Model: multivariate Gaussians
The model P is now

    P = {g(·; μ, Σ) : μ ∈ R^d, Σ ∈ S^{d×d}_++},

where S^{d×d}_++ is the set of symmetric positive definite d × d matrices. The parameter space is Θ = R^d × S^{d×d}_++.

ML approach
Since we have just seen that the ML estimator of μ does not depend on Σ, we can compute μ_ML first. We then estimate Σ using the criterion

    ∑_{i=1}^n ∇_Σ log g(x_i; μ_ML, Σ) = 0.

Solution
The ML estimator of Σ is

    Σ_ML := (1/n) ∑_{i=1}^n (x_i − μ_ML)(x_i − μ_ML)ᵀ.
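Note that Σ_ML divides by n, not by n − 1 as the unbiased sample covariance does. A minimal sketch of the two-step recipe (μ_ML first, then Σ_ML) on small made-up 2-d data:

```python
# Two-step Gaussian MLE: first the mean, then the (1/n-normalized) covariance.
# Illustrative data, not from the lecture.

xs = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [0.0, 2.0]]
n, d = len(xs), len(xs[0])

# Step 1: mu_ML is the sample average.
mu_ml = [sum(x[j] for x in xs) / n for j in range(d)]

# Step 2: Sigma_ML[j][k] = (1/n) * sum_i (x_i[j] - mu[j]) * (x_i[k] - mu[k]).
sigma_ml = [[sum((x[j] - mu_ml[j]) * (x[k] - mu_ml[k]) for x in xs) / n
             for k in range(d)] for j in range(d)]

print(mu_ml)     # [1.5, 2.0]
print(sigma_ml)  # symmetric 2x2 matrix
```

For large n the 1/n and 1/(n − 1) estimates barely differ, but for small samples the ML estimate is biased toward smaller covariances.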
Recap

- Maximum likelihood estimation is a principled way to find the best distribution in a model: choose the distribution that most likely generated the data.
- The MLE for the Bernoulli and Gaussian models can be computed in closed form.
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationGaussian Models
Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationReview of probabilities
CS 1675 Introduction to Machine Learning Lecture 5 Density estimation Milos Hauskrecht milos@pitt.edu 5329 Sennott Square Review of probabilities 1 robability theory Studies and describes random processes
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationPattern Recognition. Parameter Estimation of Probability Density Functions
Pattern Recognition Parameter Estimation of Probability Density Functions Classification Problem (Review) The classification problem is to assign an arbitrary feature vector x F to one of c classes. The
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Linear Classifiers: predictions Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due Friday of next
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationLoss Functions, Decision Theory, and Linear Models
Loss Functions, Decision Theory, and Linear Models CMSC 678 UMBC January 31 st, 2018 Some slides adapted from Hamed Pirsiavash Logistics Recap Piazza (ask & answer questions): https://piazza.com/umbc/spring2018/cmsc678
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationLatent Variable Models and EM Algorithm
SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationMachine Learning Lecture 1
Many slides adapted from B. Schiele Machine Learning Lecture 1 Introduction 18.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Organization Lecturer Prof.
More informationCollaborative topic models: motivations cont
Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationCS 6140: Machine Learning Spring What We Learned Last Week 2/26/16
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Sign
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationLECTURE NOTE #3 PROF. ALAN YUILLE
LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationMaximum likelihood estimation
Maximum likelihood estimation Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Maximum likelihood estimation 1/26 Outline 1 Statistical concepts 2 A short review of convex analysis and optimization
More informationDeep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i
More information