Generative modeling of data. Instructor: Taylor Berg-Kirkpatrick Slides: Sanjoy Dasgupta

Size: px
Start display at page:

Download "Generative modeling of data. Instructor: Taylor Berg-Kirkpatrick Slides: Sanjoy Dasgupta"

Transcription

1 Generative modeling of data Instructor: Taylor BergKirkpatrick Slides: Sanjoy Dasgupta

2 Parametric versus nonparametric classifiers Nearest neighbor classification: size of classifier size of data set Nonparametric: Model complexity (# of parameters ) not fixed.

3 Parametric versus nonparametric classifiers Nearest neighbor classification: size of classifier size of data set Nonparametric: Model complexity (# of parameters ) not fixed. Good: Can fit any boundary, given enough data. Bad: Typically need a lot of data.

4 Parametric versus nonparametric classifiers Nearest neighbor classification: size of classifier size of data set Nonparametric: Model complexity (# of parameters ) not fixed. Good: Can fit any boundary, given enough data. Bad: Typically need a lot of data. Parametric classifiers: fixed number of parameters.

5 Parametric classifiers Fixed number of parameters: e.g. lines.

6 Parametric classifiers Fixed number of parameters: e.g. lines. Good: Can get a reasonable model with limited data Bad: Less flexibility in modeling

7 Parametric classifiers Fixed number of parameters: e.g. lines. Good: Can get a reasonable model with limited data Bad: Less flexibility in modeling Basic principle: pick a good approximation.

8 The generative approach to classification Two ways to classify: Generative: model the individual classes. Discriminative: model the decision boundary between the classes.

9 Recall: Bayes rule Pr(A B) probability event A occurs given that we have observed B to occur Pr(A B) Pr(A and B) Pr(B) Pr(B A) Pr(B) Pr(A)

10 Recall: Bayes rule Pr(A B) probability event A occurs given that we have observed B to occur Pr(A B) Pr(A and B) Pr(B) Pr(B A) Pr(B) Pr(A) Example: Coin 1 has heads probability 1/6 and Coin 2 has heads probability 1/3. We close our eyes, pick a coin at random and flip it. It comes out heads. What is the probability it is Coin 1?

11 Generative models Example: data space X R, classes/labels Y {1, 2, 3} Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x

12 Generative models Example: data space X R, classes/labels Y {1, 2, 3} Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x Capture the joint distribution over (x, y) via: the probabilities of the classes, π j Pr(y j) the distribution of the individual classes, P 1 (x), P 2 (x), P 3 (x)

13 Generative models Example: data space X R, classes/labels Y {1, 2, 3} Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x Capture the joint distribution over (x, y) via: the probabilities of the classes, π j Pr(y j) the distribution of the individual classes, P 1 (x), P 2 (x), P 3 (x) Overall distribution: Pr(x, y) π y P y (x).

14 Optimal predictions Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x

15 Optimal predictions Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x For any data point x X and any candidate label j, Pr(y j x) Pr(y j)pr(x y j) Pr(x) π j P j (x) k i1 π ip i (x)

16 Optimal predictions Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x For any data point x X and any candidate label j, Pr(y j x) Pr(y j)pr(x y j) Pr(x) π j P j (x) k i1 π ip i (x) Optimal prediction: the class j with largest π j P j (x).

17 A classification problem You have a bottle of wine whose label is missing. Which winery is it from, 1, 2, or 3?

18 A classification problem You have a bottle of wine whose label is missing. Which winery is it from, 1, 2, or 3? Solve this problem using visual and chemical properties of the wine.

19 The data set Training set obtained from 130 bottles Winery 1: 43 bottles Winery 2: 51 bottles Winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline

20 The data set Training set obtained from 130 bottles Winery 1: 43 bottles Winery 2: 51 bottles Winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline Also, a separate test set of 48 labeled points.

21 Recall: the generative approach Pr(x) P 3 (x) P 1 (x) P 2 (x) π 1 10% π 2 50% π 3 40% x For any data point x X and any candidate label j, Pr(y j x) Pr(y j)pr(x y j) Pr(x) π j P j (x) k i1 π ip i (x) Optimal prediction: the class j with largest π j P j (x).

22 Fitting a generative model Training set of 130 bottles: Winery 1: 43 bottles, winery 2: 51 bottles, winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline

23 Fitting a generative model Training set of 130 bottles: Winery 1: 43 bottles, winery 2: 51 bottles, winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline Class weights: π 1 43/ , π 2 51/ , π 3 36/

24 Fitting a generative model Training set of 130 bottles: Winery 1: 43 bottles, winery 2: 51 bottles, winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline Class weights: π 1 43/ , π 2 51/ , π 3 36/ Need distributions P 1, P 2, P 3, one per class. Base these on a single feature: Alcohol.

25 The univariate Gaussian The Gaussian N(µ, σ 2 ) has mean µ, variance σ 2, and density function ) 1 (x µ)2 p(x) (2πσ 2 exp ( ) 1/2 2σ 2.

26 The distribution for winery 1 Single feature: Alcohol

27 The distribution for winery 1 Single feature: Alcohol Mean µ 13.72, Standard deviation σ 0.44 (variance 0.20)

28 All three wineries π , P 1 N(13.7, 0.20) π , P 2 N(12.3, 0.28) π , P 3 N(13.2, 0.27)

29 All three wineries π , P 1 N(13.7, 0.20) π , P 2 N(12.3, 0.28) π , P 3 N(13.2, 0.27) To classify x: Pick the j with highest π j P j (x)

30 All three wineries π , P 1 N(13.7, 0.20) π , P 2 N(12.3, 0.28) π , P 3 N(13.2, 0.27) To classify x: Pick the j with highest π j P j (x) Test error: 14/48 29%

31 All three wineries π , P 1 N(13.7, 0.20) π , P 2 N(12.3, 0.28) π , P 3 N(13.2, 0.27) To classify x: Pick the j with highest π j P j (x) Test error: 14/48 29% What if we use two features?

32 The data set, again Training set obtained from 130 bottles Winery 1: 43 bottles Winery 2: 51 bottles Winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline Also, a separate test set of 48 labeled points.

33 The data set, again Training set obtained from 130 bottles Winery 1: 43 bottles Winery 2: 51 bottles Winery 3: 36 bottles For each bottle, 13 features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline Also, a separate test set of 48 labeled points. This time: Alcohol and Flavanoids.

34 Why it helps to add features Better separation between the classes!

35 Why it helps to add features Better separation between the classes!

36 Why it helps to add features Better separation between the classes!

37 Why it helps to add features Better separation between the classes! Error rate drops from 29% to 8%.

38 The bivariate Gaussian

39 The bivariate Gaussian Model class 1 by a bivariate Gaussian, parametrized by: ( ) ( ) mean µ and covariance matrix Σ

40 Dependence between two random variables Suppose X 1 has mean µ 1 and X 2 has mean µ 2. Can measure dependence between them by their covariance: cov(x 1, X 2 ) E[(X 1 µ 1 )(X 2 µ 2 )] E[X 1 X 2 ] µ 1 µ 2 Maximized when X 1 X 2, in which case it is var(x 1 ). It is at most std(x 1 )std(x 2 ).

41 The bivariate (2d) Gaussian A distribution over (x 1, x 2 ) R 2, parametrized by: Mean (µ 1, µ 2 ) R 2, where µ 1 E(X 1 ) and µ 2 E(X 2 ) [ ] Σ11 Σ Covariance matrix Σ 12 where Σ 21 Σ 22 Σ 11 var(x 1 ) Σ 22 var(x 2 ) Σ 12 Σ 21 cov(x 1, X 2 )

42 The bivariate (2d) Gaussian A distribution over (x 1, x 2 ) R 2, parametrized by: Mean (µ 1, µ 2 ) R 2, where µ 1 E(X 1 ) and µ 2 E(X 2 ) [ ] Σ11 Σ Covariance matrix Σ 12 where Σ 21 Σ 22 Σ 11 var(x 1 ) Σ 22 var(x 2 ) Σ 12 Σ 21 cov(x 1, X 2 ) Density is highest at the mean, falls off in ellipsoidal contours.

43 Density of the bivariate Gaussian Mean (µ 1, µ 2 ) R 2, where µ 1 E(X 1 ) and µ 2 E(X 2 ) [ ] Σ11 Σ Covariance matrix Σ 12 Σ 21 Σ 22 ( 1 p(x 1, x 2 ) exp 1 [ ] T [ ] ) x1 µ 1 Σ 1 x1 µ 1 2π Σ 1/2 2 x 2 µ 2 x 2 µ 2

44 Bivariate Gaussian: examples In either case, the mean is (1, 1). Σ [ ] Σ [ ]

45 The decision boundary Go from 1 to 2 features: error rate goes from 29% to 8%.

46 The decision boundary Go from 1 to 2 features: error rate goes from 29% to 8%.

47 The decision boundary Go from 1 to 2 features: error rate goes from 29% to 8%. What kind of function is this? And, can we use more features?

The generative approach to classification. A classification problem. Generative models CSE 250B

The generative approach to classification. A classification problem. Generative models CSE 250B The generative approach to classification The generative approach to classification CSE 250B The learning process: Fit a probability distribution to each class, individually To classify a new point: Which

More information

Random variables, expectation, and variance

Random variables, expectation, and variance Random variables, expectation, and variance DSE 210 Random variables Roll a die. Define X = 1 if die is 3 0 otherwise Here the sample space is = {1, 2, 3, 4, 5, 6}. Roll n dice.! =1, 2 ) X =0! =3, 4, 5,

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Discriminant analysis and classification Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 Consider the examples: An online banking service

More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Lecture 4 Discriminant Analysis, k-nearest Neighbors Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se

More information

Day 5: Generative models, structured classification

Day 5: Generative models, structured classification Day 5: Generative models, structured classification Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 22 June 2018 Linear regression

More information

Two hours. Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER. 14 January :45 11:45

Two hours. Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER. 14 January :45 11:45 Two hours Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER PROBABILITY 2 14 January 2015 09:45 11:45 Answer ALL four questions in Section A (40 marks in total) and TWO of the THREE questions

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 Discriminative vs Generative Models Discriminative: Just learn a decision boundary between your

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor

More information

LDA, QDA, Naive Bayes

LDA, QDA, Naive Bayes LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Introduction to Machine Learning

Introduction to Machine Learning 1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer

More information

Stephen Scott.

Stephen Scott. 1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training

More information

The Bayes classifier

The Bayes classifier The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

Introduction to Machine Learning

Introduction to Machine Learning 1, DATA11002 Introduction to Machine Learning Lecturer: Antti Ukkonen TAs: Saska Dönges and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer,

More information

1 Probability Review. 1.1 Sample Spaces

1 Probability Review. 1.1 Sample Spaces 1 Probability Review Probability is a critical tool for modern data analysis. It arises in dealing with uncertainty, in randomized algorithms, and in Bayesian analysis. To understand any of these concepts

More information

ECE Homework Set 3

ECE Homework Set 3 ECE 450 1 Homework Set 3 0. Consider the random variables X and Y, whose values are a function of the number showing when a single die is tossed, as show below: Exp. Outcome 1 3 4 5 6 X 3 3 4 4 Y 0 1 3

More information

ECE 450 Homework #3. 1. Given the joint density function f XY (x,y) = 0.5 1<x<2, 2<y< <x<4, 2<y<3 0 else

ECE 450 Homework #3. 1. Given the joint density function f XY (x,y) = 0.5 1<x<2, 2<y< <x<4, 2<y<3 0 else ECE 450 Homework #3 0. Consider the random variables X and Y, whose values are a function of the number showing when a single die is tossed, as show below: Exp. Outcome 1 3 4 5 6 X 3 3 4 4 Y 0 1 3 4 5

More information

Probability and random variables

Probability and random variables Probability and random variables Events A simple event is the outcome of an experiment. For example, the experiment of tossing a coin twice has four possible outcomes: HH, HT, TH, TT. A compound event

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

CSC 411: Lecture 09: Naive Bayes

CSC 411: Lecture 09: Naive Bayes CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

Parsimonious Gaussian Mixture Models

Parsimonious Gaussian Mixture Models Parsimonious Gaussian Mixture Models Brendan Murphy Department of Statistics, Trinity College Dublin, Ireland. East Liguria West Liguria Umbria North Apulia Coast Sardina Inland Sardinia South Apulia Calabria

More information

Chap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University

Chap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University Chap 2. Linear Classifiers (FTH, 4.1-4.4) Yongdai Kim Seoul National University Linear methods for classification 1. Linear classifiers For simplicity, we only consider two-class classification problems

More information

Gaussian Mixture Models, Expectation Maximization

Gaussian Mixture Models, Expectation Maximization Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak

More information

Stat 5101 Notes: Algorithms (thru 2nd midterm)

Stat 5101 Notes: Algorithms (thru 2nd midterm) Stat 5101 Notes: Algorithms (thru 2nd midterm) Charles J. Geyer October 18, 2012 Contents 1 Calculating an Expectation or a Probability 2 1.1 From a PMF........................... 2 1.2 From a PDF...........................

More information

Combining eigenvalues and variation of eigenvectors for order determination

Combining eigenvalues and variation of eigenvectors for order determination Combining eigenvalues and variation of eigenvectors for order determination Wei Luo and Bing Li City University of New York and Penn State University wei.luo@baruch.cuny.edu bing@stat.psu.edu 1 1 Introduction

More information

Bayes Decision Theory

Bayes Decision Theory Bayes Decision Theory Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density 0 Minimum-Error-Rate Classification Actions are decisions on classes

More information

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis CS 3 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis AD March 11 AD ( March 11 1 / 17 Multivariate Gaussian Consider data { x i } N i=1 where xi R D and we assume they are

More information

Chapter 4 continued. Chapter 4 sections

Chapter 4 continued. Chapter 4 sections Chapter 4 sections Chapter 4 continued 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP:

More information

Probability and Statistical Decision Theory

Probability and Statistical Decision Theory Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Probability and Statistical Decision Theory Many slides attributable to: Erik Sudderth (UCI) Prof. Mike Hughes

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide

More information

Classification Methods II: Linear and Quadratic Discrimminant Analysis

Classification Methods II: Linear and Quadratic Discrimminant Analysis Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

Classification 2: Linear discriminant analysis (continued); logistic regression

Classification 2: Linear discriminant analysis (continued); logistic regression Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Bayes Decision Theory - I

Bayes Decision Theory - I Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology

More information

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Linear Classification: Probabilistic Generative Models

Linear Classification: Probabilistic Generative Models Linear Classification: Probabilistic Generative Models Sargur N. University at Buffalo, State University of New York USA 1 Linear Classification using Probabilistic Generative Models Topics 1. Overview

More information

Nonparametric Methods Lecture 5

Nonparametric Methods Lecture 5 Nonparametric Methods Lecture 5 Jason Corso SUNY at Buffalo 17 Feb. 29 J. Corso (SUNY at Buffalo) Nonparametric Methods Lecture 5 17 Feb. 29 1 / 49 Nonparametric Methods Lecture 5 Overview Previously,

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 You can t see this text! Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Probability Review - Part 2 1 /

More information

6.041/6.431 Fall 2010 Quiz 2 Solutions

6.041/6.431 Fall 2010 Quiz 2 Solutions 6.04/6.43: Probabilistic Systems Analysis (Fall 200) 6.04/6.43 Fall 200 Quiz 2 Solutions Problem. (80 points) In this problem: (i) X is a (continuous) uniform random variable on [0, 4]. (ii) Y is an exponential

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision

More information

Probability reminders

Probability reminders CS246 Winter 204 Mining Massive Data Sets Probability reminders Sammy El Ghazzal selghazz@stanfordedu Disclaimer These notes may contain typos, mistakes or confusing points Please contact the author so

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV

More information

10-701/ Machine Learning - Midterm Exam, Fall 2010

10-701/ Machine Learning - Midterm Exam, Fall 2010 10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam

More information

Nearest neighbor classification in metric spaces: universal consistency and rates of convergence

Nearest neighbor classification in metric spaces: universal consistency and rates of convergence Nearest neighbor classification in metric spaces: universal consistency and rates of convergence Sanjoy Dasgupta University of California, San Diego Nearest neighbor The primeval approach to classification.

More information

Review of Basic Probability Theory

Review of Basic Probability Theory Review of Basic Probability Theory James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 35 Review of Basic Probability Theory

More information

5. Discriminant analysis

5. Discriminant analysis 5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density

More information

Bayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI

Bayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI Bayes Classifiers CAP5610 Machine Learning Instructor: Guo-Jun QI Recap: Joint distributions Joint distribution over Input vector X = (X 1, X 2 ) X 1 =B or B (drinking beer or not) X 2 = H or H (headache

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Introduction to Machine Learning

Introduction to Machine Learning Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012

Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012 Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

More information

Machine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1

Machine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1 Machine Learning 1 Linear Classifiers Marius Kloft Humboldt University of Berlin Summer Term 2014 Machine Learning 1 Linear Classifiers 1 Recap Past lectures: Machine Learning 1 Linear Classifiers 2 Recap

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Expectation and Variance

Expectation and Variance Expectation and Variance August 22, 2017 STAT 151 Class 3 Slide 1 Outline of Topics 1 Motivation 2 Expectation - discrete 3 Transformations 4 Variance - discrete 5 Continuous variables 6 Covariance STAT

More information

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Bandits, Experts, and Games

Bandits, Experts, and Games Bandits, Experts, and Games CMSC 858G Fall 2016 University of Maryland Intro to Probability* Alex Slivkins Microsoft Research NYC * Many of the slides adopted from Ron Jin and Mohammad Hajiaghayi Outline

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

MATH Solutions to Probability Exercises

MATH Solutions to Probability Exercises MATH 5 9 MATH 5 9 Problem. Suppose we flip a fair coin once and observe either T for tails or H for heads. Let X denote the random variable that equals when we observe tails and equals when we observe

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Representation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2

Representation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2 Representation Stefano Ermon, Aditya Grover Stanford University Lecture 2 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 2 1 / 32 Learning a generative model We are given a training

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Distributions of linear combinations

Distributions of linear combinations Distributions of linear combinations CE 311S MORE THAN TWO RANDOM VARIABLES The same concepts used for two random variables can be applied to three or more random variables, but they are harder to visualize

More information

Midterm Exam 1 (Solutions)

Midterm Exam 1 (Solutions) EECS 6 Probability and Random Processes University of California, Berkeley: Spring 07 Kannan Ramchandran February 3, 07 Midterm Exam (Solutions) Last name First name SID Name of student on your left: Name

More information

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

More information

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1). Name M362K Final Exam Instructions: Show all of your work. You do not have to simplify your answers. No calculators allowed. There is a table of formulae on the last page. 1. Suppose X 1,..., X 1 are independent

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Math 180B Problem Set 3

Math 180B Problem Set 3 Math 180B Problem Set 3 Problem 1. (Exercise 3.1.2) Solution. By the definition of conditional probabilities we have Pr{X 2 = 1, X 3 = 1 X 1 = 0} = Pr{X 3 = 1 X 2 = 1, X 1 = 0} Pr{X 2 = 1 X 1 = 0} = P

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

Deep Learning for Computer Vision

Deep Learning for Computer Vision Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information