Introduction to Machine Learning

1 Introduction to Machine Learning: Generative Models
Varun Chandola
Computer Science & Engineering, State University of New York at Buffalo, Buffalo, NY, USA
CSE 474/574

2 Outline
- Generative Models for Discrete Data
- Bayesian Concept Learning: Likelihood, Adding a Prior, Posterior, Posterior Predictive Distribution
- Steps for Learning a Generative Model: Incorporating Prior, Beta Distribution, Conjugate Priors, Estimating Posterior, Using Predictive Distribution, Need for Prior, Need for Bayesian Averaging
- Learning Gaussian Models: Estimating Parameters, Estimating Posterior

3 Generative Models
Let us go back to our tumor example.
- X represents the data, with multiple discrete attributes. Is X a discrete or continuous random variable?
- Y represents the class (benign or malignant).
- Most probable class: $P(Y = c \mid X = x, \theta) \propto P(X = x \mid Y = c, \theta)\, P(Y = c \mid \theta)$
- $P(X = x \mid Y = c, \theta) = p(x \mid y = c, \theta)$ is the class-conditional density: how is the data distributed for each class?

4 Bayesian Concept Learning
- A concept assigns binary labels to examples.
- Features are modeled as a random variable X; the class is modeled as a random variable Y.
- We want to find: $P(Y = c \mid X = x)$

5 Concept Learning on the Number Line
- I give you a set of numbers (training set D) belonging to a concept.
- Choose the most likely hypothesis (concept).
- Assume the numbers are between 1 and 100.
- Example hypothesis space (H): all powers of 2, all powers of 4, all even numbers, all prime numbers, numbers close to a fixed number (say 12).
- Socrative game: go to http://b.socrative.com and enter class ID UBML18.

6 Ready?
Hypothesis space (H):
1. Even numbers
2. Odd numbers
3. Squares
4. Powers of 2
5. Powers of 4
6. Powers of …
7. Multiples of 5
8. Multiples of …
9. Numbers within 20 ± …
10. All numbers between 1 and 100
D = {}

7 Ready?
Same hypothesis space as above; D = {16}

8 Ready?
Same hypothesis space as above; D = {60}

9 Ready?
Same hypothesis space as above; D = {16, 19, 15, 20, 18}

10 Ready?
Same hypothesis space as above; D = {16, 4, 64, 32}

11 Computing Likelihood
- Why choose the powers-of-2 concept over the even-numbers concept for D = {16, 4, 64, 32}?
- Avoid suspicious coincidences: choose the concept with higher likelihood.
- What is the likelihood of the above D being generated by the powers-of-2 concept? By the even-numbers concept?

12 Likelihood
- Why choose one hypothesis over another? Avoid suspicious coincidences: choose the concept with higher likelihood.
- Likelihood: $p(D \mid h) = \prod_{x \in D} p(x \mid h)$
- Log-likelihood: $\log p(D \mid h) = \sum_{x \in D} \log p(x \mid h)$
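The size principle behind these likelihoods can be checked with a short sketch. This is a minimal illustration, assuming each hypothesis has a finite extension inside {1, ..., 100} and that examples are drawn uniformly from that extension, so $p(D \mid h) = (1/|h|)^N$ when D is consistent with h and 0 otherwise; the two extensions below are my own encodings of the "powers of 2" and "even numbers" concepts.

    # Size-principle likelihood: p(D | h) = (1 / |h|)^N if all of D lies in h, else 0.
    def likelihood(D, ext):
        """Likelihood of D under uniform sampling from the extension ext of a hypothesis."""
        if not all(x in ext for x in D):
            return 0.0
        return (1.0 / len(ext)) ** len(D)

    powers_of_two = {2 ** k for k in range(1, 7)}               # {2, 4, 8, 16, 32, 64}
    even_numbers = {n for n in range(1, 101) if n % 2 == 0}     # 50 numbers

    D = [16, 4, 64, 32]
    print(likelihood(D, powers_of_two))   # (1/6)^4  ~ 7.7e-4
    print(likelihood(D, even_numbers))    # (1/50)^4 = 1.6e-7

Even numbers can also explain D, but only through a coincidence that the likelihood penalizes heavily.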

13 Bayesian Concept Learning
Same hypothesis space as above; D = {16, 4, 64, 32}

14 Adding a Prior
- Inside information about the hypotheses: some hypotheses are more likely a priori.
- The prior may not favor the right hypothesis (the prior can be wrong).

15 Posterior
- Revised estimate for h after observing the evidence D, combined with the prior.
- Posterior ∝ Likelihood × Prior:
  $p(h \mid D) = \dfrac{p(D \mid h)\, p(h)}{\sum_{h' \in H} p(D \mid h')\, p(h')}$
- [Table: prior, likelihood, and posterior for each hypothesis in H.]

16 Finding the Best Hypothesis
- Maximum a priori estimate: $\hat{h}_{prior} = \arg\max_h p(h)$
- Maximum likelihood estimate (MLE): $\hat{h}_{MLE} = \arg\max_h p(D \mid h) = \arg\max_h \log p(D \mid h) = \arg\max_h \sum_{x \in D} \log p(x \mid h)$
- Maximum a posteriori (MAP) estimate: $\hat{h}_{MAP} = \arg\max_h p(D \mid h)\, p(h) = \arg\max_h \left(\log p(D \mid h) + \log p(h)\right)$

17 MAP and MLE
- $\hat{h}_{prior}$: most likely hypothesis based on the prior: $\hat{h}_{prior} = \arg\max_h \log p(h)$
- $\hat{h}_{MLE}$: most likely hypothesis based on the evidence: $\hat{h}_{MLE} = \arg\max_h \log p(D \mid h)$
- $\hat{h}_{MAP}$: most likely hypothesis based on the posterior: $\hat{h}_{MAP} = \arg\max_h \left(\log p(D \mid h) + \log p(h)\right)$
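As a toy illustration of the three estimates, the sketch below scores a handful of hypotheses on D = {16, 4, 64, 32}. The extensions and especially the prior weights are hypothetical choices made for this example, not the values used in the lecture.

    # Toy hypothesis space: name -> extension (subset of {1, ..., 100}); illustrative only.
    hypotheses = {
        "even":        {n for n in range(1, 101) if n % 2 == 0},
        "odd":         {n for n in range(1, 101) if n % 2 == 1},
        "powers_of_2": {2 ** k for k in range(1, 7)},
        "powers_of_4": {4 ** k for k in range(1, 4)},
        "mult_of_5":   {n for n in range(5, 101, 5)},
    }
    # Hypothetical prior: common, "natural" concepts get more mass.
    prior = {"even": 0.35, "odd": 0.35, "powers_of_2": 0.1, "powers_of_4": 0.1, "mult_of_5": 0.1}

    def likelihood(D, ext):
        return (1.0 / len(ext)) ** len(D) if all(x in ext for x in D) else 0.0

    D = [16, 4, 64, 32]
    h_prior = max(hypotheses, key=lambda h: prior[h])
    h_mle = max(hypotheses, key=lambda h: likelihood(D, hypotheses[h]))
    h_map = max(hypotheses, key=lambda h: likelihood(D, hypotheses[h]) * prior[h])
    print(h_prior, h_mle, h_map)   # even  powers_of_2  powers_of_2

With this prior the MAP and ML estimates agree here; with fewer examples or a much more skewed prior the two can differ.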

18 Interesting Properties
- As data increases, the MAP estimate converges towards the MLE. Why? The likelihood term grows with N while the prior term stays fixed.
- MAP/MLE are consistent estimators: if the true concept is in H, the MAP/ML estimates converge to it; if $c \notin H$, they converge to the hypothesis in H closest to the truth.

19 From Prior to Posterior via Likelihood
[Figure: prior distribution and posterior distribution over the hypotheses.]
Objective: revise the prior distribution over the hypotheses after observing data (evidence).

20 Posterior Predictive Distribution
- New input $\tilde{x}$: what is the probability that $\tilde{x}$ is also generated by the same concept as D, i.e., $P(Y = c \mid X = \tilde{x}, D)$?
- Option 0: treat $h_{prior}$ as the true concept: $P(Y = c \mid X = \tilde{x}, D) = P(X = \tilde{x} \mid c = h_{prior})$
- Option 1: treat $h_{MLE}$ as the true concept: $P(Y = c \mid X = \tilde{x}, D) = P(X = \tilde{x} \mid c = h_{MLE})$
- Option 2: treat $h_{MAP}$ as the true concept: $P(Y = c \mid X = \tilde{x}, D) = P(X = \tilde{x} \mid c = h_{MAP})$
- Option 3: Bayesian averaging: $P(Y = c \mid X = \tilde{x}, D) = \sum_h P(X = \tilde{x} \mid c = h)\, p(h \mid D)$
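Option 3 can be made concrete with a small sketch. The two-hypothesis space and its uniform prior below are hypothetical, and $P(X = \tilde{x} \mid c = h)$ is taken to be $1/|h|$ when $\tilde{x}$ lies in h's extension (the same uniform-sampling model used for the likelihood), so this illustrates the averaging formula rather than reproducing the lecture's numbers.

    # Bayesian model averaging: p(x_new | D) = sum_h p(x_new | h) p(h | D).
    hypotheses = {
        "even":        {n for n in range(1, 101) if n % 2 == 0},
        "powers_of_2": {2 ** k for k in range(1, 7)},
    }
    prior = {"even": 0.5, "powers_of_2": 0.5}    # hypothetical uniform prior

    def likelihood(D, ext):
        return (1.0 / len(ext)) ** len(D) if all(x in ext for x in D) else 0.0

    def posterior(D):
        unnorm = {h: likelihood(D, ext) * prior[h] for h, ext in hypotheses.items()}
        Z = sum(unnorm.values())
        return {h: v / Z for h, v in unnorm.items()}

    def predictive(x_new, D):
        post = posterior(D)
        return sum((1.0 / len(ext) if x_new in ext else 0.0) * post[h]
                   for h, ext in hypotheses.items())

    D = [16, 4, 64, 32]
    print(predictive(8, D))    # ~0.167: powers of 2 dominates the posterior and contains 8
    print(predictive(10, D))   # ~4e-6: only "even" contains 10, and its posterior mass is tiny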

21 Steps for Learning a Generative Model
- Example: D is a sequence of N binary values (0s and 1s), e.g., coin tosses.
- What is the best distribution to describe D? What is the probability of observing a head in the future?
- Step 1: choose the form of the model.
  - Hypothesis space: all possible distributions. Too complicated!
  - Revised hypothesis space: all Bernoulli distributions ($X \sim \mathrm{Ber}(\theta)$, $0 \le \theta \le 1$); θ is the hypothesis.
  - Still infinite (θ can take infinitely many values).

22 Compute Likelihood
- Likelihood of D: $p(D \mid \theta) = \theta^{N_1}(1-\theta)^{N_0}$
- Maximum likelihood estimate: $\hat{\theta}_{MLE} = \arg\max_\theta p(D \mid \theta) = \arg\max_\theta \theta^{N_1}(1-\theta)^{N_0} = \dfrac{N_1}{N_0 + N_1}$

23 Compute Likelihood (continued)
- We can stop here (the MLE approach).
- Probability of getting a head next: $p(x = 1 \mid D) = \hat{\theta}_{MLE}$
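A one-liner makes the MLE concrete; the toss sequence below is made up for illustration.

    # Bernoulli MLE: theta_MLE = N1 / (N0 + N1) from a sequence of coin tosses (1 = heads).
    D = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical data
    N1 = sum(D)
    N0 = len(D) - N1
    theta_mle = N1 / (N0 + N1)
    print(theta_mle)   # 0.7; under the MLE plug-in, p(x = 1 | D) = 0.7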

24 Incorporating Prior
- The prior encodes our prior belief about θ. How do we set a Bayesian prior?
  1. A point estimate: $\theta_{prior} = \ldots$
  2. A probability distribution over θ (treating θ as a random variable). Which one?
- For a Bernoulli distribution, $0 \le \theta \le 1$: use a Beta distribution for $p(\theta)$.

25 Beta Distribution as Prior
- A continuous random variable defined between 0 and 1: $\theta \sim \mathrm{Beta}(\theta \mid a, b)$
  $p(\theta \mid a, b) = \dfrac{1}{B(a, b)}\, \theta^{a-1}(1-\theta)^{b-1}$
- a and b are the (hyper-)parameters of the distribution; they control the shape of the pdf.
- $B(a, b)$ is the beta function: $B(a, b) = \dfrac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$, where $\Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\, du$; if x is an integer, $\Gamma(x) = (x - 1)!$.
- We can stop here as well (the prior approach): $p(x = 1) = \theta_{prior}$
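The sketch below just evaluates the Beta pdf for a few hyperparameter settings to see how a and b control its shape; it assumes SciPy is available, and the (a, b) pairs are arbitrary examples.

    import numpy as np
    from scipy.stats import beta   # Beta(theta | a, b) density

    theta = np.linspace(0.01, 0.99, 5)
    for a, b in [(1, 1), (2, 2), (5, 2)]:   # uniform, symmetric around 0.5, skewed toward heads
        print((a, b), np.round(beta.pdf(theta, a, b), 3))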

26 Conjugate Priors
- Another reason to choose the Beta distribution: Posterior ∝ Likelihood × Prior.
- $p(D \mid \theta) = \theta^{N_1}(1-\theta)^{N_0}$, $\quad p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}$
- $p(\theta \mid D) \propto \theta^{N_1}(1-\theta)^{N_0}\, \theta^{a-1}(1-\theta)^{b-1} = \theta^{N_1 + a - 1}(1-\theta)^{N_0 + b - 1}$
- The posterior has the same form as the prior: the Beta distribution is a conjugate prior for the Bernoulli/Binomial distribution.

27 Estimating Posterior
- Posterior: $p(\theta \mid D) \propto \theta^{N_1 + a - 1}(1-\theta)^{N_0 + b - 1} = \mathrm{Beta}(\theta \mid N_1 + a, N_0 + b)$
- We start with the belief $E[\theta] = \dfrac{a}{a + b}$.
- After observing N trials with $N_1$ heads and $N_0$ tails, we update our belief to $E[\theta \mid D] = \dfrac{a + N_1}{a + b + N}$.
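Because the update only adds counts to the hyperparameters, it is a few lines of arithmetic. The prior pseudo-counts and data counts below are hypothetical; the snippet also prints the posterior variance discussed a few slides later.

    # Conjugate Beta-Bernoulli update: Beta(a, b) prior + (N1 heads, N0 tails)
    # gives the posterior Beta(a + N1, b + N0).
    a, b = 2, 2          # hypothetical prior pseudo-counts
    N1, N0 = 7, 3        # hypothetical observed counts

    a_post, b_post = a + N1, b + N0
    post_mean = a_post / (a_post + b_post)   # (a + N1) / (a + b + N)
    post_var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    print(post_mean, post_var)               # 9/14 ~ 0.643, variance ~ 0.015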

28 Using the Posterior
- We know that the posterior over θ is a Beta distribution.
- MAP estimate: $\hat{\theta}_{MAP} = \arg\max_\theta \mathrm{Beta}(\theta \mid a + N_1, b + N_0) = \dfrac{a + N_1 - 1}{a + b + N - 2}$
- What happens if a = b = 1? The MAP estimate reduces to the MLE.
- We can stop here as well (the MAP approach). Probability of getting a head next: $p(x = 1 \mid D) = \hat{\theta}_{MAP}$

29 True Bayesian Approach
- All values of θ are possible. The prediction for an unknown input $\tilde{x}$ is given by Bayesian averaging:
  $p(\tilde{x} = 1 \mid D) = \int p(\tilde{x} = 1 \mid \theta)\, p(\theta \mid D)\, d\theta = \int \theta\, \mathrm{Beta}(\theta \mid a + N_1, b + N_0)\, d\theta = E[\theta \mid D] = \dfrac{a + N_1}{a + b + N}$
- This is the same as using $E[\theta \mid D]$ as a point estimate for θ.
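To see that the averaging integral really collapses to the posterior mean, one can integrate numerically and compare with the closed form; SciPy is assumed, and the counts are the same hypothetical ones as above.

    from scipy.integrate import quad
    from scipy.stats import beta

    a, b, N1, N0 = 2, 2, 7, 3
    integral, _ = quad(lambda t: t * beta.pdf(t, a + N1, b + N0), 0.0, 1.0)
    closed_form = (a + N1) / (a + b + N1 + N0)
    print(integral, closed_form)   # both ~ 0.642857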

30 The Black Swan Paradox
- Why use a prior? Consider D = {tails, tails, tails}, so $N_1 = 0$, $N = 3$.
- $\hat{\theta}_{MLE} = 0$, hence $p(x = 1 \mid D) = 0$: we predict that heads will never be observed. This is the black swan paradox.
- How does the Bayesian approach help? $p(x = 1 \mid D) = \dfrac{a}{a + b + 3} > 0$
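A short comparison shows the paradox and the Bayesian fix; the uniform Beta(1, 1) prior is one illustrative choice.

    D = [0, 0, 0]                 # tails, tails, tails
    N1, N = sum(D), len(D)

    theta_mle = N1 / N                        # 0.0 -> MLE plug-in says heads is impossible
    a, b = 1, 1                               # hypothetical uniform prior
    p_heads_bayes = (a + N1) / (a + b + N)    # 1/5 = 0.2, still strictly positive
    print(theta_mle, p_heads_bayes)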

31 Why is the MAP Estimate Insufficient?
- MAP is only one point of the posterior: the value of θ at which the posterior probability is maximum. Is that enough?
- What about the posterior variance of θ? $\mathrm{var}[\theta \mid D] = \dfrac{(a + N_1)(b + N_0)}{(a + b + N)^2\,(a + b + N + 1)}$
- If the variance is high, then $\hat{\theta}_{MAP}$ is not trustworthy; Bayesian averaging helps in this case.

32 Multivariate Gaussian
pdf for an MVN in d dimensions:
$N(x \mid \mu, \Sigma) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[-\dfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right]$
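The density can be evaluated directly from this formula; the sketch below does so and cross-checks against scipy.stats.multivariate_normal. The mean, covariance, and query point are arbitrary examples.

    import numpy as np
    from scipy.stats import multivariate_normal

    def mvn_pdf(x, mu, Sigma):
        """N(x | mu, Sigma) evaluated from the closed-form density."""
        d = len(mu)
        diff = x - mu
        norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5)
        return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

    mu = np.array([0.0, 1.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    x = np.array([0.5, 0.5])
    print(mvn_pdf(x, mu, Sigma), multivariate_normal.pdf(x, mean=mu, cov=Sigma))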

33 Estimating Parameters of an MVN
Problem statement: given a set D of N independent and identically distributed (iid) samples, learn the parameters (µ, Σ) of the Gaussian distribution that generated D.
MLE approach: maximize the log-likelihood. Result:
$\mu_{MLE} = \bar{x} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$
$\Sigma_{MLE} = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^\top$
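The MLE is just the sample mean and the 1/N-scaled scatter matrix; the sketch draws synthetic data from a known Gaussian so the estimates can be eyeballed against the truth (the generating parameters are arbitrary).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[1.0, 0.5], [0.5, 2.0]], size=500)

    mu_mle = X.mean(axis=0)
    centered = X - mu_mle
    Sigma_mle = (centered.T @ centered) / X.shape[0]   # note 1/N, not 1/(N-1)
    print(mu_mle)
    print(Sigma_mle)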

34 Estimating Posterior
- We need a posterior for both µ and Σ: p(µ) and p(Σ).
- What distribution do we need for µ? A Gaussian distribution: $p(\mu) = N(\mu \mid m_0, V_0)$
- What distribution do we need for Σ? An inverse-Wishart distribution:
  $p(\Sigma) = IW(\Sigma \mid S, \nu) = \dfrac{1}{Z_{IW}}\, |\Sigma|^{-(\nu + d + 1)/2} \exp\left(-\dfrac{1}{2}\, \mathrm{tr}(S^{-1}\Sigma^{-1})\right)$,
  where $Z_{IW} = |S|^{\nu/2}\, 2^{\nu d/2}\, \Gamma_D(\nu/2)$.

35 Calculating Posterior
- Posterior for µ (given Σ) is also an MVN: $p(\mu \mid D, \Sigma) = N(\mu \mid m_N, V_N)$, with
  $V_N^{-1} = V_0^{-1} + N\Sigma^{-1}$, $\quad m_N = V_N\left(\Sigma^{-1}(N\bar{x}) + V_0^{-1} m_0\right)$
- Posterior for Σ (given µ) is also an inverse-Wishart: $p(\Sigma \mid D, \mu) = IW(\Sigma \mid S_N, \nu_N)$, with
  $\nu_N = \nu_0 + N$, $\quad S_N^{-1} = S_0^{-1} + S_\mu$, where $S_\mu = \sum_{i=1}^{N}(x_i - \mu)(x_i - \mu)^\top$.
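A sketch of the µ update (with Σ treated as known) is below; the prior mean m0, prior covariance V0, and the data-generating parameters are hypothetical, and the Σ posterior is omitted.

    # Conjugate Gaussian update for mu with Sigma known:
    #   V_N^{-1} = V_0^{-1} + N * Sigma^{-1},   m_N = V_N (Sigma^{-1} (N xbar) + V_0^{-1} m_0)
    import numpy as np

    rng = np.random.default_rng(1)
    Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])                  # assumed known
    X = rng.multivariate_normal([2.0, -1.0], Sigma, size=50)    # observed data
    N, xbar = X.shape[0], X.mean(axis=0)

    m0 = np.zeros(2)            # hypothetical prior mean
    V0 = 10.0 * np.eye(2)       # broad hypothetical prior covariance

    VN = np.linalg.inv(np.linalg.inv(V0) + N * np.linalg.inv(Sigma))
    mN = VN @ (np.linalg.inv(Sigma) @ (N * xbar) + np.linalg.inv(V0) @ m0)
    print(mN)                   # pulled toward xbar as N grows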

36 References
