Point Estimation. Vibhav Gogate The University of Texas at Dallas


Point Estimation Vibhav Gogate The University of Texas at Dallas Some slides courtesy of Carlos Guestrin, Chris Bishop, Dan Weld and Luke Zettlemoyer.

Basics: Expectation and Variance. For a random variable X, E[X] = Σ_x x P(x) and Var(X) = E[(X − E[X])²] = E[X²] − (E[X])².

Binary Variables (1) Coin flipping: heads = 1, tails = 0. Bernoulli distribution: P(x = 1 | θ) = θ, P(x = 0 | θ) = 1 − θ, i.e. Bern(x | θ) = θ^x (1 − θ)^(1−x).

Binary Variables (2) N coin flips: Binomial distribution over the number of heads m: Bin(m | N, θ) = (N choose m) θ^m (1 − θ)^(N−m).

Your first consulting job. A billionaire in Dallas asks: He says: I have a thumbtack; if I flip it, what's the probability it will fall with the nail up? You say: Please flip it a few times. You say: The probability is P(H) = 3/5. He says: Why??? You say: Because…

Thumbtack Binomial Distribution. P(Heads) = θ, P(Tails) = 1 − θ. Flips are i.i.d.: independent events, identically distributed according to the Binomial distribution. For a sequence D with H heads and T tails: P(D | θ) = θ^H (1 − θ)^T.

Maximum Likelihood Estimation. Data: observed set D with H heads and T tails. Hypothesis: Binomial distribution. Learning: finding θ is an optimization problem. What's the objective function? MLE: choose θ to maximize the probability of D: θ_MLE = argmax_θ P(D | θ).

Your first parameter learning algorithm. Set the derivative to zero, and solve! ln P(D | θ) = ln θ^H (1 − θ)^T = H ln θ + T ln(1 − θ); setting d/dθ ln P(D | θ) = H/θ − T/(1 − θ) = 0 gives θ_MLE = H / (H + T).
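A minimal sketch of this closed-form estimate, assuming flips are encoded as 1 = heads, 0 = tails (the helper name mle_theta is illustrative, not from the slides):

```python
import numpy as np

def mle_theta(flips):
    """Closed-form MLE: theta_hat = (# heads) / (# flips)."""
    flips = np.asarray(flips)
    return flips.sum() / len(flips)

flips = [1, 1, 1, 0, 0]      # 3 heads, 2 tails, as in the running example
print(mle_theta(flips))      # 0.6, i.e. 3/5
```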

[Figure (Wikipedia): at each point, the derivative is the slope of the line tangent to the curve; the figure color-codes where the derivative is positive, negative, and zero.]

Data

But, how many flips do I need? Billionaire says: I flipped 3 heads and 2 tails. You say: θ = 3/5, I can prove it! He says: What if I flipped 30 heads and 20 tails? You say: Same answer, I can prove it! He says: What's better? You say: Umm… the more the merrier??? He says: Is this why I am paying you the big bucks??? You say: I will give you a theoretical bound.

Probability of Mistake: a bound (from Hoeffding's inequality). For N = H + T flips and the estimate θ_MLE = H/N, let θ* be the true parameter; for any ε > 0: P(|θ_MLE − θ*| ≥ ε) ≤ 2 e^(−2Nε²). Exponential decay in N!

PAC Learning. PAC: Probably Approximately Correct. Billionaire says: I want to know the thumbtack parameter θ, within ε = 0.1, with probability of mistake δ ≤ 0.05. How many flips? Or, how big do I set N? Setting 2 e^(−2Nε²) ≤ δ and solving gives N ≥ ln(2/δ) / (2ε²). Interesting! Let's look at some numbers: for ε = 0.1, δ = 0.05, this requires N ≥ 185.
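A small sketch of that sample-size calculation, under the Hoeffding bound stated above (function and variable names are illustrative):

```python
import math

def flips_needed(eps, delta):
    """Smallest N with 2*exp(-2*N*eps**2) <= delta, i.e. N >= ln(2/delta) / (2*eps**2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

print(flips_needed(0.1, 0.05))   # 185 flips for eps = 0.1, delta = 0.05
```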

What if I have prior beliefs? Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you do for me now? You say: I can learn it the Bayesian way. Rather than estimating a single θ, we obtain a distribution over possible values of θ: in the beginning, a prior P(θ); then we observe flips, e.g. {tails, tails}; after observations, a posterior P(θ | D).

Bayesian Learning. Use Bayes rule: Posterior = (Data Likelihood × Prior) / Normalization, i.e. P(θ | D) = P(D | θ) P(θ) / P(D). Or equivalently: P(θ | D) ∝ P(D | θ) P(θ). Also, for uniform priors, P(θ) ∝ 1, so maximizing the posterior reduces to the MLE objective.

Bayesian Learning for Thumbtacks. Likelihood function is Binomial: P(D | θ) = θ^(α_H) (1 − θ)^(α_T), where α_H and α_T are the observed numbers of heads and tails. What about the prior? It represents expert knowledge and should give a simple posterior form. Conjugate priors: closed-form representation of the posterior. For the Binomial, the conjugate prior is the Beta distribution.

Beta Distribution. A distribution over θ ∈ [0, 1]: Beta(θ | a, b) = Γ(a + b) / (Γ(a)Γ(b)) · θ^(a−1) (1 − θ)^(b−1) = θ^(a−1) (1 − θ)^(b−1) / B(a, b), where B(a, b) = ∫₀¹ u^(a−1) (1 − u)^(b−1) du, a > 0, b > 0, and Γ(a) = ∫₀^∞ u^(a−1) e^(−u) du.

Beta Distribution

Beta prior distribution P(θ). Prior: P(θ) = Beta(θ | β_H, β_T) ∝ θ^(β_H − 1) (1 − θ)^(β_T − 1). Likelihood function: P(D | θ) = θ^(α_H) (1 − θ)^(α_T). Posterior: P(θ | D) ∝ θ^(α_H + β_H − 1) (1 − θ)^(α_T + β_T − 1).

Posterior Distribution. Prior: Beta(θ | β_H, β_T). Data: α_H heads and α_T tails. Posterior distribution: P(θ | D) = Beta(θ | β_H + α_H, β_T + α_T).

Bayesian Posterior Inference. Posterior distribution: P(θ | D) = Beta(θ | β_H + α_H, β_T + α_T). Bayesian inference: no longer a single parameter estimate. For any specific f, the function of interest, compute the expected value of f: E[f(θ)] = ∫₀¹ f(θ) P(θ | D) dθ. The integral is often hard to compute.
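For the Beta posterior this expectation can be approximated numerically; here is a hedged sketch, assuming a Beta(2, 2) prior, counts α_H = 3 and α_T = 2, and an arbitrary illustrative choice of f:

```python
from scipy import integrate, stats

# Posterior after alpha_H = 3 heads, alpha_T = 2 tails with a Beta(2, 2) prior.
posterior = stats.beta(3 + 2, 2 + 2)

# Some illustrative function of interest f(theta).
f = lambda theta: theta * (1.0 - theta)

expected_f, _ = integrate.quad(lambda t: f(t) * posterior.pdf(t), 0.0, 1.0)
print(expected_f)   # numerical estimate of E[f(theta) | D]
```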

MAP: Maximum a Posteriori Approximation. As more data is observed, the Beta posterior becomes more certain (more peaked). MAP: use the most likely parameter θ_MAP = argmax_θ P(θ | D) to approximate the expectation: E[f(θ)] ≈ f(θ_MAP).

MAP for the Beta distribution. MAP: use the most likely parameter: θ_MAP = (α_H + β_H − 1) / (α_H + β_H + α_T + β_T − 2). The Beta prior is equivalent to extra thumbtack flips. As N → ∞, the prior is forgotten. But for small sample sizes, the prior is important!
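A small sketch contrasting the MLE, MAP, and posterior-mean estimates under a Beta prior (the default Beta(2, 2) prior and the function name are illustrative assumptions, not from the slides):

```python
def estimates(heads, tails, beta_H=2, beta_T=2):
    """MLE, MAP, and posterior mean under a Beta(beta_H, beta_T) prior."""
    n = heads + tails
    mle  = heads / n
    map_ = (heads + beta_H - 1) / (n + beta_H + beta_T - 2)
    mean = (heads + beta_H) / (n + beta_H + beta_T)
    return mle, map_, mean

print(estimates(3, 2))       # small sample: the prior pulls MAP toward 1/2
print(estimates(300, 200))   # large sample: the prior is forgotten, all approach 3/5
```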

What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do for me? You say: Let me tell you about Gaussians

Learning a Gaussian. Collect a bunch of data (hopefully i.i.d. samples), e.g., exam scores. Learn the parameters: mean μ and variance σ².

i    Exam score x_i
0    85
1    95
2    100
3    12
…    …
99   89

MLE for a Gaussian. Probability of i.i.d. samples D = {x_1, …, x_N}: P(D | μ, σ) = ∏_{i=1}^N (1 / (σ√(2π))) exp(−(x_i − μ)² / (2σ²)). Log-likelihood of the data: ln P(D | μ, σ) = −N ln(σ√(2π)) − Σ_{i=1}^N (x_i − μ)² / (2σ²).

Your second learning algorithm: MLE for the mean of a Gaussian. What's the MLE for the mean? Set ∂/∂μ of the log-likelihood to zero: Σ_i (x_i − μ)/σ² = 0, giving μ_MLE = (1/N) Σ_{i=1}^N x_i.

MLE for the variance. Again, set the derivative to zero: ∂/∂σ ln P(D | μ, σ) = −N/σ + Σ_i (x_i − μ)²/σ³ = 0, giving σ²_MLE = (1/N) Σ_{i=1}^N (x_i − μ_MLE)².

MLE: Learning Gaussian parameters. μ_MLE = (1/N) Σ_i x_i and σ²_MLE = (1/N) Σ_i (x_i − μ_MLE)². BTW, the MLE for the variance of a Gaussian is biased: the expected result of the estimation is not the true parameter! Unbiased variance estimator: σ²_unbiased = (1/(N − 1)) Σ_{i=1}^N (x_i − μ_MLE)².
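A minimal sketch of the two variance estimators, assuming a handful of illustrative exam scores; numpy's ddof argument switches between dividing by N (the biased MLE) and N − 1 (the unbiased estimator):

```python
import numpy as np

scores = np.array([85.0, 95.0, 100.0, 12.0, 89.0])   # illustrative exam scores

mu_hat       = scores.mean()        # MLE for the mean
var_mle      = scores.var(ddof=0)   # divides by N     (the biased MLE)
var_unbiased = scores.var(ddof=1)   # divides by N - 1 (unbiased estimator)
print(mu_hat, var_mle, var_unbiased)
```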