Lecture 18: Bayesian Inference

Lecture 18: Bayesian Inference
Hyang-Won Lee
Dept. of Internet & Multimedia Eng., Konkuk University
Probability and Statistics, Spring 2014

Bayesian Statistical Inference

Statistical inference: the process of estimating information about an unknown variable or model. For example, a biased coin whose heads come up with probability p (unknown).

Bayesian vs. classical inference
- Bayesian: unknowns are random variables with known distributions; prior distribution p_Θ(θ); posterior p_{Θ|X}(θ|x) (x: observed data).
- Classical: unknowns are deterministic quantities that happen to be unknown; θ is a constant; estimate θ with some performance guarantee.

Bayesian Statistical Inference (contd.)

Inference (model/variable) problems
i) Model inference: construct a model of a process and predict the future (e.g., weather forecasting)
ii) Variable inference: estimate an unknown variable (e.g., current position from GPS readings)

Example: noisy channel
i) A sequence of binary messages S_i ∈ {0, 1} is transmitted over a wireless channel.
ii) The receiver observes X_i = a S_i + W_i, i = 1, …, n, where W_i ~ N(0, σ²) and a is a scalar.
iii) Model inference problem: a unknown (the S_i's known)
iv) Variable inference problem: infer the S_i's (a known) based on the X_i's
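
As a concrete illustration, here is a minimal simulation sketch of the noisy-channel example. The gain a = 2.0 and unit noise variance are hypothetical choices (the slide leaves the noise variance unspecified), and the least-squares and threshold rules are standard estimators, not ones prescribed by the slide.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    a = 2.0                                   # assumed channel gain
    S = rng.integers(0, 2, size=n)            # binary messages S_i in {0, 1}
    X = a * S + rng.normal(0.0, 1.0, size=n)  # observations X_i = a S_i + W_i

    # Model inference: estimate a from known S_i's (least squares).
    a_hat = np.sum(X * S) / np.sum(S * S)

    # Variable inference: infer S_i from X_i with a known
    # (threshold at a/2, the midpoint of the two signal levels).
    S_hat = (X > a / 2).astype(int)

    print(f"a_hat = {a_hat:.3f}, bit error rate = {np.mean(S_hat != S):.3f}")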

Bayesian Statistical Inference

Types of statistical inference problems
- Estimation (of an unknown constant or RV)
- Hypothesis testing (binary or m-ary)

Bayesian inference methods
i) Maximum a posteriori probability (MAP) rule
ii) Least mean squares (LMS) estimation
iii) Linear least mean squares (LLMS) estimation

Bayesian Inference and Posterior Distribution

Pictorial introduction

Bayes' rule, in four cases:
(i) Θ discrete, X discrete:
    p_{Θ|X}(θ|x) = p_Θ(θ) p_{X|Θ}(x|θ) / Σ_θ' p_Θ(θ') p_{X|Θ}(x|θ')
(ii) Θ discrete, X continuous:
    p_{Θ|X}(θ|x) = p_Θ(θ) f_{X|Θ}(x|θ) / Σ_θ' p_Θ(θ') f_{X|Θ}(x|θ')
(iii) Θ continuous, X discrete:
    f_{Θ|X}(θ|x) = f_Θ(θ) p_{X|Θ}(x|θ) / ∫ f_Θ(θ') p_{X|Θ}(x|θ') dθ'
(iv) Θ continuous, X continuous:
    f_{Θ|X}(θ|x) = f_Θ(θ) f_{X|Θ}(x|θ) / ∫ f_Θ(θ') f_{X|Θ}(x|θ') dθ'
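
To make case (i) concrete, here is a minimal sketch of Bayes' rule for discrete Θ and X. The prior and likelihood numbers are made up purely for illustration.

    import numpy as np

    prior = np.array([0.5, 0.5])       # p_Θ(θ) for θ in {0, 1}
    # p_{X|Θ}(x|θ): rows indexed by θ, columns by x in {0, 1}
    likelihood = np.array([[0.9, 0.1],
                           [0.3, 0.7]])

    x = 1                                # observed data
    joint = prior * likelihood[:, x]     # numerator p_Θ(θ) p_{X|Θ}(x|θ)
    posterior = joint / joint.sum()      # divide by the sum over θ'
    print(posterior)                     # p_{Θ|X}(θ|x)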

Conditional Probability Revisited

Four versions of conditional probability:

(i) Θ discrete, X discrete:
    p_{Θ|X}(θ|x) = p_{Θ,X}(θ, x) / p_X(x)

(ii) Θ discrete, X continuous:
    p_{Θ|X}(θ|x) = P(Θ = θ | X = x)
                 = lim_{δ→0} P(Θ = θ | x ≤ X ≤ x + δ)
                 = lim_{δ→0} p_Θ(θ) P(x ≤ X ≤ x + δ | Θ = θ) / P(x ≤ X ≤ x + δ)
                 = p_Θ(θ) f_{X|Θ}(x|θ) / Σ_θ' p_Θ(θ') f_{X|Θ}(x|θ')
                 = p_Θ(θ) f_{X|Θ}(x|θ) / f_X(x)

(iii) Θ continuous, X discrete:
    f_{Θ|X}(θ|x) = lim_{δ→0} P(θ ≤ Θ ≤ θ + δ | X = x) / δ
                 = lim_{δ→0} P(θ ≤ Θ ≤ θ + δ) P(X = x | θ ≤ Θ ≤ θ + δ) / (δ P(X = x))
                 = f_Θ(θ) p_{X|Θ}(x|θ) / p_X(x)

(iv) Θ continuous, X continuous:
    f_{Θ|X}(θ|x) = f_Θ(θ) f_{X|Θ}(x|θ) / f_X(x)

Example i) Romeo & Juliet I

- Juliet will be late on any date by a random amount X ~ U(0, θ)
- θ unknown, modeled as an RV Θ ~ U(0, 1)
- Assume that Juliet was late by an amount x on the 1st date
- How do we update the distribution of Θ?

What we know:
1) The prior PDF: f_Θ(θ) = 1 if 0 ≤ θ ≤ 1, and 0 otherwise
2) The conditional PDF of the observation: f_{X|Θ}(x|θ) = 1/θ if 0 ≤ x ≤ θ, and 0 otherwise

Posterior PDF:
    f_{Θ|X}(θ|x) = f_Θ(θ) f_{X|Θ}(x|θ) / ∫_0^1 f_Θ(θ') f_{X|Θ}(x|θ') dθ'

The numerator vanishes if θ < x or θ > 1, and ∫_x^1 (1/θ') dθ' = |log x|, so

    f_{Θ|X}(θ|x) = 1 / (θ |log x|)   if x ≤ θ ≤ 1,   and 0 otherwise.
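
As a quick numerical sanity check (a sketch, with the observed value x = 0.3 chosen arbitrarily), the posterior 1/(θ |log x|) on [x, 1] should integrate to 1:

    import numpy as np

    x = 0.3
    theta = np.linspace(x, 1.0, 100_000)
    posterior = 1.0 / (theta * abs(np.log(x)))

    dtheta = theta[1] - theta[0]
    print((posterior * dtheta).sum())  # ≈ 1.0, so the density is normalized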

Example ii) Romeo & Juliet II

- Observe the first n dates
- Juliet is late by X_1, X_2, …, X_n ~ U(0, θ), independent given Θ = θ
- Let X = [X_1, X_2, …, X_n], x = [x_1, x_2, …, x_n], and write x̄ = max{x_1, …, x_n}
- Conditional PDF:
    f_{X|Θ}(x|θ) = f_{X_1|Θ}(x_1|θ) ⋯ f_{X_n|Θ}(x_n|θ)
                 = 1/θ^n   if x̄ ≤ θ ≤ 1,   and 0 otherwise
- Posterior PDF:
    f_{Θ|X}(θ|x) = f_Θ(θ) f_{X|Θ}(x|θ) / ∫_0^1 f_Θ(θ') f_{X|Θ}(x|θ') dθ'
                 = (1/θ^n) / ∫_{x̄}^1 (1/(θ')^n) dθ'   if x̄ ≤ θ ≤ 1,   and 0 otherwise
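
Here is a sketch evaluating this posterior numerically for made-up observations on three dates, checking that it integrates to 1 (the normalizing integral has the closed form (x̄^(1−n) − 1)/(n − 1) for n > 1):

    import numpy as np

    x = np.array([0.2, 0.5, 0.35])     # hypothetical lateness on n = 3 dates
    n, x_max = len(x), x.max()

    theta = np.linspace(x_max, 1.0, 100_000)
    unnorm = theta ** (-n)                           # numerator on [x_max, 1]
    norm_const = (x_max ** (1 - n) - 1) / (n - 1)    # closed-form integral
    posterior = unnorm / norm_const

    dtheta = theta[1] - theta[0]
    print((posterior * dtheta).sum())  # ≈ 1.0: the posterior is normalized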

Example iii) Beta priors on the bias of a coin

- Biased coin with probability of heads θ
- θ unknown, modeled as an RV Θ with known prior PDF f_Θ
- Consider n independent tosses; X = # heads observed
- Posterior PDF, given X = k:
    f_{Θ|X}(θ|k) = f_Θ(θ) p_{X|Θ}(k|θ) / ∫_0^1 f_Θ(θ') p_{X|Θ}(k|θ') dθ'
                 = c f_Θ(θ) p_{X|Θ}(k|θ)
                 = c C(n,k) f_Θ(θ) θ^k (1 − θ)^(n−k),
  where 1/c is the normalizing integral.
- Suppose the prior is a Beta(α, β) density with α > 0, β > 0:
    f_Θ(θ) = θ^(α−1) (1 − θ)^(β−1) / B(α, β)   if 0 < θ < 1,   and 0 otherwise,
    B(α, β) = ∫_0^1 θ^(α−1) (1 − θ)^(β−1) dθ = (α−1)!(β−1)! / (α+β−1)!   (for integer α, β)
  Then
    f_{Θ|X}(θ|k) = d θ^(k+α−1) (1 − θ)^(n−k+β−1),   0 ≤ θ ≤ 1,
  for a normalizing constant d; that is, the posterior is again a Beta density, Beta(k + α, n − k + β).
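
A sketch of this conjugate update, cross-checked against direct numerical normalization of prior × likelihood. The values α = β = 2, n = 10, k = 7 are made up for illustration:

    import numpy as np
    from scipy.stats import beta, binom

    a, b = 2.0, 2.0      # prior parameters α, β
    n, k = 10, 7

    post = beta(k + a, n - k + b)   # conjugate update in closed form

    # Direct normalization of f_Θ(θ) p_{X|Θ}(k|θ) on a grid.
    theta = np.linspace(1e-6, 1 - 1e-6, 100_000)
    unnorm = beta(a, b).pdf(theta) * binom(n, theta).pmf(k)
    dtheta = theta[1] - theta[0]
    numeric = unnorm / (unnorm * dtheta).sum()

    print(np.max(np.abs(numeric - post.pdf(theta))))  # small: the two agree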

Example iv) Spam filtering

- An email message is either spam or legitimate
- Θ = 1 if spam, Θ = 2 if legitimate; priors p_Θ(1), p_Θ(2)
- {w_1, …, w_n}: a collection of special words whose appearance suggests spam
- For each i, X_i is a Bernoulli RV modeling the appearance of w_i:
  X_i = 1 if w_i appears, and 0 otherwise
- The conditional probabilities p_{X_i|Θ}(x_i|1), p_{X_i|Θ}(x_i|2) are known
- X_1, …, X_n are independent given Θ
- Posterior probability:
    P(Θ = m | X_i = x_i, i = 1, …, n)
      = p_Θ(m) Π_{i=1}^n p_{X_i|Θ}(x_i|m) / Σ_{j=1}^2 p_Θ(j) Π_{i=1}^n p_{X_i|Θ}(x_i|j)
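
A minimal sketch of this posterior computation (a two-class naive Bayes model). All priors and per-word appearance probabilities below are made-up numbers:

    import numpy as np

    prior = np.array([0.4, 0.6])                 # p_Θ(1) = spam, p_Θ(2) = legit
    # P(X_i = 1 | Θ = m): rows = classes, columns = words w_1..w_4
    p_appear = np.array([[0.8, 0.6, 0.5, 0.7],   # spam
                         [0.1, 0.2, 0.3, 0.1]])  # legitimate

    x = np.array([1, 0, 1, 1])                   # observed word indicators

    # p_{X_i|Θ}(x_i|m) is p if x_i = 1 and (1 - p) if x_i = 0
    lik = np.where(x == 1, p_appear, 1.0 - p_appear).prod(axis=1)
    posterior = prior * lik / (prior * lik).sum()
    print(posterior)                             # P(Θ = m | X = x), m = 1, 2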

Lecture 15: MAP and LMS Estimation
Hyang-Won Lee
Dept. of Internet & Multimedia Eng., Konkuk University
Probability and Statistics, Fall 2013

Maximum a Posteriori Probability (MAP) Rule

Setup
i) Observation x given
ii) p_Θ(θ), p_{X|Θ}(x|θ) given
iii) Want to estimate Θ

MAP rule:
    θ̂ = arg max_θ p_{Θ|X}(θ|x)   (Θ discrete)
    θ̂ = arg max_θ f_{Θ|X}(θ|x)   (Θ continuous)

Visualization on the board.

* For discrete Θ, the MAP rule minimizes the probability of an incorrect decision.

Notes on the computation of θ̂
i) From Bayes' rule, the denominator is the same for all values of θ, e.g.,
    p_{Θ|X}(θ|x) = p_Θ(θ) p_{X|Θ}(x|θ) / Σ_θ' p_Θ(θ') p_{X|Θ}(x|θ'),
   where the numerator is a function of θ and the denominator is constant w.r.t. θ.

MAP Rule (contd.)

ii) So we only need to maximize the numerator:
    θ̂ = arg max_θ  p_Θ(θ) p_{X|Θ}(x|θ)   (Θ, X both discrete)
                    p_Θ(θ) f_{X|Θ}(x|θ)   (Θ discrete, X continuous)
                    f_Θ(θ) p_{X|Θ}(x|θ)   (Θ continuous, X discrete)
                    f_Θ(θ) f_{X|Θ}(x|θ)   (Θ, X both continuous)

Example (spam filtering)
i) Θ = 1 (spam), Θ = 2 (legit); priors p_Θ(1), p_Θ(2)
ii) X_i: Bernoulli, with X_i = 1 if w_i appears in the message, and 0 otherwise
iii) Posterior probability:
    P(Θ = θ | X_1 = x_1, …, X_n = x_n)
      = p_Θ(θ) Π_{i=1}^n p_{X_i|Θ}(x_i|θ) / Σ_{θ'=1}^2 p_Θ(θ') Π_{i=1}^n p_{X_i|Θ}(x_i|θ'),   θ = 1, 2

Spam Filtering Example (contd.)

iv) MAP estimate:
    θ̂ = arg max_θ P(Θ = θ | X_i = x_i, i = 1, …, n)
       = arg max_θ p_Θ(θ) Π_{i=1}^n p_{X_i|Θ}(x_i|θ)

    Decide θ̂ = 1 (spam) if p_Θ(1) Π_{i=1}^n p_{X_i|Θ}(x_i|1) > p_Θ(2) Π_{i=1}^n p_{X_i|Θ}(x_i|2).
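
In practice this comparison is done with log-probabilities, since the product of many per-word probabilities underflows for large n. A sketch, reusing the made-up numbers from the spam example above:

    import numpy as np

    log_prior = np.log(np.array([0.4, 0.6]))        # p_Θ(1), p_Θ(2)
    p_appear = np.array([[0.8, 0.6, 0.5, 0.7],      # P(X_i = 1 | spam)
                         [0.1, 0.2, 0.3, 0.1]])     # P(X_i = 1 | legit)
    x = np.array([1, 0, 1, 1])

    # log p_Θ(θ) + Σ_i log p_{X_i|Θ}(x_i|θ), maximized over θ
    log_lik = np.log(np.where(x == 1, p_appear, 1 - p_appear)).sum(axis=1)
    theta_hat = np.argmax(log_prior + log_lik) + 1  # classes labeled 1 and 2
    print("spam" if theta_hat == 1 else "legit")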

Example: Romeo & Juliet I

i) Juliet is late on the first date by a random amount X ~ U(0, Θ)
ii) Θ ~ U(0, 1) unknown
iii) Posterior PDF (for x ∈ [0, 1]):
    f_{Θ|X}(θ|x) = 1 / (θ |log x|)   if x ≤ θ ≤ 1,   and 0 otherwise

MAP estimate: θ̂ = x (the posterior is decreasing in θ, so it peaks at the left endpoint θ = x).
Pictorial description on the board.

Probability of (In)Correct Decision

Hypothesis testing
i) The unknown parameter takes one of a finite number of values, each corresponding to a competing hypothesis.
ii) In the language of Bayesian inference: Θ ∈ {θ_1, …, θ_m} (m = 2: binary hypothesis testing), with θ_i corresponding to hypothesis H_i.

Computing the probability of a correct decision
i) Given observation X = x
ii) MAP rule: g_MAP(x) is the hypothesis selected by MAP given X = x
iii) Probability of a correct decision: P(Θ = g_MAP(x) | X = x) given X = x, and overall
    P(Θ = g_MAP(X)) = Σ_i P(Θ = θ_i, X ∈ S_i),   where S_i = {x : g_MAP(x) = θ_i}
iv) Probability of error: Σ_i P(Θ ≠ θ_i, X ∈ S_i)

Example: Two Biased Coins

i) Coin 1: prob. of heads = p_1; coin 2: prob. of heads = p_2
ii) Choose a coin at random with equal probability
iii) Want to infer its identity based on the outcome of a single toss
iv) Θ = 1: coin 1, Θ = 2: coin 2; X = 1: heads, X = 0: tails
v) MAP rule: decide coin 1 if p_Θ(1) p_{X|Θ}(x|1) > p_Θ(2) p_{X|Θ}(x|2)
   E.g., with p_1 = 0.46 and p_2 = 0.52, for x = tails:
       (1/2)(0.54) > (1/2)(0.48)   ⇒   decide coin 1

What is the probability of an incorrect decision?

Example: Two Biased Coins (contd.)

vi) n coin tosses, X = # heads; for X = k:
    p_Θ(1) p_{X|Θ}(k|1) = (1/2) C(n,k) p_1^k (1 − p_1)^(n−k)
    p_Θ(2) p_{X|Θ}(k|2) = (1/2) C(n,k) p_2^k (1 − p_2)^(n−k)
    MAP: decide coin 1 if p_1^k (1 − p_1)^(n−k) > p_2^k (1 − p_2)^(n−k), otherwise coin 2.
    With p_1 < p_2, this reduces to a threshold rule: decide coin 1 iff k ≤ k* for some k*.
    Pictorial description on the board.

vii) Probability of error:
    P(error) = P(Θ = 1, X > k*) + P(Θ = 2, X ≤ k*)
             = p_Θ(1) Σ_{k=k*+1}^n C(n,k) p_1^k (1 − p_1)^(n−k)
               + p_Θ(2) Σ_{k=0}^{k*} C(n,k) p_2^k (1 − p_2)^(n−k)
    Pictorial description on the board.
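
A sketch computing the MAP threshold k* and the error probability numerically, reusing p_1 = 0.46 and p_2 = 0.52 from the single-toss example (n = 25 is an arbitrary choice):

    import numpy as np
    from scipy.stats import binom

    p1, p2, n = 0.46, 0.52, 25
    k = np.arange(n + 1)

    # MAP decides coin 1 wherever (1/2) P(X=k|coin 1) > (1/2) P(X=k|coin 2);
    # the likelihood ratio is monotone in k, so the region is k <= k*.
    decide_1 = binom.pmf(k, n, p1) > binom.pmf(k, n, p2)
    k_star = k[decide_1].max()

    # P(error) = P(Θ=1, X > k*) + P(Θ=2, X <= k*)
    p_error = 0.5 * binom.sf(k_star, n, p1) + 0.5 * binom.cdf(k_star, n, p2)
    print(k_star, p_error)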

Lecture 20: Classical Statistical Inference
Hyang-Won Lee
Dept. of Internet & Multimedia Eng., Konkuk University
Probability and Statistics, Spring 2014

Classical Statistical Inference

Setup
- X: random observation with PMF p_X(x; θ) or PDF f_X(x; θ)
- θ: unknown constant; the dependence on θ means there is one probability model for each value of θ
- Notation: E_θ[h(X)], P_θ(A) make the dependence on θ explicit

Inference methods
i) Maximum likelihood (ML) estimation
ii) Linear regression
iii) Likelihood ratio test
iv) Significance testing

Maximum Likelihood Estimation

- X = (X_1, …, X_n): vector of observations with joint PMF p_X(x; θ) or joint PDF f_X(x; θ)
- θ̂: estimate of θ
- ML estimate:
    θ̂ = arg max_θ p_X(x_1, …, x_n; θ)   (X discrete)
    θ̂ = arg max_θ f_X(x_1, …, x_n; θ)   (X continuous)
- Likelihood function: p_X(x; θ) or f_X(x; θ)
- Log-likelihood function: assuming the X_i's are independent,
    p_X(x_1, …, x_n; θ) = Π_{i=1}^n p_{X_i}(x_i; θ),
    log p_X(x; θ) = Σ_{i=1}^n log p_{X_i}(x_i; θ)
- Interpretation of p_X(x; θ):
  Incorrect: the probability that the unknown parameter is equal to θ.
  Correct: the probability of X = x when the unknown parameter equals θ.
  ML asks: for what value of θ are the observations X = x most likely to arise?

Examples: ML vs. MAP

- MAP: arg max_θ p_Θ(θ) p_{X|Θ}(x|θ)
- If p_Θ is flat and p_{X|Θ}(x|θ) = p_X(x; θ), MAP reduces to ML: arg max_θ p_X(x; θ)

Example 1
i) Juliet is always late by X ~ U(0, θ)
ii) θ: unknown constant
iii) ML estimate θ̂?
    f_X(x; θ) = 1/θ if 0 ≤ x ≤ θ, and 0 otherwise   ⇒   θ̂ = x (compare with MAP)

Example 2
i) Biased coin (probability of heads θ unknown)
ii) X_1, …, X_n: n independent coin tosses (X_i = 1 if heads, 0 if tails)
iii) p_X(x; θ) = Π_{i=1}^n θ^(x_i) (1 − θ)^(1−x_i) = θ^(Σ_i x_i) (1 − θ)^(n − Σ_i x_i)
    When Σ_i x_i = k (i.e., k heads out of n tosses):
        θ̂ = arg max_θ θ^k (1 − θ)^(n−k) = k/n
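
A sketch confirming Example 2 numerically: maximizing the log-likelihood k log θ + (n − k) log(1 − θ) recovers k/n (the counts n = 20, k = 13 are made up):

    import numpy as np
    from scipy.optimize import minimize_scalar

    n, k = 20, 13

    # Negative log-likelihood of θ given k heads in n tosses.
    neg_log_lik = lambda t: -(k * np.log(t) + (n - k) * np.log(1 - t))

    res = minimize_scalar(neg_log_lik, bounds=(1e-9, 1 - 1e-9), method="bounded")
    print(res.x, k / n)   # both ≈ 0.65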

Estimation of Mean and Variance

Estimation of the mean and variance of an RV
i) Observations X_1, …, X_n are i.i.d., with unknown common mean θ and known variance v
ii) Sample mean: M_n = (Σ_i X_i) / n
    E_θ[M_n] = θ, so M_n is unbiased
    M_n → θ in probability (weak law of large numbers), so M_n is consistent
iii) Sample variance: S̄²_n = (1/n) Σ_{i=1}^n (X_i − M_n)²
    E_{(θ,v)}[S̄²_n] = ((n − 1)/n) v, so S̄²_n is only asymptotically unbiased
    Ŝ²_n = (1/(n − 1)) Σ_{i=1}^n (X_i − M_n)² is unbiased: E_{(θ,v)}[Ŝ²_n] = v

Confidence interval
    P_θ(Θ̂⁻_n ≤ θ ≤ Θ̂⁺_n) ≥ 1 − α
    1 − α confidence interval: [Θ̂⁻_n, Θ̂⁺_n]
    Θ̂⁻_n: lower estimator, Θ̂⁺_n: upper estimator
    * Compare with a point estimator.
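
A Monte Carlo sketch of the bias claim in iii), with made-up values n = 5 and v = 4: the 1/n sample variance averages to ((n − 1)/n) v, while the 1/(n − 1) version averages to v.

    import numpy as np

    rng = np.random.default_rng(0)
    n, v, trials = 5, 4.0, 200_000

    X = rng.normal(0.0, np.sqrt(v), size=(trials, n))
    M = X.mean(axis=1, keepdims=True)              # sample mean M_n per trial
    S2_biased = ((X - M) ** 2).sum(axis=1) / n
    S2_unbiased = ((X - M) ** 2).sum(axis=1) / (n - 1)

    print(S2_biased.mean(), (n - 1) / n * v)   # both ≈ 3.2
    print(S2_unbiased.mean(), v)               # both ≈ 4.0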

Example

i) Observations X_1, …, X_n are i.i.d. normal, with unknown mean θ and known variance v
ii) The sample mean estimator Θ̂_n = (X_1 + ⋯ + X_n)/n is normal with mean θ and variance v/n
iii) For α = 0.05:
    (Θ̂_n − θ) / √(v/n) is standard normal, so P_θ(|Θ̂_n − θ| / √(v/n) ≤ 1.96) = 0.95
    P_θ(Θ̂_n − 1.96 √(v/n) ≤ θ ≤ Θ̂_n + 1.96 √(v/n)) = 0.95
    [Θ̂_n − 1.96 √(v/n), Θ̂_n + 1.96 √(v/n)] is a 0.95 confidence interval

Interpretation of a confidence interval
    Incorrect: θ lies in the CI with probability at least 1 − α.
    Correct: if we construct the confidence interval many times, about a fraction 1 − α of them are expected to contain θ.
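
A Monte Carlo sketch of the correct interpretation, with made-up values θ = 1, v = 4, n = 50: building the interval repeatedly, about 95% of the intervals contain θ.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, v, n, trials = 1.0, 4.0, 50, 100_000

    X = rng.normal(theta, np.sqrt(v), size=(trials, n))
    m = X.mean(axis=1)                 # sample mean per repetition
    half = 1.96 * np.sqrt(v / n)       # half-width of the 0.95 CI
    covered = (m - half <= theta) & (theta <= m + half)
    print(covered.mean())              # ≈ 0.95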