Statistics for Applications. Chapter 3: Maximum Likelihood Estimation 1/23

Size: px
Start display at page:

Download "Statistics for Applications. Chapter 3: Maximum Likelihood Estimation 1/23"

Transcription

1 Statistics for Applicatios Chapter 3: Maximum Likelihood Estimatio 1/23

2 Total variatio distace (1) ( ) Let E,(IPθ ) θ Θ be a statistical model associated with a sample of i.i.d. r.v. X 1,...,X. Assume that there exists θ Θ such that X 1 IP θ : θ is the true parameter. Statisticia s goal: give X 1,...,X, fid a estimator θ ˆ= θ(x ˆ 1,...,X ) such that IPˆ is close to IP θ θ for the true parameter θ. This meas: IPˆ(A) IP θ θ (A) is small for all A E. Defiitio The total variatio distace betwee two probability measures IP θ ad IP θ is defied by TV(IP θ,ip θ ) = max IP θ (A) IP θ (A). A E 2/23

3 Total variatio distace (2) Assume that E is discrete (i.e., fiite or coutable). This icludes Beroulli, Biomial, Poisso,... Therefore X has a PMF (probability mass fuctio): IP θ (X = x) = p θ (x) for all x E, L p θ (x) 0, p θ (x) = 1. x E The total variatio distace betwee IP θ ad IP θ is a simple fuctio of the PMF s p θ ad p θ : 1 L TV(IP θ,ip θ ) = p θ (x) p θ (x). 2 x E 3/23

4 Total variatio distace (3) Assume that E is cotiuous. This icludes Gaussia, Expoetial,... J Assume that X has a desity IP θ (X A) = A fθ (x)dx for all A E. l f θ (x) 0, f θ (x)dx = 1. E The total variatio distace betwee IP θ ad IP θ is a simple fuctio of the desities f θ ad f θ : l 1 TV(IP θ,ip θ ) = f θ (x) f θ (x) dx. 2 E 4/23

5 Total variatio distace (4) Properties of Total variatio: TV(IP θ,ip θ ) = TV(IP θ,ip θ ) (symmetric) TV(IP θ,ip θ ) 0 If TV(IP θ,ip θ ) = 0 the IP θ = IP θ (defiite) TV(IP θ,ip θ ) TV(IP θ,ip θ )+TV(IP θ,ip θ ) (triagle iequality) These imply that the total variatio is a distace betwee probability distributios. 5/23

6 Total variatio distace (5) A estimatio strategy: Build a estimator TV(IP T θ,ip θ ) for all θ Θ. The fid θ ˆthat miimizes the fuctio θ TV(IP T θ,ip θ ). 6/23

7 Total variatio distace (5) A estimatio strategy: Build a estimator TV(IP T θ,ip θ ) for all θ Θ. The fid θ ˆthat miimizes the fuctio θ TV(IP T θ,ip θ ). problem: Uclear how to build T TV(IPθ,IP θ )! 6/23

8 Kullback-Leibler (KL) divergece (1) There are may distaces betwee probability measures to replace total variatio. Let us choose oe that is more coveiet. Defiitio The Kullback-Leibler (KL) divergece betwee two probability measures IP θ ad IP θ is defied by L p θ (x)log KL(IP θ,ip θ ) = x E l E ( pθ (x) ) p θ (x) if E is discrete ( fθ (x) ) f θ (x)log dx if E is cotiuous f θ (x) 7/23

9 Kullback-Leibler (KL) divergece (2) Properties of KL-divergece: KL(IP θ,ip θ ) = KL(IP θ,ip θ ) i geeral KL(IP θ,ip θ ) 0 If KL(IP θ,ip θ ) = 0 the IP θ = IP θ (defiite) KL(IP θ,ip θ ) i KL(IP θ,ip θ )+KL(IP θ,ip θ ) i geeral Not a distace. This is is called a divergece. Asymmetry is the key to our ability to estimate it! 8/23

10 Kullback-Leibler (KL) divergece (3) KL(IP θ,ip θ ) = IE θ [ ( pθ (X) )] log p θ (X) [ = IE θ logpθ (X) ] [ IE θ logpθ (X) ] So the fuctio θ KL(IP θ [,IP θ ) is of the form: costat IE θ logpθ (X) ] 1 Ca be estimated: IE θ [h(x)] - L h(x i ) (by LLN) 1 KL(IP T θ,ip θ ) = costat L logpθ (X i ) 9/23

11 Kullback-Leibler (KL) divergece (4) KL(IP T 1 L θ,ip θ ) = costat logp θ (X i ) KL(IP T 1 L mi θ,ip θ ) mi logp θ (X i ) θ Θ θ Θ 1 L max logp θ (X i ) max θ Θ L θ Θ logp θ (X i ) max p θ (X i ) θ Θ This is the maximum likelihood priciple. 10/23

12 Iterlude: maximizig/miimizig fuctios (1) Note that mi h(θ) max h(θ) θ Θ θ Θ I this class, we focus o maximizatio. Maximizatio of arbitrary fuctios ca be difficult: Example: θ (θ X i ) 11/23

13 Iterlude: maximizig/miimizig fuctios (2) Defiitio A fuctio twice differetiable fuctio h : Θ IR IR is said to be cocave if its secod derivative satisfies h (θ) 0, θ Θ It is said to be strictly cocave if the iequality is strict: h (θ) < 0 Moreover, h is said to be (strictly) covex if h is (strictly) cocave, i.e. h (θ) 0 (h (θ) > 0). Examples: Θ = IR, h(θ) = θ 2, Θ = (0, ), h(θ) = θ, Θ = (0, ), h(θ) = logθ, Θ = [0,π], h(θ) = si(θ) Θ = IR, h(θ) = 2θ 3 12/23

14 Iterlude: maximizig/miimizig fuctios (3) More geerally for a multivariate fuctio: h : Θ IR d IR, d 2, defie the gradiet vector: h(θ) = Hessia matrix: 2 h(θ) = h θ 1 (θ).. h θ d (θ) IR d 2 h 2 h θ 1 θ 1 (θ) θ 1 θ d (θ)... IR d d 2 h θ d θ d (θ) 2 h θ d θ d (θ) h is cocave x 2 h(θ)x 0 x IR d, θ Θ. h is strictly cocave x 2 h(θ)x < 0 x IR d, θ Θ. Examples: Θ = IR 2, h(θ) = θ2 1 2θ2 2 or h(θ) = (θ 1 θ 2 ) 2 Θ = (0, ), h(θ) = log(θ 1 +θ 2 ), 13/23

15 Iterlude: maximizig/miimizig fuctios (4) Strictly cocave fuctios are easy to maximize: if they have a maximum, the it is uique. It is the uique solutio to or, i the multivariate case h (θ) = 0, h(θ) = 0 IR d. There are may algorithms to fid it umerically: this is the theory of covex optimizatio. I this class, ofte a closed form formula for the maximum. 14/23

16 Likelihood, Discrete case (1) ( ) Let E,(IPθ ) θ Θ be a statistical model associated with a sample of i.i.d. r.v. X 1,...,X. Assume that E is discrete (i.e., fiite or coutable). Defiitio The likelihood of the model is the map L (or just L) defied as: L : E Θ IR (x 1,...,x,θ) IP θ [X 1 = x 1,...,X = x ]. 15/23

17 Likelihood, Discrete case (2) iid Example 1 (Beroulli trials): If X 1,...,X Ber(p) for some p (0,1): E = {0,1}; Θ = (0,1); (x 1,...,x ) {0,1}, p (0,1), L(x 1,...,x,p) = IPp [X i = x i ] = p x i (1 p) 1 x i = p x i (1 p) x i. 16/23

18 Likelihood, Discrete case (3) Example 2 (Poisso model): iid If X 1,...,X Poiss(λ) for some λ > 0: E = IN; Θ = (0, ); (x 1,...,x ) IN, λ > 0, L(x 1,...,x,p) = IPλ [X i = x i ] λ λ x i = e x i! λ x i λ = e. x 1!...x! 17/23

19 Likelihood, Cotiuous case (1) ( ) Let E,(IPθ ) θ Θ be a statistical model associated with a sample of i.i.d. r.v. X 1,...,X. Assume that all the IP θ have desity f θ. Defiitio The likelihood of the model is the map L defied as: L : E Θ IR (x 1,...,x,θ) f θ (x i ). 18/23

20 Likelihood, Cotiuous case (2) iid Example 1 (Gaussia model): If X 1,...,X N(µ,σ 2 ), for some µ IR,σ 2 > 0: E = IR; Θ = IR (0, ) (x 1,...,x ) IR, (µ,σ 2 ) IR (0, ), 1 1 L(x 1,...,x,µ,σ 2 ) = exp (σ 2π) 2σ 2 L (xi µ) 2. 19/23

21 Maximum likelihood estimator (1) Let X 1 (,...,X be ) a i.i.d. sample associated with a statistical model E,(IPθ ) θ Θ ad let L be the correspodig likelihood. Defiitio The likelihood estimator of θ is defied as: provided it exists. θˆmle = argmax L(X 1,...,X,θ), θ Θ Remark (log-likelihood estimator): I practice, we use the fact that θˆmle = argmax logl(x 1,...,X,θ). θ Θ 20/23

22 Maximum likelihood estimator (2) Examples Beroulli trials: pˆmle = X. Poisso model: ˆ = X. Gaussia model: λ MLE ( µˆ,σˆ2) = (X,Ŝ ). 21/23

23 Maximum likelihood estimator (3) Defiitio: Fisher iformatio Defie the log-likelihood for oe observatio as: l(θ) = logl 1 (X,θ), θ Θ IR d Assume that l is a.s. twice differetiable. Uder some regularity coditios, the Fisher iformatio of the statistical model is defied as: I(θ) = IE [ l(θ) l(θ) ] IE [ l(θ) ] IE [ l(θ) ] = IE [ 2 l(θ) ]. If Θ IR, we get: I(θ) = var [ l (θ) ] = IE [ l (θ) ] 22/23

24 Maximum likelihood estimator (4) Theorem Let θ Θ (the true parameter). Assume the followig: 1. The model is idetified. 2. For all θ Θ, the support of IP θ does ot deped o θ; 3. θ is ot o the boudary of Θ; 4. I(θ) is ivertible i a eighborhood of θ ; 5. A few more techical coditios. The, ˆ θ MLE satisfies: θ MLE IP ˆ θ w.r.t. IP θ ; ( ) (d) θˆmle θ N ( 0,I(θ ) 1) w.r.t. IP θ. 23/23

25 MIT OpeCourseWare / Statistics for Applicatios Fall 2016 For iformatio about citig these materials or our Terms of Use, visit:

Stat410 Probability and Statistics II (F16)

Stat410 Probability and Statistics II (F16) Some Basic Cocepts of Statistical Iferece (Sec 5.) Suppose we have a rv X that has a pdf/pmf deoted by f(x; θ) or p(x; θ), where θ is called the parameter. I previous lectures, we focus o probability problems

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

ECE 901 Lecture 13: Maximum Likelihood Estimation

ECE 901 Lecture 13: Maximum Likelihood Estimation ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered

More information

Lecture 13: Maximum Likelihood Estimation

Lecture 13: Maximum Likelihood Estimation ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select

More information

Unbiased Estimation. February 7-12, 2008

Unbiased Estimation. February 7-12, 2008 Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

1.010 Uncertainty in Engineering Fall 2008

1.010 Uncertainty in Engineering Fall 2008 MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation ECE 645: Estimatio Theory Sprig 2015 Istructor: Prof. Staley H. Cha Maximum Likelihood Estimatio (LaTeX prepared by Shaobo Fag) April 14, 2015 This lecture ote is based o ECE 645(Sprig 2015) by Prof. Staley

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

5 : Exponential Family and Generalized Linear Models

5 : Exponential Family and Generalized Linear Models 0-708: Probabilistic Graphical Models 0-708, Sprig 206 5 : Expoetial Family ad Geeralized Liear Models Lecturer: Matthew Gormley Scribes: Yua Li, Yichog Xu, Silu Wag Expoetial Family Probability desity

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Statistical Theory MT 2008 Problems 1: Solution sketches

Statistical Theory MT 2008 Problems 1: Solution sketches Statistical Theory MT 008 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. a) Let 0 < θ < ad put fx, θ) = θ)θ x ; x = 0,,,... b) c) where α

More information

Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Statistical Theory MT 2009 Problems 1: Solution sketches

Statistical Theory MT 2009 Problems 1: Solution sketches Statistical Theory MT 009 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. (a) Let 0 < θ < ad put f(x, θ) = ( θ)θ x ; x = 0,,,... (b) (c) where

More information

Spring Information Theory Midterm (take home) Due: Tue, Mar 29, 2016 (in class) Prof. Y. Polyanskiy. P XY (i, j) = α 2 i 2j

Spring Information Theory Midterm (take home) Due: Tue, Mar 29, 2016 (in class) Prof. Y. Polyanskiy. P XY (i, j) = α 2 i 2j Sprig 206 6.44 - Iformatio Theory Midterm (take home) Due: Tue, Mar 29, 206 (i class) Prof. Y. Polyaskiy Rules. Collaboratio strictly prohibited. 2. Write rigorously, prove all claims. 3. You ca use otes

More information

Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

LECTURE NOTES 9. 1 Point Estimation. 1.1 The Method of Moments

LECTURE NOTES 9. 1 Point Estimation. 1.1 The Method of Moments LECTURE NOTES 9 Poit Estimatio Uder the hypothesis that the sample was geerated from some parametric statistical model, a atural way to uderstad the uderlyig populatio is by estimatig the parameters of

More information

Questions and Answers on Maximum Likelihood

Questions and Answers on Maximum Likelihood Questios ad Aswers o Maximum Likelihood L. Magee Fall, 2008 1. Give: a observatio-specific log likelihood fuctio l i (θ) = l f(y i x i, θ) the log likelihood fuctio l(θ y, X) = l i(θ) a data set (x i,

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet

More information

Solutions: Homework 3

Solutions: Homework 3 Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett Lecture Note 8 Poit Estimators ad Poit Estimatio Methods MIT 14.30 Sprig 2006 Herma Beett Give a parameter with ukow value, the goal of poit estimatio is to use a sample to compute a umber that represets

More information

AMS570 Lecture Notes #2

AMS570 Lecture Notes #2 AMS570 Lecture Notes # Review of Probability (cotiued) Probability distributios. () Biomial distributio Biomial Experimet: ) It cosists of trials ) Each trial results i of possible outcomes, S or F 3)

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

4. Basic probability theory

4. Basic probability theory Cotets Basic cocepts Discrete radom variables Discrete distributios (br distributios) Cotiuous radom variables Cotiuous distributios (time distributios) Other radom variables Lect04.ppt S-38.45 - Itroductio

More information

Lecture 23: Minimal sufficiency

Lecture 23: Minimal sufficiency Lecture 23: Miimal sufficiecy Maximal reductio without loss of iformatio There are may sufficiet statistics for a give problem. I fact, X (the whole data set) is sufficiet. If T is a sufficiet statistic

More information

4. Understand the key properties of maximum likelihood estimation. 6. Understand how to compute the uncertainty in maximum likelihood estimates.

4. Understand the key properties of maximum likelihood estimation. 6. Understand how to compute the uncertainty in maximum likelihood estimates. 9.07 Itroductio to Statistics for Brai ad Cogitive Scieces Emery N. Brow Lecture 9: Likelihood Methods I. Objectives 1. Review the cocept of the joit distributio of radom variables.. Uderstad the cocept

More information

Brief Review of Functions of Several Variables

Brief Review of Functions of Several Variables Brief Review of Fuctios of Several Variables Differetiatio Differetiatio Recall, a fuctio f : R R is differetiable at x R if ( ) ( ) lim f x f x 0 exists df ( x) Whe this limit exists we call it or f(

More information

Lecture 6 Ecient estimators. Rao-Cramer bound.

Lecture 6 Ecient estimators. Rao-Cramer bound. Lecture 6 Eciet estimators. Rao-Cramer boud. 1 MSE ad Suciecy Let X (X 1,..., X) be a radom sample from distributio f θ. Let θ ˆ δ(x) be a estimator of θ. Let T (X) be a suciet statistic for θ. As we have

More information

Probability and statistics: basic terms

Probability and statistics: basic terms Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2 82 CHAPTER 4. MAXIMUM IKEIHOOD ESTIMATION Defiitio: et X be a radom sample with joit p.m/d.f. f X x θ. The geeralised likelihood ratio test g.l.r.t. of the NH : θ H 0 agaist the alterative AH : θ H 1,

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

IIT JAM Mathematical Statistics (MS) 2006 SECTION A

IIT JAM Mathematical Statistics (MS) 2006 SECTION A IIT JAM Mathematical Statistics (MS) 6 SECTION A. If a > for ad lim a / L >, the which of the followig series is ot coverget? (a) (b) (c) (d) (d) = = a = a = a a + / a lim a a / + = lim a / a / + = lim

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Unsupervised Learning 2001

Unsupervised Learning 2001 Usupervised Learig 2001 Lecture 3: The EM Algorithm Zoubi Ghahramai zoubi@gatsby.ucl.ac.uk Carl Edward Rasmusse edward@gatsby.ucl.ac.uk Gatsby Computatioal Neurosciece Uit MSc Itelliget Systems, Computer

More information

Homework for 2/3. 1. Determine the values of the following quantities: a. t 0.1,15 b. t 0.05,15 c. t 0.1,25 d. t 0.05,40 e. t 0.

Homework for 2/3. 1. Determine the values of the following quantities: a. t 0.1,15 b. t 0.05,15 c. t 0.1,25 d. t 0.05,40 e. t 0. Name: ID: Homework for /3. Determie the values of the followig quatities: a. t 0.5 b. t 0.055 c. t 0.5 d. t 0.0540 e. t 0.00540 f. χ 0.0 g. χ 0.0 h. χ 0.00 i. χ 0.0050 j. χ 0.990 a. t 0.5.34 b. t 0.055.753

More information

An Introduction to Asymptotic Theory

An Introduction to Asymptotic Theory A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu

More information

Final Examination Statistics 200C. T. Ferguson June 10, 2010

Final Examination Statistics 200C. T. Ferguson June 10, 2010 Fial Examiatio Statistics 00C T. Ferguso Jue 0, 00. (a State the Borel-Catelli Lemma ad its coverse. (b Let X,X,... be i.i.d. from a distributio with desity, f(x =θx (θ+ o the iterval (,. For what value

More information

Differentiable Convex Functions

Differentiable Convex Functions Differetiable Covex Fuctios The followig picture motivates Theorem 11. f ( x) f ( x) f '( x)( x x) ˆx x 1 Theorem 11 : Let f : R R be differetiable. The, f is covex o the covex set C R if, ad oly if for

More information

Last Lecture. Unbiased Test

Last Lecture. Unbiased Test Last Lecture Biostatistics 6 - Statistical Iferece Lecture Uiformly Most Powerful Test Hyu Mi Kag March 8th, 3 What are the typical steps for costructig a likelihood ratio test? Is LRT statistic based

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn

Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn Stat 366 Lab 2 Solutios (September 2, 2006) page TA: Yury Petracheko, CAB 484, yuryp@ualberta.ca, http://www.ualberta.ca/ yuryp/ Review Questios, Chapters 8, 9 8.5 Suppose that Y, Y 2,..., Y deote a radom

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

Probability, Random Variables and Random Processes

Probability, Random Variables and Random Processes Appedix A robability, Radom Variables ad Radom rocesses I this appedix basic cocepts from probability, radom processes ad sigal theory are reviewed.. robability ad Radom Variables robability Space Ω F

More information

Generalized Semi- Markov Processes (GSMP)

Generalized Semi- Markov Processes (GSMP) Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual

More information

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS Ryszard Zieliński Ist Math Polish Acad Sc POBox 21, 00-956 Warszawa 10, Polad e-mail: rziel@impagovpl ABSTRACT Weak laws of large umbers (W LLN), strog

More information

Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace

More information

Lecture Stat Maximum Likelihood Estimation

Lecture Stat Maximum Likelihood Estimation Lecture Stat 461-561 Maximum Likelihood Estimatio A.D. Jauary 2008 A.D. () Jauary 2008 1 / 63 Maximum Likelihood Estimatio Ivariace Cosistecy E ciecy Nuisace Parameters A.D. () Jauary 2008 2 / 63 Parametric

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

EE 4TM4: Digital Communications II Probability Theory

EE 4TM4: Digital Communications II Probability Theory 1 EE 4TM4: Digital Commuicatios II Probability Theory I. RANDOM VARIABLES A radom variable is a real-valued fuctio defied o the sample space. Example: Suppose that our experimet cosists of tossig two fair

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

MIT Spring 2016

MIT Spring 2016 MIT 18.655 Dr. Kempthore Sprig 2016 1 MIT 18.655 Outlie 1 2 MIT 18.655 Beroulli s Weak Law of Large Numbers X 1, X 2,... iid Beroulli(θ). S i=1 = X i Biomial(, θ). S P θ. Proof: Apply Chebychev s Iequality,

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

This section is optional.

This section is optional. 4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore

More information

2.1. The Algebraic and Order Properties of R Definition. A binary operation on a set F is a function B : F F! F.

2.1. The Algebraic and Order Properties of R Definition. A binary operation on a set F is a function B : F F! F. CHAPTER 2 The Real Numbers 2.. The Algebraic ad Order Properties of R Defiitio. A biary operatio o a set F is a fuctio B : F F! F. For the biary operatios of + ad, we replace B(a, b) by a + b ad a b, respectively.

More information

Lecture 2: Poisson Sta*s*cs Probability Density Func*ons Expecta*on and Variance Es*mators

Lecture 2: Poisson Sta*s*cs Probability Density Func*ons Expecta*on and Variance Es*mators Lecture 2: Poisso Sta*s*cs Probability Desity Fuc*os Expecta*o ad Variace Es*mators Biomial Distribu*o: P (k successes i attempts) =! k!( k)! p k s( p s ) k prob of each success Poisso Distributio Note

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties

More information

International Journal of Mathematical Archive-5(7), 2014, Available online through ISSN

International Journal of Mathematical Archive-5(7), 2014, Available online through  ISSN Iteratioal Joural of Mathematical Archive-5(7), 214, 11-117 Available olie through www.ijma.ifo ISSN 2229 546 USING SQUARED-LOG ERROR LOSS FUNCTION TO ESTIMATE THE SHAPE PARAMETER AND THE RELIABILITY FUNCTION

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

Stanford Statistics 311/Electrical Engineering 377

Stanford Statistics 311/Electrical Engineering 377 I. Uiversal predictio ad codig a. Gae: sequecex ofdata, adwattopredict(orcode)aswellasifwekew distributio of data b. Two versios: probabilistic ad adversarial. I either case, let p ad q be desities or

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics ad Egieerig Lecture otes Sergei V. Shabaov Departmet of Mathematics, Uiversity of Florida, Gaiesville, FL 326 USA CHAPTER The theory of covergece. Numerical sequeces..

More information

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

Approximations and more PMFs and PDFs

Approximations and more PMFs and PDFs Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.

More information

Lecture 5. Power properties of EL and EL for vectors

Lecture 5. Power properties of EL and EL for vectors Stats 34 Empirical Likelihood Oct.8 Lecture 5. Power properties of EL ad EL for vectors Istructor: Art B. Owe, Staford Uiversity. Scribe: Jigshu Wag Power properties of empirical likelihood Power of the

More information

Regression and generalization

Regression and generalization Regressio ad geeralizatio CE-717: Machie Learig Sharif Uiversity of Techology M. Soleymai Fall 2016 Curve fittig: probabilistic perspective Describig ucertaity over value of target variable as a probability

More information

Lectures 12 and 13 - Complexity Penalized Maximum Likelihood Estimation

Lectures 12 and 13 - Complexity Penalized Maximum Likelihood Estimation Lectures ad 3 - Complexity Pealized Maximum Likelihood Estimatio Rui Castro May 5, 03 Itroductio As you leared i previous courses, if we have a statistical model we ca ofte estimate ukow parameters by

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Variance of Discrete Random Variables Class 5, Jeremy Orloff and Jonathan Bloom

Variance of Discrete Random Variables Class 5, Jeremy Orloff and Jonathan Bloom Variace of Discrete Radom Variables Class 5, 18.05 Jeremy Orloff ad Joatha Bloom 1 Learig Goals 1. Be able to compute the variace ad stadard deviatio of a radom variable.. Uderstad that stadard deviatio

More information

B Supplemental Notes 2 Hypergeometric, Binomial, Poisson and Multinomial Random Variables and Borel Sets

B Supplemental Notes 2 Hypergeometric, Binomial, Poisson and Multinomial Random Variables and Borel Sets B671-672 Supplemetal otes 2 Hypergeometric, Biomial, Poisso ad Multiomial Radom Variables ad Borel Sets 1 Biomial Approximatio to the Hypergeometric Recall that the Hypergeometric istributio is fx = x

More information

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8)

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8) Elemets of Statistical Methods Lots of Data or Large Samples (Ch 8) Fritz Scholz Sprig Quarter 2010 February 26, 2010 x ad X We itroduced the sample mea x as the average of the observed sample values x

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

Classification with linear models

Classification with linear models Lecture 8 Classificatio with liear models Milos Hauskrecht milos@cs.pitt.edu 539 Seott Square Geerative approach to classificatio Idea:. Represet ad lear the distributio, ). Use it to defie probabilistic

More information

Pattern Classification

Pattern Classification Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Statistics for Applications Fall Problem Set 7

Statistics for Applications Fall Problem Set 7 18.650. Statistics for Applicatios Fall 016. Proble Set 7 Due Friday, Oct. 8 at 1 oo Proble 1 QQ-plots Recall that the Laplace distributio with paraeter λ > 0 is the cotiuous probaλ bility easure with

More information

4.1 Data processing inequality

4.1 Data processing inequality ECE598: Iformatio-theoretic methods i high-dimesioal statistics Sprig 206 Lecture 4: Total variatio/iequalities betwee f-divergeces Lecturer: Yihog Wu Scribe: Matthew Tsao, Feb 8, 206 [Ed. Mar 22] Recall

More information

Lecture 16: UMVUE: conditioning on sufficient and complete statistics

Lecture 16: UMVUE: conditioning on sufficient and complete statistics Lecture 16: UMVUE: coditioig o sufficiet ad complete statistics The 2d method of derivig a UMVUE whe a sufficiet ad complete statistic is available Fid a ubiased estimator of ϑ, say U(X. Coditioig o a

More information

J. Stat. Appl. Pro. Lett. 2, No. 1, (2015) 15

J. Stat. Appl. Pro. Lett. 2, No. 1, (2015) 15 J. Stat. Appl. Pro. Lett. 2, No. 1, 15-22 2015 15 Joural of Statistics Applicatios & Probability Letters A Iteratioal Joural http://dx.doi.org/10.12785/jsapl/020102 Martigale Method for Rui Probabilityi

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Probability and Statistics

Probability and Statistics robability ad Statistics rof. Zheg Zheg Radom Variable A fiite sigle valued fuctio.) that maps the set of all eperimetal outcomes ito the set of real umbers R is a r.v., if the set ) is a evet F ) for

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Math 220B Final Exam Solutions March 18, 2002

Math 220B Final Exam Solutions March 18, 2002 Math 0B Fial Exam Solutios March 18, 00 1. (1 poits) (a) (6 poits) Fid the Gree s fuctio for the tilted half-plae {(x 1, x ) R : x 1 + x > 0}. For x (x 1, x ), y (y 1, y ), express your Gree s fuctio G(x,

More information

Department of Mathematics

Department of Mathematics Departmet of Mathematics Ma 3/103 KC Border Itroductio to Probability ad Statistics Witer 2017 Lecture 19: Estimatio II Relevat textbook passages: Larse Marx [1]: Sectios 5.2 5.7 19.1 The method of momets

More information

Point Estimation: properties of estimators 1 FINITE-SAMPLE PROPERTIES. finite-sample properties (CB 7.3) large-sample properties (CB 10.

Point Estimation: properties of estimators 1 FINITE-SAMPLE PROPERTIES. finite-sample properties (CB 7.3) large-sample properties (CB 10. Poit Estimatio: properties of estimators fiite-sample properties CB 7.3) large-sample properties CB 10.1) 1 FINITE-SAMPLE PROPERTIES How a estimator performs for fiite umber of observatios. Estimator:

More information

Chapter 2 The Monte Carlo Method

Chapter 2 The Monte Carlo Method Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information