CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

Size: px
Start display at page:

Download "CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5"

Transcription

1 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5

2 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio

3 Itroducto Bayesia Decisio Theory i previous lectures tells us how to desig a optimal classifier if we kew: P(c i ) (priors) P( c i ) (class-coditioal desities) Ufortuately, we rarely have this complete iformatio! Suppose we kow the shape of distributio, but ot the parameters Two types of parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio

4 ML ad Bayesia Parameter Estimatio Shape of probability distributio is kow Happes sometimes Labeled traiig data salmo bass salmo salmo Need to estimate parameters of probability distributio from the traiig data Eample respected fish epert says salmo s ( 2 legth has distributio N µ, σ ) ad sea ( 2 bass s legth has distributio N µ σ ) 2, Need to estimate parameters µ, σ, µ σ 2, The desig classifiers accordig to the bayesia decisio theory 2 a lot is kow easier little is kow harder

5 Idepedece Across Classes We have traiig data for each class salmo sea bass salmo salmo sea bass sea bass Whe estimatig parameters for oe class, will oly use the data collected for that class reasoable assumptio that data from class ci gives o iformatio about distributio of class cj estimate parameters for distributio of salmo from estimate parameters for distributio of bass from

6 Idepedece Across Classes For each class c i we have a proposed desity p i ( c i ) with ukow parameters i which we eed to estimate Sice we assumed idepedece of data across the classes, estimatio is a idetical procedure for all classes To simplify otatio, we drop sub-idees ad say that we eed to estimate parameters for desity p() the fact that we eed to do so for each class o the traiig data that came from that class is implied

7 ML vs. Bayesia Parameter Estimatio Maimum Likelihood Parameters are ukow but fied (i.e. ot radom variables) Bayesia Estimatio Parameters are radom variables havig some kow a priori distributio (prior) p() After parameters are estimated with either ML or Bayesia Estimatio we use methods from Bayesia decisio theory for classificatio

8 Maimum Likelihood Parameter Estimatio We have desity p() which is completely specified by parameters [,, k ] If p() is N(µ, σ 2 ) the [µ, σ 2 ] To highlight that p() depeds o parameters we will write p( ) Note overloaded otatio, p( ) is ot a coditioal desity Let D{, 2,, } be the idepedet traiig samples i our data If p() is N(µ, σ 2 ) the, 2,, are iid samples from N(µ, σ 2 )

9 Maimum Likelihood Parameter Estimatio Cosider the followig fuctio, which is called likelihood of with respect to the set of samples D k p(d ) p( ) k k F( ) Note if D is fied p(d ) is ot a desity Maimum likelihood estimate (abbreviated MLE) of is the value of that maimizes the likelihood fuctio p(d ) ˆ arg ma ( ) p(d )

10 Maimum Likelihood Estimatio (MLE) k p(d ) k p( k ) If D is allowed to vary ad is fied, by idepedece p(d ) is the joit desity for D{, 2,, } If is allowed to vary ad D is fied, p(d ) is ot desity, it is likelihood F()! Recall our approimatio of itegral trick k Pr D B,..., ε p( [ [ ] ] ) k k Thus ML chooses that is most likely to have give the observed data D

11 ML Parameter Estimatio vs. ML Classifier Recall ML classifier fied data decide class c i which maimizes p( c i ) Compare with ML parameter estimatio fied data choose that maimizes p(d ) ML classifier ad ML parameter estimatio use the same priciples applied to differet problems

12 Maimum Likelihood Estimatio (MLE) Istead of maimizig p(d ), it is usually easier to miimize l(p(d )) Sice log is mootoic ˆ arg ma arg ma ( ) p(d ) ( ) l p(d ) p(d ) l(p(d )) To simplify otatio, l(p(d ))l() k ˆ arg ma l ( ) arg ma l p( k ) arg ma l p( k ) k k

13 2

14 MLE: Maimizatio Methods Maimizig l() ca be solved usig stadard methods from Calculus Let (, 2,, p ) t ad let be the gradiet operator, 2,..., Set of ecessary coditios for a optimum is: l 0 Also have to check that that satisfies the above coditio is maimum, ot miimum or saddle poit. Also check the boudary of rage of p t

15 MLE Eample: Gaussia with ukow µ Fortuately for us, all the ML estimates of ay desities we would care about have bee computed Let s go through a eample ayway Let p( µ) be N(µ,σ 2 ) that is σ 2 is kow, but µ is ukow ad eeds to be estimated, so µ ˆ µ arg ma l µ µ µ k 2 ( ) k µ arg ma l ep 2 µ k 2πσ 2σ arg ma µ ( µ ) arg ma l p( k ) k l 2πσ ( µ ) k 2σ 2 2

16 MLE Eample: Gaussia with ukow µ arg ma µ ( l( µ )) arg ma µ k l 2πσ ( µ ) k 2σ 2 2 d d µ k σ ( l( µ )) ( µ ) 0 2 k ˆµ k k µ 0 k k Thus the ML estimate of the mea is just the average value of the traiig data, very ituitive! average of the traiig data would be our guess for the mea eve if we did t kow about ML estimates

17 MLE for Gaussia with ukow µ, σ 2 Similarly it ca be show that if p( µ,σ 2 ) is N(µ, σ 2 ), that is both mea ad variace are ukow, the agai very ituitive result 2 ˆµ σˆ ( ) k ˆ µ k k Similarly it ca be show that if p( µ,σ) is N(µ, Σ), that is is a multivariate gaussia with both mea ad covariace matri ukow, the ˆµ k ˆ ( )( ) Σ k ˆ µ k ˆ µ k k k 2 t

18 Today Fiish Maimum Likelihood Estimatio Bayesia Parameter estimatio New Topic No Parametric Desity Estimatio

19 How to Measure Performace of MLE? How good is a ML estimate? or actually ay other estimate of a parameter? ˆ The atural measure of error would be But ˆ is radom, we caot compute it before we carry out eperimets We wat to say somethig meaigful about our estimate as a fuctio of A way to solve this difficulty is to average the error, i.e. compute the mea absolute error E [ ] ˆ ˆ p(,,..., ) ˆ 2 dd2... d

20 How to Measure Performace of MLE?s It is usually much easier to compute a almost equivalet measure of performace, the mea squared error: E [( ) 2 ] ˆ Do a little algebra, ad use Var(X)E(X 2 )-(E(X)) 2 E [( ) ] ˆ 2 Var ( ˆ ) + ( E( ˆ ) ) 2 variace estimator should have low variace bias epectatio should be close to the true

21 How to Measure Performace of MLE?s E [( ) ] ˆ 2 Var ( ˆ ) + ( E( ˆ ) ) 2 variace bias ideal case bad case bad case p( ˆ ) p( ˆ ) p( ˆ ) E ( ˆ ) E( ˆ ) E ( ˆ ) o bias low variace large bias low variace o bias high variace

22 Bias ad Variace for MLE of the Mea Let s compute the bias for ML estimate of the mea [ ˆ ] E E µ k k k E [ ] k k µ µ Thus this estimate is ubiased! How about variace of ML estimate of the mea? E ( ˆ µ µ ) E ˆ µ 2µµ ˆ + µ µ 2 µ E( ˆ µ ) + E [ ] [ ] 2 σ Thus variace is very small for a large umber of samples (the more samples, the smaller is variace) Thus the MLE of the mea is a very good estimator k k 2

23 Bias ad Variace for MLE of the Mea Suppose someoe claims they have a ew great estimator for the mea, just take the first sample! ˆ µ Thus this estimator is ubiased: However its variace is: E [( ) ] 2 ˆ µ E ( µ ) [ ] 2 2 σ µ E ( ) ( ) µ ˆ µ E p( ˆ ) Thus variace ca be very large ad does ot improve as we icrease the umber of samples E ( ˆ ) o bias high variace

24 MLE Bias for Mea ad Variace How about ML estimate for the variace? E 2 k k [ ] ˆ σ E ( ˆ µ ) σ σ Thus this estimate is biased! This is because we used µˆ istead of true µ Bias 0 as ifiity, asymptotically ubiased Ubiased estimate 2 σˆ µ k ( k ˆ ) Variace of MLE of variace ca be show to go to 0 as goes to ifiity 2

25 MLE for Uiform distributio U[0,] X is U[0, ] if its desity is / iside [0,] ad 0 otherwise (uiform distributio o [0,] ) p ( ) F ( ) The likelihood is F ( ) k 0 if ma{,..., } p( ) k k if < ma{,..., } Thus k ˆ arg ma k p ( k ) ma{,..., This is ot very pleasig sice for sure should be larger tha ay observed! }

26 Bayesia Parameter Estimatio Suppose we have some idea of the rage where parameters should be Should t we formalize such prior kowledge i hopes that it will lead to better parameter estimatio? Let be a radom variable with prior distributio P() This is the key differece betwee ML ad Bayesia parameter estimatio This key assumptio allows us to fully eploit the iformatio provided by the data

27 Bayesia Parameter Estimatio As i MLE, suppose p( ) is completely specified if is give But ow is a radom variable with prior p() Ulike MLE case, p( ) is a coditioal desity After we observe the data D, usig Bayes rule we ca compute the posterior p( D) Recall that for the MAP classifier we fid the class c i that maimizes the posterior p(c D) By aalogy, a reasoable estimate of is the oe that maimizes the posterior p( D) But is ot our fial goal, our fial goal is the ukow p() Therefore a better thig to do is to maimize p( D), this is as close as we ca come to the ukow p()!

28 Bayesia Estimatio: Formula for p( D) From the defiitio of joit distributio: ( D ) p(, D ) p d Usig the defiitio of coditioal probability: ( D ) p(, D ) p( D ) p d But p(,d)p( ) sice p( ) is completely specified by Usig Bayes formula, p kow ukow p D p p D d ( ) ( ) ( ) p ( ) ( D ) p( ) D p( D ) ( ) p( D ) p( ) p k d k

29 Bayesia Estimatio vs. MLE So i priciple p( D) ca be computed I practice, it may be hard to do itegratio aalytically, may have to resort to umerical methods p ( D ) p( ) k ( ) p( ) k k d p p ( ) p( ) Cotrast this with the MLE solutio which requires differetiatio of likelihood to get p ˆ k d ( ) Differetiatio is easy ad ca always be doe aalytically

30 Bayesia Estimatio vs. MLE p( D) ca be thought of as the weighted average of the proposed model all possible values of support receives from the data ( D ) p( ) p( D ) p d proposed model with certai Cotrast this with the MLE solutio which always gives us a sigle model: p ( ˆ ) Whe we have may possible solutios, takig their sum averaged by their probabilities seems better tha spittig out oe solutio

31 Bayesia Estimatio: Eample for U[0,] Let X be U[0,]. Recall p( )/ iside [0,], else 0 p( ) p( ) Suppose we assume a U[0,0] prior o good prior to use if we just ow the rage of but do t kow aythig else We eed to compute ( D ) p( ) p( D ) p d p ( ) ( D ) p( ) with p D ad p( D ) p( ) p( D ) p( ) k d k 0 0

32 Bayesia Estimatio: Eample for U[0,] We eed to compute p ( D ) p( ) p( D ) usig d p ( ) ( D ) p( ) p D ad p( D ) ( ) p( D ) p( ) p k d k Whe computig MLE of, we had p ( D ) Thus p ( D ) 0 for ma{ otherwise c for ma{ 0 otherwise,...,,..., } } 0 where c is the ormalizig costat, i.e. 0 c p( ) d ma {..., }, p( D ) 0

33 Bayesia Estimatio: Eample for U[0,] We eed to compute p ( D ) p( ) p( D ) p ( D ) p( ) c for ma{ 0 otherwise We have 2 cases:. case < ma{, 2,, } p 2. case > ma{, 2,, } p d,..., } ( D ) c d α + ma{,... } ( D) p 0 0 ( D ) c d + c c 0 c 0 costat idepedet of

34 Bayesia Estimatio: Eample for U[0,] α ML p ( ˆ ) Bayes p ( D ) Note that eve after >ma {, 2,, }, Bayes desity is ot zero, which makes sese curious fact: Bayes desity is ot uiform, i.e. does ot have the fuctioal form that we have assumed!

35 ML vs. Bayesia Estimatio with Broad Prior Suppose p() is flat ad broad (close to uiform prior) p( D) teds to sharpe if there is a lot of data Thus p(d ) p( D)/p() will have the same sharp peak as p( D) But by defiitio, peak of p(d ) is the ML estimate ^ The itegral is domiated by the peak: p ( D) p ˆ p( D ) p( ) ( D) p p( ) ˆ ( D) ( D) p( ) p( D) d p( ˆ ) p( D) d p( ˆ ) Thus as goes to ifiity, Bayesia estimate will approach the desity correspodig to the MLE! p ˆ p( ˆ )

36 ML vs. Bayesia Estimatio: Geeral Prior Maimum Likelihood Estimatio Easy to compute, use differetial calculus Easy to iterpret (returs oe model) p( ) ^ has the assumed parametric form Bayesia Estimatio Hard compute, eed multidimesioal itegratio Hard to iterpret, returs weighted average of models p( D) does ot ecessarily have the assumed parametric form Ca give better results sice use more iformatio about the problem (prior iformatio)

Pattern Classification

Pattern Classification Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher

More information

Pattern Classification

Pattern Classification Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Patter Recogitio Classificatio: No-Parametric Modelig Hamid R. Rabiee Jafar Muhammadi Sprig 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Ageda Parametric Modelig No-Parametric Modelig

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

1.010 Uncertainty in Engineering Fall 2008

1.010 Uncertainty in Engineering Fall 2008 MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett Lecture Note 8 Poit Estimators ad Poit Estimatio Methods MIT 14.30 Sprig 2006 Herma Beett Give a parameter with ukow value, the goal of poit estimatio is to use a sample to compute a umber that represets

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification INF 4300 90 Itroductio to classifictio Ae Solberg ae@ifiuioo Based o Chapter -6 i Duda ad Hart: atter Classificatio 90 INF 4300 Madator proect Mai task: classificatio You must implemet a classificatio

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Unbiased Estimation. February 7-12, 2008

Unbiased Estimation. February 7-12, 2008 Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

f(x i ; ) L(x; p) = i=1 To estimate the value of that maximizes L or equivalently ln L we will set =0, for i =1, 2,...,m p x i (1 p) 1 x i i=1

f(x i ; ) L(x; p) = i=1 To estimate the value of that maximizes L or equivalently ln L we will set =0, for i =1, 2,...,m p x i (1 p) 1 x i i=1 Parameter Estimatio Samples from a probability distributio F () are: [,,..., ] T.Theprobabilitydistributio has a parameter vector [,,..., m ] T. Estimator: Statistic used to estimate ukow. Estimate: Observed

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Probability and MLE.

Probability and MLE. 10-701 Probability ad MLE http://www.cs.cmu.edu/~pradeepr/701 (brief) itro to probability Basic otatios Radom variable - referrig to a elemet / evet whose status is ukow: A = it will rai tomorrow Domai

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

Elementary manipulations of probabilities

Elementary manipulations of probabilities Elemetary maipulatios of probabilities Set probability of multi-valued r.v. {=Odd} = +3+5 = /6+/6+/6 = ½ X X,, X i j X i j Multi-variat distributio: Joit probability: X true true X X,, X X i j i j X X

More information

Chapter 7 Maximum Likelihood Estimate (MLE)

Chapter 7 Maximum Likelihood Estimate (MLE) Chapter 7 aimum Likelihood Estimate (LE) otivatio for LE Problems:. VUE ofte does ot eist or ca t be foud . BLUE may ot be applicable ( Hθ w) Solutio: If the PDF

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

EE 6885 Statistical Pattern Recognition

EE 6885 Statistical Pattern Recognition EE 6885 Statistical Patter Recogitio Fall 5 Prof. Shih-Fu Chag http://www.ee.columbia.edu/~sfchag Lecture 6 (9/8/5 EE6887-Chag 6- Readig EM for Missig Features Textboo, DHS 3.9 Bayesia Parameter Estimatio

More information

AAEC/ECON 5126 FINAL EXAM: SOLUTIONS

AAEC/ECON 5126 FINAL EXAM: SOLUTIONS AAEC/ECON 5126 FINAL EXAM: SOLUTIONS SPRING 2015 / INSTRUCTOR: KLAUS MOELTNER This exam is ope-book, ope-otes, but please work strictly o your ow. Please make sure your ame is o every sheet you re hadig

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................

More information

1 Review and Overview

1 Review and Overview CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we

More information

Mixtures of Gaussians and the EM Algorithm

Mixtures of Gaussians and the EM Algorithm Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

4. Basic probability theory

4. Basic probability theory Cotets Basic cocepts Discrete radom variables Discrete distributios (br distributios) Cotiuous radom variables Cotiuous distributios (time distributios) Other radom variables Lect04.ppt S-38.45 - Itroductio

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)]. Probability 2 - Notes 0 Some Useful Iequalities. Lemma. If X is a radom variable ad g(x 0 for all x i the support of f X, the P(g(X E[g(X]. Proof. (cotiuous case P(g(X Corollaries x:g(x f X (xdx x:g(x

More information

Section 14. Simple linear regression.

Section 14. Simple linear regression. Sectio 14 Simple liear regressio. Let us look at the cigarette dataset from [1] (available to dowload from joural s website) ad []. The cigarette dataset cotais measuremets of tar, icotie, weight ad carbo

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments: Recall: STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Commets:. So far we have estimates of the parameters! 0 ad!, but have o idea how good these estimates are. Assumptio: E(Y x)! 0 +! x (liear coditioal

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

Lecture 10: Universal coding and prediction

Lecture 10: Universal coding and prediction 0-704: Iformatio Processig ad Learig Sprig 0 Lecture 0: Uiversal codig ad predictio Lecturer: Aarti Sigh Scribes: Georg M. Goerg Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved

More information

Solutions: Homework 3

Solutions: Homework 3 Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

n n i=1 Often we also need to estimate the variance. Below are three estimators each of which is optimal in some sense: n 1 i=1 k=1 i=1 k=1 i=1 k=1

n n i=1 Often we also need to estimate the variance. Below are three estimators each of which is optimal in some sense: n 1 i=1 k=1 i=1 k=1 i=1 k=1 MATH88T Maria Camero Cotets Basic cocepts of statistics Estimators, estimates ad samplig distributios 2 Ordiary least squares estimate 3 3 Maximum lielihood estimator 3 4 Bayesia estimatio Refereces 9

More information

Pattern Classification, Ch4 (Part 1)

Pattern Classification, Ch4 (Part 1) Patter Classificatio All materials i these slides were take from Patter Classificatio (2d ed) by R O Duda, P E Hart ad D G Stork, Joh Wiley & Sos, 2000 with the permissio of the authors ad the publisher

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we

More information

Classification with linear models

Classification with linear models Lecture 8 Classificatio with liear models Milos Hauskrecht milos@cs.pitt.edu 539 Seott Square Geerative approach to classificatio Idea:. Represet ad lear the distributio, ). Use it to defie probabilistic

More information

PRACTICE PROBLEMS FOR THE FINAL

PRACTICE PROBLEMS FOR THE FINAL PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

1 Approximating Integrals using Taylor Polynomials

1 Approximating Integrals using Taylor Polynomials Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

Department of Mathematics

Department of Mathematics Departmet of Mathematics Ma 3/103 KC Border Itroductio to Probability ad Statistics Witer 2017 Lecture 19: Estimatio II Relevat textbook passages: Larse Marx [1]: Sectios 5.2 5.7 19.1 The method of momets

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Lecture 20: Multivariate convergence and the Central Limit Theorem

Lecture 20: Multivariate convergence and the Central Limit Theorem Lecture 20: Multivariate covergece ad the Cetral Limit Theorem Covergece i distributio for radom vectors Let Z,Z 1,Z 2,... be radom vectors o R k. If the cdf of Z is cotiuous, the we ca defie covergece

More information

INTEGRATION BY PARTS (TABLE METHOD)

INTEGRATION BY PARTS (TABLE METHOD) INTEGRATION BY PARTS (TABLE METHOD) Suppose you wat to evaluate cos d usig itegratio by parts. Usig the u dv otatio, we get So, u dv d cos du d v si cos d si si d or si si d We see that it is ecessary

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9 Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Monte Carlo Integration

Monte Carlo Integration Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce

More information

Lecture Chapter 6: Convergence of Random Sequences

Lecture Chapter 6: Convergence of Random Sequences ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Lecture 6: Integration and the Mean Value Theorem. slope =

Lecture 6: Integration and the Mean Value Theorem. slope = Math 8 Istructor: Padraic Bartlett Lecture 6: Itegratio ad the Mea Value Theorem Week 6 Caltech 202 The Mea Value Theorem The Mea Value Theorem abbreviated MVT is the followig result: Theorem. Suppose

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19 CS 70 Discrete Mathematics ad Probability Theory Sprig 2016 Rao ad Walrad Note 19 Some Importat Distributios Recall our basic probabilistic experimet of tossig a biased coi times. This is a very simple

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information