CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
1 CS434a/541a: Pattern Recognition. Prof. Olga Veksler. Lecture 5
2 Today: Introduction to parameter estimation. Two methods for parameter estimation: Maximum Likelihood Estimation, Bayesian Estimation.
3 Introduction. Bayesian Decision Theory in previous lectures tells us how to design an optimal classifier if we knew: P(c_i) (priors) and p(x|c_i) (class-conditional densities). Unfortunately, we rarely have this complete information! Suppose we know the shape of the distribution, but not the parameters. Two types of parameter estimation: Maximum Likelihood Estimation, Bayesian Estimation.
4 ML and Bayesian Parameter Estimation. Shape of the probability distribution is known (happens sometimes). Labeled training data: salmon, bass, salmon, salmon. Need to estimate parameters of the probability distribution from the training data. Example: a respected fish expert says salmon's length has distribution N(µ_1, σ_1²) and sea bass's length has distribution N(µ_2, σ_2²). Need to estimate parameters µ_1, σ_1², µ_2, σ_2². Then design classifiers according to Bayesian decision theory. (A lot is known: easier. Little is known: harder.)
5 Independence Across Classes. We have training data for each class: salmon, sea bass, salmon, salmon, sea bass, sea bass. When estimating parameters for one class, we will only use the data collected for that class. Reasonable assumption: data from class c_i gives no information about the distribution of class c_j. Estimate parameters for the distribution of salmon from the salmon samples; estimate parameters for the distribution of bass from the bass samples.
6 Independence Across Classes. For each class c_i we have a proposed density p_i(x|c_i) with unknown parameters θ_i which we need to estimate. Since we assumed independence of data across the classes, estimation is an identical procedure for all classes. To simplify notation, we drop sub-indexes and say that we need to estimate parameters θ for density p(x); the fact that we need to do so for each class, on the training data that came from that class, is implied.
7 ML vs. Bayesian Parameter Estimation. Maximum Likelihood: parameters θ are unknown but fixed (i.e. not random variables). Bayesian Estimation: parameters θ are random variables having some known a priori distribution (prior) p(θ). After parameters are estimated with either ML or Bayesian Estimation, we use methods from Bayesian decision theory for classification.
8 Maximum Likelihood Parameter Estimation. We have a density p(x) which is completely specified by parameters θ = [θ_1, ..., θ_k]. If p(x) is N(µ, σ²) then θ = [µ, σ²]. To highlight that p(x) depends on parameters θ we will write p(x|θ). Note the overloaded notation: p(x|θ) is not a conditional density. Let D = {x_1, x_2, ..., x_n} be the n independent training samples in our data. If p(x) is N(µ, σ²) then x_1, x_2, ..., x_n are i.i.d. samples from N(µ, σ²).
9 Maximum Likelihood Parameter Estimation. Consider the following function, which is called the likelihood of θ with respect to the set of samples D: p(D|θ) = ∏_{k=1}^{n} p(x_k|θ) = F(θ). Note: if D is fixed, p(D|θ) is not a density in θ. The maximum likelihood estimate (abbreviated MLE) of θ is the value θ̂ that maximizes the likelihood function p(D|θ): θ̂ = argmax_θ p(D|θ).
10 Maximum Likelihood Estimation (MLE). p(D|θ) = ∏_{k=1}^{n} p(x_k|θ). If D is allowed to vary and θ is fixed, by independence p(D|θ) is the joint density for D = {x_1, x_2, ..., x_n}. If θ is allowed to vary and D is fixed, p(D|θ) is not a density; it is the likelihood F(θ)! Recall our approximation-of-integral trick: Pr[D ∈ B((x_1,...,x_n), ε)] ≈ εⁿ ∏_{k=1}^{n} p(x_k|θ). Thus ML chooses the θ that is most likely to have generated the observed data D.
11 ML Parameter Estimation vs. ML Classifier. Recall the ML classifier: for fixed data x, decide the class c_i which maximizes p(x|c_i). Compare with ML parameter estimation: for fixed data D, choose the θ that maximizes p(D|θ). The ML classifier and ML parameter estimation use the same principle applied to different problems.
12 Maximum Likelihood Estimation (MLE). Instead of maximizing p(D|θ) directly, it is usually easier to maximize ln(p(D|θ)). Since log is monotonic: θ̂ = argmax_θ p(D|θ) = argmax_θ ln p(D|θ). To simplify notation, write l(θ) = ln(p(D|θ)). Then θ̂ = argmax_θ l(θ) = argmax_θ ln ∏_{k=1}^{n} p(x_k|θ) = argmax_θ Σ_{k=1}^{n} ln p(x_k|θ).
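As an illustrative sketch (not from the slides), the Python snippet below shows why the log-likelihood is preferred in practice: for many samples the raw likelihood underflows in floating point, while the log-likelihood is well behaved and has the same argmax. The sample data, σ value, and grid are hypothetical; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=5.0, scale=sigma, size=1000)   # hypothetical i.i.d. sample, sigma known
mus = np.linspace(3.0, 7.0, 401)                   # candidate values of the unknown mean

# raw likelihood p(D|mu): the product of 1000 small factors underflows to 0.0
lik = np.array([np.prod(np.exp(-(x - m)**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2))
                for m in mus])

# log-likelihood l(mu) = sum_k ln p(x_k|mu): numerically stable, same argmax (log is monotonic)
loglik = np.array([np.sum(-0.5*np.log(2*np.pi*sigma**2) - (x - m)**2 / (2*sigma**2))
                   for m in mus])

print(lik.max())                  # typically 0.0 -> numerical underflow
print(mus[np.argmax(loglik)])     # close to the sample mean x.mean()
```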
14 MLE: Maximization Methods. Maximizing l(θ) can be done using standard methods from calculus. Let θ = (θ_1, θ_2, ..., θ_p)^t and let ∇_θ = (∂/∂θ_1, ∂/∂θ_2, ..., ∂/∂θ_p)^t be the gradient operator. The set of necessary conditions for an optimum is: ∇_θ l = 0. Also have to check that the θ satisfying the above condition is a maximum, not a minimum or saddle point. Also check the boundary of the range of θ.
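When no closed form is available, the same maximization can be done numerically by minimizing the negative log-likelihood with a generic optimizer. This is a minimal sketch under stated assumptions (SciPy available, made-up Gaussian data, σ reparameterized through its log to keep it positive), not part of the original lecture.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=500)    # hypothetical i.i.d. Gaussian sample

def neg_log_lik(params):
    """Negative log-likelihood of N(mu, sigma^2); minimizing it maximizes l(theta)."""
    mu, log_sigma = params                       # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5*np.log(2*np.pi*sigma**2) - (x - mu)**2 / (2*sigma**2))

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)    # close to x.mean() and the (1/n) standard deviation x.std()
```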
15 MLE Example: Gaussian with unknown µ. Fortunately for us, the ML estimates for any densities we would care about have already been computed. Let's go through an example anyway. Let p(x|µ) be N(µ, σ²), that is, σ² is known but µ is unknown and needs to be estimated, so θ = µ. Then µ̂ = argmax_µ l(µ) = argmax_µ Σ_{k=1}^{n} ln p(x_k|µ) = argmax_µ Σ_{k=1}^{n} ln [ (1/√(2πσ²)) exp(−(x_k−µ)²/(2σ²)) ] = argmax_µ Σ_{k=1}^{n} [ −ln√(2πσ²) − (x_k−µ)²/(2σ²) ].
16 MLE Example: Gaussian with unknown µ. µ̂ = argmax_µ Σ_{k=1}^{n} [ −ln√(2πσ²) − (x_k−µ)²/(2σ²) ]. Setting the derivative to zero: (d/dµ) l(µ) = Σ_{k=1}^{n} (1/σ²)(x_k − µ) = 0, so Σ_{k=1}^{n} x_k − nµ = 0, giving µ̂ = (1/n) Σ_{k=1}^{n} x_k. Thus the ML estimate of the mean is just the average value of the training data — very intuitive! The average of the training data would be our guess for the mean even if we didn't know about ML estimates.
17 MLE for Gaussian with unknown µ, σ². Similarly it can be shown that if p(x|µ,σ²) is N(µ, σ²), that is, both mean and variance are unknown, then (again a very intuitive result): µ̂ = (1/n) Σ_{k=1}^{n} x_k and σ̂² = (1/n) Σ_{k=1}^{n} (x_k − µ̂)². Similarly it can be shown that if p(x|µ,Σ) is N(µ, Σ), that is, x is a multivariate Gaussian with both mean and covariance matrix unknown, then µ̂ = (1/n) Σ_{k=1}^{n} x_k and Σ̂ = (1/n) Σ_{k=1}^{n} (x_k − µ̂)(x_k − µ̂)^t.
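The multivariate formulas above can be checked directly on synthetic data. A minimal sketch (not from the slides), assuming NumPy and hypothetical true parameters; note the MLE covariance uses 1/n, not the 1/(n−1) that np.cov would use by default.

```python
import numpy as np

rng = np.random.default_rng(2)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=2000)   # n x d data matrix

n = X.shape[0]
mu_hat = X.mean(axis=0)                    # (1/n) sum_k x_k
diff = X - mu_hat
cov_hat = diff.T @ diff / n                # MLE covariance with 1/n

print(mu_hat)                              # close to true_mu
print(cov_hat)                             # close to true_cov for large n
```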
18 Today: Finish Maximum Likelihood Estimation. Bayesian Parameter Estimation. New topic: Non-Parametric Density Estimation.
19 How to Measure Performance of MLE? How good is an ML estimate (or, actually, any other estimate of a parameter)? The natural measure of error would be |θ̂ − θ|. But θ̂ is random; we cannot compute it before we carry out the experiments. We want to say something meaningful about our estimate as a function of n. A way to resolve this difficulty is to average the error, i.e. compute the mean absolute error: E[|θ̂ − θ|] = ∫ |θ̂ − θ| p(x_1, x_2, ..., x_n | θ) dx_1 dx_2 ... dx_n.
20 How to Measure Performance of MLE? It is usually much easier to compute an almost equivalent measure of performance, the mean squared error: E[(θ̂ − θ)²]. Do a little algebra and use Var(X) = E(X²) − (E(X))²: E[(θ̂ − θ)²] = Var(θ̂) + (E(θ̂) − θ)². The first term is the variance (the estimator should have low variance); the second is the squared bias (the expectation of the estimator should be close to the true θ).
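The bias-variance decomposition of the MSE can be verified by Monte Carlo: repeat the "experiment" many times, compute the estimate each time, and compare the empirical MSE with variance plus squared bias. A sketch with made-up values (not from the slides), assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(4)
true_mu, sigma, n, trials = 5.0, 2.0, 20, 100_000

# many repeated datasets of size n; the estimator is the sample mean
estimates = np.array([rng.normal(true_mu, sigma, size=n).mean() for _ in range(trials)])

mse = np.mean((estimates - true_mu)**2)
variance = estimates.var()
bias_sq = (estimates.mean() - true_mu)**2

print(mse, variance + bias_sq)    # the two agree: MSE = Var + bias^2
print(variance, sigma**2 / n)     # variance of the sample mean is sigma^2 / n
```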
21 How to Measure Performance of MLE? E[(θ̂ − θ)²] = Var(θ̂) + (E(θ̂) − θ)² (variance plus squared bias). [Figure: three sketches of the sampling density p(θ̂) around the true θ.] Ideal case: no bias, low variance. Bad case: large bias, low variance. Bad case: no bias, high variance.
22 Bias and Variance for MLE of the Mean. Let's compute the bias for the ML estimate of the mean: E[µ̂] = E[(1/n) Σ_{k=1}^{n} x_k] = (1/n) Σ_{k=1}^{n} E[x_k] = (1/n)·nµ = µ. Thus this estimate is unbiased! How about the variance of the ML estimate of the mean? E[(µ̂ − µ)²] = E[µ̂² − 2µµ̂ + µ²] = E[µ̂²] − 2µE[µ̂] + µ² = σ²/n. Thus the variance is very small for a large number of samples (the more samples, the smaller the variance). Thus the MLE of the mean is a very good estimator.
23 Bias and Variance for MLE of the Mean. Suppose someone claims they have a new great estimator for the mean: just take the first sample! µ̂ = x_1. This estimator is unbiased: E[µ̂] = E[x_1] = µ. However its variance is E[(µ̂ − µ)²] = E[(x_1 − µ)²] = σ². Thus the variance can be very large and does not improve as we increase the number of samples. [Figure: p(µ̂) with no bias but high variance.]
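A quick simulation makes the contrast concrete: both estimators are unbiased, but only the sample mean's variance shrinks with n. This is an illustrative sketch with hypothetical values, assuming NumPy; it is not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(5)
true_mu, sigma, n, trials = 5.0, 2.0, 100, 50_000

data = rng.normal(true_mu, sigma, size=(trials, n))
mean_est = data.mean(axis=1)     # MLE: average of all n samples
first_est = data[:, 0]           # "just take the first sample"

print(mean_est.mean(), first_est.mean())   # both ~ true_mu (unbiased)
print(mean_est.var(), sigma**2 / n)        # ~ sigma^2 / n
print(first_est.var(), sigma**2)           # ~ sigma^2, independent of n
```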
24 MLE Bias for Mean and Variance. How about the ML estimate of the variance? E[σ̂²] = E[(1/n) Σ_{k=1}^{n} (x_k − µ̂)²] = ((n−1)/n)·σ² ≠ σ². Thus this estimate is biased! This is because we used µ̂ instead of the true µ. Bias → 0 as n → infinity, so the estimate is asymptotically unbiased. Unbiased estimate: σ̂² = (1/(n−1)) Σ_{k=1}^{n} (x_k − µ̂)². The variance of the MLE of the variance can be shown to go to 0 as n goes to infinity.
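The bias factor (n−1)/n can be seen numerically by comparing the 1/n and 1/(n−1) versions over many repeated samples. A minimal sketch with made-up parameters (not from the slides), assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(6)
true_var, n, trials = 4.0, 10, 200_000

data = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
mle_var = data.var(axis=1, ddof=0)        # 1/n       -> biased MLE
unbiased_var = data.var(axis=1, ddof=1)   # 1/(n-1)   -> unbiased estimate

print(mle_var.mean(), (n - 1) / n * true_var)   # E[MLE] = ((n-1)/n) * sigma^2
print(unbiased_var.mean(), true_var)            # unbiased estimator averages to sigma^2
```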
25 MLE for Uniform Distribution U[0,θ]. X is U[0,θ] if its density is 1/θ inside [0,θ] and 0 otherwise (uniform distribution on [0,θ]): p(x|θ) = 1/θ for x in [0,θ], 0 otherwise. The likelihood is F(θ) = ∏_{k=1}^{n} p(x_k|θ) = 0 if θ < max{x_1,...,x_n}, and 1/θⁿ if θ ≥ max{x_1,...,x_n}. Since 1/θⁿ decreases in θ, θ̂ = argmax_θ ∏_k p(x_k|θ) = max{x_1,...,x_n}. This is not very pleasing, since θ surely should be larger than any observed x_k!
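A tiny simulation illustrates this unpleasant property: the MLE for the uniform is the sample maximum, which is always at or below the true θ. Hypothetical values, NumPy assumed; not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(7)
true_theta, n = 10.0, 20
x = rng.uniform(0.0, true_theta, size=n)

theta_hat = x.max()             # MLE for U[0, theta]
print(theta_hat, true_theta)    # theta_hat <= true_theta, so the MLE underestimates theta
```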
26 Bayesian Parameter Estimation. Suppose we have some idea of the range where the parameters θ should be. Shouldn't we formalize such prior knowledge in hopes that it will lead to better parameter estimation? Let θ be a random variable with prior distribution p(θ). This is the key difference between ML and Bayesian parameter estimation. This key assumption allows us to fully exploit the information provided by the data.
27 Bayesian Parameter Estimation. As in MLE, suppose p(x|θ) is completely specified if θ is given. But now θ is a random variable with prior p(θ). Unlike the MLE case, p(x|θ) is a conditional density. After we observe the data D, using Bayes rule we can compute the posterior p(θ|D). Recall that for the MAP classifier we find the class c_i that maximizes the posterior p(c|D). By analogy, a reasonable estimate of θ is the one that maximizes the posterior p(θ|D). But θ is not our final goal; our final goal is the unknown p(x). Therefore a better thing to do is to compute p(x|D) — this is as close as we can come to the unknown p(x)!
28 Bayesian Estimation: Formula for p(x|D). From the definition of the joint distribution: p(x|D) = ∫ p(x, θ|D) dθ. Using the definition of conditional probability: p(x|D) = ∫ p(x|θ, D) p(θ|D) dθ. But p(x|θ, D) = p(x|θ), since p(x|θ) is completely specified by θ. Thus p(x|D) = ∫ p(x|θ) p(θ|D) dθ, where p(x|θ) is known and p(θ|D) is unknown. Using Bayes formula: p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ, with p(D|θ) = ∏_{k=1}^{n} p(x_k|θ).
29 Bayesian Estimation vs. MLE. So in principle p(x|D) can be computed. In practice, it may be hard to do the integration analytically; we may have to resort to numerical methods: p(x|D) = ∫ p(x|θ) [ ∏_k p(x_k|θ) p(θ) / ∫ ∏_k p(x_k|θ) p(θ) dθ ] dθ. Contrast this with the MLE solution, which requires differentiation of the likelihood to get p(x|θ̂). Differentiation is easy and can always be done analytically.
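One simple numerical route, when the integrals have no closed form, is to discretize θ on a grid and replace each integral by a sum. The sketch below does this for a Gaussian likelihood with unknown mean and a broad prior; all values, the prior, and the grid are hypothetical assumptions, and only NumPy is used. It is an illustration of the recipe above, not the lecture's own method.

```python
import numpy as np

rng = np.random.default_rng(8)
sigma = 1.0
x_obs = rng.normal(3.0, sigma, size=15)            # observed data D

theta = np.linspace(-10, 10, 2001)                 # grid over the unknown mean theta
prior = np.exp(-theta**2 / (2 * 5.0**2))           # hypothetical broad Gaussian prior
prior /= np.trapz(prior, theta)

# p(theta|D) proportional to prod_k p(x_k|theta) * p(theta); normalize numerically
loglik = np.array([np.sum(-0.5*np.log(2*np.pi*sigma**2) - (x_obs - t)**2/(2*sigma**2))
                   for t in theta])
posterior = np.exp(loglik - loglik.max()) * prior
posterior /= np.trapz(posterior, theta)

# p(x|D) = integral of p(x|theta) p(theta|D) dtheta, evaluated at a new point x
x_new = 2.5
px_given_theta = np.exp(-(x_new - theta)**2/(2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)
px_given_D = np.trapz(px_given_theta * posterior, theta)
print(px_given_D)
```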
30 Bayesian Estimation vs. MLE. p(x|D) can be thought of as the weighted average of the proposed model over all possible values of θ: p(x|D) = ∫ p(x|θ) p(θ|D) dθ, where p(x|θ) is the proposed model with a certain θ and p(θ|D) is the support θ receives from the data. Contrast this with the MLE solution, which always gives us a single model: p(x|θ̂). When we have many possible solutions, taking their sum weighted by their probabilities seems better than spitting out one solution.
31 Bayesian Estimation: Example for U[0,θ]. Let X be U[0,θ]. Recall p(x|θ) = 1/θ inside [0,θ], else 0. Suppose we assume a U[0,10] prior on θ: p(θ) = 1/10 for θ in [0,10] — a good prior to use if we just know the range of θ but don't know anything else. We need to compute p(x|D) = ∫ p(x|θ) p(θ|D) dθ, with p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ and p(D|θ) = ∏_{k=1}^{n} p(x_k|θ).
32 Bayesian Estimation: Example for U[0,θ]. We need to compute p(x|D) = ∫ p(x|θ) p(θ|D) dθ using p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ and p(D|θ) = ∏_k p(x_k|θ). When computing the MLE of θ, we had p(D|θ) = 1/θⁿ for θ ≥ max{x_1,...,x_n}, and 0 otherwise. Thus p(θ|D) = c/θⁿ for max{x_1,...,x_n} ≤ θ ≤ 10, and 0 otherwise, where c is the normalizing constant, i.e. 1/c = ∫_{max{x_1,...,x_n}}^{10} (1/θⁿ) dθ.
33 Bayesian Estimation: Example for U[0,θ]. We need to compute p(x|D) = ∫ p(x|θ) p(θ|D) dθ with p(θ|D) = c/θⁿ for max{x_1,...,x_n} ≤ θ ≤ 10, 0 otherwise. We have 2 cases. Case 1: x < max{x_1, x_2, ..., x_n}: p(x|D) = ∫_{max{x_1,...,x_n}}^{10} (1/θ)(c/θⁿ) dθ, a constant independent of x. Case 2: x > max{x_1, x_2, ..., x_n}: p(x|D) = ∫_{x}^{10} (1/θ)(c/θⁿ) dθ = (c/n)(1/xⁿ − 1/10ⁿ), which decreases as x grows and vanishes at x = 10.
34 Bayesian Estimation: Example for U[0,θ]. [Figure: the ML density p(x|θ̂) and the Bayes density p(x|D) plotted together.] Note that even for x > max{x_1, x_2, ..., x_n}, the Bayes density is not zero, which makes sense. Curious fact: the Bayes density is not uniform, i.e. it does not have the functional form that we have assumed!
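The uniform example can be reproduced numerically on a grid, which makes the figure's point visible: the Bayes density p(x|D) stays positive beyond the largest observed sample, while the ML plug-in density p(x|θ̂) drops to zero there. This is a sketch under assumed values (true θ = 4, n = 10, U[0,10] prior), assuming NumPy; it is not the lecture's own code.

```python
import numpy as np

rng = np.random.default_rng(9)
true_theta, n = 4.0, 10
x_obs = rng.uniform(0.0, true_theta, size=n)
m = x_obs.max()                                    # MLE of theta

theta = np.linspace(1e-3, 10.0, 20001)             # U[0,10] prior on theta
post = np.where(theta >= m, theta**(-float(n)), 0.0)   # p(theta|D) prop. to 1/theta^n on [m,10]
post /= np.trapz(post, theta)

xs = np.linspace(0.0, 10.0, 500)
# p(x|D) = int p(x|theta) p(theta|D) dtheta, with p(x|theta) = 1/theta for theta >= x
px_D = np.array([np.trapz(np.where(theta >= max(xv, m), 1.0/theta, 0.0) * post, theta)
                 for xv in xs])

px_ML = np.where(xs <= m, 1.0/m, 0.0)              # ML plug-in density p(x|theta_hat)
print(px_D[xs > m][:3])                            # nonzero beyond the largest observed sample
print(px_ML[xs > m][:3])                           # exactly zero there
```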
35 ML vs. Bayesian Estimation with Broad Prior. Suppose p(θ) is flat and broad (close to a uniform prior). p(θ|D) tends to sharpen around a peak if there is a lot of data. Thus p(D|θ) ∝ p(θ|D)/p(θ) will have the same sharp peak as p(θ|D). But by definition, the peak of p(D|θ) is the ML estimate θ̂. The integral is dominated by the peak: p(x|D) = ∫ p(x|θ) p(θ|D) dθ ≈ p(x|θ̂) ∫ p(θ|D) dθ = p(x|θ̂). Thus as n goes to infinity, the Bayesian estimate approaches the density corresponding to the MLE!
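The sharpening of p(θ|D) around the MLE with growing n can be observed with a small grid experiment. A sketch with a flat prior and hypothetical data sizes, assuming NumPy; not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
theta_grid = np.linspace(-20, 20, 4001)      # grid over the unknown mean
prior = np.ones_like(theta_grid)             # flat, broad prior on [-20, 20]
prior /= np.trapz(prior, theta_grid)

for n in (5, 50, 500):
    x = rng.normal(5.0, sigma, size=n)
    loglik = np.array([np.sum(-0.5*np.log(2*np.pi*sigma**2) - (x - t)**2/(2*sigma**2))
                       for t in theta_grid])
    post = np.exp(loglik - loglik.max()) * prior
    post /= np.trapz(post, theta_grid)
    # the posterior peak moves onto the MLE (sample mean) and the posterior narrows as n grows
    print(n, theta_grid[np.argmax(post)], x.mean())
```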
36 ML vs. Bayesian Estimation: General Prior. Maximum Likelihood Estimation: easy to compute (use differential calculus); easy to interpret (returns one model); p(x|θ̂) has the assumed parametric form. Bayesian Estimation: hard to compute (needs multidimensional integration); harder to interpret (returns a weighted average of models); p(x|D) does not necessarily have the assumed parametric form; can give better results since it uses more information about the problem (prior information).