ECE 8527: Introduction to Machine Learning and Pattern Recognition, Midterm #1. Vaishali Amin, Fall 2015
tue39624@temple.edu

Problem No. 1: Consider a two-class discrete distribution problem:

ω1: {[0,0], [2,0], [2,2], [0,2]}
ω2: {[1,1], [2,1], [1,2], [3,3]}

(a) Compute the minimum achievable error rate by a linear machine (hint: draw a picture of the data). Assume the classes are equiprobable.

(b) Assume the priors for each class are: P(ω1) = α and P(ω2) = 1 - α. Sketch P(E) as a function of α for a maximum likelihood classifier based on the assumption that each class is drawn from a multivariate Gaussian distribution. Compare and contrast your answer with your answer to (a). Be very specific in your sketch and label all critical points. Unlabeled plots will receive no partial credit.

(c) Assume you are not constrained to a linear machine. What is the minimum achievable error rate that can be achieved for this data? Is this value different than (a)? If so, why? How might you achieve such a solution? Compare and contrast this solution to (a).

Solution:

(a) Let us assume a Gaussian model for the data in each class; without additional knowledge, we assume the simplest possible model.

Mean of class-1 data: μ1 = (1/n) Σ_{i=1}^{n} x_i = [1, 1], where n = number of samples = 4
Mean of class-2 data: μ2 = (1/n) Σ_{j=1}^{n} x_j = [1.75, 1.75], where n = number of samples = 4
Covariance of class-1: Σ1 = [1.33, 0; 0, 1.33]
Covariance of class-2: Σ2 = [0.92, 0.58; 0.58, 0.92]

P(E1) = probability of error for class-1
P(E2) = probability of error for class-2
P(E) = average probability of error for the given two-class classification
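The class statistics quoted above can be cross-checked numerically. The sketch below is in Python rather than the Matlab used later in this document, with the 2x2 arithmetic written out by hand so it needs no libraries; the helper names are ours, not part of the original solution.

```python
# Cross-check of the class means and sample covariances (divide by n - 1,
# as Matlab's cov() does) for the two classes in Problem 1.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def cov(points):
    # unbiased 2x2 sample covariance
    m = mean(points)
    n = len(points)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / (n - 1)
    return c

w1 = [[0, 0], [2, 0], [2, 2], [0, 2]]
w2 = [[1, 1], [2, 1], [1, 2], [3, 3]]

print(mean(w1))  # [1.0, 1.0]
print(mean(w2))  # [1.75, 1.75]
print(cov(w1))   # diagonal ≈ 4/3 ≈ 1.33, off-diagonal 0
print(cov(w2))   # ≈ [[0.92, 0.58], [0.58, 0.92]]
```

The printed values match μ1, μ2, Σ1 and Σ2 above (0.92 ≈ 11/12 and 0.58 ≈ 7/12 after rounding).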
Approach-1: One intuitive approach to this problem is to plot the given data. By observing the data, we can determine a linear threshold that achieves the minimum error rate.

From μ1 and μ2 we can determine the equation of the line joining the two means: y = x, with slope m = 1. Intuition suggests that a line perpendicular to the line joining the two means, passing through a point on the segment between them, may give such a linear threshold. The slope of this perpendicular line is m = -1.

Threshold-1: a line drawn at -45° from the x-axis, y + x = α, where 2 ≤ α < 3.
Decision: choose ω1 for x + y ≤ α; choose ω2 for x + y > α.
One such threshold, y + x = 2, is shown in Figure-1 in green.

Error for class-1: X ∈ ω1 with x + y > α.
Error for class-2: X ∈ ω2 with x + y ≤ α.

P(E1) = (number of misclassified class-1 samples) / (total class-1 samples)   ...Eq-1.1
      = 1/4 = 0.25
P(E2) = (number of misclassified class-2 samples) / (total class-2 samples)   ...Eq-1.2
      = 1/4 = 0.25
P(E) = (P(E1) + P(E2)) / 2 = (0.25 + 0.25) / 2   ...Eq-1.3

Minimum probability of error using a linear machine: P(E) = 0.25   Answer

Threshold-2 (shown in Figure-1 in pink): a line drawn at -45° from the x-axis, y + x = 3.
Decision: choose ω1 for x + y < 3; choose ω2 for x + y ≥ 3.
Error for class-1: X ∈ ω1 with x + y ≥ 3.
Error for class-2: X ∈ ω2 with x + y < 3.
Using Eq-1.1, 1.2 and 1.3: P(E1) = 0.25, P(E2) = 0.25 and P(E) = 0.25.

Minimum probability of error using a linear machine: P(E) = 0.25   Answer
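The error counts for these thresholds can be reproduced by brute force. A small Python sketch; the `error_rate` helper is an illustration of ours, not part of the original solution:

```python
# Empirical error of the linear rule "decide w1 if x + y <= alpha" on the
# training points, averaged over the two equiprobable classes.

w1 = [[0, 0], [2, 0], [2, 2], [0, 2]]
w2 = [[1, 1], [2, 1], [1, 2], [3, 3]]

def error_rate(alpha):
    e1 = sum(1 for x, y in w1 if x + y > alpha) / len(w1)   # w1 sent to w2
    e2 = sum(1 for x, y in w2 if x + y <= alpha) / len(w2)  # w2 sent to w1
    return (e1 + e2) / 2  # equal priors

print(error_rate(2))    # 0.25
print(error_rate(2.5))  # 0.25
```

Any alpha in [2, 3) misclassifies exactly [2, 2] from ω1 and [1, 1] from ω2, giving P(E) = 0.25 as derived above.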
[Figure-1: scatter plot of the two classes with the linear thresholds y + x = 2 (green) and y + x = 3 (pink)]

Generalized approach: It is not possible to find a linear discriminant function (threshold) for a large amount of data just by observation. With knowledge of parameters such as the mean and covariance of the data, a discriminant function for the two-class problem that minimizes the error rate can be obtained from the decision boundary

g_i(X) - g_j(X) = X^t A X + b^t X + c = 0

(b) Prior for class-1: P(ω1) = α, and prior for class-2: P(ω2) = 1 - α, where 0 ≤ α ≤ 1.

The conditional probability of error is given by:

P(error | x) = P(ω2 | x) if we decide ω1; P(ω1 | x) if we decide ω2.
For a maximum likelihood classifier, we use the following Bayes decision rule to minimize the probability of error:

decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

Bayes formula:

P(ωj | x) = p(x | ωj) P(ωj) / p(x)

where the evidence p(x) is a scale factor and can be taken as 1 for the decision; it ensures that P(ω1 | x) + P(ω2 | x) = 1. Hence the decision rule for the maximum likelihood classifier is:

decide ω1 if p(x | ω1) P(ω1) > p(x | ω2) P(ω2); otherwise decide ω2

Hence, P(error | x) = min[P(ω1 | x), P(ω2 | x)]   ...Eq-1.4

Assume p(x | ω1) = p(x | ω2), i.e. the two class-conditional densities are equal. Then the measurement x gives no useful information and the decision is based entirely on the prior information. In this case the decision rule for the maximum likelihood classifier is:

decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

Under this assumption,

P(E) = min[P(ω1), P(ω2)] = min[α, 1 - α]   ...Eq-1.5

Figure-2 shows the plot of the probability of error P(E), calculated using Eq-1.5, as a function of α.

Critical points:
The maximum probability of error occurs at α = 0.5, when P(ω1) = P(ω2) = 0.5. With equal priors the uncertainty associated with the classification is maximum, and hence so is the error: P(E) = 0.5.
For α = 1: P(ω1) = 1 and P(ω2) = 0. For α = 0: P(ω1) = 0 and P(ω2) = 1. For these values of α the classification is completely predictable, since only data of one class exists at a time. Hence P(E) = 0 for α = 0 and α = 1.

Comparison/contrast with the solution in part (a): The solution in part (a) assumes the classes are equiprobable and uses the measurements, whereas in part (b) the class-conditional densities are assumed equal, so the decision relies only on the priors. Hence, for P(ω1) = P(ω2) = 0.5, better error rates can be achieved with the solution in part (a). For the solution in part (a), if the classes are not equiprobable, the decision surface shifts away from the more likely class, and towards the less likely class, to reach the minimum error rate. Both solutions assume a zero-one loss function.
If both classes are equiprobable, but the cost associated with misclassifying one class is higher than that of the other, the decision surface shifts away from the class with the higher misclassification cost, towards the other class.
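The prior-only error of Eq-1.5 is simple enough to check numerically. A minimal sketch (the function name is ours):

```python
# Eq-1.5: with equal class-conditional densities, the error of the
# prior-only decision rule is P(E) = min(alpha, 1 - alpha).

def p_error(alpha):
    return min(alpha, 1 - alpha)

for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(alpha, p_error(alpha))  # 0.0, 0.25, 0.5, 0.25, 0.0
```

The output traces exactly the tent-shaped curve described at the critical points: a peak of 0.5 at alpha = 0.5 and zero error at alpha = 0 and alpha = 1.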
[Figure-2: plot of P(E) = min(α, 1 - α) versus α, peaking at P(E) = 0.5 for α = 0.5 and reaching 0 at α = 0 and α = 1]

(c) Minimum achievable error rate: A minimum error rate of P(E) = 0 can be achieved using a highly non-linear surface, shown in Figure-3. For the given data, this type of surface can be obtained using a Support Vector Machine (SVM).

Comparison/contrast with the solution in part (a): Highly non-linear and complicated decision surfaces, such as the one shown in Figure-3, may classify the training samples perfectly, giving a 0% error rate, but may perform poorly on future patterns because they lack generalization; the linear decision surface shown in Figure-1 of part (a) generalizes the data better. The solution in part (c) demands complex models tuned to the particular training samples, whereas the classifier from part (a) uses a simple model based on underlying characteristics of the data.
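The solution names a kernel SVM for the zero-error surface. As a minimal illustration that a sufficiently flexible non-linear rule separates this training set perfectly (using a 1-nearest-neighbour classifier, not the SVM itself):

```python
# 1-NN achieves zero training error on Problem 1's data: every training
# point is its own nearest neighbour, and no point appears in both classes.

train = [([0, 0], 1), ([2, 0], 1), ([2, 2], 1), ([0, 2], 1),
         ([1, 1], 2), ([2, 1], 2), ([1, 2], 2), ([3, 3], 2)]

def predict(p):
    # label of the closest training point (squared Euclidean distance)
    return min(train,
               key=lambda t: (t[0][0] - p[0])**2 + (t[0][1] - p[1])**2)[1]

errors = sum(1 for point, label in train if predict(point) != label)
print(errors)  # 0
```

This also illustrates the generalization caveat above: zero training error here says nothing about error on future patterns.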
[Figure-3: a highly non-linear decision surface separating the two classes with zero training error]

Problem No. 2: Suppose we have a random sample X1, X2, ..., Xn where: Xi = 0 if a randomly selected student does not own a laptop, and Xi = 1 if a randomly selected student does own a laptop. Assuming that the Xi are independent Bernoulli random variables with unknown parameter p:

p(x; p) = p^{x_i} (1 - p)^{1 - x_i}

where x_i = 0 or 1 and 0 < p < 1. Find the maximum likelihood estimator of p, the proportion of students who own a laptop.

Solution: The probability mass function for a Bernoulli random variable is

f(x_i; p) = p^{x_i} (1 - p)^{1 - x_i}

Let us define a set D of training samples drawn independently from the probability density p(x; p) to estimate the unknown parameter p:

D = {x1, x2, ..., xn}, where the values of x1, x2, ..., xn are known.
Because these samples are drawn independently, the likelihood function of the unknown parameter p is

f(D | p) = Π_{i=1}^{n} f(x_i; p)
         = f(x1; p) f(x2; p) f(x3; p) ... f(xn; p)
         = p^{x1} (1-p)^{1-x1} · p^{x2} (1-p)^{1-x2} · ... · p^{xn} (1-p)^{1-xn}

f(D | p) = p^{Σ_{i=1}^{n} x_i} (1 - p)^{n - Σ_{i=1}^{n} x_i}   ...Eq-2.1

Now, the natural logarithm is an increasing function of x, i.e. for x1 > x2, ln(x1) > ln(x2). Hence the value of p which maximizes the natural logarithm of the likelihood function, ln f(D | p), is also the value of p which maximizes the likelihood function f(D | p).

Taking the natural log of both sides of Eq-2.1:

ln f(D | p) = (Σ_{i=1}^{n} x_i) ln p + (n - Σ_{i=1}^{n} x_i) ln(1 - p)   ...Eq-2.2

To find the maximum of ln f(D | p), we differentiate Eq-2.2 with respect to the unknown parameter p and set the derivative to 0:

d(ln f(D | p)) / dp = (Σ_{i=1}^{n} x_i) / p - (n - Σ_{i=1}^{n} x_i) / (1 - p) = 0
(1 - p) Σ_{i=1}^{n} x_i - p (n - Σ_{i=1}^{n} x_i) = 0
Σ_{i=1}^{n} x_i - n p = 0
p̂ = (1/n) Σ_{i=1}^{n} x_i

The maximum likelihood estimator of p, the proportion of students who own a laptop, is

p̂ = (1/n) Σ_{i=1}^{n} X_i   Answer

However, to confirm that the solution p̂ is a true maximum, we need to take the second derivative of Eq-2.2 with respect to p. The derivative must be negative for our estimate p̂ to be a maximum.
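The closed form just derived, p̂ = (1/n) Σ x_i, is simply the sample proportion. A one-line sketch on a made-up sample (the data are invented for illustration, not part of the problem):

```python
# MLE for the Bernoulli parameter: the fraction of 1s in the sample.

sample = [1, 0, 1, 1, 0, 1, 1, 1]  # 1 = owns a laptop (hypothetical data)

p_hat = sum(sample) / len(sample)
print(p_hat)  # 0.75
```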
d²(ln f(D | p)) / dp² = -(Σ_{i=1}^{n} x_i) / p² - (n - Σ_{i=1}^{n} x_i) / (1 - p)² < 0   ...Eq-2.3

Eq-2.3 confirms that the estimate p̂ maximizes the likelihood of the proportion of students who own a laptop.

Problem No. 3: Let us assume you have a 2D Gaussian source which generates random vectors of the form [x1, x2]. You observe the following data: [1, 1], [2, 2], [3, 3]. You were told the mean of this source was 0 and the standard deviation was 1.

(a) Using Bayesian estimation techniques, what is your best estimate of the mean based on these observations?

(b) Now, suppose you observe a 4th value: [0, 0]. How does this impact your estimate of the mean? Explain, being as specific as possible. Support your explanation with calculations and equations.

Solution:

(a) Derivation of the Bayesian estimate of the unknown mean, based on n observations:

D = {x1, x2, ..., xn} is a set of independent samples. Let us assume only μ is unknown. For the given bivariate case, p(x | μ) ~ N(μ, Σ), with known prior density p(μ) ~ N(μ0, Σ0), where Σ, μ0 and Σ0 are assumed known.

Applying Bayes formula to find the posterior density p(μ | D):

p(μ | D) = p(D | μ) p(μ) / ∫ p(D | μ) p(μ) dμ = α Π_{k=1}^{n} p(x_k | μ) p(μ)
         = α' exp( -(1/2) [ (μ - μ0)^t Σ0^{-1} (μ - μ0) + Σ_{k=1}^{n} (x_k - μ)^t Σ^{-1} (x_k - μ) ] )

which has the form

p(μ | D) = α'' exp( -(1/2) (μ - μ_n)^t Σ_n^{-1} (μ - μ_n) )

Here α (and α', α'') are normalization factors which depend on D but are independent of μ. Hence p(μ | D) ~ N(μ_n, Σ_n). Equating the coefficients between the two Gaussians, we obtain the following:

Σ_n^{-1} = Σ0^{-1} + n Σ^{-1}
Σ_n^{-1} μ_n = Σ0^{-1} μ0 + n Σ^{-1} μ̂_n   ...Eq-3.1
where μ̂_n = (1/n) Σ_{k=1}^{n} x_k is the sample mean. Using matrix identities and a little manipulation, the solution of Eq-3.1 is given by

μ_n = Σ0 (Σ0 + (1/n)Σ)^{-1} μ̂_n + (1/n)Σ (Σ0 + (1/n)Σ)^{-1} μ0
Σ_n = Σ0 (Σ0 + (1/n)Σ)^{-1} (1/n)Σ   ...Eq-3.2

In Eq-3.2, μ_n represents our best guess for μ after observing n samples, and Σ_n measures our uncertainty about this guess. Eq-3.2 shows how the prior information is combined with the empirical information in the samples to obtain the posterior density p(μ | D) using Bayesian parameter estimation techniques.

For the given bivariate problem: x1 = [1, 1], x2 = [2, 2], x3 = [3, 3], n = 3, Σ0 = [1, 0; 0, 1], μ0 = [0, 0].

Covariance matrix of the observed samples: Σ = [1, 1; 1, 1]

Sample mean: μ̂_3 = (1/3) Σ_{k=1}^{3} x_k = ([1, 1] + [2, 2] + [3, 3]) / 3 = [2, 2]

Using Eq-3.2, our best guess for μ after observing n = 3 samples is

μ_3 = [1.2, 1.2]   Answer

Uncertainty about this guess: Σ_3 = [0.20, 0.20; 0.20, 0.20]

The above was calculated using the following Matlab code:

% Part (a)
n_1 = 3;
Sigma_Sample = cov([1 1; 2 2; 3 3]);
Sigma_0 = [1, 0; 0, 1];
Mean_0 = [0; 0];
Mean_sample_1 = [2; 2];
Mean = (Sigma_0*inv(Sigma_0 + Sigma_Sample/n_1))*Mean_sample_1 + ...
       (Sigma_Sample*inv(Sigma_0 + Sigma_Sample/n_1)*Mean_0)/n_1;
Covariance = Sigma_0*inv(Sigma_0 + Sigma_Sample/n_1)*(Sigma_Sample/n_1);
display(Mean);
display(Covariance);

(b) Impact on the estimate of the mean μ of adding another observation point, x4 = [0, 0]:

Sample mean: μ̂_4 = (1/4) Σ_{k=1}^{4} x_k = ([1, 1] + [2, 2] + [3, 3] + [0, 0]) / 4 = [1.5, 1.5]
Matlab code:

n_2 = 4;
Mean_sample_2 = [1.5; 1.5];
Mean_2 = (Sigma_0*inv(Sigma_0 + Sigma_Sample/n_2))*Mean_sample_2 + ...
         (Sigma_Sample*inv(Sigma_0 + Sigma_Sample/n_2)*Mean_0)/n_2;
Covariance_2 = Sigma_0*inv(Sigma_0 + Sigma_Sample/n_2)*(Sigma_Sample/n_2);
display(Mean_2);
display(Covariance_2);

(Sigma_Sample, the covariance of the first three observations, is reused here.)

Using Eq-3.2 and the Matlab code above, our best guess for μ after observing n = 4 samples is

μ_4 = [1.0, 1.0]   Answer

Uncertainty about this guess: Σ_4 = [0.17, 0.17; 0.17, 0.17]

Observations: Comparing the answers of parts (a) and (b), we can see that as the number of observations (data points) increases, the best estimate of the mean moves closer to the sample mean. The uncertainty associated with the estimate also shrinks as more data points are observed, i.e., the best estimate of the mean converges to the true mean.

Proof: From Eq-3.2 it is clear that Σ_n decreases monotonically with n. Hence each additional observation decreases our uncertainty about the true value of μ. As n approaches infinity, Σ_n → 0, i.e. the uncertainty associated with our best estimate of the mean approaches zero. In the equation for μ_n, the term Σ0(Σ0 + (1/n)Σ)^{-1} μ̂_n approaches μ̂_n, and the term (1/n)Σ(Σ0 + (1/n)Σ)^{-1} μ0 approaches 0. Hence our best estimate of the mean, μ_n, approaches the sample mean μ̂_n with zero measurement uncertainty, and its reliance on the prior information μ0 decreases, provided Σ0 is not the zero matrix (which is the case for most observations). The posterior density p(μ | D) becomes more and more sharply peaked, approaching a Dirac delta function as n approaches infinity.

Special cases: If Σ0 = 0 (a rare possibility), we have a degenerate case in which our prior certainty that μ = μ0 is so strong that no number of observations can change our opinion. If Σ0 >> Σ, we are extremely uncertain about our prior guess and we would take μ_n = μ̂_n, using only the samples to estimate μ.

Generalization: As long as the ratio Σ/Σ0 is not infinite, after observing a sufficiently large number of samples, the prior information μ0 and Σ0 become unimportant, and μ_n converges to the sample mean μ̂_n.
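As a cross-check of Eq-3.2 and both Matlab results, here is a dependency-free Python sketch (pure Python, with the 2x2 linear algebra written out by hand; the helper names are ours). It reuses Σ = [1, 1; 1, 1] for both n = 3 and n = 4, exactly as the Matlab code does, and exploits that with Σ0 = I and μ0 = 0 the μ0 term of Eq-3.2 vanishes.

```python
# Pure-Python evaluation of Eq-3.2 with Sigma_0 = I, mu_0 = 0 and the
# sample covariance Sigma = [[1, 1], [1, 1]] held fixed:
#   mu_n    = (Sigma_0 + Sigma/n)^-1 * mu_hat
#   Sigma_n = (Sigma_0 + Sigma/n)^-1 * (Sigma/n)

def inv2(a):
    # inverse of a 2x2 matrix
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[ a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det,  a[0][0] / det]]

def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1],
            a[1][0] * v[0] + a[1][1] * v[1]]

def matmat(a, b):
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

Sigma = [[1.0, 1.0], [1.0, 1.0]]

def posterior(n, mu_hat):
    A = [[1.0 + Sigma[0][0] / n, Sigma[0][1] / n],
         [Sigma[1][0] / n, 1.0 + Sigma[1][1] / n]]   # Sigma_0 + Sigma/n
    S_over_n = [[Sigma[i][j] / n for j in range(2)] for i in range(2)]
    return matvec(inv2(A), mu_hat), matmat(inv2(A), S_over_n)

mu3, S3 = posterior(3, [2.0, 2.0])  # part (a): mu ≈ [1.2, 1.2], S entries ≈ 0.20
mu4, S4 = posterior(4, [1.5, 1.5])  # part (b): mu ≈ [1.0, 1.0], S entries ≈ 0.17
print(mu3, S3)
print(mu4, S4)
```

Calling posterior with ever larger n (and the same sample mean) shows the behaviour proved above: the returned mean approaches the sample mean and the posterior covariance entries shrink toward zero.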
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationThe picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled
1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how
More informationThe standard deviation of the mean
Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider
More informationCEU Department of Economics Econometrics 1, Problem Set 1 - Solutions
CEU Departmet of Ecoomics Ecoometrics, Problem Set - Solutios Part A. Exogeeity - edogeeity The liear coditioal expectatio (CE) model has the followig form: We would like to estimate the effect of some
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationNO! This is not evidence in favor of ESP. We are rejecting the (null) hypothesis that the results are
Hypothesis Testig Suppose you are ivestigatig extra sesory perceptio (ESP) You give someoe a test where they guess the color of card 100 times They are correct 90 times For guessig at radom you would expect
More informationSolutions to Odd Numbered End of Chapter Exercises: Chapter 4
Itroductio to Ecoometrics (3 rd Updated Editio) by James H. Stock ad Mark W. Watso Solutios to Odd Numbered Ed of Chapter Exercises: Chapter 4 (This versio July 2, 24) Stock/Watso - Itroductio to Ecoometrics
More informationMA Advanced Econometrics: Properties of Least Squares Estimators
MA Advaced Ecoometrics: Properties of Least Squares Estimators Karl Whela School of Ecoomics, UCD February 5, 20 Karl Whela UCD Least Squares Estimators February 5, 20 / 5 Part I Least Squares: Some Fiite-Sample
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationDiscrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 18
EECS 70 Discrete Mathematics ad Probability Theory Sprig 2013 Aat Sahai Lecture 18 Iferece Oe of the major uses of probability is to provide a systematic framework to perform iferece uder ucertaity. A
More informationDepartment of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution
Departmet of Civil Egieerig-I.I.T. Delhi CEL 899: Evirometal Risk Assessmet HW5 Solutio Note: Assume missig data (if ay) ad metio the same. Q. Suppose X has a ormal distributio defied as N (mea=5, variace=
More informationApproximations and more PMFs and PDFs
Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.
More informationMBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS
MBACATÓLICA Quatitative Methods Miguel Gouveia Mauel Leite Moteiro Faculdade de Ciêcias Ecoómicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS MBACatólica 006/07 Métodos Quatitativos
More informationTopic 18: Composite Hypotheses
Toc 18: November, 211 Simple hypotheses limit us to a decisio betwee oe of two possible states of ature. This limitatio does ot allow us, uder the procedures of hypothesis testig to address the basic questio:
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationChapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more
More informationTopic 1 2: Sequences and Series. A sequence is an ordered list of numbers, e.g. 1, 2, 4, 8, 16, or
Topic : Sequeces ad Series A sequece is a ordered list of umbers, e.g.,,, 8, 6, or,,,.... A series is a sum of the terms of a sequece, e.g. + + + 8 + 6 + or... Sigma Notatio b The otatio f ( k) is shorthad
More informationTopics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion
.87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses
More informationECO 312 Fall 2013 Chris Sims LIKELIHOOD, POSTERIORS, DIAGNOSING NON-NORMALITY
ECO 312 Fall 2013 Chris Sims LIKELIHOOD, POSTERIORS, DIAGNOSING NON-NORMALITY (1) A distributio that allows asymmetry differet probabilities for egative ad positive outliers is the asymmetric double expoetial,
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More information1 Models for Matched Pairs
1 Models for Matched Pairs Matched pairs occur whe we aalyse samples such that for each measuremet i oe of the samples there is a measuremet i the other sample that directly relates to the measuremet i
More informationECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors
ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic
More information