Lecture 22: Review for Exam 2
1 Basic Model Assumptions (without Gaussian Noise)

We model one continuous response variable Y as a linear function of p numerical predictors, plus noise:

Y = β_0 + β_1 X_1 + ... + β_p X_p + ε.    (1)

Linearity is an assumption, which can be wrong. Further assumptions take the form of restrictions on the noise: E[ε|X] = 0, Var[ε|X] = σ^2. Moreover, we assume ε is uncorrelated across observations.

We convert this to matrix form:

Y = Xβ + ε    (2)

Y is an n×1 matrix of random variables; X is an n×(p+1) matrix, with an extra column of all 1s; ε is an n×1 matrix. Beyond linearity, the assumptions translate to

E[ε|X] = 0,  Var[ε|X] = σ^2 I.    (3)

We don't know β. If we guess it is b, we will make a vector of predictions Xb and have a vector of errors Y − Xb. The mean squared error, as a function of b, is then

MSE(b) = (1/n)(Y − Xb)^T (Y − Xb).    (4)

2 Least Squares Estimation and Its Properties

The least squares estimate of the coefficients is the one which minimizes the MSE:

β̂ ≡ argmin_b MSE(b).    (5)

To find this, we need the derivatives:

∇_b MSE = −(2/n)(X^T Y − X^T X b).    (6)

We set the derivative to zero at the optimum:

(1/n)(X^T Y − X^T X β̂) = 0.    (7)

The term in parentheses is the vector of errors when we use the least-squares estimate. This is the vector of residuals,

e ≡ Y − Xβ̂    (8)

so then we have the normal, estimating or score equations,

(1/n) X^T e = 0.    (9)
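A minimal numerical sketch of the estimating equations (9), on simulated data (all variable names and the choice of β and σ here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p+1), first column all 1s
beta = np.array([2.0, 1.0, -0.5, 0.3])
Y = X @ beta + rng.normal(scale=0.5, size=n)

# Solve the normal equations X^T X b = X^T Y for the least-squares estimate.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The residuals satisfy the score equations: X^T e = 0 (up to rounding error).
e = Y - X @ beta_hat
print(np.max(np.abs(X.T @ e)))
```

(In practice `np.linalg.lstsq` solves the same minimization with better numerical behavior than forming X^T X explicitly.)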
We say equations, plural, because this is equivalent to the set of p+1 equations

(1/n) Σ_{i=1}^n e_i = 0    (10)
(1/n) Σ_{i=1}^n e_i X_ij = 0    (11)

(Many people omit the factor of 1/n.) This tells us that while e is an n-dimensional vector, it is subject to p+1 linear constraints, so it is confined to a linear subspace of dimension n − (p+1). Thus n − (p+1) is the number of residual degrees of freedom.

The solution to the estimating equations is

β̂ = (X^T X)^{-1} X^T Y.    (12)

This is one of the two most important equations in the whole subject. It says that the coefficients are a linear function of the response vector Y.

The least squares estimate is a constant plus noise:

β̂ = (X^T X)^{-1} X^T Y    (13)
  = (X^T X)^{-1} X^T (Xβ + ε)    (14)
  = (X^T X)^{-1} X^T X β + (X^T X)^{-1} X^T ε    (15)
  = β + (X^T X)^{-1} X^T ε.    (16)

The least squares estimate is unbiased:

E[β̂] = β + (X^T X)^{-1} X^T E[ε] = β.    (17)

Its variance is

Var[β̂] = σ^2 (X^T X)^{-1}.    (18)

Since the entries in X^T X are usually proportional to n, it can be helpful to write this as

Var[β̂] = (σ^2/n) (n^{-1} X^T X)^{-1}.    (19)

The variance of any one coefficient estimator is

Var[β̂_i] = (σ^2/n) ((n^{-1} X^T X)^{-1})_{i+1,i+1}.    (20)

The vector of fitted means or conditional values is

Ŷ ≡ Xβ̂.    (21)

This is more conveniently expressed in terms of the original matrices:

Ŷ = X(X^T X)^{-1} X^T Y = HY.    (22)
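The unbiasedness and variance of the least-squares estimate, and the basic algebra of the hat matrix, can be checked by simulation; this sketch holds the design X fixed and redraws the noise many times (σ = 1 and the other values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # fixed design
beta = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Draw many response vectors from the model (sigma = 1), re-estimating each time.
draws = np.empty((5000, p + 1))
for s in range(5000):
    Y = X @ beta + rng.normal(size=n)
    draws[s] = XtX_inv @ X.T @ Y

print(draws.mean(axis=0))   # close to beta: unbiasedness
print(np.cov(draws.T))      # close to sigma^2 (X^T X)^{-1}

# The hat matrix is symmetric and idempotent, with trace p + 1:
H = X @ XtX_inv @ X.T
print(np.allclose(H, H.T), np.allclose(H @ H, H), round(np.trace(H), 6))
```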
The fitted values are thus linear in Y: set the responses all to zero and all the fitted values will be zero; double all the responses and all the fitted values will double. The hat matrix H ≡ X(X^T X)^{-1} X^T, also called the influence, projection or prediction matrix, controls the fitted values. It is a function of X alone, ignoring the response variable totally. It is an n×n matrix with several important properties:

- It is symmetric, H^T = H.
- It is idempotent, H^2 = H.
- Its trace tr H = Σ_i H_ii = p + 1, the number of degrees of freedom for the fitted values.

The variance-covariance matrix of the fitted values is

Var[Ŷ] = H σ^2 I H^T = σ^2 H.    (23)

To make a prediction at a new point, not in the data used for estimation, we take its predictor coordinates and group them into a 1×(p+1) matrix X_new (including the 1 for the intercept). The point prediction for Y is then X_new β̂. The expected value is X_new β, and the variance is Var[X_new β̂] = X_new Var[β̂] X_new^T = σ^2 X_new (X^T X)^{-1} X_new^T.

The residuals are also linear in the response:

e ≡ Y − m̂ = (I − H)Y.    (24)

The trace of I − H is n − (p+1). The variance-covariance matrix of the residuals is

Var[e] = σ^2 (I − H).    (25)

The mean squared error (training error) is

MSE = (1/n) Σ_{i=1}^n e_i^2 = (1/n) e^T e.    (26)

Its expectation value is slightly below σ^2:

E[MSE] = σ^2 (n − (p+1))/n.    (27)

(This may be proved using the trace of I − H.) An unbiased estimate of σ^2, which I will call σ̂^2 throughout the rest of this, is

σ̂^2 ≡ (n/(n − (p+1))) MSE.    (28)

The leverage of data point i is H_ii. This has several interpretations:

1. Var[Ŷ_i] = σ^2 H_ii; the leverage controls how much variance there is in the fitted value.
2. ∂Ŷ_i/∂Y_i = H_ii; the leverage says how much changing the response value for point i changes the fitted value there.
3. Cov[Ŷ_i, Y_i] = σ^2 H_ii; the leverage says how much covariance there is between the i-th response and the i-th fitted value.
4. Var[e_i] = σ^2 (1 − H_ii); the leverage controls how big the i-th residual is.

The standardized residual is

r_i = e_i / (σ̂ √(1 − H_ii)).    (29)

The only restriction we have to impose on the predictor variables X_i is that (X^T X)^{-1} needs to exist. This is equivalent to

- X is not collinear: no one of its columns is a linear combination of the other columns;

which is also equivalent to

- The eigenvalues of X^T X are all > 0.

(If there are zero eigenvalues, the corresponding eigenvectors indicate linearly-dependent combinations of predictor variables.) Nearly-collinear predictor variables tend to lead to large variances for coefficient estimates, with high levels of correlation among the estimates.

It is perfectly OK for one column of X to be a function of another, provided it is a nonlinear function. Thus in polynomial regression we add extra columns for powers of one or more of the predictor variables. (Any other nonlinear function is however also legitimate.) This complicates the interpretation of coefficients as slopes, just as though we had done a transformation of a column. Estimation and inference for the coefficients on these predictor variables goes exactly like estimation and inference for any other coefficient.

One column of X could be a (nonlinear) function of two or more of the other columns; this is how we represent interactions. Usually the interaction column is just a product of two other columns, for a product or multiplicative interaction; this also complicates the interpretation of coefficients as slopes. (See the notes on interactions.) Estimation and inference for the coefficients on these predictor variables goes exactly like estimation and inference for any other coefficient.

We can include qualitative predictor variables with k discrete categories or levels by introducing binary indicator variables for k − 1 of the levels, and adding them to X.
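The indicator-variable encoding can be sketched numerically; here a three-level qualitative predictor gets two dummy columns (the level names, coefficients, and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 90
# Qualitative predictor with k = 3 levels; take "a" as the baseline category.
levels = np.array(["a", "b", "c"])[rng.integers(0, 3, size=n)]
x = rng.normal(size=n)

# Design matrix: intercept, k - 1 = 2 indicator columns, one numerical predictor.
X = np.column_stack([np.ones(n),
                     (levels == "b").astype(float),
                     (levels == "c").astype(float),
                     x])
# Simulate: category "b" shifts the response by +2, "c" by -1, common slope 0.5.
Y = (1.0 + 2.0 * (levels == "b") - 1.0 * (levels == "c") + 0.5 * x
     + rng.normal(scale=0.1, size=n))

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
# beta_hat[1] and beta_hat[2] recover the shifts relative to baseline "a";
# equivalently, category "b" gets its own intercept beta_hat[0] + beta_hat[1].
print(beta_hat)
```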
The coefficients on these indicators tell us about amounts that are added (or subtracted) to the response for every individual who is a member of that category or level, compared to what would be predicted for an otherwise-identical individual in the baseline category. Equivalently, every category gets its own intercept. Estimation and inference for the coefficients on these predictor variables goes exactly like estimation and inference for any other coefficient.

Interacting the indicator variables for categories with other variables gives coefficients which say what amount is added to the slope used for each member of that category (compared to the slope for members of the baseline level). Equivalently, each category gets its own slope. Estimation and inference for the coefficients on these predictor variables goes exactly like estimation and inference for any other coefficient.

Model selection for prediction aims at picking a model which will predict well on new data drawn from the same distribution as the data we've seen. One way to estimate this out-of-sample performance is to look at what the expected squared error would be on new data with the same X
matrix, but a new, independent realization Y′ of Y. In the notes on model selection, we showed that

E[(1/n)(Y′ − m̂)^T (Y′ − m̂)] = E[(1/n)(Y − m̂)^T (Y − m̂)] + (2/n) Σ_{i=1}^n Cov[Y_i, m̂_i]    (30)
  = E[(1/n)(Y − m̂)^T (Y − m̂)] + (2σ^2/n) tr H    (31)
  = E[(1/n)(Y − m̂)^T (Y − m̂)] + (2σ^2/n)(p + 1).    (32)

Mallows' C_p estimates this by

MSE + (2σ̂^2/n)(p + 1)    (33)

using the σ̂^2 from the largest model being selected among (which includes all the other models as special cases). An alternative is leave-one-out cross-validation, which amounts to

(1/n) Σ_{i=1}^n (e_i / (1 − H_ii))^2.    (34)

We also considered K-fold cross-validation, AIC and BIC.

3 Gaussian Noise

The Gaussian noise assumption is added on to the other assumptions already made. It is that ε_i ~ N(0, σ^2), independent of the predictor variables and all other ε_j. In other words, ε has a multivariate Gaussian distribution,

ε ~ MVN(0, σ^2 I).    (35)

Under this assumption, it follows that, since β̂ is a linear function of ε, it also has a multivariate Gaussian distribution:

β̂ ~ MVN(β, σ^2 (X^T X)^{-1})    (36)

and

Ŷ ~ MVN(Xβ, σ^2 H).    (37)

It follows from this that

β̂_i ~ N(β_i, σ^2 (X^T X)^{-1}_{i+1,i+1})    (38)

and

Ŷ_i ~ N(X_i β, σ^2 H_ii).    (39)

The sampling distribution of the estimated conditional mean at a new point X_new is N(X_new β, σ^2 X_new (X^T X)^{-1} X_new^T). The mean squared error follows a χ^2 distribution:

MSE ~ (σ^2/n) χ^2_{n−p−1}.    (40)
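The leave-one-out shortcut formula (34) can be checked against literally refitting the model n times; the equality is exact, not an approximation (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
e = Y - H @ Y

# Shortcut: LOOCV = (1/n) sum (e_i / (1 - H_ii))^2, with no refitting needed.
loocv_shortcut = np.mean((e / (1 - np.diag(H)))**2)

# Check against holding out each point in turn and refitting.
errs = []
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
    errs.append(Y[i] - X[i] @ b)
loocv_explicit = np.mean(np.square(errs))

print(np.isclose(loocv_shortcut, loocv_explicit))
```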
Moreover, the MSE is statistically independent of β̂. We may therefore define

ŝe[β̂_i] = σ̂ √((X^T X)^{-1}_{i+1,i+1})    (41)

and

ŝe[Ŷ_i] = σ̂ √H_ii    (42)

and get t distributions:

(β̂_i − β_i)/ŝe[β̂_i] ~ t_{n−p−1} → N(0, 1)    (43)

and

(Ŷ_i − m(X_i))/ŝe[m̂_i] ~ t_{n−p−1} → N(0, 1).    (44)

The Wald test for the hypothesis that β_i = β_i* therefore forms the test statistic

(β̂_i − β_i*)/ŝe[β̂_i]    (45)

and rejects the hypothesis if it is too large (above or below zero) compared to the quantiles of a t_{n−p−1} distribution. The summary function of R runs such a test of the hypothesis that β_i = 0. There is nothing magic or even especially important about testing for a 0 coefficient, and the same test works for testing whether a slope = 42 (for example).

Important! The null hypothesis being tested is

  Y is a linear function of X_1, ..., X_p, and of no other predictor variables, with independent, constant-variance Gaussian noise, and the coefficient β_i = 0 exactly

and the alternative hypothesis is

  Y is a linear function of X_1, ..., X_p, and of no other predictor variables, with independent, constant-variance Gaussian noise, and the coefficient β_i ≠ 0.

The Wald test does not test any of the model assumptions (it presumes them all), and it cannot say whether in an absolute sense X_i matters for Y; adding or removing other predictors can change whether the true β_i = 0.

Warning! Retaining the null hypothesis β_i = 0 can happen if either the parameter is precisely estimated, and confidently known to be close to zero, or if it is imprecisely estimated, and might as well be zero or something huge on either side. Saying "We can ignore this because we can be quite sure it's small" can make sense; saying "We can ignore this because we have no idea what it is" is preposterous.

To test whether several coefficients (β_j : j ∈ S) are all simultaneously zero, use an F test. The null hypothesis is

H_0: β_j = 0 for all j ∈ S

and the alternative is

H_1: β_j ≠ 0 for at least one j ∈ S.
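The Wald statistic (45) can be sketched on simulated data, testing both a nonzero hypothesized slope (= 42) and a zero one; the design, coefficients, and noise scale are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([3.0, 42.0, 0.0]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
sigma2_hat = (e @ e) / (n - p - 1)            # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))   # standard errors of the coefficients

# Wald statistic for H0: slope on the first predictor = 42 (testing 0 is not special):
t_42 = (beta_hat[1] - 42.0) / se[1]
# Wald statistic for H0: slope on the second predictor = 0 (what R's summary() reports):
t_0 = beta_hat[2] / se[2]
# Compare each |t| to the quantiles of t_{n-p-1}; for n this large the 0.975
# quantile is essentially the Gaussian value 1.96.
print(t_42, t_0)
```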
The F statistic is

F_stat = ((σ̂^2_null − σ̂^2_full)/s) / (σ̂^2_full/(n − p − 1))    (46)

where s is the number of elements in S. Under that null hypothesis,

F_stat ~ F_{s, n−p−1}.    (47)

If we are testing a subset of coefficients, we have a partial F test. A full F test sets s = p, i.e., it tests the null hypothesis of an intercept-only model (with independent, constant-variance Gaussian noise) against the alternative of the linear model on X_1, ..., X_p (and only those variables, with independent, constant-variance Gaussian noise). This is only of interest under very unusual circumstances.

Once again, no F test is capable of checking any modeling assumptions. This is because both the null hypothesis and the alternative hypothesis presume that all of the modeling assumptions are exactly correct.

A 1 − α confidence interval for β_i is

β̂_i ± ŝe[β̂_i] t_{n−p−1}(α/2) ≈ β̂_i ± ŝe[β̂_i] z_{α/2}.    (48)

We saw how to create a confidence ellipsoid for several coefficients. These make a simultaneous guarantee: all the parameters are trapped inside the confidence region with probability 1 − α. A simpler way to get a simultaneous confidence region for all p + 1 parameters is to use 1 − α/(p+1) confidence intervals for each one ("Bonferroni correction"). This gives a confidence hyper-rectangle.

A 1 − α confidence interval for the regression function at a point is

m̂(X_i) ± ŝe[m̂(X_i)] t_{n−p−1}(α/2).    (49)

Residuals. The cross-validated or studentized residuals are:

1. Temporarily hold out data point i.
2. Re-estimate the coefficients to get β̂_{(−i)} and σ̂_{(−i)}.
3. Make a prediction for Y_i, namely, Ŷ_{i(−i)} = m̂_{(−i)}(X_i).
4. Calculate

t_i = (Y_i − Ŷ_{i(−i)}) / √(σ̂^2_{(−i)} + V̂ar[m̂_{(−i)}(X_i)]).    (50)

This can be done without recourse to actually re-fitting the model:

t_i = r_i √((n − p − 2)/(n − p − 1 − r_i^2))    (51)

(Note that for large n, this is typically extremely close to r_i.) Also,

t_i ~ t_{n−p−2}.    (52)

(The −2 is because we're using n − 1 data points to estimate p + 1 coefficients.)

Cook's distance for point i is the (scaled) sum of the squared changes to all the fitted values if i was omitted; it is

D_i = (e_i^2 / ((p + 1) σ̂^2)) · (H_ii / (1 − H_ii)^2).    (53)
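The closed form (53) for Cook's distance can be checked against its definition by actually deleting a point and refitting; this sketch assumes the usual (p+1)σ̂^2 scaling of the summed squared changes in the fitted values (data and names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 0.5, -1.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
h = np.diag(H)                  # leverages
e = Y - H @ Y                   # residuals
sigma2_hat = (e @ e) / (n - p - 1)

# Closed form for Cook's distance, all points at once:
D = e**2 * h / ((p + 1) * sigma2_hat * (1 - h)**2)

# Compare with the definition for one point: refit without point i, then measure
# the scaled squared change in all n fitted values.
i = 0
keep = np.arange(n) != i
b_minus = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
shift = H @ Y - X @ b_minus     # Yhat - Yhat_(-i), evaluated at every point
print(np.isclose(D[i], (shift @ shift) / ((p + 1) * sigma2_hat)))
```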
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationIt should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.
Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig
More informationLecture 11 Simple Linear Regression
Lecture 11 Simple Liear Regressio Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech Midterm 2 mea: 91.2 media: 93.75 std: 6.5 2 Meddicorp
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationLesson 11: Simple Linear Regression
Lesso 11: Simple Liear Regressio Ka-fu WONG December 2, 2004 I previous lessos, we have covered maily about the estimatio of populatio mea (or expected value) ad its iferece. Sometimes we are iterested
More informationRead through these prior to coming to the test and follow them when you take your test.
Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1
More informationContinuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised
Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for
More informationRegression. Correlation vs. regression. The parameters of linear regression. Regression assumes... Random sample. Y = α + β X.
Regressio Correlatio vs. regressio Predicts Y from X Liear regressio assumes that the relatioship betwee X ad Y ca be described by a lie Regressio assumes... Radom sample Y is ormally distributed with
More informationSummary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector
Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short
More information4. Hypothesis testing (Hotelling s T 2 -statistic)
4. Hypothesis testig (Hotellig s T -statistic) Cosider the test of hypothesis H 0 : = 0 H A = 6= 0 4. The Uio-Itersectio Priciple W accept the hypothesis H 0 as valid if ad oly if H 0 (a) : a T = a T 0
More informationCommon Large/Small Sample Tests 1/55
Commo Large/Small Sample Tests 1/55 Test of Hypothesis for the Mea (σ Kow) Covert sample result ( x) to a z value Hypothesis Tests for µ Cosider the test H :μ = μ H 1 :μ > μ σ Kow (Assume the populatio
More informationStatistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
More informationChapter Vectors
Chapter 4. Vectors fter readig this chapter you should be able to:. defie a vector. add ad subtract vectors. fid liear combiatios of vectors ad their relatioship to a set of equatios 4. explai what it
More informationFinal Examination Solutions 17/6/2010
The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key
More informationLecture 8: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS
Lecture 8: No-parametric Compariso of Locatio GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review What do we mea by oparametric? What is a desirable locatio statistic for ordial data? What
More information6.867 Machine learning
6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples
More informationMA Advanced Econometrics: Properties of Least Squares Estimators
MA Advaced Ecoometrics: Properties of Least Squares Estimators Karl Whela School of Ecoomics, UCD February 5, 20 Karl Whela UCD Least Squares Estimators February 5, 20 / 5 Part I Least Squares: Some Fiite-Sample
More informationCov(aX, cy ) Var(X) Var(Y ) It is completely invariant to affine transformations: for any a, b, c, d R, ρ(ax + b, cy + d) = a.s. X i. as n.
CS 189 Itroductio to Machie Learig Sprig 218 Note 11 1 Caoical Correlatio Aalysis The Pearso Correlatio Coefficiet ρ(x, Y ) is a way to measure how liearly related (i other words, how well a liear model
More information3 Why is the Training Error Smaller than the Generalization Error?
Choosig Models Lecture 2: Model Selectio Sometimes we eed to choose from two or more possible models. For example, we might wat to cosider these two models: Y = β 0 + β X + β 2 X 2 + ɛ ad Y = β 0 + β X
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationIsmor Fischer, 1/11/
Ismor Fischer, //04 7.4-7.4 Problems. I Problem 4.4/9, it was show that importat relatios exist betwee populatio meas, variaces, ad covariace. Specifically, we have the formulas that appear below left.
More information6 Sample Size Calculations
6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig
More information