Lecture 4: Simple Linear Regression Models, with Hints at Their Estimation
36-401, Fall 2017, Section B

1 The Simple Linear Regression Model

Let's recall the simple linear regression model from last time. This is a statistical model with two variables $X$ and $Y$, where we try to predict $Y$ from $X$. The assumptions of the model are as follows:

1. The distribution of $X$ is arbitrary (and perhaps $X$ is even non-random).
2. If $X = x$, then $Y = \beta_0 + \beta_1 x + \epsilon$, for some constants ("coefficients", "parameters") $\beta_0$ and $\beta_1$, and some random noise variable $\epsilon$.
3. $E[\epsilon \mid X = x] = 0$ (no matter what $x$ is), $Var[\epsilon \mid X = x] = \sigma^2$ (no matter what $x$ is).
4. $\epsilon$ is uncorrelated across observations.

To elaborate, with multiple data points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, the model says that, for each $i \in 1:n$,

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \qquad (1)$$

where the noise variables $\epsilon_i$ all have the same expectation (0) and the same variance ($\sigma^2$), and $Cov[\epsilon_i, \epsilon_j] = 0$ (unless $i = j$, of course).

1.1 Plug-In Estimates

In lecture 1, we saw that the optimal linear predictor of $Y$ from $X$ has slope $\beta_1 = Cov[X, Y]/Var[X]$, and intercept $\beta_0 = E[Y] - \beta_1 E[X]$. A common tactic in devising estimators is to use what's sometimes called the plug-in principle, where we find equations for the parameters which would hold if we knew the full distribution, and plug in the sample versions of the population quantities. We saw this in the last lecture, where we estimated $\beta_1$ by the ratio of the sample covariance to the sample variance:

$$\hat{\beta}_1 = \frac{c_{XY}}{s^2_X} \qquad (2)$$
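To make the plug-in recipe concrete, here is a minimal sketch of computing $\hat{\beta}_0$ and $\hat{\beta}_1$ from sample moments. (This is in Python, while these notes' own examples use R, and the data points are invented for illustration.)

```python
# Plug-in estimates of the simple-linear-regression coefficients:
# replace the population covariance and variance with their sample versions.
# (Illustrative sketch; the data below are made up.)

def plug_in_estimates(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    c_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / n
    beta1_hat = c_xy / s2_x                # slope: sample cov / sample var
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept
    return beta0_hat, beta1_hat

# Noise-free check: points lying exactly on the line y = 2 + 3x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]
b0, b1 = plug_in_estimates(x, y)
print(b0, b1)  # recovers 2.0 and 3.0
```

On noise-free data the plug-in estimates recover the line's coefficients exactly; with noisy data they recover them only approximately, which is what the convergence results below are about.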
We also saw, in the notes to the last lecture, that so long as the law of large numbers holds,

$$\hat{\beta}_1 \rightarrow \beta_1 \qquad (3)$$

as $n \rightarrow \infty$. It follows easily that

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \qquad (4)$$

will also converge on $\beta_0$.

1.2 Least Squares Estimates

An alternative way of estimating the simple linear regression model starts from the objective we are trying to reach, rather than from the formula for the slope. Recall, from lecture 1, that the true optimal slope and intercept are the ones which minimize the mean squared error:

$$(\beta_0, \beta_1) = \operatorname*{argmin}_{(b_0, b_1)} E\left[(Y - (b_0 + b_1 X))^2\right] \qquad (5)$$

This is a function of the complete distribution, so we can't get it from data, but we can approximate it with data. The in-sample, empirical, or training MSE is

$$\widehat{MSE}(b_0, b_1) \equiv \frac{1}{n}\sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2 \qquad (6)$$

Notice that this is a function of $b_0$ and $b_1$; it is also, of course, a function of the data, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, but we will generally suppress that in our notation. If our samples are all independent, for any fixed $(b_0, b_1)$, the law of large numbers tells us that $\widehat{MSE}(b_0, b_1) \rightarrow MSE(b_0, b_1)$ as $n \rightarrow \infty$. So it doesn't seem unreasonable to try minimizing the in-sample error, which we can compute, as a proxy for minimizing the true MSE, which we can't. Where does it lead us?

Start by taking the derivatives with respect to the intercept and the slope:

$$\frac{\partial \widehat{MSE}}{\partial b_0} = \frac{1}{n}\sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))(-2) \qquad (7)$$

$$\frac{\partial \widehat{MSE}}{\partial b_1} = \frac{1}{n}\sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))(-2x_i) \qquad (8)$$

Set these to zero at the optimum $(\hat{\beta}_0, \hat{\beta}_1)$:

$$\frac{1}{n}\sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)) = 0 \qquad (9)$$

$$\frac{1}{n}\sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)) x_i = 0 \qquad (10)$$
These are often called the normal equations for least-squares estimation, or the estimating equations: a system of two equations in two unknowns, whose solution gives the estimate. Many people would, at this point, remove the factor of $1/n$, but I think it makes it easier to understand the next steps. The first equation, re-written, gives

$$\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0 \qquad (11)$$

and the second

$$\overline{xy} - \hat{\beta}_0 \bar{x} - \hat{\beta}_1 \overline{x^2} = 0 \qquad (12)$$

From Eq. 11,

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (13)$$

Substituting this into the remaining equation,

$$0 = \overline{xy} - \bar{y}\bar{x} + \hat{\beta}_1 \bar{x}\bar{x} - \hat{\beta}_1 \overline{x^2} \qquad (14)$$

$$0 = c_{XY} - \hat{\beta}_1 s^2_X \qquad (15)$$

$$\hat{\beta}_1 = \frac{c_{XY}}{s^2_X} \qquad (16)$$

That is, the least-squares estimate of the slope is our old friend the plug-in estimate of the slope, and thus the least-squares intercept is also the plug-in intercept.

Going forward: The equivalence between the plug-in estimator and the least-squares estimator is a bit of a special case for linear models. In some non-linear models, least squares is quite feasible (though the optimum can only be found numerically, not in closed form); in others, plug-in estimates are more useful than optimization.

1.3 Bias, Variance and Standard Error of Parameter Estimates

Whether we think of it as deriving from plugging in or from least squares, we can work out some of the properties of this estimator of the coefficients, using the model assumptions. We'll start with the slope, $\hat{\beta}_1$.
$$\hat{\beta}_1 = \frac{c_{XY}}{s^2_X} \qquad (17)$$

$$= \frac{\frac{1}{n}\sum_i x_i y_i - \bar{x}\bar{y}}{s^2_X} \qquad (18)$$

$$= \frac{\frac{1}{n}\sum_i x_i(\beta_0 + \beta_1 x_i + \epsilon_i) - \bar{x}(\beta_0 + \beta_1 \bar{x} + \bar{\epsilon})}{s^2_X} \qquad (19)$$

$$= \frac{\beta_0 \bar{x} + \beta_1 \overline{x^2} + \frac{1}{n}\sum_i x_i \epsilon_i - \bar{x}\beta_0 - \beta_1 \bar{x}^2 - \bar{x}\bar{\epsilon}}{s^2_X} \qquad (20)$$

$$= \beta_1 + \frac{\frac{1}{n}\sum_i x_i \epsilon_i - \bar{x}\bar{\epsilon}}{s^2_X} \qquad (21)$$

Since $\bar{x}\bar{\epsilon} = \frac{1}{n}\sum_i \bar{x}\epsilon_i$,

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_i (x_i - \bar{x})\epsilon_i}{s^2_X} \qquad (22)$$

This representation of the slope estimate shows that it is equal to the true slope ($\beta_1$) plus something which depends on the noise terms (the $\epsilon_i$, and their sample average $\bar{\epsilon}$). We'll use this to find the expected value and the variance of the estimator $\hat{\beta}_1$.

In the next couple of paragraphs, I am going to treat the $x_i$ as non-random variables. This is appropriate in designed or controlled experiments, where we get to choose their values. In randomized experiments or in observational studies, obviously the $x_i$ aren't necessarily fixed; however, these expressions will be correct for the conditional expectation $E[\hat{\beta}_1 \mid x_1, \ldots, x_n]$ and conditional variance $Var[\hat{\beta}_1 \mid x_1, \ldots, x_n]$, and I will come back to how we get the unconditional expectation and variance.

Expected value and bias: Recall that $E[\epsilon_i \mid X_i] = 0$, so

$$\frac{1}{s^2_X}\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}) E[\epsilon_i] = 0 \qquad (23)$$

Thus,

$$E[\hat{\beta}_1] = \beta_1 \qquad (24)$$

Since the bias of an estimator is the difference between its expected value and the truth, $\hat{\beta}_1$ is an unbiased estimator of the optimal slope.
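The representation of $\hat{\beta}_1$ as the true slope plus a weighted sum of the noise terms is an exact algebraic identity, so it can be checked numerically. A hedged sketch (Python rather than this course's R; the dataset, seed, and parameter values are all made up): generate data with known $\beta_0$, $\beta_1$, and noise, then compare the fitted slope to the true slope plus the weighted-noise term.

```python
# Check the identity beta1_hat = beta1 + [ (1/n) sum_i (x_i - xbar) eps_i ] / s2_x
# on one made-up dataset where we know the true coefficients and the noise.
import random

random.seed(0)
beta0, beta1 = 2.0, 3.0
x = [random.uniform(0, 5) for _ in range(20)]
eps = [random.gauss(0, 1) for _ in range(20)]
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
s2_x = sum((xi - xbar) ** 2 for xi in x) / n
beta1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n / s2_x

# Right-hand side of the representation: true slope plus weighted noise
rhs = beta1 + sum((xi - xbar) * ei for xi, ei in zip(x, eps)) / n / s2_x

print(beta1_hat, rhs)  # equal up to floating-point rounding
```

Because the identity holds for every realization of the noise, not just on average, the two numbers agree to machine precision.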
(To repeat what I'm sure you remember from mathematical statistics: "bias" here is a technical term, meaning no more and no less than $E[\hat{\beta}_1] - \beta_1$. An unbiased estimator could still make systematic mistakes; for instance, it could underestimate 99% of the time, provided that the 1% of the time it over-estimates, it does so by much more than it under-estimates. Moreover, unbiased estimators are not necessarily superior to biased ones: the total error depends on both the bias of the estimator and its variance, and there are many situations where you can remove lots of bias at the cost of adding a little variance. Least squares for simple linear regression happens not to be one of them, but you shouldn't expect that as a general rule.)

Turning to the intercept,

$$E[\hat{\beta}_0] = E\left[\bar{Y} - \hat{\beta}_1 \bar{X}\right] \qquad (25)$$
$$= \beta_0 + \beta_1 \bar{X} - E[\hat{\beta}_1]\bar{X} \qquad (26)$$
$$= \beta_0 + \beta_1 \bar{X} - \beta_1 \bar{X} \qquad (27)$$
$$= \beta_0 \qquad (28)$$

so it, too, is unbiased.

Variance and Standard Error: Using the formula for the variance of a sum from lecture 1, and the model assumption that all the $\epsilon_i$ are uncorrelated with each other,

$$Var[\hat{\beta}_1] = Var\left[\beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})\epsilon_i}{s^2_X}\right] \qquad (29)$$

$$= \frac{Var\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})\epsilon_i\right]}{(s^2_X)^2} \qquad (30)$$

$$= \frac{\frac{1}{n^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 Var[\epsilon_i]}{(s^2_X)^2} \qquad (31)$$

$$= \frac{\sigma^2 s^2_X / n}{(s^2_X)^2} \qquad (32)$$

$$= \frac{\sigma^2}{n s^2_X} \qquad (33)$$

In words, this says that the variance of the slope estimate goes up as the noise around the regression line ($\sigma^2$) gets bigger, and goes down as we have more observations ($n$), which are further spread out along the horizontal axis ($s^2_X$); it should not be surprising that it's easier to work out the slope of a line from many, well-separated points on the line than from a few points smushed together.

The standard error of an estimator is just its standard deviation, or the square root of its variance:

$$se(\hat{\beta}_1) = \frac{\sigma}{\sqrt{n s^2_X}} \qquad (34)$$
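The formula $Var[\hat{\beta}_1] = \sigma^2/(n s^2_X)$ can be checked by simulation, holding the $x_i$ fixed across replications (the conditional-on-$x$ setting). This is an illustrative Python sketch; the design, sample size, and parameter values are invented, and the Gaussian noise is just a convenient choice, since the variance formula itself needs no Gaussianity.

```python
# Monte Carlo check of Var(beta1_hat) = sigma^2 / (n * s2_x),
# holding the x_i fixed across replications. Made-up design and parameters.
import random

random.seed(1)
beta0, beta1, sigma = 2.0, 3.0, 1.5
x = [i / 10 for i in range(50)]          # fixed design, n = 50
n = len(x)
xbar = sum(x) / n
s2_x = sum((xi - xbar) ** 2 for xi in x) / n

def fit_slope(y):
    ybar = sum(y) / n
    c_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    return c_xy / s2_x

slopes = []
for _ in range(5000):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    slopes.append(fit_slope(y))

mean_slope = sum(slopes) / len(slopes)
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / len(slopes)
theory = sigma ** 2 / (n * s2_x)
print(mean_slope, var_slope, theory)  # mean near beta1; variances close
```

The simulated variance agrees with the formula up to Monte Carlo error, and the simulated mean of the slope estimates sits near the true slope, illustrating unbiasedness at the same time.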
I will leave working out the variance of $\hat{\beta}_0$ as an exercise.

Unconditional-on-X Properties: The last few paragraphs, as I said, have looked at the expectation and variance of $\hat{\beta}_1$ conditional on $x_1, \ldots, x_n$, either because the $x$'s really are non-random (e.g., controlled by us), or because we're just interested in conditional inference. If we do care about unconditional properties, then we still need to find $E[\hat{\beta}_1]$ and $Var[\hat{\beta}_1]$, not just $E[\hat{\beta}_1 \mid x_1, \ldots, x_n]$ and $Var[\hat{\beta}_1 \mid x_1, \ldots, x_n]$. Fortunately, this is easy, so long as the simple linear regression model holds. To get the unconditional expectation, we use the "law of total expectation":

$$E[\hat{\beta}_1] = E\left[E[\hat{\beta}_1 \mid X_1, \ldots, X_n]\right] \qquad (35)$$
$$= E[\beta_1] = \beta_1 \qquad (36)$$

That is, the estimator is unconditionally unbiased. To get the unconditional variance, we use the "law of total variance":

$$Var[\hat{\beta}_1] = E\left[Var[\hat{\beta}_1 \mid X_1, \ldots, X_n]\right] + Var\left[E[\hat{\beta}_1 \mid X_1, \ldots, X_n]\right] \qquad (37)$$
$$= E\left[\frac{\sigma^2}{n s^2_X}\right] + Var[\beta_1] \qquad (38)$$
$$= \frac{\sigma^2}{n} E\left[\frac{1}{s^2_X}\right] \qquad (39)$$

(since $\beta_1$ is a constant, $Var[\beta_1] = 0$).

1.4 Parameter Interpretation; Causality

Two of the parameters are easy to interpret.

$\sigma^2$ is the variance of the noise around the regression line; $\sigma$ is a typical distance of a point from the line. ("Typical" here in a special sense: it's the root-mean-squared distance, rather than, say, the average absolute distance.)

$\beta_0$ is simply the expected value of $Y$ when $X$ is 0, $E[Y \mid X = 0]$. The point $X = 0$ usually has no special significance, but this setting does ensure that the line goes through the point $(E[X], E[Y])$.

The interpretation of the slope is both very straightforward and very tricky. Mathematically, it's easy to convince yourself that, for any $x$,

$$\beta_1 = E[Y \mid X = x + 1] - E[Y \mid X = x] \qquad (40)$$

or, for any $x_1, x_2$,

$$\beta_1 = \frac{E[Y \mid X = x_2] - E[Y \mid X = x_1]}{x_2 - x_1} \qquad (41)$$

This is just saying that the slope of a line is rise/run.
The tricky part is that we have a very strong, natural tendency to interpret this as telling us something about causation: "If we change $X$ by 1, then on average $Y$ will change by $\beta_1$." This interpretation is usually completely unsupported by the analysis. If I use an old-fashioned mercury thermometer, the height of mercury in the tube usually has a nice linear relationship with the temperature of the room the thermometer is in. This linear relationship goes both ways, so we could regress temperature ($Y$) on mercury height ($X$). But if I manipulate the height of the mercury (say, by changing the ambient pressure, or shining a laser into the tube, etc.), changing the height $X$ will not, in fact, change the temperature outside.

The right way to interpret $\beta_1$ is not as the result of a change, but as an expected difference. The correct catch-phrase would be something like "If we select two sets of cases from the un-manipulated distribution where $X$ differs by 1, we expect $Y$ to differ by $\beta_1$." This covers the thermometer example, and every other one I can think of. It is, I admit, much more inelegant than "If $X$ changes by 1, $Y$ changes by $\beta_1$ on average", but it has the advantage of being true, which the other does not. There are circumstances where regression can be a useful part of causal inference, but we will need a lot more tools to grasp them; that will come towards the end of the course.

2 The Gaussian-Noise Simple Linear Regression Model

We have, so far, assumed comparatively little about the noise term $\epsilon$. The advantage of this is that our conclusions apply to lots of different situations; the drawback is that there's really not all that much more to say about our estimator $\hat{\beta}$ or our predictions than we've already gone over. If we made more detailed assumptions about $\epsilon$, we could make more precise inferences. There are lots of forms of distributions for $\epsilon$ which we might contemplate, and which are compatible with the assumptions of the simple linear regression model (Figure 1). The one which has become the most common over the last two centuries is to assume $\epsilon$ follows a Gaussian distribution.
The result is the Gaussian-noise simple linear regression model:[1]

1. The distribution of $X$ is arbitrary (and perhaps $X$ is even non-random).
2. If $X = x$, then $Y = \beta_0 + \beta_1 x + \epsilon$, for some constants ("coefficients", "parameters") $\beta_0$ and $\beta_1$, and some random noise variable $\epsilon$.
3. $\epsilon \sim N(0, \sigma^2)$, independent of $X$.

[1] Our textbook, rather old-fashionedly, calls this the "normal error" model rather than "Gaussian noise". I dislike this: "normal" is an over-loaded word in math, while "Gaussian" is (comparatively) specific; "error" made sense in Gauss's original context of modeling, specifically, errors of observation, but is misleading generally; and calling Gaussian distributions "normal" suggests they are much more common than they really are.
par(mfrow=c(2,3))
curve(dnorm(x), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
curve(exp(-abs(x))/2, from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
curve(sqrt(pmax(0,1-x^2))/(pi/2), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
curve(dt(x,3), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
curve(dgamma(x+1.5, shape=1.5, scale=1), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
curve(0.5*dgamma(x+1.5, shape=1.5, scale=1) + 0.5*dgamma(0.5-x, shape=0.5, scale=1), from=-3, to=3, xlab=expression(epsilon), ylab="", ylim=c(0,1))
par(mfrow=c(1,1))

Figure 1: Some possible noise distributions for the simple linear regression model, since all have $E[\epsilon] = 0$, and could get any variance by scaling. (The model is even compatible with each observation taking $\epsilon$ from a different distribution.) From top left to bottom right: Gaussian; double-exponential ("Laplacian"); circular distribution; $t$ with 3 degrees of freedom; a gamma distribution (shape 1.5, scale 1) shifted to have mean 0; mixture of two gammas with shape 1.5 and shape 0.5, each off-set to have expectation 0. The first three were all used as error models in the 18th and 19th centuries. (See the source file for how to get the code below the figure.)
4. $\epsilon$ is independent across observations.

You will notice that these assumptions are strictly stronger than those of the simple linear regression model. More exactly, the first two assumptions are the same, while the third and fourth assumptions of the Gaussian-noise model imply the corresponding assumptions of the other model. This means that everything we have done so far directly applies to the Gaussian-noise model. On the other hand, the stronger assumptions let us say more. They tell us, exactly, the probability distribution for $Y$ given $X$, and so will let us get exact distributions for predictions and for other inferential statistics.

Why the Gaussian noise model? Why should we think that the noise around the regression line would follow a Gaussian distribution, independent of $X$? There are two big reasons.

1. Central limit theorem. The noise might be due to adding up the effects of lots of little random causes, all nearly independent of each other and of $X$, where each of the effects is of roughly similar magnitude. Then the central limit theorem will take over, and the distribution of the sum of effects will indeed be pretty Gaussian. For Gauss's original context, $X$ was (simplifying) "Where is such-and-such a planet in space?", $Y$ was "Where does an astronomer record the planet as appearing in the sky?", and noise came from defects in the telescope, eye-twitches, atmospheric distortions, etc., etc., so this was pretty reasonable. It is clearly not a universal truth of nature, however, nor even something we should expect to hold true as a general rule, as the name "normal" suggests.

2. Mathematical convenience. Assuming Gaussian noise lets us work out a very complete theory of inference and prediction for the model, with lots of closed-form answers to questions like "What is the optimal estimate of the variance?" or "What is the probability that we'd see a fit this good from a line with a non-zero intercept if the true line goes through the origin?", etc., etc.
Answering such questions without the Gaussian-noise assumption needs somewhat more advanced techniques, and much more advanced computing; we'll get to it towards the end of the class.

2.1 Visualizing the Gaussian Noise Model

The Gaussian noise model gives us not just an expected value for $Y$ at each $x$, but a whole conditional distribution for $Y$ at each $x$. To visualize it, then, it's not enough to just sketch a curve; we need a three-dimensional surface, showing, for each combination of $x$ and $y$, the probability density of $Y$ around that $y$ given that $x$. Figure 2 illustrates.

2.2 Maximum Likelihood vs. Least Squares

As you remember from your mathematical statistics class, the likelihood of a parameter value on a data set is the probability density at the data under
x.seq <- seq(from=-1, to=3, length.out=50)
y.seq <- seq(from=-5, to=15, length.out=50)
cond.pdf <- function(x,y) { dnorm(y, mean=10-5*x, sd=0.5) }
z <- outer(x.seq, y.seq, cond.pdf)
persp(x.seq, y.seq, z, ticktype="detailed", phi=75, xlab="x", ylab="y", zlab=expression(p(y|x)), cex.axis=0.8)

Figure 2: Illustrating how the conditional pdf of $Y$ varies as a function of $X$, for a hypothetical Gaussian noise simple linear regression where $\beta_0 = 10$, $\beta_1 = -5$, and $\sigma^2 = (0.5)^2$. The perspective is adjusted so that we are looking nearly straight down from above on the surface. (Can you find a better viewing angle?) See help(persp) for the 3D plotting (especially the examples), and help(outer) for the outer function, which takes all combinations of elements from two vectors and pushes them through a function. How would you modify this so that the regression line went through the origin with a slope of 4/3 and a standard deviation of 5?
those parameters. We could not work with the likelihood with the simple linear regression model, because it didn't specify enough about the distribution to let us calculate a density. With the Gaussian-noise model, however, we can write down a likelihood.[2]

By the model's assumptions, if we think the parameters are $b_0, b_1, s^2$ (reserving the Greek letters for their true values), then $Y \mid X = x \sim N(b_0 + b_1 x, s^2)$, and $Y_i$ and $Y_j$ are independent given $X_i$ and $X_j$, so the over-all likelihood is

$$\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi s^2}} e^{-\frac{(y_i - (b_0 + b_1 x_i))^2}{2s^2}} \qquad (42)$$

As usual, we work with the log-likelihood, which gives us the same information[3] but replaces products with sums:

$$L(b_0, b_1, s^2) = -\frac{n}{2}\log 2\pi - n \log s - \frac{1}{2s^2}\sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2 \qquad (43)$$

We recall from mathematical statistics that when we've got a likelihood function, we generally want to maximize it. That is, we want to find the parameter values which make the data we observed as likely, as probable, as the model will allow. (This may not be very likely; that's another issue.) We recall from calculus that one way to maximize is to take derivatives and set them to zero.

$$\frac{\partial L}{\partial b_0} = -\frac{1}{2s^2}\sum_{i=1}^{n} 2(y_i - (b_0 + b_1 x_i))(-1) \qquad (44)$$

$$\frac{\partial L}{\partial b_1} = -\frac{1}{2s^2}\sum_{i=1}^{n} 2(y_i - (b_0 + b_1 x_i))(-x_i) \qquad (45)$$

Notice that when we set these derivatives to zero, all the multiplicative constants (in particular, the prefactor of $\frac{2}{2s^2}$) go away. We are left with

$$\sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)) = 0 \qquad (46)$$

$$\sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)) x_i = 0 \qquad (47)$$

These are, up to a factor of $1/n$, exactly the equations we got from the method of least squares (Eqs. 9-10). That means that the least squares solution is the maximum likelihood estimate under the Gaussian noise model; this is no coincidence.[4]

[2] Strictly speaking, this is a conditional (on $X$) likelihood; but only pedants use the adjective in this context.
[3] Why is this?
[4] It's no coincidence because, to put it somewhat anachronistically, what Gauss did was ask himself "for what distribution of the noise would least squares maximize the likelihood?".
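The equivalence can also be seen numerically: for fixed $s^2$ the log-likelihood is a decreasing function of the in-sample MSE, so the least-squares fit maximizes it over $(b_0, b_1)$, and the likelihood over $s^2$ peaks at the in-sample MSE. A hedged Python sketch (the data and the perturbation grid are invented, and this is not part of the notes' own R code):

```python
# Check numerically that the least-squares fit maximizes the Gaussian
# log-likelihood: perturbing (b0, b1) or rescaling s2 never raises L.
# Illustrative sketch with made-up data.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 5.3, 6.8, 9.1, 11.2]
n = len(x)

def log_lik(b0, b1, s2):
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(s2) - sse / (2 * s2)

# Least-squares estimates via the plug-in formulas
xbar, ybar = sum(x) / n, sum(y) / n
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
b0_hat = ybar - b1_hat * xbar
s2_hat = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y)) / n

best = log_lik(b0_hat, b1_hat, s2_hat)
worse = [log_lik(b0_hat + d0, b1_hat + d1, s2_hat)
         for d0 in (-0.5, 0, 0.5) for d1 in (-0.2, 0, 0.2) if (d0, d1) != (0, 0)]
print(best, max(worse))  # best exceeds every perturbed value
```

The same check at $1.5\,\hat{s}^2$ and $0.5\,\hat{s}^2$ (with $(b_0, b_1)$ held at the least-squares values) shows the likelihood also drops when the variance moves away from the in-sample MSE, anticipating the derivative-in-$s$ calculation below.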
Now let's take the derivative with respect to $s$:

$$\frac{\partial L}{\partial s} = -\frac{n}{s} + \frac{2}{2s^3}\sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2 \qquad (48)$$

Setting this to 0 at the optimum, and multiplying through by $\hat{\sigma}^3/n$, we get

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2 \qquad (49)$$

Notice that the right-hand side is just the in-sample mean squared error.

Other models: Maximum likelihood estimates of the regression curve coincide with least-squares estimates when the noise around the curve is additive, Gaussian, of constant variance, and both independent of $X$ and of other noise terms. These were all assumptions we used in setting up a log-likelihood which was, up to constants, proportional to the (negative) mean squared error. If any of those assumptions fail, maximum likelihood and least squares estimates can diverge, though sometimes the MLE solves a generalized least squares problem (as we'll see later in this course).

Exercises

To think through, not to hand in.

1. Show that if $E[\epsilon \mid X = x] = 0$ for all $x$, then $Cov[X, \epsilon] = 0$. Would this still be true if $E[\epsilon \mid X = x] = a$ for some other constant $a$?

2. Find the variance of $\hat{\beta}_0$. Hint: Do you need to worry about covariance between $\bar{Y}$ and $\hat{\beta}_1$?
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationThis is an introductory course in Analysis of Variance and Design of Experiments.
1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationAAEC/ECON 5126 FINAL EXAM: SOLUTIONS
AAEC/ECON 5126 FINAL EXAM: SOLUTIONS SPRING 2015 / INSTRUCTOR: KLAUS MOELTNER This exam is ope-book, ope-otes, but please work strictly o your ow. Please make sure your ame is o every sheet you re hadig
More informationECO 312 Fall 2013 Chris Sims LIKELIHOOD, POSTERIORS, DIAGNOSING NON-NORMALITY
ECO 312 Fall 2013 Chris Sims LIKELIHOOD, POSTERIORS, DIAGNOSING NON-NORMALITY (1) A distributio that allows asymmetry differet probabilities for egative ad positive outliers is the asymmetric double expoetial,
More informationLecture 24 Floods and flood frequency
Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically,
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More informationA quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population
A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate
More informationIf, for instance, we were required to test whether the population mean μ could be equal to a certain value μ
STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially
More informationResponse Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable
Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated
More informationMath 155 (Lecture 3)
Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationThe Method of Least Squares. To understand least squares fitting of data.
The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationAxis Aligned Ellipsoid
Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple
More informationAlgebra of Least Squares
October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal
More informationOutline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression
REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques
More informationDS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationExpectation and Variance of a random variable
Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio
More informationTMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.
Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx
More informationChapter 23: Inferences About Means
Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For
More informationLast time: Moments of the Poisson distribution from its generating function. Example: Using telescope to measure intensity of an object
6.3 Stochastic Estimatio ad Cotrol, Fall 004 Lecture 7 Last time: Momets of the Poisso distributio from its geeratig fuctio. Gs () e dg µ e ds dg µ ( s) µ ( s) µ ( s) µ e ds dg X µ ds X s dg dg + ds ds
More informationGeometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT
OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca
More informationIn this section we derive some finite-sample properties of the OLS estimator. b is an estimator of β. It is a function of the random sample data.
17 3. OLS Part III I this sectio we derive some fiite-sample properties of the OLS estimator. 3.1 The Samplig Distributio of the OLS Estimator y = Xβ + ε ; ε ~ N[0, σ 2 I ] b = (X X) 1 X y = f(y) ε is
More informationECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors
ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic
More informationMath 113 Exam 3 Practice
Math Exam Practice Exam will cover.-.9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you
More informationTable 12.1: Contingency table. Feature b. 1 N 11 N 12 N 1b 2 N 21 N 22 N 2b. ... a N a1 N a2 N ab
Sectio 12 Tests of idepedece ad homogeeity I this lecture we will cosider a situatio whe our observatios are classified by two differet features ad we would like to test if these features are idepedet
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationRecall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.
Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed
More informationOptimally Sparse SVMs
A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but
More informationLecture 7: Density Estimation: k-nearest Neighbor and Basis Approach
STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.
More informationRead through these prior to coming to the test and follow them when you take your test.
Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationCURRICULUM INSPIRATIONS: INNOVATIVE CURRICULUM ONLINE EXPERIENCES: TANTON TIDBITS:
CURRICULUM INSPIRATIONS: wwwmaaorg/ci MATH FOR AMERICA_DC: wwwmathforamericaorg/dc INNOVATIVE CURRICULUM ONLINE EXPERIENCES: wwwgdaymathcom TANTON TIDBITS: wwwjamestatocom TANTON S TAKE ON MEAN ad VARIATION
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationECE 901 Lecture 13: Maximum Likelihood Estimation
ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered
More informationEconomics Spring 2015
1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures
More informationDefinitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.
Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,
More informationDiscrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions
CS 70 Discrete Mathematics for CS Sprig 2005 Clacy/Wager Notes 21 Some Importat Distributios Questio: A biased coi with Heads probability p is tossed repeatedly util the first Head appears. What is the
More informationProblems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:
Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio
More informationRiemann Sums y = f (x)
Riema Sums Recall that we have previously discussed the area problem I its simplest form we ca state it this way: The Area Problem Let f be a cotiuous, o-egative fuctio o the closed iterval [a, b] Fid
More informationExample: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.
1 (*) If a lot of the data is far from the mea, the may of the (x j x) 2 terms will be quite large, so the mea of these terms will be large ad the SD of the data will be large. (*) I particular, outliers
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber
More information