Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses


ISQS 5349 Final, Spring 2011

1. (10) What is the definition of a regression model that we have used throughout the class? Relate this definition to the case where Y = driving speed of a car on Interstate 20 (the speed limit is 70 MPH) and X = age of the car. (If the word "distribution" does not appear in your answer, then you will lose most of the possible points.)

Solution: The regression model is a model for the conditional distribution of a dependent variable Y given the value (or values, in the multiple regression case) of predictor variables X. In the case study, we would postulate a model for how the distribution of speeds observed on the freeway depends on the age of a car. Driving speeds differ depending on the driving style of the owner, so there is a distribution of possible driving speeds for any age-specific cohort of cars. For example, there is a distribution of speeds for X = 5, another distribution of speeds for X = 10, etc. The regression model postulates how these distributions look; for example, one might postulate that they are all normal distributions with common variance, whose mean values lie precisely on a line E(Speed) = β0 + β1X for some values β0, β1. But one might also postulate that these distributions are unspecified, of the generic form p(y | X = x), with mean function also unspecified, of generic form f(x). Both are examples of regression models; the former is more specific but less believable, the latter less specific but more believable.

2. (10) Suppose you will collect actual data (x1, y1), (x2, y2), ..., (xn, yn), where you have assumed a model Y | X = x ~ N(β0 + β1x, σ²). If the model is true, how will the scatterplot of the actual data appear? Explain in words, and then draw a prototypical scatterplot for this case.

Solution: The scatter of the Y data will appear to increase (or decrease) steadily, with no indication of curvature.
The range of the Y data for any given X = x (or in a small neighborhood of x such as x − δ < X < x + δ) will appear roughly constant for all x. The distribution of the Y data for any given X = x (or in such a small neighborhood) will appear roughly symmetric on either side of the center, with no outliers. Further, there will be no obvious signs of discreteness in the Y data. For example:
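A prototypical dataset of this kind can also be simulated directly from the assumed model. This is a minimal sketch; the parameter values b0, b1, and sigma below are illustrative, not taken from the exam:

```python
import random
import statistics

random.seed(1)

# Simulate from the assumed model Y | X = x ~ N(b0 + b1*x, sigma^2).
# The parameter values are arbitrary, for illustration only.
b0, b1, sigma = 10.0, 2.0, 3.0
xs = [random.uniform(0, 20) for _ in range(500)]
ys = [b0 + b1 * x + random.gauss(0.0, sigma) for x in xs]

# Under the model the vertical spread of the scatter is roughly the
# same everywhere: compare the Y spread in a low-X and a high-X band.
low_band = [y for x, y in zip(xs, ys) if x < 5]
high_band = [y for x, y in zip(xs, ys) if x > 15]
ratio = statistics.stdev(high_band) / statistics.stdev(low_band)
print(round(ratio, 2))  # near 1.0 when homoscedasticity holds
```

Plotting xs against ys would reproduce the kind of scatterplot described above: a steady linear trend with constant, symmetric vertical scatter.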

[Figure: "Scatterplot of Data Produced By a Model Where All Assumptions Are Satisfied" — y plotted against x.]

3. (10) Use the subpopulation (or "cohort") framework to interpret the parameter β2 in the quantile regression model Income(0.9) = β0 + β1(Education) + β2(Year). Here, Income(0.9) is the 0.9 quantile of the income distribution, Education is the level of education of a person (say, coded as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10), and Year is the year (say, 1960, 1961, ..., or 2011). Be specific.

Solution: Consider two cohorts:
Cohort 1: People with Education level = 5 in 1990.
Cohort 2: People with Education level = 5 in 1991.
Imagine that there are many people in Cohort 1. Then the 0.90 quantile (or 90th percentile) of the incomes in this group should be approximately equal to the true quantile of the conceptual distribution, which is assumed to be β0 + β1(5) + β2(1990), according to the model. Imagine also that there are many people in Cohort 2. Then the 0.90 quantile (or 90th percentile) of the incomes in this group should be approximately equal to the true quantile of the conceptual distribution, which is assumed to be β0 + β1(5) + β2(1991), according to the model. Thus, we can interpret β2 as being approximately equal to the 90th percentile of income in Cohort 2, minus the 90th percentile of income in Cohort 1.
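The cohort interpretation can be checked by simulation. In the sketch below, the coefficients b0, b1, b2 and the uniform income noise are invented purely for illustration; the point is that the year-over-year shift in the empirical 0.9 quantile recovers β2:

```python
import random
import statistics

random.seed(3)

# Hypothetical coefficients for Income(0.9) = b0 + b1*Education + b2*Year.
# These values are invented for illustration, not estimated from data.
b0, b1, b2 = -985000.0, 2000.0, 500.0

def simulate_cohort(edu, year, n=200000):
    # Each income is the cohort's true 0.9 quantile plus noise whose
    # own 0.9 quantile is zero (uniform on [-9000, 1000]), so the
    # cohort's income 0.9 quantile equals b0 + b1*edu + b2*year.
    q = b0 + b1 * edu + b2 * year
    return [q + random.uniform(-9000.0, 1000.0) for _ in range(n)]

def q90(data):
    return statistics.quantiles(data, n=10)[8]  # empirical 0.9 quantile

cohort_1 = simulate_cohort(5, 1990)  # education 5, year 1990
cohort_2 = simulate_cohort(5, 1991)  # education 5, year 1991

# The year-over-year shift in the 0.9 quantile recovers b2.
print(round(q90(cohort_2) - q90(cohort_1)))  # approximately b2 = 500
```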

4. (10) An estimated Probit model is P(Success) = Φ( X), where Φ(t) is the cumulative standard normal distribution function. (Recall that the standard normal distribution is the one that has mean zero and variance 1, so, for example, Φ(−1.96) = 0.025.) Draw a graph of the estimated probability of success as a function of X. Label the axes and put numbers on the axes.

Solution: Plugging in some numbers and using the Φ rule gives a table pairing values of X with P(Success) = Φ(−2), Φ(−1), Φ(0), Φ(1), and Φ(2), which trace out an S-shaped curve. Here is the graph:

[Figure: "Estimated Probit Model" — P(Success) on the vertical axis from 0 to 1, X on the horizontal axis.]

5. (10) The model Y = Xβ + ε is assumed. Suppose there are only n = 4 observable data points, and they are (Y1, X1 = 4), (Y2, X2 = 6), (Y3, X3 = 1), and (Y4, X4 = 5). Suppose also that the {Yi} are independent random variables with Var(Yi | Xi = x) = x²σ². Write down the entire covariance matrix of Y.

Solution: Since the Y data are independent, their covariances are 0, so we have a diagonal matrix. And according to the model, Var(Y1 | X1 = 4) = 4²σ², Var(Y2 | X2 = 6) = 6²σ², Var(Y3 | X3 = 1) = 1²σ², and Var(Y4 | X4 = 5) = 5²σ². Hence we have

             [ 16σ²    0      0      0   ]
    Cov(Y) = [  0     36σ²    0      0   ]
             [  0      0      σ²     0   ]
             [  0      0      0     25σ² ]

6. (10) When do you use generalized least squares (GLS)? First, answer the question in general terms. Second, give an example of a real study that you might do where GLS, rather than ordinary least squares (OLS), is appropriate.

Solution: When the covariance matrix of Y has non-zero off-diagonal elements, i.e., when the error terms are correlated, then the usual OLS estimates are inefficient and their usual standard errors are incorrect. We use GLS to get more efficient estimates with correct standard errors. For example, I might perform a repeated measures study where one of the predictor variables is drug (active or placebo) and other predictor variables are age, sex, initial health, etc. There are multiple observations per subject because the subject's health is evaluated at repeat visits to the clinic. These repeated measures must be assumed to be correlated because they share subject-specific commonality. If instead we assume they are independent and perform the usual OLS analysis, then the estimated parameters will tend to be farther from the true process parameters than when we incorporate covariance information and use GLS. Further, the OLS standard errors will simply be wrong, perhaps leading us to conclude significance incorrectly, or leading us to an insignificant conclusion when significance is warranted, whereas the GLS standard errors will incorporate the covariance information appropriately.

7. Define the following terms briefly using one sentence. If you use formulas, make sure that they are incorporated into the sentence using proper English phrasing. (4 points each)

7.A. Latent variable: This is a variable that you cannot observe directly, like "satisfaction with boss."
7.B. Pseudo R-squared: This is a likelihood-based measure of the goodness of a model, one that reduces to the usual form in the classical regression case.
7.C. LOESS estimate: This is an estimate of the mean of the distribution of Y as a function of X, one that makes no assumption of linearity or other specific functional form.
7.D. Nominal variable: This is a variable whose values are unordered categories, such as a choice of ART, MATH, or SCIENCE.
7.E. Moderator variable: This is a variable that affects another variable's effect on a response.
7.F. Censored data: Data whose values are known only to lie above (or below) some known threshold are called censored data.
7.G. Dummy variable: A variable whose values are coded as 0 or 1, depending upon the value of some other variable, is called a dummy variable.
7.H. Likelihood ratio test: This test is used to compare full and reduced models that are estimated using maximum likelihood, and the result is a chi-square statistic.

8. (8) A benefit of using a random effects model instead of a fixed effects model is that random effects models provide shrinkage estimators (BLUPs). Explain why this is a benefit in the context of an example, either one of your own choosing, or one discussed in class (such as the faculty rating example). Be sure to explain why there is a benefit, i.e., why the alternative fixed effects model is worse.

Solution: In the class rating example, the BLUP estimates of major effects were shrunk toward the overall mean when the sample size within the major was smaller. This gave us a better ranking of the majors than the fixed-effects model, which estimates the mean of a major using the simple mean without shrinkage. A problem was that a mean of 5.0 based on one observation should not be rated more highly than a mean of 4.7 based on 30 observations, as would be the case using fixed-effects estimates.

9. (10) We discussed three types of outliers. One of these three types was the worst type of outlier, in the sense that such an outlier has the most potential influence on ordinary least squares estimates. Show how this worst type of outlier appears in a scatterplot.

Solution: The worst type was the outlier in both X-space and Y-space. Here is a picture:

[Figure: "Example 3: Outlier in both Y-space and X-space" — a scatterplot with fitted line and R² shown, and one point far from the others in both the X and Y directions.]

10. (10) Why must we use graphs in addition to tests when assessing the validity of assumptions? Be specific to the case of testing whether the regression function is a line rather than a curve. How do you test it? What graph do you draw? Why are both the test and the graph needed?

Solution: We can test for linearity by adding a quadratic term and seeing whether it is significant; if significant, we reject linearity in favor of curvature. The problem, though, is that with large sample sizes, even small deviations from linearity result in statistically significant results. The graph allows you to assess whether the degree of curvature is worth worrying about. One graph to draw would show the fitted linear and fitted quadratic functions on the same axes, like this:
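The linear-versus-quadratic comparison can also be sketched numerically. The data below are simulated with a deliberately slight curvature; every number is illustrative, not from a real study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with a deliberately slight curvature; all numbers
# here are illustrative, not from a real study.
x = rng.uniform(0, 10, 300)
y = 5 + 2 * x + 0.02 * x**2 + rng.normal(0, 1.0, 300)

# Fit the competing models by least squares.
lin = np.polyfit(x, y, 1)    # straight line
quad = np.polyfit(x, y, 2)   # quadratic

# The largest gap between the two fitted curves over the observed X
# range measures whether the curvature matters practically,
# regardless of whether it is statistically significant.
grid = np.linspace(x.min(), x.max(), 100)
gap = float(np.max(np.abs(np.polyval(quad, grid) - np.polyval(lin, grid))))
print(round(gap, 2))  # a small gap: practically unimportant curvature
```

Plotting both fitted curves over the same grid gives exactly the graph the solution describes.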

[Figure: fitted linear and fitted quadratic functions plotted on the same axes; the two curves nearly coincide.]

The graph shows a slight difference, which might be statistically significant, but which is also practically unimportant.

Multiple choice. Each question is worth 2 points.

11. Suppose the number of injuries on a construction site in a day is usually 0, sometimes 1, less often 2, etc., conceptually without any upper limit. Pick the most appropriate regression model:
11.A. Normal regression (no: the data are discrete)
11.B. Ordinal logistic regression (no: there is no upper bound)
11.C. Tobit regression (no: the nonzero observations are not continuous)
11.D. Poisson regression (best choice)

12. In the case of predicting GPA (on the 0-4 scale) as a function of GMAT score, give the most plausible value for ROOT MSE:
A. 0.7 (mean ± 2(0.7) is the most plausible 95% range for GPA)
B. 2.0
C.
D. 14,000
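The count data described in question 11 can be simulated to see why Poisson regression is the natural choice. The daily rate of 0.3 injuries below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Daily injury counts: usually 0, sometimes 1, rarely more, with no
# upper bound -- exactly the shape a Poisson model is built for.
# The rate of 0.3 injuries per day is an illustrative assumption.
counts = rng.poisson(lam=0.3, size=5000)

# Poisson counts are non-negative integers, and the distribution's
# mean equals its variance (both 0.3 here, up to sampling noise).
print(int(counts.min()), round(float(counts.mean()), 2),
      round(float(counts.var()), 2))
```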

13. Maximum likelihood estimates and least squares estimates are identical when
A. the distribution of Y is assumed to be normal. (yes, math)
B. the variance of Y is a linear function of X. (WLS?)
C. the Gauss-Markov assumptions hold true. (non-normality allowed)
D. the model is correct. (if non-normal?)

14. In simple least squares regression, what is the relationship between the correlation coefficient r and the R² statistic?
A. r = R²
B. r² = R² (righto)
C. 1 − r = R²
D. 1 − r² = R²

15. In ordinary least squares regression, what is the relationship between the sum of squared errors (SSE), the corrected total sum of squares (SST), and the R² statistic?
A. SSE/SST = R²
B. SST − SSE = R²
C. SSE − SST = R²
D. 1 − SSE/SST = R² (righto)

16. All else fixed, what is the effect of increasing the variance of X1?
A. The mean squared error (MSE) will be larger.
B. The mean squared error (MSE) will be smaller.
C. The standard error of β̂1 will be larger.
D. The standard error of β̂1 will be smaller. (righto)

17. Why is β̂1 a random variable? Because
A. the process could have produced other data. ( )
B. the linearity assumption is never true.
C. the true value of β1 is unknown.
D. the data set has more than 2 observations.

18. The true variance of the error terms, Var(ε), is also denoted by
A. MSE (this is an estimate)
B. ROOT MSE
C. Var(Y | X = x) (this is true)
D. Cov(β̂)

19. The linearity assumption means, by definition,
A. When X is larger, then Y is larger. (it's not about data, it's about process)
B. When X is larger, then E(Y) is larger. (allows curvature)
C. The points (X, Y) fall exactly on a straight line. (it's not about data, it's about process)

D. The points (X, E(Y)) fall exactly on a straight line. (bingo)

20. Which one of the following models obeys the variable inclusion principle?
A. Y = β1X + β2X² + ε
B. Y = β0 + β1X + ε (the one and only)
C. Y = β0 + β1X1 + β2X1X2 + ε
D. Y = β0 + β1X + β2Z² + ε

21. Which hypothesis test is used for testing normality?
A. Breusch-Pagan
B. Shapiro-Wilk (yes)
C. q-q plot (this is a graph, not a hypothesis test)
D. histogram ( )

22. Which of the following terms require the randomness assumption for their definition?
A. β1
B. Var(Y | X = x)
C. The p-value for testing H0: β1 = 0
D. All of the above (yes; see westfall/images/5349/practiceproblems_discussion.doc)

23. The Gauss-Markov theorem refers to estimators that are linear functions of the data. Give an example of a non-linear estimator.
A. The sample average of the Y data. (linear; see westfall/images/5349/sp2011_midterm1solution.pdf)
B. β̂ = (X'X)⁻¹X'Y. (G-M applies to OLS, so this must be linear)
C. The model Y = β0 + β1X + β2X² + ε. (This is a model, not an estimator)
D. The estimator of β1 when using quantile regression. (Yes; that's why the bootstrap is needed to find their s.e.'s)

24. In the model Y = β0 + β1(1/X) + ε, the intercept β0 is equal to
A. E(Y | X = 0)
B. E(Y | X = 1)
C. E(Y | X = ∞) (right)
D. E(Y | X = 2) − E(Y | X = 1)

25. Suppose Y depends on X1 and X2. Then the model Y = β0 + β1x1 + ε
A. is simply wrong. (no, we discussed this in class)
B. is simply a model for the conditional distribution of Y given X1 = x1. (right; it answers different questions)

C. is less biased than the model Y = β0 + β1x1 + β2x2 + ε. (If anything, it would be more biased, but only with respect to the model for the conditional distribution given both X's.)
D. violates the uncorrelated errors assumption. (irrelevant)

26. When are the absolute values of the residuals most useful?
A. When checking the linearity assumption.
B. When checking the homoscedasticity assumption. (this one)
C. When checking the uncorrelated errors assumption.
D. When checking the normality assumption.

27. Causal arguments are strengthened by (pick the best answer)
A. finding a higher R² statistic. (correlation does not imply causation)
B. including more X variables in the model and still finding significance of the effect of interest. (right: more control of confounders)
C. showing that the model assumptions are satisfied. (good, but not related; there is no assumption of causality in the model)
D. using a larger sample size. (always a good idea, but it won't help directly; correlation is not causation)

28. How is σᵢ² = Var(Yᵢ | Xᵢ = xᵢ) estimated when using heteroscedasticity-consistent standard errors?
A. σ̂ᵢ² = eᵢ² (yes)
B. σ̂ᵢ² = εᵢ² (unavailable)
C. σ̂ᵢ² = xᵢ²(MSE) (HC makes no assumption about form)
D. σ̂ᵢ² = exp(γ̂0 + γ̂1xᵢ) (HC makes no assumption about form)

29. In mathematics, linearly independent columns of the X matrix means that
A. the correlation matrix of the X data is an identity matrix.
B. the columns of the X matrix are independent variables.
C. the columns of the X matrix have correlations less than 0.9 (in absolute value).
D. no column of X is a linear function of the other columns. (yes)

30. The hat matrix is H = X(X'X)⁻¹X'. Select the true statement.
A. β̂ = HY.
B. The trace of H is equal to n − p.
C. H is idempotent. (right)
D. The standard errors of the β̂j are the diagonal elements of H.

31. There is a full model and a restricted model. Everything with an F subscript is from the full model, and everything with an R subscript is from the restricted model. Also, SSE refers to the sum of squares for error and SST stands for the (corrected) total sum of squares. Then
A. SSE_F ≥ SSE_R

B. SSE_F ≤ SSE_R (yes; adding variables reduces SSE)
C. SST_F < SST_R (SSTs are equal)
D. SST_F > SST_R ( )

32. When is the Model F statistic equal to {β̂1/s.e.(β̂1)}²?
A. When there is one X variable in the model. (Yes, F = t²)
B. When there is more than one X variable in the model.
C. When the null hypothesis is rejected.
D. When the t statistic is normally distributed.

33. Which value of c minimizes Σᵢ₌₁ⁿ |yᵢ − c|?
A. E(Y) (This question refers to data, not process)
B. The median of the distribution of Y (This question refers to data, not process)
C. (1/n) Σᵢ₌₁ⁿ yᵢ
D. The median of {y1, y2, ..., yn} (yes)

34. What is the bootstrap used for?
A. To minimize the sum of absolute deviations.
B. To extrapolate to X data outside the range of the observed data.
C. To estimate standard errors. (Among the choices, this is best; it's used for other things as well.)
D. To compute posterior distributions of the regression parameters.

35. When using weighted least squares, you must assume that
A. the errors are normally distributed. (no; WLS estimates are BLUE even when normality is violated)
B. Var(Yᵢ | Xᵢ = xᵢ) is known for each i = 1, ..., n. (not absolutely necessary; see C)
C. Var(Yᵢ | Xᵢ = xᵢ) = cᵢσ², where cᵢ is known for each i = 1, ..., n. (yes)
D. Var(Yᵢ | Xᵢ = xᵢ) = exp(γ0 + γ1xᵢ), where γ0 and γ1 are known. (you could use WLS here, but this is wrong for the same reason that B is wrong)

36. When the weights in weighted least squares (WLS) are all 1.0, then the WLS estimates are
A. quantile regression estimates with q = 0.5.
B. quantile regression estimates with q = 1.0.
C. generalized least squares estimates with a block-diagonal homoscedastic covariance structure.
D. ordinary least squares estimates. (yes)

37. The compound symmetry covariance structure assumes

A. observations that are farther apart in time have smaller covariance.
B. observations that are farther apart in time have smaller variance.
C. observations on different people have different variances.
D. observations within a person are equally correlated. (yes)

38. The AR(1) covariance structure assumes
A. observations that are farther apart in time have smaller covariance. (yes)
B. observations that are farther apart in time have smaller variance.
C. observations on different people have different variances.
D. observations within a person are equally correlated.

39. In PROC MIXED of SAS/STAT, the covariance matrix of Y is estimated as ZGZ' + R. Select the true statement.
A. The R matrix is a correlation matrix. (no, it's a covariance matrix)
B. The RANDOM statement defines the R matrix.
C. The G matrix is a correlation matrix. (see A)
D. The RANDOM statement defines the Z matrix. (yes)

40. In panel data where companies are followed over time, there is cross-sectional correlation. This means that
A. observations in the same year are correlated. (yes)
B. observations on the same company are correlated. (that's time series correlation)
C. observations that are two years apart are more highly correlated than observations that are ten years apart. (see B)
D. there is a high degree of multicollinearity. (irrelevant)

41. The multivariate normal distribution function is used in mixed models mainly to
A. define the levels of the multilevel analysis. (?)
B. estimate the parameters of the model via maximum likelihood. (yes)
C. compute the R² statistic. (could be done via a pseudo R², but we never did it that way and it's not in SAS)
D. assure that the predictions of the random effects are best linear unbiased predictions (BLUPs). (normality is needed, as in the case of BLUE)

42. Select the incorrect answer. Generalized least squares estimates
A. minimize the sum of squared errors (or SSE). (no, that's OLS)
B. are best linear unbiased estimates when Φ is known. (right)
C. are maximum likelihood estimates when the distribution of Y is multivariate normal with mean Xβ and known covariance matrix Φ. (yes)
D. are given by the formula (X'Φ⁻¹X)⁻¹X'Φ⁻¹Y. (yes)
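The GLS formula in option D is easy to verify numerically. This is a minimal sketch with a simulated design and illustrative coefficients, showing that when Φ = I (uncorrelated, homoscedastic errors) the GLS formula reduces to OLS:

```python
import numpy as np

rng = np.random.default_rng(2)

# A small simulated design: intercept plus one predictor; the
# coefficient values (1, 2) are illustrative.
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1.0, n)

def gls(X, y, Phi):
    """Generalized least squares: (X' Phi^-1 X)^-1 X' Phi^-1 y."""
    Pinv = np.linalg.inv(Phi)
    return np.linalg.solve(X.T @ Pinv @ X, X.T @ Pinv @ y)

# With Phi = I, GLS reduces to ordinary least squares.
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(gls(X, y, np.eye(n)), ols))  # True
```

With a non-identity Φ (for example, a compound symmetry or AR(1) matrix from questions 37-38), the same function gives the more efficient correlated-errors estimate.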

43. When the errors are correlated, but all other assumptions are satisfied, then the ordinary least squares estimators for the βj are
A. still unbiased. (yes)
B. non-normally distributed. (doesn't follow logically)
C. uncorrelated. ( )
D. efficient. (no)

44. If your estimate has the form θ̂ = A − B (A minus B), where A and B are positively correlated, then if you use a model that assumes independence of A and B, the reported standard error of θ̂ will be
A. too large. (yes: Var(A − B) = Var(A) + Var(B) − 2Cov(A, B); if you assume uncorrelated you get Var(A − B) = Var(A) + Var(B), which is too large)
B. too small.
C. sometimes too large, sometimes too small.
D. unaffected.

45. There are several reasons for not using classic linear regression analysis with binary responses. Which of the following is not one of those reasons?
A. Because the probabilities do not lie on a straight line.
B. Because the variance is nonconstant.
C. Because the distribution is non-normal.
D. Because there are outliers in the Y variable. (0/1 data do not usually have outliers)

46. What is the likelihood of a single binary (0 or 1) observation y?
A. (1/√(2πθ2)) exp(−(y − θ1)²/(2θ2))
B. θ^y (1 − θ)^(1−y) (this one: you get θ when y = 1 and 1 − θ when y = 0)
C. θe^(−θy)
D. 1/θ

47. When is the variance of binary (0 or 1) data largest?
A. When the proportion of 1's is
B. When the proportion of 1's is 0.5. (Yes: Var(Y) = π(1 − π))
C. When the binary data are normally distributed.
D. When the sample size is small.
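The Bernoulli likelihood in question 46.B and the variance fact in question 47 can be checked directly; θ = 0.25 is an arbitrary example value:

```python
# Direct checks of the Bernoulli likelihood theta**y * (1-theta)**(1-y)
# and of the variance pi*(1-pi); theta = 0.25 is an arbitrary example.

def bernoulli_likelihood(y, theta):
    return theta ** y * (1.0 - theta) ** (1 - y)

def bernoulli_variance(pi):
    return pi * (1.0 - pi)

# The likelihood reduces to theta when y = 1 and to 1 - theta when y = 0.
print(bernoulli_likelihood(1, 0.25))  # 0.25
print(bernoulli_likelihood(0, 0.25))  # 0.75

# Over a grid of proportions, the variance peaks at 0.5.
pis = [i / 100 for i in range(1, 100)]
print(max(pis, key=bernoulli_variance))  # 0.5
```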

48. If the estimated probability of success when X = 4 is 0.3, then which is the most logical prediction?
A. That 30 out of 100 people having X = 4 will be successful. (this one)
B. That 50 out of 100 people having X = 4 will be successful.
C. That 30 out of 100 people will be successful, regardless of their X.
D. That 50 out of 100 people will be successful, regardless of their X.

49. If the probability of success is 0.75, then the odds of success is
A. also 0.75.
B.
C. exp(0.75).
D. 3. (0.75/(1 − 0.75) = 3)

50. The logarithm of the odds is also called the
A. normit.
B. probit.
C. tobit.
D. logit. (yep)
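The odds and logit relationships in questions 49 and 50 can be sketched as:

```python
import math

# Odds and logit for the exam's example probability p = 0.75.

def odds(p):
    return p / (1.0 - p)

def logit(p):
    # The logarithm of the odds is the logit.
    return math.log(odds(p))

def logistic(t):
    # The logistic function inverts the logit.
    return 1.0 / (1.0 + math.exp(-t))

print(odds(0.75))                       # 3.0
print(round(logistic(logit(0.75)), 6))  # 0.75, recovering the probability
```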


4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 2: Simple Regression Egypt Scholars Economic Society Happy Eid Eid present! enter classroom at http://b.socrative.com/login/student/ room name c28efb78 Outline

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and

More information

Introduction to Regression

Introduction to Regression Introduction to Regression ιατµηµατικό Πρόγραµµα Μεταπτυχιακών Σπουδών Τεχνο-Οικονοµικά Συστήµατα ηµήτρης Φουσκάκης Introduction Basic idea: Use data to identify relationships among variables and use these

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

Multiple Regression and Model Building Lecture 20 1 May 2006 R. Ryznar

Multiple Regression and Model Building Lecture 20 1 May 2006 R. Ryznar Multiple Regression and Model Building 11.220 Lecture 20 1 May 2006 R. Ryznar Building Models: Making Sure the Assumptions Hold 1. There is a linear relationship between the explanatory (independent) variable(s)

More information

Chapter 8. R-squared, Adjusted R-Squared, the F test, and Multicollinearity

Chapter 8. R-squared, Adjusted R-Squared, the F test, and Multicollinearity Chapter 8. R-squared, Adusted R-Squared, the F test, and Multicollinearity This chapter discusses additional output in the regression analysis, from the context of multiple regression in the classic model.

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Fall 07 ISQS 6348 Midterm Solutions

Fall 07 ISQS 6348 Midterm Solutions Fall 07 ISQS 648 Midterm Solutions Instructions: Open notes, no books. Points out of 00 in parentheses. 1. A random vector X = 4 X 1 X X has the following mean vector and covariance matrix: E(X) = 4 1

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Introduction to Econometrics. Heteroskedasticity

Introduction to Econometrics. Heteroskedasticity Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory

More information

Correlation and regression

Correlation and regression NST 1B Experimental Psychology Statistics practical 1 Correlation and regression Rudolf Cardinal & Mike Aitken 11 / 12 November 2003 Department of Experimental Psychology University of Cambridge Handouts:

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

1 Correlation between an independent variable and the error

1 Correlation between an independent variable and the error Chapter 7 outline, Econometrics Instrumental variables and model estimation 1 Correlation between an independent variable and the error Recall that one of the assumptions that we make when proving the

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs Logistic Regression, Part I: Problems with the Linear Probability Model (LPM) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals

More information

Lecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran

Lecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran Lecture 2 Linear Regression: A Model for the Mean Sharyn O Halloran Closer Look at: Linear Regression Model Least squares procedure Inferential tools Confidence and Prediction Intervals Assumptions Robustness

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you.

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you. ISQS 5347 Final Exam Spring 2017 Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you. 1. Recall the commute

More information

Inference and Regression

Inference and Regression Name Inference and Regression Final Examination, 2015 Department of IOMS This course and this examination are governed by the Stern Honor Code. Instructions Please write your name at the top of this page.

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Survey on Population Mean

Survey on Population Mean MATH 203 Survey on Population Mean Dr. Neal, Spring 2009 The first part of this project is on the analysis of a population mean. You will obtain data on a specific measurement X by performing a random

More information

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear

More information

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

STA 2101/442 Assignment Four 1

STA 2101/442 Assignment Four 1 STA 2101/442 Assignment Four 1 One version of the general linear model with fixed effects is y = Xβ + ɛ, where X is an n p matrix of known constants with n > p and the columns of X linearly independent.

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Lecture 2: Linear and Mixed Models

Lecture 2: Linear and Mixed Models Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information