STA 114: Statistics. Notes 21. Linear Regression

Size: px
Start display at page:

Download "STA 114: Statistics. Notes 21. Linear Regression"

Transcription

1 STA 114: Statistics Notes 1. Linear Regression Introduction A most widely used statistical analysis is regression, where one tries to explain a response variable Y by an explanatory variable X based on paired data (X 1, Y 1 ),, (X n, Y n ). The most common way to model the dependence of Y on X is to look for a linear relationship with additional noise, Y i = β 0 + β 1 X i + ϵ i with ϵ 1,, ϵ n taken to be independent and identically distributed random variables with mean 0 and variance σ. The unknown quantities are (β 0, β 1, σ ). Many pairs of natural measurements exhibit such linear relationships. The figure below shows atmospheric pressure (in inches of Mercury) against boiling point of water (in degrees F) based on 17 pairs of observations. Although water s boiling point and atmospheric pressure should have a precise physical relationship, there would always be some deviation in actual measurements due to factors that are hard to control. Pressure Temp Least squares line The straight line you see in the above figure is the line that best fits the data. This is found as follows. For any line y = b 0 +b 1 x, we can find the residuals e i = y i b 0 b 1 x i if we tried 1

2 to explain the observed values of Y by those of X using this line. The total deviation can be measured by the sum of squares of the residuals d(b 0, b 1 ) = e i = (y i b 0 b 1 x i ). We could find b 0 and b 1 that minimize d(b 0, b 1 ). This is easily done by using calculus. We set b 0 d(b 0, b 1 ) = 0, b 1 d(b 0, b 1 ) = 0 and solve for b 0, b 1. In particular, 0 = b 0 d(b 0, b 1 ) = (y i b 0 b 1 x i )( 1) = n(ȳ b 0 b 1 x) 0 = b 1 d(b 0, b 1 ) = (y i b 0 b 1 x i )( x i ) = ( x i y i nb 0 b 1 x i ). These are two linear equations in two unknowns b 0, b 1. The solutions are: ˆb1 = (y i ȳ)(x i x) (x i x) = y i(x i x) (x i x), ˆb0 = ȳ ˆb 1 x. Let s denote s xy = 1 (y n 1 i ȳ)(x i x) [this is the sample covariance between Y and X. Then using our old notation s x = 1 n n 1 (x i x) we can write the least squares solution as ˆb 0 = ȳ s xy x/s x, ˆb 1 = s xy /s x. The method of least squares was used by physicists working on astrological measurements in the early 18th century. A statistical framework was developed much later. The main import of the statistical development, as usual, has been to incorporate a notion of uncertainty. Statistical analysis of simple linear regression To put the linear regression relationship Y i = β 0 + β 1 X i + ϵ i into a statistical model, we need a distribution on the ϵ i s. The most common choice is a normal distribution Normal(0, σ ). This can be justified as follows: the additional factors that give rise to the noise term are many in number and act independently of each other, each making a little contribution. By the central limit theorem the aggregate of such numerous, independent, small contributions should behave like a normal variable. The mean is fixed at zero because any non-zero mean can be absorbed in the intercept β 0. So our statistical model is Y i = β 0 + β 1 X i + ϵ i, ϵ i Normal(0, σ ) with model parameters β 0 (, ), β 1 (, ), and σ > 0. We also need to assume that the error terms ϵ i s are independent of the explanatory variables X i (because the errors account for additional factors beyond the explanatory variable). Then, the log-likelihood function in the model parameters is given by, l x,y (β 0, β 1, σ ) = const n log σ (y i β 0 β 1 x i ). σ To find the MLE, first notice that for every σ, the log-likelihood function is maximized at (β 0, β 1 ) equal to the least squares solutions (ȳ s xy x/s x, s xy /s x), and so we must have ˆβ 1,MLE (x, y) = ȳ s xy x/s x, ˆβ,MLE (x, y) = s xy /s x. To shorten notations we ll just write ˆβ 0 and ˆβ 1 for ˆβ 0,MLE (x, y) and ˆβ 1,MLE (x, y) respectively.

3 Consequently, the MLE of σ is found by setting ˆσ MLE(x, y) = 1 n It is more common to estimate σ by ˆσ = s y x = 1 n l σ x,y ( ˆβ 0, ˆβ 1, σ ) = 0. This given, (y i ˆβ 0 ˆβ 1 x i ). (y i ˆβ 0 ˆβ 1 x i ). where n indicates that two unknown quantities (β 0 and β 1 ) were to be estimated to define the residuals. Sampling theory To construct confidence intervals and test procedures for the model parameters, we ll first need to look at their sampling theory. We ll explore these treating the explanatory variable as non-random, i.e., as if we picked and fixed the observations x 1,, x n. You can interpret this as being the conditional sampling theory of the estimated model parameters given the observed explanatory variables. It turns out that when Y i = β 0 + β 1 x i + ϵ i with ϵ i Normal(0, σ ), for any two numbers a and b, 1. a ˆβ 0 + b ˆβ 1 Normal. (n )ˆσ /σ χ (n ) { }) (aβ 0 + bβ 1, σ a + (a x b). n (n 1)s x 3. and these two random variables are mutually independent. A proof of this is given at the end of this handout. It is only for those who are curious to know why this happens. You may ignore it if you re not interested. Two special cases of the above result would be important to us. First, with a = 1 and b = 0 the result gives ˆβ 0 Normal(β 0, σ { 1 + x }) and is independent of ˆσ. Second, n (n 1)s x with a = 0 and b = 1 we have ˆβ 1 Normal(β 1, σ 1 ) and is independent of ˆσ. (n 1)s x ML confidence interval We shall only look at confidence intervals for parameters that can be written as η = aβ 0 + bβ 1 for some real numbers a and b (again, the interesting cases would be (a = 1, b = 0) corresponding to β 0 and (a = 0, b = 1) corresponding to β 1 ). From the result above it follows that ˆη = a ˆβ 0 + b ˆβ 1 Normal(η, σ η) where σ η = σ [ a Let ˆσ η = ˆσ [ a n + (a x b) (n 1)s x + (a x b) n (n 1)s x. A bit of algebra shows that ˆη η ˆσ η t(n ). 3 and ˆη is independent of ˆσ.

4 Therefore a 100(1 α)% confidence interval for η is given by B(x, y) = ˆη ˆσ η z n (α). This confidence interval is also the ML confidence interval, i.e., it matches {l x,y(η) max η l x,y(η) c /} for some c > 0 where l x,y(η) is the pofile log-likelihood of η, but we would not pursue a proof here. Hypotheses testing Again we restrict ourselves to parameters that can be written as η = aβ 0 + bβ 1. To test H 0 : η = η 0 against H 1 : η η 1, the size-α ML test is the one that rejects H 0 whenever η 0 is outside the 100(1 α)% ML confidence interval ˆη ˆσ η z n (α). The p-value based on these tests is precisely the α for which η 0 is just on the boundary of this interval. Of particular interest is testing H 0 : β 1 = 0 against H 1 : β 1 0. The null hypothesis says that X offers no explanation of Y within the linear relationship framework. This can be dealt as above with a = 0 and b = 1 and thus checking whether 0 is outside the interval ˆβ 1 ˆσz n (α)/ (n 1)s x. Prediction Suppose you want to predict the value Y of Y corresponding to an X = x that is known to you. The model says Y = β 0 + β 1 x + ϵ where ϵ Normal(0, σ ) and is independent of ϵ 1,, ϵ n. If we knew β 0, β 1, σ, we could give the interval β 0 + β 1 x σz(α), taking into account the variability σ of ϵ. Our best guess for β 0 + β 1 x is the fitted value ŷ = ˆβ 0 + ˆβ 1 x (the point on the least squares line with x-coordinate x ) with additional associated variability σ { 1 + ( x x ) } [follows from the result with a = 1, b = x. Also, we have an n (n 1)s x estimate ˆσ for σ. Putting all these together, it follows that the predictive interval has a 100(1 α)% coverage. Testing goodness of fit ˆβ 0 + ˆβ 1 x + ˆσ [1 + 1n + ( x x ) 1 zn (α) (n 1)s x Once we fit the model, we can construct our residuals e i = Y i ˆβ 0 ˆβ 1 x i, i = 1,, n. The histogram of these residuals should match the Normal(0, σ ) pdf, with σ estimated by ˆσ. So we could carry out a Pearson s goodness-of-fit test with these residuals in the same manner we did for iid normal data. The only difference would be that the p-value range would now be given by 1 F k 1 (Q(x)) to 1 F k 4 ((Q(x)) to reflect that 3 parameters are being estimated (β 0, β 1 and σ). Technical details (ignore if you re not interested) To dig into the sampling distribution of ˆβ 0, ˆβ 1 and ˆσ, we first need to recall an important result we had seen a long time back (Notes 7 to be precise). We saw that if Z 1,, Z n 4

5 Normal(0, 1) and W 1,, W n relate to Z 1,, Z n through the system of equations: W 1 = a 11 Z a 1n Z n W = a 1 Z a n Z n. W n = a n1 Z a nn Z n then W 1,, W n are also independent Normal(0, 1) variables with W1 + +Wn = Z1 + + Zn provided the coefficients on each row form a vector of unit length and these vectors are orthogonal to each other. We had used this result to show that if X 1,, X n Normal(0, 1) then X Normal(0, 1/ n) and (X i X) χ (n 1) and these two are independent of each other. We will pursue a similar track here. First we note that when Y i = β 0 + β 1 x i + ϵ i with ϵ i Normal(0, σ ), Ȳ = 1 (β 0 + β 1 x i + ϵ i ) = β 0 + β 1 x + ϵ n and with u i = (x i x)/ (x i x), ˆβ 1 = Y i(x i x) (x = u i x) i Y i = u i (β 0 + β 1 x i + ϵ i ) = β 1 + u i ϵ i because u i = 0 and u ix i = 1. Define ζ 1 = ϵ 1 /σ,, ζ n = ϵ n /σ, then ζ 1,, ζ n Normal(0, 1) and from calculations above Ȳ = β 0 + β 1 x + σ ζ and ˆβ 1 = β 1 + σ u iζ i. Now get W 1,, W n as [ 1/ (x i x) u i ζ i = W = n ζ = W1 = 1 n ζ n ζ n x 1 x (x i x) ζ W 3 = a 31 ζ a n1 ζ n. W n = a n1 ζ a nn ζ n x n x (x i x) ζ n where the rows are unit length and mutually orthogonal (the first two rows are so by design, and so can be extended to a set of n rows by what is known as Gram-Schmidt orthogonalization). So we have W i Normal(0, 1) and W1 + + Wn = ζ1 + + ζn. This leads to a rich collection of results. First we see that ˆβ 1 = β 1 +σw / (x i x) with W Normal(0, 1). Hence we must have ( ) σ ˆβ 1 Normal β 1, (x. i x) 5

6 Next, ˆβ 0 = Ȳ ˆβ 1 x = β 0 + σ [ W 1 xw n (x i x) with W 1, W independent Normal(0, 1). Therefore, [ ) 1n ˆβ 0 Normal (β 0, σ + ( x) n (x. i x) Furthermore, for any two numbers a and b a ˆβ 0 + b ˆβ 1 = aβ 1 + bβ 0 + σ Next we look at ˆσ. First note that, [ aw 1 (a x b)w n (x i x) { }) a Normal (aβ 0 + bβ 1, σ (a x b) +. n (n 1)s x Y i ˆβ 0 ˆβ 1 x i = ϵ i ( ˆβ 0 β 0 ) ( ˆβ 1 β 1 )x i = σ [ ζ i W 1 (x i x)w n (x i x) which leads to the following identity (with a bit of algebra that you can ignore) (Y i ˆβ 0 ˆβ 1 x i ) σ = ζi W1 W = W3 + + Wn because ζ i = W i. But W3 + +Wn is the sum of n independent Normal(0, 1) variables, so (n )ˆσ n = (Y i ˆβ 0 ˆβ 1 x i ) χ (n ) σ σ and because W 3,, W n are independent of W 1, W, we can conclude ˆσ is independent of ˆβ 0 and ˆβ 1. 6

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Classical Inference for Gaussian Linear Models

Classical Inference for Gaussian Linear Models Classical Inference for Gaussian Linear Models Surya Tokdar Gaussian linear model Data (z i, Y i ), i = 1,, n, dim(z i ) = p Model Y i = z T i Parameters: β R p, σ 2 > 0 Inference needed on η = a T β IID

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses. 1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

where x and ȳ are the sample means of x 1,, x n

where x and ȳ are the sample means of x 1,, x n y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

Correlation 1. December 4, HMS, 2017, v1.1

Correlation 1. December 4, HMS, 2017, v1.1 Correlation 1 December 4, 2017 1 HMS, 2017, v1.1 Chapter References Diez: Chapter 7 Navidi, Chapter 7 I don t expect you to learn the proofs what will follow. Chapter References 2 Correlation The sample

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

Regression Analysis: Basic Concepts

Regression Analysis: Basic Concepts The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients

17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients 17: INFERENCE FOR MULTIPLE REGRESSION Inference for Individual Regression Coefficients The results of this section require the assumption that the errors u are normally distributed. Let c i ij denote the

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.

Final Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. (a) If X and Y are independent, Corr(X, Y ) = 0. (b) (c) (d) (e) A consistent estimator must be asymptotically

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1. Problem 1 (21 points) An economist runs the regression y i = β 0 + x 1i β 1 + x 2i β 2 + x 3i β 3 + ε i (1) The results are summarized in the following table: Equation 1. Variable Coefficient Std. Error

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

2 Regression Analysis

2 Regression Analysis FORK 1002 Preparatory Course in Statistics: 2 Regression Analysis Genaro Sucarrat (BI) http://www.sucarrat.net/ Contents: 1 Bivariate Correlation Analysis 2 Simple Regression 3 Estimation and Fit 4 T -Test:

More information

Psychology 282 Lecture #3 Outline

Psychology 282 Lecture #3 Outline Psychology 8 Lecture #3 Outline Simple Linear Regression (SLR) Given variables,. Sample of n observations. In study and use of correlation coefficients, and are interchangeable. In regression analysis,

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

The Standard Linear Model: Hypothesis Testing

The Standard Linear Model: Hypothesis Testing Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017 Lecture 25: The Standard Linear Model: Hypothesis Testing Relevant textbook passages: Larsen Marx [4]:

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Linear Models Review

Linear Models Review Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Statistics 910, #5 1. Regression Methods

Statistics 910, #5 1. Regression Methods Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known

More information

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist sales $ (y - dependent variable) advertising $ (x - independent variable)

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define

More information

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Chapter 10. Simple Linear Regression and Correlation

Chapter 10. Simple Linear Regression and Correlation Chapter 10. Simple Linear Regression and Correlation In the two sample problems discussed in Ch. 9, we were interested in comparing values of parameters for two distributions. Regression analysis is the

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

11 Hypothesis Testing

11 Hypothesis Testing 28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes

More information

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1 Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Stat 502 Design and Analysis of Experiments General Linear Model

Stat 502 Design and Analysis of Experiments General Linear Model 1 Stat 502 Design and Analysis of Experiments General Linear Model Fritz Scholz Department of Statistics, University of Washington December 6, 2013 2 General Linear Hypothesis We assume the data vector

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing

F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing Feng Li Department of Statistics, Stockholm University What we have learned last time... 1 Estimating

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Confidence Intervals and Hypothesis Tests

Confidence Intervals and Hypothesis Tests Confidence Intervals and Hypothesis Tests STA 281 Fall 2011 1 Background The central limit theorem provides a very powerful tool for determining the distribution of sample means for large sample sizes.

More information