Linear Regression 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) SDA Regression 1 / 34
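Regression analysis is a statistical methodology that uses the relation between two or more quantitative variables so that a response (outcome) variable can be predicted from the other variable or variables.
Regression analysis is a statistical method for analyzing data. Its aim is to determine whether two or more variables are related, in what direction and how strongly, and to build a mathematical model so that observing particular variables makes it possible to predict the variable the researcher is interested in (Wiki).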
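Origin: the term "regression" was first used by Francis Galton. He studied the heights of parents and their children and found that, although parents' height is inherited by their children, the children's heights tend to "regress toward the middle" (that is, toward the population average), a phenomenon called regression to the mean. The meaning of "regression" at that time, however, is not quite the same as its meaning today (Wiki).
Regression to the mean: children of very tall parents tend to be somewhat shorter than their parents, and children of very short parents tend to be taller than their parents. Human height is pulled from the two extremes, tall and short, toward the average of all people (統計改變了世界).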
Relations between Variables: Functional Relation between Two Variables
Functional relation vs. statistical relation.
A functional relation is expressed by a formula, $Y = f(X)$. Example: if the selling price is $2 per unit, then $Y = 2X$.
Figure: Example of Functional Relation ($Y = f(X)$); panel (a) figure, panel (b) data.
Functional Relation between Two Variables (cont.)
In contrast to a functional relation, the observations for a statistical relation do not fall directly on the curve of relationship.
Example: employees' performance evaluations, with Y = year-end evaluation and X = midyear evaluation.
Functional Relation between Two Variables (cont.)
Figure: Curvilinear Statistical Relation between Age and Steroid Level in Healthy Females Aged 8 to 25.
Regression Models and Their Uses: Basic Concepts
A regression model expresses two ingredients of a statistical relation:
- There is a probability distribution of Y for each level of X.
- The probability distributions vary in some systematic fashion with X.
Figure: Pictorial Representation of Regression Model.
Construction of Regression Models
Y: the dependent or response variable
X: the independent, explanatory, or predictor variable
Three major purposes:
1. description
2. control
3. prediction
Simple Linear Regression Model with Distribution of Error Terms Unspecified: Statement of Model
The linear regression model with one predictor variable:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n \]
- $Y_i$: the value of the response variable in the $i$th trial
- $\beta_0, \beta_1$: parameters
- $X_i$: a known constant, the value of the predictor variable in the $i$th trial
- $\varepsilon_i$: random error term with $E\{\varepsilon_i\} = 0$ and $\sigma^2\{\varepsilon_i\} = \sigma^2$; the error terms are uncorrelated, $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for all $i, j$ with $i \neq j$
The model is simple (one predictor variable), linear in the parameters, and linear in the predictor variable.
Features
1. $Y_i$ is the sum of two components: (1) the constant term $\beta_0 + \beta_1 X_i$ and (2) the random term $\varepsilon_i$.
2. Since $E\{\varepsilon_i\} = 0$:
\[ E\{Y_i\} = E\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \beta_0 + \beta_1 X_i \]
3. The regression function is
\[ E\{Y\} = \beta_0 + \beta_1 X \]
The regression function relates the means of the probability distributions of Y for given X to the level of X.
Features (cont.)
4. The response $Y_i$ in the $i$th trial exceeds or falls short of the value of the regression function by the error term amount $\varepsilon_i$.
5. Since $\sigma^2\{\varepsilon_i\} = \sigma^2$, the responses have the same constant variance:
\[ \sigma^2\{Y_i\} = \sigma^2\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \sigma^2\{\varepsilon_i\} = \sigma^2 \]
6. The error terms are assumed to be uncorrelated, so the responses $Y_i$ and $Y_j$ ($i \neq j$) are uncorrelated as well.
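The two moment results above, $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and $\sigma^2\{Y_i\} = \sigma^2$, can be checked by simulation. The following is a minimal sketch, not part of the original slides: the parameter values, the predictor level, and the use of normally distributed errors are all assumptions made for illustration (the model itself leaves the error distribution unspecified).

```python
import numpy as np

# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma = 1.0, 2.0, 1.5
x_level = 4.0          # one fixed level of the predictor X
n_trials = 200_000     # number of simulated responses at that level

rng = np.random.default_rng(0)
# Errors with mean 0 and constant variance sigma^2. Normality is an
# assumption of this sketch; any zero-mean, constant-variance error works.
eps = rng.normal(loc=0.0, scale=sigma, size=n_trials)
y = beta0 + beta1 * x_level + eps

mean_y = y.mean()        # should be close to beta0 + beta1 * x_level
var_y = y.var(ddof=1)    # should be close to sigma**2
print("sample mean of Y:", mean_y, "regression function:", beta0 + beta1 * x_level)
print("sample variance of Y:", var_y, "sigma^2:", sigma**2)
```

The sample mean and variance only approximate the theoretical values; the agreement improves as the number of simulated trials grows.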
Meaning of Regression Parameters
Example regression model: $Y = 9.5 + 2.1X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$.
Regression coefficients: $\beta_0$ is the intercept (the mean of Y at X = 0, when the scope of the model includes X = 0) and $\beta_1$ is the slope (the change in the mean of Y per unit increase in X).
Figure: Meaning of Parameters of Simple Linear Regression Model.
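As a small numerical illustration of the parameters in the example model $Y = 9.5 + 2.1X + \varepsilon$, the sketch below evaluates the regression function $E\{Y\} = 9.5 + 2.1X$ at a few levels of X. The function name `mean_response` and the levels 0, 10, and 11 are hypothetical choices for this illustration.

```python
def mean_response(x, beta0=9.5, beta1=2.1):
    """Regression function E{Y} = beta0 + beta1 * X for the example model."""
    return beta0 + beta1 * x

# Intercept: the mean response at X = 0.
intercept = mean_response(0.0)
# Slope: the change in the mean response for a one-unit increase in X.
slope = mean_response(11.0) - mean_response(10.0)

print("mean response at X = 0:", intercept)
print("change in mean response per unit of X:", round(slope, 10))
```

The same difference of 2.1 is obtained between any two levels of X that are one unit apart, which is what "linear in the predictor variable" means.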
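Matrix Form for Regression Analysis
The regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i = E\{Y_i\} + \varepsilon_i$, $i = 1, \ldots, n$, can be written in matrix terms as
\[ \underset{n \times 1}{\mathbf{Y}} = \underset{n \times 1}{E\{\mathbf{Y}\}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}, \qquad E\{\boldsymbol{\varepsilon}\} = \mathbf{0}, \qquad \underset{n \times n}{\sigma^2\{\boldsymbol{\varepsilon}\}} = \sigma^2 \mathbf{I} \]
where
\[ \underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad \underset{n \times 2}{\mathbf{X}} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad \underset{2 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad \underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \]
\[ \underset{n \times 1}{E\{\mathbf{Y}\}} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix} = \mathbf{X}\boldsymbol{\beta} \]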
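The matrix form can be made concrete with a small array computation. This is a sketch under assumed values: the five predictor levels and the parameter vector (9.5, 2.1) are hypothetical and serve only to show how the design matrix $\mathbf{X}$ and the mean vector $E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta}$ are formed.

```python
import numpy as np

# Hypothetical predictor levels and parameter values for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
beta = np.array([9.5, 2.1])                 # (beta0, beta1), a 2-vector

# Design matrix: a column of ones (for beta0) next to the X_i values.
X = np.column_stack([np.ones_like(x), x])   # shape (n, 2)

# E{Y} = X beta gives the mean response at every X_i in one product.
mean_y = X @ beta                           # shape (n,)

print(X.shape, mean_y)
```

Each entry of `mean_y` equals $\beta_0 + \beta_1 X_i$ for the corresponding row of the design matrix.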
Data for Regression Analysis
- The regression parameters $\beta_0$ and $\beta_1$ are unknown.
- They must be estimated from relevant data.
- We rely on an analysis of the data to develop a suitable regression model.
Estimation of Regression Function: Method of Least Squares
Observations: $(X_i, Y_i)$, $i = 1, \ldots, n$
Deviation of $Y_i$ from its expected value:
\[ Y_i - (\beta_0 + \beta_1 X_i) = Y_i - \beta_0 - \beta_1 X_i \]
Method of Least Squares (cont.)
The least squares criterion:
\[ Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \]
What properties make an estimator good?
The least squares estimators $b_0$ and $b_1$ are the values of $\beta_0$ and $\beta_1$ that minimize the criterion Q for the given sample observations.
How are the estimators $b_0$ and $b_1$ obtained?
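The criterion Q can be evaluated for any candidate pair $(\beta_0, \beta_1)$. The sketch below computes it in both the summation form and the matrix form; the five data points are made up for illustration, and the function names `q_sum` and `q_matrix` are hypothetical.

```python
import numpy as np

# Hypothetical sample of n = 5 observations (X_i, Y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def q_sum(beta0, beta1):
    """Q as the sum of squared deviations Y_i - beta0 - beta1 * X_i."""
    return float(np.sum((y - beta0 - beta1 * x) ** 2))

def q_matrix(beta0, beta1):
    """Q as the quadratic form (Y - X beta)'(Y - X beta)."""
    X = np.column_stack([np.ones_like(x), x])
    r = y - X @ np.array([beta0, beta1])
    return float(r @ r)

# Two candidate lines: an arbitrary guess and a better-fitting one.
print(q_sum(0.0, 1.0), q_matrix(0.0, 1.0))
print(q_sum(2.2, 0.6), q_matrix(2.2, 0.6))
```

A smaller value of Q means the candidate line lies closer to the observations in the least squares sense.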
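Method of Least Squares (cont.)
Setting the partial derivatives of Q equal to zero at $(b_0, b_1)$ gives the normal equations:
\[ \left.\frac{\partial Q}{\partial \beta_0}\right|_{b_0, b_1} = 0 \;\Rightarrow\; \sum Y_i = n b_0 + b_1 \sum X_i \]
\[ \left.\frac{\partial Q}{\partial \beta_1}\right|_{b_0, b_1} = 0 \;\Rightarrow\; \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 \]
Solving the normal equations:
\[ b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X} \]
The vector of the least squares regression coefficients:
\[ \underset{2 \times 2}{\mathbf{X}'\mathbf{X}} \; \underset{2 \times 1}{\mathbf{b}} = \underset{2 \times 1}{\mathbf{X}'\mathbf{Y}} \;\Rightarrow\; \underset{2 \times 1}{\mathbf{b}} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]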
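The scalar formulas and the matrix formula for the least squares estimates give the same numbers. The following sketch computes both on a made-up sample of five points (the data are hypothetical); the matrix version solves the normal equations with `numpy.linalg.solve` rather than forming the inverse explicitly, which is a numerical convenience and not something the slides prescribe.

```python
import numpy as np

# Hypothetical sample of n = 5 observations (X_i, Y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Scalar formulas for the slope and the intercept.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Matrix form: solve the normal equations (X'X) b = X'Y.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

print("scalar formulas:", b0, b1)
print("matrix solution:", b)
```

For this sample both routes give an intercept of about 2.2 and a slope of about 0.6.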
Properties of Least Squares Estimators
Unbiased: $E\{b_0\} = \beta_0$; $E\{b_1\} = \beta_1$
Estimated regression function:
\[ \hat{Y} = b_0 + b_1 X \]
($\hat{Y}$: the value of the estimated regression function at level X of the predictor variable.)
$\hat{Y}$ is an unbiased estimator of $E\{Y\}$.
Fitted value $\hat{Y}_i$:
\[ \hat{Y}_i = b_0 + b_1 X_i, \quad i = 1, \ldots, n \]
Residuals
Residual $e_i$:
\[ e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i) \]
This is the vertical deviation of $Y_i$ from the fitted value $\hat{Y}_i$ on the estimated regression line. It is known, because it can be computed from the data.
Model error term $\varepsilon_i$:
\[ \varepsilon_i = Y_i - E\{Y_i\} \]
This is the vertical deviation of $Y_i$ from the unknown true regression line. It is unknown.
Properties of Fitted Regression Line
- The sum of the residuals is zero:
\[ \sum_{i=1}^{n} e_i = 0 \]
(In numerical work, rounding errors may be present.)
- The sum of the squared residuals, $\sum_{i=1}^{n} e_i^2$, is a minimum: the criterion Q to be minimized equals $\sum_{i=1}^{n} e_i^2$ when $b_0, b_1$ are used for estimating $\beta_0, \beta_1$.
- The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:
\[ \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i \]
Properties of Fitted Regression Line (cont.)
- The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the level of the predictor variable in the $i$th trial:
\[ \sum_{i=1}^{n} X_i e_i = 0 \]
- The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the fitted value of the response variable for the $i$th trial:
\[ \sum_{i=1}^{n} \hat{Y}_i e_i = 0 \]
- The regression line always goes through the point $(\bar{X}, \bar{Y})$.
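The properties of the fitted regression line can be verified numerically. The sketch below fits the line to a made-up sample of five points (hypothetical data) and evaluates each quantity that the properties say should be zero; because of floating-point rounding the results are tiny numbers rather than exact zeros.

```python
import numpy as np

# Hypothetical sample of n = 5 observations (X_i, Y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least squares estimates.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x    # fitted values
e = y - y_hat          # residuals

# Each entry should be zero up to rounding error.
checks = {
    "sum of residuals": np.sum(e),
    "sum of X-weighted residuals": np.sum(x * e),
    "sum of fitted-value-weighted residuals": np.sum(y_hat * e),
    "sum(Y) - sum(Y_hat)": np.sum(y) - np.sum(y_hat),
    "fitted line at X_bar minus Y_bar": (b0 + b1 * x.mean()) - y.mean(),
}
for name, value in checks.items():
    print(f"{name}: {value:.2e}")
```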
Estimation of $\sigma^2$
Recall that $\sigma^2\{Y_i\} = \sigma^2$.
The error sum of squares or residual sum of squares, SSE:
\[ SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2 \]
SSE has $n - 2$ degrees of freedom. (Two degrees of freedom are associated with the estimates $b_0$ and $b_1$ involved in obtaining $\hat{Y}_i$.)
\[ E\{SSE\} = (n - 2)\sigma^2 \]
(This result requires proof.)
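Short of a proof, the result $E\{SSE\} = (n - 2)\sigma^2$ can be made plausible by a Monte Carlo check. This is only a sketch: the parameter values, the ten predictor levels, the number of replications, and the normal error distribution are assumptions made for illustration, and a simulation is not a substitute for the proof.

```python
import numpy as np

# Hypothetical setup: true parameters, fixed X levels, replication count.
beta0, beta1, sigma = 1.0, 2.0, 1.5
x = np.linspace(0.0, 9.0, 10)    # n = 10 fixed predictor levels
n = x.size
n_rep = 20_000

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), x])

# Each row of Y is one simulated sample of n responses.
eps = rng.normal(0.0, sigma, size=(n_rep, n))
Y = beta0 + beta1 * x + eps

# Least squares fit for all samples at once: column j of B holds (b0, b1)
# for sample j, obtained from the normal equations (X'X) b = X'Y.
B = np.linalg.solve(X.T @ X, X.T @ Y.T)     # shape (2, n_rep)
residuals = Y - (X @ B).T                   # shape (n_rep, n)
sse = np.sum(residuals ** 2, axis=1)        # one SSE per sample

mean_sse = sse.mean()
expected_sse = (n - 2) * sigma ** 2
print("average SSE over samples:", mean_sse)
print("(n - 2) * sigma^2:", expected_sse)
```

Dividing by $n$ instead of $n - 2$ would give an average below $\sigma^2$, which is the reason the divisor $n - 2$ is used when estimating $\sigma^2$.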
Estimation of $\sigma^2$ (cont.)
The error mean square or residual mean square, MSE:
\[ MSE = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} \]
MSE is an unbiased estimator of $\sigma^2$:
\[ E\{MSE\} = \sigma^2 \]
An estimator of the standard deviation $\sigma$ is $s = \sqrt{MSE}$.
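The computation of SSE, MSE, and $s = \sqrt{MSE}$ is short once the line has been fitted. The sketch below uses a made-up sample of five points; the data are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

# Hypothetical sample of n = 5 observations (X_i, Y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = x.size

# Least squares estimates and residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)

sse = np.sum(e ** 2)     # error sum of squares
mse = sse / (n - 2)      # error mean square, n - 2 degrees of freedom
s = np.sqrt(mse)         # estimate of the standard deviation sigma

print("SSE:", sse, "MSE:", mse, "s:", s)
```

For this sample the residuals are -0.8, 0.6, 1.0, -0.6, -0.2, so SSE is about 2.4 and MSE is about 0.8 on 3 degrees of freedom.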
Normal Error Regression Model
The normal error regression model:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
- $Y_i$: the observed response in the $i$th trial
- $X_i$: a known constant
- $\beta_0, \beta_1$: parameters
- $\varepsilon_i$, $i = 1, \ldots, n$: independent $N(0, \sigma^2)$
What are the properties of the normal distribution?
The parameters $\beta_0$, $\beta_1$, and $\sigma^2$ can be estimated by the method of maximum likelihood (MLE).
Normal Error Regression Model (cont.)
The method of maximum likelihood chooses as the maximum likelihood estimate the parameter value for which the likelihood value is largest.
Two methods for finding the MLE:
- a systematic numerical search
- use of an analytical solution
Example: for a normal population with mean $\mu$, the maximum likelihood estimator of $\mu$ is the sample mean $\bar{Y}$.
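The two ways of finding a maximum likelihood estimate can be compared on the simplest case, the mean $\mu$ of a normal population. The sketch below is illustrative only: the three observations, the treatment of $\sigma$ as known and equal to 10, and the search grid are all assumptions. It evaluates the log-likelihood on a grid (a systematic numerical search) and compares the maximizer with the sample mean (the analytical solution).

```python
import numpy as np

# Hypothetical sample from a normal population; sigma is treated as known.
y = np.array([250.0, 265.0, 259.0])
sigma = 10.0

def log_likelihood(mu):
    """Log-likelihood of mu for independent N(mu, sigma^2) observations."""
    n = y.size
    return (-0.5 * n * np.log(2 * np.pi * sigma ** 2)
            - np.sum((y - mu) ** 2) / (2 * sigma ** 2))

# Method 1: systematic numerical search over a grid with step 0.01.
grid = np.linspace(230.0, 290.0, 6001)
values = np.array([log_likelihood(mu) for mu in grid])
mu_search = grid[np.argmax(values)]

# Method 2: analytical solution, the sample mean.
mu_analytic = y.mean()

print("grid search:", mu_search, "sample mean:", mu_analytic)
```

The grid search can only be as accurate as the grid spacing, whereas the analytical solution is exact; working with the log-likelihood instead of the likelihood does not change the location of the maximum.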
Normal Error Regression Model (cont.)
Example parameter values: $\sigma = 2.5$; $\beta_0 = 0$; $\beta_1 = 0.5$
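Normal Error Regression Model (cont.)
The density of an observation $Y_i$ for the normal error regression model, using $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and $\sigma^2\{Y_i\} = \sigma^2$:
\[ f_i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma} \right)^2 \right] \]
The likelihood function for n observations $Y_1, \ldots, Y_n$:
\[ L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} f_i = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma} \right)^2 \right] = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \right] \]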
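Normal Error Regression Model (cont.)
The MLE of $\sigma^2$, $\hat{\sigma}^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 / n$, is biased. It is related to the unbiased estimator MSE by
\[ MSE = \frac{n}{n - 2}\, \hat{\sigma}^2 \]
Example (the MLEs of $\beta_0$ and $\beta_1$ coincide with the least squares estimators): $\hat{\beta}_0 = b_0 = 2.81$; $\hat{\beta}_1 = b_1 = 0.177$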
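The relation between the maximum likelihood estimate of $\sigma^2$ and MSE can be checked on a small sample. The sketch below is an illustration on made-up data: it computes $\hat{\sigma}^2 = SSE/n$ and $MSE = SSE/(n - 2)$, and also evaluates the normal-model log-likelihood to show that moving away from $(b_0, b_1, \hat{\sigma}^2)$ lowers it. The data and the size of the perturbation are hypothetical.

```python
import numpy as np

# Hypothetical sample of n = 5 observations (X_i, Y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = x.size

# Least squares estimates, which are also the MLEs of beta0 and beta1.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

sse = np.sum((y - b0 - b1 * x) ** 2)
sigma2_mle = sse / n      # biased maximum likelihood estimate of sigma^2
mse = sse / (n - 2)       # unbiased estimate of sigma^2

def log_likelihood(beta0, beta1, sigma2):
    """Log of the likelihood L(beta0, beta1, sigma^2) for normal errors."""
    resid = y - beta0 - beta1 * x
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(resid ** 2) / (2 * sigma2)

ll_at_mle = log_likelihood(b0, b1, sigma2_mle)
ll_shifted = log_likelihood(b0 + 0.1, b1, sigma2_mle)   # perturbed intercept
ll_with_mse = log_likelihood(b0, b1, mse)               # MSE in place of the MLE

print("sigma^2 MLE:", sigma2_mle, "MSE:", mse, "ratio:", mse / sigma2_mle)
print("log-likelihoods:", ll_at_mle, ll_shifted, ll_with_mse)
```

With n = 5 the ratio n / (n - 2) is 5/3, so the bias of the MLE is substantial in small samples and fades as n grows.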
Analysis of Variance Approach to Regression Analysis: Partition of Total Sum of Squares
The analysis of variance approach is based on partitioning the sums of squares and the degrees of freedom associated with the response variable Y.
The variation is measured in terms of the deviations of the $Y_i$ around their mean $\bar{Y}$:
\[ Y_i - \bar{Y} \]
Partition of Total Sum of Squares (cont.)
The total deviation splits into two components:
\[ \underbrace{Y_i - \bar{Y}}_{\text{total deviation}} = \underbrace{\hat{Y}_i - \bar{Y}}_{\text{deviation of fitted regression value around mean}} + \underbrace{Y_i - \hat{Y}_i}_{\text{deviation around fitted regression line}} \]
1. The deviation of the fitted value $\hat{Y}_i$ around the mean $\bar{Y}$.
2. The deviation of the observation $Y_i$ around the fitted regression line.
Partition of Total Sum of Squares (cont.)
Total variation, SSTO (total sum of squares):
\[ SSTO = \sum (Y_i - \bar{Y})^2 \]
- If all the $Y_i$ are the same, SSTO = 0.
- The greater the variation among the $Y_i$, the larger is SSTO.
SSE (error sum of squares):
\[ SSE = \sum (Y_i - \hat{Y}_i)^2 \]
- If all the $Y_i$ fall on the fitted regression line, SSE = 0.
- The greater the variation of the $Y_i$ around the fitted regression line, the larger is SSE.
Partition of Total Sum of Squares (cont.)
SSR (regression sum of squares):
\[ SSR = \sum (\hat{Y}_i - \bar{Y})^2 \]
- If the regression line is horizontal, then $\hat{Y}_i - \bar{Y} = 0$ for every $i$ and SSR = 0; otherwise SSR > 0.
- SSR is a measure of the variation associated with the regression line.
- The larger SSR is in relation to SSTO, the greater is the effect of the regression relation in accounting for the total variation in the $Y_i$ observations.
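The three sums of squares can be computed side by side to see the partition of the total variation at work. The sketch below uses a made-up sample of five points (hypothetical data) and checks numerically that SSTO = SSR + SSE for the least squares fit.

```python
import numpy as np

# Hypothetical sample of n = 5 observations (X_i, Y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least squares fit.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
y_bar = y.mean()

ssto = np.sum((y - y_bar) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)    # regression sum of squares
sse = np.sum((y - y_hat) ** 2)        # error sum of squares

print("SSTO:", ssto, "SSR:", ssr, "SSE:", sse)
print("SSR + SSE:", ssr + sse)
print("share of SSTO accounted for by the regression:", ssr / ssto)
```

For this sample SSTO is 6, SSR is about 3.6, and SSE is about 2.4, so the regression relation accounts for about 60% of the total variation in the observations.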