Linear Regression. Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) SDA Regression 1 / 34

Linear Regression. Lecture slides by 許湘伶 (hsuhl, NUK), SDA Regression; based on Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li).

Regression analysis is a statistical methodology that uses the relation between two or more quantitative variables so that a response (or outcome) variable can be predicted from the other variable or variables. As a method for analyzing data, regression analysis aims to determine whether two or more variables are related, and in what direction and how strongly. It then builds a mathematical model so that a variable of interest to the researcher can be predicted from observations of other variables (Wiki).

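Origin: the term "regression" was first used by Francis Galton. Galton studied the heights of parents and their children. He found that although parents pass their height on to their children, the children's heights tend to "regress toward the middle", that is, toward the average of all people (regression to the mean). The word then meant something somewhat different from what "regression" means today (Wiki).
Regression to the mean: children of very tall parents tend to be somewhat shorter than their parents, while children of very short parents tend to be taller than their parents. Human height is pulled from the two extremes, tall and short, toward the overall human average (統計改變了世界, "Statistics Changed the World").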

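Relations between Variables
Functional Relation between Two Variables
A functional relation differs from a statistical relation. A functional relation is expressed by a mathematical formula, $Y = f(X)$. Example: if a product sells for 2 dollars per unit, the dollar sales $Y$ and the number of units sold $X$ satisfy $Y = 2X$ exactly.
[Figure: Example of Functional Relation ($Y = f(X)$); (a) plot, (b) data]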

Statistical Relation between Two Variables
Unlike a functional relation, the observations for a statistical relation do not fall directly on the curve of relationship. Example: employees' performance evaluations, with $Y$ = year-end evaluation and $X$ = midyear evaluation.

[Figure: Curvilinear Statistical Relation between Age and Steroid Level in Healthy Females Aged 8 to 25.]

Regression Models and Their Uses
Basic Concepts
A regression model expresses two characteristics: (1) there is a probability distribution of $Y$ for each level of $X$, and (2) the means of these probability distributions vary in some systematic fashion with $X$.
[Figure: Pictorial Representation of Regression Model]

Construction of Regression Models
$Y$: the dependent or response variable. $X$: the independent, explanatory, or predictor variable.
Regression analysis serves three major purposes: (1) description, (2) control, and (3) prediction.

Simple Linear Regression Model with Distribution of Error Terms Unspecified
Statement of Model
The linear regression model with one predictor variable:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n$$
where:
$Y_i$ is the value of the response variable in the $i$th trial;
$\beta_0$ and $\beta_1$ are parameters;
$X_i$ is a known constant, the value of the predictor variable in the $i$th trial;
$\varepsilon_i$ is a random error term with mean $E\{\varepsilon_i\} = 0$ and variance $\sigma^2\{\varepsilon_i\} = \sigma^2$. The error terms are uncorrelated: $\sigma\{\varepsilon_i, \varepsilon_j\} = 0$ for all $i, j$ with $i \neq j$.
The model is simple (one predictor variable), linear in the parameters, and linear in the predictor variable.

Features
1. $Y_i$ is the sum of two components: (1) the constant term $\beta_0 + \beta_1 X_i$ and (2) the random term $\varepsilon_i$.
2. Because $E\{\varepsilon_i\} = 0$,
$$E\{Y_i\} = E\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \beta_0 + \beta_1 X_i + E\{\varepsilon_i\} = \beta_0 + \beta_1 X_i.$$
3. The regression function is therefore $E\{Y\} = \beta_0 + \beta_1 X$. It relates the means of the probability distributions of $Y$ for given $X$ to the level of $X$.

4. The response $Y_i$ in the $i$th trial exceeds or falls short of the value of the regression function by the error term amount $\varepsilon_i$.
5. Because $\sigma^2\{\varepsilon_i\} = \sigma^2$, the responses have the same constant variance: $\sigma^2\{Y_i\} = \sigma^2\{\beta_0 + \beta_1 X_i + \varepsilon_i\} = \sigma^2\{\varepsilon_i\} = \sigma^2$.
6. The error terms are assumed to be uncorrelated, so the responses $Y_i$ and $Y_j$ are uncorrelated as well.
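The features above, $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and $\sigma^2\{Y_i\} = \sigma^2$, can be checked by simulation. The Python sketch below fixes one level $X = 45$, draws many responses, and compares the sample mean and variance with the model values. The parameter values come from the example model $Y = 9.5 + 2.1X + \varepsilon$. The choice $\sigma = 2$ and the normal error distribution are assumptions made only for the simulation.

```python
import random


def simulate_responses(x, beta0, beta1, sigma, n_draws, seed=0):
    """Repeatedly draw Y = beta0 + beta1 * x + eps at one fixed level x.

    Assumption (illustration only): eps ~ N(0, sigma^2). The model on these
    slides requires only E{eps} = 0, constant variance sigma^2, and
    uncorrelated errors; it does not require normality.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n_draws)]


def mean_and_variance(values):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    v = sum((y - m) ** 2 for y in values) / (n - 1)
    return m, v


# Hypothetical setting: E{Y} at X = 45 is 9.5 + 2.1 * 45 = 104, and sigma^2 = 4.
draws = simulate_responses(x=45, beta0=9.5, beta1=2.1, sigma=2.0, n_draws=20000)
m, v = mean_and_variance(draws)
print(f"sample mean {m:.2f}  vs  E{{Y}} = {9.5 + 2.1 * 45:.2f}")
print(f"sample variance {v:.2f}  vs  sigma^2 = {2.0 ** 2:.2f}")
```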

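Meaning of Regression Parameters
Example regression model: $Y = 9.5 + 2.1X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$.
Regression coefficients: $\beta_0$ (intercept) and $\beta_1$ (slope). Here the slope $\beta_1 = 2.1$ is the change in the mean response $E\{Y\}$ per unit increase in $X$. The intercept $\beta_0 = 9.5$ is the mean response at $X = 0$; it is meaningful only when the scope of the model includes $X = 0$.
[Figure: Meaning of Parameters of Simple Linear Regression Model]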
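A minimal Python sketch of the example regression function $E\{Y\} = 9.5 + 2.1X$ follows. It shows that the slope is the change in mean response per unit of $X$ and that the intercept is the mean response at $X = 0$. The level $X = 45$ is an arbitrary illustrative choice.

```python
def regression_function(x, beta0=9.5, beta1=2.1):
    """Mean response E{Y} = beta0 + beta1 * X for the example model Y = 9.5 + 2.1X + eps."""
    return beta0 + beta1 * x


mean_at_45 = regression_function(45)                                 # 9.5 + 2.1 * 45
change_per_unit = regression_function(46) - regression_function(45)  # equals beta1
mean_at_zero = regression_function(0)                                # equals beta0
print(round(mean_at_45, 4), round(change_per_unit, 4), mean_at_zero)  # 104.0 2.1 9.5
```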

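Matrix Form for Regression Analysis
The regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i = E\{Y_i\} + \varepsilon_i$, $i = 1, \dots, n$, can be written in matrix form as
$$\underset{n\times 1}{\mathbf{Y}} = \underset{n\times 1}{E\{\mathbf{Y}\}} + \underset{n\times 1}{\boldsymbol{\varepsilon}} = \underset{n\times 2}{\mathbf{X}}\,\underset{2\times 1}{\boldsymbol{\beta}} + \boldsymbol{\varepsilon}, \qquad E\{\boldsymbol{\varepsilon}\} = \mathbf{0}, \qquad \underset{n\times n}{\sigma^2\{\boldsymbol{\varepsilon}\}} = \sigma^2\mathbf{I},$$
where
$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \quad E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix} = \mathbf{X}\boldsymbol{\beta}.$$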
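The matrix form can be sketched in plain Python, with no linear algebra library assumed. Each row of $\mathbf{X}$ is $[1, X_i]$, so $\mathbf{X}\boldsymbol{\beta}$ reproduces $\beta_0 + \beta_1 X_i$ row by row. The $X$ levels and the $\boldsymbol{\beta}$ values here are hypothetical.

```python
def design_matrix(xs):
    """n x 2 matrix X (list of rows); row i is [1, X_i], the 1 multiplying beta0."""
    return [[1.0, float(x)] for x in xs]


def mat_vec(a, v):
    """Matrix-vector product a @ v for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


xs = [20, 55, 30]       # hypothetical predictor levels X_1, X_2, X_3
beta = [9.5, 2.1]       # hypothetical [beta0, beta1]
X = design_matrix(xs)
expected_y = mat_vec(X, beta)   # E{Y} = X beta, one entry per trial
print(X)
print(expected_y)
```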

Data for Regression Analysis
The regression parameters $\beta_0$ and $\beta_1$ are unknown and must be estimated from relevant data. Developing a suitable regression model likewise relies on an analysis of the data.

Estimation of Regression Function
Method of Least Squares
Observations: $(X_i, Y_i)$, $i = 1, \dots, n$.
Deviation of $Y_i$ from the line $\beta_0 + \beta_1 X_i$: $Y_i - \beta_0 - \beta_1 X_i$.

The least squares criterion:
$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$$
What properties should a good estimator have? The least squares estimators $b_0$ and $b_1$ are the values of $\beta_0$ and $\beta_1$ that minimize the criterion $Q$ for the given sample observations. How are $b_0$ and $b_1$ obtained?
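A small Python sketch of the criterion $Q$ as a function of a candidate line follows. The three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$) is an assumption. It appears to be the persistence-study example in Kutner et al., and it reproduces the estimates $b_0 = 2.81$, $b_1 = 0.177$ quoted in the maximum likelihood part of these slides.

```python
def q_criterion(beta0, beta1, xs, ys):
    """Least squares criterion Q = sum_i (Y_i - beta0 - beta1 * X_i)^2."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))


xs = [20, 55, 30]   # assumed data (see lead-in)
ys = [5, 12, 10]
print(q_criterion(0.0, 0.5, xs, ys))     # an arbitrary candidate line: 290.25
print(q_criterion(2.81, 0.177, xs, ys))  # a line close to the least squares fit
```

A smaller $Q$ means a better-fitting line under this criterion; least squares picks the $(\beta_0, \beta_1)$ that makes it smallest.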

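Setting the partial derivatives of $Q$ to zero at $(b_0, b_1)$ gives the normal equations:
$$\left.\frac{\partial Q}{\partial \beta_0}\right|_{b_0, b_1} = 0 \;\Longrightarrow\; \sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i$$
$$\left.\frac{\partial Q}{\partial \beta_1}\right|_{b_0, b_1} = 0 \;\Longrightarrow\; \sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2$$
Solving them simultaneously:
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
In matrix form the normal equations are $\underset{2\times 2}{\mathbf{X}'\mathbf{X}}\,\underset{2\times 1}{\mathbf{b}} = \underset{2\times 1}{\mathbf{X}'\mathbf{Y}}$. The vector of least squares regression coefficients is therefore
$$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$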
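Both routes to $b_0, b_1$ can be sketched in Python and compared on the same assumed three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$). The matrix version writes out the $2\times 2$ inverse of $\mathbf{X}'\mathbf{X}$ by hand instead of calling a linear algebra library. For larger problems a library solver would be the more robust choice.

```python
def least_squares_deviation_form(xs, ys):
    """b1 = sum (X-Xbar)(Y-Ybar) / sum (X-Xbar)^2,  b0 = Ybar - b1 * Xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def least_squares_matrix_form(xs, ys):
    """b = (X'X)^{-1} X'Y with rows [1, X_i]; the 2x2 inverse is written out."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # X'X = [[n, sum_x], [sum_x, sum_x2]] and X'Y = [sum_y, sum_xy]
    det = n * sum_x2 - sum_x * sum_x
    b0 = (sum_x2 * sum_y - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


xs = [20, 55, 30]   # assumed data (see lead-in)
ys = [5, 12, 10]
print(least_squares_deviation_form(xs, ys))
print(least_squares_matrix_form(xs, ys))
```

Both functions return $b_0 \approx 2.81$ and $b_1 \approx 0.177$ for these data.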

Properties of Least Squares Estimators
The least squares estimators are unbiased: $E\{b_0\} = \beta_0$ and $E\{b_1\} = \beta_1$.
Estimated regression function: $\hat{Y} = b_0 + b_1 X$, where $\hat{Y}$ is the value of the estimated regression function at level $X$ of the predictor variable. $\hat{Y}$ is an unbiased estimator of $E\{Y\}$.
Fitted value for the $i$th case: $\hat{Y}_i = b_0 + b_1 X_i$, $i = 1, \dots, n$.

Residuals
The $i$th residual is
$$e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i),$$
the vertical deviation of $Y_i$ from the fitted value $\hat{Y}_i$ on the estimated regression line. It is known once the line has been fitted.
The model error term is
$$\varepsilon_i = Y_i - E\{Y_i\},$$
the vertical deviation of $Y_i$ from the unknown true regression line. It is unknown.
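The difference between residuals and errors becomes visible only in a simulation where the true line is known. In this Python sketch the true parameters, the $X$ grid, and the normal errors are all hypothetical. The sketch prints the errors $\varepsilon_i$ next to the residuals $e_i$ computed from the fitted line, and the two generally differ.

```python
import random


def fit_line(xs, ys):
    """Least squares estimates (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical "true" model; the errors are visible only because we simulate.
beta0, beta1, sigma = 9.5, 2.1, 3.0
rng = random.Random(42)
xs = [float(x) for x in range(10, 60, 5)]
errors = [rng.gauss(0.0, sigma) for _ in xs]                # eps_i = Y_i - E{Y_i}
ys = [beta0 + beta1 * x + eps for x, eps in zip(xs, errors)]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]      # e_i = Y_i - Yhat_i
for eps, e in zip(errors, residuals):
    print(f"error {eps:7.3f}   residual {e:7.3f}")
```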

Properties of Fitted Regression Line
The sum of the residuals is zero: $\sum_{i=1}^{n} e_i = 0$. Computed values may differ slightly from zero because of rounding errors.
The sum of the squared residuals, $\sum_{i=1}^{n} e_i^2$, is a minimum: the criterion $Q$ to be minimized equals $\sum_{i=1}^{n} e_i^2$ when $b_0$ and $b_1$ are used to estimate $\beta_0$ and $\beta_1$.
The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$: $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i$.

The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the level of the predictor variable in the $i$th trial: $\sum_{i=1}^{n} X_i e_i = 0$.
The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the fitted value of the response variable for the $i$th trial: $\sum_{i=1}^{n} \hat{Y}_i e_i = 0$.
The regression line always goes through the point $(\bar{X}, \bar{Y})$.
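The properties listed above can be checked numerically. This Python sketch fits the line to the assumed three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$) and prints each quantity that should vanish. The values come out as zero only up to floating-point rounding. The sketch also nudges $b_0$ and $b_1$ to show that $\sum e_i^2$ is a minimum.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b1 * x_bar, b1


xs = [20, 55, 30]   # assumed data (see lead-in)
ys = [5, 12, 10]
b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
e = [y - y_hat for y, y_hat in zip(ys, fitted)]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)

checks = {
    "sum of e_i": sum(e),
    "sum Y_i - sum Yhat_i": sum(ys) - sum(fitted),
    "sum of X_i * e_i": sum(x * r for x, r in zip(xs, e)),
    "sum of Yhat_i * e_i": sum(y_hat * r for y_hat, r in zip(fitted, e)),
    "(b0 + b1*Xbar) - Ybar": b0 + b1 * x_bar - y_bar,
}
for name, value in checks.items():
    print(f"{name:>24}: {value: .1e}")   # each is 0 up to floating-point rounding


def q(c0, c1):
    """Least squares criterion Q evaluated at a candidate (c0, c1)."""
    return sum((y - c0 - c1 * x) ** 2 for x, y in zip(xs, ys))


sse = q(b0, b1)   # equals sum of e_i^2
nudged = [q(b0 + d0, b1 + d1) for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.01), (0.0, -0.01)]]
print(f"SSE = {sse:.4f}; smallest nudged Q = {min(nudged):.4f}")
```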

Estimation of $\sigma^2$
Because $\sigma^2\{Y_i\} = \sigma^2$, the error variance is estimated from the deviations of the $Y_i$ around the fitted line. The error sum of squares, or residual sum of squares, is
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2.$$
SSE has $n - 2$ degrees of freedom. Two degrees of freedom are lost because the two estimates $b_0$ and $b_1$ are used in obtaining the $\hat{Y}_i$. Moreover, $E\{SSE\} = (n - 2)\sigma^2$ (to be proved).

The error mean square, or residual mean square, is
$$MSE = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}.$$
MSE is an unbiased estimator of $\sigma^2$: $E\{MSE\} = \sigma^2$. An estimate of $\sigma$ is $s = \sqrt{MSE}$.
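A Monte Carlo sketch in Python of $E\{MSE\} = \sigma^2$ follows. It repeatedly simulates data from a hypothetical true model, computes MSE with divisor $n - 2$, and averages the results. The normal errors and all parameter values are assumptions for the simulation; unbiasedness of MSE does not itself require normality.

```python
import random


def fit_line(xs, ys):
    """Least squares estimates (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b1 * x_bar, b1


def sse_and_mse(xs, ys):
    """SSE = sum e_i^2 and MSE = SSE / (n - 2) for the least squares line."""
    b0, b1 = fit_line(xs, ys)
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return sse, sse / (len(xs) - 2)


# Hypothetical true model used only for this Monte Carlo check.
rng = random.Random(7)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1, sigma = 1.0, 0.5, 2.0
reps = 4000
avg_mse = 0.0
for _ in range(reps):
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    avg_mse += sse_and_mse(xs, ys)[1] / reps
print(f"average MSE over {reps} samples: {avg_mse:.3f}  (sigma^2 = {sigma ** 2})")
```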

Normal Error Regression Model
The normal error regression model is
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,$$
where $Y_i$ is the observed response, $X_i$ is a known constant, $\beta_0$ and $\beta_1$ are parameters, and the $\varepsilon_i$, $i = 1, \dots, n$, are independent $N(0, \sigma^2)$.
(Question: what are the properties of the normal distribution?) The parameters $\beta_0$, $\beta_1$ and $\sigma^2$ can be estimated by the method of maximum likelihood (MLE).

The method of maximum likelihood chooses as the maximum likelihood estimate the parameter value for which the likelihood is largest. There are two methods for finding the MLE: a systematic numerical search, or the use of an analytical solution. For example, for a sample from a normal population, the MLE of the mean $\mu$ is the sample mean $\bar{Y}$.
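The two approaches to finding an MLE can be compared on the normal-mean case. This Python sketch evaluates the log-likelihood of $\mu$ on a grid (a systematic numerical search) and compares the winner with the analytical solution $\bar{Y}$. The three observations and the known $\sigma = 10$ are hypothetical.

```python
import math


def normal_log_likelihood(mu, sigma, ys):
    """log of prod_i N(Y_i; mu, sigma^2), with sigma treated as known."""
    n = len(ys)
    ss = sum((y - mu) ** 2 for y in ys)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def grid_search_mle(ys, sigma, lo, hi, step):
    """Systematic numerical search: evaluate the likelihood on a grid of mu values."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n_points)]
    return max(grid, key=lambda mu: normal_log_likelihood(mu, sigma, ys))


ys = [250.0, 265.0, 259.0]       # hypothetical sample
sigma = 10.0                     # assumed known
mu_search = grid_search_mle(ys, sigma, lo=240.0, hi=280.0, step=0.5)
mu_analytic = sum(ys) / len(ys)  # analytical solution: the sample mean
print(mu_search, mu_analytic)    # 258.0 258.0
```

A grid search only finds the MLE to within the grid spacing. Here it matches exactly because $\bar{Y} = 258$ happens to lie on the grid.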

[Figure: $\sigma = 2.5$; $\beta_0 = 0$; $\beta_1 = 0.5$]

The density of an observation $Y_i$ under the normal error regression model, where $E\{Y_i\} = \beta_0 + \beta_1 X_i$ and $\sigma^2\{Y_i\} = \sigma^2$:
$$f_i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)^2\right]$$
The likelihood function for the $n$ observations $Y_1, \dots, Y_n$:
$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} f_i = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)^2\right] = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2\right]$$
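The likelihood above can be evaluated directly. This Python sketch uses `math.prod`, available from Python 3.8. It computes $L$ as a product of the densities $f_i$ and, separately, the closed-form log-likelihood, on the assumed three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$) with $\sigma = 2.5$. It then compares the line $\beta_0 = 0$, $\beta_1 = 0.5$ with the least squares line $b_0 = 2.81$, $b_1 = 0.177$, which has the higher likelihood.

```python
import math


def likelihood(beta0, beta1, sigma2, xs, ys):
    """L(beta0, beta1, sigma^2) = product of the normal densities f_i."""
    return math.prod(
        math.exp(-0.5 * (y - beta0 - beta1 * x) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)
        for x, y in zip(xs, ys)
    )


def log_likelihood(beta0, beta1, sigma2, xs, ys):
    """Closed form: -(n/2) log(2 pi sigma^2) - Q / (2 sigma^2)."""
    n = len(xs)
    q = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - q / (2 * sigma2)


xs = [20, 55, 30]   # assumed data (see lead-in)
ys = [5, 12, 10]
sigma2 = 2.5 ** 2
for b0, b1 in [(0.0, 0.5), (2.81, 0.177)]:
    print(b0, b1, likelihood(b0, b1, sigma2, xs, ys), log_likelihood(b0, b1, sigma2, xs, ys))
```

In practice the log-likelihood is preferred: products of many small densities quickly underflow to zero in floating point.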

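The MLEs of $\beta_0$ and $\beta_1$ are the same as the least squares estimators $b_0$ and $b_1$. The MLE of $\sigma^2$,
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n},$$
is biased. It is related to MSE by
$$MSE = \frac{n}{n - 2}\,\hat{\sigma}^2.$$
Example: $\hat{\beta}_0 = b_0 = 2.81$, $\hat{\beta}_1 = b_1 = 0.177$.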
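A short Python check of the relation between the two variance estimators follows. It uses the assumed three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$), which reproduces the example estimates. With $n = 3$ the factor $n/(n-2)$ is 3, so the biased MLE is noticeably smaller than MSE.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1); under normal errors these are also the MLEs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b1 * x_bar, b1


xs = [20, 55, 30]   # assumed data (see lead-in)
ys = [5, 12, 10]
n = len(xs)
b0, b1 = fit_line(xs, ys)
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
sigma2_mle = sse / n                      # biased MLE of sigma^2
mse = sse / (n - 2)                       # unbiased estimator of sigma^2
print(round(b0, 2), round(b1, 3))         # 2.81 0.177
print(mse, n / (n - 2) * sigma2_mle)      # equal: MSE = n/(n-2) * sigma2_mle
```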

Analysis of Variance Approach to Regression Analysis
Partition of Total Sum of Squares
The analysis of variance approach is based on partitioning the sums of squares and degrees of freedom associated with the response variable $Y$. The variation in $Y$ is measured by the deviations of the $Y_i$ around their mean $\bar{Y}$: $Y_i - \bar{Y}$.

The total deviation $Y_i - \bar{Y}$ has two components:
$$\underbrace{Y_i - \bar{Y}}_{\text{total deviation}} = \underbrace{\hat{Y}_i - \bar{Y}}_{\text{deviation of fitted regression value around mean}} + \underbrace{Y_i - \hat{Y}_i}_{\text{deviation around fitted regression line}}$$
1. the deviation of the fitted value $\hat{Y}_i$ around the mean $\bar{Y}$;
2. the deviation of the observation $Y_i$ around the fitted regression line.
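The decomposition holds exactly for every observation. This Python sketch shows it on the assumed three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$):

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b1 * x_bar, b1


xs = [20, 55, 30]   # assumed data (see lead-in)
ys = [5, 12, 10]
b0, b1 = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)
rows = []
for x, y in zip(xs, ys):
    y_hat = b0 + b1 * x
    # total deviation = deviation of fitted value around mean + deviation around line
    total, regression, residual = y - y_bar, y_hat - y_bar, y - y_hat
    rows.append((total, regression, residual))
    print(f"{total:8.4f} = {regression:8.4f} + {residual:8.4f}")
```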

Total variation is measured by the total sum of squares (SSTO):
$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
If all $Y_i$ are the same, $SSTO = 0$. The greater the variation among the $Y_i$, the larger SSTO is.
The error sum of squares (SSE):
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
If all $Y_i$ fall on the fitted regression line, $SSE = 0$. The greater the variation of the $Y_i$ around the fitted regression line, the larger SSE is.

The regression sum of squares (SSR):
$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
If the regression line is horizontal ($b_1 = 0$, so every $\hat{Y}_i = \bar{Y}$), then $SSR = 0$; otherwise $SSR > 0$. SSR measures the variation associated with the regression line. The larger SSR is in relation to SSTO, the greater the effect of the regression relation in accounting for the total variation in the $Y_i$ observations.
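These three sums of squares satisfy $SSTO = SSR + SSE$. The cross-product term $2\sum(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$ vanishes because $\sum e_i = 0$ and $\sum \hat{Y}_i e_i = 0$. The Python sketch below checks the identity on two data sets: the assumed three-observation data set ($X = 20, 55, 30$; $Y = 5, 12, 10$), and a hypothetical data set whose fitted line is horizontal, where $SSR = 0$.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b1 * x_bar, b1


def sums_of_squares(xs, ys):
    """Return (SSTO, SSR, SSE) for the least squares line."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    ssto = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, fitted))
    ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
    return ssto, ssr, sse


ssto, ssr, sse = sums_of_squares([20, 55, 30], [5, 12, 10])  # assumed data
print(ssto, ssr + sse)          # SSTO = SSR + SSE (up to rounding)
flat = sums_of_squares([1, 2, 3], [2, 5, 2])  # hypothetical data with b1 = 0
print(flat)                     # (6.0, 0.0, 6.0): horizontal line, so SSR = 0
```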