Lecture 8: Linear Regression

GENOME 560, Spring. May 4. Su-In Lee, CSE & GS, suinlee@uw.edu

Goals
- Develop basic concepts of linear regression from a probabilistic framework
- Estimating parameters and hypothesis testing with linear models
- Linear regression

Regression
Technique used for the modeling and analysis of numerical data. It exploits the relationship between two or more variables so that we can gain information about one of them through knowing values of the other. Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships.

Why Linear Regression?
Suppose we want to model the outcome variable Y in terms of three predictors, X1, X2, X3: Y = f(X1, X2, X3). Typically we will not have enough data to estimate f directly, so we usually have to assume that it has some restricted form, such as linear: Y = X1 + X2 + X3.

Regression Terminology
For a model such as Y = β0 + β1X, the variables go by several names:
- Y: dependent variable, outcome variable, response variable (e.g., lung cancer risk; expression level of gene X)
- X: independent variable, predictor variable, explanatory variable (e.g., genetic factors, smoking, diet; expression levels of X's TFs A, B and C)
- β0: intercept term; β1: slope

Linear Regression is a Probabilistic Model
Much of mathematics is devoted to studying variables that are deterministically related to one another, such as Y = β0 + β1X. But we are interested in understanding the relationship between variables related in a non-deterministic fashion.

A Linear Probabilistic Model
Definition: there exist parameters β0, β1 and σ² such that, for any fixed value of the predictor variable X, the outcome variable Y is related to X through the model equation
Y = β0 + β1X + ε,
where ε is a RV assumed to be N(0, σ²).

Implications: the expected value of Y is a linear function of X, but for a fixed value x, the variable Y differs from its expected value by a random amount.

Variables and Symbols: how is x different from X? A capital letter X denotes a random variable; a lower-case letter x denotes the corresponding values (i.e., the real numbers the RV X maps to). For example, X: genotype at a certain locus; x: 0, 1 or 2 (meaning AA, AG and GG, respectively).
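
A quick way to internalize the model is to simulate from it. The sketch below is not from the lecture; the parameter values (intercept, slope, error SD, range of x) are illustrative, chosen to echo the height/weight example that follows. It draws data from Y = β0 + β1X + ε with ε ~ N(0, σ²) and overlays the true regression line, in base R:

    ## Simulate n points from Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2).
    ## beta0, beta1, sigma and the x range are assumed, illustrative values.
    set.seed(1)
    n     <- 100
    beta0 <- 7.5
    beta1 <- 0.5
    sigma <- 3
    x <- runif(n, 10, 30)                          # predictor values
    y <- beta0 + beta1 * x + rnorm(n, sd = sigma)  # outcome = line + random error
    plot(x, y)                                     # the simulated data points
    abline(beta0, beta1, col = "red")              # the true regression line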

Graphical Interpretation
Formally, let x* denote a particular value of the predictor variable X. Then our linear probabilistic model says:
E(Y | x*) = μ_{Y|x*} = mean value of Y when X is x*
V(Y | x*) = σ²_{Y|x*} = variance of Y when X is x*

(Figure: scatter of weight against height with the regression line.) Say that X = height and Y = weight. Then μ_{Y|x=60} is the average weight for all individuals 60 inches tall in the population.

One More Example
Suppose the relationship between the predictor variable height (X) and the outcome variable weight (Y) is described by a simple linear regression model with true regression line Y = 7.5 + 0.5X, ε ~ N(0, σ²) and σ = 3.

Q: What is the interpretation of β1 = 0.5? The expected change in weight (Y) associated with a 1-unit increase in height (X).

Q: If x = 20, what is the expected value of Y? μ_{Y|x=20} = 7.5 + 0.5(20) = 17.5.
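
As a quick numeric check of this example (using the values above), the conditional mean and the per-unit interpretation of the slope can be computed directly:

    ## Conditional mean E(Y | x) = beta0 + beta1 * x for the height/weight example.
    beta0 <- 7.5; beta1 <- 0.5
    x_star <- 20
    beta0 + beta1 * x_star                                      # 17.5, mean of Y when X = 20
    ## beta1 is the expected change in Y for a 1-unit increase in X:
    (beta0 + beta1 * (x_star + 1)) - (beta0 + beta1 * x_star)   # 0.5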

One More Example (continued)
Q: If x = 20, what is P(Y > 22)? We have μ_{Y|x=20} = 17.5, so Y ~ N(μ = 17.5, σ = 3). Then
P(Y > 22 | x = 20) = 1 - Φ((22 - 17.5)/3) = 1 - Φ(1.5) = 0.067,
where Φ denotes the CDF of the standard Normal distribution N(0, 1).

Estimating Model Parameters
Where do the estimates of the parameters β0 and β1 come from? Predicted, or fitted, values are the values of y predicted by plugging x1, x2, ..., xn into the estimated regression line:
ŷ_i = β̂0 + β̂1 x_i

Residuals are the deviations between the observed values (red dots) and the predicted values (red line):
e_i = y_i - ŷ_i

Residuals Are Useful!
The error sum of squares (SSE) tells us how well the line fits the data:
SSE = Σ_i e_i² = Σ_i (y_i - ŷ_i)²

Least Squares
Find the β0 and β1 that minimize SSE = Σ_i (y_i - f(x_i))² = Σ_i (y_i - (β0 + β1 x_i))². Denote the solutions by β̂0 and β̂1.
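
The tail probability and the least squares quantities above are easy to reproduce in R. This is a sketch (the data are simulated, not the lecture's), using pnorm for the Normal CDF and lsfit, the function used later in this lecture, for the fit:

    ## P(Y > 22 | x = 20) with mu = 17.5, sigma = 3 (values from the example above):
    pnorm(22, mean = 17.5, sd = 3, lower.tail = FALSE)   # = 1 - pnorm(1.5), about 0.067

    ## Fitted values, residuals and SSE from a least squares fit on simulated data:
    set.seed(1)
    x <- runif(50, 10, 30)
    y <- 7.5 + 0.5 * x + rnorm(50, sd = 3)
    fit <- lsfit(x, y)            # least squares estimates beta0_hat, beta1_hat
    fit$coef                      # the estimated intercept and slope
    e   <- fit$residuals          # e_i = y_i - yhat_i
    SSE <- sum(e^2)               # error sum of squares
    SSE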

Least Squares (continued)
The least squares estimates β̂0 and β̂1 are the values of β0 and β1 that minimize SSE = Σ_i (y_i - (β0 + β1 x_i))²; they define the fitted line y = β̂0 + β̂1 x drawn through the data.

Coefficient of Determination
An important statistic, referred to as the coefficient of determination (R²):
R² = 1 - SSE / SST
SSE = Σ_i e_i² = Σ_i (y_i - ŷ_i)² is the error sum of squares around the fitted line y = β̂0 + β̂1 x.
SST = Σ_i (y_i - ȳ)², where ȳ is the average of the y's, is the total sum of squares; it is the error sum of squares you would obtain with β̂0 = avg(y) and β̂1 = 0.
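
Continuing the simulated sketch, R² can be computed directly from the residuals of a lsfit() fit; for simple linear regression it also equals the squared correlation between x and y:

    ## R^2 = 1 - SSE/SST from a least squares fit (simulated, illustrative data).
    set.seed(2)
    x <- runif(50, 10, 30)
    y <- 7.5 + 0.5 * x + rnorm(50, sd = 3)
    fit <- lsfit(x, y)
    SSE <- sum(fit$residuals^2)        # variation left unexplained by the line
    SST <- sum((y - mean(y))^2)        # total variation around the mean of y
    1 - SSE / SST                      # coefficient of determination R^2
    cor(x, y)^2                        # same number for simple linear regression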

Multiple Linear Regression
Extension of the simple linear regression model to two or more independent variables:
y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε
Example: Expression = Baseline + Age + Tissue + Sex + Error.
Partial regression coefficients: βi is the effect on the outcome variable of increasing the i-th predictor variable by 1 unit, holding all other predictors constant.

Categorical Independent Variables
Qualitative variables are easily incorporated into the regression framework through dummy variables. Simple example: sex can be coded as 0/1. What if my categorical variable contains three levels, say one indicator per genotype, X1 = 1 if AA, X2 = 1 if AG, X3 = 1 if GG? NO! The previous coding would result in collinearity (the three indicators always sum to 1). The solution is to set up a series of dummy variables; in general, for k levels you need (k - 1) dummy variables:
X1 = 1 if AA, 0 otherwise
X2 = 1 if AG, 0 otherwise
so the three genotypes are coded AA: (X1, X2) = (1, 0), AG: (0, 1), GG: (0, 0).

Hypothesis Testing: Model Utility Test
The first thing we want to know after fitting a model is whether any of the predictor variables (X's) are significantly related to the outcome variable (Y):
H0: β1 = β2 = ... = βk = 0
HA: at least one βi ≠ 0
Let's frame this in our ANOVA framework. In ANOVA, we partitioned the total variance (SST) into two components: SSE (unexplained variation) and SSR (variation explained by the linear model).
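
Both the dummy coding and the model utility test can be seen in a few lines of R. This sketch uses simulated genotype data and lm()/summary() rather than lsfit (a tooling choice of this sketch, not something the lecture specifies); R's factor() machinery builds the k - 1 dummy columns automatically:

    ## Dummy variables for a 3-level genotype and the overall model utility F test.
    set.seed(3)
    n    <- 90
    geno <- factor(sample(c("AA", "AG", "GG"), n, replace = TRUE))
    age  <- runif(n, 20, 70)
    y    <- 2 + 0.1 * age +
            ifelse(geno == "AG", 1, 0) + ifelse(geno == "GG", 2, 0) + rnorm(n)

    model.matrix(~ geno)[1:5, ]   # intercept plus k - 1 = 2 dummy columns (AG, GG)
    fit <- lm(y ~ age + geno)     # multiple regression with a categorical predictor
    summary(fit)                  # bottom line: F test of H0: all slopes equal 0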

Model Utility Test: ANOVA Formulation
Partition the total variance (SST) into two components: SSE (unexplained variation) and SSR (variation explained by the linear model). Consider n data points and a model with k predictors:
SST = Σ_i (y_i - ȳ)²   (n - 1 degrees of freedom)
SSE = Σ_i e_i² = Σ_i (y_i - ŷ_i)²   (n - (k + 1) degrees of freedom)
SSR = Σ_i (ŷ_i - ȳ)²   (k degrees of freedom)
where n is the number of data points and k + 1 is the number of parameters in the model.

Test statistic:
F = MSR / MSE = (SSR / k) / (SSE / (n - (k + 1)))
Rejection region: F ≥ F_{α, k, n-(k+1)}. Pick the F distribution based on k and n - (k + 1), then choose the critical value F_{α, k, n-(k+1)} based on α. Say that α = 0.05; then Prob(F > F_{α, k, n-(k+1)}) = 0.05.

Test for Subsets of Independent Variables
A powerful tool in multiple regression analysis is the ability to compare two models. For instance, say we want to compare
Full model: y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4
Reduced model: y = β0 + β1 x1 + β2 x2
Again, this is another example of ANOVA. Let SSE_R be the error sum of squares for the reduced model with l predictors, and SSE_F the error sum of squares for the full model with k predictors. Then
F = [(SSE_R - SSE_F) / (k - l)] / [SSE_F / (n - (k + 1))]
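
In R the nested-model F test can be carried out with anova() on two lm() fits, which computes exactly the statistic above. A sketch with simulated data (the variable names and effect sizes are made up):

    ## Partial F test comparing a reduced model (l = 2 predictors) with a full
    ## model (k = 4): F = [(SSE_R - SSE_F)/(k - l)] / [SSE_F/(n - (k + 1))].
    set.seed(4)
    n  <- 100
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
    y  <- 1 + 2 * x1 + 0.5 * x2 + 0.3 * x3 + rnorm(n)

    full    <- lm(y ~ x1 + x2 + x3 + x4)
    reduced <- lm(y ~ x1 + x2)
    anova(reduced, full)    # F test: do x3 and x4 explain anything beyond x1, x2?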

Example of Model Comparison
We have a quantitative trait and want to test the effects at two markers, M1 and M2.
Full model: Trait = Mean + M1 + M2 + (M1 × M2)
Reduced model: Trait = Mean + M1 + M2
F = [(SSE_R - SSE_F) / (k - l)] / [SSE_F / (n - (k + 1))] = (SSE_R - SSE_F) / (SSE_F / 96),
since here k - l = 1 and n - (k + 1) = 96. Rejection region: F ≥ F_{α, 1, 96}.

How To Do In R
You can fit a least squares regression using the function
mm <- lsfit(x, y)
The coefficients of the fit are then given by mm$coef, the residuals by mm$residuals, and to print out the tests for zero slope just do ls.print(mm).

Input Data
http://www.cs.washington.edu/homes/suinlee/genome560/data/cats.txt
Data on fluctuating proportions of marked cells in marrow from heterozygous Safari cats: proportions of cells of one cell type in samples from cats (taken in our department many years ago).
Column 1 is the ID number of the particular cat. You will want to plot the data from one cat; each cat's measurements occupy a consecutive block of rows (for example, one cat's data is rows 48:65).
2nd column: time, in weeks from the start of monitoring, at which the measurement from marrow is recorded.
3rd column: percent of domestic-type progenitor cells observed in a sample of cells at that time.
4th column: sample size at that time, i.e. the number of progenitor cells analyzed.
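
Putting the pieces together on the cats data, a minimal sketch of the workflow described above is given below. It assumes cats.txt has been downloaded locally and contains the four columns in the order listed (whitespace-separated, no header); adjust the path and column handling for your copy of the file.

    ## Fit percent-of-domestic-cells against time for a single cat with lsfit().
    cats <- read.table("cats.txt")                 # assumed: whitespace-separated, no header
    names(cats) <- c("id", "weeks", "percent", "n")
    one <- cats[cats$id == unique(cats$id)[1], ]   # rows belonging to the first cat

    mm <- lsfit(one$weeks, one$percent)   # least squares fit: percent ~ weeks
    mm$coef                               # intercept and slope
    ls.print(mm)                          # regression summary, test for zero slope
    plot(one$weeks, one$percent)          # the data for this cat
    abline(mm)                            # add the fitted least squares line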