ETH Zürich, October 25, 2010
1 Marcel Dettling, Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, marcel.dettling@zhaw.ch, http://stat.ethz.ch/~dettling. ETH Zürich, October 25, 2010
2 Mortality Example. Variables in the dataset: City, Mortality, JanTemp, JulyTemp, RelHum, Rain, Educ, Dens, NonWhite, WhiteCollar, Pop, House, Income, HC, NOx, SO2. [Table: first rows of the data, for Akron OH, Albany NY, Allentown PA, Atlanta GA, Baltimore MD, Birmingham AL, ...]
3 Model Diagnostics. Why do we need to do this? a) To make sure that estimates and inference are valid: E[εi] = 0, Var(εi) = σε², Cov(εi, εj) = 0, i.e. ε ~ N(0, σε² I), i.i.d. b) To improve the model (better fit, more reliable conclusions): variable transformations, further predictors or interactions between them, weighted regression or a more general model.
4 Model Diagnostics: Structure and Variance. [Figure: Tukey-Anscombe plot (residuals vs. fitted values) and Scale-Location plot (standardized residuals vs. fitted values); labeled points: New Orleans, Albany, Lancaster.]
5 Model Diagnostics: Example. It is all in the residuals! Important trickery: plot the residuals against some predictors; these predictors can be in or out of the model. No matter what the predictor is, we must not see any structure in these plots. If we find some, the model is inadequate.
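The residuals-against-predictors check can be made concrete. A minimal numpy sketch (Python for illustration with simulated data, not the mortality dataset from the slides): when a relevant predictor is left out of the model, the residuals correlate with it, while a predictor already in the model is orthogonal to the residuals by construction.

```python
# Illustrative sketch with simulated data (not the mortality dataset):
# structure in residuals reveals a predictor that is missing from the model.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1])          # x2 wrongly left out
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

print(abs(np.corrcoef(resid, x2)[0, 1]) > 0.8)   # True: omitted x2 shows up
print(abs(np.corrcoef(resid, x1)[0, 1]) < 1e-8)  # True: x1 is orthogonal
```

This is why plotting residuals against predictors that are *out* of the model is worthwhile: any visible structure points at what the model is missing.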
6 Model Diagnostics: Normality/Correlation. [Figure: Normal plot of standardized residuals and serial correlation plot (residuals vs. index); labeled points: New Orleans, Albany, Lancaster.]
7 How To Identify Influential Data Points?
1) Poor man's approach: redo the analysis n times, excluding each data point once.
2) Leverage: if we change yi by Δy, then hii·Δy is the change in ŷi. High leverage for a data point (hii > 2(p+1)/n) means that it forces the regression line to fit well to it.
3) Cook's Distance: Di = Σj (ŷj − ŷj(i))² / ((p+1)·σ̂ε²) = ri*²·hii / ((p+1)(1 − hii)). Be careful if Cook's Distance > 1.
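The two formulas for Cook's distance on this slide are algebraically identical, and both quantities can be computed directly from the definitions. A numpy sketch (illustrative Python with simulated data; the course uses R): it builds the hat matrix, flags a high-leverage point with the 2(p+1)/n rule, and cross-checks the closed-form Cook's distance against the leave-one-out definition.

```python
# Illustrative sketch with simulated data (not the mortality dataset):
# hat values h_ii and Cook's distance computed from their definitions.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, size=n)
x[-1] = 25.0                                  # far out in x: high leverage
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])          # p + 1 = 2 parameters
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - X @ beta
sigma2 = res @ res / (n - 2)                  # sigma_hat_eps^2
r_std = res / np.sqrt(sigma2 * (1 - h))       # standardized residuals
D = r_std**2 * h / (2 * (1 - h))              # Cook's distance, p + 1 = 2

print(h[-1] > 2 * 2 / n)                      # True: flagged by 2(p+1)/n rule

# cross-check D for the last point against the leave-one-out definition
beta_loo = np.linalg.lstsq(X[:-1], y[:-1], rcond=None)[0]
D_direct = np.sum((X @ beta - X @ beta_loo) ** 2) / (2 * sigma2)
print(np.isclose(D[-1], D_direct))            # True: the two formulas agree
```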
8 Model Diagnostics: Example. [Figure: Cook's distance plot (by observation number) and leverage plot of standardized residuals; labeled points: York, New Orleans, Miami.]
9 Model Diagnostics: Conclusions. Conclusions from the model diagnostics: there are 2 influential data points, York and New Orleans; they do not seem to be very strongly influential, but still, it is better to re-run the analysis without them and check the results. Results from that analysis: log(SO2) is significant again, the residual standard error is smaller, and the coefficient of determination is higher. Thus: a better fit!
10 Why Are They Influential? [Figure: Population Density (by index) and % White Collar vs. Education; labeled points: York, Washington, New York.]
11 Weighted Regression. We consider the model Y = Xβ + ε, where ε ~ N(0, σ²Σ). With a general Σ ≠ I this is generalized least squares; with Σ = diag(1/w1, 1/w2, ..., 1/wn) it is weighted regression.
12 Weighted Regression. When, why and how: If the Yi are means of ni observations, we choose wi = ni. If the variance is proportional to a predictor: wi = 1/xi. For observed non-constant variance: estimate it from OLS! The regression coefficients in weighted regression are obtained by minimizing the sum of weighted squared residuals, Σi wi·ri². The solution is still explicit and unique for a full-rank design.
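The explicit solution mentioned above can be written as β = (XᵀWX)⁻¹XᵀWy, which is the same as ordinary least squares after rescaling each row by √wi. A minimal numpy sketch (illustrative Python with simulated data; in R this is simply lm(..., weights = w)):

```python
# Illustrative sketch with simulated data: the weighted least squares
# solution and its equivalence to OLS on sqrt(w)-rescaled rows.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(1, 10, size=n)
w = 1.0 / x                                    # variance proportional to x
y = 2.0 + 3.0 * x + rng.normal(scale=np.sqrt(x))

X = np.column_stack([np.ones(n), x])
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ (w * y))   # (X'WX) b = X'Wy

sw = np.sqrt(w)
beta_scaled = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
print(np.allclose(beta_wls, beta_scaled))      # True: same estimate
```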
13 Robust Regression. How to deal with outliers? Check for typos first. Keep contact with the original data source. Examine the physical context: why did it happen? Outliers are often the most interesting data points! Exclude the outliers from the analysis and re-fit the model. Always report the existence of outliers that were removed. If the outliers are not mistakes but occur naturally due to a long-tailed error distribution: do not exclude them from the analysis, run a robust regression.
14 Robust Regression
> library(MASS)
> fit.rlm <- rlm(Mortality ~ JanTemp + ... + log(SO2), data = ...)
Relies on Huber's method; downweights the effect of outliers.
> summary(fit.rlm)
Coefficients:
            Value  Std. Error  t value
(Intercept)
JanTemp
log(SO2)
Residual standard error: on 46 degrees of freedom
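The idea behind rlm can be sketched as iteratively reweighted least squares with Huber weights (a simplified illustration in Python with simulated data; MASS's actual algorithm differs in details such as the scale estimate and convergence criteria):

```python
# Simplified illustration of Huber's method behind rlm (simulated data):
# iteratively reweighted least squares with weights w(r) = min(1, k/|r/s|).
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 30.0                                   # one gross outlier

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]    # start from the OLS fit
k = 1.345                                      # Huber's usual tuning constant
for _ in range(50):
    r = y - X @ beta
    s = np.median(np.abs(r)) / 0.6745          # robust scale via the MAD
    w = np.minimum(1.0, k / np.abs(r / s).clip(min=1e-12))
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)

print(w[0] < 0.1)                 # True: the outlier is heavily downweighted
print(abs(beta[1] - 2.0) < 0.15)  # True: the slope stays near the truth
```

The key design point: the loss grows only linearly in large residuals, so a single gross outlier cannot dominate the fit the way it does under least squares.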
15 Polynomial Regression. Polynomial regression = multiple linear regression! The model is Y = β0 + β1x + β2x² + ... + βd x^d + ε. Goals: fit a curvilinear relation between x and Y, improve the fit, determine the polynomial order d. Example: Savings dataset, personal savings ~ income per capita.
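"Polynomial regression = multiple linear regression" simply means the design matrix carries columns 1, x, x², ..., x^d. A numpy sketch (illustrative Python with simulated data, not the savings data):

```python
# Illustrative sketch with simulated data: a degree-2 polynomial fit is
# just multiple linear regression on the columns 1, x, x^2.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=80)
y = 1.0 - 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.3, size=80)

d = 2
X = np.vander(x, N=d + 1, increasing=True)     # columns 1, x, x^2
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.max(np.abs(beta - np.array([1.0, -0.5, 2.0]))) < 0.3)   # True
```

In R the same thing is written as lm(y ~ x + I(x^2)), exactly as on the following slides.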
16 Polynomial Regression Fit. [Figure: Savings data with polynomial regression fit, sr vs. ddpi.]
17 Polynomial Regression. Output from the model with the linear term only:
> summary(lm(sr ~ ddpi, data = savings))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                 e-10 ***
ddpi                                             *
---
Residual standard error: on 48 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 48 DF, p-value:
18 Diagnostic Plots 1. [Figure: Residuals vs. Fitted and Normal Q-Q plots; labeled points: Chile, Japan, Zambia, Libya.]
19 Diagnostic Plots 2. [Figure: Cook's distance plot and Residuals vs. Leverage plot; labeled points: Libya, Japan, Jamaica.]
20 Quadratic Regression. Add the quadratic term: Y = β0 + β1x + β2x² + ε.
> summary(lm(sr ~ ddpi + I(ddpi^2), data = savings))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                      ***
ddpi                                             **
I(ddpi^2)                                        *
---
Residual standard error: on 47 degrees of freedom
Multiple R-squared: 0.205, Adjusted R-squared:
F-statistic: on 2 and 47 DF, p-value:
21 Diagnostic Plots: Quadratic Regression. [Figure: Residuals vs. Fitted and Normal Q-Q plots; labeled points: Japan, Chile, Korea.]
22 Diagnostic Plots: Quadratic Regression. [Figure: Cook's distance plot and Residuals vs. Leverage plot; labeled points: Libya, Japan, Jamaica.]
23 Cubic Regression. Add the cubic term: Y = β0 + β1x + β2x² + β3x³ + ε.
> summary(lm(sr ~ ddpi + I(ddpi^2) + I(ddpi^3), data = savings))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  5.145e                              *
ddpi         1.746e
I(ddpi^2)
I(ddpi^3)
Residual standard error: on 46 degrees of freedom
Multiple R-squared: 0.205, Adjusted R-squared:
F-statistic: on 3 and 46 DF, p-value:
24 Powers Are Strongly Correlated Predictors! The smaller the x-range, the bigger the problem!
> cor(cbind(ddpi, ddpi2 = ddpi^2, ddpi3 = ddpi^3))
       ddpi  ddpi2  ddpi3
ddpi
ddpi2
ddpi3
Way out: use centered predictors! zi = (xi − x̄), zi² = (xi − x̄)², zi³ = (xi − x̄)³.
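The collinearity of raw powers, and how centering breaks it, can be seen numerically. A small Python sketch with simulated data (the slide uses the savings data; the uniform predictor here is an assumption for illustration):

```python
# Illustrative sketch: powers of a positive, narrow-range predictor are
# almost collinear, and centering first removes most of the correlation.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(5, 10, size=200)               # narrow range, far from zero

raw = abs(np.corrcoef(x, x**2)[0, 1])
z = x - x.mean()
centered = abs(np.corrcoef(z, z**2)[0, 1])
print(raw > 0.99)        # True: x and x^2 nearly perfectly correlated
print(centered < 0.2)    # True: far weaker after centering
```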
25 Powers Are Strongly Correlated Predictors!
> summary(lm(sr ~ z.ddpi + I(z.ddpi^2) + I(z.ddpi^3), data = z.savings))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  1.042e                        < 2e-16 ***
z.ddpi       1.059e                                 **
I(z.ddpi^2)
I(z.ddpi^3)
Coefficients, standard errors and tests are different; fitted values and global inference remain the same. Not overly beneficial on this dataset! Be careful: extrapolation with polynomials is dangerous!
26 Dummy Variables. So far, we only considered continuous predictors: temperature, distance, pressure, ... It is perfectly valid to have categorical predictors, too: sex (male or female), status variables (employed or unemployed), working shift (day, evening, night), ... Implementation in the regression is done with dummy variables.
27 Example: Binary Categorical Variable. The lathe dataset: Y is the lifetime of a cutting tool in a lathe, x1 is the speed of the machine in rpm, and x2 is the tool type (A or B). Dummy variable encoding: x2 = 0 for tool type A, x2 = 1 for tool type B.
28 Interpretation of the Model (see blackboard).
> summary(lm(hours ~ rpm + tool, data = lathe))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                 e-09 ***
rpm                                         e-05 ***
tool                                        e-09 ***
---
Residual standard error: on 17 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 2 and 17 DF, p-value: 3.086e-09
29 The Dummy Variable Fit. [Figure: Durability of Lathe Cutting Tools: hours vs. rpm, with the fitted lines for the two tool types.]
30 Model with Interactions. Question: do the slopes need to be identical? With the appropriate model, the answer is no: Y = β0 + β1x1 + β2x2 + β3x1x2 + ε. See blackboard for the model interpretation.
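A numpy sketch of this interaction model (illustrative Python; the two "true" lines below are invented for the demo, not the lathe data): the design matrix gets columns for the dummy and for the product rpm·tool, and the fit recovers a separate intercept and slope per tool type.

```python
# Illustrative sketch with made-up lines (not the lathe data): dummy x2
# plus the interaction rpm*x2 fits two lines with different slopes.
import numpy as np

rng = np.random.default_rng(6)
n = 40
rpm = rng.uniform(500, 1000, size=n)
tool = np.repeat([0.0, 1.0], n // 2)          # dummy: 0 = type A, 1 = type B
hours = np.where(tool == 0, 40 - 0.02 * rpm, 55 - 0.03 * rpm)
hours = hours + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), rpm, tool, rpm * tool])
b = np.linalg.lstsq(X, hours, rcond=None)[0]

# type A: intercept b0, slope b1; type B: intercept b0+b2, slope b1+b3
print(abs(b[0] - 40) < 3 and abs(b[1] - (-0.02)) < 0.005)             # True
print(abs(b[0] + b[2] - 55) < 3 and abs(b[1] + b[3] - (-0.03)) < 0.005)  # True
```

This mirrors what R does internally for lm(hours ~ rpm * tool): β2 shifts the intercept, β3 shifts the slope.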
31 Different Slopes for the Regression Lines. [Figure: Durability of Lathe Cutting Tools with interaction: hours vs. rpm, a separate slope for each tool type.]
32 Summary Output
> summary(lm(hours ~ rpm * tool, data = lathe))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                 e-06 ***
rpm                                              **
tool                                             **
rpm:tool
Residual standard error: on 16 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 16 DF, p-value: 1.319e-08
33 How Complex Does the Model Need to Be? Question 1: do we need different slopes for the two lines? Test H0: β3 = 0 against HA: β3 ≠ 0; this is an individual parameter test for the interaction term. Question 2: is there any difference altogether? Test H0: β2 = β3 = 0 against HA: β2 ≠ 0 and/or β3 ≠ 0; this is a partial F-test, where we try to exclude the interaction and the dummy variable together. R offers convenient functionality for these tests!
34 Anova Output. Comparing the simple model to the interaction model:
> fit1 <- lm(hours ~ rpm, data = lathe)
> fit2 <- lm(hours ~ rpm * tool, data = lathe)
> anova(fit1, fit2)
Model 1: hours ~ rpm
Model 2: hours ~ rpm * tool
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)
1
2                                 e-08 ***
Conclusion: no different slopes, but a different intercept!
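The partial F-test that anova(fit1, fit2) performs can be computed by hand from the two residual sums of squares: F = ((RSS1 − RSS2)/(df1 − df2)) / (RSS2/df2). A Python sketch with simulated data (not the lathe data):

```python
# Illustrative sketch with simulated data: the partial F statistic for
# comparing a small model to a larger nested one, from the two RSS.
import numpy as np

rng = np.random.default_rng(7)
n = 40
x = rng.uniform(0, 1, size=n)
g = np.repeat([0.0, 1.0], n // 2)             # group dummy
y = 1.0 + 2.0 * x + 3.0 * g + rng.normal(scale=0.5, size=n)

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

X1 = np.column_stack([np.ones(n), x])              # Model 1: y ~ x
X2 = np.column_stack([np.ones(n), x, g, x * g])    # Model 2: y ~ x * g
rss1, rss2 = rss(X1, y), rss(X2, y)
df1, df2 = n - 2, n - 4
F = ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)   # partial F statistic
print(F > 10)                                      # True: clear group effect
```

Under H0, F follows an F distribution with (df1 − df2, df2) degrees of freedom, which is where the p-value in the anova table comes from.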
35 Categorical Input with More than 2 Levels. There are now 3 tool types A, B, C:
x2 = 0, x3 = 0 for observations of type A
x2 = 1, x3 = 0 for observations of type B
x2 = 0, x3 = 1 for observations of type C
Main effect model: Y = β0 + β1x1 + β2x2 + β3x3 + ε
With interactions: Y = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + ε
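The x2/x3 encoding above can be built by hand (a small Python sketch; the tool type labels A, B, C are the ones assumed on this slide):

```python
# Hand-built dummy columns for a 3-level factor: the x2/x3 encoding above.
import numpy as np

levels = np.array(["A", "B", "C", "B", "A", "C"])
x2 = (levels == "B").astype(float)   # 1 for type B observations
x3 = (levels == "C").astype(float)   # 1 for type C observations
print(x2.tolist())   # [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(x3.tolist())   # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```

Type A is the reference level: both dummies are zero, so its intercept and slope are β0 and β1, and the other coefficients measure deviations from it. R's lm() builds exactly such columns automatically for a factor.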
36 Three Types of Cutting Tools. [Figure: Durability of Lathe Cutting Tools, 3 types: hours vs. rpm with points labeled A, B, C.]
37 Summary Output
> summary(lm(hours ~ rpm * tool, data = abc.lathe))
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                 e-07 ***
rpm                                              **
toolB                                            **
toolC
rpm:toolB
rpm:toolC
Residual standard error: 2.88 on 24 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 5 and 24 DF, p-value: 9.064e-11
38 Inference with Categorical Predictors. Do not perform individual hypothesis tests on factors! Question 1: do we have different slopes? Test H0: β4 = 0 and β5 = 0 against HA: β4 ≠ 0 and/or β5 ≠ 0. Question 2: is there any difference altogether? Test H0: β2 = β3 = β4 = β5 = 0 against HA: any of β2, β3, β4, β5 ≠ 0. Again, R provides convenient functionality.
39 Anova Output
> anova(fit.abc)
Analysis of Variance Table
          Df  Sum Sq  Mean Sq  F value  Pr(>F)
rpm                                         ***
tool                                   e-11 ***
rpm:tool                                    *
Residuals
Strong evidence that we need to distinguish the tools; weak evidence for the necessity of different slopes.
22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each
More informationMultiple Linear Regression. Chapter 12
13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.
More informationLecture 11: Simple Linear Regression
Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink
More informationChapter 12: Multiple Regression
Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x
More informationSTK4900/ Lecture 5. Program
STK4900/9900 - Lecture 5 Program 1. Checking model assumptions Linearity Equal variances Normality Influential observations Importance of model assumptions 2. Selection of predictors Forward and backward
More informationLinear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.
Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach
More informationCRP 272 Introduction To Regression Analysis
CRP 272 Introduction To Regression Analysis 30 Relationships Among Two Variables: Interpretations One variable is used to explain another variable X Variable Independent Variable Explaining Variable Exogenous
More informationOct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope
Oct 2017 1 / 28 Minimum MSE Y is the response variable, X the predictor variable, E(X) = E(Y) = 0. BLUP of Y minimizes average discrepancy var (Y ux) = C YY 2u C XY + u 2 C XX This is minimized when u
More informationSTA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007
STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.
More informationOrdinary Least Squares (OLS): Multiple Linear Regression (MLR) Analytics What s New? Not Much!
Ordinary Least Squares (OLS): Multiple Linear Regression (MLR) Analytics What s New? Not Much! OLS: Comparison of SLR and MLR Analysis Interpreting Coefficients I (SRF): Marginal effects ceteris paribus
More informationChapter 7: Variances. October 14, In this chapter we consider a variety of extensions to the linear model that allow for more gen-
Chapter 7: Variances October 14, 2018 In this chapter we consider a variety of extensions to the linear model that allow for more gen- eral variance structures than the independent, identically distributed
More informationStat 5303 (Oehlert): Models for Interaction 1
Stat 5303 (Oehlert): Models for Interaction 1 > names(emp08.10) Recall the amylase activity data from example 8.10 [1] "atemp" "gtemp" "variety" "amylase" > amylase.data
More informationLinear Regression. Furthermore, it is simple.
Linear Regression While linear regression has limited value in the classification problem, it is often very useful in predicting a numerical response, on a linear or ratio scale. Furthermore, it is simple.
More informationLecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II
Lecture 3: Multiple Regression Prof. Sharyn O Halloran Sustainable Development Econometrics II Outline Basics of Multiple Regression Dummy Variables Interactive terms Curvilinear models Review Strategies
More informationLinear Regression with Multiple Regressors
Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution
More information