Operators and the Formula Argument in lm

Recall that the first argument of lm (the formula argument) takes the form y ~ . or y ~ x. The term on the left of the ~ tells lm what the response variable is, and the term on the right tells lm what the predictor variable(s) is (are). To frame the following discussion, please load the fuel economy dataset into R and save it as a data frame named cars (see Module 2 if you don't remember how to do this!).

Exercises:

1. What model would you expect to be fit with the following formula argument?

Fuel.Economy ~ Weight + Horsepower + Seating

Does the model include a predictor which is numerically equal to the following vector?

cars$Weight + cars$Horsepower + cars$Seating

2. What model do you think is fit when the formula argument is as below?

Fuel.Economy ~ . - Seating - Cylinders

Run lm with this formula and see if the result matches your intuition.

3. How many terms are included in the model when the formula argument is as below?

Fuel.Economy ~ Weight * Fuel.Capacity

Is this what you had expected?

4. What models are being fit by the following code? How does model1 compare with model2? Is this what you expected?

> model1 <- lm(Fuel.Economy ~ Weight, data = cars)
> model2 <- lm(Fuel.Economy ~ Weight^2, data = cars)

5. Based on your answer to Exercise 4, do you expect model3 and model4 to be different?

> model3 <- lm(Fuel.Economy ~ (Weight + Horsepower), data = cars)
> model4 <- lm(Fuel.Economy ~ (Weight + Horsepower)^2, data = cars)

Be sure to run lm to check your intuition!

In the above examples and exercises, we have seen several operators which don't behave as they normally do outside of lm. For instance, the + tells lm to include Weight, Horsepower, and Seating as separate predictors in the model, rather than first computing the sum of the three variables and then using that sum as a single predictor in a simple regression model. Similarly, the * tells lm to include an interaction between Weight and Fuel.Capacity in addition to Weight and Fuel.Capacity themselves (the interaction term is numerically equal to the product Weight × Fuel.Capacity). This interaction term is denoted Weight:Fuel.Capacity in the summary of the model fit.

> model2 <- lm(Fuel.Economy ~ Weight * Fuel.Capacity, data = cars)
> summary(model2)

Call:
lm(formula = Fuel.Economy ~ Weight * Fuel.Capacity, data = cars)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
                      Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                          < 2e-16
Weight                                               e-15
Fuel.Capacity                                        < 2e-16
Weight:Fuel.Capacity                                 e-12

Signif. codes:

Residual standard error: 3.39 on 226 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 226 DF, p-value: < 2.2e-16

We also observe that ^ did not act like the usual exponentiation operator. We've now seen several examples of operators (+, -, *, ^) which do not behave in the usual arithmetic sense when they are used in the formula argument of lm [1]. Table 1 summarizes the meaning of these operators in lm.

[1] This phenomenon is known as operator overloading.
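The behaviour of + and * described above can be checked without fitting anything, by asking R which terms a formula expands to and by inspecting the design matrix. The following is a minimal sketch: the data frame toy is a made-up stand-in for the fuel economy data (its values are simulated, not taken from the cars dataset), and only its column names mirror the ones used in the text.

```r
# Toy stand-in for the fuel economy data; all values are simulated.
set.seed(1)
toy <- data.frame(
  Weight        = runif(20, 2000, 4500),
  Fuel.Capacity = runif(20, 10, 25)
)
toy$Fuel.Economy <- 50 - 0.005 * toy$Weight - 0.5 * toy$Fuel.Capacity + rnorm(20)

# Term labels that each formula expands to.
plus_terms <- attr(terms(Fuel.Economy ~ Weight + Fuel.Capacity), "term.labels")
star_terms <- attr(terms(Fuel.Economy ~ Weight * Fuel.Capacity), "term.labels")
print(plus_terms)
print(star_terms)

# The interaction column of the design matrix is the elementwise product
# of the two predictors.
X <- model.matrix(Fuel.Economy ~ Weight * Fuel.Capacity, data = toy)
product_matches <- all.equal(
  unname(X[, "Weight:Fuel.Capacity"]),
  toy$Weight * toy$Fuel.Capacity
)
print(product_matches)
```

The same two calls, terms and model.matrix, work on any formula, so they are a convenient way to check your answers to the exercises before running lm.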

Table 1: Meaning of operators in lm

Operator   Meaning
+          include the following predictor in the model
-          exclude the following predictor from the model
*          include these predictors along with their interaction
^          include predictors and higher-order interactions
.          include all predictors

The ^ operator tells lm to include all interaction terms up to a specified order. As we saw in model4 from Exercise 5 above, when we had (Weight + Horsepower)^2 in the formula, lm fit a model with Weight, Horsepower, and an interaction term between them. This interaction term, because it involves two predictors, is known as a second-order interaction. In general, if we write ^k in the formula, where k is a positive integer, lm will include the predictors themselves together with all possible interactions of up to k different predictors. Note that when there are fewer than k predictors, lm will include interactions up to the largest possible order. To see this, consider the two models below.

> new.model1 <- lm(Fuel.Economy ~ .^3, data = cars)
> new.model2 <- lm(Fuel.Economy ~ (Weight + Horsepower)^3, data = cars)

In the first, we have included all possible two-way and three-way interactions, and in the second we have only included two-way interactions (since we are only passing two different predictors to lm).
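The expansion rules for ^ and - can be inspected directly with terms, which needs no data as long as the formula does not contain the . shorthand. This is a small sketch rather than part of the original exercises; the helper labels_of is a hypothetical convenience function, and y is just a placeholder name for the response.

```r
# Hypothetical helper: list the terms a formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

# Squaring a single predictor leaves the model unchanged.
single <- labels_of(y ~ Weight^2)

# Two predictors: ^2 and ^3 give the same terms, because a three-way
# interaction needs three different predictors.
two_way <- labels_of(y ~ (Weight + Horsepower)^2)
capped  <- labels_of(y ~ (Weight + Horsepower)^3)

# Three predictors: main effects, all two-way interactions, and the
# single three-way interaction.
three_way <- labels_of(y ~ (Weight + Horsepower + Seating)^3)

# The - operator removes a term that * would otherwise add.
dropped <- labels_of(y ~ Weight * Horsepower - Weight:Horsepower)

print(single)
print(two_way)
print(three_way)
print(dropped)
```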

Transformations

As we saw in Table 1, many standard arithmetic operators have alternate meanings within lm. In particular, if we decided, for whatever reason, that we wanted to predict fuel economy using the variables (Weight + Horsepower)^2 and (Height × Width × Length), it would appear that we're in a bit of a bind. To make lm read these operators arithmetically (e.g. when we really want + to mean "plus"), we use I(expr) as follows:

> model <- lm(Fuel.Economy ~ I((Weight + Horsepower)^2) + I(Length * Width * Height), data = cars)

The syntax I(expr) tells lm that it should evaluate the expression expr normally and use the result as a predictor in the linear model that is fit. Note that for other transformations like log or square root you don't need to use I(...), because operators inside a function call are already evaluated arithmetically:

> model <- lm(log(Fuel.Economy) ~ log(Weight + 1) + sqrt(Fuel.Capacity), data = cars)
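To see what I(...) changes, it can help to compare the design matrices that the two versions of the formula produce. The sketch below uses a four-row toy data frame with invented values (it is not the fuel economy dataset); only the column names follow the text.

```r
# Invented values, for illustration only.
toy <- data.frame(
  Weight       = c(2500, 3000, 3500, 4000),
  Horsepower   = c(120, 150, 200, 250),
  Fuel.Economy = c(32, 28, 24, 19)
)

# Without I(): ^2 is a formula operator, so we get two main effects
# plus their interaction.
X_plain <- model.matrix(Fuel.Economy ~ (Weight + Horsepower)^2, data = toy)

# With I(): the expression is evaluated arithmetically and yields a
# single predictor column.
X_I <- model.matrix(Fuel.Economy ~ I((Weight + Horsepower)^2), data = toy)

print(colnames(X_plain))
print(colnames(X_I))

# The single I() column equals the squared sum of the two variables.
same <- all.equal(unname(X_I[, 2]), (toy$Weight + toy$Horsepower)^2)
print(same)
```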

Categorical Variables

Formally, there is no new syntax required to actually fit a model with categorical predictors. However, as you should recall from other statistics classes, when including categorical predictors we need to include a reference level [2]. So if you have a categorical predictor that has k levels, R will include k - 1 dummy variables in the regression model (and so it will only estimate k - 1 coefficients for the categorical predictor). To demonstrate how we can include categorical predictors in linear regression models, we will use the updated fuel economy dataset, which includes the manufacturer, type of transmission, and number of transmission speeds for each car in the original fuel economy dataset. If we load this data into R and look at summaries of these three new variables, we see that there are 121 cars with automatic transmission and 109 with manual transmission.

> cars2 <- read.table(file = "cars2.csv", header = TRUE)
> summary(cars2[, c("Transmission.Speeds", "Transmission", "Manufacturer")])
 Transmission.Speeds   Transmission     Manufacturer
 Min.   :              Automatic:121    Toyota       : 15
 1st Qu.:              Manual   :109    Chevrolet    : 14
 Median :                               Ford         : 14
 Mean   :                               Mazda        :  9
 3rd Qu.:                               Mercedes-Benz:  9
 Max.   :                               Nissan       :  9
                                        (Other)      :160

As we see in the above, when we look at the summary of a categorical variable, R reports the counts for each level.

By default, R treats the first level as the reference level. To determine which level is first, we can use the levels function:

> levels(cars2$Manufacturer)
 [1] "Acura"         "Aston Martin"  "Audi"          "BMW"
 [5] "Buick"         "Cadillac"      "Chevrolet"     "Chrysler"
 [9] "Dodge"         "Ferrarri"      "Ford"          "GMC"
[13] "Honda"         "Hyundai"       "Infiniti"      "Isuzu"
[17] "Jaguar"        "Jeep"          "Kia"           "Lamborghini"
[21] "Lexus"         "Lincoln"       "Lotus"         "Maserati"
[25] "Mazda"         "Mercedes-Benz" "Mercury"       "MINI"
[29] "Mitsubishi"    "Nissan"        "Oldsmobile"    "Panoz"
[33] "Pontiac"       "Porsche"       "Rolls-Royce"   "Saab"
[37] "Saturn"        "Subaru"        "Suzuki"        "Toyota"
[41] "Volkswagen"    "Volvo"

[2] This is so that the model is identifiable.
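The k - 1 dummy variables and the choice of reference level can be seen in the design matrix that R builds for a factor. The sketch below uses a tiny made-up vector of manufacturers, not the cars2 data. One caveat worth keeping in mind: since R 4.0.0, read.table leaves text columns as character vectors unless stringsAsFactors = TRUE is passed, so on a recent R installation a column may need to be wrapped in factor(...) before levels returns anything.

```r
# Made-up manufacturer labels; not taken from cars2.
maker <- factor(c("Toyota", "Ford", "Acura", "Ford", "Mazda"))

# Levels are sorted alphabetically, so "Acura" comes first.
maker_levels <- levels(maker)
print(maker_levels)

# The design matrix has an intercept plus one dummy column for every
# level except the reference level.
X <- model.matrix(~ maker)
print(colnames(X))

n_dummies <- ncol(X) - 1
print(n_dummies)
```

With k = 4 levels the matrix holds 3 dummy columns, and there is no column for the reference level because it is absorbed into the intercept.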

As we see from above, the reference manufacturer is Acura. In general, when it reads in a categorical variable, R will sort the levels alphabetically. If we want to change the reference level, we can use the relevel function:

> cars2$Manufacturer <- relevel(cars2$Manufacturer, ref = "Rolls-Royce")

In the context of modeling fuel economy, one might argue that we should be treating the number of transmission speeds as a categorical variable. However, as we saw above, R treats this variable as numeric. To convert it to a categorical variable, we use as.factor:

> cars2$Transmission.Speeds <- as.factor(cars2$Transmission.Speeds)
> summary(cars2$Transmission.Speeds)

We see that there are 69 4-speed cars in our dataset and this is the reference category.
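The effect of relevel and as.factor on the level ordering can be illustrated on small vectors. This is a sketch with invented values, not the cars2 data; the names maker, maker2, speeds, and speeds_f are placeholders.

```r
# Invented values for illustration.
maker <- factor(c("Toyota", "Ford", "Acura", "Ford", "Mazda"))

# relevel moves the chosen level to the front; the others keep their order.
maker2 <- relevel(maker, ref = "Ford")
print(levels(maker))
print(levels(maker2))

# A numeric variable becomes categorical with as.factor; its levels are
# the distinct values in increasing order, so the smallest is the reference.
speeds   <- c(4, 5, 6, 4, 5)
speeds_f <- as.factor(speeds)
print(levels(speeds_f))

# summary of a factor reports the count in each level.
speed_counts <- summary(speeds_f)
print(speed_counts)
```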

Logistic Regression

While linear regression is a very useful tool, it is not always the most appropriate one. Returning to the fuel economy example, suppose we wish to predict only whether or not a car will get more than 25 miles per gallon using the available predictors. We can create a new dummy variable in our data frame cars2 which is equal to 1 if a car gets more than 25 miles per gallon and 0 otherwise, as follows:

> cars2$newy <- as.numeric(cars2$Fuel.Economy > 25)

Now we are dealing with a binary outcome, and a linear regression model is not necessarily the most useful model. Instead, we ought to consider a logistic regression model, which can be fit in R using glm as follows:

> logistic1 <- glm(..., family = binomial, data = ...)

The first argument of glm is the formula argument, which is formatted exactly like the formula in lm, and the data argument is also analogous to that of lm. The only new syntax in glm is the family argument (for more information on this argument, please see the help page for glm).

Exercises

1. Fit a logistic regression model with outcome variable cars2$newy to predict the log-odds of a car having a fuel economy greater than 25 MPG using its weight, type of transmission, and manufacturer.

2. Fit a logistic regression model to include all two-way interactions between the available predictors. Note that you should not include Fuel.Economy as a predictor.
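As a rough illustration of the glm call pattern, the sketch below fits a one-predictor logistic regression to simulated data. Everything about the data is an assumption made for the example: the data frame toy, the simulated Weight values, and the coefficients used to generate the outcome are invented, so the fitted numbers say nothing about the real cars2 data.

```r
# Simulated data; the generating coefficients are arbitrary choices.
set.seed(42)
n   <- 200
toy <- data.frame(Weight = runif(n, 2000, 4500))

# Probability of exceeding 25 MPG falls as weight increases.
p        <- plogis(10 - 0.003 * toy$Weight)
toy$newy <- rbinom(n, size = 1, prob = p)

# Same formula syntax as lm; family = binomial requests logistic regression.
fit <- glm(newy ~ Weight, family = binomial, data = toy)
print(coef(fit))

# Coefficients are on the log-odds scale; type = "response" converts a
# prediction back to a probability.
pred <- predict(fit, newdata = data.frame(Weight = 3000), type = "response")
print(pred)
```

Formula operators such as *, ^, and . carry the same meaning inside glm as inside lm, which is what the exercises above rely on.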


More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

MODULE 6 LOGISTIC REGRESSION. Module Objectives:

MODULE 6 LOGISTIC REGRESSION. Module Objectives: MODULE 6 LOGISTIC REGRESSION Module Objectives: 1. 147 6.1. LOGIT TRANSFORMATION MODULE 6. LOGISTIC REGRESSION Logistic regression models are used when a researcher is investigating the relationship between

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

0.1 ologit: Ordinal Logistic Regression for Ordered Categorical Dependent Variables

0.1 ologit: Ordinal Logistic Regression for Ordered Categorical Dependent Variables 0.1 ologit: Ordinal Logistic Regression for Ordered Categorical Dependent Variables Use the ordinal logit regression model if your dependent variable is ordered and categorical, either in the form of integer

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C = Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =

More information

Intro to Stats Lecture 11

Intro to Stats Lecture 11 Outliers and influential points Intro to Stats Lecture 11 Collect data this week! Midterm is coming! Terms X outliers: observations outlying the overall pattern of the X- variable Y outliers: observations

More information

x 21 x 22 x 23 f X 1 X 2 X 3 ε

x 21 x 22 x 23 f X 1 X 2 X 3 ε Chapter 2 Estimation 2.1 Example Let s start with an example. Suppose that Y is the fuel consumption of a particular model of car in m.p.g. Suppose that the predictors are 1. X 1 the weight of the car

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

From the help desk: Comparing areas under receiver operating characteristic curves from two or more probit or logit models

From the help desk: Comparing areas under receiver operating characteristic curves from two or more probit or logit models The Stata Journal (2002) 2, Number 3, pp. 301 313 From the help desk: Comparing areas under receiver operating characteristic curves from two or more probit or logit models Mario A. Cleves, Ph.D. Department

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Commuting CO2 Emissions at UCDMC

Commuting CO2 Emissions at UCDMC Commuting CO2 Emissions at UCDMC 2016 ESTIMATES Joseph Lacap and Ernst Oehninger UNIVERSITY OF CALIFNORNIA, DAVIS DAVIS, CA 1 Contents 2 Table of Figures... 2 3 List of Tables... 2 4 Introduction... 2

More information

Self-Assessment Weeks 6 and 7: Multiple Regression with a Qualitative Predictor; Multiple Comparisons

Self-Assessment Weeks 6 and 7: Multiple Regression with a Qualitative Predictor; Multiple Comparisons Self-Assessment Weeks 6 and 7: Multiple Regression with a Qualitative Predictor; Multiple Comparisons 1. Suppose we wish to assess the impact of five treatments on an outcome Y. How would these five treatments

More information

Solving Word Problems

Solving Word Problems Bonus Activity (online only) Solving Word Problems Why Learning skills: defining the problem, defining knowns and validating Mathematical word problems (or story problems) require you to take real-life

More information

Package generalhoslem

Package generalhoslem Package generalhoslem December 2, 2017 Type Package Title Goodness of Fit Tests for Logistic Regression Models Version 1.3.2 Date 2017-12-02 Author Matthew Jay [aut, cre] Maintainer Matthew Jay

More information

Simple, Marginal, and Interaction Effects in General Linear Models: Part 1

Simple, Marginal, and Interaction Effects in General Linear Models: Part 1 Simple, Marginal, and Interaction Effects in General Linear Models: Part 1 PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 2: August 24, 2012 PSYC 943: Lecture 2 Today s Class Centering and

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

Outline. Regression 1: Linear Regression. Outline. Outline. Classic linear regression. Linear regression in R. Marco Baroni. Practical Statistics in R

Outline. Regression 1: Linear Regression. Outline. Outline. Classic linear regression. Linear regression in R. Marco Baroni. Practical Statistics in R Outline Regression 1: Linear Regression Marco Baroni Practical Statistics in R Outline Outline Introduction Constructing the model Estimation Looking at the fitted model Introduction Constructing the model

More information

International Journal of Industrial Organization

International Journal of Industrial Organization International Journal Int. J. Ind. of Industrial Organ. 7 Organization (9) 5 63 7 (9) 5 63 Contents lists available at ScienceDirect International Journal of Industrial Organization journal homepage: www.elsevier.com/locate/ijio

More information

Topic 1. Definitions

Topic 1. Definitions S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

22s:152 Applied Linear Regression. 1-way ANOVA visual:

22s:152 Applied Linear Regression. 1-way ANOVA visual: 22s:152 Applied Linear Regression 1-way ANOVA visual: Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Y We now consider an analysis

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Regression_Model_Project Md Ahmed June 13th, 2017

Regression_Model_Project Md Ahmed June 13th, 2017 Regression_Model_Project Md Ahmed June 13th, 2017 Executive Summary Motor Trend is a magazine about the automobile industry. It is interested in exploring the relationship between a set of variables and

More information