Homework 1, Yang Sun, 2017/9/11


1. Describe data

According to the data description, the response variable is AmountSpent; the predictors are Age, Gender, OwnHome, Married, Location, Salary, Children, History, and Catalogs.

2. Statistical and graphical data summary

2.0 Initial Setup

# Set workspace
setwd("d:/dropbox/pitt/fall 18/IS 2160 Data Mining/Homeworks/HW1")
# Import csv file here
DirectMarketing <- read.csv("directmarketing.csv", header=TRUE, stringsAsFactors=FALSE)
# Data Summary
summary(DirectMarketing)

     Age               Gender             OwnHome
 Length:1000        Length:1000        Length:1000
 Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character

   Married            Location             Salary       Children
 Length:1000        Length:1000        Min.   :       Min.   :0.000
 Class :character   Class :character   1st Qu.:       1st Qu.:0.000
 Mode  :character   Mode  :character   Median :       Median :1.000
                                       Mean   :       Mean   :
                                       3rd Qu.:       3rd Qu.:2.000
                                       Max.   :       Max.   :3.000

   History            Catalogs      AmountSpent
 Length:1000        Min.   : 6.00   Min.   :  38.0
 Class :character   1st Qu.:        1st Qu.:
 Mode  :character   Median :12.00   Median :
                    Mean   :14.68   Mean   :
                    3rd Qu.:        3rd Qu.:
                    Max.   :24.00   Max.   :

2.a Missing Values

# Check for missing values
table(is.na(DirectMarketing))

FALSE  TRUE

# Determine if all missing values came from History
table(is.na(DirectMarketing$History))

FALSE  TRUE

# Make the NA fields into 'None' as one category.
# Based on the data description, NA means that this customer has not yet purchased.
# Hence we cannot simply delete NAs; instead, convert them into "None".
DirectMarketing[is.na(DirectMarketing)] <- "None"

# Check again for missing values
table(is.na(DirectMarketing))

FALSE

No more missing values.

2.b Generate Summary

summary(DirectMarketing)

     Age               Gender             OwnHome
 Length:1000        Length:1000        Length:1000
 Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character

   Married            Location             Salary       Children
 Length:1000        Length:1000        Min.   :       Min.   :0.000
 Class :character   Class :character   1st Qu.:       1st Qu.:0.000
 Mode  :character   Mode  :character   Median :       Median :1.000
                                       Mean   :       Mean   :
                                       3rd Qu.:       3rd Qu.:2.000
                                       Max.   :       Max.   :3.000

   History            Catalogs      AmountSpent
 Length:1000        Min.   : 6.00   Min.   :  38.0
 Class :character   1st Qu.:        1st Qu.:
 Mode  :character   Median :12.00   Median :
                    Mean   :14.68   Mean   :
                    3rd Qu.:        3rd Qu.:
                    Max.   :24.00   Max.   :

# Standard Deviation for numerical values
sd(DirectMarketing$Salary)

[1]    # Standard Deviation for Salary

sd(DirectMarketing$Children)

[1]    # Standard Deviation for Children

sd(DirectMarketing$Catalogs)

[1]    # Standard Deviation for Catalogs

sd(DirectMarketing$AmountSpent)

[1]    # Standard Deviation for AmountSpent

# Convert all character fields into factors
cols <- c("Age", "Gender", "OwnHome", "Married", "Location", "History")
DirectMarketing[cols] <- lapply(DirectMarketing[cols], factor)

# Do summary again
summary(DirectMarketing)

    Age          Gender      OwnHome       Married      Location
 Middle:508   Female:506   Own :516   Married:502   Close:710
 Old   :205   Male  :494   Rent:484   Single :498   Far  :290
 Young :287

     Salary       Children         History       Catalogs
 Min.   :      Min.   :0.000   High  :255   Min.   :
 1st Qu.:      1st Qu.:0.000   Low   :230   1st Qu.: 6.00
 Median :      Median :1.000   Medium:212   Median :12.00
 Mean   :      Mean   :0.934   None  :303   Mean   :
 3rd Qu.:      3rd Qu.:                     3rd Qu.:18.00
 Max.   :      Max.   :3.000                Max.   :24.00

  AmountSpent
 Min.   :
 1st Qu.:
 Median :
 Mean   :
 3rd Qu.:
 Max.   :

2.c Kernel Density Plot

AmountSpent Density Distribution

library(ggplot2)   # load the plotting package
ggplot(DirectMarketing, aes(x=AmountSpent)) +
  geom_density() +

  labs(title = "AmountSpent Density Distribution")

[Figure: AmountSpent density distribution]

The density distribution for AmountSpent is right-skewed.

Salary Density Distribution

# For Salary
ggplot(DirectMarketing, aes(x=Salary)) +
  geom_density() +
  labs(title = "Salary Density Distribution")

[Figure: Salary density distribution]

The density distribution for Salary is left-skewed, and it also has two peaks.

2.d Correlation and scatterplot for numerical predictors

Correlation

numpredictor <- as.data.frame(DirectMarketing[, c("Salary", "Children", "Catalogs")])
responsevariable <- as.data.frame(DirectMarketing$AmountSpent)
colnames(responsevariable)[1] <- "Amount Spent"
cor(responsevariable, numpredictor)

              Salary  Children  Catalogs
Amount Spent

Scatterplot

# Salary vs. AmountSpent
# use ggplot
theme_set(theme_bw())   # set default theme with a white background
ggplot(data=DirectMarketing, aes(x=Salary, y=AmountSpent)) +
  geom_point() +
  geom_smooth(method=lm,   # add linear regression line
              se=FALSE)    # (by default includes 95% confidence region)

[Figure: Salary vs. AmountSpent scatterplot with fitted regression line]

# Children vs. AmountSpent
ggplot(data=DirectMarketing, aes(x=Children, y=AmountSpent)) +
  geom_point() +
  geom_smooth(method=lm,   # add linear regression line
              se=FALSE)    # (by default includes 95% confidence region)

[Figure: Children vs. AmountSpent scatterplot with fitted regression line]

# Catalogs vs. AmountSpent
ggplot(data=DirectMarketing, aes(x=Catalogs, y=AmountSpent)) +
  geom_point() +
  geom_smooth(method=lm,   # add linear regression line
              se=FALSE)    # (by default includes 95% confidence region)

[Figure: Catalogs vs. AmountSpent scatterplot with fitted regression line]

2.e Conditional density plot for categorical predictors

ggplot(data=DirectMarketing, aes(AmountSpent, colour = Age)) + geom_density()

[Figure: AmountSpent density conditioned on Age (Middle, Old, Young)]

ggplot(data=DirectMarketing, aes(AmountSpent, colour = Gender)) + geom_density()

[Figure: AmountSpent density conditioned on Gender (Female, Male)]

ggplot(data=DirectMarketing, aes(AmountSpent, colour = OwnHome)) + geom_density()

[Figure: AmountSpent density conditioned on OwnHome (Own, Rent)]

ggplot(data=DirectMarketing, aes(AmountSpent, colour = Married)) + geom_density()

[Figure: AmountSpent density conditioned on Married (Married, Single)]

ggplot(data=DirectMarketing, aes(AmountSpent, colour = Location)) + geom_density()

[Figure: AmountSpent density conditioned on Location (Close, Far)]

ggplot(data=DirectMarketing, aes(AmountSpent, colour = History)) + geom_density()

[Figure: AmountSpent density conditioned on History (High, Low, Medium, None)]

2.f Compare significantly different means

# Age
a1 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Age == "Young"])
a2 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Age == "Middle"])
a3 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Age == "Old"])
AgeMean <- data.frame("MeanOfAmountSpent" = c(a1, a2, a3))
rownames(AgeMean) <- c("Age-Young", "Age-Middle", "Age-Old")

# Gender
g1 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Gender == "Male"])
g2 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Gender == "Female"])
GenderMean <- data.frame("MeanOfAmountSpent" = c(g1, g2))
rownames(GenderMean) <- c("Gender-Male", "Gender-Female")

# OwnHome
o1 <- mean(DirectMarketing$AmountSpent[DirectMarketing$OwnHome == "Own"])
o2 <- mean(DirectMarketing$AmountSpent[DirectMarketing$OwnHome == "Rent"])
OwnHomeMean <- data.frame("MeanOfAmountSpent" = c(o1, o2))
rownames(OwnHomeMean) <- c("OwnHome-Own", "OwnHome-Rent")

# Married
m1 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Married == "Married"])
m2 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Married == "Single"])
MarriedMean <- data.frame("MeanOfAmountSpent" = c(m1, m2))
rownames(MarriedMean) <- c("Married-Married", "Married-Single")

# Location

l1 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Location == "Far"])
l2 <- mean(DirectMarketing$AmountSpent[DirectMarketing$Location == "Close"])
LocationMean <- data.frame("MeanOfAmountSpent" = c(l1, l2))
rownames(LocationMean) <- c("Location-Far", "Location-Close")

# History
h1 <- mean(DirectMarketing$AmountSpent[DirectMarketing$History == "None"])
h2 <- mean(DirectMarketing$AmountSpent[DirectMarketing$History == "Low"])
h3 <- mean(DirectMarketing$AmountSpent[DirectMarketing$History == "Medium"])
h4 <- mean(DirectMarketing$AmountSpent[DirectMarketing$History == "High"])
HistoryMean <- data.frame("MeanOfAmountSpent" = c(h1, h2, h3, h4))
rownames(HistoryMean) <- c("History-None", "History-Low", "History-Medium", "History-High")

# Overall
categorytable <- rbind(AgeMean, GenderMean, OwnHomeMean, MarriedMean, LocationMean, HistoryMean)
categorytable

                 MeanOfAmountSpent
Age-Young
Age-Middle
Age-Old
Gender-Male
Gender-Female
OwnHome-Own
OwnHome-Rent
Married-Married
Married-Single
Location-Far
Location-Close
History-None
History-Low
History-Medium
History-High

From both the conditional density plots and the table of means, within the Age category the Young group has a mean that differs significantly from the other two age groups. The same holds for OwnHome-Own vs. OwnHome-Rent and for Married-Married vs. Married-Single; the means of the History categories are all different from one another.

3. Regression modeling and analysis

3.a Standard linear regression

# Standard linear regression model with all predictors
slr <- lm(AmountSpent ~ ., data=DirectMarketing)
summary(slr)

Call:
lm(formula = AmountSpent ~ ., data = DirectMarketing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)                                         *
AgeOld
AgeYoung
GenderMale
OwnHomeRent
MarriedSingle
LocationFar                                < 2e-16 ***
Salary                                     < 2e-16 ***
Children                                   < 2e-16 ***
HistoryLow                                    e-08 ***
HistoryMedium                                 e-14 ***
HistoryNone
Catalogs                                   < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 987 degrees of freedom
Multiple R-squared:      ,  Adjusted R-squared:
F-statistic:       on 12 and 987 DF,  p-value: < 2.2e-16

# RMSE
y = DirectMarketing$AmountSpent
mean.mse = mean((rep(mean(y), length(y)) - y)^2)
model.mse = mean(residuals(slr)^2)
rmse = sqrt(model.mse)
rmse

[1]

For the standard linear regression model, the multiple R-squared, adjusted R-squared, and RMSE are as reported above.

3.b Different combinations of predictors in linear and non-linear models

Out-of-Sample RMSE for standard linear regression

n = length(DirectMarketing$AmountSpent)
error = numeric(n)
for (k in 1:n) {
  train1 = c(1:n)
  train = train1[train1 != k]   # pick elements that are different from k
  slr = lm(AmountSpent ~ ., data=DirectMarketing[train,])
  pred = predict(slr, newdata=DirectMarketing[-train,])
  obs = DirectMarketing$AmountSpent[-train]
  error[k] = obs - pred
}
OSrmse = sqrt(mean(error^2))
OSrmse   # root mean square error (out-of-sample)

[1]

Backward Stepwise Selection

library(MASS)
slr <- lm(AmountSpent ~ ., data=DirectMarketing)
stepAIC(slr, direction="backward")

Start:  AIC=
AmountSpent ~ Age + Gender + OwnHome + Married + Location + Salary +
    Children + History + Catalogs

            Df Sum of Sq    RSS    AIC
- Age
- OwnHome
- Married
<none>
- Gender
- Children
- History
- Location
- Catalogs
- Salary

Step:  AIC=
AmountSpent ~ Gender + OwnHome + Married + Location + Salary +
    Children + History + Catalogs

            Df Sum of Sq    RSS    AIC
- Married
- OwnHome
<none>
- Gender
- Children
- History
- Location
- Catalogs
- Salary

Step:  AIC=
AmountSpent ~ Gender + OwnHome + Location + Salary + Children +
    History + Catalogs

            Df Sum of Sq    RSS    AIC
- OwnHome
<none>
- Gender
- Children
- History
- Location
- Catalogs
- Salary

Step:  AIC=
AmountSpent ~ Gender + Location + Salary + Children + History +
    Catalogs

            Df Sum of Sq    RSS    AIC
<none>
- Gender
- Children
- History
- Location
- Catalogs
- Salary

Call:
lm(formula = AmountSpent ~ Gender + Location + Salary + Children +
    History + Catalogs, data = DirectMarketing)

Coefficients:
  (Intercept)     GenderMale    LocationFar         Salary       Children
   HistoryLow  HistoryMedium    HistoryNone       Catalogs

Use the new combination of Gender + Location + Salary + Children + History + Catalogs:

newslr <- lm(AmountSpent ~ Gender + Location + Salary + Children + History + Catalogs,
             data=DirectMarketing)
summary(newslr)

Call:
lm(formula = AmountSpent ~ Gender + Location + Salary + Children +
    History + Catalogs, data = DirectMarketing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)                                         **
GenderMale
LocationFar   4.360e                       < 2e-16 ***
Salary        1.892e                       < 2e-16 ***
Children                                   < 2e-16 ***
HistoryLow                                    e-08 ***
HistoryMedium                                 e-14 ***
HistoryNone
Catalogs      4.175e                       < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 991 degrees of freedom
Multiple R-squared:      ,  Adjusted R-squared:
F-statistic:       on 8 and 991 DF,  p-value: < 2.2e-16

Out-of-Sample RMSE for new linear regression

n = length(DirectMarketing$AmountSpent)
error = numeric(n)
for (k in 1:n) {
  train1 = c(1:n)
  train = train1[train1 != k]   # pick elements that are different from k
  slr = lm(AmountSpent ~ Gender + Location + Salary + Children + History + Catalogs,
           data=DirectMarketing[train,])
  pred = predict(slr, newdata=DirectMarketing[-train,])
  obs = DirectMarketing$AmountSpent[-train]
  error[k] = obs - pred
}
OSrmse = sqrt(mean(error^2))
OSrmse   # root mean square error (out-of-sample)

[1]

Nonlinear regression: 2nd degree on Salary, Children and Catalogs

nonlr <- lm(AmountSpent ~ Gender + Location + poly(Salary, degree=2) +
            poly(Children, degree=2) + History + poly(Catalogs, degree=2),
            data=DirectMarketing)
summary(nonlr)

Call:
lm(formula = AmountSpent ~ Gender + Location + poly(Salary, degree = 2) +
    poly(Children, degree = 2) + History + poly(Catalogs, degree = 2),
    data = DirectMarketing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                              < 2e-16 ***
GenderMale
LocationFar                                              < 2e-16 ***
poly(Salary, degree = 2)1                                < 2e-16 ***
poly(Salary, degree = 2)2
poly(Children, degree = 2)1                              < 2e-16 ***
poly(Children, degree = 2)2
HistoryLow                                                  e-07 ***
HistoryMedium                                               e-14 ***
HistoryNone
poly(Catalogs, degree = 2)1                              < 2e-16 ***
poly(Catalogs, degree = 2)2

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 988 degrees of freedom
Multiple R-squared:      ,  Adjusted R-squared:
F-statistic:       on 11 and 988 DF,  p-value: < 2.2e-16

Out-of-Sample RMSE

n = length(DirectMarketing$AmountSpent)
error = numeric(n)
for (k in 1:n) {
  train1 = c(1:n)
  train = train1[train1 != k]   # pick elements that are different from k
  poly1 = lm(AmountSpent ~ Gender + Location + poly(Salary, degree=2) +
             poly(Children, degree=2) + History + poly(Catalogs, degree=2),
             data=DirectMarketing[train,])
  pred = predict(poly1, newdata=DirectMarketing[-train,])
  obs = DirectMarketing$AmountSpent[-train]
  error[k] = obs - pred
}
nlrmse1 = sqrt(mean(error^2))
nlrmse1   # root mean square error (out-of-sample)

[1]

2nd degree on Salary and Children

nonlr1 <- lm(AmountSpent ~ Gender + Location + poly(Salary, degree=2) +
             poly(Children, degree=2) + History, data=DirectMarketing)
summary(nonlr1)

Call:
lm(formula = AmountSpent ~ Gender + Location + poly(Salary, degree = 2) +
    poly(Children, degree = 2) + History, data = DirectMarketing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                              < 2e-16 ***
GenderMale
LocationFar                                              < 2e-16 ***
poly(Salary, degree = 2)1                                < 2e-16 ***
poly(Salary, degree = 2)2
poly(Children, degree = 2)1                              < 2e-16 ***
poly(Children, degree = 2)2
HistoryLow                                                  e-13 ***
HistoryMedium                                            < 2e-16 ***
HistoryNone                                                      **

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 990 degrees of freedom
Multiple R-squared:      ,  Adjusted R-squared:
F-statistic:       on 9 and 990 DF,  p-value: < 2.2e-16

Out-of-Sample RMSE

n = length(DirectMarketing$AmountSpent)
error = numeric(n)
for (k in 1:n) {
  train1 = c(1:n)
  train = train1[train1 != k]   # pick elements that are different from k
  poly2 = lm(AmountSpent ~ Gender + Location + poly(Salary, degree=2) +
             poly(Children, degree=2) + History, data=DirectMarketing[train,])
  pred = predict(poly2, newdata=DirectMarketing[-train,])
  obs = DirectMarketing$AmountSpent[-train]
  error[k] = obs - pred
}
nlrmse1 = sqrt(mean(error^2))
nlrmse1   # root mean square error (out-of-sample)

[1]

3rd degree on Salary, Children and Catalogs

nonlr <- lm(AmountSpent ~ Gender + Location + poly(Salary, degree=3) +
            poly(Children, degree=3) + History + poly(Catalogs, degree=3),
            data=DirectMarketing)
summary(nonlr)

Call:
lm(formula = AmountSpent ~ Gender + Location + poly(Salary, degree = 3) +
    poly(Children, degree = 3) + History + poly(Catalogs, degree = 3),
    data = DirectMarketing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                              < 2e-16 ***
GenderMale
LocationFar                                              < 2e-16 ***
poly(Salary, degree = 3)1                                < 2e-16 ***
poly(Salary, degree = 3)2
poly(Salary, degree = 3)3
poly(Children, degree = 3)1                              < 2e-16 ***
poly(Children, degree = 3)2

poly(Children, degree = 3)3
HistoryLow                                                  e-07 ***
HistoryMedium                                               e-13 ***
HistoryNone
poly(Catalogs, degree = 3)1                              < 2e-16 ***
poly(Catalogs, degree = 3)2
poly(Catalogs, degree = 3)3
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 985 degrees of freedom
Multiple R-squared:      ,  Adjusted R-squared:
F-statistic:       on 14 and 985 DF,  p-value: < 2.2e-16

Out-of-Sample RMSE

n = length(DirectMarketing$AmountSpent)
error = numeric(n)
for (k in 1:n) {
  train1 = c(1:n)
  train = train1[train1 != k]   # pick elements that are different from k
  poly3 = lm(AmountSpent ~ Gender + Location + poly(Salary, degree=3) +
             poly(Children, degree=3) + History + poly(Catalogs, degree=3),
             data=DirectMarketing[train,])
  pred = predict(poly3, newdata=DirectMarketing[-train,])
  obs = DirectMarketing$AmountSpent[-train]
  error[k] = obs - pred
}
nlrmse1 = sqrt(mean(error^2))
nlrmse1   # root mean square error (out-of-sample)

[1]

3rd degree on Salary and Children

nonlr1 <- lm(AmountSpent ~ Gender + Location + poly(Salary, degree=3) +
             poly(Children, degree=3) + History, data=DirectMarketing)
summary(nonlr1)

Call:
lm(formula = AmountSpent ~ Gender + Location + poly(Salary, degree = 3) +
    poly(Children, degree = 3) + History, data = DirectMarketing)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                              < 2e-16 ***
GenderMale

LocationFar                                              < 2e-16 ***
poly(Salary, degree = 3)1                                < 2e-16 ***
poly(Salary, degree = 3)2
poly(Salary, degree = 3)3
poly(Children, degree = 3)1                              < 2e-16 ***
poly(Children, degree = 3)2
poly(Children, degree = 3)3
HistoryLow                                                  e-13 ***
HistoryMedium                                            < 2e-16 ***
HistoryNone                                                      **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 988 degrees of freedom
Multiple R-squared: 0.673,  Adjusted R-squared:
F-statistic:       on 11 and 988 DF,  p-value: < 2.2e-16

Out-of-Sample RMSE

n = length(DirectMarketing$AmountSpent)
error = numeric(n)
for (k in 1:n) {
  train1 = c(1:n)
  train = train1[train1 != k]   # pick elements that are different from k
  poly4 = lm(AmountSpent ~ Gender + Location + poly(Salary, degree=3) +
             poly(Children, degree=3) + History, data=DirectMarketing[train,])
  pred = predict(poly4, newdata=DirectMarketing[-train,])
  obs = DirectMarketing$AmountSpent[-train]
  error[k] = obs - pred
}
nlrmse1 = sqrt(mean(error^2))
nlrmse1   # root mean square error (out-of-sample)

[1]

Since higher polynomial degrees did not improve the model's performance compared to the standard linear regression, and too many polynomial terms can cause overfitting, I decided to stop here.

3.c Best model and the most important predictors

The original standard linear regression model performed best, with an RMSE of 482. When determining the important predictors, we look at the p-values: the smaller a predictor's p-value, the more important it is. In this case, the important predictors are Location, Salary, Children, History and Catalogs.
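The p-value criterion used in 3.c can be sketched in a few lines of R. Since the directmarketing.csv file is not bundled with this write-up, the example below is a minimal sketch that uses R's built-in mtcars data as a hypothetical stand-in; with the homework data, the same pattern would be applied to summary(slr)$coefficients.

```r
# Rank predictors by the p-values of their fitted coefficients.
# mtcars stands in for the DirectMarketing data here.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
pvals <- coefs[-1, "Pr(>|t|)"]       # drop the intercept row
sort(pvals)                          # smallest p-value first = most important predictor
```

The same idea underlies the significance stars in the summary() output above: each star level is just a threshold on Pr(>|t|).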


Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Interactions in Logistic Regression

Interactions in Logistic Regression Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/

More information

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison. Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Multiple Regression Part I STAT315, 19-20/3/2014

Multiple Regression Part I STAT315, 19-20/3/2014 Multiple Regression Part I STAT315, 19-20/3/2014 Regression problem Predictors/independent variables/features Or: Error which can never be eliminated. Our task is to estimate the regression function f.

More information

Chapter 8 Conclusion

Chapter 8 Conclusion 1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Multiple Linear Regression. Chapter 12

Multiple Linear Regression. Chapter 12 13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

A course in statistical modelling. session 09: Modelling count variables

A course in statistical modelling. session 09: Modelling count variables A Course in Statistical Modelling SEED PGR methodology training December 08, 2015: 12 2pm session 09: Modelling count variables Graeme.Hutcheson@manchester.ac.uk blackboard: RSCH80000 SEED PGR Research

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

Stat 401B Exam 3 Fall 2016 (Corrected Version)

Stat 401B Exam 3 Fall 2016 (Corrected Version) Stat 401B Exam 3 Fall 2016 (Corrected Version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied

More information

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach

More information

Regression Methods for Survey Data

Regression Methods for Survey Data Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear

More information

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6 R in Linguistic Analysis Wassink 2012 University of Washington Week 6 Overview R for phoneticians and lab phonologists Johnson 3 Reading Qs Equivalence of means (t-tests) Multiple Regression Principal

More information

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website.

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website. SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association

More information

Lecture 6: Linear Regression

Lecture 6: Linear Regression Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i

More information

Analysis of Covariance: Comparing Regression Lines

Analysis of Covariance: Comparing Regression Lines Chapter 7 nalysis of Covariance: Comparing Regression ines Suppose that you are interested in comparing the typical lifetime (hours) of two tool types ( and ). simple analysis of the data given below would

More information

Stat 401B Exam 2 Fall 2016

Stat 401B Exam 2 Fall 2016 Stat 40B Eam Fall 06 I have neither given nor received unauthorized assistance on this eam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

MATH 423/533 - ASSIGNMENT 4 SOLUTIONS

MATH 423/533 - ASSIGNMENT 4 SOLUTIONS MATH 423/533 - ASSIGNMENT 4 SOLUTIONS INTRODUCTION This assignment concerns the use of factor predictors in linear regression modelling, and focusses on models with two factors X 1 and X 2 with M 1 and

More information

We d like to know the equation of the line shown (the so called best fit or regression line).

We d like to know the equation of the line shown (the so called best fit or regression line). Linear Regression in R. Example. Let s create a data frame. > exam1 = c(100,90,90,85,80,75,60) > exam2 = c(95,100,90,80,95,60,40) > students = c("asuka", "Rei", "Shinji", "Mari", "Hikari", "Toji", "Kensuke")

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Data Analysis Using R ASC & OIR

Data Analysis Using R ASC & OIR Data Analysis Using R ASC & OIR Overview } What is Statistics and the process of study design } Correlation } Simple Linear Regression } Multiple Linear Regression 2 What is Statistics? Statistics is a

More information

STK 2100 Oblig 1. Zhou Siyu. February 15, 2017

STK 2100 Oblig 1. Zhou Siyu. February 15, 2017 STK 200 Oblig Zhou Siyu February 5, 207 Question a) Make a scatter box plot for the data set. Answer:Here is the code I used to plot the scatter box in R. library ( MASS ) 2 pairs ( Boston ) Figure : Scatter

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Stat 5303 (Oehlert): Analysis of CR Designs; January

Stat 5303 (Oehlert): Analysis of CR Designs; January Stat 5303 (Oehlert): Analysis of CR Designs; January 2016 1 > resin

More information

Lecture 19: Inference for SLR & Transformations

Lecture 19: Inference for SLR & Transformations Lecture 19: Inference for SLR & Transformations Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 Announcements Announcements HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner

More information

The Statistical Sleuth in R: Chapter 5

The Statistical Sleuth in R: Chapter 5 The Statistical Sleuth in R: Chapter 5 Linda Loi Kate Aloisio Ruobing Zhang Nicholas J. Horton January 21, 2013 Contents 1 Introduction 1 2 Diet and lifespan 2 2.1 Summary statistics and graphical display........................

More information

Chapter 8: Correlation & Regression

Chapter 8: Correlation & Regression Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Stat 8053, Fall 2013: Multinomial Logistic Models

Stat 8053, Fall 2013: Multinomial Logistic Models Stat 8053, Fall 2013: Multinomial Logistic Models Here is the example on page 269 of Agresti on food preference of alligators: s is size class, g is sex of the alligator, l is name of the lake, and f is

More information

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO

More information

Linear Regression Measurement & Evaluation of HCC Systems

Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Today s goal: Evaluate the effect of multiple variables on an outcome variable (regression) Outline: - Basic theory - Simple

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam Booklet of Code and Output for STAD29/STA 1007 Midterm Exam List of Figures in this document by page: List of Figures 1 NBA attendance data........................ 2 2 Regression model for NBA attendances...............

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

Regression in R I. Part I : Simple Linear Regression

Regression in R I. Part I : Simple Linear Regression UCLA Department of Statistics Statistical Consulting Center Regression in R Part I : Simple Linear Regression Denise Ferrari & Tiffany Head denise@stat.ucla.edu tiffany@stat.ucla.edu Feb 10, 2010 Objective

More information

Stat 328 Final Exam (Regression) Summer 2002 Professor Vardeman

Stat 328 Final Exam (Regression) Summer 2002 Professor Vardeman Stat Final Exam (Regression) Summer Professor Vardeman This exam concerns the analysis of 99 salary data for n = offensive backs in the NFL (This is a part of the larger data set that serves as the basis

More information

STA220H1F Term Test Oct 26, Last Name: First Name: Student #: TA s Name: or Tutorial Room:

STA220H1F Term Test Oct 26, Last Name: First Name: Student #: TA s Name: or Tutorial Room: STA0HF Term Test Oct 6, 005 Last Name: First Name: Student #: TA s Name: or Tutorial Room: Time allowed: hour and 45 minutes. Aids: one sided handwritten aid sheet + non-programmable calculator Statistical

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

Using R in 200D Luke Sonnet

Using R in 200D Luke Sonnet Using R in 200D Luke Sonnet Contents Working with data frames 1 Working with variables........................................... 1 Analyzing data............................................... 3 Random

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

Outline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses

Outline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses UCLA Department of Statistics Statistical Consulting Center Introduction to Regression in R Part II: Multivariate Linear Regression Denise Ferrari denise@stat.ucla.edu Outline 1 Preliminaries 2 Introduction

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

L21: Chapter 12: Linear regression

L21: Chapter 12: Linear regression L21: Chapter 12: Linear regression Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 37 So far... 12.1 Introduction One sample

More information

CASE STUDY: CYCLONES

CASE STUDY: CYCLONES CASE STUDY: CYCLONES DEFINITION Cyclones are defined as ``an atmospheric system in which the barometric pressure diminishes progressively to a minimum at the centre and toward which the winds blow spirally

More information

Inference with Heteroskedasticity

Inference with Heteroskedasticity Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS SHOOL OF MATHEMATIS AND STATISTIS Linear Models Autumn Semester 2015 16 2 hours Marks will be awarded for your best three answers. RESTRITED OPEN BOOK EXAMINATION andidates may bring to the examination

More information