CLEAR EVIDENCE OF VOTING ANOMALIES IN BLADEN AND ROBESON COUNTIES RICHARD L. SMITH FEBRUARY 11, 2019


This is a revision of an earlier commentary submitted on January 18.

I am a professor of statistics at the University of North Carolina, Chapel Hill. As an exercise for students in one of my courses, I downloaded and asked the students to analyze data on absentee ballots in this election. The results showed clear evidence of missing absentee ballots in Bladen and Robeson Counties, with an excess, compared with normal random variation, of at least 1,500 absentee ballots having not been returned in these two counties. The following commentary is based on my own analysis, supplemented by those of the students.

Major Conclusion

This commentary addresses the allegation of absentee ballots being tampered with in Bladen and Robeson Counties. Without making any assumptions about what happened or who might have been responsible, I show that there was a large number of misplaced votes among the absentee ballots in these two counties: absentee ballots that were requested but never returned, over and above what could have been expected from normal statistical variation. Based on the other 98 counties of North Carolina, I estimate that at least 1,500 votes were misplaced. If those votes had indeed been intended for the Democrat candidate, they would overturn the current majority of 905 votes in favor of the Republican candidate.

Methods

The website of the North Carolina Board of Elections (NCBOE) includes a data file consisting of an anonymized record of all 2,111,797 absentee ballots requested for the November 6 election, together with the ultimate disposition of those ballots (whether they were accepted as valid votes or met one of several possible alternate outcomes). From this data file I was able to calculate the percentage not returned, which I will abbreviate as PNR, for each of the 100 counties of North Carolina. For each county, PNR is defined as

    PNR = 100 x (number of absentee ballots requested but not returned) / (number of absentee ballots requested).

A density plot of PNR over all 100 counties (the code that produces it is given in the Appendix) shows that in 98 counties (the left side of the picture) the PNR is between 0 and 4%, and the distribution closely follows a classic normal or bell-shaped curve. Two counties, however, stand out as being very far from the other 98: they are Bladen and Robeson Counties, with PNRs of 11.31% and 11.0% respectively.

The variability of PNR over counties may be explained in various ways. Some of it can be explained by demographic factors such as race and educational level. The majority (probably at least 75% of the total variability) is just random. However, we can combine both the demographic and random components of variability to come up with predictions of the PNR for Bladen and Robeson Counties, based on the data in the other 98 counties. These predictions show what could be expected in the absence of vote tampering or any other mechanism that might have explained why these two counties were anomalous.

It is in the nature of any statistical analysis that predictions cannot be certain. We can allow for uncertainty by quoting a prediction interval, which is a range of values that includes the desired prediction with a specified probability. In this exercise, I have used 99% prediction intervals.
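The PNR calculation described above can be sketched in a few lines. This is a minimal Python illustration, not the author's code (the actual analysis, in R, is in the Appendix); the county labels and the "NOT RETURNED" disposition string are hypothetical stand-ins for the codes in the NCBOE file.

```python
from collections import defaultdict

def pnr_by_county(ballots):
    """Percentage of requested absentee ballots not returned (PNR), per county.

    `ballots` is a list of (county, disposition) pairs -- a simplified
    stand-in for the NCBOE absentee file, which has many more fields
    and disposition codes.
    """
    requested = defaultdict(int)
    not_returned = defaultdict(int)
    for county, disposition in ballots:
        requested[county] += 1
        if disposition == "NOT RETURNED":
            not_returned[county] += 1
    # PNR = 100 x (not returned) / (requested), per county
    return {c: 100.0 * not_returned[c] / requested[c] for c in requested}

# Tiny synthetic example: county A has 4 requests, 1 not returned -> PNR 25%
ballots = [("A", "ACCEPTED"), ("A", "NOT RETURNED"),
           ("A", "ACCEPTED"), ("A", "SPOILED"),
           ("B", "ACCEPTED"), ("B", "ACCEPTED")]
print(pnr_by_county(ballots))  # {'A': 25.0, 'B': 0.0}
```

In the real analysis the denominator is the full count of requested ballots per county (8,110 for Bladen, 16,069 for Robeson), exactly as in this sketch.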
This may be interpreted as meaning that, taking account of both demographic and random variability, the quoted interval will contain the quantity being predicted with 99% probability, assuming that the same statistical model applies to Bladen and Robeson as to the other 98 counties.

The demographic variables selected were the mean travel time to work, the percentage of high school graduates, the percentage of Blacks in a given county, and the percentage of Hispanics in a given county. Based on these demographic predictors, the predicted PNR for Bladen County is 1.6%, with a 99% prediction interval from 0.59% to 4.3%. The actual PNR in Bladen was 11.31%, an excess of 7.01% over the upper end of the prediction interval. The number of absentee ballots requested in Bladen was 8,110, so I estimate there were at least 569 (7.01% of 8,110) misplaced votes in Bladen County. A corresponding calculation for Robeson County (with 16,069 requested absentee ballots) gives at least 1,197 misplaced votes there. Combining the two counties, there appear to have been at least 1,766 misplaced votes, well in excess of the 905 votes by which Mark Harris led the actual count.

There are numerous possible variants of this analysis, based on different demographic variables or different ways of handling the random error. The students in my class submitted numerous alternative analyses. However, in every analysis that I was able to verify, the number of misplaced votes was in excess of 1,500.

Details

The file on Absentee Data was downloaded from the website of the North Carolina Board of Elections. Data on a number of demographic variables, including population size (as of 2017), the proportion of individuals in each county who have graduated at either high-school or college level, the proportions of Hispanics and non-Hispanic Blacks in each county, and the median income level for each county, were downloaded from various online sources. This information was combined into a single file, which lists the PNR for each county as well as the following demographic variables: population (as of 2017), percentage rural, median age, mean travel time to work, percentage of high school graduates, percentage of college graduates, percentage of Blacks and percentage of Hispanics. The file also lists the total number of absentee ballot requests in each county for the November 6, 2018, election.

The idea of a regression analysis is to take a variable of interest (in this case, PNR) and express it in terms of the other variables that could affect it. It may be necessary to make various transformations, such as taking logarithms. There is no unique best way to do this, but using two well-established methods of variable selection (AIC and backward selection), the following model was identified:

    log(PNR) = b0 + b1 x Travel + b2 x HSGrad + b3 x Black + b4 x Hisp + error.

In words, the logarithm of PNR in a given county is expressed as a combination of the mean travel time to work (Travel), the percentage of high school graduates (HSGrad), the percentage of Blacks (Black) and the percentage of Hispanics (Hisp), plus a random error that accounts for unexplained variability among the counties. The model just described was fitted to the 98 counties that do not include Bladen and Robeson Counties, and then used to predict the PNR for Bladen and Robeson.
To be precise, I computed a 99% prediction interval for each of Bladen and Robeson Counties, which means a range of values (separately for each county) that will, if the assumptions of the statistical analysis are correct, include the true value with probability 0.99. Here, saying "the assumptions of the statistical analysis are correct" allows for normal random variation between counties, but not for any unusual circumstances such as tampering with the ballots. The results are:

For Bladen County, the predicted PNR is 1.6%, with 99%-probability prediction limits of 0.59% and 4.3%;

For Robeson County, the predicted PNR is 1.31%, with 99%-probability prediction limits of 0.48% and 3.55%.
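The mechanics behind such an interval can be illustrated in the simplest case, a single sample with no covariates (the author's actual intervals come from the fitted regression via R's predict.lm; this Python sketch is only an illustration). It uses a normal quantile as an approximation to the exact t quantile, which slightly understates the interval width, noticeably so for small samples; the extra 1/n term widens the interval to account for estimating the mean.

```python
import math
from statistics import NormalDist

def prediction_interval(sample, level=0.99):
    """Approximate prediction interval for ONE future observation,
    assuming the sample is i.i.d. normal.

    Unlike a confidence interval for the mean, a prediction interval
    must cover the randomness of the new observation itself, hence the
    sqrt(1 + 1/n) factor rather than sqrt(1/n).
    """
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    q = NormalDist().inv_cdf(0.5 + level / 2)   # ~2.576 for level = 0.99
    half = q * s * math.sqrt(1 + 1 / n)
    return mean - half, mean + half

# Illustrative values on the log scale, as in the analysis
sample = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6, 0.2, 0.35]
lo, hi = prediction_interval(sample)
print(lo, hi)
```

With a regression model, the same idea applies except that the predicted mean and its standard error come from the fitted equation evaluated at the county's covariates.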

Alternatively, one can ignore the demographic variables and simply make the predictions assuming random variability among counties. In this case, the predicted PNR for both Bladen and Robeson Counties (based on the other 98) is 1.49%, with 99%-probability prediction limits of 0.48% and 4.57%. The observed PNRs for Bladen and Robeson Counties are 11.31% and 11.0% respectively.

We can now translate these assessments into actual counts of potentially misplaced votes. Under the first analysis above, for Bladen County, the upper limit of the prediction interval is 4.3%. The observed PNR was 11.31%, and the number of absentee ballots requested was 8,110. For Robeson County, the corresponding numbers are 3.55%, 11.0% and 16,069. Therefore, our estimated lower limit on the number of misplaced votes is

    8,110 x (11.31 - 4.3)/100 + 16,069 x (11.0 - 3.55)/100 = 1,766 (to the nearest whole number).

Alternatively, based on the second analysis above, the corresponding calculation is

    8,110 x (11.31 - 4.57)/100 + 16,069 x (11.0 - 4.57)/100 = 1,580.

Students in my class submitted a number of alternative versions of the analysis. Some of them framed the analysis to predict PNR directly, rather than the logarithm of PNR as in the above analysis. In addition, the students used different methods to decide which demographic variables to include, resulting in different model equations. In every analysis that I was able to verify, however, the estimated number of misplaced votes was at least 1,500.

Recall that, in the initial count of votes, the Republican Mark Harris was leading by 905 votes. The statistical analysis given here, while supporting the conclusion that there were misplaced absentee ballots, does not provide any evidence concerning which candidate would have received those votes. However, if allegations of ballot tampering are correct, there is at least a plausible argument that the misplaced votes were all intended for the Democrat candidate. I am available to answer questions as needed.
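The two count calculations above can be checked with a few lines. This is a Python transcription of the text's arithmetic, not part of the original analysis; the numbers are taken directly from the commentary.

```python
# Lower bounds on misplaced votes, reproducing the arithmetic in the text.
requested = {"Bladen": 8110, "Robeson": 16069}    # absentee ballots requested
observed_pnr = {"Bladen": 11.31, "Robeson": 11.0}  # observed PNR, in percent

def misplaced(upper_limits):
    """Sum over counties of requested x (observed PNR - upper 99% limit)%."""
    return sum(requested[c] * (observed_pnr[c] - upper_limits[c]) / 100
               for c in requested)

# First analysis (demographic model): upper limits 4.3% and 3.55%
print(round(misplaced({"Bladen": 4.3, "Robeson": 3.55})))   # 1766
# Second analysis (no covariates): common upper limit 4.57%
print(round(misplaced({"Bladen": 4.57, "Robeson": 4.57})))  # 1580
```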

Appendix: Statistical Code in R

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)

# R code for NC Voter Analysis
# Richard L. Smith, February 2019

# Load data
Y=read.csv('C:/Users/rsmith/jan16/UNC/STOR556/Data/ProportionNotReturned.csv',header=T)

# Density plot of PNR over all 100 counties
par(cex=1.5)
plot(density(100*Y$PNR),main='Percentage of Absentee Ballots Not Returned')
rug(100*Y$PNR)
lines(density(100*Y$PNR),lwd=3)
text(11,0.4,'Bladen')
lines(c(11.3,11),c(0,0.38))
text(9.2,0.3,'Robeson')
lines(c(11,9.2),c(0,0.28))
text(2,0.05,'ALL OTHER COUNTIES',cex=0.7)

# Regression analysis using log(100*PNR) as response
# wts excludes Bladen and Robeson (the two counties with PNR >= 0.1)
# Variable selection via AIC
y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$MedInc+Y$Black+Y$Hisp,
       weights=wts)
lm2=step(lm1)

[step() output: the AIC tables did not survive transcription. Starting from the
full model, the procedure successively dropped Y$MedInc, Y$Collgrad, Y$MedAge,
Y$Rural and Y$Pop, ending with y1 ~ Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp.]

lm1=lm2

# Write final model and compute prediction limits
summary(lm1)

lm(formula = y1 ~ Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp, weights = wts)
[coefficient estimates lost in transcription; surviving significance codes:
(Intercept) **, Y$Hsgrad **, Y$Black *** (p ~ 1e-06), Y$Hisp *.
Residual standard error on 93 degrees of freedom.
F-statistic on 4 and 93 DF, p-value: 7.393e-06]

summary(lm1)$adj.r.squared
[value lost in transcription]

pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99,  :
  predictions on current data refer to _future_ responses

# Rows 9 and 78 are Bladen and Robeson; AbsBal is the number of absentee
# ballots requested. Column 3 of the fit is the upper 99% prediction limit,
# back-transformed from the log scale by exp().
n1=(Y$PNR[9]-exp(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-exp(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(exp(pr1$fit[c(9,78),]))
print(c(n1,n2,n1+n2))
print(Y$AbsBal[c(9,78)])
[numerical output lost in transcription]

# Alternative solution by backward selection: refit, dropping the least
# significant variable at each stage
y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$MedInc+Y$Black+Y$Hisp,
       weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad +
    Y$Collgrad + Y$MedInc + Y$Black + Y$Hisp, weights = wts)
[estimates lost in transcription; (Intercept) *, Y$Hsgrad *, Y$Black ***.
Residual standard error on 88 degrees of freedom; F-statistic on 9 and 88 DF]

# Drop Y$MedInc
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$Black+Y$Hisp,
       weights=wts)
summary(lm1)
[(Intercept) *, Y$Hsgrad *, Y$Black ***; residual standard error on 89
degrees of freedom; F-statistic on 8 and 89 DF]

# Drop Y$Collgrad
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)
[Y$Hsgrad *, Y$Black ***; residual standard error on 90 degrees of freedom;
F-statistic on 7 and 90 DF, p-value: 5.246e-05]

# Drop Y$MedAge
lm1=lm(y1~Y$Pop+Y$Rural+Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)
[(Intercept) *, Y$Hsgrad *, Y$Black ***; residual standard error on 91
degrees of freedom; F-statistic on 6 and 91 DF, p-value: 2.737e-05]

# Drop Y$Rural
lm1=lm(y1~Y$Pop+Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)
[(Intercept) *, Y$Hsgrad *, Y$Black ***; residual standard error on 92
degrees of freedom; F-statistic on 5 and 92 DF, p-value: 1.308e-05]

# Drop Y$Pop: same final model as the AIC selection
lm1=lm(y1~Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)
[(Intercept) **, Y$Hsgrad **, Y$Black ***, Y$Hisp *; residual standard error
on 93 degrees of freedom; F-statistic on 4 and 93 DF, p-value: 7.393e-06]

summary(lm1)$adj.r.squared
pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99,  :
  predictions on current data refer to _future_ responses
n1=(Y$PNR[9]-exp(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-exp(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(exp(pr1$fit[c(9,78),]))
print(c(n1,n2,n1+n2))
[numerical output lost in transcription]

# Alternative with no demographic covariates: intercept-only model
y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~1,weights=wts)
summary(lm1)

lm(formula = y1 ~ 1, weights = wts)
[(Intercept) ***; residual standard error on 97 degrees of freedom]

summary(lm1)$adj.r.squared
[1] 0
pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99,  :
  predictions on current data refer to _future_ responses
n1=(Y$PNR[9]-exp(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-exp(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(exp(pr1$fit[c(9,78),]))
print(c(n1,n2,n1+n2))
[numerical output lost in transcription]

# Same exercise using PNR directly (untransformed) with AIC for model selection
y1=100*Y$PNR
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$MedInc+Y$Black+Y$Hisp,
       weights=wts)
lm2=step(lm1)

[step() output: AIC tables lost in transcription. The procedure successively
dropped Y$MedInc, Y$Travel, Y$MedAge, Y$Collgrad, Y$Hisp and Y$Rural, ending
with y1 ~ Y$Pop + Y$Hsgrad + Y$Black.]

lm1=lm2

# Write final model and compute prediction limits
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Hsgrad + Y$Black, weights = wts)
[estimates lost in transcription; Y$Black *** (p ~ 1e-05). Residual standard
error on 94 degrees of freedom; F-statistic on 3 and 94 DF, p-value: 2.498e-05]

summary(lm1)$adj.r.squared
### The R-squared is not as large as for the log PNR model

pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99,  :
  predictions on current data refer to _future_ responses
# No exp() back-transform here, since the response is PNR itself
n1=(Y$PNR[9]-(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(pr1$fit[c(9,78),])
print(c(n1,n2,n1+n2))
### final count 1888, higher than the counts quoted for the earlier models


More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: "Statistics Tables" by H.R. Neave PAS 371 SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester 2008 9 Linear

More information

CRP 272 Introduction To Regression Analysis

CRP 272 Introduction To Regression Analysis CRP 272 Introduction To Regression Analysis 30 Relationships Among Two Variables: Interpretations One variable is used to explain another variable X Variable Independent Variable Explaining Variable Exogenous

More information

Math: Question 1 A. 4 B. 5 C. 6 D. 7

Math: Question 1 A. 4 B. 5 C. 6 D. 7 Math: Question 1 Abigail can read 200 words in one minute. If she were to read at this rate for 30 minutes each day, how many days would Abigail take to read 30,000 words of a book? A. 4 B. 5 C. 6 D. 7

More information

Example: 1982 State SAT Scores (First year state by state data available)

Example: 1982 State SAT Scores (First year state by state data available) Lecture 11 Review Section 3.5 from last Monday (on board) Overview of today s example (on board) Section 3.6, Continued: Nested F tests, review on board first Section 3.4: Interaction for quantitative

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004)

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Chapter 5 Exercises 1 Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Preliminaries > library(daag) Exercise 2 The final three sentences have been reworded For each of the data

More information

Math 10 - Compilation of Sample Exam Questions + Answers

Math 10 - Compilation of Sample Exam Questions + Answers Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

Introduction and Background to Multilevel Analysis

Introduction and Background to Multilevel Analysis Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and

More information

Variables and Variable De nitions

Variables and Variable De nitions APPENDIX A Variables and Variable De nitions All demographic county-level variables have been drawn directly from the 1970, 1980, and 1990 U.S. Censuses of Population, published by the U.S. Department

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester

More information

Explanatory Variables Must be Linear Independent...

Explanatory Variables Must be Linear Independent... Explanatory Variables Must be Linear Independent... Recall the multiple linear regression model Y j = β 0 + β 1 X 1j + β 2 X 2j + + β p X pj + ε j, i = 1,, n. is a shorthand for n linear relationships

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Community Health Needs Assessment through Spatial Regression Modeling

Community Health Needs Assessment through Spatial Regression Modeling Community Health Needs Assessment through Spatial Regression Modeling Glen D. Johnson, PhD CUNY School of Public Health glen.johnson@lehman.cuny.edu Objectives: Assess community needs with respect to particular

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Chapter 1. The Mathematics of Voting

Chapter 1. The Mathematics of Voting Introduction to Contemporary Mathematics Math 112 1.1. Preference Ballots and Preference Schedules Example (The Math Club Election) The math club is electing a new president. The candidates are Alisha

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

L21: Chapter 12: Linear regression

L21: Chapter 12: Linear regression L21: Chapter 12: Linear regression Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 37 So far... 12.1 Introduction One sample

More information

Lecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015

Lecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015 Lecture 8: Fitting Data Statistical Computing, 36-350 Wednesday October 7, 2015 In previous episodes Loading and saving data sets in R format Loading and saving data sets in other structured formats Intro

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Chapter 8 Conclusion

Chapter 8 Conclusion 1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect

More information

Math 141. Lecture 27: More Issues in Model Selection and Interpretation. Albyn Jones 1. 1 Library 304

Math 141. Lecture 27: More Issues in Model Selection and Interpretation. Albyn Jones 1. 1 Library 304 Math 141 Lecture 27: More Issues in Model Selection and Interpretation Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Confounding Confounding: a term from experimental

More information

Outline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses

Outline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses UCLA Department of Statistics Statistical Consulting Center Introduction to Regression in R Part II: Multivariate Linear Regression Denise Ferrari denise@stat.ucla.edu Outline 1 Preliminaries 2 Introduction

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 Work all problems. 60 points needed to pass at the Masters level, 75 to pass at the PhD

More information

QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018

QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018 Page 1 of 4 QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018 ECONOMICS 250 Introduction to Statistics Instructor: Gregor Smith Instructions: The exam

More information

Supplemental Appendix to "Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs"

Supplemental Appendix to Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs Supplemental Appendix to "Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs" Yingying Dong University of California Irvine February 2018 Abstract This document provides

More information

Stat 401B Exam 3 Fall 2016 (Corrected Version)

Stat 401B Exam 3 Fall 2016 (Corrected Version) Stat 401B Exam 3 Fall 2016 (Corrected Version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied

More information

Consider fitting a model using ordinary least squares (OLS) regression:

Consider fitting a model using ordinary least squares (OLS) regression: Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

More information

Econometrics Review questions for exam

Econometrics Review questions for exam Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =

More information

Logistic Regression 21/05

Logistic Regression 21/05 Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression

More information

Linear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables.

Linear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables. Lab 13. Linear Regression www.nmt.edu/~olegm/382labs/lab13r.pdf Note: the things you will read or type on the computer are in the Typewriter Font. All the files mentioned can be found at www.nmt.edu/~olegm/382labs/

More information

Stat 401B Final Exam Fall 2015

Stat 401B Final Exam Fall 2015 Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison. Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose

More information

In the previous chapter, we learned how to use the method of least-squares

In the previous chapter, we learned how to use the method of least-squares 03-Kahane-45364.qxd 11/9/2007 4:40 PM Page 37 3 Model Performance and Evaluation In the previous chapter, we learned how to use the method of least-squares to find a line that best fits a scatter of points.

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS SHOOL OF MATHEMATIS AND STATISTIS Linear Models Autumn Semester 2015 16 2 hours Marks will be awarded for your best three answers. RESTRITED OPEN BOOK EXAMINATION andidates may bring to the examination

More information

Foundations of Correlation and Regression

Foundations of Correlation and Regression BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations

More information

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and

More information

Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R

Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Gilles Lamothe February 21, 2017 Contents 1 Anova with one factor 2 1.1 The data.......................................... 2 1.2 A visual

More information

Chapter 5 Exercises 1

Chapter 5 Exercises 1 Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine

More information