CLEAR EVIDENCE OF VOTING ANOMALIES IN BLADEN AND ROBESON COUNTIES

Richard L. Smith
February 11, 2019
This is a revision of an earlier commentary submitted on January 18.

I am a professor of statistics at the University of North Carolina, Chapel Hill. As an exercise for students in one of my courses, I downloaded and asked the students to analyze data on absentee ballots in this election. The results showed clear evidence of missing absentee ballots in Bladen and Robeson Counties, with an excess, compared with normal random variation, of at least 1,500 absentee ballots having not been returned in these two counties. The following commentary is based on my own analysis, supplemented by those of the students.

Major Conclusion

This commentary addresses the allegation of absentee ballots being tampered with in Bladen and Robeson Counties. Without making any assumptions about what happened or who might have been responsible, I show that there was a large number of misplaced votes among the absentee ballots in these two counties: absentee ballots that were requested but never returned, over and above what could have been expected from normal statistical variation. Based on the other 98 counties of North Carolina, I estimate that at least 1,500 votes were misplaced. If those votes had indeed been intended for the Democratic candidate, they would overturn the current majority of 905 votes in favor of the Republican candidate.
Methods

The website of the North Carolina Board of Elections (NCBOE) includes a data file consisting of an anonymized record of all 2,111,797 absentee ballots requested for the November 6 election, together with the ultimate disposition of those ballots (whether they were accepted as valid votes or fell into one of several possible alternate outcomes). From this data file I was able to calculate the percentage not returned, which I will abbreviate as PNR, for each of the 100 counties of North Carolina. For each county, PNR is defined as

    PNR = 100 x (number of absentee ballots requested but not returned) / (number of absentee ballots requested).

The density plot of PNR (produced by the code in the Appendix) shows the distribution of PNR over all 100 counties. In 98 counties (the left side of the picture), the PNR is between 0 and 4%, and the distribution closely follows a classic normal or bell-shaped curve. Two counties, however, stand out as being very far from the other 98: Bladen and Robeson Counties, with PNRs of 11.31% and 11.0% respectively.

The variability of PNR over counties may be explained in various ways. Some of it can be explained by demographic factors such as race and educational level. The majority (probably at least 75% of the total variability) is just random. However, we can combine both the demographic and random components of variability to come up with predictions of the PNR for Bladen and Robeson Counties, based on the data in the other 98 counties. These predictions show what could be expected in the absence of vote tampering or any other mechanism that might explain why these two counties were anomalous.

It is in the nature of any statistical analysis that predictions cannot be certain. We can allow for uncertainty by quoting a prediction interval, which is a range of values that includes the desired prediction with a specified probability. In this exercise, I have used 99% prediction intervals.
This may be interpreted as meaning that, taking account of both demographic and random variability, the quoted interval will contain the quantity being predicted with 99% probability, assuming that the same statistical model applies to Bladen and Robeson as to the other 98 counties.

The demographic variables selected were the mean travel time to work, the percentage of high school graduates, the percentage of Blacks in a given county, and the percentage of Hispanics in a given county. Based on these demographic predictors, the predicted PNR for Bladen County is 1.6%, with a 99% prediction interval from 0.59% to 4.3%. The actual PNR in Bladen was 11.31%, an excess of 7.01% over the upper end of the prediction interval. The number of absentee ballots requested in Bladen was 8,110, so I estimate there were at least 569 (7.01% of 8,110) misplaced votes in Bladen County. A corresponding calculation for Robeson County (with 16,069 requested absentee ballots) gives at least 1,197 misplaced votes there. Combining the two counties, there appear to have been at least 1,766 misplaced votes, well in excess of the 905 votes by which Mark Harris led the actual count.

There are numerous possible variants of this analysis based on different demographic variables or different ways of handling the random error. The students in my class submitted numerous alternative analyses. However, in every analysis that I was able to verify, the number of misplaced votes was in excess of 1,500.
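The PNR computation described in this section can be sketched in R. The following is a toy illustration on an invented data frame; the real analysis runs over the NCBOE absentee file, whose layout and column names differ.

```r
# Toy sketch of the PNR calculation; the data frame and its column
# names are invented (the real input is the NCBOE absentee file).
ballots <- data.frame(
  county   = c("A", "A", "A", "A", "B", "B"),
  returned = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)
requested <- table(ballots$county)                  # ballots requested, per county
not_ret <- table(factor(ballots$county[!ballots$returned],
                        levels = names(requested))) # requested but never returned
pnr <- 100 * as.numeric(not_ret) / as.numeric(requested)
# County A: 1 of 4 not returned, PNR = 25; County B: 0 of 2, PNR = 0
```

On the real file, the same two tables would be formed from the county column and the ballot-disposition column.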
Details

The file on Absentee Data was downloaded from the North Carolina Board of Elections website. Data on a number of demographic variables, including population size (as of 2017), the proportion of individuals in each county who have graduated at either the high-school or college level, the proportions of Hispanics and non-Hispanic Blacks in each county, and the median income level for each county, were downloaded from various online sources. This information was combined into a single downloadable file. The file lists the PNR for each county as well as the following demographic variables: population (as of 2017), percentage rural, median age, mean travel time to work, percentage of high school graduates, percentage of college graduates, percentage of Blacks and percentage of Hispanics. The file also lists the total number of absentee ballot requests in each county for the November 6, 2018, election.

The idea of a regression analysis is to take a variable of interest (in this case, PNR) and express it in terms of the other variables that could affect it. It may be necessary to make various transformations, such as taking logarithms. There is no unique best way to do this, but using two well-established methods of variable selection (AIC and backward selection), the following model was identified:

    log(PNR) = b0 + b1 x Travel + b2 x HSGrad + b3 x Black + b4 x Hisp + error.

In words, the logarithm of PNR in a given county is expressed as a combination of the mean travel time to work (Travel), the percentage of high school graduates (HSGrad), the percentage of Blacks (Black) and the percentage of Hispanics (Hisp), plus a random error that accounts for unexplained variability among the counties. The model just described was fitted to the 98 counties other than Bladen and Robeson, and then used to predict the PNR for Bladen and Robeson.
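The AIC-based selection can be sketched with R's step() function on synthetic data. The predictor names below mirror the text, but the data and coefficients are invented; this is not the author's fitted model, only an illustration of the selection mechanism.

```r
# Sketch of backward deletion by AIC with step(); synthetic data only.
set.seed(42)
n <- 98
d <- data.frame(Travel = rnorm(n), HSGrad = rnorm(n),
                Black  = rnorm(n), Noise  = rnorm(n))
# By construction, only Travel and Black affect the response
d$logPNR <- 0.5 + 0.3 * d$Travel + 0.4 * d$Black + rnorm(n, sd = 0.2)
full <- lm(logPNR ~ Travel + HSGrad + Black + Noise, data = d)
sel  <- step(full, trace = 0)  # drops terms while doing so improves AIC
names(coef(sel))               # the informative predictors survive
```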
To be precise, I computed a 99% prediction interval for each of Bladen and Robeson Counties, which means a range of values (separately for each county) that will, if the assumptions of the statistical analysis are correct, include the true value with probability 0.99. Here, saying the assumptions of the statistical analysis are correct allows for normal random variation between counties, but not for any unusual circumstances such as tampering with the ballots. The results are:

For Bladen County, the predicted PNR is 1.6%, with 99%-probability prediction limits of 0.59% and 4.3%;

For Robeson County, the predicted PNR is 1.31%, with 99%-probability prediction limits of 0.48% and 3.55%.
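The prediction intervals come straight from R's predict(), as in the Appendix code. The snippet below uses invented data and only illustrates the behaviour: a 99% prediction interval brackets the fitted value and is wider than the corresponding confidence interval for the mean, because it also carries the random error of a single new observation.

```r
# 99% prediction vs confidence interval from lm(); invented data.
set.seed(1)
x <- runif(98)
y <- 1 + 2 * x + rnorm(98, sd = 0.3)
fit <- lm(y ~ x)
new <- data.frame(x = 0.5)
pred_int <- predict(fit, newdata = new, interval = "prediction", level = 0.99)
conf_int <- predict(fit, newdata = new, interval = "confidence", level = 0.99)
# pred_int and conf_int are 1 x 3 matrices with columns fit, lwr, upr;
# the prediction interval is the wider of the two
```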
Alternatively, one can ignore the demographic variables and simply make the predictions assuming random variability among counties. In this case, the predicted PNR for both Bladen and Robeson Counties (based on the other 98) is 1.49%, with 99%-probability prediction limits of 0.48% and 4.57%. The observed PNRs for Bladen and Robeson Counties are 11.31% and 11.0% respectively.

We can now translate these assessments into actual counts of potentially misplaced votes. Under the first analysis above, for Bladen County, the upper limit of the prediction interval is 4.3%. The observed PNR was 11.31%, and the number of absentee ballots requested was 8,110. For Robeson County, the corresponding numbers are 3.55%, 11.0% and 16,069. Therefore, our estimated lower limit on the number of misplaced votes is

    8,110 x (11.31 - 4.3)/100 + 16,069 x (11.0 - 3.55)/100 = 1,766

(to the nearest whole number). Alternatively, based on the second analysis above, the corresponding calculation is

    8,110 x (11.31 - 4.57)/100 + 16,069 x (11.0 - 4.57)/100 = 1,580.

Students in my class submitted a number of alternative versions of the analysis. Some of them framed the analysis to predict PNR directly, rather than the logarithm of PNR as in the above analysis. In addition, the students used different methods to decide which demographic variables to include, resulting in different model equations. In every analysis that I was able to verify, however, the estimated number of misplaced votes was at least 1,500.

Recall that, in the initial count of votes, the Republican Mark Harris was leading by 905 votes. The statistical analysis given here, while supporting that there were misplaced absentee ballots, does not provide any evidence concerning which candidate would have received those votes. However, if allegations of ballot tampering are correct, there is at least a plausible argument that the misplaced votes were all intended for the Democratic candidate.

I am available to answer questions as needed.
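The lower-bound arithmetic can be checked in a few lines of R, using only the observed PNRs and the prediction-limit percentages quoted in the text:

```r
# Misplaced-vote lower bounds, from the figures quoted in the text.
# Regression-based upper limits: 4.3% (Bladen), 3.55% (Robeson).
bladen  <- 8110  * (11.31 - 4.30) / 100
robeson <- 16069 * (11.00 - 3.55) / 100
round(bladen + robeson)  # 1766
# No-covariate analysis: common upper limit 4.57% for both counties.
alt <- 8110 * (11.31 - 4.57) / 100 + 16069 * (11.00 - 4.57) / 100
round(alt)               # 1580
```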
Appendix: Statistical Code in R

R version ( ) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: i386-w64-mingw32/i386 (32-bit)

# R code for NC Voter Analysis
# Richard L. Smith, February 2019

# Load data
Y=read.csv('C:/Users/rsmith/jan16/UNC/STOR556/Data/ProportionNotReturned.csv',header=T)

# Density plot of PNR across counties
par(cex=1.5)
plot(density(100*Y$PNR),main='Percentage of Absentee Ballots Not Returned')
rug(100*Y$PNR)
lines(density(100*Y$PNR),lwd=3)
text(11,0.4,'Bladen')
lines(c(11.3,11),c(0,0.38))
text(9.2,0.3,'Robeson')
lines(c(11,9.2),c(0,0.28))
text(2,0.05,'All OTHER COUNTIES',cex=0.7)

# Regression analysis using log(100*PNR) as response
# Variable selection via AIC
y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)   # zero weight for Bladen and Robeson (PNR above 10%)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$MedInc+Y$Black+Y$Hisp,weights=wts)
lm2=step(lm1)

Start:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad + Y$Collgrad +
    Y$MedInc + Y$Black + Y$Hisp
- Y$MedInc
- Y$Collgrad
- Y$Travel
- Y$MedAge
- Y$Rural
<none>
- Y$Hisp
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad + Y$Collgrad +
    Y$Black + Y$Hisp
- Y$Collgrad
- Y$MedAge
- Y$Travel
- Y$Rural
<none>
- Y$Hisp
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp
- Y$MedAge
- Y$Travel
- Y$Rural
- Y$Hisp
<none>
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp
- Y$Rural
- Y$Travel
<none>
- Y$Pop
- Y$Hisp
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp
- Y$Pop
<none>
- Y$Hisp
- Y$Travel
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp
<none>
- Y$Travel
- Y$Hisp
- Y$Hsgrad
- Y$Black

lm1=lm2

# Write out the final model and compute prediction limits
summary(lm1)

lm(formula = y1 ~ Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        **
Y$Travel
Y$Hsgrad                                           **
Y$Black                                   e-06     ***
Y$Hisp                                             *

Residual standard error:  on 93 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 4 and 93 DF,  p-value: 7.393e-06
summary(lm1)$adj.r.squared
[1]

pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99, :
  predictions on current data refer to _future_ responses

n1=(Y$PNR[9]-exp(pr1$fit[9,3])/100)*Y$AbsBal[9]    # county 9 is Bladen
n2=(Y$PNR[78]-exp(pr1$fit[78,3])/100)*Y$AbsBal[78] # county 78 is Robeson
print(exp(pr1$fit[c(9,78),]))
    fit  lwr  upr
print(c(n1,n2,n1+n2))
[1]
print(Y$AbsBal[c(9,78)])
[1]

# Alternative solution by backward selection
y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$MedInc+Y$Black+Y$Hisp,weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad +
    Y$Collgrad + Y$MedInc + Y$Black + Y$Hisp, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)        e          e                    *
Y$Pop       5.082e          e
Y$Rural     2.638e          e
Y$MedAge           e          e
Y$Travel    1.213e          e
Y$Hsgrad    3.192e          e                      *
Y$Collgrad         e          e
Y$MedInc    5.502e          e
Y$Black     1.195e          e             e-05     ***
Y$Hisp      1.683e          e

Residual standard error:  on 88 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 9 and 88 DF,  p-value:

y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad +
    Y$Collgrad + Y$Black + Y$Hisp, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)        e          e                    *
Y$Pop       5.083e          e
Y$Rural     2.601e          e
Y$MedAge           e          e
Y$Travel    1.278e          e
Y$Hsgrad    3.207e          e                      *
Y$Collgrad         e          e
Y$Black     1.192e          e             e-05     ***
Y$Hisp      1.693e          e

Residual standard error:  on 89 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 8 and 89 DF,  p-value:

y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad +
    Y$Black + Y$Hisp, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)        e          e
Y$Pop       4.285e          e
Y$Rural     2.633e          e
Y$MedAge           e          e
Y$Travel    1.362e          e
Y$Hsgrad    2.562e          e                      *
Y$Black     1.212e          e             e-05     ***
Y$Hisp      1.564e          e

Residual standard error:  on 90 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 7 and 90 DF,  p-value: 5.246e-05

y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Rural + Y$Travel + Y$Hsgrad + Y$Black +
    Y$Hisp, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)        e          e                    *
Y$Pop       4.387e          e
Y$Rural     1.940e          e
Y$Travel    1.385e          e
Y$Hsgrad    2.582e          e                      *
Y$Black     1.263e          e             e-06     ***
Y$Hisp      1.876e          e

Residual standard error:  on 91 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 6 and 91 DF,  p-value: 2.737e-05

y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp,
    weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)        e          e                    *
Y$Pop       3.230e          e
Y$Travel    1.919e          e
Y$Hsgrad    2.119e          e                      *
Y$Black     1.200e          e             e-06     ***
Y$Hisp      1.596e          e

Residual standard error:  on 92 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 5 and 92 DF,  p-value: 1.308e-05

y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Travel+Y$Hsgrad+Y$Black+Y$Hisp,weights=wts)
summary(lm1)

lm(formula = y1 ~ Y$Travel + Y$Hsgrad + Y$Black + Y$Hisp, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        **
Y$Travel
Y$Hsgrad                                           **
Y$Black                                   e-06     ***
Y$Hisp                                             *

Residual standard error:  on 93 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 4 and 93 DF,  p-value: 7.393e-06

summary(lm1)$adj.r.squared
[1]

pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99, :
  predictions on current data refer to _future_ responses

n1=(Y$PNR[9]-exp(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-exp(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(exp(pr1$fit[c(9,78),]))
    fit  lwr  upr
print(c(n1,n2,n1+n2))
[1]

# Alternative with no demographic covariates
y1=log(100*Y$PNR)
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~1,weights=wts)
summary(lm1)

lm(formula = y1 ~ 1, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               e-15     ***

Residual standard error:  on 97 degrees of freedom

summary(lm1)$adj.r.squared
[1] 0

pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99, :
  predictions on current data refer to _future_ responses

n1=(Y$PNR[9]-exp(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-exp(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(exp(pr1$fit[c(9,78),]))
    fit  lwr  upr
print(c(n1,n2,n1+n2))
[1]

# Same exercise using PNR directly, with AIC for model selection
y1=100*Y$PNR
wts=as.numeric(Y$PNR<0.1)
lm1=lm(y1~Y$Pop+Y$Rural+Y$MedAge+Y$Travel+Y$Hsgrad+Y$Collgrad+Y$MedInc+Y$Black+Y$Hisp,weights=wts)
lm2=step(lm1)

Start:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad + Y$Collgrad +
    Y$MedInc + Y$Black + Y$Hisp
- Y$MedInc
- Y$Collgrad
- Y$Travel
- Y$Rural
- Y$MedAge
- Y$Hisp
<none>
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Travel + Y$Hsgrad + Y$Collgrad +
    Y$Black + Y$Hisp
- Y$Travel
- Y$MedAge
- Y$Hisp
- Y$Collgrad
- Y$Rural
<none>
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$MedAge + Y$Hsgrad + Y$Collgrad + Y$Black + Y$Hisp
- Y$MedAge
- Y$Hisp
- Y$Collgrad
<none>
- Y$Rural
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$Hsgrad + Y$Collgrad + Y$Black + Y$Hisp
- Y$Collgrad
<none>
- Y$Rural
- Y$Hisp
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$Hsgrad + Y$Black + Y$Hisp
- Y$Hisp
<none>
- Y$Rural
- Y$Pop
- Y$Hsgrad
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Rural + Y$Hsgrad + Y$Black
- Y$Rural
<none>
- Y$Hsgrad
- Y$Pop
- Y$Black

Step:  AIC=
y1 ~ Y$Pop + Y$Hsgrad + Y$Black
<none>
- Y$Hsgrad
- Y$Pop
- Y$Black

lm1=lm2

# Write out the final model and compute prediction limits
summary(lm1)

lm(formula = y1 ~ Y$Pop + Y$Hsgrad + Y$Black, weights = wts)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)        e          e
Y$Pop       7.561e          e
Y$Hsgrad    2.277e          e
Y$Black     1.833e          e             e-05     ***

Residual standard error:  on 94 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:
F-statistic:  on 3 and 94 DF,  p-value: 2.498e-05

summary(lm1)$adj.r.squared
[1]
### The R-squared is not as large as for the log PNR model

pr1=predict(lm1,se.fit=T,interval='prediction',level=0.99,weights=1)
Warning message:
In predict.lm(lm1, se.fit = T, interval = "prediction", level = 0.99, :
  predictions on current data refer to _future_ responses

n1=(Y$PNR[9]-(pr1$fit[9,3])/100)*Y$AbsBal[9]
n2=(Y$PNR[78]-(pr1$fit[78,3])/100)*Y$AbsBal[78]
print(pr1$fit[c(9,78),])   # no exp() here: this model predicts PNR directly
    fit  lwr  upr
print(c(n1,n2,n1+n2))
[1]
### final count 1888, higher than the counts quoted for the earlier models
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More information1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species
Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for
More informationChapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004)
Chapter 5 Exercises 1 Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Preliminaries > library(daag) Exercise 2 The final three sentences have been reworded For each of the data
More informationMath 10 - Compilation of Sample Exam Questions + Answers
Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More informationIntroduction and Background to Multilevel Analysis
Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and
More informationVariables and Variable De nitions
APPENDIX A Variables and Variable De nitions All demographic county-level variables have been drawn directly from the 1970, 1980, and 1990 U.S. Censuses of Population, published by the U.S. Department
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More informationExplanatory Variables Must be Linear Independent...
Explanatory Variables Must be Linear Independent... Recall the multiple linear regression model Y j = β 0 + β 1 X 1j + β 2 X 2j + + β p X pj + ε j, i = 1,, n. is a shorthand for n linear relationships
More informationMultiple Linear Regression
Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).
More informationCommunity Health Needs Assessment through Spatial Regression Modeling
Community Health Needs Assessment through Spatial Regression Modeling Glen D. Johnson, PhD CUNY School of Public Health glen.johnson@lehman.cuny.edu Objectives: Assess community needs with respect to particular
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationChapter 1. The Mathematics of Voting
Introduction to Contemporary Mathematics Math 112 1.1. Preference Ballots and Preference Schedules Example (The Math Club Election) The math club is electing a new president. The candidates are Alisha
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationL21: Chapter 12: Linear regression
L21: Chapter 12: Linear regression Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 37 So far... 12.1 Introduction One sample
More informationLecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015
Lecture 8: Fitting Data Statistical Computing, 36-350 Wednesday October 7, 2015 In previous episodes Loading and saving data sets in R format Loading and saving data sets in other structured formats Intro
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationChapter 8 Conclusion
1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect
More informationMath 141. Lecture 27: More Issues in Model Selection and Interpretation. Albyn Jones 1. 1 Library 304
Math 141 Lecture 27: More Issues in Model Selection and Interpretation Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 Confounding Confounding: a term from experimental
More informationOutline. 1 Preliminaries. 2 Introduction. 3 Multivariate Linear Regression. 4 Online Resources for R. 5 References. 6 Upcoming Mini-Courses
UCLA Department of Statistics Statistical Consulting Center Introduction to Regression in R Part II: Multivariate Linear Regression Denise Ferrari denise@stat.ucla.edu Outline 1 Preliminaries 2 Introduction
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 Work all problems. 60 points needed to pass at the Masters level, 75 to pass at the PhD
More informationQUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018
Page 1 of 4 QUEEN S UNIVERSITY FINAL EXAMINATION FACULTY OF ARTS AND SCIENCE DEPARTMENT OF ECONOMICS APRIL 2018 ECONOMICS 250 Introduction to Statistics Instructor: Gregor Smith Instructions: The exam
More informationSupplemental Appendix to "Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs"
Supplemental Appendix to "Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs" Yingying Dong University of California Irvine February 2018 Abstract This document provides
More informationStat 401B Exam 3 Fall 2016 (Corrected Version)
Stat 401B Exam 3 Fall 2016 (Corrected Version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied
More informationConsider fitting a model using ordinary least squares (OLS) regression:
Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful
More informationEconometrics Review questions for exam
Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =
More informationLogistic Regression 21/05
Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression
More informationLinear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables.
Lab 13. Linear Regression www.nmt.edu/~olegm/382labs/lab13r.pdf Note: the things you will read or type on the computer are in the Typewriter Font. All the files mentioned can be found at www.nmt.edu/~olegm/382labs/
More informationStat 401B Final Exam Fall 2015
Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationRegression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.
Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose
More informationIn the previous chapter, we learned how to use the method of least-squares
03-Kahane-45364.qxd 11/9/2007 4:40 PM Page 37 3 Model Performance and Evaluation In the previous chapter, we learned how to use the method of least-squares to find a line that best fits a scatter of points.
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationSCHOOL OF MATHEMATICS AND STATISTICS
SHOOL OF MATHEMATIS AND STATISTIS Linear Models Autumn Semester 2015 16 2 hours Marks will be awarded for your best three answers. RESTRITED OPEN BOOK EXAMINATION andidates may bring to the examination
More informationFoundations of Correlation and Regression
BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations
More informationInteractions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept
Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and
More informationPart II { Oneway Anova, Simple Linear Regression and ANCOVA with R
Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Gilles Lamothe February 21, 2017 Contents 1 Anova with one factor 2 1.1 The data.......................................... 2 1.2 A visual
More informationChapter 5 Exercises 1
Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine
More information