Holiday Assignment PS 531

Prof: Jake Bowers
TA: Paul Testa
January 27, 2014

Overview

Below is a brief assignment for you to complete over the break. It should serve as a refresher, covering some of the basic concepts and skills you learned in PS 530, so that we can all start on the same page when class begins. Please complete the assignment and turn it in to us no later than Friday, January 24. You should turn in 1) a .pdf file containing your write-up with any tables and figures you'd like to include and 2) a .R or .txt file containing the code you used to generate your analysis.

    # Create some toy data, 500 observations
    set.seed( )
    the.probs <- c(seq(1, 2, length.out = 12), rep(4, 6), rep(3, 2))/20
    length(the.probs)
    ## [1] 20
    hrs <- sample(1:20, size = 500, replace = TRUE, prob = the.probs) * 12
    horde <- rbinom(500, 1, prob = 0.45)
    y <- 2.4 * (hrs/12) * (hrs/12)^2 + horde * runif(500, -5, 4.2) + rnorm(500, 0, 3.75) + 66
    practice <- data.frame(cbind(y, hrs, horde))
    write.csv(practice, file = "practice.csv", row.names = FALSE)
    save(practice, file = "practice.rda")

1. Download the dataset practice.rda from Dropbox using the following code.

    download.file(" ", destfile = "~/Desktop/practice.rda", method = "curl")
    load("~/Desktop/practice.rda")  # Change location if you like

This simulated dataset contains 500 observations of the previous students in this course. For each student, you have information on his or her final grade, y; the total number of hours, hrs, he or she spent studying for the class; and an indicator of the student's World of Warcraft faction, horde, that takes a value of 1 for horde and 0 for alliance.

2. Calculate two measures of the typical number of hours spent studying by these fictional previous students in the class. Discuss the differences and benefits of each measure of centrality or typicalness. What do the two measures together tell you about the distribution of hours spent studying?

Hint: You can learn about the idea of typical values with Kaplan's textbook here ( kaplan/ism/statmodeling-review.pdf ).

    mean(practice$hrs)
    median(practice$hrs)
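As a minimal sketch of why the two measures in question 2 can disagree, the toy vector below (hypothetical values, not the practice data) contains one very large observation. The mean is pulled toward that observation while the median is not, so the gap between them hints at the direction of skew: a mean above the median suggests a long right tail, and a mean below the median suggests a long left tail.

```r
# Hypothetical toy vector of study hours; toy.hrs is not part of the assignment data
toy.hrs <- c(12, 24, 24, 36, 36, 36, 48, 240)

toy.mean <- mean(toy.hrs)      # 57: pulled upward by the single value of 240
toy.median <- median(toy.hrs)  # 36: depends only on the middle of the sorted values

# A positive gap (mean above median) is consistent with right skew
toy.mean - toy.median
```

Running the same two functions on practice$hrs and comparing their order gives a quick, informal read on the shape of the distribution before plotting it.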

3. Describe the distribution of final grades, y. Begin by calculating the mean, variance, and standard deviation of y (you can use R's standard functions or, for practice, try doing it by hand). Next, calculate y's median, range, interquartile range, and 95% coverage interval. Finally, produce a figure that summarizes this information.

Hint: You can learn about the idea of typical variation with Kaplan's textbook here ( http: // kaplan/ism/statmodeling-review.pdf ).

    n <- length(practice$y)
    mean(practice$y)
    sum(practice$y)/n
    var(practice$y)
    sum((practice$y - mean(practice$y))^2)/(n - 1)  # Why n-1?
    sd(practice$y)
    sqrt(sum((practice$y - mean(practice$y))^2)/(n - 1))
    quantile(practice$y, prob = c(0.25, 0.5, 0.75))
    quantile(practice$y, prob = c(0.025, 0.975))  # 95% coverage interval
    summary(practice$y)
    par(mfrow = c(1, 3), mgp = c(1.5, 0.5, 0), oma = rep(0, 4))
    with(practice, boxplot(y))
    with(practice, hist(y))
    with(practice, plot(ecdf(y)))

[Figure: three panels showing a boxplot of y, "Histogram of y" (Frequency against y), and "ecdf(y)" (Fn(x) against x).]

4. How do horde members compare to alliance members in their grades? To answer this question, some use a difference of means. Please produce a difference of means and interpret it.
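One way a difference of means like the one in question 4 might be computed is sketched below on a small hypothetical data frame (toy, with made-up grades), not on the practice data. It uses tapply() to get one mean per group and then subtracts them; the sign of the result depends on which group is subtracted from which, so the interpretation should say so explicitly.

```r
# Hypothetical toy data: three alliance (0) and three horde (1) students
toy <- data.frame(y = c(70, 74, 78, 80, 84, 88),
                  horde = c(0, 0, 0, 1, 1, 1))

# Mean of y within each level of horde; names of the result are "0" and "1"
group.means <- tapply(toy$y, toy$horde, mean)

# Horde minus alliance: positive means horde students scored higher on average
diff.means <- unname(group.means["1"] - group.means["0"])
diff.means
```

With the real data, the same pattern would use practice$y and practice$horde in place of the toy columns.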

5. Having calculated a difference of means, some would wonder, "Do we have enough information to exclude the idea that the difference of means is really zero?" Please answer this question without using any canned function (for example, no t.test() or lm()).

    t.test(practice$y ~ practice$horde, var.equal = FALSE)
    ##  Welch Two Sample t-test
    ## data:  practice$y by practice$horde
    ## t = 1.307, df = 424.2
    ## alternative hypothesis: true difference in means is not equal to 0

    t.test(practice$y ~ practice$horde, var.equal = TRUE)
    ##  Two Sample t-test
    ## data:  practice$y by practice$horde
    ## t = 1.332, df = 498
    ## alternative hypothesis: true difference in means is not equal to 0

    muh <- mean(practice$y[practice$horde == 1])   # Horde
    mua <- mean(practice$y[practice$horde == 0])   # Alliance
    nh <- length(practice$y[practice$horde == 1])  # Number of Horde
    na <- length(practice$y[practice$horde == 0])  # Number of Alliance
    varh <- var(practice$y[practice$horde == 1])   # Variance of y for Horde
    vara <- var(practice$y[practice$horde == 0])   # Variance of y for Alliance
    # Calculate t-statistic by hand (unequal variances, Welch degrees of freedom)
    sigma <- sqrt(varh/nh + vara/na)
    sigma
    df <- (varh/nh + vara/na)^2/((varh/nh)^2/(nh - 1) + (vara/na)^2/(na - 1))
    t.stat <- (mua - muh)/sigma
    p.val <- 2 * pt(-abs(t.stat), df = df)
    # Equal variance (pooled standard error)
    df.eq <- nh + na - 2
    sigma.eq <- sqrt(((nh - 1) * varh + (na - 1) * vara)/df.eq * (1/nh + 1/na))
    t.stat.eq <- (mua - muh)/sigma.eq
    t.stat.eq
    p.val.eq <- 2 * pt(-abs(t.stat.eq), df = df.eq)
    p.val.eq

6. Now, use a linear model to calculate the average difference between the final grades of those who support the horde versus the alliance. How should we interpret this model? What do the coefficients on the intercept and horde mean? What do the standard errors mean? What do the test statistics and p-values mean?

    summary(lm(y ~ horde, data = practice))
    ## Call:
    ## lm(formula = y ~ horde, data = practice)
    ##
    ## Coefficients:
    ##             Pr(>|t|)
    ## (Intercept) <2e-16 ***
    ## horde
    ##
    ## Residual standard error: 6.63 on 498 degrees of freedom
    ## F-statistic: 1.77 on 1 and 498 DF

7. Suppose instead you had estimated the following model:

$$y = \alpha_0 + \alpha_1 \mathrm{beard01} + u$$

where beard01 is an indicator for facial hair (fur counts as facial hair in this exercise). Your model yields an estimated coefficient $\hat{\alpha}_1$ of 1.23 with a standard error of 0.85 and 498 degrees of freedom. Assume you had prior knowledge that led you to believe that beards should only have a positive effect on final grades. Formulate the null and alternative hypotheses for this claim and calculate a test statistic and corresponding p-value. Are you conducting a one-tailed or two-tailed hypothesis test? Some might talk about whether the average difference in final grades is "statistically significant." What does this mean? Plot your test statistic on the probability density function of the appropriate t-distribution.

    a1 <- 1.23
    se1 <- 0.85
    a1.t <- a1/se1
    2 * pt(-abs(a1.t), df = 498)
    x <- seq(-4.5, 4.5, length.out = 100)
    plot(x, dt(x, df = 498), type = "l")
    abline(v = a1.t, lty = 2)
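Because question 7 states a directional belief (beards can only help), one reasonable reading is a one-tailed test of H0: α1 ≤ 0 against H1: α1 > 0. The sketch below, using only the estimate, standard error, and degrees of freedom given in the question, computes the upper-tail p-value next to the two-tailed one for comparison. The variable names (t.obs, p.one, p.two) are illustrative choices, not part of the assignment.

```r
# Values taken from the question text
a1 <- 1.23    # estimated coefficient on beard01
se1 <- 0.85   # its standard error
dof <- 498    # residual degrees of freedom

t.obs <- a1 / se1

# One-tailed: probability of a t-statistic at least this large under H0
p.one <- pt(t.obs, df = dof, lower.tail = FALSE)

# Two-tailed: probability of a t-statistic at least this far from 0 in either direction
p.two <- 2 * pt(-abs(t.obs), df = dof)

c(t = t.obs, one.tailed = p.one, two.tailed = p.two)
```

When the estimate has the sign the alternative predicts, the one-tailed p-value is exactly half the two-tailed one, which is why the choice of test should be made before looking at the data.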

[Figure: density of the t-distribution, dt(x, df = 498), plotted against x, with a dashed vertical line at the test statistic.]

8. Create two figures, one for the horde and one for the alliance, showing how the final grade, y, varies with the average hours, hrs, spent studying in a week. Describe any differences or similarities you observe.

    par(mfrow = c(1, 2))
    with(practice, plot(x = hrs, y = y, type = "n", xlab = "Hours a Week Spent\nStudying", ylab = "Final Grade", main = "Horde"))
    with(practice[practice$horde == 1, ], points(x = hrs, y = y, col = "red", cex = 0.5, pch = 20))
    with(practice, plot(x = hrs, y = y, type = "n", xlab = "Hours a Week Spent\nStudying", ylab = "Final Grade", main = "Alliance"))
    with(practice[practice$horde == 0, ], points(x = hrs, y = y, col = "black", cex = 0.5, pch = 20))

[Figure: two scatterplots titled "Horde" and "Alliance", each with Hours a Week Spent Studying on the x-axis and Final Grade on the y-axis.]

    par(mfrow = c(1, 1))

9. Estimate a simple linear regression predicting final grade, y, as a function of average hours a week spent studying, hrs. Interpret the coefficients, standard errors, and p-values from this model. Calculate a 95-percent confidence interval for the coefficient on hrs. Now calculate a 95-percent confidence interval for the same coefficient using the percentile bootstrap method with 1,000 replications. How do the two confidence intervals compare? When would you prefer to use the bootstrap versus the analytic confidence interval? What assumptions do you need to make for the bootstrap interval? What assumptions do you need to make for the analytic interval? Would these assumptions be reasonable, given what you might assume about the research design that generated these data?

    fm1 <- lm(y ~ hrs, data = practice)
    confint(fm1)[2, ]
    (ci.upper <- coef(fm1)[2] + qt(0.975, 498) * sqrt(vcov(fm1)[2, 2]))
    (ci.lower <- coef(fm1)[2] - qt(0.975, 498) * sqrt(vcov(fm1)[2, 2]))
    n <- 500
    R <- 1000
    bs.est <- NA
    for (i in 1:R) {
        s <- sample(1:n, replace = TRUE)         # resample row indices with replacement
        f <- lm(y[s] ~ hrs[s], data = practice)  # refit on the resampled rows
        coefs <- coef(f)
        bs.est[i] <- coefs[2]                    # keep the slope on hrs
    }
    quantile(bs.est, c(0.025, 0.975))
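The percentile bootstrap in question 9 can also be written without an explicit loop. The following is a sketch under stated assumptions: it uses a hypothetical simulated data frame (toy, with columns x and y) rather than practice, an arbitrary seed, and a helper named boot.slope that is introduced here only for illustration. It resamples whole rows, refits the regression, and takes the 2.5th and 97.5th percentiles of the resampled slopes.

```r
set.seed(1)  # arbitrary seed, chosen only so the sketch is reproducible

# Hypothetical toy data with a linear relationship plus noise
toy <- data.frame(x = runif(100, 0, 10))
toy$y <- 1 + 2 * toy$x + rnorm(100)

# Illustrative helper: refit the model on one bootstrap resample of the rows
boot.slope <- function(d) {
  rows <- sample(nrow(d), replace = TRUE)  # draw row indices with replacement
  coef(lm(y ~ x, data = d[rows, ]))["x"]   # slope from the resampled data
}

R <- 1000
boot.slopes <- replicate(R, boot.slope(toy))

# Percentile bootstrap interval and the analytic interval, side by side
ci.boot <- quantile(boot.slopes, c(0.025, 0.975))
ci.analytic <- confint(lm(y ~ x, data = toy))["x", ]
rbind(bootstrap = ci.boot, analytic = ci.analytic)
```

Resampling rows keeps each student's y and x together, so the sketch treats the observed rows as a stand-in for the population they were drawn from; the analytic interval instead leans on the usual linear-model assumptions about the errors.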

    confint(fm1)[2, ]

10. Plot the residuals from your linear model against their predicted (fitted) values. What should this plot look like if the assumptions of OLS are met? What does it look like?

    plot(x = fm1$fitted, y = fm1$residuals)

[Figure: scatterplot of fm1$residuals against fm1$fitted.]

11. Propose (and estimate) an alternative model for the relationship between grades and time spent studying. Again calculate 95-percent confidence intervals using both the analytic method and the percentile bootstrap method. Compare your results to those obtained from the simple bivariate regression. How does the coefficient on hrs change? Did you include a coefficient for Warcraft faction (horde)? Why or why not? Overall, does your model do a better job explaining variation in final grades? Bonus: What's the optimal amount of time someone should spend studying if they want to maximize their expected final grade?

    practice$hrs.sq <- practice$hrs^2
    fm2 <- lm(y ~ hrs + hrs.sq, data = practice)
    fm2.1 <- lm(y ~ hrs + hrs.sq + horde, data = practice)
    summary(fm2)
    ## Call:
    ## lm(formula = y ~ hrs + hrs.sq, data = practice)
    ##
    ## Coefficients:
    ##             Estimate  Pr(>|t|)
    ## (Intercept)  6.41e    <2e-16 ***
    ## hrs          2.22e    <2e-16 ***
    ## hrs.sq      -5.46e    <2e-16 ***
    ##
    ## Residual standard error: 4.13 on 497 degrees of freedom
    ## Multiple R-squared: 0.614
    ## F-statistic: 396 on 2 and 497 DF, p-value: <2e-16

    summary(fm2.1)
    ## Call:
    ## lm(formula = y ~ hrs + hrs.sq + horde, data = practice)
    ##
    ## Coefficients:
    ##             Estimate  Pr(>|t|)
    ## (Intercept)  6.44e    <2e-16 ***
    ## hrs          2.21e    <2e-16 ***
    ## hrs.sq      -5.45e    <2e-16 ***
    ## horde       -4.80e
    ##
    ## Residual standard error: 4.13 on 496 degrees of freedom
    ## Multiple R-squared: 0.616
    ## F-statistic: 265 on 3 and 496 DF, p-value: <2e-16

    plot(x = fm2$fitted, y = fm2$residuals)

[Figure: scatterplot of fm2$residuals against fm2$fitted.]

    # Hours that maximize the fitted quadratic
    hrs.star <- -coef(fm2)[2]/(2 * coef(fm2)[3])
    hrs.star.1 <- -coef(fm2.1)[2]/(2 * coef(fm2.1)[3])
    par(mfrow = c(1, 1))
    plot(x = practice$hrs, y = practice$y, pch = 20, col = "grey", type = "n")
    points(x = practice$hrs, y = practice$y, pch = 20, col = "grey", cex = 0.5)
    pred.df <- expand.grid(hrs = sort(unique(practice$hrs)), horde = 1)
    pred.df$hrs.sq <- pred.df$hrs * pred.df$hrs
    pred.y <- predict(fm2, newdata = pred.df)
    lines(x = sort(unique(practice$hrs)), y = pred.y, col = "red", lty = 1)
    abline(v = hrs.star, col = "red", lty = 2)

[Figure: scatterplot of practice$y against practice$hrs with the fitted quadratic curve in red and a dashed vertical line at hrs.star.]

    n <- 500
    R <- 1000
    bs.est.2 <- matrix(NA, nrow = R, ncol = 2)
    for (i in 1:R) {
        s <- sample(1:n, replace = TRUE)
        f <- lm(y[s] ~ hrs[s] + hrs.sq[s], data = practice)
        coefs <- coef(f)
        bs.est.2[i, ] <- coefs[2:3]  # keep the hrs and hrs.sq coefficients
    }
    quantile(bs.est.2[, 1], c(0.025, 0.975))
    quantile(bs.est.2[, 2], c(0.025, 0.975))
    confint(fm2)
    par(mfrow = c(2, 2), pty = "s", mgp = c(1.5, 0.5, 0), oma = rep(0, 4))
    plot(fm2)
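The expression used for hrs.star in the code for question 11 follows from setting the derivative of the fitted quadratic to zero. As a short worked derivation, writing the coefficients on hrs and hrs.sq as $\beta_1$ and $\beta_2$ (symbols assumed here for illustration; the source refers to them only through coef(fm2)):

```latex
\begin{align*}
E[y \mid \mathrm{hrs}] &= \beta_0 + \beta_1\,\mathrm{hrs} + \beta_2\,\mathrm{hrs}^2 \\
\frac{\partial E[y \mid \mathrm{hrs}]}{\partial\,\mathrm{hrs}}
  &= \beta_1 + 2\beta_2\,\mathrm{hrs} = 0
  \quad\Longrightarrow\quad
  \mathrm{hrs}^{*} = -\frac{\beta_1}{2\beta_2}
\end{align*}
```

This stationary point is a maximum only when $\beta_2 < 0$ (the second derivative is $2\beta_2$), which is consistent with the negative estimate reported for hrs.sq. It is also only meaningful if $\mathrm{hrs}^{*}$ falls inside the range of hours actually observed.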

[Figure: diagnostic plots from plot(fm2): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance).]

12. Do your results suggest a causal relationship between time spent studying and final grades in this class? What are some factors that might lead to a potential spurious relationship between studying and grades? Propose a strategy or strategies to identify the causal effect of one extra hour of studying a week on a student's final grade. Discuss the benefits, limitations, and potential difficulties of your approach.

Hint: You'll need to be clear about what you mean by "studying causes grades." If you are not already comfortable with the idea of potential outcomes, you might want to learn about it. For example, see Field Experiments Ch02.pdf, Field Experiments Ch01.pdf, and, for a canonical piece, http://jakebowers.org/itvexperiments/holland86wdisc.pdf.
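To make the idea of a spurious relationship in question 12 concrete, here is a hedged simulation sketch. Everything in it is hypothetical: the confounder (motivation), the true effect of 0.5 points per hour, the other coefficients, and the seed are invented for illustration and say nothing about the practice data. It compares the regression slope when students choose their own hours with the slope when hours are assigned at random.

```r
set.seed(2)  # arbitrary seed for reproducibility
n <- 5000
true.effect <- 0.5      # hypothetical points gained per extra hour of study
motivation <- rnorm(n)  # hypothetical confounder, unobserved in practice

# The same outcome rule applies in both scenarios: grades depend on
# hours studied, on motivation, and on noise
grade <- function(hrs) 60 + true.effect * hrs + 3 * motivation + rnorm(n)

hrs.obs <- 10 + 2 * motivation + rnorm(n)  # motivated students choose more hours
hrs.rand <- 10 + rnorm(n, sd = sqrt(5))    # hours assigned independently of motivation

y.obs <- grade(hrs.obs)
y.rand <- grade(hrs.rand)

# Slope on hours in each scenario
b.obs <- unname(coef(lm(y.obs ~ hrs.obs))[2])
b.rand <- unname(coef(lm(y.rand ~ hrs.rand))[2])
c(observational = b.obs, randomized = b.rand)
```

In this setup the observational slope overstates the assumed effect because motivation raises both hours and grades, while the randomized slope should land close to 0.5. Random assignment of study time is rarely feasible in a real class, which is part of what the question asks you to weigh.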


More information

lm statistics Chris Parrish

lm statistics Chris Parrish lm statistics Chris Parrish 2017-04-01 Contents s e and R 2 1 experiment1................................................. 2 experiment2................................................. 3 experiment3.................................................

More information

Logistic Regression. 0.1 Frogs Dataset

Logistic Regression. 0.1 Frogs Dataset Logistic Regression We move now to the classification problem from the regression problem and study the technique ot logistic regression. The setting for the classification problem is the same as that

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Later in the same chapter (page 45) he asserted that

Later in the same chapter (page 45) he asserted that Chapter 7 Randomization 7 Randomization 1 1 Fisher on randomization..................... 1 2 Shoes: a paired comparison................... 2 3 The randomization distribution................. 4 4 Theoretical

More information

Inference with Heteroskedasticity

Inference with Heteroskedasticity Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.

More information

2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling

2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling 2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling Jon Wakefield Departments of Statistics and Biostatistics, University of Washington 2015-07-24 Case control example We analyze

More information

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

You are permitted to use your own calculator where it has been stamped as approved by the University.

You are permitted to use your own calculator where it has been stamped as approved by the University. ECONOMICS TRIPOS Part I Friday 11 June 004 9 1 Paper 3 Quantitative Methods in Economics This exam comprises four sections. Sections A and B are on Mathematics; Sections C and D are on Statistics. You

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information

Reaction Days

Reaction Days Stat April 03 Week Fitting Individual Trajectories # Straight-line, constant rate of change fit > sdat = subset(sleepstudy, Subject == "37") > sdat Reaction Days Subject > lm.sdat = lm(reaction ~ Days)

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

Examples of fitting various piecewise-continuous functions to data, using basis functions in doing the regressions.

Examples of fitting various piecewise-continuous functions to data, using basis functions in doing the regressions. Examples of fitting various piecewise-continuous functions to data, using basis functions in doing the regressions. David. Boore These examples in this document used R to do the regression. See also Notes_on_piecewise_continuous_regression.doc

More information

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website.

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website. SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Lecture 1 Intro to Spatial and Temporal Data

Lecture 1 Intro to Spatial and Temporal Data Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1

More information

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Stat 401B Final Exam Fall 2015

Stat 401B Final Exam Fall 2015 Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Estimated Simple Regression Equation

Estimated Simple Regression Equation Simple Linear Regression A simple linear regression model that describes the relationship between two variables x and y can be expressed by the following equation. The numbers α and β are called parameters,

More information

The Statistical Sleuth in R: Chapter 9

The Statistical Sleuth in R: Chapter 9 The Statistical Sleuth in R: Chapter 9 Linda Loi Kate Aloisio Ruobing Zhang Nicholas J. Horton January 21, 2013 Contents 1 Introduction 1 2 Effects of light on meadowfoam flowering 2 2.1 Data coding, summary

More information

Understanding p Values

Understanding p Values Understanding p Values James H. Steiger Vanderbilt University James H. Steiger Vanderbilt University Understanding p Values 1 / 29 Introduction Introduction In this module, we introduce the notion of a

More information

Linear Probability Model

Linear Probability Model Linear Probability Model Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables. If

More information

Note on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin

Note on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin Note on Bivariate Regression: Connecting Practice and Theory Konstantin Kashin Fall 2012 1 This note will explain - in less theoretical terms - the basics of a bivariate linear regression, including testing

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Stat 5303 (Oehlert): Randomized Complete Blocks 1

Stat 5303 (Oehlert): Randomized Complete Blocks 1 Stat 5303 (Oehlert): Randomized Complete Blocks 1 > library(stat5303libs);library(cfcdae);library(lme4) > immer Loc Var Y1 Y2 1 UF M 81.0 80.7 2 UF S 105.4 82.3 3 UF V 119.7 80.4 4 UF T 109.7 87.2 5 UF

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

Simple Linear Regression: One Quantitative IV

Simple Linear Regression: One Quantitative IV Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,

More information

Lecture 5 - Plots and lines

Lecture 5 - Plots and lines Lecture 5 - Plots and lines Understanding magic Let us look at the following curious thing: =rnorm(100) y=rnorm(100,sd=0.1)+ k=ks.test(,y) k Two-sample Kolmogorov-Smirnov test data: and y D = 0.05, p-value

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

Regression Diagnostics

Regression Diagnostics Diag 1 / 78 Regression Diagnostics Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2015 Diag 2 / 78 Outline 1 Introduction 2

More information