Holiday Assignment PS 531
- Arthur Berry
- 5 years ago
Prof: Jake Bowers
TA: Paul Testa
January 27, 2014

Overview

Below is a brief assignment for you to complete over the break. It should serve as a refresher, covering some of the basic concepts and skills you learned in PS 530, so that we can all start on the same page when class begins. Please complete the assignment and turn it in to us no later than Friday, January 24. You should turn in 1) a .pdf file containing your write-up with any tables and figures you'd like to include and 2) a .R or .txt file containing the code you used to generate your analysis.

    # Create some toy data, 500 observations
    set.seed( )
    the.probs <- c(seq(1, 2, length.out = 12), rep(4, 6), rep(3, 2))/20
    length(the.probs)
    # [1] 20
    hrs <- sample(1:20, size = 500, replace = TRUE, prob = the.probs) * 12
    horde <- rbinom(500, 1, prob = 0.45)
    y <- 2.4 * (hrs/12) * (hrs/12)^2 + horde * runif(500, -5, 4.2) + rnorm(500, 0, 3.75) + 66
    practice <- data.frame(cbind(y, hrs, horde))
    write.csv(practice, file = "practice.csv", row.names = FALSE)
    save(practice, file = "practice.rda")

1. Download the dataset practice.rda from Dropbox using the following code.

    download.file(" ", destfile = "~/Desktop/practice.rda", method = "curl")
    load("~/Desktop/practice.rda")  # Change location if you like

This simulated dataset contains 500 observations of the previous students in this course. For each student, you have information on his or her final grade, y; the total number of hours, hrs, he or she spent studying for the class; and an indicator of the student's World of Warcraft faction, horde, that takes a value of 1 for horde and 0 for alliance.

2. Calculate two measures of the typical number of hours spent studying by these fictional previous students in the class. Discuss the differences and benefits of each measure of centrality or typicalness. What do the two measures together tell you about the distribution of hours spent studying?

Hint: You can learn about the idea of typical values with Kaplan's textbook here ( kaplan/ism/statmodeling-review.pdf ).

    mean(practice$hrs)
    median(practice$hrs)
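To make the contrast in question 2 concrete, here is a minimal sketch on a small made-up vector of study hours. The values in `hrs_toy` are hypothetical and are not taken from the practice data; the point is only that a few large values pull the mean upward while leaving the median nearly untouched.

```r
# Hypothetical right-skewed study hours (not the practice data)
hrs_toy <- c(2, 3, 3, 4, 4, 5, 5, 6, 20, 40)

mean_hrs   <- mean(hrs_toy)    # pulled upward by the two large values
median_hrs <- median(hrs_toy)  # middle of the sorted values

# A mean well above the median suggests a long right tail
skew_gap <- mean_hrs - median_hrs
c(mean = mean_hrs, median = median_hrs, gap = skew_gap)
```

Reporting both numbers, rather than either one alone, is what reveals the asymmetry.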
3. Describe the distribution of final grades, y. Begin by calculating the mean, variance, and standard deviation of y (you can use R's standard functions or, for practice, try doing it by hand). Next, calculate y's median, range, interquartile range, and 95% coverage interval. Finally, produce a figure that summarizes this information.

Hint: You can learn about the idea of typical variation with Kaplan's textbook here ( http: // kaplan/ism/statmodeling-review.pdf ).

    n <- length(practice$y)
    mean(practice$y)
    sum(practice$y)/n
    var(practice$y)
    sum((practice$y - mean(practice$y))^2)/(n - 1)
    # Why n-1?
    sd(practice$y)
    sqrt(sum((practice$y - mean(practice$y))^2)/(n - 1))
    quantile(practice$y, prob = c(0.25, 0.5, 0.75))
    summary(practice$y)
    par(mfrow = c(1, 3), mgp = c(1.5, 0.5, 0), oma = rep(0, 4))
    with(practice, boxplot(y))
    with(practice, hist(y))
    with(practice, plot(ecdf(y)))

[Figure: three panels showing a boxplot of y, "Histogram of y" (Frequency against y), and "ecdf(y)" (Fn(x) against x)]

4. How do horde members compare to alliance members in their grades? To answer this question, some use a difference of means. Please produce a difference of means and interpret it.
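Questions 3 and 4 also call for the range, the interquartile range, a 95% coverage interval, and a difference of means. The sketch below shows one way these might be computed. It runs on a simulated stand-in data frame, `toy`, whose values are hypothetical; with the real data you would substitute `practice`.

```r
# Simulated stand-in for the practice data (values are hypothetical)
set.seed(1)
toy <- data.frame(
  y     = rnorm(200, mean = 80, sd = 6),
  horde = rbinom(200, 1, prob = 0.45)
)

# Question 3: range, interquartile range, and 95% coverage interval
y_range <- range(toy$y)
y_iqr   <- IQR(toy$y)                                # 75th minus 25th percentile
y_cov95 <- quantile(toy$y, probs = c(0.025, 0.975))  # middle 95% of the values

# Question 4: difference of means, alliance (0) minus horde (1)
mean_alliance <- mean(toy$y[toy$horde == 0])
mean_horde    <- mean(toy$y[toy$horde == 1])
diff_means    <- mean_alliance - mean_horde
```

The coverage interval here is purely descriptive: it marks where the middle 95% of observed grades fall and makes no claim about sampling uncertainty.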
5. Having calculated a difference of means, some would wonder, "Do we have enough information to exclude the idea that the difference of means is really zero?" Please answer this question without using any canned function (for example, no t.test() or lm()).

    t.test(practice$y ~ practice$horde, var.equal = FALSE)
    # Welch Two Sample t-test
    # data: practice$y by practice$horde
    # t = 1.307, df = 424.2, p-value =
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    # sample estimates: mean in group 0, mean in group 1

    t.test(practice$y ~ practice$horde, var.equal = TRUE)
    # Two Sample t-test
    # data: practice$y by practice$horde
    # t = 1.332, df = 498, p-value =
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    # sample estimates: mean in group 0, mean in group 1

    muh <- mean(practice$y[practice$horde == 1])    # Mean of y for Horde
    mua <- mean(practice$y[practice$horde == 0])    # Mean of y for Alliance
    nh <- length(practice$y[practice$horde == 1])   # Number of Horde
    na <- length(practice$y[practice$horde == 0])   # Number of Alliance
    varh <- var(practice$y[practice$horde == 1])    # Variance of y for Horde
    vara <- var(practice$y[practice$horde == 0])    # Variance of y for Alliance

    # Calculate the t-statistic by hand (unequal variances)
    sigma <- sqrt(varh/nh + vara/na)
    sigma
    df <- (varh/nh + vara/na)^2/((varh/nh)^2/(nh - 1) + (vara/na)^2/(na - 1))
    t.stat <- (mua - muh)/sigma
    p.val <- 2 * pt(-abs(t.stat), df = df)

    # Equal variance
    df.eq <- nh + na - 2
    sigma.eq <- sqrt(((nh - 1) * varh + (na - 1) * vara)/df.eq * (1/nh + 1/na))
    t.stat.eq <- (mua - muh)/sigma.eq
    t.stat.eq
    p.val.eq <- 2 * pt(-abs(t.stat.eq), df.eq)
    p.val.eq

6. Now, use a linear model to calculate the average difference between the final grades of those who support the horde versus the alliance. How should we interpret this model? What do the coefficients on the intercept and horde mean? What do the standard errors mean? What do the test statistics and p-values mean?
    summary(lm(y ~ horde, data = practice))
    # Call:
    # lm(formula = y ~ horde, data = practice)
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)                               <2e-16 ***
    # horde
    #
    # Residual standard error: 6.63 on 498 degrees of freedom
    # F-statistic: 1.77 on 1 and 498 DF

7. Suppose instead you had estimated the following model:

$y = \alpha_0 + \alpha_1 \, \text{beard01} + u$

where beard01 is an indicator for facial hair.[2] Your model yields an estimate of $\alpha_1$, $\hat{\alpha}_1$, of 1.23 with a standard error of 0.85 and 498 degrees of freedom. Assume you had prior knowledge that led you to believe that beards should only have a positive effect on final grades. Formulate the null and alternative hypotheses for this claim and calculate a test statistic and corresponding p-value. Are you conducting a one-tailed or two-tailed hypothesis test? Some might talk about whether the average difference in final grades is "statistically significant." What does this mean? Plot your test statistic on the probability density function of the appropriate t-distribution.

    a1 <- 1.23
    se1 <- 0.85
    a1.t <- a1/se1
    2 * pt(-abs(a1.t), df = 498)
    x <- seq(-4.5, 4.5, length.out = 100)
    plot(x, dt(x, df = 498), type = "l")
    abline(v = a1.t, lty = 2)

[2] Fur counts as facial hair in this exercise.
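Because question 7 states a directional prior (beards can only help), the matching test is one-tailed. The following is a minimal sketch of that calculation using only the three numbers given in the question; the variable names are illustrative, and the hypotheses in the comment are one reasonable way to formalize the claim.

```r
a1  <- 1.23   # estimated coefficient on beard01 (from the question)
se1 <- 0.85   # its standard error (from the question)
df1 <- 498    # residual degrees of freedom (from the question)

t_stat <- a1 / se1

# H0: alpha_1 <= 0 versus H1: alpha_1 > 0, so only the upper tail counts
p_one_sided <- pt(t_stat, df = df1, lower.tail = FALSE)

# For comparison, the two-sided p-value doubles the tail area
p_two_sided <- 2 * pt(-abs(t_stat), df = df1)

c(t = t_stat, one_sided = p_one_sided, two_sided = p_two_sided)
```

When the estimate has the sign predicted by the alternative, the one-sided p-value is exactly half the two-sided one, which is why the choice of tails should be made before looking at the data.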
[Figure: dt(x, df = 498) plotted against x]

8. Create two figures, one for the horde and one for the alliance, showing how the final grade, y, varies with the average hours, hrs, spent studying in a week. Describe any differences or similarities you observe.

    par(mfrow = c(1, 2))
    with(practice, plot(x = hrs, y = y, type = "n", xlab = "Hours a Week Spent\nStudying",
        ylab = "Final Grade", main = "Horde"))
    with(practice[practice$horde == 1, ], points(x = hrs, y = y, col = "red", cex = 0.5,
        pch = 20))
    with(practice, plot(x = hrs, y = y, type = "n", xlab = "Hours a Week Spent\nStudying",
        ylab = "Final Grade", main = "Alliance"))
    with(practice[practice$horde == 0, ], points(x = hrs, y = y, col = "black", cex = 0.5,
        pch = 20))
[Figure: two scatterplots, "Horde" and "Alliance", each showing Final Grade against Hours a Week Spent Studying]

    par(mfrow = c(1, 1))

9. Estimate a simple linear regression predicting final grade, y, as a function of average hours a week spent studying, hrs. Interpret the coefficients, standard errors, and p-values from this model. Calculate a 95-percent confidence interval for the coefficient on hrs. Now calculate a 95-percent confidence interval for the same coefficient using the percentile bootstrap method with 1,000 replications. How do the two confidence intervals compare? When would you prefer to use the bootstrap versus the analytic confidence interval? What assumptions do you need to make for the bootstrap interval? What assumptions do you need to make for the analytic interval? Would these assumptions be reasonable, given what you might assume about the research design that generated these data?

    fm1 <- lm(y ~ hrs, data = practice)
    confint(fm1)[2, ]
    (ci.upper <- coef(fm1)[2] + qt(0.975, 498) * sqrt(vcov(fm1)[2, 2]))
    (ci.lower <- coef(fm1)[2] - qt(0.975, 498) * sqrt(vcov(fm1)[2, 2]))

    n <- 500
    R <- 1000
    bs.est <- rep(NA, R)
    for (i in 1:R) {
      s <- sample(1:n, replace = TRUE)          # resample row indices
      f <- lm(y ~ hrs, data = practice[s, ])    # refit on the resampled rows
      bs.est[i] <- coef(f)[2]
    }
    quantile(bs.est, c(0.025, 0.975))
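One situation where the two intervals in question 9 can part ways is non-constant error variance, which the usual analytic interval assumes away. The sketch below illustrates this on simulated stand-in data. The data-generating process, the sample size, and all names (`toy`, `ci_analytic`, `ci_boot`) are assumptions made for illustration, not part of the assignment.

```r
# Simulated stand-in data with non-constant error variance (hypothetical)
set.seed(2)
n   <- 300
hrs <- runif(n, 1, 20)
y   <- 60 + 1.5 * hrs + rnorm(n, sd = 0.5 * hrs)  # noise grows with hrs
toy <- data.frame(y, hrs)

fit <- lm(y ~ hrs, data = toy)
ci_analytic <- confint(fit)["hrs", ]

# Percentile bootstrap: resample rows with replacement, refit, keep the slope
R <- 1000
boot_slopes <- replicate(R, {
  rows <- sample(n, replace = TRUE)
  coef(lm(y ~ hrs, data = toy[rows, ]))["hrs"]
})
ci_boot <- quantile(boot_slopes, probs = c(0.025, 0.975))

rbind(analytic = ci_analytic, bootstrap = ci_boot)
```

Resampling whole rows keeps each y paired with its own hrs, so the bootstrap does not lean on a constant-variance assumption. It still assumes the rows are independent draws from the population of interest.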
    confint(fm1)[2, ]

10. Plot the residuals from your linear model against their predicted (fitted) values. What should this plot look like if the assumptions of OLS are met? What does it look like?

    plot(x = fitted(fm1), y = resid(fm1))

[Figure: residuals of fm1 plotted against fitted values of fm1]

11. Propose (and estimate) an alternative model for the relationship between grades and time spent studying. Again calculate 95-percent confidence intervals using both the analytic method and the percentile bootstrap method. Compare your results to those obtained from the simple bivariate regression. How does the coefficient on hrs change? Did you include a coefficient for Warcraft faction (horde)? Why or why not? Overall, does your model do a better job explaining variation in final grades?

Bonus: What's the optimal amount of time someone should spend studying if they want to maximize their expected final grade?

    practice$hrs.sq <- practice$hrs^2
    fm2 <- lm(y ~ hrs + hrs.sq, data = practice)
    fm2.1 <- lm(y ~ hrs + hrs.sq + horde, data = practice)

    summary(fm2)
    # Call:
    # lm(formula = y ~ hrs + hrs.sq, data = practice)
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  6.41e                        <2e-16 ***
    # hrs          2.22e                        <2e-16 ***
    # hrs.sq      -5.46e                        <2e-16 ***
    #
    # Residual standard error: 4.13 on 497 degrees of freedom
    # Multiple R-squared: 0.614
    # F-statistic: 396 on 2 and 497 DF, p-value: <2e-16

    summary(fm2.1)
    # Call:
    # lm(formula = y ~ hrs + hrs.sq + horde, data = practice)
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  6.44e                        <2e-16 ***
    # hrs          2.21e                        <2e-16 ***
    # hrs.sq      -5.45e                        <2e-16 ***
    # horde       -4.80e
    #
    # Residual standard error: 4.13 on 496 degrees of freedom
    # Multiple R-squared: 0.616
    # F-statistic: 265 on 3 and 496 DF, p-value: <2e-16

    plot(x = fitted(fm2), y = resid(fm2))

[Figure: residuals of fm2 plotted against fitted values of fm2]

    # Hours at which the fitted quadratic peaks: -b1 / (2 * b2)
    hrs.star <- -coef(fm2)[2]/(2 * coef(fm2)[3])
    hrs.star.1 <- -coef(fm2.1)[2]/(2 * coef(fm2.1)[3])

    par(mfrow = c(1, 1))
    plot(x = practice$hrs, y = practice$y, pch = 20, col = "grey", type = "n")
    points(x = practice$hrs, y = practice$y, pch = 20, col = "grey", cex = 0.5)
    pred.df <- expand.grid(hrs = sort(unique(practice$hrs)), horde = 1)
    pred.df$hrs.sq <- pred.df$hrs * pred.df$hrs
    pred.y <- predict(fm2, newdata = pred.df)
    lines(x = sort(unique(practice$hrs)), y = pred.y, col = "red", lty = 1)
    abline(v = hrs.star, col = "red", lty = 2)

[Figure: practice$y plotted against practice$hrs, with the fitted quadratic curve and a dashed vertical line at hrs.star]

    n <- 500
    R <- 1000
    bs.est.2 <- matrix(NA, nrow = R, ncol = 2)
    for (i in 1:R) {
      s <- sample(1:n, replace = TRUE)                   # resample row indices
      f <- lm(y ~ hrs + hrs.sq, data = practice[s, ])    # refit on the resampled rows
      bs.est.2[i, ] <- coef(f)[2:3]
    }
    quantile(bs.est.2[, 1], c(0.025, 0.975))   # hrs
    quantile(bs.est.2[, 2], c(0.025, 0.975))   # hrs.sq
    confint(fm2)

    par(mfrow = c(2, 2), pty = "s", mgp = c(1.5, 0.5, 0), oma = rep(0, 4))
    plot(fm2)
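The bonus in question 11 rests on a small piece of calculus: a fitted quadratic b0 + b1 * hrs + b2 * hrs^2 has derivative b1 + 2 * b2 * hrs, which is zero at hrs = -b1 / (2 * b2). The sketch below checks that formula on simulated data where the true peak is known to be at 12 hours. The coefficients, sample size, and names are hypothetical and chosen only for illustration.

```r
# Simulated concave relationship (hypothetical coefficients, true peak at 12)
set.seed(3)
hrs <- runif(400, 1, 20)
y   <- 60 + 2.4 * hrs - 0.1 * hrs^2 + rnorm(400, sd = 2)
toy <- data.frame(y, hrs)

# I() lets the squared term be written inside the formula
fit <- lm(y ~ hrs + I(hrs^2), data = toy)
b   <- coef(fit)

# Setting the derivative b1 + 2 * b2 * hrs to zero gives the turning point
hrs_star <- unname(-b[2] / (2 * b[3]))

# It is a maximum only if the quadratic coefficient is negative
is_maximum <- unname(b[3] < 0)

c(hrs_star = hrs_star, is_maximum = is_maximum)
```

Two cautions apply when reading such a number: the turning point is a ratio of estimates and so has its own uncertainty (the bootstrap draws of both coefficients could be used to approximate it), and it is only meaningful if it falls inside the range of hours actually observed.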
[Figure: four diagnostic panels from plot(fm2): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance)]

12. Do your results suggest a causal relationship between time spent studying and final grades in this class? What are some factors that might lead to a potentially spurious relationship between studying and grades? Propose a strategy or strategies to identify the causal effect of one extra hour of studying a week on a student's final grade. Discuss the benefits, limitations, and potential difficulties of your approach.

Hint: You'll need to be clear about what you mean by "studying causes grades." If you are not already comfortable with the idea of potential outcomes, you might want to learn about it. For example, see Field Experiments Ch02.pdf, Field Experiments Ch01.pdf, and for a canonical piece http://jakebowers.org/itvexperiments/holland86wdisc.pdf.
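A small simulation can make the worry about spurious relationships in question 12 tangible. In the sketch below, an unobserved trait (called `motivation` here) raises both study hours and grades, so the observational regression overstates the effect of an extra hour, while randomly assigned hours recover it. Every number and variable name is a hypothetical assumption for illustration and says nothing about how the practice data were generated.

```r
set.seed(4)
n <- 5000
true_effect <- 1.5                      # hypothetical effect of one extra hour

# Observational world: motivation drives both studying and grades
motivation <- rnorm(n)
hrs_obs    <- 10 + 2 * motivation + rnorm(n)
grade_obs  <- 60 + true_effect * hrs_obs + 5 * motivation + rnorm(n, sd = 3)
naive      <- coef(lm(grade_obs ~ hrs_obs))[["hrs_obs"]]

# Experimental world: hours are assigned at random, independent of motivation
hrs_rand   <- sample(5:15, n, replace = TRUE)
grade_rand <- 60 + true_effect * hrs_rand + 5 * motivation + rnorm(n, sd = 3)
randomized <- coef(lm(grade_rand ~ hrs_rand))[["hrs_rand"]]

c(true = true_effect, naive = naive, randomized = randomized)
```

In this setup the naive slope lands well above the true effect because hours stand in for motivation. Randomization breaks that link, though in practice it raises its own difficulties, such as whether students comply with the hours they are assigned.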
Examples of fitting various piecewise-continuous functions to data, using basis functions in doing the regressions. David. Boore These examples in this document used R to do the regression. See also Notes_on_piecewise_continuous_regression.doc
More informationSLR output RLS. Refer to slr (code) on the Lecture Page of the class website.
SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationLecture 1 Intro to Spatial and Temporal Data
Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1
More informationSTAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS
STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationStat 401B Final Exam Fall 2015
Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More informationHomework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.
Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationEstimated Simple Regression Equation
Simple Linear Regression A simple linear regression model that describes the relationship between two variables x and y can be expressed by the following equation. The numbers α and β are called parameters,
More informationThe Statistical Sleuth in R: Chapter 9
The Statistical Sleuth in R: Chapter 9 Linda Loi Kate Aloisio Ruobing Zhang Nicholas J. Horton January 21, 2013 Contents 1 Introduction 1 2 Effects of light on meadowfoam flowering 2 2.1 Data coding, summary
More informationUnderstanding p Values
Understanding p Values James H. Steiger Vanderbilt University James H. Steiger Vanderbilt University Understanding p Values 1 / 29 Introduction Introduction In this module, we introduce the notion of a
More informationLinear Probability Model
Linear Probability Model Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables. If
More informationNote on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin
Note on Bivariate Regression: Connecting Practice and Theory Konstantin Kashin Fall 2012 1 This note will explain - in less theoretical terms - the basics of a bivariate linear regression, including testing
More informationOrdinary Least Squares Regression Explained: Vartanian
Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationStat 5303 (Oehlert): Randomized Complete Blocks 1
Stat 5303 (Oehlert): Randomized Complete Blocks 1 > library(stat5303libs);library(cfcdae);library(lme4) > immer Loc Var Y1 Y2 1 UF M 81.0 80.7 2 UF S 105.4 82.3 3 UF V 119.7 80.4 4 UF T 109.7 87.2 5 UF
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More informationIES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc
IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared
More informationSMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning
SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance
More informationSimple Linear Regression: One Quantitative IV
Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,
More informationLecture 5 - Plots and lines
Lecture 5 - Plots and lines Understanding magic Let us look at the following curious thing: =rnorm(100) y=rnorm(100,sd=0.1)+ k=ks.test(,y) k Two-sample Kolmogorov-Smirnov test data: and y D = 0.05, p-value
More informationSwarthmore Honors Exam 2012: Statistics
Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may
More informationRegression Diagnostics
Diag 1 / 78 Regression Diagnostics Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2015 Diag 2 / 78 Outline 1 Introduction 2
More information