Stat 4510/7510 Homework 7
- Shannon Tate
- 5 years ago
Stat 4510/7510 Due: 1/10

Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that the grader can determine how you obtained your answer.

1. Concrete is the most important material in civil engineering. Concrete compressive strength is a highly nonlinear function of age and ingredients. The dataset concrete.csv contains the following information:

- Cement (component 1) (kg per cubic meter)
- Blast Furnace Slag (component 2) (kg per cubic meter)
- Fly Ash (component 3) (kg per cubic meter)
- Water (component 4) (kg per cubic meter)
- Superplasticizer (component 5) (kg per cubic meter)
- Coarse Aggregate (component 6) (kg per cubic meter)
- Fine Aggregate (component 7) (kg per cubic meter)
- Age (day)
- Concrete compressive strength (MPa, megapascals)

Use these data to answer the following questions.

(a). Use the poly() function to fit polynomial regressions for predicting Compressive Strength using Age. Plot the data and add these polynomial fits ranging from degree 1 to 7. Be sure to include a legend (see ?legend). Additionally, display a table of the RSS for each degree.
    concrete=read.csv("concrete.csv")
    concrete=concrete[,-1]   # drop the first column
    plot(concrete$Age,concrete$Concretecompressivestrength,
         xlab="Age",ylab="Compressive Strength")
    # add the fitted curve for each polynomial degree, one color per degree
    for(i in 1:7){
      fit=lm(Concretecompressivestrength~poly(Age,degree=i,raw=TRUE),data=concrete)
      points(seq(0,400,length.out=1000),
             predict.lm(fit,newdata=list(Age=seq(0,400,length.out=1000))),
             col=i+1,type="l")}
    legend("bottomright",col=2:8,
           legend=c("degree 1","degree 2","degree 3","degree 4",
                    "degree 5","degree 6","degree 7"),lty=1)

[Figure: Compressive Strength versus Age, with polynomial fits of degree 1 to 7]
    RSS=NULL
    for(i in 1:7){
      fit=lm(Concretecompressivestrength~poly(Age,degree=i,raw=TRUE),data=concrete)
      RSS[i]=sum(fit$resid^2)}

[Table: RSS by polynomial degree]

(b). Based on your plot in (a), which polynomial degree do you think fits the trends in the data best?

The degree 3 polynomial appears to fit the trend best. The higher-order polynomials appear to overfit the data.

(c). Now use 10-fold cross validation to select the best degree polynomial. Which degree was chosen? Does it match the conclusion you made in (b)? You might consider making a plot to justify your decision.

    K=10
    cv.error=matrix(ncol=7,nrow=K)   # rows: folds, columns: polynomial degrees
    set.seed(1)
    folds = sample(1:K,nrow(concrete),replace=TRUE)
    for(k in 1:K){
      CV.train = concrete[folds != k,]
      CV.test = concrete[folds == k,]
      for(i in 1:7){
        cv.fit=lm(Concretecompressivestrength~poly(Age,degree=i,raw=TRUE),
                  data=CV.train)
        cv.pred=predict.lm(cv.fit,newdata=CV.test)
        cv.error[k,i]=mean((cv.pred-CV.test$Concretecompressivestrength)^2)
      }}
    apply(cv.error,2,mean)   # mean CV error for each degree
    [1]

In terms of out-of-sample prediction, the models with polynomial degree 4 or higher have nearly the same prediction error. Therefore, we suggest the degree 4 model, since the models of degree 5 or higher are likely overfitting. This does not match the degree 3 chosen by eye in (b).
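The question in (c) suggests a plot to justify the choice of degree. The following is a minimal sketch of one way to do that, not the solution's own code. It runs on simulated data (the data frame sim and the column Strength are hypothetical stand-ins for the concrete data), assigns balanced folds instead of sampling fold labels with replacement, and uses the default orthogonal poly() basis, which spans the same model space as raw=TRUE but is better conditioned at high degree.

```r
# Simulated stand-in for the concrete data (hypothetical values, not the real file)
set.seed(1)
n <- 300
sim <- data.frame(Age = sample(1:365, n, replace = TRUE))
sim$Strength <- 20 + 12 * log(sim$Age) + rnorm(n, sd = 5)

K <- 10
max.degree <- 7

# Balanced folds: shuffle the labels 1..K repeated to length n
folds <- sample(rep(1:K, length.out = n))

cv.error <- matrix(NA, nrow = K, ncol = max.degree)
for (k in 1:K) {
  train <- sim[folds != k, ]
  test  <- sim[folds == k, ]
  for (d in 1:max.degree) {
    fit  <- lm(Strength ~ poly(Age, degree = d), data = train)
    pred <- predict(fit, newdata = test)
    cv.error[k, d] <- mean((pred - test$Strength)^2)
  }
}

# Average over folds and plot the error curve against degree
cv.mean <- colMeans(cv.error)
plot(1:max.degree, cv.mean, type = "b",
     xlab = "Polynomial degree", ylab = "Mean 10-fold CV error")
best.degree <- which.min(cv.mean)
```

A flat stretch of the curve after some degree is the pattern the answer above describes: once the error stops improving, the smallest degree on the plateau is a reasonable choice.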
(d). Use the bs() function to fit a regression spline to predict Compressive Strength using Age. Report the output for the fit using 5 degrees of freedom (which results in 1 knot!). Plot the resulting fit. Where is the knot placed?

    library(splines)
    fit=lm(Concretecompressivestrength~bs(Age,df=5),data=concrete)
    summary(fit)

    Call:
    lm(formula = Concretecompressivestrength ~ bs(Age, df = 5), data = concrete)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                   < 2e-16 ***
    bs(Age, df = 5)1                                      **
    bs(Age, df = 5)2                              < 2e-16 ***
    bs(Age, df = 5)3                              < 2e-16 ***
    bs(Age, df = 5)4
    bs(Age, df = 5)5                                 e-15 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 13.7 on 1024 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 5 and 1024 DF, p-value: < 2.2e-16

    plot(concrete$Age,concrete$Concretecompressivestrength,
         xlab="Age",ylab="Compressive Strength")
    # use one grid for both the x positions and the predictions
    age.grid=seq(1,365,length.out=1000)
    points(age.grid,
           predict.lm(fit,newdata=list(Age=age.grid)),
           col=2,type="l")
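As a check on the knot placement asked about in (d), the sketch below counts the interior knots that bs() chooses for df=5 with and without an intercept column in the basis. It uses simulated ages (the vector age is a hypothetical stand-in for the Age column), so only the knot counts and the median property carry over, not the knot values.

```r
library(splines)

# Hypothetical ages standing in for the Age column of the concrete data
set.seed(1)
age <- sample(1:365, 500, replace = TRUE)

# Default cubic basis, no intercept column: df - 3 interior knots
knots.default <- attr(bs(age, df = 5), "knots")

# Cubic basis with an intercept column: df - 4 interior knots
knots.intercept <- attr(bs(age, df = 5, intercept = TRUE), "knots")

length(knots.default)    # 2
length(knots.intercept)  # 1

# The single knot sits at the median of the data
isTRUE(all.equal(unname(knots.intercept), median(age)))
```

The single knot at the median therefore corresponds to the intercept=TRUE basis; a model fitted with bs(Age, df=5) and the default intercept=FALSE places its two knots at the 1/3 and 2/3 quantiles instead.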
[Figure: Compressive Strength versus Age, with the fitted regression spline]

    #attr(bs(concrete$Age,df=5,intercept=T),"knots")

For 5 degrees of freedom, there is only 1 knot, which is at the median value of Age, 28. This single knot is what bs() reports with intercept=TRUE, as in the call above; with the default intercept=FALSE used in the fitted model, df=5 gives df - 3 = 2 interior knots.

(e). (7510*) Perform 10-fold cross-validation in order to select the best single-knot model. Note that the knot locations can be anywhere between (but not equal to) the minimum and maximum Age values in the data. Make a plot of the errors versus knot location and describe your results.

    K=10
    cv.error=matrix(ncol=365,nrow=K)   # rows: folds, columns: candidate knot locations
    set.seed(1)
    folds = sample(1:K,nrow(concrete),replace=TRUE)
    for(k in 1:K){
      CV.train = concrete[folds != k,]
      CV.test = concrete[folds == k,]
      # knots strictly between the minimum (1) and maximum (365) Age
      for(i in 2:364){
        cv.fit=lm(Concretecompressivestrength~bs(Age,knots=i),data=CV.train)
        cv.pred=predict.lm(cv.fit,newdata=CV.test)
        cv.error[k,i]=mean((cv.pred-CV.test$Concretecompressivestrength)^2)
      }}
    plot(1:365,apply(cv.error,2,mean))

[Figure: mean 10-fold CV error, apply(cv.error, 2, mean), versus knot location, 1:365]

The best single-knot location chosen by 10-fold cross validation is Age=159.

(f). Fit a smoothing spline and use cross validation to select λ. What is the chosen degrees of freedom? Plot the fit along with the fit from part (d). How do they compare?
    fit.ss=smooth.spline(x=concrete$Age,y=concrete$Concretecompressivestrength,
                         cv=TRUE)
    fit.ss$df
    [1]

    fit.bs=lm(Concretecompressivestrength~bs(Age,knots=159),data=concrete)
    plot(concrete$Age,concrete$Concretecompressivestrength)
    lines(fit.ss,col=2)
    # use one grid for both the x positions and the predictions
    age.grid=seq(1,365,length.out=1000)
    points(age.grid,
           predict.lm(fit.bs,newdata=list(Age=age.grid)),
           col=4,type="l")
    legend("bottomright",col=c(2,4),
           legend=c("smoothing spline","regression spline"),lty=1)
[Figure: concrete$Concretecompressivestrength versus concrete$Age, with the smoothing spline and regression spline fits]

(g). Split the data into a 90% training set and a 10% test set. Be sure to set a seed of 1 for consistency of results.

    set.seed(1)
    train.set = sample(1:nrow(concrete),.9*nrow(concrete),replace=FALSE)
    concrete.train=concrete[train.set,]
    concrete.test=concrete[-train.set,]

(h). Fit a linear regression on the training data using Compressive Strength as the response and all other variables as predictors. Which variables are significant?
    fit.lm=lm(Concretecompressivestrength~.,data=concrete.train)
    summary(fit.lm)

    Call:
    lm(formula = Concretecompressivestrength ~ ., data = concrete.train)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    Cement                                        < 2e-16 ***
    BlastFurnaceSlag                              < 2e-16 ***
    FlyAsh                                           e-10 ***
    Water                                                 ***
    Superplasticizer                                      **
    CoarseAggregate
    FineAggregate
    Age                                           < 2e-16 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 918 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 8 and 918 DF, p-value: < 2.2e-16

All of the variables are significant at the α = 0.1 level.

(i). Predict Compressive Strength on the test set. What is the test MSE?

    pred.lm=predict.lm(fit.lm,newdata=concrete.test)
    mean((concrete.test$Concretecompressivestrength-pred.lm)^2)
    [1]

The test MSE is

(j). Fit a GAM on the training data using Compressive Strength as the response and all variables except Age as linear predictors. For the Age variable, investigate different degrees of freedom for the smoothing splines. Is there evidence of a non-linear relationship?
    library(gam)
    Loading required package: foreach
    Loaded gam 1.16

    fit.gam1=gam(Concretecompressivestrength~.,data=concrete.train)
    fit.gam5=gam(Concretecompressivestrength~Cement+BlastFurnaceSlag+
                 FlyAsh+Water+Superplasticizer+CoarseAggregate+
                 FineAggregate+s(Age,5),data=concrete.train)
    anova(fit.gam1,fit.gam5)

    Analysis of Deviance Table

    Model 1: Concretecompressivestrength ~ Cement + BlastFurnaceSlag + FlyAsh +
        Water + Superplasticizer + CoarseAggregate + FineAggregate + Age
    Model 2: Concretecompressivestrength ~ Cement + BlastFurnaceSlag + FlyAsh +
        Water + Superplasticizer + CoarseAggregate + FineAggregate + s(Age, 5)
      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
                                       < 2.2e-16 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes. The p-value comparing the linear model with the non-linear GAM with 5 degrees of freedom for Age is significant. Therefore, there is strong evidence of a non-linear relationship.

(k). Predict Compressive Strength on the test set for different degrees of freedom of the smoothing spline for Age. What degrees of freedom has the lowest test MSE? Is it better than that of the linear regression from part (i)?

    ss.error=NA
    # test MSE for each choice of degrees of freedom on Age
    for(i in 1:20){
      fit.gam=gam(Concretecompressivestrength~Cement+BlastFurnaceSlag+
                  FlyAsh+Water+Superplasticizer+CoarseAggregate+
                  FineAggregate+s(Age,i),data=concrete.train)
      pred.gam=predict(fit.gam,newdata=concrete.test)
      ss.error[i]=mean((concrete.test$Concretecompressivestrength-pred.gam)^2)
    }

The test MSE from the GAM with 7 degrees of freedom on the Age variable is 42.3, which is much lower than that from the linear model. Therefore, we conclude that the GAM is better than the linear model.
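The comparison in (k) comes down to scanning the test MSE over degrees of freedom and reading off the minimum. The sketch below illustrates that step on simulated data and is not the solution's own code: it swaps in a natural cubic spline, ns() from the splines package, inside lm() as a stand-in for the gam package's s() smoother, and the data frame sim with columns Cement, Age and Strength is hypothetical.

```r
library(splines)

# Simulated data (hypothetical; one mix variable stands in for the seven in the real data)
set.seed(1)
n <- 500
sim <- data.frame(Cement = runif(n, 100, 500),
                  Age = sample(1:365, n, replace = TRUE))
sim$Strength <- 0.08 * sim$Cement + 10 * log(sim$Age) + rnorm(n, sd = 4)

# 90% / 10% train-test split
train.id <- sample(1:n, 0.9 * n)
train <- sim[train.id, ]
test  <- sim[-train.id, ]

# Baseline: Age enters linearly
fit.lin <- lm(Strength ~ Cement + Age, data = train)
mse.lin <- mean((test$Strength - predict(fit.lin, newdata = test))^2)

# Natural cubic spline in Age for a range of degrees of freedom
dfs <- 1:10
mse.ns <- sapply(dfs, function(d) {
  fit <- lm(Strength ~ Cement + ns(Age, df = d), data = train)
  mean((test$Strength - predict(fit, newdata = test))^2)
})

# Degrees of freedom with the lowest test MSE, next to the linear baseline
best.df <- dfs[which.min(mse.ns)]
c(linear = mse.lin, spline = min(mse.ns), best.df = best.df)
```

Because a single test split is noisy, the minimizing degrees of freedom can shift with the seed; the gap between the linear and spline errors is the more stable quantity to report.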
More informationSTK4900/ Lecture 7. Program
STK4900/9900 - Lecture 7 Program 1. Logistic regression with one redictor 2. Maximum likelihood estimation 3. Logistic regression with several redictors 4. Deviance and likelihood ratio tests 5. A comment
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More informationGMM - Generalized method of moments
GMM - Generalized method of moments GMM Intuition: Matching moments You want to estimate properties of a data set {x t } T t=1. You assume that x t has a constant mean and variance. x t (µ 0, σ 2 ) Consider
More informationHands on cusp package tutorial
Hands on cusp package tutorial Raoul P. P. P. Grasman July 29, 2015 1 Introduction The cusp package provides routines for fitting a cusp catastrophe model as suggested by (Cobb, 1978). The full documentation
More informationDiagnostics and Transformations Part 2
Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics
More informationCOMPARING PARAMETRIC AND SEMIPARAMETRIC ERROR CORRECTION MODELS FOR ESTIMATION OF LONG RUN EQUILIBRIUM BETWEEN EXPORTS AND IMPORTS
Applied Studies in Agribusiness and Commerce APSTRACT Center-Print Publishing House, Debrecen DOI: 10.19041/APSTRACT/2017/1-2/3 SCIENTIFIC PAPER COMPARING PARAMETRIC AND SEMIPARAMETRIC ERROR CORRECTION
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationNested 2-Way ANOVA as Linear Models - Unbalanced Example
Linear Models Nested -Way ANOVA ORIGIN As with other linear models, unbalanced data require use of the regression approach, in this case by contrast coding of independent variables using a scheme not described
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More information