Lecture 5 - Plots and lines


Understanding magic

Let us look at the following curious thing:

x=rnorm(100)
y=rnorm(100,sd=0.1)+x
k=ks.test(x,y)
k

        Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.05, p-value = 0.9996
alternative hypothesis: two.sided

print(k)

        Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.38, p-value = 1.071e-06
alternative hypothesis: two.sided

If everything in R is vectors or lists, why is the result of the KS test printed like this? Here is another example:

s=summary(x)
s

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 -2.22700  -0.57190  -0.08799   0.01895   0.72390   2.04100

The magic is the following: even though things are always vectors or lists, they can have a class. To see the class of an object, we can use the class() function:

class(s)
[1] "table"

class(k)

[1] "htest"

And now the trick: when we say print(k), the actual function that is called is print.htest() for k, and print.table() for s.

print.htest(k)

        Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.38, p-value = 1.071e-06
alternative hypothesis: two.sided

print.table(s)

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 -2.22700  -0.57190  -0.08799   0.01895   0.72390   2.04100

print.htest(s)
Error in strsplit(x, as.character(split), as.logical(extended)) :
        non-character argument in strsplit()

print
function (x, ...)
UseMethod("print")
<environment: namespace:base>

All this is not very important, yet. It only matters if we want to understand what function is called.

help(print.htest)
help() for print.htest is shown in browser netscape ...
Use help("print.htest", htmlhelp=FALSE) or options(htmlhelp = FALSE) to revert.
Warning message:
Using non-linked HTML file: style sheet and hyperlinks may be incorrect in: help(print.htest)

Another function that depends on its input is plot().

plot
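As a small sketch of this dispatch mechanism, we can tag an ordinary list with a class of our own and supply a matching print method. The class name "mything" here is invented purely for illustration:

```r
# Sketch: S3 dispatch with an invented class "mything".
obj <- list(value = 42)   # an ordinary list...
class(obj) <- "mything"   # ...tagged with a class

# print.<class> is what UseMethod("print") looks for:
print.mything <- function(x, ...) {
  cat("A mything with value", x$value, "\n")
}

print(obj)   # dispatches to print.mything()
```

This is exactly how print(k) ends up in print.htest(): the generic print() calls UseMethod("print"), which picks the method matching the object's class.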

function (x, y, ...)
{
    if (is.null(attr(x, "class")) && is.function(x)) {
        nms <- names(list(...))
        if (missing(y))
            y <- {
                if (!"from" %in% nms) 0
                else if (!"to" %in% nms) 1
                else if (!"xlim" %in% nms) NULL
            }
        if ("ylab" %in% nms)
            plot.function(x, y, ...)
        else plot.function(x, y, ylab = paste(deparse(substitute(x)),
            "(x)"), ...)
    }
    else UseMethod("plot")
}
<environment: namespace:base>

plot(x,y);v()

[scatterplot of x against y]

z=as.factor(x>0)

plot(z,x);v()

[boxplots of x for the two levels FALSE and TRUE of z]

plot(sin,from=-3,to=3);v()

[plot of sin(x) on the interval -3 to 3]

And now we can understand how to bring up help for these functions:

class(sin)
[1] "function"

help(plot.function)
help() for plot.function is shown in browser netscape ...
Use help("plot.function", htmlhelp=FALSE) or options(htmlhelp = FALSE) to revert.
Warning message:
Using non-linked HTML file: style sheet and hyperlinks may be incorrect in: help(plot.function)

class(z)
[1] "factor"

help(plot.factor)
help() for plot.factor is shown in browser netscape ...
Use help("plot.factor", htmlhelp=FALSE) or options(htmlhelp = FALSE) to revert.
Warning message:
Using non-linked HTML file: style sheet and hyperlinks may be incorrect in: help(plot.factor)

Some plot commands we already encountered

qqplot

Last time we learned about qqplot and qqnorm. Let us see how to interpret them:

x=rnorm(1000)

qqnorm(x);v()

[Normal Q-Q plot of x: sample quantiles against theoretical quantiles]

abline(0,1,col="red");v()

[the same Q-Q plot with the line y = x added in red]

Warning message:
parameter "color" couldn't be set in high-level plot() function

abline adds a line to the plot. It has two parameters: the intercept and the slope, in that order. The col= parameter to plots specifies color.

qqnorm(x);abline(0,0,col="red");abline(1,0,col="green");abline(0,1,col="blue")
abline(0,2,col="orange");v()

[the Q-Q plot with four lines of different intercepts and slopes added]

Let us add points on the left to the normal:

x=c( rnorm(1000), rnorm(100,mean=-5) )

hist(x,br=30);v()

[histogram of x: the main bulk near 0 plus a second bump near -5]

qqnorm(x);abline(0,1,col=2);v()

[Normal Q-Q plot: the leftmost points fall below the red line y = x]

If we have more points on the left than we should have, then quantiles occur earlier, and thus the points are lower than they should be.

x=rnorm(1000)
x=x[x>-1]
qqnorm(x); abline(0,1,col=2);v()

[Normal Q-Q plot of the truncated sample: the leftmost points lie above the line]

If we don't have enough points on the left, quantiles happen later, and the points move up. Now let us add points on the right:

x=c( rnorm(1000), rnorm(100,mean=3) )
qqnorm(x);abline(0,1,col=2);v()

[Normal Q-Q plot: the rightmost points rise above the line]

If we have too many points on the right, then the last quantiles occur too late, and the points move up.

plot

x=-10:10 ; y=x^2
plot(x,y) ; v()

[scatterplot of the parabola y = x^2 as points]

plot(x,y,type="l");v()

[the same parabola drawn as a line]

We can also add to an existing plot.

plot(x,y);v()

[the parabola as points]

points(x,x,col=2);v()

[the points (x, x) added in red]

abline(40,0,col="green");v()

[a horizontal green line at y = 40 added]

lines(x,y);v()

[the parabola connected by lines]

title("A nice plot");v()

[the same plot with the title "A nice plot"]

plot(x,y,pch="M", col=3, xlab="this is x", ylab="this is y", main="A very nice plot");v()

[the parabola plotted in green with "M" as the plotting character, custom axis labels, and a title]

The most useful pch is pch=".".

plot(x,y,pch=".");v()

[the parabola plotted with tiny dots]

Saving a plot to a file

We have a couple of options for saving a plot to a file. One is saving as postscript; for others, look at help(devices). First we specify the device, and the file:

postscript(file="graph1.eps",width=5,height=5,horizontal=F,onefile=F)

Then we do all the plot commands:

plot(sin,-3,3)
abline(-1,0,col=2); abline(1,0,col=3)

Finally we close the device:

dev.off()
X11
  2

Putting several plots into one

two.by.two=matrix( 1:4, 2, 2)
two.by.two
     [,1] [,2]
[1,]    1    3
[2,]    2    4

layout(two.by.two)
layout.show(4);v()

[the device split into a 2-by-2 grid, panels numbered 1 to 4]
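The same open/plot/close pattern works for the other devices listed in help(devices). Here is a sketch using png() instead; the file name graph1.png is just an example:

```r
# Sketch: saving the same plot as a PNG instead of postscript.
png(file = "graph1.png", width = 480, height = 480)  # open the device
plot(sin, -3, 3)
abline(-1, 0, col = 2); abline(1, 0, col = 3)
dev.off()                                            # close it, flushing the file
```

Nothing appears on screen while the device is open; the plot only exists in the file once dev.off() is called.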

plot(sin,-3,3)
plot(cos,-3,3,col=2)
plot(tan,-1,1)
plot( function(x) x^2, -2, 2)
v()

[four small panels: sin, cos in red, tan, and x^2]

layout(1)
plot(sin,-3,3);v()

[a single full-size plot of sin]

l=matrix( c(1,1,2,3),2,2)
l
     [,1] [,2]
[1,]    1    2
[2,]    1    3

layout(l); layout.show(3); v()

[panel 1 fills the left half; panels 2 and 3 stack on the right]

plot(sin,-3,3); plot(cos,-6,6,col=2); plot(tan,-1.5,1.5,col=3)
v()

[sin on the left, cos and tan stacked on the right]

Why are the figures so small? R reserves space for the title, the axes, the labels, and so on. The space is proportional to the character size, and that size does not change when we split the plot.

par("mar")
[1] 5.1 4.1 4.1 2.1

Above we see how much space is kept for the margins, in lines: bottom, left, top, right. So, if we want to keep figures looking the same, we should scale the text:

par(cex=0.5)
plot(sin,-3,3); plot(cos,-6,6); plot(tan,-1,1); v()

[the three panels again, now with smaller text and larger plotting regions]

par("cex")
[1] 0.5

layout(1)
plot(sin,-3,3); v()

[a full-size plot of sin]

par("cex")
[1] 1

The function layout() changes the character size!

Linear regression

a=read.table("data/regression.txt",sep="\t",head=T)
a
  growth tannin
1     12      0

2     10      1
3      8      2
4     11      3
5      6      4
6      7      5
7      2      6
8      3      7
9      3      8

plot(a$tannin,a$growth,pch=16);v()

[scatterplot of growth against tannin: growth decreases as tannin increases]

The growth of the caterpillar seems to depend on the tannin content. We can try to fit a linear regression.

regression=lm( growth ~ tannin, data = a )
regression

Call:
lm(formula = growth ~ tannin, data = a)

Coefficients:
(Intercept)       tannin
     11.756       -1.217

This means: fit a linear model of growth, so that it depends on tannin.

plot( growth ~ tannin, data=a);v()

[the same scatterplot, drawn via the formula interface]

abline( regression );v()

[the scatterplot with the fitted regression line added]

R found a good line. The method tries to minimize the squared distance to the line. Here is another line:

reg1=lm( growth ~ 1, data=a)
reg1

Call:
lm(formula = growth ~ 1, data = a)

Coefficients:
(Intercept)
      6.889

In this case, R tries to fit a line that doesn't depend on anything, i.e. it is constant. The best constant to fit the growth is 6.889.

abline(6.889,0,col=2); v()

[the scatterplot with a horizontal red line at 6.889 added]

mean(a$growth)
[1] 6.888889

Such a line will be a good fit if we have normal errors. For example:

x = 1:100

y = 3*x + 15 + rnorm(100,sd=10)
plot(x,y); v()

[scatterplot of y against x: a noisy straight line]

reg=lm( y ~ x )
reg

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     16.567        2.977

We can see that the right values, 3 and 15, were almost found.

summary(reg)

Call:
lm(formula = y ~ x)

Residuals:
      Min        1Q    Median        3Q       Max
-31.14652  -5.59417  -0.03833   6.73383  31.15194

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.56712    2.12266   7.805  6.6e-12 ***
x            2.97744    0.03649  81.592  < 2e-16 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.53 on 98 degrees of freedom
Multiple R-Squared: 0.9855,     Adjusted R-squared: 0.9853
F-statistic:  6657 on 1 and 98 DF,  p-value: < 2.2e-16

Summary tells us that the standard error on the intercept is 2.1, and on the slope 0.03. That means that the estimate of the slope is better than that of the intercept.
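To connect this output back to the formulas, here is a sketch that computes the least-squares slope and intercept in closed form and checks them against lm(). A fixed seed is used, so the numbers will differ from the run above:

```r
# Sketch: closed-form least squares, checked against lm().
set.seed(1)                  # reproducible, but not the lecture's exact numbers
x <- 1:100
y <- 3 * x + 15 + rnorm(100, sd = 10)

slope     <- cov(x, y) / var(x)          # b = Sxy / Sxx
intercept <- mean(y) - slope * mean(x)   # a = ybar - b * xbar

reg <- lm(y ~ x)
coef(reg)                    # same two numbers as c(intercept, slope)

# The standard errors discussed above can also be extracted programmatically:
coef(summary(reg))[, "Std. Error"]
```

The (n-1) factors in cov() and var() cancel, so this is exactly the least-squares solution that lm() computes.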