Topics on Statistics 2
Pejman Mahboubi
March 7

1 Regression vs Anova

In Anova, groups are the predictors. When plotting, we can put the groups on the x axis in any order we wish, say in increasing or decreasing order of their means, or even in alphabetical order of the group names. There is no relation between the means $\mu_1, \mu_2, \dots, \mu_g$ other than the ones specified by constraints.

In regression, predictors are numbers, say the average winter daily temperature T, which has a natural order on the horizontal axis. Let y be the corresponding values of the mean energy consumption (E in the code below). Furthermore, the regression assumption is that these means form a straight line.

> set.seed(1114)
> T<-sort(runif(15,20,50))#independent or predictor variable
> E< *T+rnorm(15,0,.2)#dependent or response variable
> (df<-data.frame(Temperature=round(T,1),Energy=round(E,2)))
Temperature Energy
> plot(T,E,xlab="temperature",ylab="energy",cex.lab=.5,cex.axis=.4,pch=20,tcl=-0.1)
[Figure: scatter plot of Energy against Temperature]

1. We assume that at every temperature T = t, the value E = e is sampled from an independent normal distribution assigned to that specific temperature t.
2. Therefore, each temperature corresponds to its own population.
3. We assume that the variances of these populations are the same: $\sigma^2$.
4. We assume that the means of these populations lie on a straight line (the red line below).

What we are looking for is the equation of the straight line that best fits the points:

> plot(T,E,xlab="temperature",ylab="energy",cex.lab=.5,cex.axis=.4,pch=20,tcl=-0.1)
> abline(lm(E~T),col='red')
[Figure: scatter plot of Energy against Temperature with the fitted regression line in red]

2 Prediction

What is your predicted energy consumption on a day when the temperature is 30, or 43.44? The regression model claims that the average values of the energy consumption fall on a line,

$$e := e(t) = \beta_0 + \beta_1 t.$$

R computes $\beta_0$ and $\beta_1$:

> fit<-lm(Energy~Temperature,df)
> summary(fit)
Call:
lm(formula = Energy ~ Temperature, data = df)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
Temperature e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 13 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 13 DF, p-value: 4.541e-07

If you just want to see the coefficients, write

> coef(fit)
(Intercept) Temperature

The two numbers printed are $\beta_0$ (the intercept) and $\beta_1$ (the slope).

R has a prediction function. The idea is very simple; it does something like:

> predict_fn<-function(lmobj,t){
+ sum(coef(lmobj)*c(1,t))
+ }
> predict_fn(fit,44)
[1]

or we can use R's own function:

> newdata = data.frame(Temperature=44)
> predict(fit,newdata)
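The two prediction routes above can be compared on several temperatures at once. The following is a minimal, self-contained sketch, so it simulates its own data: the intercept 10 and slope -0.02 used to generate it are illustrative assumptions, not the values from these notes, and the names sim, fit_sim, predict_by_hand and temps are hypothetical.

```r
# Simulated stand-in for the energy data
# (the generating coefficients 10 and -0.02 are assumptions, not the notes' values)
set.seed(1)
temp   <- sort(runif(15, 20, 50))
energy <- 10 - 0.02 * temp + rnorm(15, 0, 0.2)
sim    <- data.frame(Temperature = temp, Energy = energy)

fit_sim <- lm(Energy ~ Temperature, data = sim)

# Hand-rolled prediction: beta0 + beta1 * t for each requested temperature
predict_by_hand <- function(lmobj, t) {
  beta <- coef(lmobj)
  unname(beta[1] + beta[2] * t)
}

temps      <- c(30, 43.44, 44)
by_hand    <- predict_by_hand(fit_sim, temps)
by_predict <- unname(predict(fit_sim, newdata = data.frame(Temperature = temps)))

# Both routes should agree up to floating-point error
print(cbind(temps, by_hand, by_predict))
```

Unlike predict_fn, this hand-rolled version is vectorized in t, so it accepts a whole vector of temperatures in one call.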
3 Fitted values vs. Actual

For each temperature t, the fitted value is the energy consumption predicted by the model, i.e. the y value of the point on the red line at that t. There are multiple ways to compute the fitted values.

1. Use the predict function with the original data frame
> predict(fit,df)

2. Use the object fit:
> fit$fitted.values

3. Use our own function
> sapply(df$Temperature,function(x) predict_fn(fit,x))
[1]
[9]

Residuals

Just as in anova, the difference between the actual value and the fitted value is called the residual:

residual at t = observed e - fitted value of e
residual at t = $e - \hat{e}$

We can easily compute all 15 residuals. There are multiple ways of doing this:

1. Use the predict function
> df$Energy-predict(fit,df)
2. Use the resid() function
> resid(fit)

Remember that the populations of energy corresponding to different temperatures are independent. Therefore, the residuals should look like independent normal draws: they behave approximately like a sample from a centered normal distribution with fixed variance. Let's take a look at the histogram of the residuals

> hist(resid(fit),10)

[Figure: histogram of resid(fit); x axis resid(fit), y axis Frequency]

qqnorm() is another tool that allows us to compare any set of numbers with the normal distribution.

> qqnorm(resid(fit),ann=FALSE,cex.lab=.4,cex.axis=.4,pch=20,tcl=-0.1,cex.main=.8)
> mtext(side=1,text="Theoretical Quantile", line=1,cex.lab=.4,cex.axis=.4,pch=20)
> mtext(side = 2, text = "Sample Quantile", line = 1,cex.lab=.4,cex.axis=.4,pch=20)
> qqline(resid(fit))

[Figure: normal Q-Q plot of the residuals; x axis Theoretical Quantile, y axis Sample Quantile]

4 Explained, Unexplained variation and R-squared

If we consider every temperature as a separate group, then, as in anova, we can define the sum of squares of the residuals as the unexplained variation. Similarly, we can compute the grand mean and define the sum of squared distances of the fitted values from the grand mean as the explained variation. In our example we have the residuals, so the unexplained variation is

> (within.var<-sum((resid(fit))^2))
[1]

The explained variation is

> (between.var<-sum((fitted.values(fit)-mean(df$Energy))^2))
[1]
and the total variation is

> (total.var<-var(df$Energy)*(length(df$Energy)-1))
[1]

And, once again, we can check that total.var equals within.var + between.var

> within.var+between.var
[1]

We can also compute the R-squared, which is the ratio of the explained variation to the total variation, or in code

> (r.sqrd<-between.var/total.var)
[1]

which matches the R-squared reported by the summary() function.

The F-statistic is also reported in the summary() output. Remember that the F-statistic is the ratio of the average explained variation to the average unexplained variation. So the question is: what are the degrees of freedom of the explained and unexplained variations?

The unexplained variation comes from the residuals. There are 15 residuals $r_1, \dots, r_{15}$, but they are not arbitrary numbers. Linear regression puts 2 constraints on them (we will soon see where these constraints come from):

Constraint 1. The sum of the residuals is zero:

$$r_1 + r_2 + \cdots + r_n = 0.$$

Let's check this:

> sum(resid(fit))
[1] e-17

Constraint 2. The predictor Temperature is perpendicular to the residuals, i.e.,

$$\text{Temperature} \cdot \text{Residual} = (t_1, \dots, t_{15}) \cdot (r_1, \dots, r_{15}) = t_1 r_1 + t_2 r_2 + \cdots + t_{15} r_{15} = 0.$$

Let's check this:

> sum(df$Temperature*resid(fit))
[1] e-14

Therefore, with two constraints, the residuals have 15 - 2 = 13 degrees of freedom.

Next, look at the between variation, which is built from the distances of the 15 fitted values to the horizontal line Energy = mean(df$Energy), the grand mean. The only parameter we need for computing between.var is $\beta_1$ (this is not entirely trivial). Therefore the explained variation has 1 degree of freedom. Therefore, the F-statistic is
> (between.var/1)/(within.var/13)
[1]

Definition 1 (Errors vs. Residuals). Let's draw 5 samples from N(10, 1)

> (x<-rnorm(5,10,1))
[1]

You can check that mean(x) is not exactly 10, and var(x) is not exactly 1.

> mean(x)
[1]
> var(x)
[1]

The errors are x - 10

> (errors=x-10)# distance to the mean of the population
[1]

You can check that mean(errors) != 0,

> mean(errors)
[1]

If we are not privy to the population, the errors remain unknown to us. But the residuals are x - mean(x)

> (residuals<-x-mean(x))
[1]

and the mean of the residuals is zero (up to floating-point rounding)

> mean(residuals)
[1] e-16
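Returning to the variance decomposition of this section, the pieces can be checked together in one run. The sketch below is self-contained and simulates its own data; the generating intercept 10 and slope -0.02 are illustrative assumptions, and names such as fit_sim, within_var and f_stat are hypothetical.

```r
# Simulated data; the generating coefficients are assumptions for illustration only
set.seed(2)
temp    <- sort(runif(15, 20, 50))
energy  <- 10 - 0.02 * temp + rnorm(15, 0, 0.2)
fit_sim <- lm(energy ~ temp)

n <- length(energy)
within_var  <- sum(resid(fit_sim)^2)                    # unexplained variation
between_var <- sum((fitted(fit_sim) - mean(energy))^2)  # explained variation
total_var   <- sum((energy - mean(energy))^2)           # total variation

# The two constraints on the residuals (both zero up to rounding error)
constraint1 <- sum(resid(fit_sim))
constraint2 <- sum(temp * resid(fit_sim))

# R-squared and F-statistic computed from the decomposition:
# 1 degree of freedom for the explained part, n - 2 for the residuals
r_squared <- between_var / total_var
f_stat    <- (between_var / 1) / (within_var / (n - 2))

print(c(r_squared = r_squared, f_stat = f_stat))
print(summary(fit_sim)$r.squared)
print(summary(fit_sim)$fstatistic[["value"]])
```

The last two printed values come from summary() and should agree with the hand-computed r_squared and f_stat.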
4.1 Equation of the regression line

Assume we are given the points

> head(df)
Temperature Energy

Likelihood Function

Cost Function

In linear regression, the cost function is the sum of squares of the residuals. Let f(x) = a + b x denote the regression line. Then

$$\text{residual corresponding to the point } T = E - f(T) = E - a - b\,T. \tag{1}$$

Then the cost function is

$$\text{cost}(df, f) = \sum_{1}^{15} (E - f(T))^2 \tag{2}$$
$$= \sum_{1}^{15} (E - a - b\,T)^2. \tag{3}$$

4.2 Pearson's Correlation Coefficient

If $x = x_1, \dots, x_n$ and $y = y_1, \dots, y_n$, then the correlation coefficient r between x and y is defined as

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

or

$$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\text{sd}(x)\,\text{sd}(y)} = \frac{\text{covar}(x, y)}{\text{sd}(x)\,\text{sd}(y)}$$

where $\bar{x}$ and $\bar{y}$ denote the means of x and y respectively.

1. $-1 \le r \le 1$. This follows from a mathematical inequality called the Schwarz inequality.
2. If r = -1, then there is a perfect negative linear relationship between x and y. If r = 1, then there is a perfect positive linear relationship between x and y.
3. If r = 0, then there is no linear relationship between x and y.
4. All other values of r tell us that the relationship between x and y is not perfect. The closer r is to 0, the weaker the linear relationship. The closer r is to -1, the stronger the negative linear relationship. And the closer r is to 1, the stronger the positive linear relationship.

The Schwarz inequality states that

$$\left(\sum_n x_n y_n\right)^2 \le \sum_n x_n^2 \sum_n y_n^2.$$

Let's see an example

> x<-rnorm(10);y<-runif(10,20,1000)
> #Schwarz inequality implies that
> (sum(x*y))^2<sum(x^2)*sum(y^2)
[1] TRUE

The dot product of two vectors x, y is a measure of similarity between the two. To make it comparable across vectors, we can normalize the vectors by their lengths. For example, for x = (1, 0) let us see which one of the following vectors is most similar to x according to the normalized dot product: y1 = (1, .2), y2 = (1, 1), y3 = (0, 1).

> x<-c(1,0)
> y1<-c(1,.2);y2<-c(1,1);y3<-c(0,1)
> #divide by the lengths (not the squared lengths) of both vectors
> sapply(list(y1,y2,y3),function(z) sum(x*z)/(sqrt(sum(x^2))*sqrt(sum(z^2))))
[1]

Returning to energy vs. temperature, let us see what the correlation coefficient between the two is.

> with(df,cor(Temperature,Energy))
[1]

It is very close to -1, but the slope of the regression line is

> coef(fit)
(Intercept) Temperature

That is very close to 0. So the value of b by itself doesn't say much about the true strength of the relation between x and y. But this is because the variance on the y axis is much smaller than the variance on the x axis. If we multiply b by sd(temperature)/sd(energy), we recover r:

> unname(coef(fit)[2])*sd(df$Temperature)/sd(df$Energy)
[1]

By the way, this gives us the formula for computing b, i.e.,

$$b = r\,\frac{\text{sd}(y)}{\text{sd}(x)} = \frac{\text{covar}(x, y)}{\text{sd}(x)\,\text{sd}(y)} \cdot \frac{\text{sd}(y)}{\text{sd}(x)} = \frac{\text{covar}(x, y)}{\text{var}(x)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
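The chain of equalities for b can be checked numerically. This is a minimal sketch on simulated data; the generating intercept 10 and slope -0.02 are assumptions chosen for illustration, and the variable names are hypothetical.

```r
# Simulated data; the generating coefficients are assumptions for illustration only
set.seed(3)
temp   <- sort(runif(15, 20, 50))
energy <- 10 - 0.02 * temp + rnorm(15, 0, 0.2)

r <- cor(temp, energy)

# Four routes to the same slope
b_from_r    <- r * sd(energy) / sd(temp)                 # b = r * sd(y) / sd(x)
b_from_cov  <- cov(temp, energy) / var(temp)             # b = covar(x, y) / var(x)
b_from_sums <- sum((temp - mean(temp)) * (energy - mean(energy))) /
               sum((temp - mean(temp))^2)                # ratio of centered sums
b_from_lm   <- unname(coef(lm(energy ~ temp))[2])        # slope reported by lm()

print(c(b_from_r, b_from_cov, b_from_sums, b_from_lm))

# Going the other way: rescaling the slope recovers r
print(c(b_from_lm * sd(temp) / sd(energy), r))
```

The factors 1/(n-1) in covar(x, y) and var(x) cancel, which is why the ratio of the raw centered sums gives the same slope.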
4.2.1 Connection to the R-squared

Both r and $R^2$ are indications of the goodness of the fit. For computing r we don't need to fit the data. To compute $R^2$ we need the explained and unexplained variations, which means that we need the fitted values to compute the residuals. We know that $0 \le R^2 \le 1$ and $-1 \le r \le 1$. However, $r^2$, just like $R^2$, is nonnegative, and in simple linear regression they are closely related. The relationship is

$$r^2 = R^2.$$

If $R^2 = .9$, then $r^2 = .9$, which means $|r| = \sqrt{.9} \approx 0.949$. So either $r \approx 0.949$ or $r \approx -0.949$. How can we find out which? The answer is: if b > 0, then r > 0, and if b < 0, then we take the negative value of r.

Example 1. Using the fit object, calculate Pearson's r.

Solution. Since the model fit is linear, we can compute r using $R^2$. First we find $R^2$

> summary(fit)$r.squared
[1]

Therefore, the Pearson correlation is one of the two square roots of $R^2$

> (sqrt(summary(fit)$r.squared))
[1]
> #or
> -(sqrt(summary(fit)$r.squared))
[1]

Since b is negative:

> (b=coef(fit)[2])
Temperature

r is also negative, so r is the negative root. We can compute r directly using the cor(x,y) function:

> cor(df$Energy,df$Temperature)
[1]

Remark. r measures the linear relation. $R^2$ measures the percentage of the variation explained by the model. So $R^2$ is not model independent, but r is.
Consider the following data:

> x<-sort(runif(20,-2,2))
> y<-x^2+rnorm(20,-3,.1)
> plot(x,y)

[Figure: scatter plot of y against x]

Here, there is an almost perfect relation between x and y, namely $y = x^2 - 3$. Let me add the plot of $y = x^2 - 3$:

> plot(x,y)
> a<-seq(from = -2,to = 2,.5)
> b<-a^2-3
> lines(a,b)

[Figure: scatter plot of y against x with the curve y = x^2 - 3]

If we did not know that a straight line is a poor fit here, we would do this:

> fit.1<-lm(y~x)

Let's look at the result:

> summary(fit.1)

Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) ***
x
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 18 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 18 DF, p-value:

Let's look at the plot of the prediction and the data

> plot(x,y)
> abline(fit.1,col='red')
[Figure: scatter plot of y against x with the fitted straight line in red]

So, what should we do? We know that a quadratic term would give a better result. Here is how we proceed

> fit.2<-lm(y~I(x^2)+I(x))
> summary(fit.2)

Call:
lm(formula = y ~ I(x^2) + I(x))

Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
I(x^2) <2e-16 ***
I(x)
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1389 on 2 and 17 DF, p-value: < 2.2e-16

Note the value of $R^2$ in this output. Let's check the correlation coefficient:

> cor(x,y)
[1]

How do we plot the data and the model?

> plot(x,y)
> lines(a,predict(object = fit.2,newdata = data.frame(x=a)),col='red')
[Figure: scatter plot of y against x with the fitted quadratic curve in red]

4.3 Optimization and Gradient Descent

Remember how we computed the coefficients a and b for the simple linear regression y = a + b x. Assume the data are $x = x_1, \dots, x_n$ and $y = y_1, \dots, y_n$. Then the fitted values are

$$\hat{y}_i = a + b\,x_i \quad \text{for } i = 1, \dots, n.$$

Then the residuals are

$$\text{res}_i = y_i - \hat{y}_i = y_i - (a + b\,x_i) \quad \text{for } i = 1, \dots, n.$$
Therefore, the cost function is

$$\text{cost}(\text{data}, \text{model}) = \sum_{i=1}^{n} \text{res}_i^2 = \sum_{i=1}^{n} \left(y_i - (a + b\,x_i)\right)^2.$$

Therefore, the cost is actually a function of a and b. The goal is to determine the a and b that minimize cost(a, b). Last time we saw how to do it mathematically, i.e., we compute the roots of the partial derivatives by solving

$$\frac{\partial}{\partial a}\,\text{cost} = 0, \qquad \frac{\partial}{\partial b}\,\text{cost} = 0.$$

For models with many predictors, software often avoids solving these equations directly, because the complexity increases significantly as the number of predictors grows. The direct solution involves inverting matrices, which is computationally very expensive.

Example 2. Compute the cost function for the energy data.

> #cost function
> cost<-function(a,b){
+ sum((df$Energy-a-b*df$Temperature)^2)
+ }
> (cost(5,-1))
[1]
> (cost(10,.02))
[1]

Since we solved this problem before, we already know the best values of a and b: they are the intercept and the slope reported by coef(fit). The cost function attains its minimum at these values.

> cost(coef(fit)[1],coef(fit)[2])
[1]

The method of gradient descent states that, starting from any point, say a = 3, b = 0
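One common form of the gradient descent iteration for this cost function is sketched below; it is not necessarily the variant developed in these notes. The sketch simulates its own data (the generating intercept 10 and slope -0.02 are assumptions), centers the predictor so that a fixed step size converges quickly, and picks the step size from the data; the centering, the step-size rule and the iteration count are all choices made for this illustration.

```r
# Simulated data; the generating intercept and slope are assumptions for illustration
set.seed(4)
temp   <- sort(runif(15, 20, 50))
energy <- 10 - 0.02 * temp + rnorm(15, 0, 0.2)

# Centering the predictor is a design choice of this sketch: with sum(x) = 0
# the updates for a and b decouple, so a fixed step size converges quickly.
x <- temp - mean(temp)
n <- length(x)

cost <- function(a, b) sum((energy - a - b * x)^2)

# Partial derivatives of the cost with respect to a and b
gradient <- function(a, b) {
  res <- energy - a - b * x
  c(-2 * sum(res), -2 * sum(res * x))
}

step <- 1 / (2 * max(n, sum(x^2)))  # small enough that an update never increases the cost
a <- 3; b <- 0                      # starting point
cost_start <- cost(a, b)

for (iter in 1:5000) {
  g <- gradient(a, b)
  a <- a - step * g[1]              # move against the gradient
  b <- b - step * g[2]
}

# Undo the centering to get the intercept on the original temperature scale
a_hat <- a - b * mean(temp)
b_hat <- b

print(c(intercept = a_hat, slope = b_hat, cost = cost(a, b)))
print(coef(lm(energy ~ temp)))
```

Without centering, the same loop still converges for a small enough step, but it needs far more iterations because temperatures between 20 and 50 make the cost surface a long, narrow valley.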
Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationLinear regression and correlation
Faculty of Health Sciences Linear regression and correlation Statistics for experimental medical researchers 2018 Julie Forman, Christian Pipper & Claus Ekstrøm Department of Biostatistics, University
More informationCoefficient of Determination
Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationAP Statistics L I N E A R R E G R E S S I O N C H A P 7
AP Statistics 1 L I N E A R R E G R E S S I O N C H A P 7 The object [of statistics] is to discover methods of condensing information concerning large groups of allied facts into brief and compendious
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationMath 2311 Written Homework 6 (Sections )
Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.
More informationBasic Business Statistics 6 th Edition
Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationLinear Regression. Furthermore, it is simple.
Linear Regression While linear regression has limited value in the classification problem, it is often very useful in predicting a numerical response, on a linear or ratio scale. Furthermore, it is simple.
More informationLecture 2. Simple linear regression
Lecture 2. Simple linear regression Jesper Rydén Department of Mathematics, Uppsala University jesper@math.uu.se Regression and Analysis of Variance autumn 2014 Overview of lecture Introduction, short
More informationStatistical View of Least Squares
May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples
More informationMath M111: Lecture Notes For Chapter 3
Section 3.1: Math M111: Lecture Notes For Chapter 3 Note: Make sure you already printed the graphing papers Plotting Points, Quadrant s signs, x-intercepts and y-intercepts Example 1: Plot the following
More informationQuantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression
Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationDealing with Heteroskedasticity
Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing
More informationTests of Linear Restrictions
Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some
More information1 Multiple Regression
1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More informationDraft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM
1 REGRESSION AND CORRELATION As we learned in Chapter 9 ( Bivariate Tables ), the differential access to the Internet is real and persistent. Celeste Campos-Castillo s (015) research confirmed the impact
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial
More informationRegression and Models with Multiple Factors. Ch. 17, 18
Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least
More informationVariance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf
More informationFigure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim
0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#
More informationOrdinary Least Squares Regression Explained: Vartanian
Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationStatistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).
Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results
More informationStat 401B Exam 2 Fall 2015
Stat 401B Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More information