Introduction to Statistical Data Analysis Lecture 8: Correlation and Simple Regression
James V. Lambers
Department of Mathematics
The University of Southern Mississippi
Introduction

In the previous lecture, we learned how to determine whether two random variables are statistically dependent on one another, using the chi-square test for independence. However, that test alone does not tell us how the variables are related. In this lecture, we will learn how to use correlation and regression to gain some insight into the nature of the relationship between two variables.
Independent and Dependent Variables

In the following discussion, we classify one of the variables, x, as the independent variable, and the other variable, y, as the dependent variable. This means that x serves as the input and y serves as the output. Mathematically, y is a function of x, meaning that y is determined from x in some systematic way. Therefore, for each value of x, there is only one value of y, whereas one value of y can correspond to more than one value of x.
Correlation

Correlation measures the strength and direction of the relationship between x and y. Types of correlation are:
- positive linear correlation, which means that as x increases, y increases linearly;
- negative linear correlation, which means that as x increases, y decreases linearly;
- nonlinear correlation, which means that there is a clear relationship between x and y, but the dependence of y on x cannot be described graphically using a straight line; and
- no correlation, which means that there is no clear relationship between x and y.

In the remainder of this discussion, we will limit ourselves to linear correlation.
Correlation Coefficient

To determine the correlation between two variables x and y, for which we have n observations of each, we compute the correlation coefficient, which is defined by

$$ r = \frac{n\sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{\sqrt{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}\,\sqrt{n\sum_{i=1}^n y_i^2 - \left(\sum_{i=1}^n y_i\right)^2}}. $$

Geometrically, r is the cosine of the angle between the vector of x-values and the vector of y-values, with their respective means subtracted. It follows from this interpretation that $|r| \le 1$.
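A minimal sketch in R, assuming nothing beyond base R: it computes r from the summation formula and from the cosine interpretation, and compares both with the built-in cor(). The data are those of the lsfit example in this lecture; the helper names corr_from_sums and corr_as_cosine are hypothetical.

```r
# r from the summation formula
corr_from_sums <- function(x, y) {
  n <- length(x)
  num <- n * sum(x * y) - sum(x) * sum(y)
  den <- sqrt(n * sum(x^2) - sum(x)^2) * sqrt(n * sum(y^2) - sum(y)^2)
  num / den
}

# r as the cosine of the angle between the mean-centred vectors
corr_as_cosine <- function(x, y) {
  u <- x - mean(x)   # x-values with their mean subtracted
  v <- y - mean(y)   # y-values with their mean subtracted
  sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}

x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)
r_sums <- corr_from_sums(x, y)
r_cos <- corr_as_cosine(x, y)
print(c(sums = r_sums, cosine = r_cos, builtin = cor(x, y)))
```

All three values agree up to rounding, which illustrates that the summation formula and the geometric interpretation describe the same quantity.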
Interpretation

If r > 0, then x and y have a positive linear correlation, whereas if r < 0, then x and y have a negative linear correlation. If r = 0, then there is no linear correlation between x and y. In the extreme cases $r = \pm 1$, the points lie exactly on a straight line, $y - \bar{y} = c(x - \bar{x})$, for some constant c that is positive ($r = 1$) or negative ($r = -1$).

The benefit of knowing whether two variables are linearly correlated is that we can, at least approximately, predict values of the dependent variable y from values of the independent variable x. Of course, the accuracy of this prediction depends on $|r|$; if $|r|$ is nearly zero, such a prediction is not likely to be reliable.
Testing the Significance of r

Suppose we have determined that x and y are linearly correlated, based on the value of the correlation coefficient r obtained from a sample. How do we know whether a similar correlation applies to the entire population? We can answer this question by performing a hypothesis test on the population correlation coefficient, which we denote by $\rho$.

If we only wish to test whether $\rho$ is nonzero, then we can use a two-tail test, with null hypothesis $H_0: \rho = 0$ and alternative hypothesis $H_1: \rho \neq 0$. On the other hand, if we wish to test for a positive linear correlation, we can perform a one-tail test with null hypothesis $H_0: \rho \le 0$ and alternative hypothesis $H_1: \rho > 0$; testing for a negative linear correlation is similar.
Performing the Test

For this test, we use the Student t-distribution. The test statistic is

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, $$

where, as before, n is the sample size for each variable, d.f. = $n - 2$ is the number of degrees of freedom, and $\sqrt{(1 - r^2)/(n - 2)}$ is the standard error of the correlation coefficient. For the one-tail test with $H_0: \rho \le 0$, we reject $H_0$ and conclude that x and y have a positive linear correlation if $t > t_\alpha$. For the two-tail test with $H_0: \rho = 0$, we reject $H_0$ and conclude that x and y are linearly correlated if $|t| > t_{\alpha/2}$.
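The test can be sketched in R as follows, assuming a significance level of $\alpha = 0.05$ (our choice) and reusing the lsfit example data. The critical values come from qt(), and the statistic is cross-checked against cor.test(), whose Pearson test uses the same t statistic with $n - 2$ degrees of freedom.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)
n <- length(x)
alpha <- 0.05                          # assumed significance level

r <- cor(x, y)
se_r <- sqrt((1 - r^2) / (n - 2))      # standard error of r
t_stat <- r / se_r

# Two-tail test of H0: rho = 0, rejected when |t| > t_{alpha/2}
t_crit_two <- qt(1 - alpha / 2, df = n - 2)
reject_two_tail <- abs(t_stat) > t_crit_two

# One-tail test of H0: rho <= 0, rejected when t > t_alpha
t_crit_one <- qt(1 - alpha, df = n - 2)
reject_one_tail <- t_stat > t_crit_one

ct <- cor.test(x, y, alternative = "two.sided")
print(c(t = t_stat, builtin = unname(ct$statistic), p = ct$p.value))
```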
Correlation vs. Causation

Always keep in mind: correlation does not imply causation! That is:
- it often occurs that variables exhibit a correlation with one another even though neither has any influence whatsoever on the other;
- even if there is a causal relationship, it's not always clear which is the cause and which is the effect!
Reverse Causality

- Case in point: the effect of Course Signals on student retention at Purdue University
- Purdue developed Course Signals to use analytics to alert faculty and staff to potential problems for students
- Purdue claimed that when students took at least two courses that used Course Signals, retention improved by 21%!
- This conclusion was supported by appropriate data, so what could be the problem?
Look for Anomalies!

- It was observed from the data that taking two Course Signals courses greatly improved retention, whereas taking only one did not help at all
- Also, an initial bump in retention rate quickly faded after Course Signals had been in use for a few years
- What the data was really showing was that students were taking more Course Signals courses because they were taking more courses overall (that is, the analysis did not control for freshmen dropping out early)
- In other words, it was retention that led to increased use of Course Signals, not the other way around!

Reference: "What the Course Signals Kerfuffle is About, and What it Means to You" by Michael Caulfield, posted at educause.edu
Causal Inference

- Given that two variables are correlated, the ideal approach to establishing causation is to understand the mechanism by which it acts
- Failing that, another approach, if less effective, is to perform a controlled intervention study
- Establishing causation based solely on observations is much less reliable, but more broadly applicable
- In fact, this is impossible without making assumptions about the data

Reference: Max Planck Institute
Inferring Causation via Probability

Reichenbach's theory of causation: C is a cause of E if and only if
- $P(E \mid C) > P(E \mid C^c)$, where $C^c$ is the complement of C, and
- there is no event B such that $P(E \mid B \cap C) = P(E \mid B)$ (that is, no event B screens off C from E).

Equivalently, the second condition says that there is no event B such that C and E are independent given B.

This theory has several shortcomings, which are partially rectified by Cartwright and Skyrms using background contexts (other causes of E that are controlled) in place of screening-off events. Eells further refined this theory to define positive and negative causation probabilistically.
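Proof

Claim: if B screens off C from E, then C and E are independent given B. Equivalently: if $P(E \mid B \cap C) = P(E \mid B)$, then $P(C \cap E \mid B) = P(C \mid B)\,P(E \mid B)$.

Proof: by the definition of conditional probability and the multiplication rule,

$$ P(E \mid B \cap C) = \frac{P(E \cap B \cap C)}{P(B \cap C)} = \frac{P(E \cap B \cap C)}{P(C \mid B)\,P(B)}. $$

But $P(E \cap B \cap C) = P(C \cap E \mid B)\,P(B)$. Substituting this, together with the hypothesis $P(E \mid B \cap C) = P(E \mid B)$, gives

$$ P(C \cap E \mid B)\,P(B) = P(C \mid B)\,P(B)\,P(E \mid B), $$

and dividing both sides by $P(B) > 0$ yields $P(C \cap E \mid B) = P(C \mid B)\,P(E \mid B)$.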
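The claim can be checked numerically on a small example. The sketch below assumes a hypothetical joint distribution of three events B, C and E, stored as a 2 x 2 x 2 array, in which $P(E \mid B)$ is the same whether or not C occurs (so B screens off C from E), while C and E are deliberately dependent outside B. The probabilities 0.4, 0.5, 0.3 and the table for the complement of B are arbitrary choices.

```r
# joint[B, C, E]: index 1 = event occurs, index 2 = event does not occur
p_B <- 0.4        # P(B)
p_C_in_B <- 0.5   # P(C | B)
p_E_in_B <- 0.3   # P(E | B), used whether or not C occurs, so B screens off C

joint <- array(0, dim = c(2, 2, 2))
for (ci in 1:2) {
  for (ei in 1:2) {
    pc <- if (ci == 1) p_C_in_B else 1 - p_C_in_B
    pe <- if (ei == 1) p_E_in_B else 1 - p_E_in_B
    joint[1, ci, ei] <- p_B * pc * pe
  }
}
# Outside B, C and E are made dependent (arbitrary hypothetical values)
joint[2, , ] <- (1 - p_B) * matrix(c(0.4, 0.1, 0.1, 0.4), nrow = 2)

prob_B   <- sum(joint[1, , ])   # P(B)
prob_BC  <- sum(joint[1, 1, ])  # P(B and C)
prob_BE  <- sum(joint[1, , 1])  # P(B and E)
prob_BCE <- joint[1, 1, 1]      # P(B and C and E)

e_given_bc <- prob_BCE / prob_BC  # P(E | B and C)
e_given_b  <- prob_BE / prob_B    # P(E | B)
c_given_b  <- prob_BC / prob_B    # P(C | B)
ce_given_b <- prob_BCE / prob_B   # P(C and E | B)

print(c(e_given_bc = e_given_bc, e_given_b = e_given_b,
        ce_given_b = ce_given_b, product = c_given_b * e_given_b))
```

The first two printed values agree (the screening-off hypothesis), and so do the last two (conditional independence given B), as the proof predicts.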
Simple Regression

If x and y are found to be linearly correlated, then we can use simple regression to find the straight line that best fits the ordered pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$. The equation of this line is $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y obtained from x. The y-intercept a and slope b need to be determined.
The Least Squares Method

To find the values of a and b such that the line $\hat{y} = a + bx$ best fits the sample data, we use the least squares method. In this method, we compute a and b so as to minimize

$$ \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - a - bx_i)^2. $$

The name of the method comes from the fact that we are trying to minimize a sum of squares of the deviations between y and $\hat{y}$. The line $\hat{y} = a + bx$ that minimizes this sum of squares, and therefore best fits the data, is called the regression line.
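To make the minimization concrete, the following sketch minimizes the sum of squares numerically with base R's optim(), supplying the gradient by hand, on the lsfit example data. This brute-force approach is only an illustration under our own choices (starting point (0, 0), method BFGS, a tight reltol); in practice the closed-form solution is used.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)

# Sum of squared deviations as a function of par = c(a, b)
sse <- function(par) sum((y - par[1] - par[2] * x)^2)

# Gradient: partial derivatives of the sum of squares with respect to a and b
sse_grad <- function(par) {
  res <- y - par[1] - par[2] * x
  c(-2 * sum(res), -2 * sum(x * res))
}

opt <- optim(c(0, 0), sse, sse_grad, method = "BFGS",
             control = list(reltol = 1e-12))
print(opt$par)   # numerically minimizing intercept a and slope b
```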
Solving the Least Squares Problem

The criterion of minimizing the sum of squares is chosen because it is differentiable, and is therefore suitable for minimization techniques from calculus. The minimizing coefficients are

$$ b = \frac{n\sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}, \qquad a = \bar{y} - b\bar{x}, $$

where $\bar{x}$ and $\bar{y}$ are the sample means

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i. $$
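The closed-form coefficients translate directly into R. This sketch evaluates the formulas for b and a on the lsfit example data and compares them with the coefficients reported by lm(); for these data the exact values are $b = 161/165$ and $a = 77/15$.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)
n <- length(x)

# Slope and intercept from the closed-form least squares formulas
b <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
a <- mean(y) - b * mean(x)

print(c(a = a, b = b))   # about 5.1333 and 0.9758 for these data
print(coef(lm(y ~ x)))   # the same coefficients, computed by lm()
```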
Discussion

It should be noted that b is closely related to the correlation coefficient r: the formulas have the same numerator, and both denominators are positive (provided the x-values are not all equal, and likewise for the y-values). It follows that the slope is positive if and only if the correlation coefficient indicates that x and y have a positive linear correlation.

In R, the least squares method is implemented in the function lsfit. Its simplest usage is to specify two arguments, which are vectors consisting of the x- and y-values, respectively. It returns a data structure called a named list, which includes the coefficients a and b of the regression line.
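One way to see the relationship between b and r is the identity $b = r\,s_y/s_x$, where $s_x$ and $s_y$ are the sample standard deviations. The sketch below checks this numerically on the lsfit example data, taking the slope from the element named X in the coefficients returned by lsfit.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)

r <- cor(x, y)
b_from_r <- r * sd(y) / sd(x)                # b = r * s_y / s_x
b_lsfit <- lsfit(x, y)$coefficients[["X"]]   # slope reported by lsfit

print(c(b_from_r = b_from_r, b_lsfit = b_lsfit))
```

Since $s_x$ and $s_y$ are positive, b and r always have the same sign.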
Example

The following code illustrates the use of lsfit, including extraction of the y-intercept a and slope b. Then, both the data points and the regression line are plotted.

> x=c(1:10)
> y=c(8,6,10,6,10,13,9,11,15,17)
> lslist=lsfit(x,y)
> coefs=lslist[["coefficients"]]
> coefs
Intercept         X
5.1333333 0.9757576
Extracting the Coefficients

> a=coefs[["Intercept"]]
> b=coefs[["X"]]
> a
[1] 5.133333
> b
[1] 0.9757576
> plot(x,y)
> abline(a,b)
Code Dissection

The first two statements specify vectors of x- and y-values; the x-values are the integers 1 through 10, specified concisely using the colon operator. Note the use of double square brackets to extract elements of a named list; the names of elements of a list returned by a built-in R function are listed in the documentation. The element coefs extracted from the value returned by lsfit is a named numeric vector, whose elements, named Intercept and X, are the y-intercept a and slope b; double square brackets extract these elements by name as well, and the names are case-sensitive. The plot command plots the individual data points, and abline adds a line to the current plot, with the first argument specifying the y-intercept and the second argument specifying the slope.
Plot of Regression Line

It is merely coincidence that in this example, the regression line passes very close to one of the points, (5, 10), where $\hat{y} \approx 10.01$; in general this does not happen, as the goal of the least squares method is to minimize the sum of the squared vertical distances between all of the predicted y-values and observed y-values, not to pass through any particular point.
Confidence Interval for the Regression Line

To measure how well the regression line fits the data, we can construct a confidence interval. We use the standard error of the estimate,

$$ s_e = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum_{i=1}^n y_i^2 - a\sum_{i=1}^n y_i - b\sum_{i=1}^n x_i y_i}{n - 2}}, $$

which measures the amount of dispersion of the observations around the regression line. The smaller $s_e$ is, the closer the points are to the regression line. It is worth noting the similarity between this formula and the sample standard deviation; the number of degrees of freedom is $n - 2$ since two degrees of freedom are taken away by the coefficients a and b of the regression line.
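The sketch below computes $s_e$ from both forms of the formula on the lsfit example data and compares it with the residual standard error that summary() reports for an lm() fit. As an assumption beyond the text, which does not spell out the interval itself, it also uses R's predict() with interval = "confidence" to obtain a 95% confidence interval for the mean of y at x = 5.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)
n <- length(x)
fit <- lm(y ~ x)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["x"]]

y_hat <- a + b * x
s_e_def <- sqrt(sum((y - y_hat)^2) / (n - 2))                        # definition
s_e_alt <- sqrt((sum(y^2) - a * sum(y) - b * sum(x * y)) / (n - 2))  # shortcut form
print(c(s_e_def, s_e_alt, summary(fit)$sigma))

# 95% confidence interval for the mean response at x = 5
ci <- predict(fit, newdata = data.frame(x = 5), interval = "confidence", level = 0.95)
print(ci)   # columns: fit, lwr, upr
```

The shortcut form agrees with the definition because the least squares residuals sum to zero and are orthogonal to the x-values.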
Testing the Slope of the Regression Line

We need to determine whether the slope b of the regression line is indicative of the slope $\beta$ for the population. To that end, we can perform a hypothesis test. For example, we can use the null hypothesis $H_0: \beta = \beta_0$ and the alternative hypothesis $H_1: \beta \neq \beta_0$ for a two-tail test. If $\beta_0 = 0$, then we are testing whether there is any linear relationship between x and y, and rejection of $H_0$ would imply that this is the case.
Standard Error of the Slope

The standard error of the slope is

$$ s_b = \frac{s_e}{\sqrt{\sum_{i=1}^n x_i^2 - n\bar{x}^2}}, $$

where $s_e$ is the standard error of the estimate, defined earlier. Since $\sum_{i=1}^n x_i^2 - n\bar{x}^2 = \sum_{i=1}^n (x_i - \bar{x})^2 = n\sigma_x^2$, where $\sigma_x$ is the (uncorrected) standard deviation of the x-values, we have $s_b = s_e/(\sqrt{n}\,\sigma_x)$: the dispersion of the y-values around the regression line divided by $\sqrt{n}$ times the standard deviation of the x-values. This intuitively makes sense because we are testing the slope, which is the change in y divided by the change in x.
Test Statistic

As with the test of the correlation coefficient, we use the Student t-distribution to determine the critical value. The test statistic is

$$ t = \frac{b - \beta_0}{s_b}. $$

This is compared to the critical value $t_{\alpha/2,\,n-2}$, the t-value satisfying $P(T_{n-2} > t_{\alpha/2,\,n-2}) = \alpha/2$, where $T_{n-2}$ has a t-distribution with $n - 2$ degrees of freedom. If $|t| > t_{\alpha/2,\,n-2}$, then we reject $H_0$ and conclude $\beta \neq \beta_0$. If $\beta_0 = 0$, then our conclusion is that x and y are linearly correlated.
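Putting the slope test together in R: the sketch assumes $\beta_0 = 0$ and $\alpha = 0.05$ (our choices), computes $s_b$ and t from the formulas above on the lsfit example data, and cross-checks them against the coefficient table from summary() of an lm() fit.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)
n <- length(x)
alpha <- 0.05                                # assumed significance level
beta0 <- 0                                   # hypothesized population slope

fit <- lm(y ~ x)
b <- coef(fit)[["x"]]
s_e <- summary(fit)$sigma                    # standard error of the estimate
s_b <- s_e / sqrt(sum(x^2) - n * mean(x)^2)  # standard error of the slope

t_stat <- (b - beta0) / s_b
t_crit <- qt(1 - alpha / 2, df = n - 2)      # P(T > t_crit) = alpha/2
reject_H0 <- abs(t_stat) > t_crit

coef_table <- summary(fit)$coefficients
print(c(t = t_stat, t_crit = t_crit))
print(coef_table["x", ])
```

With $\beta_0 = 0$, this t statistic coincides with the one used to test the correlation coefficient, which is why rejecting $H_0$ here amounts to concluding that x and y are linearly correlated.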
Interpretation

It is important to keep in mind that correlation does not imply causation. That is, even if there is a strong correlation between x and y, that does not necessarily mean that a change in y is caused by a change in x. It could be mere coincidence, or some other variable could influence both x and y in a similar way.
The Coefficient of Determination

The strength of the relationship between x and y can be measured by the coefficient of determination, which is defined to be $r^2$, where r is the correlation coefficient. More precisely, the coefficient of determination measures the fraction of the variation in y that can be explained by the regression line, that is,

$$ r^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}. $$

For example, $r^2 = 0.64$ means that 64% of the variation in y is explained by the regression line.
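A quick numerical check in R on the lsfit example data: the square of the correlation coefficient, the explained fraction $1 - \text{SSE}/\text{SST}$, and the r.squared value reported by summary() of an lm() fit all coincide.

```r
x <- 1:10
y <- c(8, 6, 10, 6, 10, 13, 9, 11, 15, 17)
fit <- lm(y ~ x)

r_sq <- cor(x, y)^2
sse <- sum(residuals(fit)^2)    # variation in y not explained by the line
sst <- sum((y - mean(y))^2)     # total variation in y
explained <- 1 - sse / sst      # fraction of the variation explained

print(c(r_sq = r_sq, explained = explained, lm = summary(fit)$r.squared))
```

For these data the regression line explains about 66% of the variation in y.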
Assumptions

For the least squares method, and the inferences based on it, to be valid, we need to make the following assumptions:
- The individual differences between $y_i$ and $\hat{y}_i$, $i = 1, 2, \ldots, n$, are independent of one another.
- The observed values of y are normally distributed around $\hat{y}$.
- The variation of y around the regression line is equal for all values of x.
Polynomial Regression

In linear regression, we are trying to find constants a and b such that the function $y = a + bx$ best fits the data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, in the least-squares sense. The method of least squares can readily be generalized to the problem of finding constants $c_0, c_1, \ldots, c_m$ such that the function

$$ y = c_0 + c_1 x + c_2 x^2 + \cdots + c_m x^m, $$

a polynomial of degree m, best fits the data.
System Set-up

We define the $n \times (m + 1)$ matrix

$$ A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} $$

and the vectors

$$ c = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_m \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}. $$

A is known as a Vandermonde matrix.
The Normal Equations

Then, by solving the normal equations

$$ A^T A c = A^T y, $$

we obtain the coefficients of the best-fitting polynomial of degree m. Note that $A^T$ is the transpose of A, which is obtained by changing rows into columns; that is, $(A^T)_{ij} = a_{ji}$.
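The set-up and the normal equations can be sketched directly in R. Because the lecture's data vectors are only partially shown, this sketch uses hypothetical data (a noisy quadratic generated with a fixed seed). It builds the Vandermonde matrix with outer(), solves $A^T A c = A^T y$ with solve(), and compares the result with lm() using poly(..., raw = TRUE).

```r
# Hypothetical data: a noisy quadratic (not the data used in the lecture)
set.seed(1)
n <- 30
x <- runif(n)
y <- 1 + 2 * x - 3 * x^2 + rnorm(n, sd = 0.05)
m <- 2                                        # degree of the polynomial

A <- outer(x, 0:m, "^")                       # n x (m+1) Vandermonde matrix: 1, x, x^2
c_hat <- drop(solve(t(A) %*% A, t(A) %*% y))  # solve the normal equations A^T A c = A^T y

fit <- lm(y ~ poly(x, m, raw = TRUE))
print(rbind(normal_equations = c_hat, lm = unname(coef(fit))))
```

Forming $A^T A$ explicitly is fine for a low-degree example like this one; lm() instead works from a QR decomposition of the model matrix, which is numerically more stable when A is ill-conditioned, as Vandermonde matrices of high degree tend to be.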
Example

The following R statements construct data vectors x and y, and then call the function lm (short for "linear model") to obtain the coefficients of the best-fitting quadratic polynomial.

> x=c(0.6291,0.2956,0.6170,0.9885,0.3440,0.2396,0.0004,...
> y=c(0.7487,0.6169,0.1834,0.8436,0.7160,0.6518,0.6128,...
> lm(y~poly(x,2,raw=TRUE))

Call:
lm(formula = y ~ poly(x, 2, raw = TRUE))

Coefficients:
(Intercept)  poly(x, 2, raw = TRUE)1  poly(x, 2, raw = TRUE)2

That is, the quadratic function that best fits the data is $y = c_0 + c_1 x + c_2 x^2$, where $c_0$, $c_1$ and $c_2$ are the coefficients reported under (Intercept), poly(x, 2, raw = TRUE)1 and poly(x, 2, raw = TRUE)2, respectively.
Code Dissection

The formula y ~ poly(x,2,raw=TRUE) specifies that y is to be treated as a quadratic function of x; that is, the second argument to poly is the degree. The third argument to poly, raw=TRUE, specifies that the monomial basis $1, x, x^2, \ldots$ is to be used, instead of the default behavior of poly, which is to use orthogonal polynomials. This is done in order to facilitate interpretation of the coefficients returned by lm.
Multiple Linear Regression

A similar approach can be used for multiple linear regression, in which we seek a model of the form

$$ y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_m x_m. $$

Let $x_{ij}$ be the ith observation of $x_j$. We define the matrix A by

$$ A = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1m} \\ 1 & x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}. $$

Then, we solve the normal equations $A^T A c = A^T y$ to obtain the coefficients $c_0, c_1, \ldots, c_m$.
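A minimal R sketch of this construction for two independent variables, assuming hypothetical data generated with a fixed seed: the columns of A are a column of ones followed by the observations of $x_1$ and $x_2$, the normal equations are solved using crossprod() (which computes $A^T A$ and $A^T y$), and the result is compared with lm().

```r
# Hypothetical data for a model with two independent variables
set.seed(2)
n <- 25
x1 <- runif(n)
x2 <- runif(n)
y <- 0.5 + 1.5 * x1 - 0.8 * x2 + rnorm(n, sd = 0.1)

A <- cbind(1, x1, x2)                                # row i is (1, x_i1, x_i2)
c_hat <- drop(solve(crossprod(A), crossprod(A, y)))  # A^T A c = A^T y

fit <- lm(y ~ x1 + x2)
print(rbind(normal_equations = unname(c_hat), lm = unname(coef(fit))))
```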
Example

Suppose that we have a set of n observations $(x_{i1}, x_{i2}, y_i)$, $i = 1, 2, \ldots, n$, and seek the coefficients $c_0, c_1, c_2$ so that the model $y = c_0 + c_1 x_1 + c_2 x_2$ best fits the data in the least-squares sense.
Getting the Job Done in R

The following R statements obtain these coefficients.

> x1=c(0.4092,0.9977,0.6238,0.3532,0.1827,0.3209,...
> x2=c(0.9525,0.8742,0.1622,0.1467,0.6498,0.7901,...
> y=c(0.2549,0.9122,0.3675,0.0380,0.6508,0.8164,...
> lm(y~x1+x2)

Call:
lm(formula = y ~ x1 + x2)

Coefficients:
(Intercept)           x1           x2

That is, $c_0$, $c_1$ and $c_2$ are the coefficients reported under (Intercept), x1 and x2, respectively.
Exponential Regression

The least squares method can also be used for models of the form $y = be^{ax}$, where a and b are coefficients that are to be determined. Taking the natural logarithm of both sides yields $\ln y = \ln b + ax$, so, provided all $y_i > 0$, we can apply the method of least squares to the model $z = c + ax$, where $z = \ln y$ and $c = \ln b$, and then compute $b = e^c$. Note that this minimizes the sum of the squared deviations of $\ln y$, not of y itself.
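The log-transform approach can be sketched in R as follows, assuming hypothetical data generated from $y = 2.5e^{0.7x}$ with small multiplicative noise. The fit of $z = c + ax$ is done with lm(), and b is recovered as $e^c$.

```r
# Hypothetical data from y = 2.5 * exp(0.7 * x), with small multiplicative noise
set.seed(3)
x <- seq(0, 2, length.out = 20)
y <- 2.5 * exp(0.7 * x) * exp(rnorm(20, sd = 0.02))

stopifnot(all(y > 0))   # the log transform needs positive y-values
z <- log(y)             # z = ln y
fit <- lm(z ~ x)        # least squares fit of z = c + a x
c_hat <- coef(fit)[["(Intercept)"]]
a_hat <- coef(fit)[["x"]]
b_hat <- exp(c_hat)     # back-transform: b = e^c

print(c(a = a_hat, b = b_hat))
```

The estimates land close to the generating values 0.7 and 2.5. Minimizing squared deviations of $\ln y$ weights the observations differently from a direct nonlinear fit of $y = be^{ax}$, so the two approaches can give somewhat different coefficients on noisier data.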
Maximum Likelihood

- Let $x_1, x_2, \ldots, x_n$ be a sample of n i.i.d. (independent and identically distributed) observations, coming from an unknown distribution with probability distribution function of the form $f(x \mid \theta)$
- The method of maximum likelihood is used to obtain an estimate $\hat{\theta}$ of the unknown parameter $\theta$
- Because the observations are independent, we have $f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1 \mid \theta)\, f(x_2 \mid \theta) \cdots f(x_n \mid \theta)$
- The maximum likelihood estimator (MLE) is the value $\hat{\theta}$ that maximizes the average log-likelihood

$$ \hat{\ell}(\theta) = \frac{1}{n}\sum_{i=1}^n \ln f(x_i \mid \theta) $$
Example

- Let the n observations be flips of an unfair coin, and let h be the number of heads. The number of heads follows a binomial distribution

$$ f(X = h \mid \theta) = \binom{n}{h}\theta^h (1 - \theta)^{n-h} $$

with unknown probability of success $\theta$
- The MLE $\hat{\theta}$ maximizes

$$ \frac{1}{n}\ln\left[\binom{n}{h}\theta^h (1 - \theta)^{n-h}\right] = \frac{1}{n}\left[\ln\binom{n}{h} + h\ln\theta + (n - h)\ln(1 - \theta)\right], $$

which, through calculus, is maximized at $\hat{\theta} = h/n$: setting the derivative $\frac{1}{n}\left[\frac{h}{\theta} - \frac{n - h}{1 - \theta}\right]$ to zero gives $h(1 - \theta) = (n - h)\theta$, that is, $\theta = h/n$.
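The calculus result can be confirmed numerically. The sketch below assumes a hypothetical experiment with n = 20 flips and h = 14 heads. It maximizes the average log-likelihood with base R's optimize() over a slightly shrunken interval (0, 1) to avoid log(0), and compares the maximizer with $h/n$.

```r
n <- 20   # hypothetical number of flips
h <- 14   # hypothetical number of heads

# Average log-likelihood (1/n) * [ln C(n, h) + h ln(theta) + (n - h) ln(1 - theta)]
avg_loglik <- function(theta) {
  (lchoose(n, h) + h * log(theta) + (n - h) * log(1 - theta)) / n
}

# Maximize numerically, staying just inside the endpoints 0 and 1
opt <- optimize(avg_loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)
theta_hat <- opt$maximum

print(c(numerical = theta_hat, closed_form = h / n))
```

The same objective could also be written as dbinom(h, n, theta, log = TRUE) / n; the explicit form above mirrors the formula in the text.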
More informationStandards-Based Learning Power Standards. High School- Algebra
Standards-Based Learning Power Standards Mathematics Algebra 3,4 The high school standards specify the mathematics that all students should study in order to be college and career ready. High School Number
More informationMathematics - High School Algebra II
Mathematics - High School Algebra II All West Virginia teachers are responsible for classroom instruction that integrates content standards and mathematical habits of mind. Students in this course will
More informationEM375 STATISTICS AND MEASUREMENT UNCERTAINTY CORRELATION OF EXPERIMENTAL DATA
EM375 STATISTICS AND MEASUREMENT UNCERTAINTY CORRELATION OF EXPERIMENTAL DATA In this unit of the course we use statistical methods to look for trends in data. Often experiments are conducted by having
More informationKDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION. Unit : I - V
KDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION Unit : I - V Unit I: Syllabus Probability and its types Theorems on Probability Law Decision Theory Decision Environment Decision Process Decision tree
More informationSTAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015
STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationBusiness Statistics. Lecture 9: Simple Regression
Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationWest Windsor-Plainsboro Regional School District Advanced Algebra II Grades 10-12
West Windsor-Plainsboro Regional School District Advanced Algebra II Grades 10-12 Page 1 of 23 Unit 1: Linear Equations & Functions (Chapter 2) Content Area: Mathematics Course & Grade Level: Advanced
More informationPOLI 443 Applied Political Research
POLI 443 Applied Political Research Session 4 Tests of Hypotheses The Normal Curve Lecturer: Prof. A. Essuman-Johnson, Dept. of Political Science Contact Information: aessuman-johnson@ug.edu.gh College
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationFundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved
Fundamentals of Linear Algebra Marcel B. Finan Arkansas Tech University c All Rights Reserved 2 PREFACE Linear algebra has evolved as a branch of mathematics with wide range of applications to the natural
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationCumberland County Schools
Cumberland County Schools MATHEMATICS Algebra II The high school mathematics curriculum is designed to develop deep understanding of foundational math ideas. In order to allow time for such understanding,
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationAlgebra 1 Khan Academy Video Correlations By SpringBoard Activity and Learning Target
Algebra 1 Khan Academy Video Correlations By SpringBoard Activity and Learning Target SB Activity Activity 1 Investigating Patterns 1-1 Learning Targets: Identify patterns in data. Use tables, graphs,
More informationJim Lambers MAT 419/519 Summer Session Lecture 13 Notes
Jim Lambers MAT 419/519 Summer Session 2011-12 Lecture 13 Notes These notes correspond to Section 4.1 in the text. Least Squares Fit One of the most fundamental problems in science and engineering is data
More informationQuestions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.
Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized
More informationWHCSD Grade Content Area
Course Overview and Timing This section is to help you see the flow of the unit/topics across the entire school year. Quarter Unit Description Unit Length Early First Quarter Unit 1: Investigations and
More informationMultiple Linear Regression
Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).
More informationAlgebra 2 Standards. Essential Standards:
Benchmark 1: Essential Standards: 1. Alg2.M.F.LE.A.02 (linear): I can create linear functions if provided either a graph, relationship description or input-output tables. - 15 Days 2. Alg2.M.A.APR.B.02a
More informationAlgebra II/Math III Curriculum Map
6 weeks Unit Unit Focus Common Core Math Standards 1 Simplify and perform operations with one variable involving rational, exponential and quadratic functions. 2 Graph and evaluate functions to solve problems.
More informationMATH III CCR MATH STANDARDS
INFERENCES AND CONCLUSIONS FROM DATA M.3HS.1 Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets
More informationReview for Final Exam Stat 205: Statistics for the Life Sciences
Review for Final Exam Stat 205: Statistics for the Life Sciences Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 205: Statistics for the Life Sciences 1 / 20 Overview of Final Exam
More informationPhysical Chemistry - Problem Drill 02: Math Review for Physical Chemistry
Physical Chemistry - Problem Drill 02: Math Review for Physical Chemistry No. 1 of 10 1. The Common Logarithm is based on the powers of 10. Solve the logarithmic equation: log(x+2) log(x-1) = 1 (A) 1 (B)
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More information1 Multiple Regression
1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only
More informationCurriculum Scope & Sequence. Subject/Grade Level: MATHEMATICS/HIGH SCHOOL Course: ALGEBRA 2
BOE APPROVED 2/14/12 Curriculum Scope & Sequence Subject/Grade Level: MATHEMATICS/HIGH SCHOOL Course: ALGEBRA 2 Unit Duration Common Core Standards / Unit Goals Transfer Goal(s) Enduring 12 Days Units:
More informationALGEBRA INTEGRATED WITH GEOMETRY II CURRICULUM MAP
2013-2014 MATHEMATICS ALGEBRA INTEGRATED WITH GEOMETRY II CURRICULUM MAP Department of Curriculum and Instruction RCCSD The Real Number System Common Core Major Emphasis Clusters Extend the properties
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationCollege Algebra Through Problem Solving (2018 Edition)
City University of New York (CUNY) CUNY Academic Works Open Educational Resources Queensborough Community College Winter 1-25-2018 College Algebra Through Problem Solving (2018 Edition) Danielle Cifone
More informationGROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION
FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89
More informationLecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000
Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination
More informationLecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)
Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination
More informationAchieve Recommended Pathway: Algebra II
Units Standard Clusters Mathematical Practice Standards Perform arithmetic operations with complex numbers. Use complex numbers in polynomial identities and equations. Interpret the structure of expressions.
More information3.3 Real Zeros of Polynomial Functions
71_00.qxp 12/27/06 1:25 PM Page 276 276 Chapter Polynomial and Rational Functions. Real Zeros of Polynomial Functions Long Division of Polynomials Consider the graph of f x 6x 19x 2 16x 4. Notice in Figure.2
More informationLecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012
Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationJust Enough Likelihood
Just Enough Likelihood Alan R. Rogers September 2, 2013 1. Introduction Statisticians have developed several methods for comparing hypotheses and for estimating parameters from data. Of these, the method
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationGreene, Econometric Analysis (6th ed, 2008)
EC771: Econometrics, Spring 2010 Greene, Econometric Analysis (6th ed, 2008) Chapter 17: Maximum Likelihood Estimation The preferred estimator in a wide variety of econometric settings is that derived
More informationReview. December 4 th, Review
December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter
More informationYou can compute the maximum likelihood estimate for the correlation
Stat 50 Solutions Comments on Assignment Spring 005. (a) _ 37.6 X = 6.5 5.8 97.84 Σ = 9.70 4.9 9.70 75.05 7.80 4.9 7.80 4.96 (b) 08.7 0 S = Σ = 03 9 6.58 03 305.6 30.89 6.58 30.89 5.5 (c) You can compute
More informationIntroduction to Statistical Data Analysis Lecture 4: Sampling
Introduction to Statistical Data Analysis Lecture 4: Sampling James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis 1 / 30 Introduction
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationSchool District of Marshfield Course Syllabus
School District of Marshfield Course Syllabus Course Name: Algebra II Length of Course: 1 Year Credit: 1 Program Goal: The School District of Marshfield Mathematics Program will prepare students for college
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor
More informationMaster s Written Examination - Solution
Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2
More informationMath 101 Study Session Spring 2016 Test 4 Chapter 10, Chapter 11 Chapter 12 Section 1, and Chapter 12 Section 2
Math 101 Study Session Spring 2016 Test 4 Chapter 10, Chapter 11 Chapter 12 Section 1, and Chapter 12 Section 2 April 11, 2016 Chapter 10 Section 1: Addition and Subtraction of Polynomials A monomial is
More informationcor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )
Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation
More information