Regression, Part I

I. Difference from correlation.

II. Basic idea:

A) Correlation describes the relationship between two variables, where neither is independent or a predictor.
- In correlation, it would be irrelevant if we changed the axes on our graph.
- This is NOT true for regression.

B) In regression, we often have a variable that is independent, and another that depends on this variable.
- For example, temperature (independent) and activity level (dependent) in cold-blooded animals. Activity level obviously depends on temperature (and NOT vice-versa).
- Often we also use this independent variable to predict the dependent variable.
- We're also interested in testing for significance, though in this case we look at a line, not just the relationship between the variables.

C) There is a possible relationship between the variables under consideration. This relationship is modeled using an equation. In the simplest case, an equation for a line.

III. Fitting the line.

1) This raises the question: is the line significant? This is one place where statistics come in.

2) There are actually many different ways of estimating the line, but we'll only learn the most common method.

A) We'll use what is called the least-squares line.

B) Illustrate. See fig. 12.7, p. 509 [12.7, p. 534] {12.3.7, p. 501}.

C) The basic idea is that we take all the residuals, and then rotate the line until the sum of squares for these distances is minimized.

1) Residual - the vertical distance from a given point to the line going through the points (at least in regression).
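The least-squares criterion can be sketched in a few lines of code. This is a minimal illustration (the data and the function name are mine, not from the text): the least-squares line beats any other line, so rotating it away from the best fit increases the sum of squared residuals.

```python
# Minimal sketch of the least-squares criterion (made-up data for illustration).
# The residual of a point is its vertical distance to the candidate line
# y = b0 + b1*x; least squares picks the line minimizing the sum of squares.

def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

# (0.14, 1.96) is the least-squares fit for these points; a "rotated"
# line with a steeper slope must have a larger sum of squares:
ss_best = sum_squared_residuals(x, y, b0=0.14, b1=1.96)
ss_tilted = sum_squared_residuals(x, y, b0=0.14, b1=2.10)
print(ss_best < ss_tilted)  # True
```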

D) To do this, you need calculus - (if you've had calculus: you differentiate, set the result to 0, solve, and then make sure it's actually a minimum)

1) If you do all the calculus, you wind up with our estimates for the equation of a line:

   b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²        b₀ = ȳ − b₁x̄

where b₀ is the intercept and b₁ is the slope. So basically you'll wind up with the equation of a line:

   Ŷ = b₀ + b₁X

where Ŷ is the value of Y predicted for the particular X.

********* 4th edition silliness:

In a strange fit (no one else really does it this way), the 4th edition does this differently, and presents the following equations:

   b₁ = r (s_y / s_x)        b₀ = ȳ − b₁x̄

It turns out they're the same if you note the following:

   r (s_y / s_x) = [Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² Σ(yᵢ − ȳ)²)] × [√(Σ(yᵢ − ȳ)² / (n−1)) / √(Σ(xᵢ − x̄)² / (n−1))]
                 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

(the n−1 terms cancel, then the SS_y terms cancel, and you're left with the square root of SS_x × SS_x in the denominator, so the expressions are the same)

We will use the expressions for b₀ and b₁ as in the 2nd and 3rd editions, since that's the way it's almost always done.

********* End 4th edition silliness.

2) Recap: You calculate b₀ and b₁, then put everything into the form for the equation of a line.
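The two formulas above translate directly into code. Here is a minimal sketch (the function name and sample data are mine); it also checks numerically that the 4th-edition form b₁ = r(s_y/s_x) gives the same slope.

```python
import math

def least_squares_line(x, y):
    """Return (b0, b1): intercept and slope of the least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_cp = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # cross products
    ss_x = sum((xi - xbar) ** 2 for xi in x)                        # SS for x
    b1 = ss_cp / ss_x          # slope
    b0 = ybar - b1 * xbar      # intercept
    return b0, b1

# Check the 4th-edition form b1 = r * (sy / sx) on some sample data:
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.9, 4.1, 4.8, 6.2]
b0, b1 = least_squares_line(x, y)

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ss_x = sum((xi - xbar) ** 2 for xi in x)
ss_y = sum((yi - ybar) ** 2 for yi in y)
ss_cp = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r = ss_cp / math.sqrt(ss_x * ss_y)
sx, sy = math.sqrt(ss_x / (n - 1)), math.sqrt(ss_y / (n - 1))
assert abs(b1 - r * (sy / sx)) < 1e-12  # the two expressions agree
```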

3) Note: b₀ estimates β₀, b₁ estimates β₁.

As usual, β₀ and β₁ are unknown. Since we don't know what they are, we estimate them with b₀ and b₁.

Side comment: another way of looking at things:

   Yᵢ = b₀ + b₁Xᵢ + eᵢ

we often use this because it tells us what each yᵢ is equal to. The eᵢ's are the deviations or residuals between our equation of a line and the actual values of y. The earlier equation gives us a line; this gives us an exact relationship between x and y.

- Incidentally, minimizing the sum of the squares of the eᵢ's will give us our least squares.

- Also note that this estimates the following:

   yᵢ = β₀ + β₁xᵢ + εᵢ

where the εᵢ is an unknown error term.

4) The next step is to figure out if β₀ and β₁ are significant, that is, do they mean anything. As usual, we use our estimates.

a) this is similar to what we've done previously: we hypothesize some value for β₁
- although we can do the same thing for β₀, this isn't done too often (though we do occasionally get confidence intervals for β₀).
- β₁ is almost always tested for equivalence to 0, since this indicates no slope => so no effect of x on y.
- (Incidentally, if we do test β₀, we're hardly ever interested in β₀ = 0. But, as mentioned, we won't cover this).

IV. Let's illustrate things before we get lost. We'll use exercise 12.3, p. 511 [12.3, p. 536] {12.3.1, p. 503}.

A) Set-up: a biologist injects leucine into frog egg cells. After a given amount of time he measures the amount of leucine that has been absorbed by the cell proteins. Details? Read the description in the text.

B) Results:

   Time:           0     10    20    30    40    50    60
   Leucine levels: .02   .25   .54   .69   1.07  1.50  1.74

C) You know how to calculate most of this:

1) Sum of squares for X: 2,800

2) Sum of squares for Y: 2.4308

(Do we need both of these? No - we only need the sum of squares for the x's)

(Note that the 4th edition doesn't give you these since it's using a different way to calculate b₀ and b₁).

3) Sum of cross products (SS_cp):

   i = 1: (0 − 30)(.02 − .83)   = 24.3
   i = 2: (10 − 30)(.25 − .83)  = 11.6
   i = 3: (20 − 30)(.54 − .83)  = 2.9
   i = 4: (30 − 30)(.69 − .83)  = 0
   i = 5: (40 − 30)(1.07 − .83) = 2.4
   i = 6: (50 − 30)(1.50 − .83) = 13.4
   i = 7: (60 − 30)(1.74 − .83) = 27.3

Add up everything to get 81.90

(Again, note that the 4th edition does things differently)

4) Then we estimate the slope of our line:

   b₁ = 81.90 / 2,800 = 0.02925

5) So now we have an estimate for the slope. Once we get this, we get the intercept:

   b₀ = 0.83 − (0.02925)(30) = −0.0475

6) So our equation is:

   Ŷ = −0.0475 + 0.02925 X

(or we could say: Yᵢ = −0.0475 + 0.02925 Xᵢ + eᵢ)

This gives us a best fit line through the points, using our least squares criterion.

7) Let's plot all this to see what it looks like:
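The arithmetic in the worked example can be double-checked with a short script (a sketch; the variable names are mine, the data are from the exercise):

```python
# Reproducing the worked example: frog-egg leucine uptake vs. time.
time = [0, 10, 20, 30, 40, 50, 60]
leucine = [0.02, 0.25, 0.54, 0.69, 1.07, 1.50, 1.74]

xbar = sum(time) / len(time)        # 30
ybar = sum(leucine) / len(leucine)  # 0.83
ss_x = sum((x - xbar) ** 2 for x in time)                            # 2,800
ss_cp = sum((x - xbar) * (y - ybar) for x, y in zip(time, leucine))  # 81.90
b1 = ss_cp / ss_x      # slope: 0.02925
b0 = ybar - b1 * xbar  # intercept: -0.0475
print(f"Y-hat = {b0:.4f} + {b1:.5f} X")
```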

8) Some things to note:

a) We don't yet know if this means anything - i.e., is this significant???

b) Even though the line is only increasing gradually, this doesn't imply non-significance (note the difference in scales)