Dr. Allen Back. Sep. 23, 2016


Look at All the Data Graphically
A Famous Example: The Challenger Tragedy

Type of data looked at the night before: a plot of failures only. Conclusion: failures at all temperatures, so temperature was not the issue.

What wasn't looked at: a plot of all the flights. Conclusion: clear pattern of failures being more likely at low temperature.

Given: paired data (x_1, y_1), ..., (x_n, y_n).

Scatterplot: plot x horizontally, y vertically. If one variable is potentially explanatory for the response of the other, choose the explanatory variable as x.

Association is what we care about: if we know the x value of a point, does it tell us something about the likely y value? Correlation (a number) and regression (a line) are just techniques to study association.

Principal aspects of association:
Direction: positive or negative. (Negative means as x increases, y generally decreases.)
Strength: strong, moderate, or weak.
Form: linear, curved, or clustered.

Examples (scatterplots shown in class, some displayed using Data Desk):

Example b. My call: Direction: positive. Strength: moderate. Form: curved.
Example c. My call: Direction: negative. Strength: strong. Form: linear.
Example d. My call: Direction: negative. Strength: moderate. Form: (perhaps some outliers).
Example e. My call: Direction: positive. Strength: strong. Form: linear.
Example f. My call: Direction: negative. Strength: weak. Form: curved, maybe an outlier.

Properties of the correlation

r = (1/(n-1)) Σ [(x_i - x̄)/s_x] [(y_i - ȳ)/s_y]

(Without the s_x and s_y, this would be the statistics, i.e. data-related, version of the covariance of two random variables, which was briefly mentioned in Chapter 16 of your textbook.)

Positive association means as x increases, so does y generally. (And similarly when x decreases.) So for positive association, most terms in the sum are either (+)(+) or (-)(-). Thus with positive association, r tends to be positive; likewise, negative association tends to give negative r.
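The formula translates directly into code. A minimal sketch in Python (the function name pearson_r and the toy data are my own choices, not from the slides), using sample standard deviations s_x, s_y just as in the formula:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """r = 1/(n-1) * sum over i of ((x_i - xbar)/s_x) * ((y_i - ybar)/s_y)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # sample standard deviations (n-1 in the denominator)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Toy data with positive association: y generally increases with x.
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 1, 4, 3, 6, 5]
y_down = [-v for v in y_up]         # flipping y flips the sign of r

print(pearson_r(x, y_up))    # positive (about 0.83)
print(pearson_r(x, y_down))  # same magnitude, negative
```

Most of the terms for (x, y_up) are (+)(+) or (-)(-), which is exactly why r comes out positive.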

-1 <= r <= 1, with r = ±1 only for perfect linear association.

To see r = ±1 for perfect linear association: y = β_1 x + β_0 means s_y = |β_1| s_x and ȳ = β_1 x̄ + β_0. So

r = (1/(n-1)) Σ [(x_i - x̄)/s_x] [(y_i - ȳ)/s_y]
  = (1/(n-1)) Σ [(x_i - x̄)/s_x] [β_1 (x_i - x̄)/(|β_1| s_x)]
  = (β_1/|β_1|) (1/(n-1)) Σ [(x_i - x̄)/s_x]²
  = β_1/|β_1| = ±1,

where the last step used the definition of the variance (the remaining sum equals 1).

If you've studied vectors, the fact -1 <= r <= 1 comes from the same mathematics which explains why cos θ = v·w / (|v| |w|) has right-hand side between -1 and +1.

r is unchanged if x and y are exchanged.

r is invariant under rescaling (changing the units of x or y).

Curved association and r = 0 are consistent!

Five points along y = x². (x̄ = 0 and ȳ = 2.)

x     y     (x - 0)   (y - 2)   (x - 0)(y - 2)
0     0      0        -2         0
1     1      1        -1        -1
-1    1     -1        -1         1
2     4      2         2         4
-2    4     -2         2        -4

The products sum to zero, so r is exactly zero even though the association is very strong.

r is strongly affected by outliers.

This shows the effect of a single outlier. (All 10 points but the last fit y = 2x + 3; the last point is at x = 9 with y = 21 + ε_9.)

x:  0  1  2  3  ...  7  8  9
y:  3  5  7  9  ...  17 19 21 + ε_9

Moving only the last point off the line changes r:

ε_9    r
0      1.00
-5     .968
5      .981
-10    .853
10     .944
-15    .663
-20    .455
20     .866

Samples from independent RVs give r ≈ 0.

X, Y independent standard normal RVs; set Y′ = ρX + √(1 - ρ²) Y. Then (X, Y′) will tend to generate data with r ≈ ρ. (E.g. ρ = .99 gives √(1 - ρ²) ≈ .14!)

Or, rescaling by 1/ρ (which leaves r unchanged), Y′′ = X + (√(1 - ρ²)/ρ) Y, with the coefficient of Y interpreted as the magnitude of the deviation from perfect association:

ρ      √(1 - ρ²)/ρ
.1     9.95
.2     4.90
.5     1.73
.8     .75
.9     .48
.95    .33
.99    .14