
Dr. Allen Back Sep. 23, 2016

Look at All the Data Graphically
A Famous Example: The Challenger Tragedy

Type of Data Looked at the Night Before:
Conclusion: Failures at all temperatures, so temp. not the issue.

What Wasn't Looked At:
Conclusion: Clear pattern of failures more likely at low temp.

Given: paired data $(x_1, y_1), \ldots, (x_n, y_n)$
Scatterplot: Plot x horizontally, y vertically.
If one variable is potentially explanatory for the response of the other, choose the explanatory variable as x.

Association is what we care about. If we know the x value of a point, does it tell us something about the likely y value?
Correlation (a number) and regression (a line) are just techniques to study association.

Principal Aspects of Association:
Direction: positive or negative. Negative means as x increases, y generally decreases.
Strength: strong, moderate, or weak
Form: linear, curved, or clustered

Example b. My call: Direction: positive; Strength: moderate; Form: curved

Example b using Data Desk

Example c. My call: Direction: negative; Strength: strong; Form: linear

Example d. My call: Direction: negative; Strength: moderate; Form: (perhaps some outliers)

Example d using Data Desk

Example e. My call: Direction: positive; Strength: strong; Form: linear

Example f. My call: Direction: negative; Strength: weak; Form: curved, maybe an outlier

Properties

$$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

(Without the $s_x$ and $s_y$, this would be the statistics (i.e. data related) version of the covariance of two random variables, which was briefly mentioned in chapter 16 of your textbook.)
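As a rough illustration of the formula, the sketch below standardizes each variable with the sample standard deviation (divisor n - 1) and averages the products. The function name `correlation` and the data values are made up for this example; NumPy's `np.corrcoef` is used only as a cross-check.

```python
import numpy as np

def correlation(x, y):
    """Sample correlation r computed directly from the standardized-product formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Standardize with the sample standard deviation (ddof=1 gives the n - 1 divisor).
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (n - 1)

# Made-up illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_formula = correlation(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]   # library value for comparison
print(r_formula, r_numpy)
```

The two printed values should agree up to floating-point rounding.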

Positive association means as x increases, so does y generally. (And similarly when x decreases.)
So for positive association, most terms in the sum defining r are either (+)(+) or (-)(-).
Thus with positive association, r tends to be positive.
positive r ↔ positive association
negative r ↔ negative association

$-1 \le r \le 1$ ($r = \pm 1$ only for perfect linear association)

To see $r = \pm 1$ for perfect linear association: $y = \beta_1 x + \beta_0$ means $s_y = |\beta_1| s_x$ and $\bar{y} = \beta_1 \bar{x} + \beta_0$, so $y_i - \bar{y} = \beta_1 (x_i - \bar{x})$. Then

$$
\begin{aligned}
r &= \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) \\
  &= \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{\beta_1 (x_i - \bar{x})}{|\beta_1| s_x}\right) \\
  &= \frac{\beta_1}{|\beta_1|} \cdot \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)^2 \\
  &= \frac{\beta_1}{|\beta_1|}
\end{aligned}
$$

where the last line used the definition of the variance. The result is $+1$ when $\beta_1 > 0$ and $-1$ when $\beta_1 < 0$.
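A quick numerical check of the perfect-linear case might look like the following. The helper `r_for_line` and the x values are hypothetical, chosen only for illustration.

```python
import numpy as np

def r_for_line(beta1, beta0):
    """Correlation of points lying exactly on y = beta1 * x + beta0."""
    x = np.array([0.0, 1.0, 2.0, 5.0, 9.0])   # arbitrary, made-up x values
    y = beta1 * x + beta0
    # s_y equals |beta1| * s_x, which is why r collapses to beta1 / |beta1|.
    assert np.isclose(y.std(ddof=1), abs(beta1) * x.std(ddof=1))
    return np.corrcoef(x, y)[0, 1]

print(r_for_line(2.0, 3.0))     # positive slope: r is +1 up to rounding
print(r_for_line(-0.5, 7.0))    # negative slope: r is -1 up to rounding
```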

If you've studied vectors, the fact $-1 \le r \le 1$ comes from the same mathematics which explains why

$$\cos\theta = \frac{v \cdot w}{|v|\,|w|}$$

has right hand side between $-1$ and $+1$.
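To make the vector connection concrete: if v and w are taken to be the centered data vectors (each x_i minus the mean of x, and likewise for y), then r is exactly the cosine of the angle between them. The sketch below checks this on made-up numbers; the variable names are illustrative only.

```python
import numpy as np

# Made-up illustrative data.
x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([3.0, 1.0, 6.0, 5.0])

# Centered data vectors play the roles of v and w.
v = x - x.mean()
w = y - y.mean()

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
r = np.corrcoef(x, y)[0, 1]
print(cos_theta, r)   # the two values agree up to rounding
```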

r is unchanged if x and y are exchanged.
r is invariant under rescaling of x or y (for example, a change of units).
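Both properties can be checked numerically. In this sketch the data are simulated, and the conversion factors (2.54, 0.001) and shifts are arbitrary stand-ins for a change of units; only positive scale factors are used, since a negative factor would flip the sign of r.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)       # made-up data with some association

def r(a, b):
    """Sample correlation of a and b."""
    return np.corrcoef(a, b)[0, 1]

r_xy = r(x, y)
r_yx = r(y, x)                                  # roles of x and y exchanged
r_rescaled = r(2.54 * x + 10, 0.001 * y - 3)    # positive rescaling plus shifts
print(r_xy, r_yx, r_rescaled)
```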

Curved association and r = 0 are consistent!
Five points along $y = x^2$. ($\bar{x} = 0$ and $\bar{y} = 2$.)

  x    y   (x-0)  (y-2)  (x-0)(y-2)
  0    0     0     -2        0
  1    1     1     -1       -1
 -1    1    -1     -1        1
  2    4     2      2        4
 -2    4    -2      2       -4

The products in the last column sum to 0, so r = 0.
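The five-point example can be reproduced in a few lines; this is just a check of the arithmetic in the table, using the same points along y = x^2.

```python
import numpy as np

x = np.array([0.0, 1.0, -1.0, 2.0, -2.0])
y = x ** 2                                   # perfect curved (quadratic) relationship

# Products of deviations from the means (x-bar = 0, y-bar = 2).
products = (x - x.mean()) * (y - y.mean())
r = np.corrcoef(x, y)[0, 1]

print(products.sum())   # the deviations' products cancel
print(r)                # so the correlation is zero despite the exact relationship
```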

r is strongly affected by outliers.
This shows the effect of a single outlier. (All 10 points but the last fit y = 2x + 3.)

  x     y
  0     3
  1     5
  2     7
  3     9
 ...   ...
  7    17
  8    19
  9    21 + ε

   ε      r
   0    1.00
  -5    .968
   5    .981
 -10    .853
  10    .944
 -15    .663
 -20    .455
  20    .866
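The outlier table can be recomputed with a short script. This sketch assumes the tenth point is (9, 21 + ε), i.e. the point on the line y = 2x + 3 shifted vertically by ε; that reading is inferred from the slide rather than stated on it, and the function name is hypothetical.

```python
import numpy as np

def r_with_outlier(eps):
    """Correlation for x = 0..9 on y = 2x + 3, with the last y shifted by eps."""
    x = np.arange(10, dtype=float)
    y = 2 * x + 3
    y[-1] += eps            # assumption: only the last point (x = 9) is perturbed
    return np.corrcoef(x, y)[0, 1]

for eps in [0, -5, 5, -10, 10, -15, -20, 20]:
    print(f"{eps:4d}  {r_with_outlier(eps):.3f}")
```

Under that assumption a shift of only 20 units in one of ten points drops r from 1 to well below 0.9, and a downward shift hurts more than an upward one of the same size.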

Samples from independent RVs: $r \approx 0$
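A small simulation can illustrate this. The choice of distributions (normal and exponential), the sample size, and the seed are arbitrary assumptions made for the sketch; any independent pair should behave similarly.

```python
import numpy as np

rng = np.random.default_rng(1)       # fixed seed for repeatability
n = 10_000

# Two samples drawn independently of each other.
x = rng.normal(size=n)
y = rng.exponential(size=n)

r = np.corrcoef(x, y)[0, 1]
print(r)    # small in magnitude, though not exactly zero in any finite sample
```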

X, Y independent standard normal RVs; set $Y' = \rho X + \sqrt{1 - \rho^2}\, Y$. Then $(X, Y')$ will tend to generate data with $r \approx \rho$.
(e.g. $\rho = .99 \Rightarrow \sqrt{1 - \rho^2}/\rho = .14$!)
Or $Y' = X + \frac{\sqrt{1 - \rho^2}}{\rho}\, Y$, with the coefficient of Y interpreted as the magnitude of the deviation from perfect association.
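The construction can be tried out by simulation. In this sketch the function names, sample size, and seed are illustrative assumptions; `noise_ratio` computes the coefficient sqrt(1 - rho^2) / rho from the rescaled form.

```python
import numpy as np

def simulate_r(rho, n=100_000, seed=0):
    """Sample correlation of X and rho*X + sqrt(1 - rho^2)*Y for independent std normals."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    y_new = rho * x + np.sqrt(1 - rho ** 2) * y
    return np.corrcoef(x, y_new)[0, 1]

def noise_ratio(rho):
    """Coefficient of Y when the combination is rescaled so X has coefficient 1."""
    return np.sqrt(1 - rho ** 2) / rho

for rho in [0.5, 0.9, 0.99]:
    print(rho, simulate_r(rho), noise_ratio(rho))
```

Even at rho = 0.99 the noise coefficient is still roughly 0.14, which is the point of the slide's example: a correlation very close to 1 still allows a noticeable deviation from a perfect line.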
