Dr. Allen Back Sep. 23, 2016
Look at All the Data Graphically
A Famous Example: The Challenger Tragedy
Type of Data Looked at the Night Before: Conclusion: Failures at all temperatures, so temperature is not the issue.
What Wasn't Looked At: Conclusion: Clear pattern of failures more likely at low temperature.
Given: paired data $(x_1, y_1), \ldots, (x_n, y_n)$
Scatterplot: Plot x horizontally, y vertically.
If one variable is potentially explanatory for the response of the other, choose the explanatory variable as x.
Association is what we care about. If we know the x value of a point, does it tell us something about the likely y value?
Correlation (a number) and regression (a line) are just techniques to study association.
Principal Aspects of Association:
Direction: positive or negative. Negative means as x increases, y generally decreases.
Strength: strong, moderate, or weak
Form: linear, curved, or clustered
Example b (also shown using Data Desk)
My call: Direction: positive; Strength: moderate; Form: curved
Example c
My call: Direction: negative; Strength: strong; Form: linear
Example d (also shown using Data Desk)
My call: Direction: negative; Strength: moderate; Form: (perhaps some outliers)
Example e
My call: Direction: positive; Strength: strong; Form: linear
Example f
My call: Direction: negative; Strength: weak; Form: curved, maybe an outlier
Properties
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
(Without the $s_x$ and $s_y$, this would be the statistics (i.e. data-related) version of the covariance of two random variables, which was briefly mentioned in chapter 16 of your textbook.)
Positive association means as x increases, so does y generally. (And similarly when x decreases.)
So for positive association, most terms in the sum are either (+)(+) or (-)(-).
Thus with positive association, r tends to be positive.
Positive r goes with positive association; negative r goes with negative association.
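As a minimal sketch of the definition above, the following computes r as the sum of products of z-scores divided by n - 1. The function name `pearson_r` and the two small data sets are made up for illustration; they are not from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Illustrative (invented) data: one rising pattern, one falling pattern.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 6]
ys_down = [6, 4, 5, 4, 2]

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
print(r_up, r_down)
```

In the rising data most products are (+)(+) or (-)(-), so the sum is positive; in the falling data the signs mostly disagree, so the sum is negative.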
$-1 \le r \le 1$ ($r = \pm 1$ only for perfect linear association)
To see $r = \pm 1$ for perfect linear association: $y = \beta_1 x + \beta_0$ (with $\beta_1 \neq 0$) means $s_y = |\beta_1| s_x$ and $\bar{y} = \beta_1 \bar{x} + \beta_0$, so $y_i - \bar{y} = \beta_1 (x_i - \bar{x})$. Then
$$
\begin{aligned}
r &= \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) \\
  &= \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{\beta_1 (x_i - \bar{x})}{|\beta_1| s_x}\right) \\
  &= \frac{\beta_1}{|\beta_1|} \cdot \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)^2 \\
  &= \frac{\beta_1}{|\beta_1|} = \pm 1
\end{aligned}
$$
where the last line used the definition of the variance, $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
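A quick numerical check of the perfect-linear case, as a sketch: the helper `pearson_r` and the particular slopes and intercepts are illustrative choices, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [0, 1, 2, 3, 4, 5]

# Positive slope: every point lies exactly on y = 2x + 3.
r_pos = pearson_r(xs, [2 * x + 3 for x in xs])

# Negative slope: every point lies exactly on y = -0.5x + 1.
r_neg = pearson_r(xs, [-0.5 * x + 1 for x in xs])

print(r_pos, r_neg)
```

Up to floating-point rounding, the sign of r matches the sign of the slope and its magnitude is 1, regardless of the size of the slope or the intercept.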
If you've studied vectors, the fact $-1 \le r \le 1$ comes from the same mathematics which explains why $\cos\theta = \frac{v \cdot w}{\|v\|\,\|w\|}$ has right-hand side between $-1$ and $+1$.
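One way to make the vector remark concrete, offered as a sketch: take v and w to be the centered data vectors (each value minus its mean). The factors of n - 1 cancel, so the cosine of the angle between them equals r. The function name and the data are invented for illustration.

```python
import math

def cosine_of_centered(xs, ys):
    """Cosine of the angle between the centered data vectors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    v = [x - x_bar for x in xs]  # centered x values
    w = [y - y_bar for y in ys]  # centered y values
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)

# Illustrative (invented) data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

cos_theta = cosine_of_centered(xs, ys)
print(cos_theta)
```

Because a cosine can never leave the interval from -1 to +1, neither can r.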
r is unchanged if x and y are exchanged.
r is invariant under rescaling of x or y (for example, a change of units).
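Both properties can be checked numerically. The sketch below uses invented temperature-like data and a Celsius-to-Fahrenheit conversion as the rescaling; the helper `pearson_r` is illustrative. Note that the rescaling here has a positive scale factor; multiplying one variable by a negative number would flip the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Illustrative (invented) data.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
ys = [3.1, 2.7, 2.9, 1.8, 1.2]

r_original = pearson_r(celsius, ys)

# Exchange the roles of x and y.
r_swapped = pearson_r(ys, celsius)

# Change units of x: Celsius to Fahrenheit (positive scale plus shift).
fahrenheit = [1.8 * c + 32.0 for c in celsius]
r_rescaled = pearson_r(fahrenheit, ys)

print(r_original, r_swapped, r_rescaled)
```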
Curved association and r = 0 are consistent!
Five points along $y = x^2$. ($\bar{x} = 0$ and $\bar{y} = 2$.)

   x    y   (x-0)   (y-2)   (x-0)(y-2)
   0    0     0      -2         0
   1    1     1      -1        -1
  -1    1    -1      -1         1
   2    4     2       2         4
  -2    4    -2       2        -4

The products sum to 0, so r is exactly zero even though the association is very strong.
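The table can be reproduced in a few lines. This sketch only sums the products of deviations, since r is zero exactly when that sum is zero; the variable names are illustrative.

```python
# The five points along y = x^2 from the table.
xs = [0, 1, -1, 2, -2]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n  # 0.0
y_bar = sum(ys) / n  # 2.0

# Products of deviations, one per row of the table.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
sum_of_products = sum(products)

print(products, sum_of_products)
```

The positive and negative products cancel in pairs because the points are symmetric about x = 0, which is why a purely curved pattern can leave r at zero.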
r is strongly affected by outliers.
This shows the effect of a single outlier. (All 10 points but the last fit $y = 2x + 3$.)

   x    y
   0    3
   1    5
   2    7
   3    9
  ...  ...
   7   17
   8   19
   9   21 + ε

    ε      r
    0    1.00
   -5    .968
    5    .981
  -10    .853
   10    .944
  -15    .663
  -20    .455
   20    .866
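The effect of the single outlier can be recomputed directly. This is a sketch that assumes the tenth point is (9, 21 + ε), i.e. the point on the line at x = 9 shifted vertically by ε; the function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def r_with_outlier(eps):
    """r for x = 0..9 on y = 2x + 3, with the last y shifted by eps (assumed setup)."""
    xs = list(range(10))
    ys = [2 * x + 3 for x in xs]
    ys[-1] += eps  # move only the tenth point off the line
    return pearson_r(xs, ys)

for eps in (0, -5, 5, -10, 10, -15, -20, 20):
    print(f"eps = {eps:4d}   r = {r_with_outlier(eps):.3f}")
```

Shifting the last point downward hurts r more than shifting it upward by the same amount, because an upward shift at the largest x still agrees with the positive direction of the association.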
Samples from independent RVs $\Rightarrow r \approx 0$
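A small simulation illustrates this, as a sketch only: the choice of a uniform and an exponential distribution, the sample size, and the seed are all arbitrary assumptions made for the example.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Two variables drawn independently of each other (arbitrary distributions).
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
ys = [rng.expovariate(1.0) for _ in range(n)]

r_independent = pearson_r(xs, ys)
print(r_independent)
```

The sample r is rarely exactly zero; it fluctuates around zero, with fluctuations that shrink as the sample size grows.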
X, Y independent standard normal RVs; set $Y' = \rho X + \sqrt{1 - \rho^2}\, Y$. Then $(X, Y')$ will tend to generate data with $r \approx \rho$.
(e.g. $\rho = .99 \Rightarrow \frac{\sqrt{1 - \rho^2}}{\rho} = .14$!)
Or $Y' = X + \frac{\sqrt{1 - \rho^2}}{\rho}\, Y$, with the coefficient of Y interpreted as the magnitude of the deviation from perfect association.

    ρ     sqrt(1 - ρ^2) / ρ
   .1         9.95
   .2         4.90
   .5         1.73
   .8          .75
   .9          .48
   .95         .33
   .99         .14
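The construction can be simulated to see that the sample r lands near ρ, and the ratio in the table can be recomputed. This is a sketch: the sample size, seed, and function names (`simulate_r`, `noise_ratio`) are assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def simulate_r(rho, n=20000, seed=0):
    """Sample r for pairs (X, rho*X + sqrt(1 - rho^2)*Y), X and Y independent N(0, 1)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)  # independent of x
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * y)
    return pearson_r(xs, ys)

def noise_ratio(rho):
    """Coefficient of the independent term after dividing through by rho."""
    return math.sqrt(1.0 - rho ** 2) / rho

for rho in (0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:4.2f}   sample r = {simulate_r(rho):.3f}   "
          f"ratio = {noise_ratio(rho):.2f}")
```

Dividing the constructed variable by ρ is a rescaling, so it leaves r unchanged; that is why the ratio can be read as the size of the noise relative to a perfect linear relationship.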