STAB22 Statistics I Lecture 7

Example
Newborn birth weight follows a Normal distribution with mean 3500 grams and SD 500 grams. A baby is classified as high birth weight if its weight is in the top 2% of birth weights. What weight would make a baby high birth weight?

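One way to check an answer to this example is a minimal Python sketch using the standard library's `statistics.NormalDist`. The approach is the usual one: find the z-score with 98% of the distribution below it, then convert back with weight = mean + z × SD. The variable names are chosen for illustration only.

```python
from statistics import NormalDist

MEAN_G = 3500  # mean birth weight in grams (from the example)
SD_G = 500     # standard deviation in grams (from the example)

birth_weight = NormalDist(mu=MEAN_G, sigma=SD_G)

# Top 2% of weights means 98% of weights lie below the cutoff
cutoff = birth_weight.inv_cdf(0.98)

# The same cutoff expressed as a z-score
z = (cutoff - MEAN_G) / SD_G

print(f"z-score: {z:.2f}, cutoff weight: {cutoff:.0f} g")
```

The z-score comes out a little above 2, so the cutoff sits a little above mean + 2 SD = 4500 grams.
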
Checking Normality
- The Normal density is a theoretical model; when should we use it to describe real data?
- Can check a histogram: look for a bell shape (unimodal and symmetric)
- A better check is a Normal probability plot

Normal Probability Plot
- Plot the data values against their theoretical Normal z-scores (a.k.a. Normal quantile plot)
- StatCrunch: Graphics > QQplot
- If the points lie close to a straight line, the data are well described by a Normal distribution

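To make "theoretical Normal z-scores" concrete, here is a rough sketch of how the plot's coordinates can be computed. It assumes the plotting position (i − 0.5)/n for the i-th smallest value, which is one common convention; StatCrunch may use a slightly different one. The function name and the data are made up for illustration.

```python
from statistics import NormalDist

def normal_quantile_pairs(data):
    """Pair each sorted data value with its theoretical Normal z-score."""
    n = len(data)
    std_normal = NormalDist()  # standard Normal: mean 0, SD 1
    pairs = []
    for i, x in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n  # assumed plotting position for the i-th smallest value
        pairs.append((std_normal.inv_cdf(p), x))
    return pairs

# Hypothetical birth weights in grams (invented for this sketch)
weights = [2900, 3100, 3300, 3400, 3500, 3600, 3700, 3900, 4100]

for z, x in normal_quantile_pairs(weights):
    print(f"z = {z:+.2f}   weight = {x}")
```

Plotting the weights (vertical axis) against the z-scores (horizontal axis) gives the Normal probability plot; roughly Normal data fall near a straight line.
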
Example (non-normal plots)
- Right-skewed distribution: convex plot (U-shaped)
- Left-skewed distribution: concave plot (∩-shaped)

Relationship Between Two Quantitative Variables
Consider the following student data, with two quantitative variables: Weight (in kg) and Height (in cm).

Name     Weight  Height
Aubrey   77      188
Ron      75      173
Carl     70      178

What is the relationship between weight and height? The first step is to examine the relationship visually using a scatterplot.

Scatterplot
- Variables are measured along the horizontal (x-) and vertical (y-) axes; each dot represents the combination of the corresponding individual's values, e.g. (Height=170, Weight=61)
- StatCrunch: Graphics > Scatterplot

Role of Variables
- Usually there is a variable of interest, called the response / dependent variable, and a variable whose effect on the response we want to examine, called the explanatory / independent variable
- The response goes on the vertical axis (a.k.a. y-variable) and the explanatory variable goes on the horizontal axis (a.k.a. x-variable)
- E.g. we want to study whether Blood Pressure increases with Age; how would you classify the variables?
Response variable:
Explanatory variable:

Types of Relationships
The overall pattern of the scatterplot describes the form, direction and strength of the relationship
Form:
- Linear relationship
- Non-linear relationship
[Figure: two scatterplots, one linear and one non-linear]

Types of Relationships
Direction:
- Positive relationship: the Y-variable increases with the X-variable, and vice versa
- Negative relationship: the Y-variable decreases with the X-variable, and vice versa
[Figure: two scatterplots, one positive and one negative]

Scatterplot Pattern
Strength:
- Strong relationship: data tightly clustered around the pattern
- Weak relationship: data loosely spread around the pattern, forming a vague cloud
[Figure: two scatterplots, one strong and one weak]

Outliers
Scatterplots also help identify outliers, i.e. extreme deviations from the overall pattern
[Figure: scatterplot with an outlier]

Example
Describe the relationship of the variables based on the scatterplot and identify any outliers
[Figure: scatterplot]
Form:
Direction:
Strength:

Correlation
- Correlation coefficient (r): numerical measure of the linear relationship between 2 variables
- For data $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$, r is given by

$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$

- StatCrunch: Stat > Summary Stats > Correlation
- r is always a number between -1 and 1
- r describes the strength and direction of linear relationships only

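As a small worked illustration, the correlation formula can be applied by hand to the three students from the earlier weight/height slide. The sketch below is plain Python; the function name `pearson_r` is our own choice, not a StatCrunch or library name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed directly from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Student data from the lecture: Aubrey, Ron, Carl
weight_kg = [77, 75, 70]
height_cm = [188, 173, 178]

r = pearson_r(weight_kg, height_cm)
print(f"r = {r:.3f}")
```

With only three students the value is not very informative, but it shows each piece of the formula at work: the deviations from the means, their products, and the normalisation that keeps r between -1 and 1.
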
Interpretation of Correlation Coefficient

Correlation Properties
- r > 0 indicates a positive relationship and r < 0 a negative relationship
- The magnitude of r, i.e. its distance from 0, describes the strength of the relationship, i.e. how close the data are to a line
- r does not change when the x and/or y variables are shifted or rescaled by a positive factor (e.g. a change of units)
- r is symmetric: it doesn't matter which variable is on the x- or y-axis, r is the same in both cases
- r is sensitive to outliers

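These properties can be checked numerically. The sketch below reuses the three students from the earlier weight/height slide and defines its own helper `pearson_r`. The unit conversions and the extra "outlier" student are assumptions made up purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed directly from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

weight_kg = [77, 75, 70]
height_cm = [188, 173, 178]
r = pearson_r(weight_kg, height_cm)

# Symmetry: swapping the roles of x and y leaves r unchanged
r_swapped = pearson_r(height_cm, weight_kg)

# Shift / rescale: change units (kg -> lb, cm -> m) and shift heights by 10 cm
weight_lb = [w * 2.2046 for w in weight_kg]
height_m_shifted = [(h + 10) / 100 for h in height_cm]
r_rescaled = pearson_r(weight_lb, height_m_shifted)

# Outlier sensitivity: add one hypothetical, very unusual student
r_outlier = pearson_r(weight_kg + [120], height_cm + [150])

print(f"original r:        {r:.3f}")
print(f"x and y swapped:   {r_swapped:.3f}")
print(f"shifted/rescaled:  {r_rescaled:.3f}")
print(f"with one outlier:  {r_outlier:.3f}")
```

The first three values agree, while the single added point is enough to flip the sign of r in this tiny data set.
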
Example
Choose the corresponding r for each scatterplot
[Figure: three scatterplots]
Options for each scatterplot: -0.8, -0.4, 0.0, +0.4, +0.8

Correlation vs Causation
- If two variables are correlated, this does not necessarily imply that x causes y to change
- E.g. Height and Weight are positively correlated (r = +0.8762), but Weight does not cause Height, i.e. putting on more weight will not make you taller
- Generally, Correlation ≠ Causation

Correlation vs Causation
- An observed correlation/association between two variables can be the result of a third, hidden or lurking, variable
- E.g. ice-cream sales are correlated with the number of people drowning, but both variables are driven by the weather
- When the weather is hot, people eat more ice cream and do more swimming (and therefore more drowning)!
[Diagram: weather -> ice-cream sales; weather -> # people drowning]

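The lurking-variable effect can be illustrated with a small simulation. Everything in the sketch below is an assumed toy model with invented numbers: temperature drives both ice-cream sales and drownings, and neither has any direct effect on the other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed directly from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy model (assumed): temperature is the lurking variable behind both outcomes
temps = [rng.uniform(10, 35) for _ in range(5000)]       # daily temperature, deg C
ice_cream = [20 * t + rng.gauss(0, 50) for t in temps]   # daily ice-cream sales
drownings = [0.3 * t + rng.gauss(0, 1) for t in temps]   # daily drowning index

# Across all days the two outcomes look strongly related
r_all = pearson_r(ice_cream, drownings)

# Hold the weather roughly fixed: keep only days between 20 and 22 degrees
band = [i for i, t in enumerate(temps) if 20 <= t <= 22]
r_band = pearson_r([ice_cream[i] for i in band],
                   [drownings[i] for i in band])

print(f"all days:              r = {r_all:.2f}")
print(f"similar-weather days:  r = {r_band:.2f}")
```

Across all days the correlation is strong, but among days with similar weather it largely disappears, which is what we would expect if the weather, not ice cream, is behind the association.
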