Statistics for EES Linear regression and linear models
1 Statistics for EES Linear regression and linear models Dirk Metzler July 2010
2 Contents 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
3 Contents Univariate linear regression: how and why? 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
4 Univariate linear regression: how and why? Griffon Vulture Gyps fulvus German: Gänsegeier photo (c) by Jörg Hempel
5 Univariate linear regression: how and why? Prinzinger, R., E. Karl, R. Bögel, Ch. Walzer (1999): Energy metabolism, body temperature, and cardiac work in the Griffon vulture Gyps fulvus - telemetric investigations in the laboratory and in the field. Zoology 102, Suppl. II: 15 Data from Goethe-University, Group of Prof. Prinzinger Developed telemetric system for measuring heart beats of flying birds Important for ecological questions: metabolic rate. Metabolic rate can only be measured in the lab. Can we infer metabolic rate from heart beat frequency?
9 Univariate linear regression: how and why? (Scatter plot: metabolic rate [J/(g*h)] against heart beats [per minute], griffon vulture, 16 degrees C.)
11 Univariate linear regression: how and why? (Data table: vulture, with columns day, heartbpm, metabol, mintemp, maxtemp, medtemp; 14 different days.)
12 Univariate linear regression: how and why?
> model <- lm(metabol~heartbpm,data=vulture, subset=day=="17.05.")
> summary(model)

Call:
lm(formula = metabol ~ heartbpm, data = vulture, subset = day == "17.05.")

Residuals: Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               e-08 ***
heartbpm                                  e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 17 DF, p-value: 2.979e-14
15 Univariate linear regression: how and why? (Diagram: points (x_1, y_1), (x_2, y_2), (x_3, y_3) and the line y = a + b·x, where a is the intercept and the slope is b = (y_2 − y_1)/(x_2 − x_1).)
22 Univariate linear regression: how and why? (Diagram: residuals r_1, r_2, ..., r_n shown as vertical distances between the points and the line.) Residuals: r_i = y_i − (a + b·x_i). The line must minimize the sum of squared residuals r_1² + r_2² + ... + r_n².
25 Univariate linear regression: how and why? Define the regression line y = â + b̂·x by minimizing the sum of squared residuals: (â, b̂) = arg min_(a,b) Σ_i (y_i − (a + b·x_i))². This is based on the model assumption that values a, b exist such that, for all data points (x_i, y_i), we have y_i = a + b·x_i + ε_i, where all ε_i are independent and normally distributed with the same variance σ².
26 Univariate linear regression: how and why? Given data: pairs (y_1, x_1), (y_2, x_2), (y_3, x_3), ..., (y_n, x_n). Model: there are values a, b, σ² such that y_i = a + b·x_i + ε_i for i = 1, ..., n. ε_1, ε_2, ..., ε_n are independent N(0, σ²). Equivalently: y_1, y_2, ..., y_n are independent with y_i ~ N(a + b·x_i, σ²). a, b, σ² are unknown, but not random.
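The model above can be made concrete with a minimal R sketch: simulate data from y_i = a + b·x_i + ε_i and recover the parameters with lm(). The values a = 2, b = 0.5 and σ = 1 are arbitrary illustration choices, not from the lecture.

```r
## Simulate data from the model y_i = a + b*x_i + eps_i
## (a, b, sigma are invented illustration values)
set.seed(1)
a <- 2; b <- 0.5; sigma <- 1
x <- seq(1, 20, length.out = 50)
eps <- rnorm(length(x), mean = 0, sd = sigma)  # independent N(0, sigma^2)
y <- a + b * x + eps
fit <- lm(y ~ x)
coef(fit)  # estimates close to a = 2 and b = 0.5
```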
31 Univariate linear regression: how and why? We estimate a and b by computing (â, b̂) := arg min_(a,b) Σ_i (y_i − (a + b·x_i))². Theorem: â and b̂ are given by b̂ = Σ_i (y_i − ȳ)·(x_i − x̄) / Σ_i (x_i − x̄)² = Σ_i y_i·(x_i − x̄) / Σ_i (x_i − x̄)² and â = ȳ − b̂·x̄. Please keep in mind: The line y = â + b̂·x goes through the center of gravity of the cloud of points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
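The theorem's formulas can be checked numerically in R; a small sketch on simulated data (all numbers invented for illustration):

```r
## Least-squares estimates computed from the theorem's formulas,
## compared with lm(); the data are simulated for illustration only
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)
y <- 3 + 1.5 * x + rnorm(30)
b.hat <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
a.hat <- mean(y) - b.hat * mean(x)
fit <- lm(y ~ x)
## a.hat and b.hat agree with coef(fit) up to numerical precision
```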
34 Univariate linear regression: how and why? Sketch of the proof of the theorem: Let g(a, b) = Σ_i (y_i − (a + b·x_i))². We optimize g by setting the derivatives ∂g(a, b)/∂a = Σ_i 2·(y_i − (a + b·x_i))·(−1) and ∂g(a, b)/∂b = Σ_i 2·(y_i − (a + b·x_i))·(−x_i) to 0 and obtain 0 = Σ_i (y_i − (â + b̂·x_i))·(−1) and 0 = Σ_i (y_i − (â + b̂·x_i))·(−x_i).
35 Univariate linear regression: how and why? 0 = Σ_i (y_i − (â + b̂·x_i)) and 0 = Σ_i (y_i − (â + b̂·x_i))·x_i give us 0 = (Σ_i y_i) − n·â − b̂·(Σ_i x_i) and 0 = (Σ_i y_i·x_i) − â·(Σ_i x_i) − b̂·(Σ_i x_i²), and the theorem follows by solving this for â and b̂.
36 Univariate linear regression: how and why? (Data table: vulture, with columns day, heartbpm, metabol, mintemp, maxtemp, medtemp; 14 different days.)
37 Univariate linear regression: how and why?
> model <- lm(metabol~heartbpm,data=vulture, subset=day=="17.05.")
> summary(model)

Call:
lm(formula = metabol ~ heartbpm, data = vulture, subset = day == "17.05.")

Residuals: Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               e-08 ***
heartbpm                                  e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 17 DF, p-value: 2.979e-14
38 Univariate linear regression: how and why? Optimizing the clutch size Example: Cowpea weevil (also bruchid beetle) Callosobruchus maculatus German: Erbsensamenkäfer Wilson, K. (1994) Evolution of clutch size in insects. II. A test of static optimality models using the beetle Callosobruchus maculatus (Coleoptera: Bruchidae). Journal of Evolutionary Biology 7: How does survival probability depend on clutch size? Which clutch size optimizes the expected number of surviving offspring?
40 Univariate linear regression: how and why? (Plot: viability against clutchsize.)
42 Univariate linear regression: how and why? (Plot: clutchsize * viability against clutchsize.)
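The optimization idea can be sketched in a toy R example. The linear decline of viability and all coefficients below are invented for illustration, not estimated from Wilson's data: if viability falls linearly with clutch size, the expected number of survivors clutchsize · viability is a parabola with a maximum at an intermediate clutch size.

```r
## Toy example: viability assumed to decline linearly with clutch size
## (coefficients are invented, not from Wilson's data)
clutch <- 1:30
viab <- pmax(0, 0.9 - 0.025 * clutch)   # assumed survival probability
expected <- clutch * viab               # expected number of survivors
clutch[which.max(expected)]             # optimal clutch size: 18
```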
44 Contents t-test for linear regression 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
45 t-test for linear regression Example: red deer (Cervus elaphus) Theory: females can influence the sex of their offspring. Evolutionary stable strategy: weak animals may tend to have female offspring, strong animals may tend to have male offspring. Clutton-Brock, T. H., Albon, S. D., Guinness, F. E. (1986) Great expectations: dominance, breeding success and offspring sex ratios in red deer. Anim. Behav. 34
47 t-test for linear regression > hind (Data table with columns rank and ratiomales.) CAUTION: Simulated data, inspired by the original paper
48 t-test for linear regression (Scatter plot: hind$ratiomales against hind$rank.)
50 t-test for linear regression
> mod <- lm(ratiomales~rank,data=hind)
> summary(mod)

Call:
lm(formula = ratiomales ~ rank, data = hind)

Residuals: Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               e-06 ***
rank                                      e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 52 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
51 t-test for linear regression Model: Y = a + b·X + ε with ε ~ N(0, σ²). How to compute the significance of a relationship between the explanatory trait X and the target variable Y? In other words: How can we test the null hypothesis b = 0? We have estimated b by b̂ ≠ 0. Could the true b be 0? How large is the standard error of b̂?
56 t-test for linear regression y_i = a + b·x_i + ε with ε ~ N(0, σ²). Not random: a, b, x_i, σ². Random: ε, y_i. var(y_i) = var(a + b·x_i + ε) = var(ε) = σ², and y_1, y_2, ..., y_n are stochastically independent. b̂ = Σ_i y_i·(x_i − x̄) / Σ_i (x_i − x̄)², so var(b̂) = var(Σ_i y_i·(x_i − x̄) / Σ_i (x_i − x̄)²) = var(Σ_i y_i·(x_i − x̄)) / (Σ_i (x_i − x̄)²)² = Σ_i var(y_i)·(x_i − x̄)² / (Σ_i (x_i − x̄)²)² = σ²·Σ_i (x_i − x̄)² / (Σ_i (x_i − x̄)²)² = σ² / Σ_i (x_i − x̄)².
61 t-test for linear regression In fact, b̂ is normally distributed with mean b and var(b̂) = σ² / Σ_i (x_i − x̄)². Problem: We do not know σ². We estimate σ² by considering the residual variance: s² := Σ_i (y_i − â − b̂·x_i)² / (n − 2). Note that we divide by n − 2. The reason for this is that two model parameters, a and b, have been estimated, which means that two degrees of freedom got lost.
65 t-test for linear regression var(b̂) = σ² / Σ_i (x_i − x̄)². Estimate σ² by s² = Σ_i (y_i − â − b̂·x_i)² / (n − 2). Then (b̂ − b) / (s / √(Σ_i (x_i − x̄)²)) is Student-t-distributed with n − 2 degrees of freedom, and we can apply the t-test to test the null hypothesis b = 0.
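The quantities above can be computed by hand in R and compared with what summary(lm()) reports; a sketch on simulated data (not the red-deer data):

```r
## Standard error of b.hat and the t statistic, computed by hand
## on simulated data, then compared with summary(lm())
set.seed(7)
x <- runif(25, 0, 10)
y <- 1 + 0.8 * x + rnorm(25, sd = 0.5)
fit <- lm(y ~ x)
n <- length(x)
s2 <- sum(residuals(fit)^2) / (n - 2)    # residual variance, n - 2 df
se.b <- sqrt(s2 / sum((x - mean(x))^2))  # standard error of the slope
t.stat <- coef(fit)[["x"]] / se.b        # tests the null hypothesis b = 0
p.val <- 2 * pt(-abs(t.stat), df = n - 2)
## se.b, t.stat and p.val reproduce the "x" row of summary(fit)$coefficients
```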
66 Contents log-scaling the data 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
67 log-scaling the data Data example: typical body weight [kg] and brain weight [g] of 62 mammal species (and 3 dinosaurs) > data (Table with columns weight.kg., brain.weight.g, species, extinct; rows include african elephant, asian elephant, cat and chimpanzee, all with extinct = no.)
68 log-scaling the data (Plot: brain weight [Gehirngewicht, g] against body weight [Koerpergewicht, kg]; title: typical values for 62 mammal species.)
69 log-scaling the data (The same plot on log-log axes for 65 species, with labeled points including african elephant, asian elephant, mouse, giraffe, horse, chimpanzee, donkey, cow, rhesus monkey, sheep, jaguar, pig, potar monkey, goat, wolf, kangaroo, cat, rabbit, mountain beaver, guinea pig, mole rat, hamster, human, and the dinosaurs Triceratops, Diplodocus and Brachiosaurus.)
70 log-scaling the data (The same log-log plot without labels.)
71 log-scaling the data
> modell <- lm(brain.weight.g~weight.kg.,subset=extinct=="no")
> summary(modell)

Call:
lm(formula = brain.weight.g ~ weight.kg., subset = extinct == "no")

Residuals: Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      *
weight.kg.                                <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 60 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 60 DF, p-value: < 2.2e-16
72 log-scaling the data qqnorm(modell$residuals) (Normal Q-Q plot: sample quantiles of the residuals against theoretical quantiles.)
73 log-scaling the data plot(modell$fitted.values,modell$residuals) (Plot: residuals against fitted values.)
74 log-scaling the data plot(modell$fitted.values,modell$residuals,log="x") (The same plot with log-scaled x-axis.)
75 log-scaling the data plot(modell$model$weight.kg.,modell$residuals) (Plot: residuals against body weight.)
76 log-scaling the data plot(modell$model$weight.kg.,modell$residuals,log="x") (The same plot with log-scaled x-axis.)
77 log-scaling the data We see that the residuals' variance depends on the fitted values (or on the body weight): heteroscedasticity. The model assumes homoscedasticity, i.e. the random deviations must be (almost) independent of the explanatory traits (body weight) and of the fitted values. Variance-stabilizing transformation: can we rescale body and brain size to make the deviations independent of the variables?
80 log-scaling the data Actually not so surprising: An elephant's brain of typically 5 kg can easily be 500 g lighter or heavier from individual to individual. This cannot happen for a mouse brain of typically 5 g. The latter will rather also vary by 10%, i.e. 0.5 g. Thus, the variance is not additive but rather multiplicative: brain mass = (expected brain mass) · random. We can convert this into something with additive randomness by taking the log: log(brain mass) = log(expected brain mass) + log(random).
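This multiplicative-error argument can be illustrated with a small R simulation. The body/brain masses below are synthetic, and the allometric exponent 0.75 is an assumption for the sketch, not a value from the mammal data set:

```r
## Multiplicative noise becomes additive after log-scaling
## (synthetic masses; the exponent 0.75 is an assumed illustration value)
set.seed(3)
body <- exp(runif(200, log(0.01), log(5000)))         # kg, many orders of magnitude
brain <- 0.01 * body^0.75 * exp(rnorm(200, sd = 0.3)) # multiplicative error
fit.log <- lm(log(brain) ~ log(body))
coef(fit.log)  # slope close to the assumed exponent 0.75
## on the log scale the residual spread no longer grows with body size
```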
81 log-scaling the data
> logmodell <- lm(log(brain.weight.g)~log(weight.kg.),subset=extinct=="no")
> summary(logmodell)

Call:
lm(formula = log(brain.weight.g) ~ log(weight.kg.), subset = extinct == "no")

Residuals: Min 1Q Median 3Q Max

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
log(weight.kg.)                           <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 60 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
82 log-scaling the data qqnorm(logmodell$residuals) (Normal Q-Q plot of the residuals of the log-log model.)
83 log-scaling the data plot(logmodell$fitted.values,logmodell$residuals) (Plot: residuals against fitted values.)
84 log-scaling the data plot(logmodell$fitted.values,logmodell$residuals,log="x") (The same plot with log-scaled x-axis.)
85 log-scaling the data plot(weight.kg.[extinct=="no"],logmodell$residuals) (Plot: residuals against body weight.)
86 log-scaling the data plot(weight.kg.[extinct=="no"],logmodell$residuals,log="x") (The same plot with log-scaled x-axis.)
87 Contents Checking model assumptions 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
88 Checking model assumptions Is the model appropriate for the data? E.g. Y = a + b·X + ε with ε ~ N(0, σ²). If the model fits, the residuals y_i − (â + b̂·x_i) must look normally distributed and must not have obvious dependencies on X or on â + b̂·X.
90 Checking model assumptions Example: is the relation between X and Y sufficiently well described by the linear equation Y_i = a + b·X_i + ε_i? (Scatter plot: Y against X.)
91 Checking model assumptions
> mod <- lm(Y ~ X)
> summary(mod)

Call:
lm(formula = Y ~ X)

Residuals: Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
X                                         <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
92 Checking model assumptions > plot(X,residuals(mod)) (Plot: residuals against X.) Obviously, the residuals tend to be larger for very large and very small values of X than for intermediate values of X. That should not be!
95 Checking model assumptions Idea: fit a section of a parabola instead of a line to (x_i, y_i), i.e. a model of the form Y_i = a + b·X_i + c·X_i² + ε_i. Is this still a linear model? Yes: Let Z = X², then Y is linear in X and Z. In R: > Z <- X^2 > mod2 <- lm(Y ~ X+Z)
99 Checking model assumptions
> summary(mod2)

Call:
lm(formula = Y ~ X + Z)

Residuals: Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
X                                                *
Z                                         <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
100 Checking model assumptions For this model there is no obvious dependence between X and the residuals: plot(X,residuals(mod2)) (Plot: residuals of mod2 against X.)
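A self-contained version of this parabola fit can be sketched as follows; the data and the true coefficients 1, 2, 3 are simulated for illustration (the slides' X and Y are not reproduced here):

```r
## A parabola fitted as a linear model: Y is linear in X and Z = X^2
set.seed(5)
X <- runif(100, -2, 2)
Y <- 1 + 2 * X + 3 * X^2 + rnorm(100)
Z <- X^2
mod2 <- lm(Y ~ X + Z)     # equivalently: lm(Y ~ X + I(X^2))
coef(mod2)                # close to 1, 2 and 3
```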
101 Checking model assumptions Is the assumption of normality in the model Y_i = a + b·X_i + ε_i in accordance with the data? Are the residuals r_i = Y_i − (â + b̂·X_i) more or less normally distributed? Graphical method: compare the theoretical quantiles of the standard normal distribution N(0, 1) with those of the residuals. Background: If we plot the quantiles of N(µ, σ²) against those of N(0, 1), we obtain a line y(x) = µ + σ·x. (Reason: If X is standard-normally distributed and Y = a + b·X, then Y is normally distributed with mean a and variance b².)
106 Checking model assumptions Before we fit the model with lm(), we first have to check whether the model assumptions are fulfilled. To check the assumptions underlying a linear model we need the residuals. To compute the residuals we first have to fit the model (in R with lm()). After that we can check the model assumptions and decide whether we stay with this model or still have to modify it.
111 Checking model assumptions p <- seq(from=0,to=1,by=0.01) plot(qnorm(p,mean=0,sd=1),qnorm(p,mean=1,sd=0.5), pch=16,cex=0.5) abline(v=0,h=0) (Plot: qnorm(p, mean = 1, sd = 0.5) against qnorm(p, mean = 0, sd = 1).)
112 Checking model assumptions If we plot the empirical quantiles of a sample from a normal distribution against the theoretical quantiles of a standard normal distribution, the values are not precisely on the line but are scattered around a line. If no systematic deviations from an imaginary line are recognizable: the normal distribution assumption is acceptable. If systematic deviations from an imaginary line are obvious: the assumption of normality may be problematic. It may be necessary to rescale variables or to take additional explanatory variables into account.
115 Contents Linear regression example with scaling 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
116 Linear regression example with scaling Data: For 301 US-American counties, the number of white female inhabitants from 1960 and the number of deaths by breast cancer in this group between 1950 and (Rice (2007) Mathematical Statistics and Data Analysis.) > canc (Table with columns deaths and inhabitants.)
117 Linear regression example with scaling Is the average number of deaths proportional to the population size, i.e. E[deaths] = b · inhabitants, or does the cancer risk depend on the size of the county, such that a different model fits better? E.g. E[deaths] = a + b · inhabitants with a ≠ 0.
118 Linear regression example with scaling
> modell <- lm(deaths~inhabitants,data=canc)
> summary(modell)

Call:
lm(formula = deaths ~ inhabitants, data = canc)

Residuals: Min 1Q Median 3Q Max

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)         e          e
inhabitants  3.578e          e          <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13 on 299 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 4315 on 1 and 299 DF, p-value: < 2.2e-16
119 Linear regression example with scaling The estimated intercept is not significantly different from 0. Thus we cannot reject the null hypothesis that the county size has no influence on the cancer risk. But... does the model fit?
122 Linear regression example with scaling qqnorm(modell$residuals) (Normal Q-Q plot of the residuals.)
123 Linear regression example with scaling plot(modell$fitted.values,modell$residuals) (Plot: residuals against fitted values.)
124 Linear regression example with scaling plot(modell$fitted.values,modell$residuals,log="x") (The same plot with log-scaled x-axis.)
125 Linear regression example with scaling plot(canc$inhabitants,modell$residuals,log="x") (Plot: residuals against inhabitants, log-scaled x-axis.)
126 Linear regression example with scaling The variance of the residuals depends on the fitted values: heteroscedasticity. The linear model assumes homoscedasticity. Variance-stabilizing transformation: How can we rescale the population size such that we obtain homoscedastic data?
129 Linear regression example with scaling Where does the variance come from? If n is the number of white female inhabitants and p the individual probability to die from breast cancer within 10 years, then n·p is the expected number of deaths and the variance is n·p·(1 − p) ≈ n·p (maybe approximate the binomial by a Poisson distribution). Standard deviation: √(n·p). In this case we can approximately stabilize the variance by taking the square root on both sides of the equation.
132 Linear regression example with scaling Explanation: √y = b·√x + ε implies y = (b·√x + ε)² = b²·x + 2·b·√x·ε + ε². The SD is not exactly proportional to √x, but at least the term 2·b·√x·ε has SD proportional to √x, namely 2·b·√x·σ. The term ε² is the σ²-fold of a χ²₁-distributed random variable and has SD = σ²·√2. If σ is small compared to b·√x, the approximation y ≈ b²·x + 2·b·√x·ε is reasonable and the SD of y is approximately proportional to √x.
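The square-root transformation can be illustrated with simulated Poisson counts; the county sizes and the risk p below are invented for the sketch, not taken from the data set in Rice:

```r
## sqrt() stabilizes the variance of Poisson-distributed counts:
## var(sqrt(Y)) is roughly 1/4, whatever the population size
set.seed(9)
n.inh <- round(10^runif(300, 3, 5))   # simulated county sizes (invented)
p <- 2e-3                             # assumed individual risk (invented)
deaths <- rpois(300, n.inh * p)       # Poisson approximation to the binomial
fitsq <- lm(sqrt(deaths) ~ sqrt(n.inh))
coef(fitsq)  # slope close to sqrt(p)
```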
133 Linear regression example with scaling
> modellsq <- lm(sqrt(deaths)~sqrt(inhabitants),data=canc)
> summary(modellsq)

Call:
lm(formula = sqrt(deaths) ~ sqrt(inhabitants), data = canc)

Residuals: Min 1Q Median 3Q Max

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)
sqrt(inhabitants)                         <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 299 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 4051 on 1 and 299 DF, p-value: < 2.2e-16
134 Linear regression example with scaling qqnorm(modellsq$residuals) (Normal Q-Q plot of the residuals of the square-root model.)
135 Linear regression example with scaling plot(modellsq$fitted.values,modellsq$residuals,log="x") plot(canc$inhabitants,modellsq$residuals,log="x") (Plots: residuals against fitted values and against inhabitants, log-scaled x-axis.)
136 Linear regression example with scaling The qqnorm plot is not perfect, but at least the variance is stabilized. The result remains the same: no significant relation between county size and breast cancer death risk.
138 Contents Why it's called regression 1 Univariate linear regression: how and why? 2 t-test for linear regression 3 log-scaling the data 4 Checking model assumptions 5 Linear regression example with scaling 6 Why it's called regression
139 Why it's called regression Origin of the word "regression": Sir Francis Galton (1822-1911): Regression toward the mean. Tall fathers tend to have sons that are slightly shorter than the fathers. Sons of short fathers are on average taller than their fathers.
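Galton's effect can be reproduced with a small simulation; the means, standard deviations and the correlation 0.5 below are invented for illustration, not Galton's measurements:

```r
## Regression toward the mean: simulated father/son heights,
## bivariate normal with correlation 0.5 (all numbers invented)
set.seed(11)
father <- rnorm(10000, mean = 175, sd = 7)
son <- 175 + 0.5 * (father - 175) + rnorm(10000, sd = 7 * sqrt(1 - 0.5^2))
fit <- lm(son ~ father)
coef(fit)[["father"]]  # slope clearly below 1
## sons of tall fathers are on average shorter than their fathers,
## sons of short fathers on average taller
mean(son[father > 185]) < mean(father[father > 185])  # TRUE
```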
141-148 Why it's called regression [Figure: scatterplot "Koerpergroessen" (body heights), son's height (Sohn) against father's height (Vater), built up over several slides with the fitted regression line]
149 Why it's called regression Similar effects In sports: the champion of the season will tend to fall short of the high expectations in the next year. In school: if the worst 10% of the students get extra lessons and are no longer among the worst 10% in the next year, this does not prove that the extra lessons are useful.
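Regression toward the mean is a purely statistical consequence of imperfect correlation, not of any causal mechanism. A minimal simulation sketch (in Python, with made-up height parameters; the lecture itself works in R):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sd, rho = 100_000, 175.0, 7.0, 0.5   # made-up mean/SD (cm) and father-son correlation
father = rng.normal(mu, sd, n)
# son has the same marginal distribution and correlation rho with the father
son = mu + rho * (father - mu) + rng.normal(0.0, sd * np.sqrt(1 - rho**2), n)
tall = father > mu + sd                      # fathers at least one SD above the mean
print(father[tall].mean())                   # well above mu
print(son[tall].mean())                      # between mu and the fathers' mean
```

Sons of tall fathers are still taller than average, but less extreme than their fathers, simply because the correlation is below 1; the same logic explains the sports and school examples above.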