MATH 829: Introduction to Data Mining and Analysis
Consistency of Linear Regression
Dominique Guillot
Department of Mathematical Sciences, University of Delaware
February 15, 2016
Distribution of regression coefficients

Observations: $Y = (y_i) \in \mathbb{R}^n$, $X = (x_{ij}) \in \mathbb{R}^{n \times p}$.

Assumptions:

1. $Y_i = \beta_1 X_{i,1} + \cdots + \beta_p X_{i,p} + \epsilon_i$ ($\epsilon_i$ = error). In other words, $Y = X\beta + \epsilon$. ($\beta = (\beta_1, \ldots, \beta_p)$ is a fixed unknown vector.)
2. The $x_{ij}$ are non-random. The $\epsilon_i$ are random.
3. The $\epsilon_i$ are independent $N(0, \sigma^2)$.

We have $\hat{\beta} = (X^T X)^{-1} X^T Y$. What is the distribution of $\hat{\beta}$?
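As a numerical illustration (not part of the original slides), the closed-form estimator $\hat{\beta} = (X^T X)^{-1} X^T Y$ can be computed directly with NumPy. The values of $\beta$, $\sigma$, $n$, and $p$ below are arbitrary choices for the simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
beta = np.array([2.0, -1.0, 0.5])    # true (in practice, unknown) coefficients
X = rng.normal(size=(n, p))          # design matrix, held fixed for this example
eps = rng.normal(scale=0.1, size=n)  # iid N(0, sigma^2) errors
Y = X @ beta + eps

# beta_hat = (X^T X)^{-1} X^T Y, computed via a linear solve
# rather than an explicit inverse, for numerical stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)
```

With a small noise level the estimate lands close to the true coefficient vector, as expected.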
Multivariate normal distribution

Recall: $X = (X_1, \ldots, X_p) \sim N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^p$ and $\Sigma = (\sigma_{ij}) \in \mathbb{R}^{p \times p}$ is positive definite, if
$$P(X \in A) = \frac{1}{(2\pi)^{p/2} \sqrt{\det \Sigma}} \int_A e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)} \, dx_1 \cdots dx_p.$$

[Figure: density of a bivariate normal distribution.]

We have $E(X) = \mu$ and $\mathrm{Cov}(X_i, X_j) = \sigma_{ij}$.

If $Y = c + BX$, where $c \in \mathbb{R}^m$ and $B \in \mathbb{R}^{m \times p}$, then $Y \sim N(c + B\mu, B\Sigma B^T)$.
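The affine-transformation property $Y = c + BX \sim N(c + B\mu, B\Sigma B^T)$ can be checked empirically by sampling. This is a sketch added for illustration; the specific $\mu$, $\Sigma$, $c$, and $B$ are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
c = np.array([3.0, -1.0, 0.0])
B = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = c + X @ B.T                 # apply Y = c + B X to each sample (row-wise)

# Empirical moments should match the theoretical ones
print(Y.mean(axis=0))           # approximately c + B mu
print(np.cov(Y.T))              # approximately B Sigma B^T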
Distribution of the regression coefficients (cont.)

Back to our problem: $Y = X\beta + \epsilon$, where the $\epsilon_i$ are iid $N(0, \sigma^2)$. We have $Y \sim N(X\beta, \sigma^2 I)$. Therefore,
$$\hat{\beta} = (X^T X)^{-1} X^T Y \sim N(\beta, \sigma^2 (X^T X)^{-1}).$$
In particular, $E(\hat{\beta}) = \beta$. Thus, $\hat{\beta}$ is unbiased.
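A Monte Carlo check of this sampling distribution (an added illustration, with arbitrary simulation parameters): hold the design $X$ fixed, redraw the errors many times, and compare the empirical mean and covariance of $\hat{\beta}$ with $\beta$ and $\sigma^2 (X^T X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 50, 2, 1.0
beta = np.array([1.0, -2.0])
X = rng.normal(size=(n, p))          # design held fixed across replications

XtX_inv = np.linalg.inv(X.T @ X)
reps = 20_000
estimates = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma, size=n)   # fresh errors each time
    estimates[r] = XtX_inv @ X.T @ Y

print(estimates.mean(axis=0))        # approximately beta (unbiasedness)
print(np.cov(estimates.T))           # approximately sigma^2 (X^T X)^{-1}
```

Both empirical moments match the theoretical ones closely, consistent with $\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$.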
Statistical consistency of least squares

We saw that $E(\hat{\beta}) = \beta$. What happens as the sample size $n$ goes to infinity? We expect $\hat{\beta} = \hat{\beta}(n) \to \beta$.

A sequence of estimators $\{\theta_n\}_{n=1}^\infty$ of a parameter $\theta$ is said to be consistent if $\theta_n \to \theta$ in probability ($\theta_n \xrightarrow{P} \theta$) as $n \to \infty$. (Recall: $\theta_n \xrightarrow{P} \theta$ if, for every $\epsilon > 0$, $\lim_{n \to \infty} P(|\theta_n - \theta| \geq \epsilon) = 0$.)

In order to prove that $\hat{\beta}_n$ (the estimator with $n$ samples) is consistent, we will make some assumptions on the data-generating model. (Without any assumptions, nothing prevents the observations from all being the same, for example...)
Statistical consistency of least squares (cont.)

Observations: $y = (y_i) \in \mathbb{R}^n$, $X = (x_{ij}) \in \mathbb{R}^{n \times p}$. Let $x_i := (x_{i,1}, \ldots, x_{i,p}) \in \mathbb{R}^p$ ($i = 1, \ldots, n$).

We will assume:

1. $(x_i)_{i=1}^n$ are iid random vectors.
2. $y_i = \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \epsilon_i$, where the $\epsilon_i$ are iid $N(0, \sigma^2)$.
3. The error $\epsilon_i$ is independent of $x_i$.
4. $E(x_{ij}^2) < \infty$ (finite second moments).
5. $Q = E(x_i x_i^T) \in \mathbb{R}^{p \times p}$ is invertible.

Under these assumptions, we have the following theorem.

Theorem: Let $\hat{\beta}_n = (X^T X)^{-1} X^T y$. Then, under the above assumptions, $\hat{\beta}_n \xrightarrow{P} \beta$.
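The theorem can be visualized by simulation under exactly these assumptions (iid Gaussian rows, noise independent of the covariates); the particular $\beta$, $\sigma$, and sample sizes below are illustrative choices, not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(3)
p, sigma = 3, 1.0
beta = np.array([1.0, 0.5, -1.5])

def beta_hat(n):
    # x_i iid rows; eps_i iid N(0, sigma^2), independent of x_i
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=sigma, size=n)
    return np.linalg.solve(X.T @ X, X.T @ y)

# The estimation error ||beta_hat_n - beta|| should shrink as n grows
errors = [np.linalg.norm(beta_hat(n) - beta) for n in (100, 10_000, 1_000_000)]
print(errors)
```

The error typically decays at the rate $O(1/\sqrt{n})$, in line with $\hat{\beta}_n \xrightarrow{P} \beta$.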
Background for the proof

Recall:

Weak law of large numbers: Let $(X_i)_{i=1}^\infty$ be iid random variables with finite first moment $E|X_i| < \infty$. Let $\mu := E(X_i)$. Then
$$\bar{X}_n := \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{P} \mu.$$

Continuous mapping theorem: Let $S$, $S'$ be metric spaces. Suppose $(X_n)_{n=1}^\infty$ are $S$-valued random variables such that $X_n \xrightarrow{P} X$. Let $g : S \to S'$. Denote by $D_g$ the set of points in $S$ where $g$ is discontinuous, and suppose $P(X \in D_g) = 0$. Then $g(X_n) \xrightarrow{P} g(X)$.
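Both facts are easy to see numerically. The sketch below (an added illustration; the exponential distribution and the map $g(x) = 1/x$ are arbitrary choices, with $g$ continuous at $\mu$) shows the sample mean concentrating around $\mu$ and, by continuous mapping, $g(\bar{X}_n)$ concentrating around $g(\mu)$:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 2.0   # mean of an Exponential(scale=2) distribution

# As n grows, xbar -> mu (WLLN) and 1/xbar -> 1/mu (continuous mapping,
# since g(x) = 1/x is continuous at x = mu > 0)
for n in (10**2, 10**4, 10**6):
    xbar = rng.exponential(scale=mu, size=n).mean()
    print(n, xbar, 1.0 / xbar)
```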
Proof of the theorem

We have
$$\hat{\beta} = (X^T X)^{-1} X^T y = \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right)^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i y_i \right).$$

Using Cauchy–Schwarz,
$$E|x_{ij} x_{ik}| \leq (E(x_{ij}^2) E(x_{ik}^2))^{1/2} < \infty.$$
In a similar way, we prove that $E|x_{ij} y_i| < \infty$. By the weak law of large numbers, we obtain
$$\frac{1}{n} \sum_{i=1}^n x_i x_i^T \xrightarrow{P} E(x_i x_i^T) = Q, \qquad \frac{1}{n} \sum_{i=1}^n x_i y_i \xrightarrow{P} E(x_i y_i).$$
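The first of these limits, $\frac{1}{n}\sum_i x_i x_i^T \to Q$, can be checked directly. In the sketch below (added for illustration), the $x_i$ are mean-zero Gaussian with covariance $\Sigma$, so $Q = E(x_i x_i^T) = \Sigma$; the matrix entries are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])   # since E(x_i) = 0, Q = E(x_i x_i^T) = Sigma

n = 500_000
X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
Q_hat = (X.T @ X) / n            # sample second-moment matrix (1/n) sum x_i x_i^T
print(Q_hat)                     # approximately Sigma
```

Entry by entry, the sample second-moment matrix converges to $Q$ at the usual $O(1/\sqrt{n})$ rate.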
Proof of the theorem (cont.)

Using the continuous mapping theorem, we obtain
$$\hat{\beta}_n \xrightarrow{P} E(x_i x_i^T)^{-1} E(x_i y_i).$$
(Define $g : \mathbb{R}^{p \times p} \times \mathbb{R}^p \to \mathbb{R}^p$ by $g(A, b) = A^{-1} b$.)

Recall: $y_i = x_i^T \beta + \epsilon_i$. So
$$x_i y_i = x_i x_i^T \beta + x_i \epsilon_i.$$
Taking expectations,
$$E(x_i y_i) = E(x_i x_i^T) \beta + E(x_i \epsilon_i).$$
Note that $E(x_i \epsilon_i) = 0$, since $x_i$ and $\epsilon_i$ are independent by assumption. We conclude that $\beta = E(x_i x_i^T)^{-1} E(x_i y_i)$, and so $\hat{\beta}_n \xrightarrow{P} \beta$.