MATH 829: Introduction to Data Mining and Analysis Consistency of Linear Regression


MATH 829: Introduction to Data Mining and Analysis. Consistency of Linear Regression (1/9). Dominique Guillot, Department of Mathematical Sciences, University of Delaware. February 15, 2016.

Distribution of regression coefficients (2/9)

Observations: $Y = (y_i) \in \mathbb{R}^n$, $X = (x_{ij}) \in \mathbb{R}^{n \times p}$.

Assumptions:

1. $Y_i = \beta_1 X_{i,1} + \dots + \beta_p X_{i,p} + \epsilon_i$ ($\epsilon_i$ = error). In other words, $Y = X\beta + \epsilon$, where $\beta = (\beta_1, \dots, \beta_p)$ is a fixed unknown vector.
2. The $x_{ij}$ are non-random; the $\epsilon_i$ are random.
3. The $\epsilon_i$ are independent $N(0, \sigma^2)$.

We have $\hat{\beta} = (X^T X)^{-1} X^T Y$. What is the distribution of $\hat{\beta}$?
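To make the setup concrete, here is a minimal NumPy sketch (not from the slides; the sample size, coefficients, and seed are all illustrative) that simulates $Y = X\beta + \epsilon$ and computes $\hat{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                            # illustrative sample size and number of predictors
X = rng.normal(size=(n, p))              # design matrix, treated as fixed
beta_true = np.array([2.0, -1.0, 0.5])   # the fixed unknown vector beta (known here only because we simulate)
eps = rng.normal(scale=1.0, size=n)      # eps_i iid N(0, sigma^2), with sigma = 1
Y = X @ beta_true + eps

# Least squares: beta_hat = (X^T X)^{-1} X^T Y, computed via a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                          # close to beta_true
```

Solving the normal equations with np.linalg.solve is numerically preferable to forming $(X^T X)^{-1}$ explicitly.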

Multivariate normal distribution (3/9)

Recall: $X = (X_1, \dots, X_p) \sim N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^p$ and $\Sigma = (\sigma_{ij}) \in \mathbb{R}^{p \times p}$ is positive definite, if
$$P(X \in A) = \frac{1}{(2\pi)^{p/2} \sqrt{\det \Sigma}} \int_A e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \, dx_1 \cdots dx_p.$$

Bivariate case: [figure of the bivariate normal density, not reproduced in the transcription]

We have $E(X) = \mu$ and $\mathrm{Cov}(X_i, X_j) = \sigma_{ij}$.

If $Y = c + BX$, where $c \in \mathbb{R}^m$ and $B \in \mathbb{R}^{m \times p}$, then $Y \sim N(c + B\mu, B\Sigma B^T)$.
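The affine-transformation property is easy to check empirically. The sketch below (all parameters illustrative) draws many samples of $X \sim N(\mu, \Sigma)$, forms $Y = c + BX$, and compares the empirical mean and covariance of $Y$ with $c + B\mu$ and $B\Sigma B^T$; both reported discrepancies should be near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # positive definite
c = np.array([0.0, 1.0, -1.0])
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])              # B in R^{3x2}, so Y is 3-dimensional

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of X
Y = c + X @ B.T                                        # Y = c + BX, applied row-wise

# Empirical mean and covariance of Y vs. the theoretical c + B mu and B Sigma B^T
print(np.abs(Y.mean(axis=0) - (c + B @ mu)).max())
print(np.abs(np.cov(Y, rowvar=False) - B @ Sigma @ B.T).max())
```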

Distribution of the regression coefficients (cont.) (4/9)

Back to our problem: $Y = X\beta + \epsilon$, where the $\epsilon_i$ are iid $N(0, \sigma^2)$. We have $Y \sim N(X\beta, \sigma^2 I)$.

Therefore, applying the affine-transformation property above with $B = (X^T X)^{-1} X^T$,
$$\hat{\beta} = (X^T X)^{-1} X^T Y \sim N(\beta, \sigma^2 (X^T X)^{-1}).$$

In particular, $E(\hat{\beta}) = \beta$. Thus, $\hat{\beta}$ is unbiased.
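As a sanity check on this distributional claim (a simulation of my own, not part of the lecture), one can hold $X$ fixed, redraw $\epsilon$ many times, and compare the empirical mean and covariance of $\hat{\beta}$ with $\beta$ and $\sigma^2 (X^T X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 50, 2, 1.5
X = rng.normal(size=(n, p))              # design held fixed across replications
beta = np.array([1.0, -2.0])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 100_000
eps = rng.normal(scale=sigma, size=(reps, n))   # fresh noise for each replication
Y = X @ beta + eps                               # shape (reps, n) by broadcasting
beta_hats = Y @ X @ XtX_inv                      # row b is beta_hat for replication b

print(beta_hats.mean(axis=0))            # ~ beta (unbiasedness)
print(np.cov(beta_hats, rowvar=False))   # ~ sigma^2 (X^T X)^{-1}
print(sigma**2 * XtX_inv)
```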

Statistical consistency of least squares (5/9)

We saw that $E(\hat{\beta}) = \beta$. What happens as the sample size $n$ goes to infinity? We expect $\hat{\beta} = \hat{\beta}(n) \to \beta$.

A sequence of estimators $\{\theta_n\}_{n=1}^{\infty}$ of a parameter $\theta$ is said to be consistent if $\theta_n \to \theta$ in probability ($\theta_n \overset{p}{\to} \theta$) as $n \to \infty$. (Recall: $\theta_n \overset{p}{\to} \theta$ if for every $\epsilon > 0$, $\lim_{n \to \infty} P(|\theta_n - \theta| \geq \epsilon) = 0$.)

In order to prove that $\hat{\beta}_n$ (the estimator with $n$ samples) is consistent, we will make some assumptions on the data-generating model. (Without any assumptions, nothing prevents the observations from all being the same, for example...)
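The definition of convergence in probability can be illustrated numerically. In the sketch below (an illustrative example, not from the slides), $\theta_n$ is the sample mean of $n$ iid $N(\theta, 1)$ draws; the Monte Carlo estimate of $P(|\theta_n - \theta| \geq \epsilon)$ shrinks toward 0 as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, eps_tol, reps = 0.0, 0.1, 10_000

# Monte Carlo estimate of P(|theta_n - theta| >= eps) for the sample mean
# of n iid N(theta, 1) observations; it tends to 0 as n grows.
for n in [10, 100, 1000]:
    theta_n = rng.normal(loc=theta, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(theta_n - theta) >= eps_tol))
```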

Statistical consistency of least squares (cont.) (6/9)

Observations: $y = (y_i) \in \mathbb{R}^n$, $X = (x_{ij}) \in \mathbb{R}^{n \times p}$. Let $x_i := (x_{i,1}, \dots, x_{i,p}) \in \mathbb{R}^p$ ($i = 1, \dots, n$).

We will assume:

1. $(x_i)_{i=1}^n$ are iid random vectors.
2. $y_i = \beta_1 x_{i,1} + \dots + \beta_p x_{i,p} + \epsilon_i$, where the $\epsilon_i$ are iid $N(0, \sigma^2)$.
3. The error $\epsilon_i$ is independent of $x_i$.
4. $E x_{ij}^2 < \infty$ (finite second moments).
5. $Q = E(x_i x_i^T) \in \mathbb{R}^{p \times p}$ is invertible.

Under these assumptions, we have the following theorem.

Theorem: Let $\hat{\beta}_n = (X^T X)^{-1} X^T y$. Then, under the above assumptions, $\hat{\beta}_n \overset{p}{\to} \beta$.
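Here is a small simulation of the theorem under the assumptions above (with illustrative choices: standard normal $x_i$, so $Q = I_p$, together with arbitrary $\beta$ and $\sigma$); the estimation error visibly shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
beta = np.array([1.0, -0.5, 2.0])
sigma = 1.0

for n in [100, 1_000, 10_000, 100_000]:
    X = rng.normal(size=(n, p))                      # iid rows x_i; here Q = E(x_i x_i^T) = I_p
    y = X @ beta + rng.normal(scale=sigma, size=n)   # eps_i iid N(0, sigma^2), independent of x_i
    beta_n = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, np.abs(beta_n - beta).max())            # max coordinate error shrinks with n
```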

Background for the proof (7/9)

Recall:

Weak law of large numbers: Let $(X_i)_{i=1}^{\infty}$ be iid random variables with finite first moment $E(|X_i|) < \infty$. Let $\mu := E(X_i)$. Then
$$\bar{X}_n := \frac{1}{n} \sum_{i=1}^n X_i \overset{p}{\to} \mu.$$

Continuous mapping theorem: Let $S, S'$ be metric spaces. Suppose $(X_i)_{i=1}^{\infty}$ are $S$-valued random variables such that $X_i \overset{p}{\to} X$. Let $g : S \to S'$. Denote by $D_g$ the set of points in $S$ where $g$ is discontinuous, and suppose $P(X \in D_g) = 0$. Then $g(X_n) \overset{p}{\to} g(X)$.

Proof of the theorem (8/9)

We have
$$\hat{\beta} = (X^T X)^{-1} X^T y = \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right)^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i y_i \right).$$

Using Cauchy-Schwarz,
$$E(|x_{ij} x_{ik}|) \leq (E(x_{ij}^2) E(x_{ik}^2))^{1/2} < \infty.$$

In a similar way, we prove that $E(|x_{ij} y_i|) < \infty$.

By the weak law of large numbers, we obtain
$$\frac{1}{n} \sum_{i=1}^n x_i x_i^T \overset{p}{\to} E(x_i x_i^T) = Q, \qquad \frac{1}{n} \sum_{i=1}^n x_i y_i \overset{p}{\to} E(x_i y_i).$$
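Numerically (same kind of illustrative model as before, with standard normal $x_i$ so that $Q = I_p$ and $E(x_i y_i) = Q\beta = \beta$), both sample averages in the proof converge to their population counterparts:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 2
beta = np.array([1.0, -1.0])

for n in [100, 10_000, 1_000_000]:
    X = rng.normal(size=(n, p))                 # here Q = E(x_i x_i^T) = I_p
    y = X @ beta + rng.normal(size=n)
    Q_n = (X.T @ X) / n                         # (1/n) sum_i x_i x_i^T
    m_n = (X.T @ y) / n                         # (1/n) sum_i x_i y_i
    # E(x_i y_i) = Q beta = beta for this choice of Q
    print(n, np.abs(Q_n - np.eye(p)).max(), np.abs(m_n - beta).max())
```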

Proof of the theorem (cont.) (9/9)

Using the continuous mapping theorem, we obtain
$$\hat{\beta}_n \overset{p}{\to} E(x_i x_i^T)^{-1} E(x_i y_i).$$
(Define $g : \mathbb{R}^{p \times p} \times \mathbb{R}^p \to \mathbb{R}^p$ by $g(A, b) = A^{-1} b$.)

Recall: $y_i = x_i^T \beta + \epsilon_i$. So
$$x_i y_i = x_i x_i^T \beta + x_i \epsilon_i.$$

Taking expectations,
$$E(x_i y_i) = E(x_i x_i^T) \beta + E(x_i \epsilon_i).$$

Note that $E(x_i \epsilon_i) = 0$ since $x_i$ and $\epsilon_i$ are independent by assumption.

We conclude that $\beta = E(x_i x_i^T)^{-1} E(x_i y_i)$, and so $\hat{\beta}_n \overset{p}{\to} \beta$.