Lecture 23. Pearson's theorem.

Today we will prove one result from probability that will be useful in several statistical tests. Let us consider $r$ boxes $B_1, \ldots, B_r$ as in figure 23.1.

Figure 23.1: The boxes $B_1, B_2, \ldots, B_r$.

Assume that we throw $n$ balls $X_1, \ldots, X_n$ into these boxes randomly and independently of each other with probabilities

$$\mathbb{P}(X_i \in B_1) = p_1, \ \ldots, \ \mathbb{P}(X_i \in B_r) = p_r,$$

where the probabilities add up to one, $p_1 + \ldots + p_r = 1$. Let $\nu_j$ be the number of balls in the $j$th box:

$$\nu_j = \#\{\text{balls } X_1, \ldots, X_n \text{ in the box } B_j\} = \sum_{l=1}^{n} I(X_l \in B_j).$$

On average, the number of balls in the $j$th box will be $np_j$, so the random variable $\nu_j$ should be close to $np_j$. One can also use the Central Limit Theorem to describe how close $\nu_j$ is to $np_j$. The next result tells us how to describe, in some sense, the closeness of $\nu_j$ to $np_j$ simultaneously for all $j \le r$. The main difficulty in this theorem comes from the fact that the random variables $\nu_j$ for $j \le r$ are not independent: for example, the total number of balls is equal to $n$,

$$\nu_1 + \ldots + \nu_r = n,$$

i.e. if we know these counts in $r - 1$ of the boxes we automatically know the count in the last box.
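As a quick numerical illustration, the vector of counts $(\nu_1, \ldots, \nu_r)$ is exactly a multinomial draw with parameters $n$ and $(p_1, \ldots, p_r)$, so the setup is easy to simulate. A minimal sketch in Python, assuming NumPy is available; the variable names are illustrative and not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000                       # number of balls thrown
p = np.array([0.2, 0.3, 0.5])  # box probabilities p_1, ..., p_r (must sum to 1)

# Throw the balls: X_1, ..., X_n are i.i.d. box labels with P(X_i = j) = p_j.
X = rng.choice(len(p), size=n, p=p)

# nu_j = number of balls landing in box j; equivalently, one Multinomial(n, p) draw.
nu = np.bincount(X, minlength=len(p))

print(nu)      # each nu_j should be close to its mean n * p_j
print(n * p)
```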

Theorem. The random variable

$$\chi^2 = \sum_{j=1}^{r} \frac{(\nu_j - np_j)^2}{np_j}$$

converges in distribution to the $\chi^2_{r-1}$ distribution with $r - 1$ degrees of freedom.

Proof. Let us fix a box $B_j$. The random variables $I(X_1 \in B_j), \ldots, I(X_n \in B_j)$ that indicate whether each observation $X_i$ is in the box $B_j$ or not are i.i.d. with Bernoulli distribution $B(p_j)$:

$$\mathbb{E}\, I(X_1 \in B_j) = \mathbb{P}(X_1 \in B_j) = p_j \quad\text{and}\quad \operatorname{Var}\bigl(I(X_1 \in B_j)\bigr) = p_j(1 - p_j).$$

Therefore, by the Central Limit Theorem, the random variable

$$\frac{\nu_j - np_j}{\sqrt{np_j(1-p_j)}} = \frac{\sum_{l=1}^{n} I(X_l \in B_j) - n\,\mathbb{E}\, I(X_1 \in B_j)}{\sqrt{n \operatorname{Var}\bigl(I(X_1 \in B_j)\bigr)}} \longrightarrow N(0,1)$$

converges to the standard normal distribution. Therefore, the random variable

$$\frac{\nu_j - np_j}{\sqrt{np_j}} = \sqrt{1-p_j}\;\frac{\nu_j - np_j}{\sqrt{np_j(1-p_j)}} \longrightarrow \sqrt{1-p_j}\, N(0,1) = N(0,\, 1-p_j)$$

converges to the normal distribution with variance $1 - p_j$. Let us be a little informal and simply write

$$\frac{\nu_j - np_j}{\sqrt{np_j}} \longrightarrow Z_j, \quad\text{where } Z_j \sim N(0,\, 1-p_j).$$

We know that each $Z_j$ has distribution $N(0, 1-p_j)$ but, unfortunately, this does not tell us what the distribution of the sum $\sum_j Z_j^2$ will be, because, as we mentioned above, the random variables $\nu_j$ are not independent and their correlation structure will play an important role.
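To see the informal statement $Z_j \sim N(0, 1-p_j)$ numerically, one can repeat the experiment many times and check that the sample variance of $(\nu_j - np_j)/\sqrt{np_j}$ is close to $1 - p_j$ rather than $1$. A rough sketch along the same lines as above, again assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps = 1000, 20000
p = np.array([0.2, 0.3, 0.5])

# Each row of `counts` is one realization of (nu_1, ..., nu_r).
counts = rng.multinomial(n, p, size=reps)

# Standardized counts: Z_j is approximately (nu_j - n p_j) / sqrt(n p_j).
Z = (counts - n * p) / np.sqrt(n * p)

# The sample variances should be close to 1 - p_j, not to 1.
print(Z.var(axis=0))
print(1 - p)
```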

To compute the covariance between $Z_i$ and $Z_j$ let us first compute the covariance between

$$\frac{\nu_i - np_i}{\sqrt{np_i}} \quad\text{and}\quad \frac{\nu_j - np_j}{\sqrt{np_j}},$$

which is equal to

$$\mathbb{E}\,\frac{(\nu_i - np_i)(\nu_j - np_j)}{n\sqrt{p_i p_j}} = \frac{1}{n\sqrt{p_i p_j}}\,\mathbb{E}\bigl(\nu_i\nu_j - \nu_i\, np_j - \nu_j\, np_i + n^2 p_i p_j\bigr) = \frac{1}{n\sqrt{p_i p_j}}\bigl(\mathbb{E}\,\nu_i\nu_j - np_i\, np_j - np_j\, np_i + n^2 p_i p_j\bigr) = \frac{1}{n\sqrt{p_i p_j}}\bigl(\mathbb{E}\,\nu_i\nu_j - n^2 p_i p_j\bigr).$$

To compute $\mathbb{E}\,\nu_i\nu_j$ we will use the fact that one ball cannot be inside two different boxes simultaneously, which means that

$$I(X_l \in B_i)\, I(X_l \in B_j) = 0. \tag{23.1}$$

Therefore,

$$\nu_i\nu_j = \sum_{l=1}^{n} I(X_l \in B_i)\,\sum_{l'=1}^{n} I(X_{l'} \in B_j) = \sum_{l, l'} I(X_l \in B_i)\, I(X_{l'} \in B_j) = \underbrace{\sum_{l = l'} I(X_l \in B_i)\, I(X_l \in B_j)}_{=\,0 \text{ by } (23.1)} + \sum_{l \neq l'} I(X_l \in B_i)\, I(X_{l'} \in B_j),$$

and, taking expectations,

$$\mathbb{E}\,\nu_i\nu_j = n(n-1)\,\mathbb{E}\, I(X_l \in B_i)\,\mathbb{E}\, I(X_{l'} \in B_j) = n(n-1)\, p_i p_j.$$

Therefore, the covariance above is equal to

$$\frac{1}{n\sqrt{p_i p_j}}\bigl(n(n-1)\, p_i p_j - n^2 p_i p_j\bigr) = -\sqrt{p_i p_j}.$$

To summarize, we showed that the random variable

$$\chi^2 = \sum_{j=1}^{r} \frac{(\nu_j - np_j)^2}{np_j} \longrightarrow \sum_{j=1}^{r} Z_j^2,$$

where the random variables $Z_1, \ldots, Z_r$ satisfy

$$\mathbb{E}\, Z_i^2 = 1 - p_i \quad\text{and}\quad \mathbb{E}\, Z_i Z_j = -\sqrt{p_i p_j} \ \text{ for } i \neq j.$$

To prove the Theorem it remains to show that this covariance structure of the sequence of $Z_i$'s implies that their sum of squares has the $\chi^2_{r-1}$ distribution. To show this we will find a different representation for $\sum_i Z_i^2$.
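This covariance structure can also be checked by simulation: the covariance matrix of $(Z_1, \ldots, Z_r)$ should be close to the matrix with entries $1 - p_i$ on the diagonal and $-\sqrt{p_i p_j}$ off the diagonal, i.e. $I - \sqrt{p}\,\sqrt{p}^{\,T}$, which anticipates the projection argument below. A short sketch under the same assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(2)

n, reps = 2000, 50000
p = np.array([0.2, 0.3, 0.5])

counts = rng.multinomial(n, p, size=reps)
Z = (counts - n * p) / np.sqrt(n * p)

# Empirical covariance matrix of (Z_1, ..., Z_r).
emp_cov = np.cov(Z, rowvar=False)

# Predicted covariance: 1 - p_i on the diagonal, -sqrt(p_i p_j) off the diagonal,
# i.e. the matrix I - sqrt(p) sqrt(p)^T.
sqrt_p = np.sqrt(p)
predicted = np.eye(len(p)) - np.outer(sqrt_p, sqrt_p)

print(np.round(emp_cov, 3))
print(np.round(predicted, 3))
```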

Let $g_1, \ldots, g_r$ be an i.i.d. standard normal sequence. Consider the two vectors

$$g = (g_1, \ldots, g_r) \quad\text{and}\quad p = (\sqrt{p_1}, \ldots, \sqrt{p_r})$$

and consider the vector $g - (g \cdot p)p$, where $g \cdot p = g_1\sqrt{p_1} + \ldots + g_r\sqrt{p_r}$ is the scalar product of $g$ and $p$. We will first prove that

$$g - (g \cdot p)p \ \text{ has the same joint distribution as } \ (Z_1, \ldots, Z_r). \tag{23.2}$$

To show this let us consider two coordinates of the vector $g - (g \cdot p)p$,

$$i\text{th}: \ \ g_i - \Bigl(\sum_{l=1}^{r} g_l \sqrt{p_l}\Bigr)\sqrt{p_i} \qquad\text{and}\qquad j\text{th}: \ \ g_j - \Bigl(\sum_{l=1}^{r} g_l \sqrt{p_l}\Bigr)\sqrt{p_j},$$

and compute their covariance:

$$\mathbb{E}\Bigl(g_i - \sum_{l} g_l \sqrt{p_l}\,\sqrt{p_i}\Bigr)\Bigl(g_j - \sum_{l} g_l \sqrt{p_l}\,\sqrt{p_j}\Bigr) = -\sqrt{p_i}\sqrt{p_j} - \sqrt{p_j}\sqrt{p_i} + \sum_{l=1}^{r} p_l\,\sqrt{p_i}\sqrt{p_j} = -2\sqrt{p_i p_j} + \sqrt{p_i p_j} = -\sqrt{p_i p_j}.$$

Similarly, it is easy to compute that

$$\mathbb{E}\Bigl(g_i - \sum_{l} g_l \sqrt{p_l}\,\sqrt{p_i}\Bigr)^2 = 1 - p_i.$$

This proves (23.2), which provides us with another way to formulate the convergence, namely,

$$\sum_{j=1}^{r} \frac{(\nu_j - np_j)^2}{np_j} \longrightarrow \sum_{j=1}^{r} \bigl(j\text{th coordinate of } g - (g \cdot p)p\bigr)^2.$$

But this vector has a simple geometric interpretation. Since $p$ is a unit vector,

$$|p|^2 = \sum_{i=1}^{r} (\sqrt{p_i})^2 = \sum_{i=1}^{r} p_i = 1,$$

the vector $V_1 = (g \cdot p)p$ is the projection of the vector $g$ onto the line along $p$ and, therefore, the vector $V_2 = g - (g \cdot p)p$ is the projection of $g$ onto the plane orthogonal to $p$, as shown in figures 23.2 and 23.3.

Figure 23.2: Projections $V_1$ and $V_2$ of $g$ onto the line along $p$ and onto the plane orthogonal to $p$.

Figure 23.3: Rotation of the coordinate system.

Let us consider a new orthonormal coordinate system with the last basis vector (last axis) equal to $p$. In this new coordinate system the vector $g$ will have coordinates

$$g' = (g_1', \ldots, g_r') = gV,$$

obtained from $g$ by the orthogonal transformation $V$ that maps the canonical basis into this new basis. But we proved a few lectures ago that in that case $g_1', \ldots, g_r'$ will also be i.i.d. standard normal. From figure 23.3 it is obvious that the vector $V_2 = g - (g \cdot p)p$ in the new coordinate system has coordinates

$$(g_1', \ldots, g_{r-1}', 0)$$

and, therefore,

$$\sum_{i=1}^{r} \bigl(i\text{th coordinate of } g - (g \cdot p)p\bigr)^2 = (g_1')^2 + \ldots + (g_{r-1}')^2.$$

But this last sum, by definition, has the $\chi^2_{r-1}$ distribution since $g_1', \ldots, g_{r-1}'$ are i.i.d. standard normal. This finishes the proof of the Theorem.
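The theorem itself is easy to illustrate by Monte Carlo: simulate the statistic $\chi^2 = \sum_j (\nu_j - np_j)^2/(np_j)$ many times and compare its empirical quantiles with those of the $\chi^2_{r-1}$ distribution. A small sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n, reps = 2000, 50000
p = np.array([0.2, 0.3, 0.5])
r = len(p)

counts = rng.multinomial(n, p, size=reps)

# Pearson's statistic for each repetition.
chi2_stat = ((counts - n * p) ** 2 / (n * p)).sum(axis=1)

# Compare empirical quantiles with chi^2 quantiles with r - 1 degrees of freedom.
qs = [0.50, 0.90, 0.95, 0.99]
print(np.quantile(chi2_stat, qs))
print(stats.chi2.ppf(qs, df=r - 1))
```

With $r = 3$ boxes the quantiles should match those of $\chi^2_2$ rather than $\chi^2_3$, reflecting the single linear constraint $\nu_1 + \ldots + \nu_r = n$.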