Lecture 23. Pearson's theorem.

Today we will prove one result from probability that will be useful in several statistical tests. Let us consider $r$ boxes $B_1, \ldots, B_r$ as in figure 23.1.

Figure 23.1: Boxes $B_1, B_2, \ldots, B_r$.

Assume that we throw $n$ balls $X_1, \ldots, X_n$ into these boxes randomly and independently of each other with probabilities
$$\mathbb{P}(X_i \in B_1) = p_1, \; \ldots, \; \mathbb{P}(X_i \in B_r) = p_r,$$
where the probabilities add up to one, $p_1 + \cdots + p_r = 1$. Let $\nu_j$ be the number of balls in the $j$th box:
$$\nu_j = \#\{\text{balls } X_1, \ldots, X_n \text{ in the box } B_j\} = \sum_{l=1}^{n} I(X_l \in B_j).$$
On average, the number of balls in the $j$th box will be $np_j$, so the random variable $\nu_j$ should be close to $np_j$. One can also use the Central Limit Theorem to describe how close $\nu_j$ is to $np_j$. The next result tells us how to describe, in some sense, the closeness of $\nu_j$ to $np_j$ simultaneously for all $j \le r$. The main difficulty in this theorem comes from the fact that the random variables $\nu_j$ for $j \le r$ are not independent; for example, because the total number of balls is equal to $n$,
$$\nu_1 + \cdots + \nu_r = n,$$
i.e. if we know the counts in $r - 1$ of the boxes we will automatically know the count in the last box.
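Since the balls are thrown independently with fixed probabilities, the vector of counts $(\nu_1, \ldots, \nu_r)$ has a multinomial distribution, which makes it easy to simulate. Here is a minimal Python sketch (the values of $n$, $r$ and the $p_j$ are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000                          # number of balls thrown
p = np.array([0.2, 0.3, 0.5])     # illustrative box probabilities, summing to one

# (nu_1, ..., nu_r) is a multinomial vector: nu[j] counts balls landing in box B_j
nu = rng.multinomial(n, p)

print(nu, nu.sum())               # the counts always add up to n
print(n * p)                      # expected counts np_j; nu_j should be close
```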
Theorem. We have that the random variable
$$\chi^2 = \sum_{j=1}^{r} \frac{(\nu_j - np_j)^2}{np_j}$$
converges in distribution to the $\chi^2_{r-1}$ distribution with $r - 1$ degrees of freedom.

Proof. Let us fix a box $B_j$. The random variables
$$I(X_1 \in B_j), \ldots, I(X_n \in B_j)$$
that indicate whether each observation $X_i$ is in the box $B_j$ or not are i.i.d. with Bernoulli distribution $B(p_j)$ with probability of success
$$\mathbb{E}\, I(X_1 \in B_j) = \mathbb{P}(X_1 \in B_j) = p_j$$
and variance
$$\mathrm{Var}\bigl(I(X_1 \in B_j)\bigr) = p_j(1 - p_j).$$
Therefore, by the Central Limit Theorem, the random variable
$$\frac{\nu_j - np_j}{\sqrt{np_j(1 - p_j)}} = \frac{\sum_{l=1}^{n} I(X_l \in B_j) - np_j}{\sqrt{n\,\mathrm{Var}}} \to N(0, 1)$$
converges to the standard normal distribution. Therefore, the random variable
$$\frac{\nu_j - np_j}{\sqrt{np_j}} = \sqrt{1 - p_j}\,\frac{\nu_j - np_j}{\sqrt{np_j(1 - p_j)}} \to \sqrt{1 - p_j}\, N(0, 1) = N(0, 1 - p_j)$$
converges to the normal distribution with variance $1 - p_j$. Let us be a little informal and simply say that
$$\frac{\nu_j - np_j}{\sqrt{np_j}} \to Z_j,$$
where the random variable $Z_j \sim N(0, 1 - p_j)$.
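One can check this informal limit numerically: repeat the experiment many times, record $(\nu_j - np_j)/\sqrt{np_j}$ for a fixed box, and compare the empirical variance with $1 - p_j$. A minimal sketch, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps = 1000, 20000
p = np.array([0.2, 0.3, 0.5])
j = 0                                      # fix the box B_1, so p_j = 0.2

nu = rng.multinomial(n, p, size=reps)      # reps independent experiments
z = (nu[:, j] - n * p[j]) / np.sqrt(n * p[j])

# the limit above says z is approximately N(0, 1 - p_j)
print(z.mean(), z.var())                   # expect roughly 0 and 1 - 0.2 = 0.8
```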
We know that each $Z_j$ has distribution $N(0, 1 - p_j)$ but, unfortunately, this does not tell us what the distribution of the sum $\sum Z_j^2$ will be, because, as we mentioned above, the r.v.s $\nu_j$ are not independent and their correlation structure will play an important role. To compute the covariance $\mathbb{E} Z_i Z_j$ for $i \neq j$, let us first compute
$$\mathbb{E}\,\frac{\nu_i - np_i}{\sqrt{np_i}} \cdot \frac{\nu_j - np_j}{\sqrt{np_j}},$$
which is equal to
$$\frac{1}{n\sqrt{p_i p_j}}\,\mathbb{E}(\nu_i - np_i)(\nu_j - np_j)
= \frac{1}{n\sqrt{p_i p_j}}\bigl(\mathbb{E}\nu_i\nu_j - np_i\,\mathbb{E}\nu_j - np_j\,\mathbb{E}\nu_i + n^2 p_i p_j\bigr)$$
$$= \frac{1}{n\sqrt{p_i p_j}}\bigl(\mathbb{E}\nu_i\nu_j - np_i\,np_j - np_j\,np_i + n^2 p_i p_j\bigr)
= \frac{1}{n\sqrt{p_i p_j}}\bigl(\mathbb{E}\nu_i\nu_j - n^2 p_i p_j\bigr).$$
To compute $\mathbb{E}\nu_i\nu_j$ we will use the fact that one ball cannot be inside two different boxes simultaneously, which means that
$$I(X_l \in B_i)\, I(X_l \in B_j) = 0. \tag{23.1}$$
Therefore,
$$\nu_i \nu_j = \sum_{l=1}^{n} I(X_l \in B_i) \sum_{l'=1}^{n} I(X_{l'} \in B_j)
= \sum_{l, l' = 1}^{n} I(X_l \in B_i)\, I(X_{l'} \in B_j)
= \underbrace{\sum_{l = l'} I(X_l \in B_i)\, I(X_l \in B_j)}_{\text{this equals } 0 \text{ by } (23.1)} + \sum_{l \neq l'} I(X_l \in B_i)\, I(X_{l'} \in B_j),$$
and, taking expectations and using the independence of the balls,
$$\mathbb{E}\nu_i\nu_j = n(n-1)\,\mathbb{E} I(X_l \in B_i)\,\mathbb{E} I(X_{l'} \in B_j) = n(n-1)\, p_i p_j.$$
Therefore, the covariance above is equal to
$$\frac{1}{n\sqrt{p_i p_j}}\bigl(n(n-1)p_i p_j - n^2 p_i p_j\bigr) = -\sqrt{p_i p_j}.$$
To summarize, we showed that the random variable
$$\sum_{j=1}^{r} \frac{(\nu_j - np_j)^2}{np_j} \to \sum_{j=1}^{r} Z_j^2,$$
where the random variables $Z_1, \ldots, Z_r$ satisfy
$$\mathbb{E} Z_i^2 = 1 - p_i \quad \text{and covariance} \quad \mathbb{E} Z_i Z_j = -\sqrt{p_i p_j}.$$
To prove the Theorem it remains to show that this covariance structure of the sequence of $Z_i$'s implies that their sum of squares has distribution $\chi^2_{r-1}$.
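Before continuing, this covariance structure can be confirmed by simulation: the empirical covariance matrix of the normalized counts should be close to the matrix with entries $1 - p_i$ on the diagonal and $-\sqrt{p_i p_j}$ off it, i.e. $I - \sqrt{p}\,\sqrt{p}^{\,T}$. A minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

n, reps = 2000, 50000
p = np.array([0.2, 0.3, 0.5])

nu = rng.multinomial(n, p, size=reps)          # shape (reps, r)
Z = (nu - n * p) / np.sqrt(n * p)              # normalized counts, one row per experiment

emp = Z.T @ Z / reps                           # empirical covariance (the mean is ~0)
sqp = np.sqrt(p)
theory = np.eye(len(p)) - np.outer(sqp, sqp)   # 1 - p_i on diagonal, -sqrt(p_i p_j) off

print(np.round(emp, 3))
print(np.round(theory, 3))
```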
To show this we will find a different representation for $\sum Z_i^2$. Let $g_1, \ldots, g_r$ be an i.i.d. standard normal sequence. Consider two vectors
$$\vec{g} = (g_1, \ldots, g_r) \quad \text{and} \quad \vec{p} = (\sqrt{p_1}, \ldots, \sqrt{p_r}),$$
and consider the vector $\vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p}$, where $\vec{g} \cdot \vec{p} = g_1\sqrt{p_1} + \cdots + g_r\sqrt{p_r}$ is the scalar product of $\vec{g}$ and $\vec{p}$. We will first prove that
$$\vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p} \;\text{ has the same joint distribution as }\; (Z_1, \ldots, Z_r). \tag{23.2}$$
To show this, let us consider two coordinates of the vector $\vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p}$,
$$i\text{th}: \; g_i - \Bigl(\sum_{l=1}^{r} g_l \sqrt{p_l}\Bigr)\sqrt{p_i} \quad \text{and} \quad j\text{th}: \; g_j - \Bigl(\sum_{l=1}^{r} g_l \sqrt{p_l}\Bigr)\sqrt{p_j},$$
and compute their covariance:
$$\mathbb{E}\Bigl(g_i - \sum_{l=1}^{r} g_l \sqrt{p_l}\,\sqrt{p_i}\Bigr)\Bigl(g_j - \sum_{l=1}^{r} g_l \sqrt{p_l}\,\sqrt{p_j}\Bigr)
= -\sqrt{p_i p_j} - \sqrt{p_j p_i} + \sum_{l=1}^{r} p_l \sqrt{p_i p_j}
= -2\sqrt{p_i p_j} + \sqrt{p_i p_j} = -\sqrt{p_i p_j},$$
using that $\mathbb{E} g_l g_{l'} = 0$ for $l \neq l'$, $\mathbb{E} g_l^2 = 1$, and $\sum_l p_l = 1$. Similarly, it is easy to compute that
$$\mathbb{E}\Bigl(g_i - \sum_{l=1}^{r} g_l \sqrt{p_l}\,\sqrt{p_i}\Bigr)^2 = 1 - p_i.$$
This proves (23.2), which provides us with another way to formulate the convergence: namely, we have
$$\sum_{j=1}^{r} \frac{(\nu_j - np_j)^2}{np_j} \to \sum_{i=1}^{r} \bigl(i\text{th coordinate}\bigr)^2,$$
where we consider the coordinates of the vector $\vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p}$. But this vector has a simple geometric interpretation. Since $\vec{p}$ is a unit vector,
$$|\vec{p}\,|^2 = \sum_{i=1}^{r} (\sqrt{p_i})^2 = \sum_{i=1}^{r} p_i = 1,$$
the vector $V_1 = (\vec{g} \cdot \vec{p})\,\vec{p}$ is the projection of the vector $\vec{g}$ on the line along $\vec{p}$ and, therefore, the vector $V_2 = \vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p}$ is the projection of $\vec{g}$ onto the plane orthogonal to $\vec{p}$, as shown in figures 23.2 and 23.3.

Figure 23.2: Projections $V_1$ and $V_2$ of $\vec{g}$.

Figure 23.3: Rotation of the coordinate system.
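The identification (23.2) can likewise be checked by simulation: draw i.i.d. standard normal vectors $\vec{g}$, form $\vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p}$, and verify that the empirical covariance matches the structure computed above for $(Z_1, \ldots, Z_r)$. A minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

reps = 100000
p = np.array([0.2, 0.3, 0.5])
sqp = np.sqrt(p)                               # the unit vector (sqrt(p_1), ..., sqrt(p_r))

g = rng.standard_normal((reps, len(p)))        # rows are i.i.d. standard normal vectors g
v2 = g - (g @ sqp)[:, None] * sqp              # g - (g . p)p for each row

emp = v2.T @ v2 / reps                         # empirical covariance of the coordinates
theory = np.eye(len(p)) - np.outer(sqp, sqp)   # same matrix as for (Z_1, ..., Z_r)
print(np.round(emp, 3))                        # should match `theory`, as in (23.2)
print(np.round(theory, 3))
```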
Let us consider a new orthonormal coordinate system with the last basis vector (last axis) equal to $\vec{p}$. In this new coordinate system, the vector $\vec{g}$ will have coordinates
$$\vec{g}' = (g_1', \ldots, g_r') = \vec{g}\,V,$$
obtained from $\vec{g}$ by the orthogonal transformation $V$ that maps the canonical basis into this new basis. But we proved a few lectures ago that in that case $g_1', \ldots, g_r'$ will also be i.i.d. standard normal. From figure 23.3 it is obvious that the vector $V_2 = \vec{g} - (\vec{g} \cdot \vec{p})\,\vec{p}$ in the new coordinate system has coordinates
$$(g_1', \ldots, g_{r-1}', 0)$$
and, therefore,
$$\sum_{i=1}^{r} \bigl(i\text{th coordinate of } V_2\bigr)^2 = (g_1')^2 + \cdots + (g_{r-1}')^2.$$
But this last sum, by definition, has the $\chi^2_{r-1}$ distribution, since $g_1', \ldots, g_{r-1}'$ are i.i.d. standard normal. This finishes the proof of the Theorem.
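To see the theorem in action, one can simulate the statistic $\sum_j (\nu_j - np_j)^2 / np_j$ and compare its empirical mean and variance with those of $\chi^2_{r-1}$, namely $r - 1$ and $2(r - 1)$. A minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

n, reps = 2000, 50000
p = np.array([0.2, 0.3, 0.5])              # r = 3 boxes, so r - 1 = 2 degrees of freedom

nu = rng.multinomial(n, p, size=reps)
stat = ((nu - n * p) ** 2 / (n * p)).sum(axis=1)

# chi^2 with r - 1 = 2 degrees of freedom has mean 2 and variance 4
print(stat.mean(), stat.var())             # expect roughly 2 and 4
```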