Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Testing Multivariate Normality
Author(s): D. R. Cox and N. J. H. Small
Source: Biometrika, Vol. 65, No. 2 (Aug., 1978), pp
Published by: Oxford University Press on behalf of Biometrika Trust

Biometrika (1978), 65, 2, pp
Printed in Great Britain

Testing multivariate normality

BY D. R. COX AND N. J. H. SMALL
Department of Mathematics, Imperial College, London

SUMMARY

Previous work on testing multivariate normality is reviewed. Coordinate-dependent and invariant procedures are distinguished. The arguments for concentrating on tests of linearity of regression are indicated and such tests, both coordinate-dependent and invariant, are developed.

Some key words: Goodness of fit; Invariance; Multivariate normality; Nonlinearity; Probability plot; Transformation; Tukey's degree of freedom.

1. INTRODUCTION

There has been much recent work on testing univariate normality, stemming partly from work on weak convergence (Durbin, 1973) and partly from more empirical ideas. Unfortunately little of this work can be directly applied to testing multivariate normality. Even when v, the number of component variables, is only two, immediate adaptation of univariate tests such as the chi-squared goodness of fit test is clumsy, and if v is larger such tests are quite impracticable. Further, the absence of a simple yet general family of distributions extending the multivariate normal precludes the use of a likelihood ratio test; see, however, Barndorff-Nielsen (1977).

Just as in other applications of significance tests, the practical purpose of the test must be considered. It is a central theme of the present paper that a main objective of tests of multivariate normality is to see whether an estimated covariance matrix provides an adequate summary of the interrelationships among a set of variables; most practical applications of multivariate analysis depend either upon a direct interpretation of one or more covariance matrices or upon some further analysis of such matrices.
While in particular applications very specific kinds of departure from multivariate normality might be of concern, the departure with the most serious consequences is often the occurrence of appreciable nonlinearity of dependence. In its simplest form, the covariance of two random variables is even qualitatively a poor indication of their association if appreciable curvature is present. Nonnormality of marginal distribution, as such, does not have this consequence. Therefore for the great majority of this paper we consider tests of linearity of regression rather than directly of normality. There is a general distinction in multivariate analysis between procedures that are invariant under arbitrary nonsingular linear transformations of the v component variables and those that are dependent on the particular coordinate system used to record the data. Despite the great theoretical power and importance of invariance considerations in multivariate analysis, there are many practical situations where the particular choice of components is important, i.e. where effects are in some sense most usefully to be detected or interpreted in particular directions in the v-dimensional space of the variables. Therefore we give separate discussions of invariant and of coordinate-dependent techniques. The coordinate-dependent procedures are, however, all invariant under scale and location changes of the components.

2. PREVIOUS WORK

An excellent broad review of the assessment of multivariate distributional properties has been given recently by Gnanadesikan (1977, pp ) so that only a brief outline of previous work on tests of multivariate normality need be given here.

A quite powerful coordinate-dependent approach is to consider parametric transformations coordinate by coordinate, e.g. of $y_s$ into

$$y_s^{(\lambda_s)} = \begin{cases} (y_s^{\lambda_s} - 1)/\lambda_s & (\lambda_s \neq 0), \\ \log y_s & (\lambda_s = 0), \end{cases}$$

for s = 1, ..., v. Then it may be reasonable to assume that for some unknown $\lambda = (\lambda_1, \ldots, \lambda_v)$ the transformed observations are multivariate normal with unknown mean and covariance matrix. The required $\lambda$ can be estimated by maximum likelihood and the null hypothesis $\lambda_s = 1$ (s = 1, ..., v) tested by a likelihood ratio test. In some applications in which the component variables are similar in kind it may be sensible to suppose that $\lambda_1 = \ldots = \lambda_v$. This generalization of the univariate technique of Box & Cox (1964) was probably first used in unpublished work by the late T. Burnaby; it was developed entirely independently and in more detail by Andrews, Gnanadesikan & Warner (1971, 1973). This approach is, of course, coordinate-dependent, although Andrews et al. (1973) have considered the possibility of a preliminary rotation of coordinates before the consideration of transformations.

This general approach has the advantage over most of the others to be mentioned that it gives an explicit suggestion of the analysis to be adopted if clear evidence against the null hypothesis of multivariate normality is found. Indeed, the only other general procedure based on an explicit alternative model is the fitting of a mixture of normal components, usually with different means and the same covariance matrix (Day, 1969).
A widely useful invariant graphical procedure (Healy, 1968; Cox, 1968; Andrews et al., 1971) is based on the distribution of the ordered Mahalanobis distances of the individual points from their mean in the metric defined by the sample covariance matrix. Thus if $Y_1, \ldots, Y_n$ are n independent observations of a v-dimensional vector, $\bar{Y}$ their sample mean and $S$ their estimated covariance matrix, we compute

$$D_i^2 = (Y_i - \bar{Y})^T S^{-1} (Y_i - \bar{Y})$$

and plot the ordered $D_i^2$ against the expected order statistics for samples of size n from the chi-squared distribution with v degrees of freedom. It would be useful to have a significance test based on this procedure. Often, too, it will be informative to supplement the information about the distances of the individuals from the mean by some consideration of angular position (Gnanadesikan, 1977, pp ).

Important tests of univariate normality are based on standardized third and fourth cumulants, these being of particular value because of their diagnostic power in indicating the qualitative nature of any departure from normality. One simple possibility for a coordinate-dependent multivariate procedure is to examine separately the marginal distribution of each component (Andrews et al., 1973). A conservative composite significance test can be obtained from the most significant of the individual component statistics by using a Bonferroni bound, i.e. by multiplying the most extreme significance level by v. Alternatively a more detailed analysis may be based on the estimated correlation matrix of the original variables; details are given in an unpublished paper by N. J. H. Small.

An invariant procedure similar in spirit to that developed in § 4 of the present paper was given by Malkovich

& Afifi (1973), who considered as a possible statistic the supremum of, for example, the standardized skewness over all linear combinations $a_1 y_1 + \ldots + a_v y_v$. They applied this notion to other univariate statistics.

Mardia (1970) has obtained invariant combinations of the third- and fourth-order cumulants by examining those combinations that have maximum effect on the null hypothesis distribution of the Hotelling $T^2$ statistic. Estimates of these invariant combinations are suggested for use as test statistics. Subsequently (Mardia, 1975) the relation between these and the distances $D^2$ has been explored.

3. COORDINATE-DEPENDENT PROCEDURES

3·1. General

In this section we consider tests for linearity of regression relationships which are coordinate-dependent. We deal with situations in which the v component variables are to be treated symmetrically. Of course, if there are available both response and explanatory variables, we shall normally condition on the observed values of the explanatory variables and be directly interested in distributional properties only as they concern the conditional distribution of the response variable given the explanatory variables.

A complication in the discussion that follows is that when v is large a natural analysis leads to rather a large number of component statistics, and some way of simplifying this procedure will be necessary; see § 3·3.

3·2. Two component variables

While the bivariate case, v = 2, is not of great practical interest, it is worthwhile beginning with a discussion of it. Let the observations be $(Y_{i1}, Y_{i2})$ (i = 1, ..., n), regarded as n independent observations. The null hypothesis is that these correspond to independent and identically distributed random variables $(Y_1, Y_2)$ with a bivariate normal distribution.
A simple test of the linearity of the regression of $Y_2$ on $Y_1$ is provided by $Q_{2,1}$, the standard Student t statistic for the significance of the regression coefficient of $Y_2$ on $Y_1^2$ in a univariate linear model in which $Y_2$ is regressed on $Y_1$ and $Y_1^2$. In special circumstances nonlinear functions other than $Y_1^2$, for example $1/Y_1$, could be used, or an F statistic could be calculated for regression on a set of such functions. To treat the component variables symmetrically, we take with $Q_{2,1}$ the statistic $Q_{1,2}$.

The joint distribution of $(Q_{2,1}, Q_{1,2})$ is complicated, even though the individual distributions of $Q_{2,1}$ and $Q_{1,2}$ are simple, so that to work with a test depending on both components we consider the asymptotic distribution. In fact under the null hypothesis $(Q_{2,1}, Q_{1,2})$ is asymptotically bivariate normal with zero mean and unit variance. It remains therefore to find the asymptotic correlation coefficient of $Q_{2,1}$ and $Q_{1,2}$. For this we may ignore the denominators of the Student t statistics and examine the correlation between the random variables

$$T_{21} = \sum Y_{i2}\{(Y_{i1} - \bar{Y}_{.1})^2 - (Y_{i1} - \bar{Y}_{.1})\, m_{30}/m_{20} - m_{20}\},$$

$$T_{12} = \sum Y_{i1}\{(Y_{i2} - \bar{Y}_{.2})^2 - (Y_{i2} - \bar{Y}_{.2})\, m_{03}/m_{02} - m_{02}\},$$

where, for example, $\bar{Y}_{.1} = \sum Y_{i1}/n$, $m_{r0} = \sum (Y_{i1} - \bar{Y}_{.1})^r/n$. We can without loss of generality take the random variables $(Y_{i1}, Y_{i2})$ as bivariate normal of zero mean, unit variances and correlation coefficient $\rho$. Then $Y_{i2} = \rho Y_{i1} + \tau Z_i$, where $(Y_{i1}, Z_i)$ are independently standard normal, so that $\tau^2 = 1 - \rho^2$. It follows that

$$T_{21} = \tau \sum Z_i (Y_{i1}^2 - 1) + O_p(1)$$

and that therefore

$$\operatorname{cov}(T_{21}, T_{12}) \sim n E\{(Y_2 - \rho Y_1)(Y_1^2 - 1)(Y_1 - \rho Y_2)(Y_2^2 - 1)\} = 2n\rho(1 - \rho^2)(2 - 3\rho^2),$$

on evaluating the relevant moments. Since each of $T_{21}$ and $T_{12}$ has asymptotic variance $2n(1 - \rho^2)$, it follows that asymptotically

$$\operatorname{corr}(Q_{2,1}, Q_{1,2}) = \rho(2 - 3\rho^2), \tag{1}$$

where $\rho = \operatorname{corr}(Y_1, Y_2)$. This is consistently estimated on replacing $\rho$ by $r_{12}$, the sample correlation coefficient of $Y_1$ and $Y_2$. Thus, if required, we can form a composite test statistic either as

$$\max(|Q_{2,1}|, |Q_{1,2}|) \tag{2}$$

or as the quadratic form

$$\begin{pmatrix} Q_{2,1} & Q_{1,2} \end{pmatrix} \begin{pmatrix} 1 & r_{12}(2 - 3r_{12}^2) \\ r_{12}(2 - 3r_{12}^2) & 1 \end{pmatrix}^{-1} \begin{pmatrix} Q_{2,1} \\ Q_{1,2} \end{pmatrix}. \tag{3}$$

The statistic (2) can, for large samples, be tested for significance from tables of the bivariate normal distribution and the statistic (3) by the chi-squared distribution with two degrees of freedom. Note from (1) that if $\rho$ is small $Q_{2,1}$ and $Q_{1,2}$ have a correlation with the same sign as $\rho$, whereas if $\rho$ is large the correlations have opposite signs, the change occurring at $\rho^2 = 2/3$; this last fact has a simple geometrical interpretation.

If information is available from several independent samples, all concerning the same two variables, a composite statistic can be formed in various ways.

3·3. More than two component variables

The ideas of § 3·2 can be generalized to v component variables in several ways. Among these methods are the following.

(a) We may regress each $Y_r$ linearly on all other $Y_s$ and on $Y_t^2$ and thus obtain a Student t statistic $Q^{(v)}_{r,t}$ ($t \neq r$) for the quadratic contribution. There are v(v − 1) such statistics, and they can be regarded as forming a v × v array with empty main diagonal.

(b) The statistics considered in (a) can be supplemented by a further set of statistics $Q^{(v)}_{r,tu}$ ($r \neq t \neq u$), examining the regression of $Y_r$ on $Y_t Y_u$, adjusting for the linear terms as before. This gives a further $\tfrac{1}{2}v(v - 1)(v - 2)$ statistics, and so in all $\tfrac{1}{2}v^2(v - 1)$ statistics.

(c) Approaches (a) and (b) could be applied to marginal dependencies, regressing $Y_r$ on $Y_t^2$ and $Y_t$, omitting all other variables, i.e. obtaining the statistics $Q^{(2)}_{r,t}$ of § 3·2. More generally suitable v′-dimensional definitions (v′ < v) could be examined.
(d) Instead of isolating single degrees of freedom, we may combine the contributions by forming in the standard way an F statistic for, in case (a), fitting all $Y_s^2$ ($s \neq r$) in regressing $Y_r$ on all $Y_s$ and all $Y_s^2$.

(e) We may use Tukey's degree of freedom for nonadditivity (Tukey, 1949) to obtain one degree of freedom from each variable in turn. If $Y_r$ is the dependent variable, let $\hat{Y}_r$ be the fitted value arising from linear regression on the remaining variables; then the degree of freedom gives the Student t statistic associated with including $\hat{Y}_r^2$ in the model, in addition to all linear terms. While this procedure has, especially for large v, the advantage of limiting the number of subsidiary statistics to be examined, empirical experience suggests that the dangers of overlooking major effects are too great for the procedure to be safely recommended, at least on its own.

In all these methods nonlinear functions such as reciprocals could be used instead of squares. The methods give rise to a set of test statistics, in general correlated. If there are an appreciable number of these they can be plotted against an appropriate probability scale,

often the standard normal; it is known that moderate correlation between the values has little effect on the linearity of the plots. The plots can be replaced by or augmented by approximate significance tests and we discuss these briefly below.

The advantage of procedures that are based on single degrees of freedom is that they give more information for detailed diagnosis if evidence of a departure from the null hypothesis of multivariate normality is found. This suggests that for v not exceeding about 10, one of (a)-(c) should be used. For larger values of v, either the variates should be split into meaningful subsections, or (d), or, conceivably for very large v, (e), applied.

For the remaining discussion we concentrate on (a) and (c). In either approach, the natural graphical method is to plot the ordered Q's against the expected order statistics in samples of size m from the standard normal distribution, where m is the number of Q statistics to be plotted. It is assumed that the sample size from which the statistics are computed is such that the Student t distribution can be treated as effectively normal; if not, a nonlinear transformation to marginal normality could be applied. For interpretation it is essential that at least the more extreme points in the plot should be labelled with the two defining suffixes. Note that the signs of the Q's are meaningful, provided that the signs of the original variables are, so that a normal plot is appropriate, rather than a half-normal plot of absolute values.

For more detailed numerical interpretation of the Q's, it is natural to consider them as a square array and to examine row and column sums $Q^{(v)}_{r,\cdot}$, $Q^{(v)}_{\cdot,s}$ or $Q^{(2)}_{r,\cdot}$, $Q^{(2)}_{\cdot,s}$, or sums of squares

$$S^{(v)}_{r,\cdot} = \sum_s \{Q^{(v)}_{r,s}\}^2, \qquad S^{(v)}_{\cdot,s} = \sum_r \{Q^{(v)}_{r,s}\}^2. \tag{4}$$

It can be shown that approximately the statistics (4) have means v − 1 and variances 2(v − 1){1 + 2(v − 2)/n}.
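The array of statistics in method (a) and the sums of squares (4) can be sketched in a few lines. The following is only an illustration under stated assumptions, not the authors' program: the function names `t_stat_last`, `q_array` and `sums_of_squares` are hypothetical, an intercept is included in every regression, and the data are synthetic, built so that the first variable depends quadratically on the second.

```python
import numpy as np

def t_stat_last(X, y):
    """Student t statistic for the last column of X in least squares of y on X."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual mean square
    return beta[-1] / np.sqrt(sigma2 * xtx_inv[-1, -1])

def q_array(Y):
    """Method (a): Q[r, t] is the t statistic for Y_t^2 when Y_r is regressed
    on an intercept, all other variables and Y_t^2. The diagonal is left empty (NaN)."""
    Y = np.asarray(Y, dtype=float)
    n, v = Y.shape
    Q = np.full((v, v), np.nan)
    for r in range(v):
        others = np.delete(Y, r, axis=1)      # all Y_s with s != r
        for t in range(v):
            if t == r:
                continue
            X = np.column_stack([np.ones(n), others, Y[:, t] ** 2])
            Q[r, t] = t_stat_last(X, Y[:, r])
    return Q

def sums_of_squares(Q):
    """Row and column sums of squares as in (4), skipping the empty diagonal."""
    return np.nansum(Q ** 2, axis=1), np.nansum(Q ** 2, axis=0)

# Synthetic illustration (an assumption, not data from the paper):
# variable 0 has a quadratic dependence on variable 1.
rng = np.random.default_rng(0)
n = 200
y1 = rng.standard_normal(n)
y2 = rng.standard_normal(n)
y0 = y1 + 0.5 * y1 ** 2 + 0.1 * rng.standard_normal(n)
Y = np.column_stack([y0, y1, y2])

Q = q_array(Y)
row_ss, col_ss = sums_of_squares(Q)
```

In this sketch the entry `Q[0, 1]` should stand out, and the ordered off-diagonal entries of `Q` are what would be plotted against expected normal order statistics.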
It is hard to give a firm discussion of the relative merits of the statistics $Q^{(v)}$ and $Q^{(2)}$. Computational simplicity to some extent favours $Q^{(2)}$ and this will also have advantages if v is comparable with n and the variables are almost independent, for then the fitting of linear regressions will effectively induce 'noise' masking the effects under study. On the other hand, if strong roughly linear relationships are known to be present, it seems sensible to eliminate them and hence to use the statistics $Q^{(v)}$. If n is large compared with v the simplest general procedure is to use the $Q^{(v)}$.

If information from several samples is combined it will usually be best to take the combined $Q^{(v)}$ as a weighted sum of the separate statistics, weighting by the sample size.

4. INVARIANT PROCEDURES

4·1. General idea

The procedures of § 3 are coordinate-dependent. They in effect look for nonlinearities associated particularly with the variables $Y_1, \ldots, Y_v$; of course an initial transformation of the original data might be made. To obtain an invariant procedure examining nonlinearity the most direct approach is to find that pair of variables, linear combinations of the original variables, such that one has maximum curvature in its regression on the other. The amount of curvature so achieved is the test statistic, and the form of the two maximizing variables will, hopefully, be a useful diagnostic tool.

4·2. Development of directions of maximum curvature: Population theory

In the following discussion we can work either with samples and sample moments, or with random variables and corresponding population moments, which is what is done here.

Suppose that the variable $Y = (Y_1, \ldots, Y_v)^T$ is standardized so that its components have mean zero and let their covariance matrix be $\Sigma = ((\sigma_{rs}))$. For the higher moments, write for r, s, t, u = 1, ..., v,

$$E(Y_r Y_s Y_t) = \mu(r, s, t), \qquad E(Y_r Y_s Y_t Y_u) = \mu(r, s, t, u).$$

Consider $X = a^T Y$ and $W = b^T Y$ with $a^T \Sigma a = b^T \Sigma b = 1$, so that X and W have zero mean and unit variance. Let $\gamma = \gamma_{XW}$ denote the least squares regression coefficient of X on $W^2$, adjusting for linear regression on W. This is found most simply by considering the orthogonalized form,

$$X = \beta W + \gamma\{W^2 - W E(W^3) - 1\} + \epsilon, \tag{5}$$

where $\epsilon$ is an error term uncorrelated with W and $W^2$, so that

$$\gamma_{XW} = \frac{E(XW^2) - E(W^3)E(XW)}{E(W^4) - 1 - \{E(W^3)\}^2}. \tag{6}$$

One population measure of the quadratic contribution to regression is

$$\eta_{XW} = \gamma_{XW}\,[E(W^4) - 1 - \{E(W^3)\}^2]^{1/2}. \tag{7}$$

An interpretation of $\eta^2_{XW}$ is as the proportion of the total unit variance of X accounted for by the quadratic component in the least squares regression of X on W and $W^2$.

We can express $\gamma_{XW}$ and $\eta_{XW}$ in terms of a and b. For fixed b we wish to maximize the numerator of $\gamma_{XW}$, that is, to maximize

$$\zeta(a, b) = \sum a_r b_s b_t\, \mu(r, s, t) - \Big\{\sum b_r b_s b_t\, \mu(r, s, t)\Big\}\Big(\sum a_r b_s \sigma_{rs}\Big) \tag{8}$$

subject to $\sum a_r a_s \sigma_{rs} = \sum b_r b_s \sigma_{rs} = 1$. Consider $\zeta(a, b) - \tfrac{1}{2}\lambda \sum a_r a_s \sigma_{rs}$, where $\lambda$ is a Lagrange multiplier, and differentiate with respect to $a_u$ to give for u = 1, ..., v at a stationary point

$$\sum b_s b_t\, \mu(u, s, t) - \Big(\sum b_t \sigma_{ut}\Big)\Big\{\sum b_r b_s b_t\, \mu(r, s, t)\Big\} - \lambda \sum a_t \sigma_{ut} = 0. \tag{9}$$

Multiplication by $b_u$ followed by summation over u gives $\lambda \sum a_t b_u \sigma_{ut} = 0$, and multiplication by $a_u$ and summation gives $\zeta(a, b) - \lambda = 0$. Because it is clear that the maximized $\zeta(\cdot, b)$ is nonzero, unless all $\mu(r, s, t)$ are zero, it follows that $\sum a_t b_u \sigma_{ut} = 0$, that is that the associated X and W are uncorrelated. Further,

$$a_u = \Big\{\sum b_r b_s\, \mu(r, s, t)\, \sigma^{tu} - b_u \sum b_r b_s b_t\, \mu(r, s, t)\Big\}\Big/\zeta(a, b), \tag{10}$$

where $((\sigma^{ij})) = \Sigma^{-1}$, which is assumed to exist.
Hence $\eta^2(b)$, the supremum of $\eta^2(a, b)$ over a for fixed b, is

$$\eta^2(b) = \frac{\sum b_r b_s b_t b_u\, \mu(r, s, p)\, \mu(t, u, q)\, \sigma^{pq} - \{\sum b_r b_s b_t\, \mu(r, s, t)\}^2}{\sum b_r b_s b_t b_u\, \mu(r, s, t, u) - 1 - \{\sum b_r b_s b_t\, \mu(r, s, t)\}^2}. \tag{11}$$

The required directions for maximum curvature are obtained by maximizing this expression subject to $\sum b_r b_s \sigma_{rs} = 1$; except possibly in extremely special cases, this maximization has to be done numerically. The value of b is obtained directly and that of a by substitution in (10).

4·3. Computation of the maximum curvature

We shall continue with the notation above, although thinking rather more of using sample moments and of calculating the maximum of $\eta^2(b)$ for use as a test statistic. To avoid computational instability arising from gross differences in scale, the variates should be standardized to have unit variance as well as zero mean.

Although the constraint $b^T \Sigma b = 1$ is important for the magnitude of the curvature, $\gamma_{XW}$, it is irrelevant for $\eta_{XW}$, and if no use is made in § 4·2 of the relation $\sum b_r b_s \sigma_{rs} = 1$, then we

ultimately obtain the form, homogeneous in b,

$$\hat{\eta}^2(b) = \frac{(\sum b_r b_s \hat{\sigma}_{rs})\{\sum b_r b_s b_t b_u\, \hat{\mu}(r, s, p)\, \hat{\mu}(t, u, q)\, \hat{\sigma}^{pq}\} - \{\sum b_r b_s b_t\, \hat{\mu}(r, s, t)\}^2}{(\sum b_r b_s \hat{\sigma}_{rs})\{\sum b_r b_s b_t b_u\, \hat{\mu}(r, s, t, u)\} - (\sum b_r b_s \hat{\sigma}_{rs})^3 - \{\sum b_r b_s b_t\, \hat{\mu}(r, s, t)\}^2},$$

where a circumflex denotes a sample value. This expression is now to be maximized without constraint on b. In nearly normal cases some simplification can be achieved by giving the denominator of $\hat{\eta}^2(b)$ its normal theory value of $2(\sum b_r b_s \hat{\sigma}_{rs})^3$ and then concentrating on the maximization of the numerator.

Given one or more starting values $b_0$ of b, the maximization of $\hat{\eta}^2(b)$ can be carried out by the use of a 'hill-climbing' algorithm. Suitable $b_0$ may be selected from evaluations of $\hat{\eta}^2(b)$ for a sequence of b values defined by the intersections in a grid of lines of 'latitude and longitude' on a half-surface of a v-dimensional sphere, noting that $\hat{\eta}^2(b) = \hat{\eta}^2(-b)$. Such a grid may be formed by making uniform divisions of the angles in a system of spherical polar coordinates; the resulting points are spread fairly uniformly over the surface of the hypersphere. If each angular coordinate is divided into m parts (m > 2) then there are {(m - 1)v- -}/(m-2) points; this increases very rapidly with v, even if a coarse m is employed, in which case the probability that the selected $b_0$ lead to the global maximum, rather than merely to local maxima, is reduced. Also, the effort in evaluating $\hat{\eta}^2(b)$ is roughly proportional to $v^4$, assuming that an array with elements $\sum \hat{\mu}(r, s, p)\, \hat{\mu}(t, u, q)\, \hat{\sigma}^{pq}$, indexed by (r, s, t, u), is used. These two facts combine to make v = 6 about the limit for computational feasibility. Larger numbers of variables could, for example, be dealt with by dividing them into subsets of size 6 or less.

For a test statistic we concentrate on the global maximum of $\hat{\eta}^2(b)$.
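The homogeneous sample form of the curvature measure can be evaluated directly from the data. The sketch below is an illustration under assumptions, not the procedure used in the paper: it computes the moment sums from the centred data for a given b rather than from a stored array of third and fourth moments, and it replaces the spherical-polar grid and hill-climbing step by a crude search over random directions. The names `eta2_hat` and `crude_max_eta2` are hypothetical.

```python
import numpy as np

def eta2_hat(b, Y):
    """Sample curvature measure for direction b, in the form homogeneous in b.

    Uses the identities sum b_r b_s sigma_rs = mean(w^2),
    sum b_r b_s b_t mu(r,s,t) = mean(w^3), and so on, with w = b'(Y - mean)."""
    Y = np.asarray(Y, dtype=float)
    b = np.asarray(b, dtype=float)
    Yc = Y - Y.mean(axis=0)
    n = Yc.shape[0]
    S = Yc.T @ Yc / n                 # sample covariance matrix (divisor n)
    w = Yc @ b
    s = np.mean(w ** 2)               # b' S b
    k3 = np.mean(w ** 3)              # third-moment sum
    k4 = np.mean(w ** 4)              # fourth-moment sum
    c = Yc.T @ (w ** 2) / n           # c_p = sum b_r b_s mu(r, s, p)
    quad = c @ np.linalg.solve(S, c)  # sum of c_p c_q sigma^{pq}
    num = s * quad - k3 ** 2
    den = s * k4 - s ** 3 - k3 ** 2
    return num / den

def crude_max_eta2(Y, n_dirs=2000, seed=0):
    """Assumption: random directions stand in for the grid of starting values;
    no hill-climbing refinement is attempted."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n_dirs, Y.shape[1]))
    vals = np.array([eta2_hat(b, Y) for b in B])
    k = int(np.argmax(vals))
    return vals[k], B[k] / np.linalg.norm(B[k])

# Synthetic illustration: the second variable is an exact quadratic in the first,
# so taking W along the first coordinate should give a value of one.
y1 = np.linspace(-1.0, 1.0, 101)
Y = np.column_stack([y1, y1 ** 2])
b = np.array([1.0, 0.0])
val = eta2_hat(b, Y)
best, b_best = crude_max_eta2(Y, n_dirs=200)
```

Working from the data costs of order nv per evaluation, which may be preferable to the moment array when n is moderate; the moment array is the better choice when many directions are evaluated for large n.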
For interpretation, however, in a nonnull case, it may well be useful to know the several roughly equal local maxima.

4·4. The null hypothesis distribution

Denote the maximum of $\hat{\eta}^2(b)$ by $\hat{\eta}^2_{\max}$. To apply a significance test based on $\hat{\eta}^2_{\max}$ we need to know at least approximately its distribution under the null hypothesis of multivariate normality. Clearly this distribution depends only on v and n. Analytical study of the distribution seems not to be feasible. Simulation shows that approximately for n > 50, v ≤ 6, $\log \hat{\eta}^2_{\max}$ is normally distributed with mean $\log\{(5v^2)/(8n)\}$ and standard deviation 0·90 (v = (v = 3), 0·38 (v = 4), 0·31 (v = 5), 0·17 (v = 6).

It would be good to have some qualitative explanation both of the log normal shape and of the form of the mean and standard deviation. The nature of the dependence on n may be accounted for in general terms by the following argument. For fixed a and b, $\hat{\eta}^2(a, b)$ has a distribution with asymptotic mean and variance 3(v − 1)/(2n) and 9(v − 1)/(2n²) respectively, under the null hypothesis. We now consider fitting this with the log normal distribution, corresponding to the distribution $N(\mu, \sigma^2)$, which has mean $e^{\mu + \sigma^2/2}$ and variance $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$. Upon equating moments, we obtain

$$\mu = \log\{\tfrac{3}{2}(v - 1)^{3/2}(v + 1)^{-1/2} n^{-1}\}, \qquad \sigma^2 = \log\{(v + 1)/(v - 1)\}.$$

Of course $\hat{\eta}^2_{\max} \geq \hat{\eta}^2(a, b)$, so that in fitting a log normal distribution to $\hat{\eta}^2_{\max}$ the equalities above could only be maintained by the introduction of constants that were functions of n and v. However, the asymptotic dependence on n should remain unaltered, and hence in making the transition from $\hat{\eta}^2(a, b)$ to $\hat{\eta}^2_{\max}$ the similarity, as functions of n, between the $\mu$ and $\sigma$ above and those obtained empirically, should be preserved.
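The moment-matching step and the use of the simulation-based approximation can be checked numerically. The sketch below is an illustration only: the function names `lognormal_moment_match` and `approx_upper_tail` are hypothetical, the standard deviation must be supplied by the user for the relevant v, and the illustrative call takes v = 6, n = 84 and the value 0·17 quoted above, ignoring any selection over subsets of variables.

```python
import math

def lognormal_moment_match(mean, variance):
    """Return (mu, sigma2) such that exp(N(mu, sigma2)) has the given mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, sigma2

def approx_upper_tail(eta2_max, v, n, sd):
    """Approximate upper-tail probability for an observed maximum curvature,
    treating log(eta2_max) as normal with mean log(5 v^2 / (8 n)) and the
    user-supplied standard deviation sd (an input, not derived here)."""
    z = (math.log(eta2_max) - math.log(5.0 * v ** 2 / (8.0 * n))) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

v, n = 6, 84

# Equate the asymptotic mean 3(v-1)/(2n) and variance 9(v-1)/(2n^2).
mu, sigma2 = lognormal_moment_match(3.0 * (v - 1) / (2.0 * n),
                                    9.0 * (v - 1) / (2.0 * n ** 2))

# Closed forms obtained by equating moments, for comparison.
mu_closed = math.log(1.5 * (v - 1) ** 1.5 * (v + 1) ** -0.5 / n)
sigma2_closed = math.log((v + 1) / (v - 1))

# Illustrative tail probability for an observed value of 0.34 (sd assumed 0.17).
p_value = approx_upper_tail(0.34, v, n, sd=0.17)
```

The comparison of `mu` and `sigma2` with the closed forms confirms that the ratio of variance to squared mean, 2/(v − 1), determines $\sigma^2$ independently of n.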

Note that quite apart from its use as a test statistic $\hat{\eta}^2_{\max}$ has a direct numerical interpretation as a maximal proportion of variance accounted for by quadratic regression.

5. INTERPRETATION

While the procedures of §§ 3 and 4 have been described in the first place as tests of significance, if evidence of nonlinearity is found some interpretation has always to be attempted. In the absence of a simple widely applicable alternative family of distributions, no general rules can be given, but the following comments may be helpful.

Inspection of scatter diagrams will always be required for interpretation. For the procedure of § 4 a first step will be to examine the plot for the derived variables for which $\eta^2$ is maximal. For the coordinate-dependent approach of § 3, pairs of original variates may be plotted, or alternatively residuals of $Y_r$ and $Y_t^2$ from their linear regression on the remaining variables, when $Q^{(v)}_{r,t}$ appears interestingly large. In clearcut cases the nonlinearity will arise either from a small number of aberrant points, which will then need special consideration, or from a consistent curvature. If the curvature arises in connection with only one or two component variables, it may be sensible to treat the remainder as multivariate normal and to describe separately the dependence of the anomalous variables on the remainder. Consistent patterns of signs in the curvature may indicate the general nature of appropriate transformations.

It would in principle be possible to develop techniques corresponding to those of §§ 3 and 4 but using some form of robust regression rather than least squares regression. We have not investigated this.

6. AN EXAMPLE

To illustrate the way in which the results might be applied, we now give in brief outline an analysis of some data circulated some years ago by Dr P. D. P. Wood, Milk Marketing Board, to the Multivariate Study Group of the Royal Statistical Society.
The data comprised 8 measurements on the pelvis of each of 90 Friesian cows. The upper estimates in Table 1 give the first four estimated moments of the marginal distributions. For samples of size 90 from a univariate normal distribution, the lower and upper 5% points for $g_1$ are roughly −0·6 and 0·6, and for $g_2$ and 1·47, respectively. A frequency plot for variable 7 showed an observation at 292 mm, the range for the other animals being 150 to 223 mm. All observations on this extreme animal were omitted in the subsequent analyses. In particular this omission reduced $g_1$ and $g_2$ for variable 7 to and Study of the marginal distributions showed no other obvious outliers, although the first variable was markedly bimodal. Note that there is no evidence of systematic skewness; a log transformation was therefore not applied.

Table 1. Pelvic measurements on cows. First four marginal moments. Upper values, all 90 cows. Lower values, selected 84 cows. Rows: Mean (mm); St. dev. (mm); $g_1$; $g_2$. [Table entries not recoverable from the transcription.]

The next step was to use the coordinate-dependent methods of § 3·3. Method (e), using Tukey's degree of freedom for nonadditivity, was tried; the largest Student t statistic out of 8 is 2·58, for the regression of variable 2 on the square of the fitted value of its linear regression on the other variables, but such a value is not markedly extreme. The array of t statistics generated by method (a) contained a number of abnormally large values, mostly but not entirely connected with variable 2. Various scatter plots showed that there were 5 animals which in the space of the first 5 variables form an outlying group not on the main linear regression and therefore inducing curvature. They, too, were omitted for separate interpretation and method (a) of § 3·3 reapplied to the remaining 84 individuals. Table 2 shows the resulting Student t statistics and their marginal sums of squares: see (4). The ranked Student t statistics can be plotted against expected normal order statistics. There is nothing untoward.

The invariant procedure of § 4 was then applied to these remaining 84 individuals, taking 6 variables at a time for computational reasons. The largest value of $\hat{\eta}^2_{\max}$ obtained was 0·34, which is not only well short of statistical significance but corresponds to only a modest degree of curvature.

Table 2. Pelvic measurements. Curvature analysis for 84 selected cows. Rows: dependent variable; columns: squared variable; margins: sum of squares for row and sum of squares for column. [Table entries not recoverable from the transcription.]

The marginal moments for the 84 individuals are recorded in the lower values of Table 1; there is some bimodality in variable 1.

To summarize, one outlier and five anomalous individuals have been detected. The remaining 84 individuals, while showing some evidence of marginal nonnormality, show no evidence of nonlinearity, and interpretation of the interrelationships among the 8 variables via their covariance matrix seems in order.
No doubt these conclusions could be reached via other routes.

We are grateful to Dr P. D. P. Wood for permission to use the data analysed in § 6. N. J. H. Small's work was supported by the Science Research Council.

REFERENCES

ANDREWS, D. F., GNANADESIKAN, R. & WARNER, J. L. (1971). Transformati Biometrics 27,
ANDREWS, D. F., GNANADESIKAN, R. & WARNER, J. L. (1973). Methods for assessing multivariate normality. In Multivariate Analysis, Vol. 3, Ed. P. R. Krishnaiah, pp New York: Academic Press.
BARNDORFF-NIELSEN, O. (1977). Discussion of paper by D. R. Cox. Scand. J. Statist. 4,
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. B 26,
COX, D. R. (1968). Notes on some aspects of regression analysis. J. R. Statist. Soc. A 131,
DAY, N. R. (1969). Divisive cluster analysis and a test for multivariate normality. Bull. I.S.I. 43, 2

DURBIN, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. Philadelphia: Society for Industrial and Applied Mathematics.
GNANADESIKAN, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley.
HEALY, M. J. R. (1968). Multivariate normal plotting. Appl. Statist. 17,
MALKOVICH, J. F. & AFIFI, A. A. (1973). On tests for multivariate normality. J. Am. Statist. Assoc. 68,
MARDIA, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57,
MARDIA, K. V. (1975). Assessment of multinormality and the robustness of Hotelling's T2 test. Appl. Statist. 24,
TUKEY, J. W. (1949). One degree of freedom for non-additivity. Biometrics 5,

[Received November Revised February 1978]


More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Biometrika Trust Some Simple Approximate Tests for Poisson Variates Author(s): D. R. Cox Source: Biometrika, Vol. 40, No. 3/4 (Dec., 1953), pp. 354-360 Published by: Oxford University Press on behalf of

More information

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika. Biometrika Trust A Stagewise Rejective Multiple Test Procedure Based on a Modified Bonferroni Test Author(s): G. Hommel Source: Biometrika, Vol. 75, No. 2 (Jun., 1988), pp. 383-386 Published by: Biometrika

More information

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. On the Probability of Covering the Circle by Rom Arcs Author(s): F. W. Huffer L. A. Shepp Source: Journal of Applied Probability, Vol. 24, No. 2 (Jun., 1987), pp. 422-429 Published by: Applied Probability

More information

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and Athens Journal of Sciences December 2014 Discriminant Analysis with High Dimensional von Mises - Fisher Distributions By Mario Romanazzi This paper extends previous work in discriminant analysis with von

More information

Hypothesis Testing for Var-Cov Components

Hypothesis Testing for Var-Cov Components Hypothesis Testing for Var-Cov Components When the specification of coefficients as fixed, random or non-randomly varying is considered, a null hypothesis of the form is considered, where Additional output

More information

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. On Runs of Residues Author(s): D. H. Lehmer and Emma Lehmer Source: Proceedings of the American Mathematical Society, Vol. 13, No. 1 (Feb., 1962), pp. 102-106 Published by: American Mathematical Society

More information

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika. Biometrika Trust Discrete Sequential Boundaries for Clinical Trials Author(s): K. K. Gordon Lan and David L. DeMets Reviewed work(s): Source: Biometrika, Vol. 70, No. 3 (Dec., 1983), pp. 659-663 Published

More information

Mind Association. Oxford University Press and Mind Association are collaborating with JSTOR to digitize, preserve and extend access to Mind.

Mind Association. Oxford University Press and Mind Association are collaborating with JSTOR to digitize, preserve and extend access to Mind. Mind Association Response to Colyvan Author(s): Joseph Melia Source: Mind, New Series, Vol. 111, No. 441 (Jan., 2002), pp. 75-79 Published by: Oxford University Press on behalf of the Mind Association

More information

The Robustness of the Multivariate EWMA Control Chart

The Robustness of the Multivariate EWMA Control Chart The Robustness of the Multivariate EWMA Control Chart Zachary G. Stoumbos, Rutgers University, and Joe H. Sullivan, Mississippi State University Joe H. Sullivan, MSU, MS 39762 Key Words: Elliptically symmetric,

More information

1. The Multivariate Classical Linear Regression Model

1. The Multivariate Classical Linear Regression Model Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Some Applications of Exponential Ordered Scores Author(s): D. R. Cox Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 26, No. 1 (1964), pp. 103-110 Published by: Wiley

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at The Interpretation of Interaction in Contingency Tables Author(s): E. H. Simpson Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 13, No. 2 (1951), pp. 238-241 Published

More information

International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics.

International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. 400: A Method for Combining Non-Independent, One-Sided Tests of Significance Author(s): Morton B. Brown Reviewed work(s): Source: Biometrics, Vol. 31, No. 4 (Dec., 1975), pp. 987-992 Published by: International

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at A Look at Some Data on the Old Faithful Geyser Author(s): A. Azzalini and A. W. Bowman Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 39, No. 3 (1990), pp. 357-365

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis CHAPTER 8 MODEL DIAGNOSTICS We have now discussed methods for specifying models and for efficiently estimating the parameters in those models. Model diagnostics, or model criticism, is concerned with testing

More information

1 Introduction Suppose we have multivariate data y 1 ; y 2 ; : : : ; y n consisting of n points in p dimensions. In this paper we propose a test stati

1 Introduction Suppose we have multivariate data y 1 ; y 2 ; : : : ; y n consisting of n points in p dimensions. In this paper we propose a test stati A Test for Multivariate Structure Fred W. Huer Florida State University Cheolyong Park Keimyung University Abstract We present a test for detecting `multivariate structure' in data sets. This procedure

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Queues with Time-Dependent Arrival Rates: II. The Maximum Queue and the Return to Equilibrium Author(s): G. F. Newell Source: Journal of Applied Probability, Vol. 5, No. 3 (Dec., 1968), pp. 579-590 Published

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Detection of Influential Observation in Linear Regression. R. Dennis Cook. Technometrics, Vol. 19, No. 1. (Feb., 1977), pp

Detection of Influential Observation in Linear Regression. R. Dennis Cook. Technometrics, Vol. 19, No. 1. (Feb., 1977), pp Detection of Influential Observation in Linear Regression R. Dennis Cook Technometrics, Vol. 19, No. 1. (Feb., 1977), pp. 15-18. Stable URL: http://links.jstor.org/sici?sici=0040-1706%28197702%2919%3a1%3c15%3adoioil%3e2.0.co%3b2-8

More information

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] 1 Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8] Insights: Price movements in one market can spread easily and instantly to another market [economic globalization and internet

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

The Periodogram and its Optical Analogy.

The Periodogram and its Optical Analogy. The Periodogram and Its Optical Analogy Author(s): Arthur Schuster Reviewed work(s): Source: Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character,

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Monitoring Random Start Forward Searches for Multivariate Data

Monitoring Random Start Forward Searches for Multivariate Data Monitoring Random Start Forward Searches for Multivariate Data Anthony C. Atkinson 1, Marco Riani 2, and Andrea Cerioli 2 1 Department of Statistics, London School of Economics London WC2A 2AE, UK, a.c.atkinson@lse.ac.uk

More information

Assessing Multivariate Normality using Normalized Hermite Moments

Assessing Multivariate Normality using Normalized Hermite Moments BIWI-TR-5 May 999 Assessing Multivariate Normality using Normalized Hermite Moments Christian Stoecklin, Christian Brechbühler and Gábor Székely Swiss Federal Institute of Technology, ETH Zentrum Communication

More information

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris.

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris. /pgf/stepx/.initial=1cm, /pgf/stepy/.initial=1cm, /pgf/step/.code=1/pgf/stepx/.expanded=- 10.95415pt,/pgf/stepy/.expanded=- 10.95415pt, /pgf/step/.value required /pgf/images/width/.estore in= /pgf/images/height/.estore

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

1 Introduction Suppose we have multivariate data y 1 ; y 2 ; : : : ; y n consisting of n points in p dimensions. In this paper we propose a test stati

1 Introduction Suppose we have multivariate data y 1 ; y 2 ; : : : ; y n consisting of n points in p dimensions. In this paper we propose a test stati A Test for Multivariate Structure Fred W. Huer Florida State University Cheolyong Park Keimyung University Abstract We present a test for detecting `multivariate structure' in data sets. This procedure

More information

CONVERTING OBSERVED LIKELIHOOD FUNCTIONS TO TAIL PROBABILITIES. D.A.S. Fraser Mathematics Department York University North York, Ontario M3J 1P3

CONVERTING OBSERVED LIKELIHOOD FUNCTIONS TO TAIL PROBABILITIES. D.A.S. Fraser Mathematics Department York University North York, Ontario M3J 1P3 CONVERTING OBSERVED LIKELIHOOD FUNCTIONS TO TAIL PROBABILITIES D.A.S. Fraser Mathematics Department York University North York, Ontario M3J 1P3 N. Reid Department of Statistics University of Toronto Toronto,

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

ROBUSTNESS OF TWO-PHASE REGRESSION TESTS

ROBUSTNESS OF TWO-PHASE REGRESSION TESTS REVSTAT Statistical Journal Volume 3, Number 1, June 2005, 1 18 ROBUSTNESS OF TWO-PHASE REGRESSION TESTS Authors: Carlos A.R. Diniz Departamento de Estatística, Universidade Federal de São Carlos, São

More information

Introduction to Matrix Algebra and the Multivariate Normal Distribution

Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. On the Bound for a Pair of Consecutive Quartic Residues of a Prime Author(s): R. G. Bierstedt and W. H. Mills Source: Proceedings of the American Mathematical Society, Vol. 14, No. 4 (Aug., 1963), pp.

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS 121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Testing the homogeneity of variances in a two-way classification

Testing the homogeneity of variances in a two-way classification Biomelrika (1982), 69, 2, pp. 411-6 411 Printed in Ortal Britain Testing the homogeneity of variances in a two-way classification BY G. K. SHUKLA Department of Mathematics, Indian Institute of Technology,

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Biometrika Trust Analysis of Variability with Large Numbers of Small Samples Author(s): D. R. Cox and P. J. Solomon Source: Biometrika, Vol. 73, No. 3 (Dec., 1986), pp. 543-554 Published by: Oxford University

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

FACTOR ANALYSIS AS MATRIX DECOMPOSITION 1. INTRODUCTION

FACTOR ANALYSIS AS MATRIX DECOMPOSITION 1. INTRODUCTION FACTOR ANALYSIS AS MATRIX DECOMPOSITION JAN DE LEEUW ABSTRACT. Meet the abstract. This is the abstract. 1. INTRODUCTION Suppose we have n measurements on each of taking m variables. Collect these measurements

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

EXERCISE SET 5.1. = (kx + kx + k, ky + ky + k ) = (kx + kx + 1, ky + ky + 1) = ((k + )x + 1, (k + )y + 1)

EXERCISE SET 5.1. = (kx + kx + k, ky + ky + k ) = (kx + kx + 1, ky + ky + 1) = ((k + )x + 1, (k + )y + 1) EXERCISE SET 5. 6. The pair (, 2) is in the set but the pair ( )(, 2) = (, 2) is not because the first component is negative; hence Axiom 6 fails. Axiom 5 also fails. 8. Axioms, 2, 3, 6, 9, and are easily

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

FAQ: Linear and Multiple Regression Analysis: Coefficients

FAQ: Linear and Multiple Regression Analysis: Coefficients Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

INFORMS is collaborating with JSTOR to digitize, preserve and extend access to Mathematics of Operations Research.

INFORMS is collaborating with JSTOR to digitize, preserve and extend access to Mathematics of Operations Research. New Finite Pivoting Rules for the Simplex Method Author(s): Robert G. Bland Reviewed work(s): Source: Mathematics of Operations Research, Vol. 2, No. 2 (May, 1977), pp. 103-107 Published by: INFORMS Stable

More information

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. Uncountably Many Inequivalent Analytic Actions of a Compact Group on Rn Author(s): R. S. Palais and R. W. Richardson, Jr. Source: Proceedings of the American Mathematical Society, Vol. 14, No. 3 (Jun.,

More information

Financial Econometrics

Financial Econometrics Financial Econometrics Nonlinear time series analysis Gerald P. Dwyer Trinity College, Dublin January 2016 Outline 1 Nonlinearity Does nonlinearity matter? Nonlinear models Tests for nonlinearity Forecasting

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

Testing for Anomalous Periods in Time Series Data. Graham Elliott

Testing for Anomalous Periods in Time Series Data. Graham Elliott Testing for Anomalous Periods in Time Series Data Graham Elliott 1 Introduction The Motivating Problem There are reasons to expect that for a time series model that an anomalous period might occur where

More information

p(z)

p(z) Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

COMMON CORE STATE STANDARDS TO BOOK CORRELATION

COMMON CORE STATE STANDARDS TO BOOK CORRELATION COMMON CORE STATE STANDARDS TO BOOK CORRELATION Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in subsequent activities,

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

Finite Population Sampling and Inference
