Bootstrapping Analogs of the One Way MANOVA Test


Hasthika S. Rupasinghe Arachchige Don and David J. Olive
Southern Illinois University
July 17, 2017

Abstract

The classical one way MANOVA model is used to test whether the mean measurements are the same or differ across $p$ groups, and assumes that each group has the same population covariance matrix. This paper suggests using the Olive (2017abc) bootstrap technique to develop analogs of the one way MANOVA test. The new tests can have some outlier resistance, and the tests do not need the population covariance matrices to be equal.

KEY WORDS: Behrens Fisher problem, bootstrap, prediction region, coordinatewise median.

David J. Olive is Professor, and Hasthika S. Rupasinghe Arachchige Don is PhD student, Department of Mathematics, Southern Illinois University, Carbondale, IL 62901, USA.

1 INTRODUCTION

The multivariate linear model $y_i = B^T x_i + \epsilon_i$ for $i = 1, \ldots, n$ has $m \ge 2$ response variables $Y_1, \ldots, Y_m$ and $p$ predictor variables $x_1, x_2, \ldots, x_p$. The $i$th case is $(x_i^T, y_i^T) = (x_{i1}, x_{i2}, \ldots, x_{ip}, Y_{i1}, \ldots, Y_{im})$. The model is written in matrix form as $Z = XB + E$ where the matrices are defined below. The model has $E(\epsilon_k) = 0$ and $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon = (\sigma_{ij})$ for $k = 1, \ldots, n$. Then the $p \times m$ coefficient matrix $B = [\beta_1 \ \beta_2 \ \cdots \ \beta_m]$ and the $m \times m$ covariance matrix $\Sigma_\epsilon$ are to be estimated, and $E(Z) = XB$ while $E(Y_{ij}) = x_i^T \beta_j$. The $\epsilon_i$ are assumed to be independent and identically distributed (iid).

The univariate linear model corresponds to $m = 1$ response variable, and is written in matrix form as $Y = X\beta + e$. Subscripts are needed for the $m$ univariate linear models $Y_j = X\beta_j + e_j$ for $j = 1, \ldots, m$ where $E(e_j) = 0$. For the multivariate linear model, $\mathrm{Cov}(e_i, e_j) = \sigma_{ij} I_n$ for $i, j = 1, \ldots, m$ where $I_n$ is the $n \times n$ identity matrix.

The $n \times m$ matrix

$$Z = [Y_1 \ Y_2 \ \cdots \ Y_m] = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix}.$$

The $n \times p$ design matrix $X$ of predictor variables is not necessarily of full rank $p$, and

$$X = [v_1 \ v_2 \ \cdots \ v_p] = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$$

where often $v_1 = \mathbf{1}$. The $p \times m$ matrix $B = [\beta_1 \ \beta_2 \ \cdots \ \beta_m]$. The $n \times m$ matrix

$$E = [e_1 \ e_2 \ \cdots \ e_m] = \begin{bmatrix} \epsilon_1^T \\ \vdots \\ \epsilon_n^T \end{bmatrix}.$$

Considering the $i$th row of $Z$, $X$, and $E$ shows that $y_i^T = x_i^T B + \epsilon_i^T$.

The multivariate linear regression model and one way MANOVA model are special cases of the multivariate linear model, but using double subscripts will be useful for describing the one way MANOVA model. Suppose there are independent random samples of size $n_i$ from $p$ different populations (treatments), or $n_i$ cases are randomly assigned to $p$ treatment groups where $n = \sum_{i=1}^p n_i$. Assume that $m$ response variables $y_{ij} = (Y_{ij1}, \ldots, Y_{ijm})^T$ are measured for the $i$th treatment group and the $j$th case (often an individual or thing) in the group. Hence $i = 1, \ldots, p$ and $j = 1, \ldots, n_i$. The $Y_{ijk}$ follow different one way ANOVA models for $k = 1, \ldots, m$. Assume $E(y_{ij}) = \mu_i$ and $\mathrm{Cov}(y_{ij}) =$

$\Sigma_\epsilon$. Hence the $p$ treatments have possibly different mean vectors $\mu_i$, but common covariance matrix $\Sigma_\epsilon$.

The one way MANOVA test is used to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$. Often $\mu_i = \mu + \tau_i$, so $H_0$ becomes $H_0: \tau_1 = \cdots = \tau_p$. If $m = 1$, the one way MANOVA model is the one way ANOVA model. MANOVA is useful since it takes into account the correlations between the $m$ response variables. The Hotelling's $T^2$ test that uses a common covariance matrix is a special case of the one way MANOVA model with $p = 2$.

Let $\mu_i = \mu + \tau_i$ where $\sum_{i=1}^p n_i \tau_i = 0$. The $j$th case from the $i$th population or treatment group is $y_{ij} = \mu + \tau_i + \epsilon_{ij}$ where $\epsilon_{ij}$ is an error vector, $i = 1, \ldots, p$ and $j = 1, \ldots, n_i$. Let $\bar{y} = \hat{\mu} = \sum_{i=1}^p \sum_{j=1}^{n_i} y_{ij}/n$ be the overall mean. Let $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i = \hat{\mu}_i$ so $\hat{\tau}_i = \bar{y}_i - \bar{y}$. Let the residual vector $\hat{\epsilon}_{ij} = y_{ij} - \bar{y}_i = y_{ij} - \hat{\mu} - \hat{\tau}_i$. Then

$$y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i) = \hat{\mu} + \hat{\tau}_i + \hat{\epsilon}_{ij} = \hat{\mu}_i + \hat{\epsilon}_{ij}.$$

Several $m \times m$ matrices will be useful. Let $S_i$ be the sample covariance matrix corresponding to the $i$th treatment group. Then the within sum of squares and cross products matrix is

$$W = (n_1 - 1)S_1 + \cdots + (n_p - 1)S_p = \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(y_{ij} - \bar{y}_i)^T.$$

Then $\hat{\Sigma}_\epsilon = W/(n - p)$. The treatment or between sum of squares and cross products matrix is

$$B_T = \sum_{i=1}^p n_i (\bar{y}_i - \bar{y})(\bar{y}_i - \bar{y})^T.$$

The total corrected (for the mean) sum of squares and cross products matrix is

$$T = B_T + W = \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y})(y_{ij} - \bar{y})^T.$$

Note that $S = T/(n - 1)$ is the usual sample covariance matrix of the $y_{ij}$ if it is assumed that all $n$ of the $y_{ij}$ are iid so that the $\mu_i \equiv \mu$ for $i = 1, \ldots, p$.

The one way MANOVA model is $y_{ij} = \mu_i + \epsilon_{ij}$ where the $\epsilon_{ij}$ are iid with $E(\epsilon_{ij}) = 0$ and $\mathrm{Cov}(\epsilon_{ij}) = \Sigma_\epsilon$. If all $n$ of the $y_{ij}$ are iid with $E(y_{ij}) = \mu$ and $\mathrm{Cov}(y_{ij}) = \Sigma_\epsilon$, it can be shown that $A/df \xrightarrow{P} \Sigma_\epsilon$ where $A = W$, $B_T$, or $T$ and $df$ is the corresponding degrees of freedom. Let $t_0$ be the test statistic. Often Pillai's trace statistic, the Hotelling Lawley trace statistic, or Wilks' lambda are used. Wilks' lambda

$$\Lambda = \frac{|W|}{|B_T + W|} = \frac{|W|}{|T|} = \frac{\left|\sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(y_{ij} - \bar{y}_i)^T\right|}{\left|\sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \bar{y})(y_{ij} - \bar{y})^T\right|} = \frac{\left|\sum_{i=1}^p (n_i - 1)S_i\right|}{|(n - 1)S|}.$$

Then $t_0 = -[n - 0.5(m + p - 2)] \log(\Lambda)$ and the test rejects $H_0$ if $t_0 > \chi^2_{m(p-1)}(1 - \alpha)$. See Johnson and Wichern (1988, p. 238).

Following Mardia, Kent, and Bibby (1979, p. 335), let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ be the eigenvalues of $W^{-1} B_T$. Then $1 + \lambda_i$ for $i = 1, \ldots, m$ are the eigenvalues of $W^{-1} T$ and $\Lambda = \prod_{i=1}^m (1 + \lambda_i)^{-1}$. Following Fujikoshi (2002) and Kakizawa (2009), let the Hotelling Lawley trace statistic

$$U = \mathrm{tr}(B_T W^{-1}) = \mathrm{tr}(W^{-1} B_T) = \sum_{i=1}^m \lambda_i,$$

and let Pillai's trace statistic

$$V = \mathrm{tr}(B_T T^{-1}) = \mathrm{tr}(T^{-1} B_T) = \sum_{i=1}^m \frac{\lambda_i}{1 + \lambda_i}.$$

If the $y_{ij} - \mu_i$ are iid with common covariance matrix $\Sigma_\epsilon$, and if $H_0$ is true, then under regularity conditions $-[n - 0.5(m + p - 2)] \log(\Lambda) \xrightarrow{D} \chi^2_{m(p-1)}$, $(n - m - p - 1)U \xrightarrow{D} \chi^2_{m(p-1)}$, and $(n - 1)V \xrightarrow{D} \chi^2_{m(p-1)}$. Note that the common covariance matrix assumption implies that each of the $p$ treatment groups or populations has the same covariance matrix $\Sigma_i = \Sigma_\epsilon$ for $i = 1, \ldots, p$, an extremely strong assumption. Kakizawa (2009) and Olive, Pelawa Watagoda, and Rupasinghe Arachchige Don (2015) show that similar results hold for the multivariate linear model. The common covariance matrix assumption, $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon$ for $k = 1, \ldots, n$, is often reasonable for the multivariate linear regression model.

A useful one way MANOVA model is $Z = XB + E$ where $X$ is the full rank matrix where the first column of $X$ is $v_1 = \mathbf{1}$ and the $i$th column $v_i$ of $X$ is an indicator for group $i - 1$ for $i = 2, \ldots, p$. For example, $v_3 = (0^T, \mathbf{1}^T, 0^T, \ldots, 0^T)^T$ where the $p$ vectors in $v_3$ have lengths $n_1, n_2, \ldots, n_p$, respectively. Then $\hat{\beta}_{1k} = \bar{Y}_{p0k} = \hat{\mu}_{pk}$ for $k = 1, \ldots, m$, and $\hat{\beta}_{ik} = \bar{Y}_{i-1,0k} - \bar{Y}_{p0k} = \hat{\mu}_{i-1,k} - \hat{\mu}_{pk}$ for $k = 1, \ldots, m$ and $i = 2, \ldots, p$. Thus testing $H_0: \mu_1 = \cdots = \mu_p$ is equivalent to testing $H_0: LB = 0$ where $L = [0 \ \ I_{p-1}]$. Press (2005, p. 262) uses the above model. Then $y_{ij} = \mu_i + \epsilon_{ij}$ and

$$B = \begin{bmatrix} \mu_p^T \\ (\mu_1 - \mu_p)^T \\ (\mu_2 - \mu_p)^T \\ \vdots \\ (\mu_{p-2} - \mu_p)^T \\ (\mu_{p-1} - \mu_p)^T \end{bmatrix}.$$

Then a test statistic for the one way MANOVA model is $w$ given by Equation (1.1) with $T_i = \hat{\mu}_i = \bar{y}_i$ where it is assumed that $\Sigma_i \equiv \Sigma_\epsilon$ for $i = 1, \ldots, p$.

Large sample theory can be used to derive a better test that does not need the equal population covariance matrix assumption $\Sigma_i \equiv \Sigma_\epsilon$. To simplify the large sample theory, assume $n_i = \pi_i n$ where $0 < \pi_i < 1$ and $\sum_{i=1}^p \pi_i = 1$. Assume $H_0$ is true, and let $\mu_i = \mu$ for $i = 1, \ldots, p$. Suppose $\sqrt{n_i}(T_i - \mu) \xrightarrow{D} N_m(0, \Sigma_i)$, and $\sqrt{n}(T_i - \mu) \xrightarrow{D} N_m\left(0, \frac{\Sigma_i}{\pi_i}\right)$. Let

$$w = \begin{bmatrix} T_1 - T_p \\ T_2 - T_p \\ \vdots \\ T_{p-2} - T_p \\ T_{p-1} - T_p \end{bmatrix}.$$

Then $\sqrt{n}\, w \xrightarrow{D} N_{m(p-1)}(0, \Sigma_w)$ with $\Sigma_w = (\Sigma_{ij})$ where $\Sigma_{ij} = \frac{\Sigma_p}{\pi_p}$ for $i \ne j$, and $\Sigma_{ii} = \frac{\Sigma_i}{\pi_i} + \frac{\Sigma_p}{\pi_p}$ for $i = j$. Hence

$$t_0 = n w^T \hat{\Sigma}_w^{-1} w = w^T \left(\frac{\hat{\Sigma}_w}{n}\right)^{-1} w \xrightarrow{D} \chi^2_{m(p-1)} \qquad (1.1)$$

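as the $n_i \to \infty$ if $H_0$ is true. Here

$$\frac{\hat{\Sigma}_w}{n} = \begin{bmatrix} \frac{\hat{\Sigma}_1}{n_1} + \frac{\hat{\Sigma}_p}{n_p} & \frac{\hat{\Sigma}_p}{n_p} & \cdots & \frac{\hat{\Sigma}_p}{n_p} \\ \frac{\hat{\Sigma}_p}{n_p} & \frac{\hat{\Sigma}_2}{n_2} + \frac{\hat{\Sigma}_p}{n_p} & \cdots & \frac{\hat{\Sigma}_p}{n_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\hat{\Sigma}_p}{n_p} & \frac{\hat{\Sigma}_p}{n_p} & \cdots & \frac{\hat{\Sigma}_{p-1}}{n_{p-1}} + \frac{\hat{\Sigma}_p}{n_p} \end{bmatrix}$$

is a block matrix where the off diagonal block entries equal $\hat{\Sigma}_p/n_p$ and the $i$th diagonal block entry is $\frac{\hat{\Sigma}_i}{n_i} + \frac{\hat{\Sigma}_p}{n_p}$ for $i = 1, \ldots, (p - 1)$. Reject $H_0$ if

$$t_0 > m(p - 1) F_{m(p-1), d_n}(1 - \alpha) \qquad (1.2)$$

where $d_n = \min(n_1, \ldots, n_p)$. It may make sense to relabel the groups so that $n_p$ is the largest $n_i$ or $\hat{\Sigma}_p/n_p$ has the smallest generalized variance of the $\hat{\Sigma}_i/n_i$. This test may start to outperform the one way MANOVA test if $n \ge (m + p)^2$ and $n_i \ge 20m$ for $i = 1, \ldots, p$.

Olive (2017b, ch. 10) has the above result where $T_i = \bar{y}_i$ is the sample mean and $\hat{\Sigma}_i = S_i$ is the sample covariance matrix of the $i$th group. Then $\Sigma_i$ is the population covariance matrix of the $i$th group. Rupasinghe Arachchige Don (2017) gives the general result. If $T = (T_1^T, T_2^T, \ldots, T_p^T)^T$, $\theta = (\mu_1^T, \mu_2^T, \ldots, \mu_p^T)^T$, $c$ is a constant vector, and $A$ is a full rank $r \times mp$ matrix with rank $r$, then a large sample test of the form $H_0: A\theta = c$ versus $H_1: A\theta \ne c$ uses

$$A \sqrt{n}(T - \theta) \xrightarrow{D} N_r\left(0, \ A \, \mathrm{diag}\left(\frac{\Sigma_1}{\pi_1}, \frac{\Sigma_2}{\pi_2}, \ldots, \frac{\Sigma_p}{\pi_p}\right) A^T\right).$$

When $H_0$ is true, the statistic

$$t_0 = [AT - c]^T \left[A \, \mathrm{diag}\left(\frac{\hat{\Sigma}_1}{n_1}, \frac{\hat{\Sigma}_2}{n_2}, \ldots, \frac{\hat{\Sigma}_p}{n_p}\right) A^T\right]^{-1} [AT - c] \xrightarrow{D} \chi^2_r.$$

The same statistic was used by Zhang and Liu (2013, p. 138) with $T_i = \bar{y}_i$ and $\hat{\Sigma}_i = S_i$.

Section 2 shows how to get a bootstrap confidence region that can be used to test $H_0$ when $\hat{\Sigma}_w$ is unknown or difficult to estimate. Section 3 gives some simulations and an example.

2 Bootstrapping Hypothesis Tests and the Prediction Region Method

Olive (2017bc) shows that there is a useful relationship between prediction regions and confidence regions. Consider predicting a future $r \times 1$ test vector $x_f$, given past training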

data $x_1, \ldots, x_n$. A large sample $100(1 - \delta)\%$ prediction region is a set $A_n$ such that $P(x_f \in A_n) \to 1 - \delta$ while a large sample $100(1 - \delta)\%$ confidence region for a parameter $\tau$ is a set $A_n$ such that $P(\tau \in A_n) \to 1 - \delta$ as $n \to \infty$. Consider testing $H_0: \tau = c$ versus $H_1: \tau \ne c$ where $c$ is a known $r \times 1$ vector.

Some notation is needed to describe the Olive (2013) prediction region for the multivariate location and dispersion model. Let the $r \times 1$ column vector $T$ be a multivariate location estimator, and let the $r \times r$ symmetric positive definite matrix $C$ be a dispersion estimator. Then the $i$th squared sample Mahalanobis distance is the scalar

$$D_i^2 = D_i^2(T, C) = D_{x_i}^2(T, C) = (x_i - T)^T C^{-1} (x_i - T) \qquad (2.1)$$

for each observation $x_i$. Notice that the Euclidean distance of $x_i$ from the estimate of center $T$ is $D_i(T, I_r)$ where $I_r$ is the $r \times r$ identity matrix. The classical Mahalanobis distance uses $(T, C) = (\bar{x}, S)$, the sample mean and sample covariance matrix where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \quad \text{and} \quad S = \frac{1}{n - 1} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T. \qquad (2.2)$$

A large sample $100(1 - \delta)\%$ prediction region is the hyperellipsoid

$$\{w : D_w^2(\bar{x}, S) \le D_{(c)}^2\} = \{w : D_w(\bar{x}, S) \le D_{(c)}\} \qquad (2.3)$$

for appropriate $c$. Using $c = \lceil n(1 - \delta) \rceil$ covers about $100(1 - \delta)\%$ of the training data cases $x_i$, but the prediction region will have coverage lower than the nominal coverage of $1 - \delta$ for moderate $n$. This result is not surprising since empirically statistical methods perform worse on test data. Increasing $c$ will improve the coverage for moderate samples. Let

$$q_n = \min(1 - \delta + 0.05, \ 1 - \delta + r/n) \ \text{ for } \delta > 0.1 \ \text{ and } \ q_n = \min(1 - \delta/2, \ 1 - \delta + 10\delta r/n), \ \text{ otherwise.} \qquad (2.4)$$

If $1 - \delta < 0.999$ and $q_n < 1 - \delta + 0.001$, set $q_n = 1 - \delta$. Let $D_{(U_n)}$ be the $100 q_n$th percentile of the $D_i$. Then the Olive (2013) large sample $100(1 - \delta)\%$ nonparametric prediction region for a future value $x_f$ given iid data $x_1, \ldots, x_n$ is

$$\{w : D_w^2(\bar{x}, S) \le D_{(U_n)}^2\}, \qquad (2.5)$$

while the classical large sample $100(1 - \delta)\%$ prediction region is

$$\{w : D_w^2(\bar{x}, S) \le \chi^2_{r, 1-\delta}\}. \qquad (2.6)$$

The Olive (2017abc) prediction region method obtains a confidence region for $\tau$ by applying the nonparametric prediction region (2.5) to the bootstrap sample $T_1^*, \ldots, T_B^*$, and the theory for the method is sketched below. Let $\bar{T}^*$ and $S_T^*$ be the sample mean and sample covariance matrix of the bootstrap sample. Assume $\sqrt{n}(T - \tau) \xrightarrow{D} N_r(0, \Sigma_A)$, and $n S_T^* \xrightarrow{P} \Sigma_A$. See Machado and Parente (2005) for regularity conditions for this assumption. Following Bickel and Ren (2001), let the vector of parameters $\tau = T(F)$, the statistic $T_n = T(F_n)$, and $T^* = T(F_n^*)$ where $F$ is the cdf of iid $x_1, \ldots, x_n$, $F_n$ is the empirical

cdf, and $F_n^*$ is the empirical cdf of $x_1^*, \ldots, x_n^*$, a sample from $F_n$ using the nonparametric bootstrap. If $\sqrt{n}(F_n - F) \xrightarrow{D} z_F$, a Gaussian random process, and if $T$ is sufficiently smooth (with a Hadamard derivative $\dot{T}(F)$), then $\sqrt{n}(T_n - \tau) \xrightarrow{D} U$ and $\sqrt{n}(T_i^* - T_n) \xrightarrow{D} U$ with $U = \dot{T}(F) z_F$. Olive (2017bc) uses these results to show that if $U \sim N_r(0, \Sigma_A)$, then $\sqrt{n}(\bar{T}^* - T_n) \xrightarrow{D} 0$, $\sqrt{n}(T_i^* - \bar{T}^*) \xrightarrow{D} U$, $\sqrt{n}(\bar{T}^* - \tau) \xrightarrow{D} U$, and that the prediction region method large sample $100(1 - \delta)\%$ confidence region for $\tau$ is

$$\{w : (w - \bar{T}^*)^T [S_T^*]^{-1} (w - \bar{T}^*) \le D_{(U_B)}^2\} = \{w : D_w^2(\bar{T}^*, S_T^*) \le D_{(U_B)}^2\} \qquad (2.7)$$

where $D_{(U_B)}^2$ is computed from $D_i^2 = (T_i^* - \bar{T}^*)^T [S_T^*]^{-1} (T_i^* - \bar{T}^*)$ for $i = 1, \ldots, B$. Note that the corresponding test for $H_0: \tau = \tau_0$ rejects $H_0$ if $(\bar{T}^* - \tau_0)^T [S_T^*]^{-1} (\bar{T}^* - \tau_0) > D_{(U_B)}^2$. This procedure is basically the one sample Hotelling's $T^2$ test applied to the $T_i^*$ using $S_T^*$ as the estimated covariance matrix and replacing the $\chi^2_{r, 1-\delta}$ cutoff by $D_{(U_B)}^2$.

The prediction region method for testing $H_0: \tau = c$ versus $H_1: \tau \ne c$ is simple. Let $\hat{\tau}$ be a consistent estimator of $\tau$ and make a bootstrap sample $w_i = \hat{\tau}_i^* - c$ for $i = 1, \ldots, B$. Make the nonparametric prediction region (2.7) for the $w_i$ and fail to reject $H_0$ if $0$ is in the prediction region, reject $H_0$ otherwise.

The Bickel and Ren (2001) hypothesis testing method is equivalent to using confidence region (2.7) with $\bar{T}^*$ replaced by $T_n$ and $U_B$ replaced by $\lceil B(1 - \delta) \rceil$. If region (2.7) or the Bickel and Ren (2001) region is a large sample $100(1 - \delta)\%$ confidence region, then so is the other region if $\sqrt{n}(\bar{T}^* - T_n) \xrightarrow{D} 0$. Hadamard differentiability and asymptotic normality are two of the sufficient conditions for both regions to be large sample confidence regions if $n S_T^* \xrightarrow{P} \Sigma_A$, but Bickel and Ren (2001) showed that their method can work when Hadamard differentiability fails.

The location model with means, medians, and trimmed means is one example where the Bickel and Ren (2001, p. 96) method works. Since the univariate sample mean, sample median, and sample trimmed mean are Hadamard differentiable and asymptotically normal, each coordinate satisfies $\sqrt{n}(T_{in} - \bar{T}_i^*) \xrightarrow{D} 0$ for $i = 1, \ldots, p$. Hence $\sqrt{n}(T_n - \bar{T}^*) \xrightarrow{D} 0$, and (2.7) is a large sample $100(1 - \delta)\%$ confidence region if $T_n$ is the coordinatewise sample mean, median, or trimmed mean. Fréchet differentiability implies Hadamard differentiability, and many statistics are shown to be Hadamard differentiable in Bickel and Ren (2001), Clarke (1986, 2000), Fernholtz (1983), Gill (1989), Ren (1991) and Ren and Sen (1995).

Since the common covariance matrix assumption $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon$ for $k = 1, \ldots, n$ is extremely strong, using the prediction region method for testing may be a useful alternative. If $T = (T_1^T, T_2^T, \ldots, T_p^T)^T$, $\theta = (\mu_1^T, \mu_2^T, \ldots, \mu_p^T)^T$, $c$ is a constant vector, and $A$ is a full rank $r \times mp$ matrix with rank $r$, then consider a large sample test of the form $H_0: A\theta = c$ versus $H_1: A\theta \ne c$. Then $\tau = A\theta$, $\hat{\tau} = AT$, and $\hat{\tau}_i^* = AT_i^*$ where $T^* = ((T_1^*)^T, (T_2^*)^T, \ldots, (T_p^*)^T)^T$, and $T_i^* = \hat{\mu}_i^*$. We will illustrate this method with the one way MANOVA test for $H_0: A\theta = 0$, where $0$ is an $r \times 1$ vector of zeroes with $r = (p - 1)m$. This test is equivalent to $H_0: LB = 0$ where $L$ and $B$ are given in Section 1, and $0$ is a $(p - 1) \times m$ matrix of zeroes.

Take a sample of size $n_i$ with replacement from the $n_i$ cases for each group for $i = 1, 2, \ldots, p$. Let $\hat{B}_i^*$ be the $i$th bootstrap estimator of $B$ for $i = 1, \ldots, B$. Let the

$(p - 1)m \times 1$ vector $w_i = \mathrm{vec}(L\hat{B}_i^*) = ((\hat{\mu}_1^* - \hat{\mu}_p^*)^T, \ldots, (\hat{\mu}_{p-1}^* - \hat{\mu}_p^*)^T)_i^T$ for $i = 1, \ldots, B$, where $\mathrm{vec}(A)$ stacks columns of a matrix into a vector. For a robust test use $w_i = AT_i^* = ((T_1^* - T_p^*)^T, \ldots, (T_{p-1}^* - T_p^*)^T)_i^T$ where $T_i$ is a robust location estimator, such as the coordinatewise median or trimmed mean, applied to the cases in the $i$th treatment group. The prediction region method fails to reject $H_0$ if $0$ is in the resulting confidence region.

3 EXAMPLE AND SIMULATIONS

Example. The Cornwell and Trumbull (1994) North Carolina Crime data consists of 630 observations on 24 variables. This data set is available online from ( dockgithubio/rdatasets/datasetshtml). Region is a categorical variable with three categories: Central, West and Other with the number of observations 238, 147, and 245 respectively, and forms the three groups. The $m = 5$ variables are $Y_1$ = wsta = weekly wage of state employees, $Y_2$ = avgsen = average sentence days, $Y_3$ = prbarr = probability of arrest, $Y_4$ = prbconv = probability of conviction, and $Y_5$ = taxpc = tax revenue per capita. There were a few outliers, and boxplots of the variables, not shown, showed that the sample medians of the three groups were nearly the same for all 5 variables. The variables were highly skewed with different amounts of skew for the three groups. Hence the location measures other than the population coordinatewise median likely do differ. The test with the coordinatewise median had $D_0 = 4.086$ with the cutoff of 4.32 and failed to reject $H_0$. The classical one way MANOVA test had a p-value of 0.001 and rejected the null hypothesis.

The simulation used 5000 runs with $B$ bootstrap samples and $p = 3$ groups. We may need $n \ge 40mp$, $n \ge (m + p)^2$, and $n_i \ge 40m$. Olive (2017bc) suggests that the prediction region method can give good results when the number of bootstrap samples $B \ge 50r = 50m(p - 1)$, and the simulation used various values of $B$. The sample mean, coordinatewise median, and coordinatewise 25% trimmed mean were the statistics $T$ used. The classical one way MANOVA Hotelling Lawley test statistic was also used.

Four types of data distributions $w_i$ were considered that were identical for $i = 1, 2$, and $3$. Then $y_1 = \sigma_1 C w_1 + \delta_1 \mathbf{1}$, $y_2 = \sigma_2 C w_2 + \delta_2 \mathbf{1}$, and $y_3 = \sigma_3 C w_3 + \delta_3 \mathbf{1}$ or $y_3 = w_3$ where $\mathbf{1} = (1, \ldots, 1)^T$ is a vector of ones and $C = \mathrm{diag}(1, \sqrt{2}, \ldots, \sqrt{m})$. The $w_i$ distributions were the multivariate normal distribution $N_m(0, I)$, the mixture distribution $0.6 N_m(0, I) + 0.4 N_m(0, 25I)$, the multivariate $t$ distribution with 4 degrees of freedom, and the multivariate lognormal distribution shifted to have nonzero mean $\mu$, but a population coordinatewise median of $0$. If $\sigma_1 = 1$ and $\delta_i = 0$ for $i = 1, 2, 3$, note that $\mathrm{Cov}(y_2) = \sigma_2^2 \, \mathrm{Cov}(y_1)$, and for the first three distributions, $E(y_i) = E(w_i) = 0$. If $y_3 = w_3$ then $\mathrm{Cov}(y_3) = cI_m$ for some constant $c > 0$. If $\sigma_1 = 1$ and $y_3 = \sigma_3 C w_3 + \delta_3 \mathbf{1}$, then $\mathrm{Cov}(y_3) = \sigma_3^2 \, \mathrm{Cov}(y_1)$.

Adding the same type and proportion of outliers to all three groups often resulted in three distributions that were still similar. Hence outliers were added to the first group but not the second or third, making the covariance structures of the three groups quite different. The outlier proportion was $100\gamma\%$. Let $y_1 = (y_{11}, \ldots, y_{m1})^T$. The five outlier types for group 1 were type 1: a tight cluster at the major axis $(0, \ldots, 0, z)^T$, type 2: a

tight cluster at the minor axis $(z, 0, \ldots, 0)^T$, type 3: $N_m(z\mathbf{1}, \mathrm{diag}(1, \ldots, m))$, type 4: $y_{m1}$ replaced by $z$, and type 5: $y_{11}$ replaced by $z$. The quantity $z$ determines how far the outliers are from the clean data.

Let the coverage be the proportion of times that $H_0$ is rejected. We want the coverage near 0.05 when $H_0$ is true and the coverage close to 1.0 for good power when $H_0$ is false. With 5000 runs, an observed coverage inside of (0.04, 0.06) suggests that the true coverage is close to the nominal 0.05 coverage when $H_0$ is true.

The new tests work well with all the distributions and with the different covariance settings. Tables 1 through 4 show simulation results for two distributions with various covariance settings. We took $\delta_1 = \delta_3 = 0$ and $B$ = the size of the bootstrap sample. Balanced and unbalanced designs have also been considered. For Tables 1 and 2, $\Sigma_i \propto \mathrm{diag}(1, 2, \ldots, m)$ for $i = 1, 2, 3$. For Tables 3 and 4, $\sigma_2 = \sigma_3 = 1$, and $\Sigma_3 = cI$ does not have the same shape as $\Sigma_1$ and $\Sigma_2$.

Tables 1 and 3 are for the multivariate normal (MVN) distribution. The classical test works well with multivariate normal data when the covariance matrices are the same, but the type I error tends to be higher than the nominal level when the covariance matrices differ. The classical test can be too conservative when the design is unbalanced. Having an unbalanced design and different covariance matrices was the worst case scenario for the classical test regardless of the data distribution. The bootstrap tests using the mean and coordinatewise trimmed mean usually performed well but occasionally had coverage near 0.07. Tables 2 and 4 are for the lognormal distribution, where the location measures other than the coordinatewise median differ if $\sigma_2 \ne \sigma_3$ (then coverage near 1 is desired).

Figures 1, 2, and 3 show power curves for the bootstrap tests and for the Zhang and Liu (2013) MANOVA type test (1.2) based on the sample means $\bar{y}_i$ and $S_i$ for the 3 groups. The bootstrap test based on the sample means bootstraps the test (1.2). For these power curves, group $i$ has mean $\mu_i = \delta_i \mathbf{1}$ where $\delta_2 = 2\delta_1$ and $\delta_3 = 3\delta_1$. When $\delta_1$ increases, the distance between the mean vectors increases. The power curves for the bootstrap test based on the sample means and for test (1.2) were always similar. Figure 1 shows the power curve for clean MVN data with a balanced design where the groups have the same covariance matrices. Here the three mean based tests had similar power. The power curve for the classical test was poor for the next two figures. Figure 2 shows clean MVN data with $m = 5$, $\sigma_1 = 1$, $\sigma_2 = 2$, $\sigma_3 = 5$, $n_1 = 200$, $n_2 = 400$, and $n_3 = 600$. Figure 3 used settings similar to Figure 2 with the multivariate $t_4$ distribution, and the coordinatewise trimmed mean had the best power.

Simulations were also done for type I error with contamination using the five types of outliers, and $(\gamma, z) = (0.1, 10)$ or $(0.05, 20)$. In Table 5 with $m = 5$, the test with the coordinatewise median works reasonably well (close to the nominal coverage) for 10% outliers with all the distributions and for all the outlier types with the exception of outlier type 3. All the other tests, including the classical test, failed. Results were similar with $m = 10$, $n_i = 800$, $B = 1000$, and $\gamma = 0.05$. Increasing $z$ as $m$ increases can help, but if $m$ and $\gamma$ are large enough, then the outliers move the coordinatewise median of the first group enough so that the test tends to reject $H_0$.
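The prediction region method bootstrap test of Section 2 can be sketched in a few lines. The following is a minimal Python illustration, not the authors' R code (the paper's programs `manbtsim2` and `predreg` are in R). The function names `qn_level` and `bootstrap_manova_test` are hypothetical, the percentile is computed with NumPy's default interpolation, and the percentile correction $q_n$ is applied with $n$ replaced by the number of bootstrap samples $B$; all three choices are assumptions.

```python
import numpy as np


def qn_level(delta, r, n_points):
    """Corrected percentile level q_n of Section 2 for r-dimensional points."""
    if delta > 0.1:
        qn = min(1 - delta + 0.05, 1 - delta + r / n_points)
    else:
        qn = min(1 - delta / 2, 1 - delta + 10 * delta * r / n_points)
    if 1 - delta < 0.999 and qn < 1 - delta + 0.001:
        qn = 1 - delta
    return qn


def bootstrap_manova_test(groups, stat=np.median, B=1000, delta=0.05, seed=0):
    """Prediction region bootstrap test of H0: equal location vectors.

    groups: list of p arrays, each n_i x m (assumed layout).
    stat:   location estimator applied columnwise, e.g. np.median or np.mean.
    Returns (D0, cutoff, reject), where D0 is the distance of 0 from the
    bootstrap center and H0 is rejected when D0 exceeds the cutoff.
    """
    rng = np.random.default_rng(seed)
    p, m = len(groups), groups[0].shape[1]
    r = m * (p - 1)
    W = np.empty((B, r))
    for b in range(B):
        # Resample n_i cases with replacement within each group.
        locs = [stat(g[rng.integers(0, len(g), len(g))], axis=0) for g in groups]
        # w = ((T_1 - T_p)^T, ..., (T_{p-1} - T_p)^T)^T
        W[b] = np.concatenate([loc - locs[-1] for loc in locs[:-1]])
    center = W.mean(axis=0)                       # sample mean of the w_i
    S_inv = np.linalg.inv(np.cov(W, rowvar=False))
    dev = W - center
    D2 = np.einsum("ij,jk,ik->i", dev, S_inv, dev)  # squared Mahalanobis distances
    cutoff2 = np.quantile(D2, qn_level(delta, r, B))
    D0_2 = center @ S_inv @ center                # squared distance of 0 from center
    return float(np.sqrt(D0_2)), float(np.sqrt(cutoff2)), bool(D0_2 > cutoff2)
```

Passing `stat=np.mean` gives the mean based bootstrap test under the same assumptions; a coordinatewise trimmed mean would need a user supplied function accepting an `axis` argument.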

Table 1: Type I error for clean MVN data with $\Sigma_3 \ne cI$
Columns: $m$, $n_1$, $n_2$, $n_3$, $B$, $\sigma_2$, $\sigma_3$, Median, Mean, TrMn, Class

Figure 1: Power curve for clean MVN data with $m = 5$, $\sigma_1 = 1$, $\sigma_2 = 1$, $\sigma_3 = 1$, $n_1 = 200$, $n_2 = 200$, and $n_3 =$
Legend: Median, Mean, TrMean, ManovaType, Classical. Horizontal axis: delta1.

Table 2: Type I error for clean lognormal data with $\Sigma_3 \ne cI$
Columns: $m$, $n_1$, $n_2$, $n_3$, $B$, $\sigma_2$, $\sigma_3$, Median, Mean, TrMn, Class

Table 3: Type I error for clean MVN data with $\Sigma_3 = cI$
Columns: $m$, $n_1$, $n_2$, $n_3$, $B$, Median, Mean, TrMn, Class

Table 4: Type I error for clean lognormal data with $\Sigma_3 = cI$
Columns: $m$, $n_1$, $n_2$, $n_3$, $B$, Median, Mean, TrMn, Class

Figure 2: Power curve for clean MVN data with $m = 5$, $\sigma_1 = 1$, $\sigma_2 = 2$, $\sigma_3 = 5$, $n_1 = 200$, $n_2 = 400$, and $n_3 = 600$
Legend: Median, Mean, TrMean, ManovaType, Classical. Horizontal axis: delta1.
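The ManovaType curve in the power plots is the large sample MANOVA type test of Section 1, which uses a block covariance matrix built from the group covariance matrices rather than a pooled estimate. A minimal Python sketch of its statistic $t_0$ follows, assuming $T_i$ is the sample mean and $\hat{\Sigma}_i = S_i$. The function name `manova_type_statistic` and the array layout are hypothetical, and this is not the authors' R code.

```python
import numpy as np


def manova_type_statistic(groups):
    """Return (t0, r, d_n) for the MANOVA type test with sample means and S_i.

    groups: list of p arrays, each n_i x m (assumed layout).
    r = m(p - 1) and d_n = min(n_i) are the degrees of freedom in the cutoff.
    """
    p, m = len(groups), groups[0].shape[1]
    means = [g.mean(axis=0) for g in groups]
    # S_i / n_i for each group; atleast_2d keeps the m = 1 case a matrix.
    cov_n = [np.atleast_2d(np.cov(g, rowvar=False)) / len(g) for g in groups]
    # w stacks the differences between each group mean and the last group mean.
    w = np.concatenate([mu - means[-1] for mu in means[:-1]])
    # Every block starts as S_p / n_p; diagonal blocks then gain S_i / n_i.
    Sw_n = np.tile(cov_n[-1], (p - 1, p - 1))
    for i in range(p - 1):
        Sw_n[i * m:(i + 1) * m, i * m:(i + 1) * m] += cov_n[i]
    t0 = float(w @ np.linalg.solve(Sw_n, w))
    return t0, m * (p - 1), min(len(g) for g in groups)
```

The test rejects $H_0$ when $t_0$ exceeds $r$ times the $1 - \alpha$ quantile of the $F_{r, d_n}$ distribution. If SciPy is available, that quantile could be obtained as `scipy.stats.f.ppf(1 - alpha, r, d_n)`. With $p = 2$ the statistic reduces to the two sample statistic $(\bar{y}_1 - \bar{y}_2)^T (S_1/n_1 + S_2/n_2)^{-1} (\bar{y}_1 - \bar{y}_2)$.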

Figure 3: Power curve for clean multivariate $t_4$ data with $m = 5$, $\sigma_1 = 1$, $\sigma_2 = 2$, $\sigma_3 = 5$, $n_1 = 200$, $n_2 = 400$, and $n_3 = 600$
Legend: Median, Mean, TrMean, ManovaType, Classical.

Table 5: Type I error with contaminated data: $m = 5$, $\gamma = 0.1$
Columns: Dist, $n_1 = n_2 = n_3$, $B$, outlier, Median, Mean, TrMn, Class
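The contamination scheme behind the contaminated data results can be illustrated as follows. This is a hedged Python sketch for the MVN case only. The helper names `make_group` and `contaminate` are hypothetical, a "tight cluster" is modeled as a single repeated point, and the number of outliers is taken as $\gamma n$ rounded to the nearest integer; these details are assumptions, not taken from the authors' R code.

```python
import numpy as np


def make_group(n, m, sigma=1.0, shift=0.0, rng=None):
    """Clean MVN group: rows are y = sigma * C w + shift * 1 with w ~ N_m(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    c_diag = np.sqrt(np.arange(1, m + 1))   # C = diag(1, sqrt(2), ..., sqrt(m))
    return sigma * rng.standard_normal((n, m)) * c_diag + shift


def contaminate(y, gamma, z, outlier_type, rng=None):
    """Replace the first round(gamma * n) rows of y by outliers of the given type."""
    rng = np.random.default_rng() if rng is None else rng
    out = y.copy()
    n, m = out.shape
    k = round(gamma * n)                    # rounding rule is an assumption
    if outlier_type == 1:                   # cluster at the major axis (0,...,0,z)
        out[:k] = 0.0
        out[:k, -1] = z
    elif outlier_type == 2:                 # cluster at the minor axis (z,0,...,0)
        out[:k] = 0.0
        out[:k, 0] = z
    elif outlier_type == 3:                 # N_m(z 1, diag(1,...,m))
        out[:k] = z + rng.standard_normal((k, m)) * np.sqrt(np.arange(1, m + 1))
    elif outlier_type == 4:                 # last coordinate replaced by z
        out[:k, -1] = z
    elif outlier_type == 5:                 # first coordinate replaced by z
        out[:k, 0] = z
    else:
        raise ValueError("outlier_type must be 1, 2, 3, 4, or 5")
    return out
```

For example, `contaminate(make_group(200, 5), 0.1, 10, 4)` would give a first group with 10% type 4 outliers at $z = 10$, while the second and third groups are left clean as in the simulation design.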

4 CONCLUSIONS

Bootstrapping different estimators of multivariate location provides an alternative to the one way MANOVA test that assumes the population covariance matrices of the $p$ groups are the same. The bootstrap test and test (1.2) were similar when the sample means $\bar{y}_i$ were used. A larger simulation is in Rupasinghe Arachchige Don (2017). Rupasinghe Arachchige Don and Pelawa Watagoda (2017) consider bootstrapping analogs of the two sample Hotelling's $T^2$ test, and Konietschke, Bathke, Harrar, and Pauly (2015) suggest a method for bootstrapping the MANOVA model. References for robust one way MANOVA tests are in Finch and French (2013), Todorov and Filzmoser (2010), Van Aelst and Willems (2011), Wilcox (1995), and Zhang and Liu (2013).

The R software was used in the simulation. See R Core Team (2016). Programs are in the Olive (2017b) collection of R functions mpack.txt available from ( siuedu/olive/mpacktxt). The function manbtsim2 was used to simulate the tests of hypotheses, and predreg computes the confidence region given the bootstrap values.

5 References

Bickel, P.J., and Ren, J.-J. (2001), "The Bootstrap in Hypothesis Testing," in State of the Art in Probability and Statistics: Festschrift for William R. van Zwet, eds. de Gunst, M., Klaassen, C., and van der Vaart, A., The Institute of Mathematical Statistics, Hayward, CA.

Clarke, B.R. (1986), "Nonsmooth Analysis and Fréchet Differentiability of M Functionals," Probability Theory and Related Fields, 73.

Clarke, B.R. (2000), "A Review of Differentiability in Relation to Robustness With an Application to Seismic Data Analysis," Proceedings of the Indian National Science Academy, A, 66.

Cornwell, C., and Trumbull, W.N. (1994), "Estimating the Economic Model of Crime with Panel Data," Review of Economics and Statistics, 76.

Fernholtz, L.T. (1983), von Mises Calculus for Statistical Functionals, Springer, New York, NY.

Finch, H., and French, B. (2013), "A Monte Carlo Comparison of Robust MANOVA Test Statistics," Journal of Modern Applied Statistical Methods, 12.

Fujikoshi, Y. (2002), "Asymptotic Expansions for the Distributions of Multivariate Basic Statistics and One-Way MANOVA Tests Under Nonnormality," Journal of Statistical Planning and Inference, 108.

Gill, R.D. (1989), "Non- and Semi-Parametric Maximum Likelihood Estimators and the von Mises Method, Part 1," Scandinavian Journal of Statistics, 16.

Johnson, R.A., and Wichern, D.W. (1988), Applied Multivariate Statistical Analysis, 2nd ed., Prentice Hall, Englewood Cliffs, NJ.

Kakizawa, Y. (2009), "Third-Order Power Comparisons for a Class of Tests for Multivariate Linear Hypothesis Under General Distributions," Journal of Multivariate Analysis, 100.

Konietschke, F., Bathke, A.C., Harrar, S.W., and Pauly, M. (2015), "Parametric and Nonparametric Bootstrap Methods for General MANOVA," Journal of Multivariate Analysis, 140.

Machado, J.A.F., and Parente, P. (2005), "Bootstrap Estimation of Covariance Matrices Via the Percentile Method," Econometrics Journal, 8.

Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), Multivariate Analysis, Academic Press, London, UK.

Olive, D.J. (2013), "Asymptotically Optimal Regression Prediction Intervals and Prediction Regions for Multivariate Data," International Journal of Statistics and Probability, 2.

Olive, D.J. (2017a), "Applications of Hyperellipsoidal Prediction Regions," Statistical Papers, to appear.

Olive, D.J. (2017b), Robust Multivariate Analysis, Springer, New York, NY, to appear.

Olive, D.J. (2017c), "Bootstrapping Hypothesis Tests and Confidence Regions," unpublished manuscript with the bootstrap material from Olive (2017b).

Olive, D.J., Pelawa Watagoda, L.C.R., and Rupasinghe Arachchige Don, H.S. (2015), "Visualizing and Testing the Multivariate Linear Regression Model," International Journal of Statistics and Probability, 4.

Press, S.J. (2005), Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, 2nd ed., Dover, Mineola, NY.

R Core Team (2016), "R: a Language and Environment for Statistical Computing," R Foundation for Statistical Computing, Vienna, Austria, (www.r-project.org).

Ren, J.-J. (1991), "On Hadamard Differentiability of Extended Statistical Functional," Journal of Multivariate Analysis, 39.

Ren, J.-J., and Sen, P.K. (1995), "Hadamard Differentiability on $D[0,1]^p$," Journal of Multivariate Analysis, 55.

Rupasinghe Arachchige Don, H.S. (2017), "Bootstrapping Analogs of the One Way MANOVA Test," PhD Thesis, Southern Illinois University, at ( siuedu/olive/shasthikaphdpdf).

Rupasinghe Arachchige Don, H.S., and Pelawa Watagoda, L.C.R. (2017), "Bootstrapping Analogs of the Two Sample Hotelling's $T^2$ Test," Communications in Statistics: Theory and Methods, to appear. See preprint at ( stwosamplepdf).

Todorov, V., and Filzmoser, P. (2010), "Robust Statistics for the One-Way MANOVA," Computational Statistics & Data Analysis, 54.

Van Aelst, S., and Willems, G. (2011), "Robust and Efficient One-Way MANOVA Tests," Journal of the American Statistical Association, 106.

Wilcox, R.R. (1995), "Simulation Results on Solutions to the Multivariate Behrens-Fisher Problem via Trimmed Means," The Statistician, 44.

Zhang, J.-T., and Liu, X. (2013), "A Modified Bartlett Test for Heteroscedastic One-Way MANOVA," Metrika, 76.


More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Lecture 5: Hypothesis tests for more than one sample

Lecture 5: Hypothesis tests for more than one sample 1/23 Lecture 5: Hypothesis tests for more than one sample Måns Thulin Department of Mathematics, Uppsala University thulin@math.uu.se Multivariate Methods 8/4 2011 2/23 Outline Paired comparisons Repeated

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

MULTIVARIATE ANALYSIS OF VARIANCE

MULTIVARIATE ANALYSIS OF VARIANCE MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,

More information

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3.

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3. Simfit Tutorials and worked examples for simulation, curve fitting, statistical analysis, and plotting. http://www.simfit.org.uk MANOVA examples From the main SimFIT menu choose [Statistcs], [Multivariate],

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation Using the least squares estimator for β we can obtain predicted values and compute residuals: Ŷ = Z ˆβ = Z(Z Z) 1 Z Y ˆɛ = Y Ŷ = Y Z(Z Z) 1 Z Y = [I Z(Z Z) 1 Z ]Y. The usual decomposition

More information

Prediction Intervals in the Presence of Outliers

Prediction Intervals in the Presence of Outliers Prediction Intervals in the Presence of Outliers David J. Olive Southern Illinois University July 21, 2003 Abstract This paper presents a simple procedure for computing prediction intervals when the data

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Robust Wilks' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA)

Robust Wilks' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA) ISSN 2224-584 (Paper) ISSN 2225-522 (Online) Vol.7, No.2, 27 Robust Wils' Statistic based on RMCD for One-Way Multivariate Analysis of Variance (MANOVA) Abdullah A. Ameen and Osama H. Abbas Department

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Comparisons of Several Multivariate Populations Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

i=1 (Ŷ i Y ) 2 and error (or residual) i=1 (Y i Ŷ i ) 2 = n

i=1 (Ŷ i Y ) 2 and error (or residual) i=1 (Y i Ŷ i ) 2 = n Math 583 Exam 3 is on Wednesday, Dec 3 You are allowed 10 sheets of notes and a calculator CHECK FORMULAS! 85 If the MLR (multiple linear regression model contains a constant, then SSTO = SSE + SSR where

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA)

Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA) Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA) Rationale and MANOVA test statistics underlying principles MANOVA assumptions Univariate ANOVA Planned and unplanned Multivariate ANOVA

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

5 Inferences about a Mean Vector

5 Inferences about a Mean Vector 5 Inferences about a Mean Vector In this chapter we use the results from Chapter 2 through Chapter 4 to develop techniques for analyzing data. A large part of any analysis is concerned with inference that

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2016, Mr. Ruey S. Tsay

THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2016, Mr. Ruey S. Tsay THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2016, Mr. Ruey S. Tsay Lecture 5: Multivariate Multiple Linear Regression The model is Y n m = Z n (r+1) β (r+1) m + ɛ

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 17 for Applied Multivariate Analysis Outline Multivariate Analysis of Variance 1 Multivariate Analysis of Variance The hypotheses:

More information

Testing equality of two mean vectors with unequal sample sizes for populations with correlation

Testing equality of two mean vectors with unequal sample sizes for populations with correlation Testing equality of two mean vectors with unequal sample sizes for populations with correlation Aya Shinozaki Naoya Okamoto 2 and Takashi Seo Department of Mathematical Information Science Tokyo University

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Empirical Likelihood Inference for Two-Sample Problems

Empirical Likelihood Inference for Two-Sample Problems Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics

More information

COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure)

COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure) STAT 52 Completely Randomized Designs COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure) Completely randomized means no restrictions on the randomization

More information

INFERENCE AFTER VARIABLE SELECTION. Lasanthi C. R. Pelawa Watagoda. M.S., Southern Illinois University Carbondale, 2013

INFERENCE AFTER VARIABLE SELECTION. Lasanthi C. R. Pelawa Watagoda. M.S., Southern Illinois University Carbondale, 2013 INFERENCE AFTER VARIABLE SELECTION by Lasanthi C. R. Pelawa Watagoda M.S., Southern Illinois University Carbondale, 2013 A Dissertation Submitted in Partial Fulfillment of the Requirements for the Doctor

More information

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie Extending the Robust Means Modeling Framework Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie One-way Independent Subjects Design Model: Y ij = µ + τ j + ε ij, j = 1,, J Y ij = score of the ith

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

odhady a jejich (ekonometrické)

odhady a jejich (ekonometrické) modifikace Stochastic Modelling in Economics and Finance 2 Advisor : Prof. RNDr. Jan Ámos Víšek, CSc. Petr Jonáš 27 th April 2009 Contents 1 2 3 4 29 1 In classical approach we deal with stringent stochastic

More information

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA ABSTRACT Regression analysis is one of the most used statistical methodologies. It can be used to describe or predict causal

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test la Contents The two sample t-test generalizes into Analysis of Variance. In analysis of variance ANOVA the population consists

More information

Other hypotheses of interest (cont d)

Other hypotheses of interest (cont d) Other hypotheses of interest (cont d) In addition to the simple null hypothesis of no treatment effects, we might wish to test other hypothesis of the general form (examples follow): H 0 : C k g β g p

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.

More information

Non-Parametric Combination (NPC) & classical multivariate tests

Non-Parametric Combination (NPC) & classical multivariate tests Non-Parametric Combination (NPC) & classical multivariate tests Anderson M. Winkler fmrib Analysis Group 5.May.26 Winkler Non-Parametric Combination (NPC) / 55 Winkler Non-Parametric Combination (NPC)

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Group comparison test for independent samples

Group comparison test for independent samples Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations

More information

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation Daniel B Rowe Division of Biostatistics Medical College of Wisconsin Technical Report 40 November 00 Division of Biostatistics

More information

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

More information

High Breakdown Analogs of the Trimmed Mean

High Breakdown Analogs of the Trimmed Mean High Breakdown Analogs of the Trimmed Mean David J. Olive Southern Illinois University April 11, 2004 Abstract Two high breakdown estimators that are asymptotically equivalent to a sequence of trimmed

More information

MIT Spring 2015

MIT Spring 2015 Regression Analysis MIT 18.472 Dr. Kempthorne Spring 2015 1 Outline Regression Analysis 1 Regression Analysis 2 Multiple Linear Regression: Setup Data Set n cases i = 1, 2,..., n 1 Response (dependent)

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

Understanding and Predicting Crime Rates Using Statistical Methods Carlos Espino, Xavier Gonzalez, Diego Llarrull, Woojin Kim December 15, 2015

Understanding and Predicting Crime Rates Using Statistical Methods Carlos Espino, Xavier Gonzalez, Diego Llarrull, Woojin Kim December 15, 2015 Understanding and Predicting Crime Rates Using Statistical Methods Carlos Espino, Xavier Gonzalez, Diego Llarrull, Woojin Kim December 15, 215 Contents 1 Introduction 2 2 Dataset 2 3 Analysis 6 3.1 Influencial

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Empirical Power of Four Statistical Tests in One Way Layout

Empirical Power of Four Statistical Tests in One Way Layout International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Hypothesis Testing For Multilayer Network Data

Hypothesis Testing For Multilayer Network Data Hypothesis Testing For Multilayer Network Data Jun Li Dept of Mathematics and Statistics, Boston University Joint work with Eric Kolaczyk Outline Background and Motivation Geometric structure of multilayer

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

MULTIVARIATE TECHNIQUES, ROBUSTNESS

MULTIVARIATE TECHNIQUES, ROBUSTNESS MULTIVARIATE TECHNIQUES, ROBUSTNESS Mia Hubert Associate Professor, Department of Mathematics and L-STAT Katholieke Universiteit Leuven, Belgium mia.hubert@wis.kuleuven.be Peter J. Rousseeuw 1 Senior Researcher,

More information

Comparisons of Several Multivariate Populations

Comparisons of Several Multivariate Populations Comparisons of Several Multivariate Populations Edps/Soc 584, Psych 594 Carolyn J Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees,

More information

A New Procedure for Multiple Testing of Econometric Models

A New Procedure for Multiple Testing of Econometric Models A New Procedure for Multiple Testing of Econometric Models Maxwell L. King 1, Xibin Zhang, and Muhammad Akram Department of Econometrics and Business Statistics Monash University, Australia April 2007

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Ordinary Least Squares Regression

Ordinary Least Squares Regression Ordinary Least Squares Regression Goals for this unit More on notation and terminology OLS scalar versus matrix derivation Some Preliminaries In this class we will be learning to analyze Cross Section

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Empirical likelihood-based methods for the difference of two trimmed means

Empirical likelihood-based methods for the difference of two trimmed means Empirical likelihood-based methods for the difference of two trimmed means 24.09.2012. Latvijas Universitate Contents 1 Introduction 2 Trimmed mean 3 Empirical likelihood 4 Empirical likelihood for the

More information

On testing the equality of mean vectors in high dimension

On testing the equality of mean vectors in high dimension ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.

More information

AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC

AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC Journal of Applied Statistical Science ISSN 1067-5817 Volume 14, Number 3/4, pp. 225-235 2005 Nova Science Publishers, Inc. AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC FOR TWO-FACTOR ANALYSIS OF VARIANCE

More information