POLYNOMIAL SINGLE COMPONENT ANALYSIS USING MULTIVARIATE MOMENTS

Size: px

Start display at page:

Download "POLYNOMIAL SINGLE COMPONENT ANALYSIS USING MULTIVARIATE MOMENTS"

Ambrose Weaver
5 years ago
Views:

1 POLYNOMIAL SINGLE COMPONENT ANALYSIS USING MULTIVARIATE MOMENTS JAN DE LEEUW Abstract. Polynomial single component analysis is defined. Algorithms, based on multidimensional array approimation methods, to fit a model with a single common factor to observed multivariate moments are described and applied to a psychometric eample. Software in R is included. Date: Tuesday 0 th July, 0 7h 7min Typeset in Lucida Bright. 000 Mathematics Subject Classification. 6H. Key words and phrases. Component Analysis, Polynomials, Array Decmposition.

2 JAN DE LEEUW. Model Suppose y j are random variables, with j =,, m. We use the convention of underlining random variables [Hemelrijk, 966]. In polynomial single component analysis we assume there eist a random variable such that () y j = P j (), where the P j are univariate polynomials of degree s. Thus s () P j () = β jp p. p=0 Closely related previous work on the polynomial factor analysis model, which differs from () because it includes unique factors, is in Mooijaart and Bentler [986], who also discuss earlier work by Gibson and McDonald. Mooijaart and Bentler [986] use higher order product moments, specifically third order moments, and the special case of one-dimensional polynomial factor analysis is worked out and applied to a real eample. Since Mooijaart and Bentler [986] the emphasis has shifted away from higher order statistics to using instrumental variables and other more specialized techniques, and from component and factor analysis to general nonlinear structural equation models. Good historical overviews of these more recent developments are in Yalcin and Amemiya [00] and Mooijaart and Bentler [00]. In this paper we go back to the simple single component model, we use higher-order statistics, specifically multivariate moments, and we use decomposition methods for multidimensional arrays to fit the models to sample moments. Thus, compared to Mooijaart and Bentler [986], we develop the case of non-normal factors, provide algorithms and software to deal with moments of arbitrary order, and develop algorithms and software based on fitting multilinear

3 POLYNOMIAL SINGLE COMPONENT ANALYSIS array decompositions. All software is written in R [R Development Core Team, 0]. In a companion paper [De Leeuw, 0] we have worked out the equations in terms of cumulants, instead of moments. Cumulants are useful for laying the groundwork for generalizations to approimations with multiple components, and to factor analysis models. In the special case of single-factor component analysis, however, working with moments is much simpler.. Product Moments From the multilinearity property of moments for any r variables, not necessarily distinct, we have s s () E(P j () P jr ()) = β j p β jr p r c p p r, p =0 p r =0 where () c p p r = µ p + +p r, and where we write µ t for the t th raw moment of. The product moments of order r of the polynomials are a supersymmetric multilinear function of the elements of B = {β jp }, for a given kernel C = {c p p r }. Supper-symmetry means that the array elements are invariant under all permutations of the subscript. Moreover the kernel must have the moment structure (). The s r elements of the kernel array are simple linear functions of the first sr moments of. In fact we can write () C = sr t= µ t I t, where the I t are super-symmetric orthogonal binary arrays with {I t } p p r = only if p + + p r = t.

4 JAN DE LEEUW Of course not all vectors are possible vectors of the first moments of a random variable. Defining all possible vectors of moments is the truncated version of the classical Moment Problem, which requires that the Hankel matri formed from the moments is positive semi-definite. See, for eample, Curto and Fialkow [99] for the details. Instead of dealing with the complicated moment inequalities we can use the parametrization µ t = t df() and think of the kernel as a function of the distribution function F. By Mulholland and Rogers [98] we can limit ourselves, without loss of generality, to step functions with at most sr + steps. When dealing with multivariate moments it is convenient to introduce an additional variable y 0 which is a.s. equal to one and a corresponding polynomial P 0 which has β 0p = 0 for p =,, s. If we include P 0 in the epectations (), then the moment array includes all moments up to order r, i.e. also the moments of order,, r.. Loss Function Given the result () it becomes straightforward to generalize the methods for linear component analysis in De Leeuw and Kukuyeva [0] to the polynomial case. We first compute an array of moments A (r ) = {a (r ) j j r } of order up to r from a given data matri. This can be done with the code in segment. Insert Code Segment about here The data is a super-symmetric moment array of order (m + ) r. The kernel is a super-symmetric array of order s r, and B is an (m + ) s matri of loadings. The first row of B is zero, ecept for its first element.

5 POLYNOMIAL SINGLE COMPONENT ANALYSIS We then define the least squares loss function by (6) σ (B, C) = j J j r J r w j j r (a j j r s p =0 and minimize it over both B and C. s p r =0 β j p β jr p r c p p r ). The w j j r are given non-negative weights. They can be used to give lower or higher weights to the moments of various orders, or even to incorporate information about the standard errors of the moments. Note that the loss function can be written in matri form as (7) σ (B, C) = {vec(a) (B } {{ B} )vec(c)} V {vec(a) (B } {{ B} )vec(c))}, r times r times where V = diag(vec(w )). In actual computation and reporting we normalize the loss function to its root-mean-square form σ (B, C) (8) σ (B, C) = tr(v ). For the fitting of C three different cases are of interest... Fied Kernel. The problem is relatively simple if the kernel C is completely known. This requires, for eample, the additional assumption that has a standard normal distribution. The code in segment can be used to compute moments up to order r, including µ 0 =, of the standard normal distribution. Insert Code Segment about here The code in can be used to put moments into the kernel array, using the formula (). Insert Code Segment about here

6 6 JAN DE LEEUW In the case of a fied kernel the loss function only needs to be minimized over B... Free Kernel. In the second special case we ignore the fact that the kernel array C = {c p p r } is, according to (), an array formed of moments of the same variable. Thus we ignore the structure imposed by (). We merely imposing super-symmetry on C. We now minimize the loss function over both B and super-symmetric C. Ignoring the structure of the kernel array is convenient, but it does imply we will not be able to distinguish a linear model with s + correlated components from a polynomial model of degree s. This is a major disadvantage of the method. Nevertheless, if using higher order moments forces identification, then we will be able to estimate factor loadings (or polynomial coefficients) consistently, up to a linear transformation... Structured Kernel. This is the more complicated case, and in several important ways the most interesting one. We minimize over B and over all C with the structure (). We minimize loss over B and over these vectors of moments. There are two variations. In the first we minimize over all vectors µ, without imposing the condition that the vector consists of moments of a single variable. In the second we minimize over all distributions F, which determine the moments. In fact, it usually makes more sense to minimize over a finite-dimensional conve set of distribution functions, for eample the set of all conve combinations of B-splines of a given order with a given set of knots. In this case we can write B + (9) µ t = π b t df b (). b= Collecting the moments of the basis distributions in a matri M allows us to write µ = Mπ.

7 POLYNOMIAL SINGLE COMPONENT ANALYSIS 7 The code in segment computes the basis I t of indicators for the kernel used in formula (). It is wasteful to compute and store the indicators, but for small eamples it will be both feasible and computationally efficient. Insert Code Segment about here The code in segment uses Neuman [98, Proposition.] to compute the moments of B-splines for any knot sequence to any order. Insert Code Segment about here. Algorithms Various segments of code needed for our minimizations, written in R, to minimize (6) are given in De Leeuw [008, 0]; De Leeuw and Kukuyeva [0]. We adopt and etend them to deal with our form of component analysis... Initial Solution. Because of the clear possibility of local minima we want to start with a good initial configuration. For the kernel we always use the one based on moments of the standard normal. For the loadings we approimate the first moments y and second momentss by an equation the form y 0 c b. y S b B c C 0 B The product on the right works out to b + c B b + Bc bb + Bcb + bc B + BCB Now choose b = y Bc, so that we fit the means eactly. Then the fit becomes y y yy + B(C cc )B,

8 8 JAN DE LEEUW and we find B by using unweighted least squares to approimate the covariance S yy by B(C cc )B. This is a simple eigenvalueeigenvector problem... Fied Kernel. To minimize over B we use cyclic coordinate descent (CCD), i.e. we change one coordinate of B at a time while keeping all other fied at their current values. One iteration to update B is a cycle in which all coordinates are changed. CCD is discussed, for eample, in Zangwill [969, Page -], and in a similar contet it has been used in ALSCAL [Takane et al., 977]. It is currently popular in machine learning. See for some eamples and references. If we deal with cumulants of order four, then changing one coordinate makes the loss function a polynomial of order eight of that coordinate. If the cumulants are of order three, the polynomial is of order si, and so on. The polynomial is minimized by using the optimize() function in R. There is no guarantee CCD will find the global minimum. As in ALSCAL, it is possible, at least in principle, to use global optimization methods for multivariate (non-negative) polynomials [?]. We have not tried this, and for larger problems the computational requirements are quite daunting... Free Kernel. Fitting a free kernel is a simple modification of the basic algorithm. After a CCD cycle we update the kernel for given coordinates, and this is a simple linear least squares problem. The resulting Alternating Least Squares (ALS) algorithm, which alternates CCD cycles and kernel updates, is a block relaation algorithm in the sense of De Leeuw [99].

9 POLYNOMIAL SINGLE COMPONENT ANALYSIS 9 Note that from (7) we see, using Moore-Penrose inverses, that the optimal kernel for fied B is (0) vec(c) = (B } + {{ B + })vec(a). r times.. Structured Kernel. From () () (B } {{ B} )vec(c) = r times sr t= µ t (B } {{ B} )vec(i t ) r times Collecting the vectors (B } {{ B} )vec(i t ) in the columns of a matri r times H allows us to write the loss function as () σ (B, µ) = (vec(a) Hµ) V (vec(a) Hµ), while using (9) gives () σ (B, π) = (vec(a) HMπ) V (vec(a) HMπ). Thus the two variations for a structured kernel are to minimize () over all µ and to minimize () over all π in the unit simple. Both are linear least squares problems, in the second case with simple linear inequality and equality constraints. We solve the quadratic programming problem with the Frank-Wolfe linearization algorithm using the optimal step-size. The code is in segment. Insert Code Segment about here. Real Eample We use the YouthGratitude data from the R package psychotools [Zeileis et al., 0]. They are described in detail in the article by Froh et al. [0]. The si seven-point Likert scale variables of the GQ-6 scale are used, with responses from 0 students aged 0-9 years. In Table A of Froh et al. [0] we see that a linear factor analysis gives loadings between.8 and.9 for items and, between.6 and.7 for items,, and, and between. and. for

10 0 JAN DE LEEUW item 6. On the basis of these results the authors eliminate item 6 from further analysis, and we will do the same thing. The data are generate by the following code. library (psychotools) data ("YouthGratitude") gq6 <- as.matri (YouthGratitude[,:8]) We then load the required packages and source the required files. require (abind) require (apl) require (polynom) source ("tuckerccd.r") 6 source ("kernelinitial.r") 7 source ("kernelupdate.r") 8 source ("tuckermore.r") 9 source ("tuckerfinal.r") 0 source ("tuckerkernel.r") source ("tuckerau.r") source ("tuckerbasis.r") source ("tuckerplot.r") source ("utilities.r") source ("simplels.r") For the runs we set the iteration precision, the decrease of successive loss function values, to and we set the maimum number of outer iterations to 00. The number of inner iterations, which are the coordinate descent cycles to fit B, is set at. If we use a miture of B-spline densities for the moments of the distribution of we use equally spaced knots at the positive and negative integers from -6 to +6 and choose order (i.e. we use piecewise quadratic polynomials differentiable at the knots).

11 POLYNOMIAL SINGLE COMPONENT ANALYSIS The code below does 8 analyses, for each of the four kernel types, using moments of order r =, and polynomial degrees s =,,. loss <- array (0, c(,, )) itel <- array (0, c(,, )) R <- : S <- : 6 K <- : 7 8 for (r in R) { 9 for (s in S) { 0 for (k in K) { kertype <- switch (k, "fied", "free", " moment", "cdf") vname <- paste (kertype, r, s, sep = "") pname <- paste (vname, ".pdf", sep = "") assign (paste (kertype, r, s, sep = ""), tuckerccd (gq6, r, s, kertype = kertype )) llist <- eval (as.name (vname)) 6 pdf (pname) 7 plotpol (llist $ b, vname) 8 dev.off () 9 loss [s, r -, k] <- llist $ tloss 0 itel [s, r -, k] <- llist $ otel } } } dump (c("loss", "itel"), file = "results.r" ).. Results for Fied Normal Kernel.

12 JAN DE LEEUW 6. Etensions and Elaborations 6.. Weights and Efficiency. Adding weights W = {w j j r } to (6) is straightforward. It is more complicated to use the standard errors of the multivariate cumulants to improve (asymptotically distribution free) efficiency of the estimates. Again we will rely on (??). 6.. Constraints on Loadings. Adding linear constraints (some elements are zero, some elements are equal) on B is again straightforward. This makes it possible to use different polynomial degrees for different variables. We can also impose ordinal constraints, for instance that polynomials are monotonic. 6.. Consistency. Sample cumulants converge in probability to population cumulants. Minimizers of the loss function (or stationary points of the algorithm) are, under regularity conditions, continuous functions of the sample cumulants. Thus, under additional identifiability conditions, the estimates computed by the algorithm are consistent estimates of the population parameters. 6.. Multiple Common Factors. Additional work is needed to adapt this approach to structures with more than one common factor. The common part of each variable will be a multivariate polynomial of the common factors, i.e. a symmetric multilinear form. It is consequently still true that the multivariate polynomial is a linear combination of powers and products of powers, and thus the basic results (??) and (??) continue to apply. 6.. Non-additive Unique Factors. The additive combination rule in () is not as compelling as in the linear case. This is also the reason we have not paid much attention to factor analysis, as opposed to component analysis, in this paper. The more general decomposition y j = P j (, u j ) becomes interesting, but this is more naturally handled by the etension to mutiple common factors. For

13 POLYNOMIAL SINGLE COMPONENT ANALYSIS this case (??) no longer applies, but we can continue to rely on (??). This more general description, together with work on more than one factor, will become a topic for further research Convergence Accelleration. In further work we plan to use the methods discussed in De Leeuw [008] to speed up convergence. It must be emphasized, however, that there are many more parts of the algorithm that can be accelerated in various ways.

14 JAN DE LEEUW References R.E. Curto and L.A. Fialkow. Recursiveness, Positivity, and Truncated Moment Problems. Houston Journal of Mathematics, 7: 60 6, 99. J. De Leeuw. Block Relaation Algorithms in Statistics. In H.H. Bock, W. Lenski, and M.M. Richter, editors, Information Systems and Data Analysis, pages 08, Berlin, 99. Springer Verlag. URL chapters/deleeuw_c_9c.pdf. J. De Leeuw. Polynomial etrapolation to accelerate fied point algorithms. Preprint Series, UCLA Department of Statistics, Los Angeles, CA, 008. URL ~deleeuw/janspubs/008/reports/deleeuw_r_08i.pdf. J. De Leeuw. The Multiway Package. Preprint Series, UCLA Department of Statistics, Los Angeles, CA, 008. URL reports/deleeuw_r_08j.pdf. J. De Leeuw. Multivariate Cumulants in R. Unpublished, 0. URL notes/deleeuw_u_a.pdf. J. De Leeuw. Single Factor Polynomial Component Analysis. Unpublished, 0. URL janspubs/0/notes/deleeuw_u_a.pdf. J. De Leeuw and I. Kukuyeva. Component Analysis Using Multivariate Cumulants. Unpublished, 0. URL notes/deleeuw_kukuyeva_u_.pdf. J.J. Froh, J. Fan, R.A. Emmons, G. Bono, E. S. Huebner, and P. Watkins. Measuring Gratitude in Youth: Assessing the Psychometric Properties of Adult Gratitude Scales in Children and Adolescents. Psychological Assessment, :, 0. J. Hemelrijk. Underlining Random Variables. Statistica Neerlandica, 0: 7, 966.

15 POLYNOMIAL SINGLE COMPONENT ANALYSIS A. Mooijaart and P. Bentler. Random Polynomial Factor Analysis. In E. Diday et al., editor, Data Analysis and Informatics, volume IV, pages 0, 986. A. Mooijaart and P.M. Bentler. An Alternative Approach for Nonlinear Latent Variable Models. Structural Equation Modeling, 7: 7 7, 00. H.P. Mulholland and C.A. Rogers. Representation Theorems for Distribution Functions. Proceedings of the London Mathematical Society, 8:77, 98. E. Neuman. Moments and Fourier Transforms of B-splines. Journal of Computational and Applied Mathematics, 7: 6, 98. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 0. URL ISBN Y. Takane, F.W. Young, and J. De Leeuw. Nonmetric individual differences in multidimensional scaling: An alternating least squares method with optimal scaling features. Psychometrika, :7 67, 977. URL janspubs/977/articles/takane_young_deleeuw_a_77. pdf. I. Yalcin and Y. Amemiya. Nonlinear Factor Analysis as a Statistical Method. Statistical Science, 6:7 9, 00. W. I. Zangwill. Nonlinear Programming: a Unified Approach. Prentice-Hall, Englewood-Cliffs, N.J., 969. A. Zeileis, C. Strobl, and F. Wickelmaier. psychotools: Infrastructure for Psychometric Modeling, 0. URL R-project.org/package=psychotools. R package version 0.-.

16 6 JAN DE LEEUW Appendi A. Code Code Segment Raw Moments up to Order p rawmomentsuptop <- function (, p = ) { n <- nrow () m <- ncol () if (p == ) { return (c (, apply (,, mean))) 6 } 7 y <- array (0, rep (m +, p)) 8 for (i in : n) { 9 i <- c (, [i, ]) 0 z <- i for (s in :p) { z <- outer (z, i) } y <- y + z } 6 return (y / n) 7 }

17 POLYNOMIAL SINGLE COMPONENT ANALYSIS 7 Code Segment Moments of Standard Normal normalevenmoment <- function (n) { if ((n %% ) == ) { stop ("even moments only") } return (doublefactorial (n - )) 6 } 7 8 doublefactorial <- function (n) { 9 if ((n %% ) == 0) { 0 stop ("odd argument only") } k <- (n + ) / return (prod ( * ( : k) - )) } 6 normalmoments <- function (n) { 7 m <- rep (0, n) 8 ind <- which((( : n) %% ) == 0) 9 m [ind] <- sapply (ind, normalevenmoment) 0 return (c (, m)) }

18 8 JAN DE LEEUW Code Segment Kernel from Moments require("apl") makemomentkernel <- function (s, r, mu) { if (length (mu)!= ((s * r) + )) { stop ("vector of moments wrong length") 6 } 7 c <- array(0, rep (s +, r)) 8 sr <- (s + ) ^ r 9 for (i in : sr) { 0 l <- aplencode (i, rep (s +, r)) - k <- sum (l) c [i] <- mu [k + ] } return (c) }

19 POLYNOMIAL SINGLE COMPONENT ANALYSIS 9 Code Segment Kernel Indicators makeindicatorbasismatri <- function (s, r) { ii <- replist (array (0, rep (s +, r)), (s * r) + ) sr <- (s + ) ^ r for (i in ( : sr)) { l <- aplencode (i, rep (s +, r)) - 6 k <- sum (l) 7 ii [[k + ]] [i] <- 8 } 9 return (ii) 0 }

20 0 JAN DE LEEUW Code Segment Moments of B-splines makebsplinemoments <- function (lup, t) { kup <- length (t) - m <- matri (0, kup, lup + ) for (l in (0 : lup)) { s <- (t [] ^ (l + )) - (t [] ^ (l + )) 6 s <- (l + ) * (t [] - t []) 7 m [, l + ] <- s / s 8 for (k in ( : kup)) { 9 ss <- 0 0 for (j in (0 : l)) { s <- t [k + ] ^ (l - j) s <- choose (k + j -, j) s <- m [k -, j + ] ss <- ss + s * s * s } 6 m [k, l + ] <- ss / choose (k + l, k) 7 } 8 } 9 return (m [kup, ]) 0 } makebsplinemomentmatri <- function (lup, t, order) { k <- length (t) - order m <- matri (0, k, lup + ) for (i in ( : k)) { 6 m[i, ] <- makebsplinemoments (lup, t [i : (i + order)]) 7 } 8 return (m) 9 }

21 POLYNOMIAL SINGLE COMPONENT ANALYSIS Code Segment 6 CCD Algorithm for Fied/Free Kernel tuckerccd <- function (data, r, s, w = array (, dim (a)), kertype = "fied", order =, knots = -6:6, inma =, ineps = e-, outma = 00, outeps = e-, verbose = ) { a <- rawmomentsuptop (data, r) vassign (i, m, p, q, k, values = kernelinitial (s, r, k, kertype, knots, order)) vassign (b, otel, values = list (tuckerinitial (a, k), )) oloss <- tuckervalue (a, k, w, b) 6 repeat { 7 vassign (itel, bold, values = list (, b)) 8 repeat { 9 bnew <- oneccdcycle (a, k, w, bold) 0 chng <- ma (abs (bnew - bold)) if (verbose > ) cat ("Ineration: ", formatc (itel,, ), "IneChange: ", formatc (chng, 6, 0, " f"), "\n") if ( (chng < ineps) (itel == inma)) break () 6 vassign (bold, itel, values = list (bnew, itel + )) 7 } 8 b <- bnew 9 loss <- tuckervalue (a, k, w, b) 0 vassign (k, p, q, values = kernelupdate (a, k, i, m, p, w, b, kertype)) nloss <- tuckervalue (a, k, w, b) if (verbose >= ) cat ("Iteration:", formatc (otel,, ), "OLoss: ", formatc (oloss, 6, 0, "f"), "BLoss: ", formatc (loss, 6, 0, "f"), 6 "Kloss: ", formatc (nloss, 6, 0, "f"), "\ n") 7 if (((oloss - nloss) < outeps) (otel == outma)) 8 break () 9 vassign (oloss, otel, values = list (nloss, otel + )) 0 } return (list(b = b, otel = otel, tloss = nloss, k = k, p = p, q = q)) }

22 JAN DE LEEUW Code Segment 7 Tucker Subroutines tuckervalue <- function (a, k, w, b) { r <- length (dim (a)) res <- as.vector (a) - arrkronecker (replist (b, r)) % *% as.vector (k) return (sqrt (sum (as.vector (w) * res * res) / sum (w) } 6 )) 7 tuckerinitial <- function (a, k) { 8 r <- length (dim (a)) 9 s <- dim (a)[] 0 t <- dim (k) [] h <- asub (a, c (replist ( : s, ), replist (, r - ) )) m <- (h - outer (h [, ], h[, ])) [-, -] e <- eigen (m) v <- asub (k, c (replist ( : t, ), replist (, r - ) )) p <- nrow (v) - 6 u <- (v - outer (v [, ], v[, ])) [-, -] 7 if (p == ) { 8 <- as.matri (e $ vectors [, ]) %*% sqrt (e $ 9 } else { values []) 0 <- e $ vectors [, : p] %*% diag (sqrt (e $ } values [ : p])) bb <- %*% solve (t (chol (u))) bv <- h [-, ] - bb %*% v [-, ] return (rbind (c (, rep (0, t - )), cbind (bv, bb))) }

23 POLYNOMIAL SINGLE COMPONENT ANALYSIS Code Segment 8 More Tucker Subroutines oneccdcycle <- function (a, k, w, b) { p <- dim (k) [] m <- dim (a) [] f <- function (, j, s) tuckervalue (a, k, w, addelement (b, j, s, )) loss <- tuckervalue (a, k, w, b) 6 for (j in : m) { 7 for (s in : p) { 8 oo <- optimize (f, c(-, ), j, s) 9 ls <- oo $ objective 0 ip <- oo $ minimum if (ls > loss) { net () } loss <- ls b <- addelement (b, j, s, ip) 6 } 7 } 8 return (b) 9 } 0 addelement <- function (, i, j, s) { [i, j] <- [i, j] + s return () }

24 JAN DE LEEUW Code Segment 9 Tucker Auilaries arrkronecker <- function (, fun=" * ") { nmat <- length () if (nmat == 0) { stop("empty argument in arrkronecker") } 6 res <- [[]] 7 if (length () == ) { 8 return (res) 9 } 0 for (i in : nmat) { res <- kronecker (res, [[i]], fun) } return (res) } 6 replist <- function (, n) { 7 z <- list() 8 if (n == 0) { 9 return (z) 0 } for (i in : n) z <- c (z, list ()) return (z) } 6 vassign <- function(..., values, envir=parent.frame()) { 7 vars <- as.character(substitute(...())) 8 values <- rep(values, length.out=length(vars)) 9 for(i in seq_along(vars)) { 0 assign(vars[[i]], values[[i]], envir) } }

25 POLYNOMIAL SINGLE COMPONENT ANALYSIS Code Segment 0 Plotting Polynomials plotpol <- function (, title) { m <- nrow () s <- seq (-.,., length = 00) h <- seq (-.,., length = 0) v <- matri (0, m -, 00) 6 u <- matri (0, m -, 0) 7 cols <- rainbow (m - ) 8 for (i in : (m - )) { 9 v[i, ] <- predict (polynomial ([i +, ]), s) 0 u[i, ] <- predict (polynomial ([i +, ]), h) } yma <- 7 ymin <- plot (0, lim = c (-.,.), ylim = c (ymin, yma), main = title, lab = "", ylab = "", type = "n") for (i in : (m - )) { 6 lines (s, v[i, ]) 7 points (cbind (h, u[i, ]), pch = as.character (i), col = cols[i]) 8 } 9 }

26 6 JAN DE LEEUW Code Segment Frank-Wolfe Algorithm qpfrankwolfe <- function (, y, w = rep (, length (y)), b = rep (, ncol ()) / ncol (), itma = 00, eps = e-6, verbose = FALSE) { w <- crossprod (, w * ) fold <- Inf itel <- repeat { 6 res <- c (y - %*% b) 7 fnew<- sum (w * res ^ ) / 8 g <- - colsums (w * res * ) 9 mg <- min (g) 0 ig <- which.min (g) bnew <- rep (0, length (b)) bnew [ig] <- d <- bnew - b tau <- sum (d * crossprod (, w * res)) / sum (w * outer (d, d)) tau <- min (, ma (0, tau)) 6 b <- b + tau * d 7 if (verbose) { 8 cat ("itel: ", formatc (itel,, ), 9 "fold: ", formatc (fold, 0,, "f"), 0 "fnew: ", formatc (fnew, 0,, "f"), } "\n") if (((fold - fnew) < eps) (itel == itma)) { break () } itel <- itel + 6 fold <- fnew 7 } 8 return (list (b = b, f = fnew)) 9 }

27 POLYNOMIAL SINGLE COMPONENT ANALYSIS 7

28 8 JAN DE LEEUW Appendi B. Figures fied (a) fied (b) fied (c) fied (d) fied (e) fied (f) fied (g) fied (h) fied (i) fied (j) fied (k) fied (l) Figure. Twelve solutions with a fied normal kernel. (a) sr; (b) sr; (c) sr; (d) sr; (e) sr; (f) sr; (g) sr; (h) sr; (i) sr; (j) sr; (k) sr; and, (l) sr.

29 POLYNOMIAL SINGLE COMPONENT ANALYSIS free (a) free (b) free (c) free (d) free (e) free (f) free (g) free (h) free (i) free (j) free (k) free (l) Figure. Twelve solutions with a free kernel. (a) sr; (b) sr; (c) sr; (d) sr; (e) sr; (f) sr; (g) sr; (h) sr; (i) sr; (j) sr; (k) sr; and, (l) sr.

30 0 JAN DE LEEUW moment (a) moment (b) moment (c) moment (d) moment (e) moment (f) moment (g) moment (h) moment (i) moment (j) moment (k) moment (l) Figure. Twelve solutions with a moment kernel. (a) sr; (b) sr; (c) sr; (d) sr; (e) sr; (f) sr; (g) sr; (h) sr; (i) sr; (j) sr; (k) sr; and,?? sr.

31 POLYNOMIAL SINGLE COMPONENT ANALYSIS cdf cdf cdf (a) (b) (c) cdf cdf cdf (d) (e) (f) cdf cdf cdf (g) (h) (i) cdf cdf cdf (j) (k) (l) Figure. Twelve solutions with a CDF kernel. (a) sr; (b) sr; (c) sr; (d) sr; (e) sr; (f) sr; (g) sr; (h) sr; (i) sr; (j) sr; (k) sr; and, (l) sr.

32 JAN DE LEEUW Appendi C. Tables fied r = r = r = s = s = s = s = free r = r = r = s = s = s = s = moment r = r = r = s = s = s = s = cdf r = r = r = s = s = s = s = Table. Minimum loss function values for different kernel types, polynomial degrees s and moment orders r.

33 POLYNOMIAL SINGLE COMPONENT ANALYSIS fied r = r = r = s = 6 s = s = s = 0 00 free r = r = r = s = 00 s = s = 0 00 s = moment r = r = r = s = 00 s = s = s = cdf r = r = r = s = s = s = 8 00 s = 00 Table. Number of iterations for different kernel types, polynomial degrees s and moment orders r. Department of Statistics, University of California, Los Angeles, CA address, Jan de Leeuw: deleeuw@stat.ucla.edu URL, Jan de Leeuw:

MULTIVARIATE CUMULANTS IN R. 1. Introduction

MULTIVARIATE CUMULANTS IN R. 1. Introduction MULTIVARIATE CUMULANTS IN R JAN DE LEEUW Abstract. We give algorithms and R code to compute all multivariate cumulants up to order p of m variables. 1. Introduction Multivariate cumulants are of increasing