Analyzing Tensor Power Method Dynamics in Overcomplete Regime


Journal of Machine Learning Research 18 (2017) 1-40. Submitted 9/15; Revised 11/16; Published 4/17.

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Animashree Anandkumar, Department of Electrical Engineering and Computer Science, University of California, Irvine, Engineering Hall, Room 4408, Irvine, CA 92697, USA

Rong Ge, Department of Computer Science, Duke University, 308 Research Drive (LSRC Building), Room D226, Durham, NC 27708, USA

Majid Janzamin, Department of Electrical Engineering and Computer Science, University of California, Irvine, Engineering Hall, Room 4406, Irvine, CA 92697, USA

Editor: Tommi Jaakkola

Abstract

We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime, where the tensor CP rank is larger than the input dimension. Finding the CP decomposition of an overcomplete tensor is NP-hard in general. We consider the case where the tensor components are randomly drawn, and show that the simple power iteration recovers the components with bounded error under mild initialization conditions. We apply our analysis to unsupervised learning of latent variable models, such as multi-view mixture models and spherical Gaussian mixtures. Given the third order moment tensor, we learn the parameters using tensor power iterations. We prove that it can correctly learn the model parameters when the number of hidden components k is much larger than the data dimension d, up to k = o(d^1.5). We initialize the power iterations with data samples and prove their success under mild conditions on the signal-to-noise ratio of the samples. Our analysis significantly expands the class of latent variable models where spectral methods are applicable. Our analysis also deals with noise in the input tensor, leading to sample complexity results in the application to learning latent variable models.

Keywords: tensor decomposition, tensor power iteration, overcomplete representation, unsupervised learning, latent variable models

(c) 2017 Animashree Anandkumar, Rong Ge, and Majid Janzamin. License: CC-BY 4.0.

1. Introduction

CANDECOMP/PARAFAC (CP) decomposition of a symmetric tensor T ∈ R^{d×d×d} is the process of decomposing it into a succinct sum of rank-one tensors, given by

T = Σ_{j∈[k]} λ_j · a_j ⊗ a_j ⊗ a_j,  λ_j ∈ R, a_j ∈ R^d,  (1)

where ⊗ denotes the outer product. The minimum k for which the tensor can be decomposed in the above form is called the (symmetric) tensor rank. Tensor power iteration is a simple, popular and efficient method for recovering the tensor rank-one components a_j's. The tensor power iteration is given by

x ← T(I, x, x) / ‖T(I, x, x)‖,  (2)

where T(I, x, x) := Σ_{j,l∈[d]} x_j x_l T(:, j, l) ∈ R^d is a multilinear combination of the tensor fibers, and ‖·‖ is the ℓ_2 norm operator. See Section 1.3 for an overview of tensor notations and preliminaries.

The tensor power iteration is a generalization of matrix power iteration: for a matrix M ∈ R^{d×d}, the power iteration is given by x ← Mx/‖Mx‖. Dynamics and convergence properties of matrix power iterations are well understood (Horn and Johnson, 2012). On the other hand, a theoretical understanding of tensor power iterations is much more limited. Tensor power iteration can be viewed as a gradient descent step (with infinite step size), corresponding to the problem of finding the best rank-1 approximation of the input tensor T (Anandkumar et al., 2014c). This optimization problem is non-convex. Unlike the matrix case, where the number of isolated stationary points of the power iteration is at most the dimension (given by eigenvectors corresponding to unique eigenvalues), in the tensor case the number of stationary points is, in fact, exponential in the input dimension (Cartwright and Sturmfels, 2013). This makes the analysis of tensor power iteration far more challenging.

Despite the above challenges, many advances have been made in understanding tensor power iterations in specific regimes. When the components a_j's are orthogonal to one another, it is known that there are no spurious local optima for tensor power iterations, and the only stable fixed points correspond to the true a_j's (Zhang and Golub, 2001; Anandkumar et al., 2014c). Any tensor with linearly independent components a_j's can be orthogonalized via an invertible transformation (whitening), and thus its components can be recovered efficiently. A careful perturbation analysis in this setting was carried out in Anandkumar et al. (2014c). The framework in Anandkumar et al. (2014c) is however not applicable in the overcomplete setting, where the tensor rank k exceeds the dimension d. Such overcomplete tensors cannot be orthogonalized, and finding guaranteed decompositions is a challenging open problem. It is known that finding the CP tensor decomposition is NP-hard (Hillar and Lim, 2013). In this paper, we make significant headway in showing that the simple power iterations can recover the components in the overcomplete regime under a set of mild conditions on the components a_j's.
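For concreteness, the update in (2) is easy to prototype. The sketch below is a minimal numpy illustration (our own, not the authors' code; the tensor sizes and the random construction of T are chosen purely for demonstration). It builds a random overcomplete tensor of the form (1) and applies the power iteration.

```python
import numpy as np

def power_iteration(T, x0, num_iters=30):
    """Repeatedly apply x <- T(I, x, x) / ||T(I, x, x)|| for a d x d x d tensor T."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_iters):
        # T(I, x, x)_i = sum_{j,l} T[i, j, l] * x[j] * x[l]
        y = np.einsum('ijl,j,l->i', T, x, x)
        x = y / np.linalg.norm(y)
    return x

# Random overcomplete tensor T = sum_j a_j (x) a_j (x) a_j with k > d components.
d, k = 50, 100
A = np.random.default_rng(0).standard_normal((d, k)) / np.sqrt(d)   # columns roughly unit norm
T = np.einsum('ij,pj,qj->ipq', A, A, A)
x = power_iteration(T, np.random.default_rng(1).standard_normal(d))
print(np.max(np.abs(A.T @ x)))     # correlation with the best-aligned component
```

The einsum contraction implements exactly the multilinear form T(I, x, x) defined after (2).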

Overcomplete tensors also arise in many machine learning applications, such as moments of many latent variable models, e.g., multiview mixtures, independent component analysis (ICA), and sparse coding models, where the number of hidden variables exceeds the input dimension (Anandkumar et al., 2015). Overcomplete models often have impressive empirical performance (Coates et al., 2011), can provide greater flexibility in modeling, and are more robust to noise (Lewicki and Sejnowski, 2000). By studying algorithms for overcomplete tensor decomposition, we expand the class of models that can be learnt efficiently using simple spectral methods such as tensor power iterations. Note that there are other algorithms for decomposing overcomplete tensors (De Lathauwer et al., 2007; Goyal et al., 2013; Bhaskara et al., 2013), but they all require tensors of at least 4-th order and have large computational complexity. Ge and Ma (2015) works for 3rd order tensors but requires quasi-polynomial time. The main contribution of this paper is an analysis of the practical power method in the overcomplete regime.

1.1 Summary of Results

We analyze the dynamics of third order tensor power iterations in the overcomplete regime. We assume that the tensor components a_j's are randomly drawn from the unit sphere. Since general tensor decomposition is challenging in the overcomplete regime, we argue that this is a natural first step to consider for tractable recovery. We characterize the basin of attraction for the local optima near the rank-one components a_j's. We show that under a mild initialization condition, there is fast convergence to these local optima in O(log log d) iterations. This result is the core technical analysis of this paper, stated in the following theorem.

Theorem 1 (Dynamics of tensor power iteration) Consider a tensor T̂ = T + E such that the exact tensor T has the rank-k decomposition in (1) with rank-one components a_j ∈ R^d, j ∈ [k], uniformly i.i.d. drawn from the unit d-dimensional sphere, and the ratio of maximum and minimum (in absolute value) weights λ_j being constant. In addition, suppose the perturbation tensor E has bounded spectral norm as

‖E‖ ≤ ε · √k/d,  where ε < o(√k/d).  (3)

Let the tensor rank k = o(d^1.5), and let the unit-norm initial vector x^(1) satisfy the correlation bound

|⟨x^(1), a_j⟩| ≥ β · √k/d,  (4)

w.r.t. some true component a_j, j ∈ [k], for some β > (log d)^c for some universal constant c > 0. After N = Θ(log log d) iterations, the tensor power iteration in (2) outputs a vector having w.h.p. a constant correlation with the true component a_j, i.e., ⟨x^(N+1), a_j⟩ ≥ 1 − γ, for any fixed constant γ > 0.

As a corollary, this result can be used for learning latent variable models such as multiview mixtures. We show that the above initialization condition is satisfied using a sample with mild signal-to-noise ratio; see Section 2 for more details on this.
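To make the statement concrete, here is a toy simulation (illustrative parameter values chosen by us, not taken from the paper) that plants an initialization with a correlation with a_1 of a few times the √k/d scale in (4) and tracks the alignment over a handful of iterations. Under the random-component assumption the correlation roughly squares at every step after rescaling, which is where the Θ(log log d) iteration count comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 200, 400                                   # overcomplete: k > d (toy sizes)
A = rng.standard_normal((d, k))
A /= np.linalg.norm(A, axis=0)                    # columns on the unit sphere
T = np.einsum('ij,pj,qj->ipq', A, A, A)           # exact rank-k tensor, unit weights

# Initialization: mostly random, plus a planted correlation of a few times sqrt(k)/d with a_1.
beta0 = 4 * np.sqrt(k) / d                        # 0.4 here; enough to dominate at these toy sizes
x = beta0 * A[:, 0] + np.sqrt(1 - beta0 ** 2) * rng.standard_normal(d) / np.sqrt(d)
x /= np.linalg.norm(x)

for t in range(6):                                # a few iterations already suffice
    y = np.einsum('ijl,j,l->i', T, x, x)
    x = y / np.linalg.norm(y)
    print(t + 1, abs(A[:, 0] @ x))                # correlation with a_1 rises quickly toward 1
```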

The above result is a significant improvement over the recent analyses by Anandkumar et al. (2015, 2014a,b) for overcomplete tensor decomposition. In those works, the initialization vectors are required to have a constant amount of correlation with the true a_j's. However, obtaining such strong initializations is usually not realistic in practice. On the other hand, the initialization condition in (4) is mild, and decaying, even when the rank k is significantly larger than the dimension d, up to k = o(d^1.5). In learning the mixture model, such initialization vectors can be obtained as samples from the mixture model, even when there is a large amount of noise. Given this improvement, we combine our analysis in Theorem 1 with the guarantees in Anandkumar et al. (2015, 2014a), proving that the model parameters can be recovered consistently.

A detailed proof outline for Theorem 1 is provided in Section 3.1. Under the random assumption, it is not hard to show that the first iteration of the tensor power update makes progress. However, after the first iteration, the input vector and the tensor components are no longer independent of each other. Therefore, we cannot directly repeat the same argument for the second step. How do we analyze the second step even though the vector and the tensor components are correlated? The main intuition is to characterize the dependency between the vector and the tensor components, and show that there is still enough randomness left for us to repeat the argument. This idea was inspired by the analysis of Approximate Message Passing (AMP) algorithms (Bayati and Montanari, 2010). However, our analysis here is very different in several key aspects: 1) In approximate message passing, the analysis typically works in the large system limit, where the number of iterations is fixed and the dimension goes to infinity. Here we can handle a superconstant number of iterations O(log log d), even for finite d; 2) Usually k is assumed to be a constant factor times d in AMP-like analyses, while here we allow them to be polynomially related.

1.2 Related Work

Tensor decomposition for learning latent variable models: In the introduction, some related works are mentioned which study the theoretical and practical aspects of spectral techniques for learning latent variable models. Among them, Anandkumar et al. (2014c) provide the analysis of tensor power iteration for learning several latent variable models in the undercomplete regime. Anandkumar et al. (2014a) provide the analysis in the overcomplete regime, and Anandkumar et al. (2014b) provide tensor concentration bounds and apply the analysis in Anandkumar et al. (2014a) to learning LVMs, proposing tight sample complexity guarantees.

Learning mixtures of Gaussians: Here, we provide a subset of related works studying learning mixtures of Gaussians which are most comparable with our result. For a more detailed list of these works, see Anandkumar et al. (2014c); Hsu and Kakade (2013). The problem of learning mixtures of Gaussians dates back to the work by Pearson (1895), who proposed a moment-based technique that involves solving systems of multivariate polynomials, which is in general challenging in both the computational and the statistical sense. Recently, many studies on learning Gaussian mixture models have improved both aspects; they can be divided into two main classes: distance-based and spectral methods.

Distance-based methods impose a separation condition on the mean vectors, showing that under enough separation the parameters can be estimated. Among such approaches, we can mention Dasgupta (1999); Vempala and Wang (2002); Arora and Kannan (2005). As discussed in the summary of results, these results work even if k > d^1.5 as long as the separation condition between the means is satisfied, but our work can tolerate a higher level of noise in the regime of k = o(d^1.5) with polynomial computational complexity. The guarantees in Vempala and Wang (2002) also work in the high noise regime but need higher computational complexity. In the spectral approaches, the observed moments are constructed and a spectral decomposition of the observed moments is performed to recover the parameters (Kalai et al., 2010; Anandkumar et al., 2012, 2014b). Kalai et al. (2010) analyze the problem of learning a mixture of two general Gaussians and provide an algorithm with high order polynomial sample and computational complexity. Note that in general, the complexity of such methods grows exponentially with the number of components without further assumptions (Moitra and Valiant, 2010). Hsu and Kakade (2013) provide a spectral algorithm under non-degeneracy conditions on the mean vectors and propose guarantees with polynomial sample complexity depending on the condition number of the moment matrices. Anandkumar et al. (2014b) perform tensor power iteration on the third order moment tensor to recover the mean vectors in the overcomplete regime as long as k = o(d^1.5), but need a very good initialization vector having constant correlation with the true mean vector. Here, we improve the correlation level required for convergence.

1.3 Notation and Tensor Preliminaries

Let [k] := {1, 2, ..., k}, and let ‖v‖ denote the ℓ_2 norm of vector v. We use Õ and Ω̃ to hide polylog factors in the asymptotic notations O and Ω, respectively.

Tensor preliminaries: A real p-th order tensor T ∈ ⊗^p R^d is a member of the p-fold outer product of Euclidean spaces R^d. The different dimensions of the tensor are referred to as modes. For instance, for a matrix, the first mode refers to columns and the second mode refers to rows. In addition, fibers are higher order analogues of matrix rows and columns. A fiber is obtained by fixing all but one of the indices of the tensor (and is arranged as a column vector). For example, for a third order tensor T ∈ R^{d×d×d}, the mode-1 fiber is given by T(:, j, l). Similarly, slices are obtained by fixing all but two of the indices of the tensor. For example, for the third order tensor T, the slices along the 3rd mode are given by T(:, :, l).

We view a tensor T ∈ R^{d×d×d} as a multilinear form. In particular, for vectors u, v, w ∈ R^d, we have¹

T(I, v, w) := Σ_{j,l∈[d]} v_j w_l T(:, j, l) ∈ R^d,  (5)

which is a multilinear combination of the tensor mode-1 fibers. Similarly, T(u, v, w) ∈ R is a multilinear combination of the tensor entries, and T(I, I, w) ∈ R^{d×d} is a linear combination of the tensor slices.

A 3rd order tensor T ∈ R^{d×d×d} is said to be rank-1 if it can be written in the form

T = λ · a ⊗ b ⊗ c,  i.e., T(i, j, l) = λ · a(i) b(j) c(l),  (6)

where the notation ⊗ represents the outer product and a, b, c ∈ R^d are unit vectors.

1. Compare with the matrix case, where for M ∈ R^{d×d} we have M(I, u) = Mu := Σ_{j∈[d]} u_j M(:, j).

A tensor T ∈ R^{d×d×d} is said to have CP rank at most k if it can be written as a sum of k rank-1 tensors,

T = Σ_{i∈[k]} λ_i · a_i ⊗ b_i ⊗ c_i,  λ_i ∈ R, a_i, b_i, c_i ∈ R^d.  (7)

For a third order tensor T ∈ R^{d×d×d}, the spectral (operator) norm is defined as

‖T‖ := sup_{‖u‖=‖v‖=‖w‖=1} |T(u, v, w)|.

In the rest of the paper, Section 2 describes how to apply our tensor results to learning multiview mixture models. Section 3 illustrates the proof ideas, with more details in the Appendix. Finally, we conclude in Section 4.

2. Learning Multiview Mixture Models through Tensor Methods

We propose our main technical result in Section 1.1, providing convergence guarantees for the tensor power iterations given mild initialization conditions in the overcomplete regime; see Theorem 1. Along with this result, we provide the application to learning multiview mixture models in Theorem 2. In this section, we briefly introduce the tensor decomposition framework as the learning algorithm, and then state the learning guarantees with more details and remarks.

Figure 1: Multiview mixture model (hidden variable h with views z_1, z_2, ..., z_p).

2.1 Multiview Mixture Model

Consider an exchangeable multiview mixture model with k components and p ≥ 3 views; see Figure 1. Suppose that the hidden variable h is a discrete categorical random variable taking one of k states. It is convenient to represent it by basis vectors, such that h = e_j ∈ R^k if and only if it takes the j-th state. Note that e_j ∈ R^k denotes the j-th basis vector in the k-dimensional space. The prior probability for each hidden state is Pr[h = e_j] = λ_j, j ∈ [k]. For simplicity, in this paper we assume all the λ_i's are the same. However, a similar argument works even when the ratio of maximum and minimum prior probabilities λ_max/λ_min is bounded by some constant. The variables (views) z_l ∈ R^d are related to the hidden state through the factor matrix A ∈ R^{d×k} such that

z_l = Ah + η_l,  l ∈ [p],

where the zero-mean noise vectors η_l ∈ R^d are independent of each other and of the hidden state h. Given this, the variables (views) z_l ∈ R^d are conditionally independent given the latent variable h, and the conditional means are

E[z_l | h = e_j] = a_j,

where a_j ∈ R^d denotes the j-th column of the factor matrix A = [a_1 ⋯ a_k] ∈ R^{d×k}. In addition, the above properties imply that the order of the observations z_l does not matter and the model is exchangeable. The goal of the learning problem is to recover the parameters of the model (the factor matrix A) given observations. For this model, the third order² observed moment has the form (Anandkumar et al., 2014c)

E[z_1 ⊗ z_2 ⊗ z_3] = Σ_{j∈[k]} λ_j · a_j ⊗ a_j ⊗ a_j.  (8)

Hence, given the third order observed moment, the unsupervised learning problem (recovering the factor matrix A) reduces to computing a tensor decomposition as in (8).

2.2 Tensor Decomposition Algorithm

The algorithm for unsupervised learning of the multiview mixture model is based on the tensor decomposition techniques provided in Algorithm 1. The main step in (9) performs the tensor power iteration;³ see (5) for the multilinear form definition. After running the algorithm for all the different initialization vectors, the clustering process from Anandkumar et al. (2015) ensures that the best converged vectors are returned as the estimates of the true components a_j.

Algorithm 1 Learning multiview mixture model via tensor power iterations

Require: 1) Third order moment tensor T ∈ R^{d×d×d} in (8), 2) n samples of z_1 in the multiview mixture model, denoted z_1^(τ), τ ∈ [n], and 3) number of iterations N.
1: for τ = 1 to n do
2:   Initialize unit vectors x_τ^(1) ← z_1^(τ) / ‖z_1^(τ)‖.
3:   for t = 1 to N do
4:     Tensor power updates (see (5) for the definition of the multilinear form):
       x_τ^(t+1) = T(I, x_τ^(t), x_τ^(t)) / ‖T(I, x_τ^(t), x_τ^(t))‖,  (9)
5:   end for
6: end for
7: return the output of Procedure 2 with input {x_τ^(N+1) : τ ∈ [n]} as the estimates x̂_j.

2. It is enough to form the third order moment for our learning purpose.
3. This is the generalization of matrix power iteration to 3rd order tensors.
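A compact sketch of Algorithm 1 in numpy follows (our illustration, not the authors' implementation; the clustering step of Procedure 2 is omitted and the helper name is ours). Each observed sample of z_1 is used as an initialization and run through N power updates of the form (9).

```python
import numpy as np

def learn_components(T, init_samples, num_iters):
    """Algorithm 1 (sketch): run the power update (9) from each sample initialization."""
    outputs = []
    for z in init_samples:
        x = z / np.linalg.norm(z)               # step 2: initialize from a sample of z_1
        for _ in range(num_iters):              # steps 3-5: tensor power updates (9)
            y = np.einsum('ijl,j,l->i', T, x, x)
            x = y / np.linalg.norm(y)
        outputs.append(x)
    return outputs                              # candidates passed to the clustering procedure
```

In practice the input T would be an empirical estimate of the moment in (8) formed from samples, and the returned candidates are then pruned by Procedure 2.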

Procedure 2 Clustering process (Anandkumar et al., 2015)

Require: Tensor T ∈ R^{d×d×d}, set S := {x_τ^(N+1) : τ ∈ [n]}, parameter ν.
1: while S is not empty do
2:   Choose x ∈ S which maximizes T(x, x, x).
3:   Do N more iterations of (9) starting from x.
4:   Output the result of the iterations, denoted by x̂.
5:   Remove all the x ∈ S with ⟨x, x̂⟩ > ν/2.
6: end while

2.3 Learning Guarantees

We assume a Gaussian prior on the mean vectors, i.e., the vectors a_j ∼ N(0, I_d/d), j ∈ [k], are i.i.d. drawn from a standard multivariate Gaussian distribution with unit expected square norm. Note that in high dimension (growing d), this assumption is the same as drawing uniformly from the unit sphere, since the norm of the vector concentrates in high dimension and there is no need for normalization. Even though we impose a prior distribution, we do not use a MAP estimator, since the corresponding optimization is NP-hard. Instead, we learn the model parameters through decomposition of the third order moments via tensor power iterations. The assumption of a Gaussian prior is standard in machine learning applications. We impose it here for a tractable analysis of the power iteration dynamics. Such Gaussian assumptions have been used before for the analysis of other iterative methods, such as approximate message passing algorithms, and there is evidence that similar results hold for more general distributions; see Bayati and Montanari (2010) and references therein.

As explained in the previous sections, we use the tensor power method to learn the components a_j, and the method is initialized with observed samples z_i. Intuitively, this initialization is useful since z_i = Ah + η_i is a perturbed version of the desired parameter a_j (when h = e_j). Thus, we present the result in terms of the signal-to-noise ratio (SNR), which is the expected norm of the signal a_j (which is one here) divided by the expected norm of the noise η_i; i.e., the SNR in the i-th sample z_i = a_j + η_i (assuming h = e_j) is defined as SNR := E[‖a_j‖]/E[‖η_i‖]. This specifies how much noise the initialization vector z_i can tolerate in order to ensure the convergence of the tensor power iteration to a desired local optimum. We now state the conditions required for the recovery guarantees, together with a brief explanation of them.

Conditions for Theorems 2 and 3:
- Rank condition: k = o(d^1.5).
- The columns of A are uniformly i.i.d. drawn from the unit d-dimensional sphere.
- The noise vectors η_l, l ∈ [3], are independent of the matrix A and of each other. In addition, the signal-to-noise ratio (SNR) is w.h.p. bounded as

  SNR ≥ Ω( β · max{√k, √d} / d ),

  for some β ≥ (log d)^c for a universal constant c > 0.
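As a quick numerical illustration of why a raw sample is a good enough initialization (our toy check with made-up values, not an experiment from the paper): a sample z = a_j + η correlates with the true mean at roughly the SNR level, which for moderate d and k sits comfortably above the decaying correlation threshold in (4).

```python
import numpy as np

rng = np.random.default_rng(0)
d, snr = 1000, 0.2
a = rng.standard_normal(d)
a /= np.linalg.norm(a)                              # true mean vector, unit norm
eta = rng.standard_normal(d) / (np.sqrt(d) * snr)   # noise with expected norm ~ 1/snr
z = a + eta                                         # observed sample used as initialization
x0 = z / np.linalg.norm(z)
print(x0 @ a)                                       # roughly snr / sqrt(1 + snr**2), about 0.196
```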

The rank condition bounds the level of overcompleteness for which the recovery guarantees are satisfied. The randomness assumption on the columns of A is crucial for analyzing the dynamics of the tensor power iteration. We use it to argue that there is enough randomness left in the components after conditioning on the previous iterations; see Section 3.1 for the details. The bound on the SNR is required to make sure the sample used for initialization is close enough to the corresponding mean vector. This ensures that the initial vector is inside the basin of attraction of the corresponding component, and hence convergence to the mean vector can be guaranteed. Under these assumptions we have the following theorem.

Theorem 2 (Learning multiview mixture model: closeness to single columns) Consider a multiview mixture model (or a spherical Gaussian mixture) in the above settings with k components in d dimensions. If the above conditions hold, then the tensor power iteration converges to a vector close to one of the true mean vectors a_j (having constant correlation).

In particular, for mildly overcomplete models, where k = αd for some constant α > 1, the signal-to-noise ratio (SNR) can be as low as Ω(d^{-1/2+ε}), for any ε > 0. Thus, we can learn mixture models with a high level of noise. In general, we establish how the required noise level scales with the number of hidden components k, as long as k = o(d^1.5).

The above theorem states convergence to desired local optima which are close to the true components a_j. In Theorem 3, we show that we can sharpen the above result, by jointly iterating over the recovered vectors, and consistently recover the components a_j. This result also uses the analysis from Anandkumar et al. (2015).

Theorem 3 (Learning multiview mixture model: recovering the factor matrix) Assume the above conditions hold. The initialization of the power iteration is performed with samples of z_1 in the multiview mixture model. Suppose the tensor power iterations are initialized at least once for each a_j, j ∈ [k], such that z_1 = a_j + η_1.⁴ Then, using the exact 3rd order moment tensor in (8) as input, the tensor decomposition algorithm outputs an estimate Â (up to permutation of its columns) satisfying w.h.p. (over the randomness of the components a_j)

‖Â − A‖_F ≤ ε,

where the number of iterations of the algorithm is N = Θ( log(1/ε) + log log d ).

See Section 3 for the proof. The above theorems assume the exact third order tensor is given to the algorithm. We provide the results given the empirical tensor in Section 2.4.

4. Note that this happens for component j with high probability when the number of initializations is proportional to the inverse prior probability corresponding to that mixture component.

Learning spherical Gaussian mixtures: Consider a mixture of k different Gaussian vectors with spherical covariance. Let a_j ∈ R^d, j ∈ [k], denote the mean vectors, and let the covariance matrices be σ²I. Assuming the parameter σ is known, the modified third order observed moment

M_3 := E[z ⊗ z ⊗ z] − σ² Σ_{i∈[d]} ( E[z] ⊗ e_i ⊗ e_i + e_i ⊗ E[z] ⊗ e_i + e_i ⊗ e_i ⊗ E[z] )

has the tensor decomposition form (Hsu and Kakade, 2012)

M_3 = Σ_{j∈[k]} λ_j · a_j ⊗ a_j ⊗ a_j,

where λ_j is the probability of drawing the j-th Gaussian component. The above guarantees can be applied to learning the mean vectors a_j in this model, with the additional property that the noise is spherical Gaussian.

Learning multiview mixture models with distinct factor matrices: Consider the multiview mixture model with different factor matrices, where the first three views are related to the hidden state as

z_1 = Ah + η_1,  z_2 = Bh + η_2,  z_3 = Ch + η_3.

Then, the guarantees in the above theorems can be extended to recovering the columns of all three factor matrices A, B, and C, with appropriate modifications in the power iteration algorithm as follows. First, the update formula (9) is changed to

x_{1,τ}^(t+1) = T(I, x_{2,τ}^(t), x_{3,τ}^(t)) / ‖T(I, x_{2,τ}^(t), x_{3,τ}^(t))‖,
x_{2,τ}^(t+1) = T(x_{1,τ}^(t), I, x_{3,τ}^(t)) / ‖T(x_{1,τ}^(t), I, x_{3,τ}^(t))‖,
x_{3,τ}^(t+1) = T(x_{1,τ}^(t), x_{2,τ}^(t), I) / ‖T(x_{1,τ}^(t), x_{2,τ}^(t), I)‖,

which is the alternating asymmetric version of the symmetric power iteration in (9). Here, we alternate among the different modes of the tensor. In addition, the initialization for each mode of the tensor is appropriately performed with the samples corresponding to that mode. Note that the analysis still works in the asymmetric version, since there are even more independence relationships through the iterations of the power update because of the introduction of the new random matrices B and C.

2.4 Sample Complexity Analysis

In the previous section, we assumed the exact third order tensor in (8) is given to the tensor decomposition Algorithm 1. We now estimate the tensor given n samples z_1^(i), z_2^(i), z_3^(i), i ∈ [n], as

T̂ = (1/n) Σ_{i∈[n]} z_1^(i) ⊗ z_2^(i) ⊗ z_3^(i).  (10)

For the multiview mixture model introduced in Section 2.1, let the noise vectors η_l be spherical, and let ζ² denote the variance of each entry of the noise vector. We now provide the following recovery guarantees.

Additional conditions for Theorem 4:
- Let E_1 := [η_1^(1), η_1^(2), ..., η_1^(n)] ∈ R^{d×n}, where η_1^(i) ∈ R^d is the i-th sample of the noise vector η_1. These noise matrices satisfy the following RIP property, which is adapted from Candès and Tao (2006): matrix E_1 ∈ R^{d×n} satisfies a weak RIP condition such that, for any subset of O(d/log²d) columns, the spectral norm of E_1 restricted to those columns is bounded by 2. The same condition is satisfied for the similarly defined noise matrices E_2 and E_3.

- The number of samples n satisfies a lower bound such that the tensor perturbation bound stated in the proof of Theorem 4 (a sum of terms involving powers of ζ and λ_max together with n, d, and k; see Section 3) is at most

  min{ ε·√k/d, Õ(λ_min) },  (11)

  where ε < o(√k/d).

Theorem 4 (Learning multiview mixture model) Consider the empirical tensor in (10) as the input to the tensor decomposition Algorithm 1. Suppose the above additional conditions are also satisfied. Then, the same guarantees as in Theorem 2 hold. In addition, the same guarantees as in Theorem 3 also hold, with the recovery bound (up to permutation of the columns of Â) changed to

‖Â − A‖_F ≤ Õ( √k · ‖E‖ / λ_min ),

where E denotes the perturbation tensor originating from the empirical estimation in (10), and its spectral norm ‖E‖ is bounded by the left-hand side of (11). See Section 3 for the proof.

3. Proof Outline

Our main technical result is the analysis of the third order tensor power iteration provided in Theorem 1, which also tolerates some amount of noise in the input tensor. We analyze the noiseless and noisy settings in different ways. We first prove the result for the noiseless setting, where the input tensor has an exact rank-k decomposition as in (1). When the noise is also considered, we show that the contribution of the noise in the analysis is dominated by the main signal, and thus the same result still holds. For the rest of this section we focus on the noiseless setting, while we discuss the proof ideas for the noisy case in Section 3.2.

We first discuss the proof of Theorem 3, which involves two phases. In the first phase, we show that under a certain small amount of correlation (see (13)) between the initial vector and the true component, the power iteration in (2) converges to some vector which has constant correlation with the true component. This result is the core technical analysis of this paper, and is provided in Lemma 5. In the second phase, we incorporate the result of Anandkumar et al. (2015, 2014a), which guarantees the convergence of the power iteration (followed by a coordinate descent iteration) given initial vectors having constant correlation with the true components. This is stated in Lemma 6.

To simplify the notation, we consider the tensor⁵

T = Σ_{j∈[k]} a_j ⊗ a_j ⊗ a_j,  a_j ∼ N(0, (1/d) I_d).  (12)

5. In the analysis, we assume that all the weights are equal to one, which can be generalized to the case where the ratio of the maximum and minimum weights (in absolute value) is constant.

Notice that this is exactly proportional to the 3rd order moment tensor of the multiview mixture model in (8). The following lemma is a restatement of Theorem 1 in the noiseless setting.

Lemma 5 (Dynamics of tensor power iteration, phase 1) Consider the rank-k tensor T of the form in (12). Let the tensor rank k = o(d^1.5), and let the unit-norm initial vector x^(1) satisfy the correlation bound

|⟨x^(1), a_j⟩| ≥ β · √k/d,  (13)

w.r.t. some true component a_j, j ∈ [k], for some β > (log d)^c for some universal constant c > 0. After N = Θ(log log d) iterations, the tensor power iteration in (2) outputs a vector having w.h.p. a constant correlation with the true component a_j, i.e.,

⟨x^(N+1), a_j⟩ ≥ 1 − γ,

for any fixed constant γ > 0.

The proof outline of the above lemma is provided in Section 3.1. Next, we state the following lemma from Anandkumar et al. (2015), which provides the dynamics of the tensor power iteration when the initialization satisfies the constant correlation bound stated below.

Lemma 6 (Dynamics of tensor power iteration, phase 2) Consider the rank-k tensor T of the form in (12) with rank condition k = o(d^1.5). Let the initial vectors x_j^(1) satisfy the constant correlation bound

⟨x_j^(1), a_j⟩ ≥ 1 − γ_j,

w.r.t. the true components a_j, j ∈ [k], for some constants γ_j > 0. Let the outputs of the tensor power updates⁶ in (2), applied to all these different initialization vectors after N = Θ(log(1/ε)) iterations, be stacked as the columns of matrix Â. Then, we have w.h.p.⁷

‖Â − A‖_F ≤ ε,

where the recovery error is up to permutation of the columns of Â.

See Anandkumar et al. (2015) for the proof of the above lemma. Given the above two lemmas, the learning result in Theorem 3 is directly proved.

Proof of Theorem 3: The result is proved by combining Lemma 5 and Lemma 6. Note that the initialization condition in (4) is w.h.p. satisfied given the SNR bound assumed.

Proof of Theorem 4: In Theorem 3, we provide the result given the exact tensor by combining Lemmas 5 and 6. The only difference here is that we are given an empirical estimate of the tensor, and we need to incorporate the effect of noise in the empirical input.

6. This result also needs an additional step of coordinate descent iterations, since the true components are not the fixed points of the power iteration; see Anandkumar et al. (2015, 2014a) for the details.
7. Anandkumar et al. (2015, 2014a) recover the vectors up to sign since they work in the asymmetric case. In the symmetric case it is easy to resolve the sign ambiguity issue.

We now use Theorem 1, which characterizes the effect of noise in the first step (adapting Lemma 5 to the noisy setting), and Anandkumar et al. (2015), which provides the result of Lemma 6 in the noisy setting. In addition, the tensor concentration bound for the multiview mixture model is analyzed in Theorem 1 of Anandkumar et al. (2014b) (Lemma 56 in Anandkumar et al. (2015)), which shows that the error between the empirical and exact tensors, ‖T̂ − T‖, is bounded by a sum of terms involving powers of ζ and λ_max together with n, d, and k (see Theorem 1 of Anandkumar et al. (2014b) for the exact expression). The sample complexity requirement in (11) is then derived by imposing the error requirements in our noisy analysis of the tensor power dynamics in Theorem 1 (see Equation (3)) and the noisy analysis of Lemma 6 (see Theorem 1 of Anandkumar et al. (2015), where the perturbation tensor E needs to be bounded as ‖E‖ ≤ Õ(λ_min)). The final recovery error on ‖Â − A‖_F also follows from Theorem 1 of Anandkumar et al. (2015).

3.1 Proof Outline of Lemma 5 (Noiseless Case of Theorem 1)

First step: We first intuitively show that the first step of the algorithm makes progress. Suppose the tensor is T = Σ_{j∈[k]} a_j ⊗ a_j ⊗ a_j, and the initial vector x has correlation ⟨x, a_1⟩ ≥ β with the first component. The result of the first iteration is the normalized version of the following vector:

x̃ = Σ_{j∈[k]} ⟨a_j, x⟩² a_j.

Intuitively, this vector should have roughly ⟨a_1, x⟩² = β² correlation with a_1 (as the other terms are random, they don't contribute much). On the other hand, the norm of this vector is roughly O(√k/d): this is because ⟨a_j, x⟩² for j ≠ 1 is roughly⁸ 1/d, and the sum of k random vectors with length 1/d has length roughly O(√k/d). These arguments can be made precise, showing that the normalized version x̃/‖x̃‖ has correlation roughly β²·d/√k with a_1, ensuring progress in the first step.

Going forward: As we explained, the basic idea behind proving Lemma 5 is to characterize the conditional distribution of the random Gaussian tensor components a_j given the previous iterations. In particular, we show that the residual independent randomness left in these conditional distributions is large enough, and we can exploit it to obtain tighter concentration bounds throughout the analysis of the iterations. The Gaussian assumption on the components, and the small enough number of iterations, are crucial in this argument.

Notations: For two vectors u, v ∈ R^d, the Hadamard product, denoted by ⊙, is defined as the entry-wise multiplication of the vectors, i.e., (u ⊙ v)_j := u_j v_j for j ∈ [d]. For a matrix A, let P_{A^⊥} denote the projection operator onto the subspace orthogonal to the column span of A. For a subspace R, let R^⊥ denote the space orthogonal to it; hence the projection operator onto the subspace orthogonal to R is denoted by P_{R^⊥}. For a random matrix D, let D | {u = Dv} denote the conditional distribution of D given the linear constraints u = Dv. We also use the notation ≡ to denote equivalence in distribution.

8. The correlation between two unit Gaussian vectors in d dimensions is roughly 1/√d.

Lemma 5 involves analyzing the dynamics of the power iteration in (2) for 3rd order rank-k tensors. For the rank-k tensor in (12), the power iterative form x ← T(I, x, x)/‖T(I, x, x)‖ can be written as

x^(t+1) = A (A^⊤ x^(t))^⊙2 / ‖A (A^⊤ x^(t))^⊙2‖,  (14)

where the multilinear form in (5) is used. Here, A = [a_1 ⋯ a_k] ∈ R^{d×k} denotes the factor matrix, and for a vector y ∈ R^k, y^⊙2 := y ⊙ y ∈ R^k represents the element-wise square of the entries of y. We consider the case where the a_i ∼ N(0, (1/d) I_d) are i.i.d. drawn, and we analyze the evolution of the dynamics of the power update. As explained earlier, for a given initialization x^(1), the update in the first step can be analyzed easily since A is independent of x^(1). However, in subsequent steps, the updates x^(t) are dependent on A, and it is no longer clear how to provide a tight bound on the evolution of x^(t). In this work, we provide a careful analysis by controlling the amount of correlation build-up, exploiting the structure of Gaussian matrices under linear constraints. This enables us to provide better guarantees for a matrix A with Gaussian entries compared to general matrices A.

Intermediate update steps and variables: Before we proceed, we need to break down the power update in (2) and introduce some intermediate update steps and variables as follows. Recall that x^(1) ∈ R^d denotes the initialization vector. Without loss of generality, let us analyze the convergence of the power update to the first component of the rank-k tensor T, denoted by a_1. Hence, let the first entry of x^(1), denoted by x_1^(1), be the maximum entry (in absolute value) of x^(1), i.e., |x_1^(1)| = ‖x^(1)‖_∞. Let B := [a_2 a_3 ⋯ a_k] ∈ R^{d×(k−1)}, and therefore A = [a_1 B]. We break the power update formula in (2) into a few steps by introducing the intermediate variables y^(t) ∈ R^k and x̃^(t+1) ∈ R^d as

y^(t) := A^⊤ x^(t),  x̃^(t+1) := A (y^(t))^⊙2.

Note that x̃^(t+1) is the unnormalized version of x^(t+1) := x̃^(t+1)/‖x̃^(t+1)‖, i.e., x̃^(t+1) = T(I, x^(t), x^(t)). Thus, we need to jointly analyze the dynamics of all the variables x^(t), y^(t) and (y^(t))^⊙2. Define

X^[t] := [x^(1) ⋯ x^(t)],  Y^[t] := [y^(1) ⋯ y^(t)].

Matrix B is randomly drawn with i.i.d. Gaussian entries B_ij ∼ N(0, 1/d). As the iterations proceed, we consider the following conditional distributions:

B^(t,1) := B | {X^[t], Y^[t]},  B^(t,2) := B | {X^[t+1], Y^[t]}.  (15)

Thus, B^(t,1) is the conditional distribution of B in the middle of the t-th iteration (before the update step x̃^(t+1) = A(y^(t))^⊙2), and B^(t,2) is the conditional distribution at the end of the t-th iteration (after the update step x̃^(t+1) = A(y^(t))^⊙2). By analyzing the above conditional distributions, we can characterize the remaining independent randomness in B.
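In matrix form, one iteration of (14) is only a couple of lines; the sketch below (our illustration, with a hypothetical function name) mirrors the intermediate variables y^(t) and x̃^(t+1) used in the analysis.

```python
import numpy as np

def power_update(A, x):
    """One step of (14): y = A^T x, then x_tilde = A (y ** 2), then normalize."""
    y = A.T @ x                # y^(t) = A^T x^(t): correlations of x with all k components
    x_tilde = A @ (y ** 2)     # unnormalized update: a weighted combination of the columns of A
    return x_tilde / np.linalg.norm(x_tilde)
```

This is algebraically the same update as the einsum version shown after (2) when the weights are one, but it never forms the d×d×d tensor, which is how one would run the method at larger scale.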

Conditional Distributions

In order to characterize the conditional distribution of B under the evolution of x^(t) and y^(t) in (15), we exploit the following basic fact (see Bayati and Montanari (2010) for a proof).

Lemma 7 (Conditional distribution of normal matrices under a linear condition) Consider a random matrix D with i.i.d. Gaussian entries D_ij ∼ N(0, σ²). Conditioned on u = Dv with known vectors u and v, the matrix D is distributed as

D | {u = Dv} ≡ (1/‖v‖²) · u v^⊤ + D̃ P_{v^⊥},

where the random matrix D̃ is an independent copy of D with i.i.d. Gaussian entries D̃_ij ∼ N(0, σ²), and P_{v^⊥} is the projection operator onto the subspace orthogonal to v.

We refer to D̃ P_{v^⊥} as the residual random matrix, since it represents the randomness remaining after conditioning. It is a random matrix whose rows are independent random vectors that are orthogonal to v, and the variance in each direction orthogonal to v is equal to σ².

The above lemma can be exploited to characterize the conditional distribution of B introduced in (15). However, a naive direct application using the constraint Y^[t] = A^⊤ X^[t] is not transparent for the analysis. The reason is that the evolution of x^(t) and y^(t) is itself governed by the conditional distribution of B given the previous iterations. Therefore, we need the following recursive version of Lemma 7, which can be immediately argued by induction.

Corollary 8 (Iterative conditioning) Consider a random matrix D with i.i.d. Gaussian entries D_ij ∼ N(0, σ²), and let F ≡ P_{C^⊥} D P_{R^⊥} be the random Gaussian matrix whose columns are orthogonal to the space C and whose rows are orthogonal to the space R. Conditioned on the linear constraint u = Dv, where⁹ u ∈ C^⊥, the matrix F is distributed as

F | {u = Dv} ≡ (1/‖P_{R^⊥} v‖²) · u (P_{R^⊥} v)^⊤ + P_{C^⊥} D̃ P_{span{R,v}^⊥},

where the random matrix D̃ is an independent copy of D with i.i.d. Gaussian entries D̃_ij ∼ N(0, σ²). Thus, the residual random matrix P_{C^⊥} D̃ P_{span{R,v}^⊥} is a random Gaussian matrix whose columns are orthogonal to C and whose rows are orthogonal to span{R, v}. The variance in any remaining dimension is equal to σ².

Form of Iterative Updates

Now we exploit the conditional distribution arguments proposed in the previous section to characterize the conditional distribution of B given the update variables x and y up to the current iteration; recall (15), where B^(t,1) is the conditional distribution of B in the middle of the t-th iteration and B^(t,2) at the end of the t-th iteration. Before that, we need to introduce some more intermediate variables.

9. We need that u ∈ C^⊥, otherwise the event u = Dv is impossible.

Intermediate variables: We separate the first entry of y and of (y)^⊙2 from the rest, i.e., we have

y_1^(t) = a_1^⊤ x^(t),  y_{−1}^(t) = B^⊤ x^(t) ≡ (B^(t−1,2))^⊤ x^(t),

where y_{−1}^(t) ∈ R^{k−1} denotes y^(t) ∈ R^k with the first entry removed. The update formula for x̃^(t+1) can also be decomposed as

x̃^(t+1) = (y_1^(t))² a_1 + B w^(t) ≡ (y_1^(t))² a_1 + B^(t,1) w^(t),

where

w^(t) := (y_{−1}^(t))^⊙2 ∈ R^{k−1}

is the new intermediate variable in the power iterations. Let B_res^(t,1) and B_res^(t,2) denote the residual random matrices corresponding to B^(t,1) and B^(t,2), respectively, and let

u^(t+1) := B_res^(t,1) w^(t),  v^(t) := (B_res^(t−1,2))^⊤ x^(t),

where u^(t) ∈ R^d and v^(t) ∈ R^{k−1} are respectively the parts of x̃^(t) and y_{−1}^(t) representing the residual randomness after conditioning on the previous iterations. We also summarize all the variables and notations in Table 1 in the Appendix, which can be used as a reference throughout the paper. Finally, we make the following observations.

Lemma 9 (Form of iterative updates) The conditional distribution of B in the middle of the t-th iteration, denoted by B^(t,1), satisfies

B^(t,1) ≡ Σ_{i∈[t−1]} u^(i+1) (P_{W^[i−1]⊥} w^(i))^⊤ / ‖P_{W^[i−1]⊥} w^(i)‖² + Σ_{i∈[t]} P_{X^[i−1]⊥} x^(i) (v^(i))^⊤ / ‖P_{X^[i−1]⊥} x^(i)‖² + B_res^(t,1),  (16)

B_res^(t,1) ≡ P_{X^[t]⊥} B̃ P_{W^[t−1]⊥},  (17)

where the random matrix B̃ is an independent copy of B with i.i.d. Gaussian entries B̃_ij ∼ N(0, 1/d). Similarly, the conditional distribution of B at the end of the t-th iteration, denoted by B^(t,2), satisfies

B^(t,2) ≡ Σ_{i∈[t]} ( u^(i+1) (P_{W^[i−1]⊥} w^(i))^⊤ / ‖P_{W^[i−1]⊥} w^(i)‖² + P_{X^[i−1]⊥} x^(i) (v^(i))^⊤ / ‖P_{X^[i−1]⊥} x^(i)‖² ) + B_res^(t,2),  (18)

B_res^(t,2) ≡ P_{X^[t]⊥} B̃' P_{W^[t]⊥},  (19)

where the random matrix B̃' is an independent copy of B with i.i.d. Gaussian entries B̃'_ij ∼ N(0, 1/d).

The lemma can be directly proved by applying the iterative conditioning argument in Corollary 8. See the detailed proof in the appendix.
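The conditioning machinery behind Lemmas 7-9 is easy to sanity-check numerically. The sketch below (our own check, with made-up sizes) samples from the conditional law given by Lemma 7 and verifies that the linear constraint u = Dv holds exactly; it only checks the constraint, not the full distributional claim.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 200, 1.0
v = rng.standard_normal(d)
u = rng.standard_normal(d)                        # the value we condition on: u = Dv

# Lemma 7: D | {u = Dv} has the same law as u v^T / ||v||^2 + D_tilde P_{v-perp}.
D_tilde = sigma * rng.standard_normal((d, d))     # independent copy of D
P_perp = np.eye(d) - np.outer(v, v) / (v @ v)     # projection onto the subspace orthogonal to v
D_cond = np.outer(u, v) / (v @ v) + D_tilde @ P_perp

print(np.allclose(D_cond @ v, u))                 # the conditioned matrix satisfies Dv = u: True
```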

Figure 2: Flow of the power update algorithm, showing the intermediate steps x^(t) → y^(t) → w^(t) → x^(t+1) → y^(t+1). The iteration t, for which the inductive step should be argued, is also indicated.

Analysis of Iterative Updates

Lemma 9 characterizes the conditional distribution of B given the update variables x and y up to the current iteration; see (15) for the definition of the conditional forms of B, denoted by B^(t,1) and B^(t,2). Intuitively, when the number of iterations t is small, the residual independent randomness left in B^(t,1) and B^(t,2) (respectively denoted by B_res^(t,1) and B_res^(t,2)), characterized in Lemma 9, is large enough, and we can exploit it to obtain tighter concentration bounds throughout the analysis of the iterations. Note that the goal is to show that, within a small number of iterations t, the iterates x^(t) converge to the true component with constant error, i.e., ⟨x^(t), a_1⟩ ≥ 1 − γ for some constant γ > 0. If this already holds before iteration t we are done, and if it does not hold, the next iteration is analyzed to finally achieve the goal. This analysis is done via an induction argument.

During the iterations, we maintain several invariants to analyze the dynamics of the power update. The goal is to ensure progress in each iteration as in (20).

Induction hypothesis: The following are assumed at the beginning of iteration t as the induction hypothesis; see Figure 2 for the scope of the inductive step.

1. Length of projection on x: δ_t ≤ ‖P_{X^[t−1]⊥} x^(t)‖ ≤ 1, where δ_t is of order 1/polylog(d), and the value of δ_t only depends on t and log d.

2. Length of projection on w: δ_{t−1} · √k/d ≤ ‖P_{W^[t−2]⊥} w^(t−1)‖ ≤ Δ_{t−1} · √k/d, and ‖P_{W^[t−2]⊥} w^(t−1)‖_∞ ≤ Δ_{t−1} · (1/d), where δ_t is of order 1/polylog(d) and Δ_t is of order polylog(d). Both δ_t and Δ_t only depend on t and log d.

3. Progress:¹⁰

⟨a_1, x^(t)⟩ ∈ [δ_t, Δ_t] · β^{2^{t−1}} · √k/d,  (20)

⟨a_1, P_{X^[t−1]⊥} x^(t)⟩ ≥ δ_t · β^{2^{t−1}} · √k/d.

4. Norm of u, v: the norms ‖v^(t−1)‖ and ‖u^(t)‖ are bounded below and above by δ_{t−1} and Δ_{t−1} times their typical scales, √(k/d) and √k/d respectively.

The analysis for the basis of induction and the inductive step is provided in Appendix B.

3.2 Effect of Noise in Theorem 1

Given the rank-k random tensor T in (12) and a starting point x^(1), our analysis in the noiseless setting shows that the tensor power iteration in (2) outputs a vector which will be close to a_j if x^(1) has a large enough correlation with a_j. Now suppose we are given the noisy tensor T̂ = T + E, where E has some small norm. In this case where noise is also present, we get a sequence x̂^(t) = x^(t) + ξ^(t), where x^(t) is the component not incorporating any noise (as in the previous section¹¹), while ξ^(t) represents the contribution of the noise tensor E in the power iteration; see (21) below. We prove that ξ^(t) is a very small noise term that does not change our calculations, as stated in the following lemma.

Lemma 10 (Bounding the norm of the error) Suppose the spectral norm of the error tensor E is bounded as ‖E‖ ≤ ε·√k/d, where ε < o(√k/d). Then the noise vector ξ^(t) at iteration t satisfies the ℓ_2 norm bound ‖ξ^(t)‖ ≤ Õ(β^{2^{t−1}} ε).

Note that when t is the first number such that β^{2^{t−1}} ≥ d/√k, we have ‖ξ^(t)‖ = o(1). Notice that when β^{2^{t−1}} ≥ d/√k, the main induction is already over and we know x^(t) is constant-close to the true component, and thus the noise is always small.

Proof idea: We now provide an overview of the ideas for proving the above lemma; see Appendix D for the complete proof, which is based on an induction argument.

10. Note that although the bounds on y_{−1}^(t) are argued at iteration t, the bound on the first entry of y^(t), denoted by y_1^(t) = ⟨a_1, x^(t)⟩, is assumed here in the induction hypothesis at the end of iteration t−1.
11. Note that there is a subtle difference between the notation x^(t) in the noiseless and noisy settings. In the noiseless setting, this vector is normalized, while in the noisy setting the whole vector x̂^(t) = x^(t) + ξ^(t) is normalized.

We first write the following recursion, expanding the contributions of the main signal and the noise terms in the tensor power iteration, as

x^(t+1) + ξ^(t+1) = Norm( T̂(x^(t) + ξ^(t), x^(t) + ξ^(t), I) )
                 = Norm( T(x^(t), x^(t), I) + 2T(x^(t), ξ^(t), I) + T(ξ^(t), ξ^(t), I) + E(x̂^(t), x̂^(t), I) ),  (21)

where for a vector v we have Norm(v) := v/‖v‖, i.e., it normalizes the vector. The first term is the desired main signal and should have the largest norm, and the rest of the terms are noise terms. The third term is of order ‖ξ^(t)‖², and hence it is fine whenever we choose E to be small enough. The last term is O(‖E‖) and is the same for all iterations, so that is also fine. The problematic term is the second term, whose norm, if we bound it naively, is 2‖T‖·‖ξ^(t)‖. However, the normalization factor also contributes a factor of roughly d/√k, and thus this term grows exponentially; it is still fine if we just do a constant number of iterations, but then the exponent depends on the number of iterations.

In order to solve this problem, and to make sure that the amount of noise we can tolerate is independent of the number of iterations, we need a better way to bound the noise term ξ^(t). The main problem here is that we bound the norm of T(x^(t), ξ^(t), I) by ‖T‖·‖ξ^(t)‖; by doing this we ignore the fact that x^(t) is uncorrelated with the components in T. In order to get a tighter bound, we introduce another norm; see Definition 21 for the exact form. Intuitively, the new norm captures the fact that x does not have a high correlation with the components (except for the first component, to which x converges), and gives a better bound. In particular, we obtain a bound on ‖T(x^(t), ξ^(t), I)‖ of order √k/d times ‖ξ^(t)‖, so the d/√k normalization factor is compensated by this additional √k/d term.

4. Conclusion

In this paper, we provide a novel analysis of the dynamics of the third order tensor power iteration, showing convergence guarantees to vectors having constant correlation with the tensor components. This enables us to prove unsupervised learning of latent variable models in the challenging overcomplete regime, where the hidden dimensionality is larger than the observed dimension. The main technical observation is that, under random Gaussian tensor components and a small number of iterations, the residual randomness in the components (which are involved in the iterative steps) is sufficiently large. This enables us to show progress in the next iteration of the update step. As future work, it is very interesting to generalize this analysis to higher order tensor power iterations, and more generally to other kinds of iterative updates.

Acknowledgments

A. Anandkumar is supported in part by a Microsoft Faculty Fellowship, NSF Career award CCF, NSF award CCF, ONR award N, ARO YIP award W911NF, and AFOSR YIP award FA. M. Janzamin is supported by NSF Award CCF.

Table 1: Table of parameters and variables. The superscript (t) denotes the variable at the t-th iteration.

- A ∈ R^{d×k}: mapping (factor) matrix in (14); no recursion.
- x^(t) ∈ R^d: update variable in (14); x^(t+1) := A(y^(t))^⊙2 / ‖A(y^(t))^⊙2‖.
- y^(t) ∈ R^k: intermediate variable in (14); y^(t) := A^⊤ x^(t).
- x̃^(t) ∈ R^d: unnormalized version of x^(t); x̃^(t+1) := A(y^(t))^⊙2.
- x̂^(t) ∈ R^d: noisy version of x^(t); x̂^(t) = x^(t) + ξ^(t); see (21).
- ξ^(t) ∈ R^d: contribution of the noise in the tensor power update given the noisy tensor T̂ = T + E; x̂^(t) = x^(t) + ξ^(t); see (21).
- B ∈ R^{d×(k−1)}: matrix A := [a_1 a_2 ⋯ a_k] with the first column removed, i.e., B := [a_2 a_3 ⋯ a_k]. Note that the first column a_1 is the desired one to recover.
- B^(t,1) ∈ R^{d×(k−1)}: conditional distribution of B given the previous iterations in the middle of the t-th iteration (before the update step x̃^(t+1) = A(y^(t))^⊙2); B^(t,1) ≡ B | {X^[t], Y^[t]}.
- B^(t,2) ∈ R^{d×(k−1)}: conditional distribution of B given the previous iterations at the end of the t-th iteration (after the update step x̃^(t+1) = A(y^(t))^⊙2); B^(t,2) ≡ B | {X^[t+1], Y^[t]}.
- B_res^(t,1) ∈ R^{d×(k−1)}: residual independent randomness left in B^(t,1); see Lemma 9 and equation (17).
- B_res^(t,2) ∈ R^{d×(k−1)}: residual independent randomness left in B^(t,2); see Lemma 9 and equation (19).
- w^(t) ∈ R^{k−1}: intermediate variable in the update formula (14); w^(t) := (y_{−1}^(t))^⊙2.
- u^(t) ∈ R^d: part of x̃^(t) representing the remaining independent randomness; u^(t+1) := B_res^(t,1) w^(t).
- v^(t) ∈ R^{k−1}: part of y_{−1}^(t) representing the remaining independent randomness; v^(t) := (B_res^(t−1,2))^⊤ x^(t).

Appendix A. Proof of Lemma 9

Proof of Lemma 9: Recall that we have updates of the form

x̃^(t+1) = A(y^(t))^⊙2,  w^(t) := (y_{−1}^(t))^⊙2,  y^(t) = A^⊤ x^(t).

Let X̃^{[t]\1} := [x̃^(2) ⋯ x̃^(t)], and let the rows of Y^[t] be partitioned into the first row and the rest as Y^[t] = [ Y_1^[t] ; Y_{−1}^[t] ]. We now make the following simple observations:

B^(t,1) ≡ B | {Y^[t] = A^⊤ X^[t], X̃^{[t]\1} = A(Y^[t−1])^⊙2}
       ≡ B | {Y_{−1}^[t] = B^⊤ X^[t], X̃^{[t]\1} = a_1 (Y_1^[t−1])^⊙2 + B W^[t−1]}
       ≡ B | {v^(1) = B^⊤ x^(1), ..., v^(t) = (B_res^(t−1,2))^⊤ x^(t), u^(2) = B_res^(1,1) w^(1), ..., u^(t) = B_res^(t−1,1) w^(t−1)},

where the second equivalence comes from the fact that B is the matrix A with the first column removed. Now applying Corollary 8, we have the result. The distribution of B^(t,2) follows similarly.

Appendix B. Analysis of Induction Argument

In this section, we analyze the basis of induction and the inductive step for the induction argument proposed in Section 3.1 for the proof of Lemma 5.

B.1 Basis of Induction

We first show that the hypothesis holds for the initialization vector x^(1) as the basis of induction.

Claim 1 (Basis of induction) The induction hypothesis is true for t = 1.

Proof: Notice that the induction hypothesis for t = 1 only involves the bounds on ‖x^(1)‖ and ⟨a_1, x^(1)⟩ as in Hypotheses 1 and 3, respectively. These bounds follow directly from the correlation assumption on the initial vector x^(1) stated in (13), where δ_1 = Δ_1 = 1.


More information

Tractability results for weighted Banach spaces of smooth functions

Tractability results for weighted Banach spaces of smooth functions Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March

More information

Lecture 2: Correlated Topic Model

Lecture 2: Correlated Topic Model Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables

More information

LECTURE NOTES ON DVORETZKY S THEOREM

LECTURE NOTES ON DVORETZKY S THEOREM LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu

More information

Function Spaces. 1 Hilbert Spaces

Function Spaces. 1 Hilbert Spaces Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assume structure

More information

arxiv: v4 [cs.ds] 7 Mar 2014

arxiv: v4 [cs.ds] 7 Mar 2014 Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

Logarithmic spurious regressions

Logarithmic spurious regressions Logarithmic spurious regressions Robert M. e Jong Michigan State University February 5, 22 Abstract Spurious regressions, i.e. regressions in which an integrate process is regresse on another integrate

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Necessary and Sufficient Conditions for Sketched Subspace Clustering

Necessary and Sufficient Conditions for Sketched Subspace Clustering Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This

More information

On conditional moments of high-dimensional random vectors given lower-dimensional projections

On conditional moments of high-dimensional random vectors given lower-dimensional projections Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

Algorithms and matching lower bounds for approximately-convex optimization

Algorithms and matching lower bounds for approximately-convex optimization Algorithms an matching lower bouns for approximately-convex optimization Yuanzhi Li Department of Computer Science Princeton University Princeton, NJ, 08450 yuanzhil@cs.princeton.eu Anrej Risteski Department

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis Chuang Wang, Yonina C. Elar, Fellow, IEEE an Yue M. Lu, Senior Member, IEEE Abstract We present a high-imensional analysis

More information

Lecture 6 : Dimensionality Reduction

Lecture 6 : Dimensionality Reduction CPS290: Algorithmic Founations of Data Science February 3, 207 Lecture 6 : Dimensionality Reuction Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will consier the roblem of maing

More information

Range and speed of rotor walks on trees

Range and speed of rotor walks on trees Range an spee of rotor wals on trees Wilfrie Huss an Ecaterina Sava-Huss May 15, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial configuration on regular trees

More information

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling Case Stuy : Document Retrieval Collapse Gibbs an Variational Methos for LDA Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 7 th, 0 Example

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES

EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES MATHEMATICS OF COMPUTATION Volume 69, Number 231, Pages 1117 1130 S 0025-5718(00)01120-0 Article electronically publishe on February 17, 2000 EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION

More information

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners Lower Bouns for Local Monotonicity Reconstruction from Transitive-Closure Spanners Arnab Bhattacharyya Elena Grigorescu Mahav Jha Kyomin Jung Sofya Raskhonikova Davi P. Wooruff Abstract Given a irecte

More information

DEGREE DISTRIBUTION OF SHORTEST PATH TREES AND BIAS OF NETWORK SAMPLING ALGORITHMS

DEGREE DISTRIBUTION OF SHORTEST PATH TREES AND BIAS OF NETWORK SAMPLING ALGORITHMS DEGREE DISTRIBUTION OF SHORTEST PATH TREES AND BIAS OF NETWORK SAMPLING ALGORITHMS SHANKAR BHAMIDI 1, JESSE GOODMAN 2, REMCO VAN DER HOFSTAD 3, AND JÚLIA KOMJÁTHY3 Abstract. In this article, we explicitly

More information

A Review of Multiple Try MCMC algorithms for Signal Processing

A Review of Multiple Try MCMC algorithms for Signal Processing A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications

More information

Diagonalization of Matrices Dr. E. Jacobs

Diagonalization of Matrices Dr. E. Jacobs Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is

More information

Database-friendly Random Projections

Database-friendly Random Projections Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional

More information

Quantum mechanical approaches to the virial

Quantum mechanical approaches to the virial Quantum mechanical approaches to the virial S.LeBohec Department of Physics an Astronomy, University of Utah, Salt Lae City, UT 84112, USA Date: June 30 th 2015 In this note, we approach the virial from

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

A Unified Theorem on SDP Rank Reduction

A Unified Theorem on SDP Rank Reduction A Unifie heorem on SDP Ran Reuction Anthony Man Cho So, Yinyu Ye, Jiawei Zhang November 9, 006 Abstract We consier the problem of fining a low ran approximate solution to a system of linear equations in

More information

KNN Particle Filters for Dynamic Hybrid Bayesian Networks

KNN Particle Filters for Dynamic Hybrid Bayesian Networks KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030

More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Linear Algebra- Review And Beyond. Lecture 3

Linear Algebra- Review And Beyond. Lecture 3 Linear Algebra- Review An Beyon Lecture 3 This lecture gives a wie range of materials relate to matrix. Matrix is the core of linear algebra, an it s useful in many other fiels. 1 Matrix Matrix is the

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

The Subtree Size Profile of Plane-oriented Recursive Trees

The Subtree Size Profile of Plane-oriented Recursive Trees The Subtree Size Profile of Plane-oriente Recursive Trees Michael FUCHS Department of Applie Mathematics National Chiao Tung University Hsinchu, 3, Taiwan Email: mfuchs@math.nctu.eu.tw Abstract In this

More information

Problem Sheet 2: Eigenvalues and eigenvectors and their use in solving linear ODEs

Problem Sheet 2: Eigenvalues and eigenvectors and their use in solving linear ODEs Problem Sheet 2: Eigenvalues an eigenvectors an their use in solving linear ODEs If you fin any typos/errors in this problem sheet please email jk28@icacuk The material in this problem sheet is not examinable

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

WUCHEN LI AND STANLEY OSHER

WUCHEN LI AND STANLEY OSHER CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability

More information

SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS

SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS GEORGE A HAGEDORN AND CAROLINE LASSER Abstract We investigate the iterate Kronecker prouct of a square matrix with itself an prove an invariance

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

A new proof of the sharpness of the phase transition for Bernoulli percolation on Z d

A new proof of the sharpness of the phase transition for Bernoulli percolation on Z d A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition

More information

Energy behaviour of the Boris method for charged-particle dynamics

Energy behaviour of the Boris method for charged-particle dynamics Version of 25 April 218 Energy behaviour of the Boris metho for charge-particle ynamics Ernst Hairer 1, Christian Lubich 2 Abstract The Boris algorithm is a wiely use numerical integrator for the motion

More information

inflow outflow Part I. Regular tasks for MAE598/494 Task 1

inflow outflow Part I. Regular tasks for MAE598/494 Task 1 MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the

More information

On a limit theorem for non-stationary branching processes.

On a limit theorem for non-stationary branching processes. On a limit theorem for non-stationary branching processes. TETSUYA HATTORI an HIROSHI WATANABE 0. Introuction. The purpose of this paper is to give a limit theorem for a certain class of iscrete-time multi-type

More information

Interconnected Systems of Fliess Operators

Interconnected Systems of Fliess Operators Interconnecte Systems of Fliess Operators W. Steven Gray Yaqin Li Department of Electrical an Computer Engineering Ol Dominion University Norfolk, Virginia 23529 USA Abstract Given two analytic nonlinear

More information

Admin BACKPROPAGATION. Neural network. Neural network 11/3/16. Assignment 7. Assignment 8 Goals today. David Kauchak CS158 Fall 2016

Admin BACKPROPAGATION. Neural network. Neural network 11/3/16. Assignment 7. Assignment 8 Goals today. David Kauchak CS158 Fall 2016 Amin Assignment 7 Assignment 8 Goals toay BACKPROPAGATION Davi Kauchak CS58 Fall 206 Neural network Neural network inputs inputs some inputs are provie/ entere Iniviual perceptrons/ neurons Neural network

More information

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX

MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX J. WILLIAM HELTON, JEAN B. LASSERRE, AND MIHAI PUTINAR Abstract. We investigate an iscuss when the inverse of a multivariate truncate moment matrix

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

Some Examples. Uniform motion. Poisson processes on the real line

Some Examples. Uniform motion. Poisson processes on the real line Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]

More information

Designing Information Devices and Systems II Fall 2017 Note Theorem: Existence and Uniqueness of Solutions to Differential Equations

Designing Information Devices and Systems II Fall 2017 Note Theorem: Existence and Uniqueness of Solutions to Differential Equations EECS 6B Designing Information Devices an Systems II Fall 07 Note 3 Secon Orer Differential Equations Secon orer ifferential equations appear everywhere in the real worl. In this note, we will walk through

More information

Self-normalized Martingale Tail Inequality

Self-normalized Martingale Tail Inequality Online-to-Confience-Set Conversions an Application to Sparse Stochastic Banits A Self-normalize Martingale Tail Inequality The self-normalize martingale tail inequality that we present here is the scalar-value

More information

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential

A Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix

More information

Homework 2 EM, Mixture Models, PCA, Dualitys

Homework 2 EM, Mixture Models, PCA, Dualitys Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The

More information

Chaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena

Chaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena Chaos, Solitons an Fractals (7 64 73 Contents lists available at ScienceDirect Chaos, Solitons an Fractals onlinear Science, an onequilibrium an Complex Phenomena journal homepage: www.elsevier.com/locate/chaos

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

Closed and Open Loop Optimal Control of Buffer and Energy of a Wireless Device

Closed and Open Loop Optimal Control of Buffer and Energy of a Wireless Device Close an Open Loop Optimal Control of Buffer an Energy of a Wireless Device V. S. Borkar School of Technology an Computer Science TIFR, umbai, Inia. borkar@tifr.res.in A. A. Kherani B. J. Prabhu INRIA

More information