Fast and Memory Optimal Low-Rank Matrix Approximation


Fast and Memory Optimal Low-Rank Matrix Approximation
Yun Se-Young, Marc Lelarge, Alexandre Proutière

To cite this version: Yun Se-Young, Marc Lelarge, Alexandre Proutière. Fast and Memory Optimal Low-Rank Matrix Approximation. NIPS 2015, Dec 2015, Montreal, Canada. <hal > HAL Id: hal  Submitted on 12 Jan 2016.

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Fast and Memory Optimal Low-Rank Matrix Approximation

Se-Young Yun (MSR, Cambridge), Marc Lelarge (Inria & ENS), Alexandre Proutiere (KTH, EE School / ACL), alepro@kth.se

(Work performed as part of the MSR-INRIA joint research centre. M.L. acknowledges the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-11-JS (GAP project). A. Proutiere's research is supported by the ERC FSA grant, and the SSF ICT-Psi project.)

Abstract

In this paper, we revisit the problem of constructing a near-optimal rank $k$ approximation of a matrix $M \in [0,1]^{m \times n}$ under the streaming data model where the columns of $M$ are revealed sequentially. We present SLA (Streaming Low-rank Approximation), an algorithm that is asymptotically accurate when $k\, s_{k+1}(M) = o(\sqrt{mn})$, where $s_{k+1}(M)$ is the $(k+1)$-th largest singular value of $M$. This means that its average mean-square error converges to 0 as $m$ and $n$ grow large (i.e., $\|\hat{M}^{(k)} - M^{(k)}\|_F^2 = o(mn)$ with high probability, where $\hat{M}^{(k)}$ and $M^{(k)}$ denote the output of SLA and the optimal rank $k$ approximation of $M$, respectively). Our algorithm makes one pass on the data if the columns of $M$ are revealed in a random order, and two passes if the columns of $M$ arrive in an arbitrary order. To reduce its memory footprint and complexity, SLA uses random sparsification, and samples each entry of $M$ with a small probability $\delta$. In turn, SLA is memory optimal as its required memory space scales as $k(m+n)$, the dimension of its output. Furthermore, SLA is computationally efficient as it runs in $O(\delta k m n)$ time (a constant number of operations is made for each observed entry of $M$), which can be as small as $O(k \log(m)^4 n)$ for an appropriate choice of $\delta$ and if $m \le n$.

1 Introduction

We investigate the problem of constructing, in a memory and computationally efficient manner, an accurate estimate of the optimal rank $k$ approximation $M^{(k)}$ of a large ($m \times n$) matrix $M \in [0,1]^{m \times n}$. This problem is fundamental in machine learning, and has naturally found numerous applications in computer science. The optimal rank $k$ approximation $M^{(k)}$ minimizes, over all rank $k$ matrices $Z$, the Frobenius norm $\|M - Z\|_F$ (and any norm that is invariant under rotation), and can be computed by Singular Value Decomposition (SVD) of $M$ in $O(mn^2)$ time (if we assume that $m \ge n$). For massive matrices $M$ (i.e., when $m$ and $n$ are very large), this becomes unacceptably slow. In addition, storing and manipulating $M$ in memory may become difficult. In this paper, we design a memory and computationally efficient algorithm, referred to as Streaming Low-rank Approximation (SLA), that computes a near-optimal rank $k$ approximation $\hat{M}^{(k)}$. Under mild assumptions on $M$, the SLA algorithm is asymptotically accurate in the sense that as $m$ and $n$ grow large, its average mean-square error converges to 0, i.e., $\|\hat{M}^{(k)} - M^{(k)}\|_F^2 = o(mn)$ with high probability (we interpret $M^{(k)}$ as the signal that we aim to recover from a noisy observation $M$). To reduce its memory footprint and running time, the proposed algorithm combines random sparsification and the idea of the streaming data model. More precisely, each entry of $M$ is revealed to the algorithm with probability $\delta$, called the sampling rate.
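For reference, the object that SLA approximates, the optimal rank-$k$ approximation $M^{(k)}$, can be computed exactly by a truncated SVD whenever $M$ fits in memory. The following minimal NumPy sketch of this (memory- and time-expensive) baseline uses illustrative names and toy data; it is not part of SLA.

```python
import numpy as np

def optimal_rank_k(M, k):
    """Optimal rank-k approximation M^(k) via a full truncated SVD.

    This is the expensive baseline SLA approximates: it needs the whole
    m x n matrix in memory and O(m n^2) operations (for m >= n).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)   # M = U diag(s) Vt
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # keep the top-k triplets

# Toy usage: a noisy rank-2 "signal" matrix with entries in [0, 1].
rng = np.random.default_rng(0)
m, n, k = 200, 400, 2
signal = rng.random((m, k)) @ rng.random((k, n)) / k
M = np.clip(signal + 0.05 * rng.standard_normal((m, n)), 0.0, 1.0)
Mk = optimal_rank_k(M, k)
print(np.linalg.norm(M - Mk, "fro") / np.linalg.norm(M, "fro"))
```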

Moreover, SLA observes and treats the columns of $M$ one after the other in a sequential manner. The sequence of observed columns may be chosen uniformly at random, in which case the algorithm requires one pass on $M$ only, or can be arbitrary, in which case the algorithm needs two passes. SLA first stores $l = \frac{1}{\delta \log(m)}$ randomly selected columns, and extracts via spectral decomposition an estimator of parts of the $k$ top right singular vectors of $M$. It then completes the estimator of these vectors by receiving and treating the remaining columns sequentially. SLA finally builds, from the estimated top $k$ right singular vectors, the linear projection onto the subspace generated by these vectors, and deduces an estimator of $M^{(k)}$.

The analysis of the performance of SLA is presented in Theorems 7 and 8. In summary: when $m \le n$ and $\frac{\log^4(m)}{m} \le \delta \le 8/9$, with probability $1 - \frac{k}{\delta m}$, the output $\hat{M}^{(k)}$ of SLA satisfies:

$$\frac{\|M^{(k)} - \hat{M}^{(k)}\|_F^2}{mn} = O\!\left( k^2 \left( \frac{s_{k+1}^2(M)}{mn} + \sqrt{\frac{\log(m)}{\delta m}} \right) \right), \qquad (1)$$

where $s_{k+1}(M)$ is the $(k+1)$-th singular value of $M$. SLA requires $O(k(m+n))$ memory space, and if $\delta \ge \frac{\log^4(m)}{m}$ and $k \le \frac{m}{\log^6(m)}$, its running time is $O(\delta k m n)$. To ensure the asymptotic accuracy of SLA, the upper bound in (1) needs to converge to 0, which is true as soon as $k\, s_{k+1}(M) = o(\sqrt{mn})$. In the case where $M$ is seen as a noisy version of $M^{(k)}$, this condition quantifies the maximum amount of noise allowed for our algorithm to be asymptotically accurate.

SLA is memory optimal, since any rank $k$ approximation algorithm needs to at least store its output, i.e., $k$ right and left singular vectors, and hence needs at least $O(k(m+n))$ memory space. Further observe that, among the class of algorithms sampling each entry of $M$ at a given rate $\delta$, SLA is computationally optimal, since it runs in $O(\delta k m n)$ time (it does a constant number of operations per observed entry if $k = O(1)$). In turn, to the best of our knowledge, SLA is both faster and more memory efficient than existing algorithms. SLA is the first memory optimal and asymptotically accurate low-rank approximation algorithm. The approach used to design SLA can be readily extended to devise memory and computationally efficient matrix completion algorithms. We present this extension in the supplementary material.

Notations. Throughout the paper, we use the following notations. For any $m \times n$ matrix $A$, we denote by $A^\top$ its transpose, and by $A^{-1}$ its pseudo-inverse. We denote by $s_1(A) \ge \dots \ge s_n(A) \ge 0$ the singular values of $A$. When matrices $A$ and $B$ have the same number of rows, we use $[A, B]$ to denote the matrix whose first columns are those of $A$, followed by those of $B$. $A_\perp$ denotes an orthonormal basis of the subspace perpendicular to the linear span of the columns of $A$. $A_j$, $A_i$, and $A_{ij}$ denote the $j$-th column of $A$, the $i$-th row of $A$, and the entry of $A$ on the $i$-th line and $j$-th column, respectively. For $h \le l$, $A_{h:l}$ (resp. $A^{h:l}$) is the matrix obtained by extracting the columns (resp. lines) $h, \dots, l$ of $A$. For any ordered set $B = \{b_1, \dots, b_p\} \subset \{1, \dots, n\}$, $A^{(B)}$ refers to the matrix composed of the ordered set $B$ of columns of $A$ ($A_{(B)}$ is defined similarly, but for lines). For real numbers $a \le b$, we define $[A]_a^b$ as the matrix with $(i,j)$ entry equal to $([A]_a^b)_{ij} = \min(b, \max(a, A_{ij}))$. Finally, for any vector $v$, $\|v\|$ denotes its Euclidean norm, whereas for any matrix $A$, $\|A\|_F$ denotes its Frobenius norm, $\|A\|_2$ its operator norm, and $\|A\|_\infty$ its $\ell_\infty$-norm, i.e., $\|A\|_\infty = \max_{i,j} |A_{ij}|$.

2 Related Work

Low-rank approximation algorithms have received a lot of attention over the last decade. There are two types of error estimates for these algorithms: either the error is additive or relative. To translate our bound (1) into an additive error is easy:

$$\|M - \hat{M}^{(k)}\|_F \le \|M - M^{(k)}\|_F + O\!\left( k \left( s_{k+1}(M) + \frac{\log^{1/2}(m)}{(\delta m)^{1/4}} \sqrt{mn} \right) \right). \qquad (2)$$
Sparsifying $M$ to speed up the computation of a low-rank approximation has been proposed in the literature, and the best additive error bounds have been obtained in [AM07]. When the sampling rate $\delta$ satisfies $\delta \ge \frac{\log^4 m}{m}$, the authors show that with probability $1 - \exp(-\log^4 m)$,

$$\|M - \hat{M}^{(k)}\|_F \le \|M - M^{(k)}\|_F + O\!\left( \frac{k^{1/2} n^{1/2}}{\delta^{1/2}} + \frac{k^{1/4} n^{1/4}}{\delta^{1/4}} \|M^{(k)}\|_F^{1/2} \right). \qquad (3)$$

This performance guarantee is derived from Lemma 1.1 and Theorem 1.4 in [AM07]. To compare (2) and (3), note that our assumption of bounded entries of $M$ ensures that $s_{k+1}^2(M) \le \frac{mn}{1+k}$ and $\|M^{(k)}\|_F \le \|M\|_F \le \sqrt{mn}$. In particular, we see that the worst case bound for (3) is $\frac{k^{1/2} n^{1/2}}{\delta^{1/2}} + \frac{k^{1/4} m^{1/4} n^{1/2}}{\delta^{1/4}}$, which is always lower than the worst case bound for (2): $(kmn)^{1/2} + k (mn)^{1/2} \big(\frac{\log m}{\delta m}\big)^{1/4}$. When $k = O(1)$, our bound is only larger by a logarithmic term in $m$ compared to [AM07]. However, the algorithm proposed in [AM07] requires storing $O(\delta mn)$ entries of $M$, whereas SLA needs $O(k(m+n))$ memory space. Recall that $\frac{\log^4 m}{m} \le \delta \le 8/9$, so that our algorithm makes a significant improvement on the memory requirement at a low price in the error guarantee bounds. Although biased sampling algorithms can reduce the error, these algorithms have to compute leverage scores with multiple passes over the data [BJS15]. In a recent work, [CW13] proposes a time-efficient algorithm to compute a low-rank approximation of a sparse matrix. Combined with [AM07], we obtain an algorithm running in time $O(\delta m n) + O(n k^2 + k^3)$, but with an increased additive error term.

We can also compare our result to papers providing an estimate $\tilde{M}^{(k)}$ of the optimal low-rank approximation of $M$ with a relative error $\varepsilon$, i.e., such that $\|M - \tilde{M}^{(k)}\|_F \le (1 + \varepsilon) \|M - M^{(k)}\|_F$. To the best of our knowledge, [CW09] provides the best result in this setting. Theorem 4.4 in [CW09] shows that, provided the rank of $M$ is at least $2(k+1)$, their algorithm outputs with probability $1 - \eta$ a rank-$k$ matrix $\tilde{M}^{(k)}$ with relative error $\varepsilon$ using memory space $O\big(\frac{k}{\varepsilon} \log(1/\eta)(n+m)\big)$ (note that in [CW09] the authors use a bit as the unit of memory whereas we use an entry of the matrix, so we removed a $\log n$ factor in their expression to make fair comparisons). To compare with our result, we can translate our bound (1) into a relative error, and we need to take:

$$\varepsilon = O\!\left( k\, \frac{s_{k+1}(M) + \frac{\log^{1/2}(m)}{(\delta m)^{1/4}} \sqrt{mn}}{\|M - M^{(k)}\|_F} \right).$$

First note that since $M$ is assumed to be of rank at least $2(k+1)$, we have $\|M - M^{(k)}\|_F \ge s_{k+1}(M) > 0$ and $\varepsilon$ is well defined. Clearly, for our $\varepsilon$ to tend to zero, we need $\|M - M^{(k)}\|_F$ to be not too small. For the scenario we have in mind, $M$ is a noisy version of the signal $M^{(k)}$, so that $M - M^{(k)}$ is the noise matrix. When every entry of $M - M^{(k)}$ is generated independently at random with a constant variance, $\|M - M^{(k)}\|_F = \Theta(\sqrt{mn})$ while $s_{k+1}(M) = \Theta(\sqrt{m} + \sqrt{n})$. In such a case, we have $\varepsilon = o(1)$ and we improve the memory requirement of [CW09] by a factor $\varepsilon^{-1} \log\big(\frac{\delta m}{k}\big)$. [CW09] also considers a model where the full columns of $M$ are revealed one after the other in an arbitrary order, and proposes a one-pass algorithm to derive the rank-$k$ approximation of $M$ with the same memory requirement. In this general setting, our algorithm is required to make two passes on the data (and only one pass if the order of arrival of the columns is random instead of arbitrary). The running time of their algorithm scales as $O\big(k m n\, \varepsilon^{-1} \log\big(\frac{\delta m}{k}\big)\big)$, needed to project $M$ onto a $k \varepsilon^{-1} \log\big(\frac{\delta m}{k}\big)$-dimensional random space. Thus, SLA improves the running time again by a factor of $\varepsilon^{-1} \log\big(\frac{\delta m}{k}\big)$.

We could also think of using sketching and streaming PCA algorithms to estimate $M^{(k)}$. When the columns arrive sequentially, these algorithms identify the left singular vectors using one pass on the matrix, and then need a second pass on the data to estimate the right singular vectors. For example, [Lib13] proposes a sketching algorithm that updates the $p$ most frequent directions as columns are observed.
[GP14] shows that with $O(mk/\varepsilon)$ memory space (for $p = k/\varepsilon$), this sketching algorithm finds an $m \times k$ matrix $\hat{U}$ such that $\|M - P_{\hat{U}} M\|_F \le (1+\varepsilon) \|M - M^{(k)}\|_F$, where $P_{\hat{U}}$ denotes the projection matrix onto the linear span of the columns of $\hat{U}$. The running time of the algorithm is roughly $O(k m n \varepsilon^{-1})$, which is much greater than that of SLA. Note also that, to identify such a matrix $\hat{U}$ in one pass on $M$, it is shown in [Woo14] that we have to use $\Omega(mk/\varepsilon)$ memory space. This result does not contradict the performance analysis of SLA, since the latter needs two passes on $M$ if the columns of $M$ are observed in an arbitrary manner. Finally, note that the streaming PCA algorithm proposed in [MCJ13] does not apply to our problem, as that paper investigates a very specific setting: the spiked covariance model, where columns are randomly generated in an i.i.d. manner.

3 Streaming Low-rank Approximation Algorithm

Algorithm 1 Streaming Low-rank Approximation (SLA)
Input: $M$, $k$, $\delta$, and $l = \frac{1}{\delta \log(m)}$
1. $A^{(B_1)}, A^{(B_2)} \leftarrow$ independently sample entries of $[M_1, \dots, M_l]$ at rate $\delta$
2. PCA for the first $l$ columns: $Q \leftarrow \mathrm{SPCA}(A^{(B_1)}, k)$
3. Trimming the rows and columns of $A^{(B_2)}$:
   $A^{(B_2)} \leftarrow$ set the entries of the rows of $A^{(B_2)}$ having more than two non-zero entries to 0
   $A^{(B_2)} \leftarrow$ set the entries of the columns of $A^{(B_2)}$ having more than $10\delta m$ non-zero entries to 0
4. $W \leftarrow A^{(B_2)} Q$
5. $\hat{V}^{(B_1)} \leftarrow (A^{(B_1)})^\top W$
6. $\hat{I} \leftarrow A^{(B_1)} \hat{V}^{(B_1)}$
   Remove $A^{(B_1)}$, $A^{(B_2)}$, and $Q$ from the memory space
for $t = l+1$ to $n$ do
7. $A_t \leftarrow$ sample entries of $M_t$ at rate $\delta$
8. $\hat{V}_t \leftarrow (A_t)^\top W$
9. $\hat{I} \leftarrow \hat{I} + A_t \hat{V}_t$
   Remove $A_t$ from the memory space
end for
10. $\hat{R} \leftarrow$ find $\hat{R}$ using the Gram-Schmidt process such that $\hat{V}\hat{R}$ is an orthonormal matrix
11. $\hat{U} \leftarrow \frac{1}{\delta} \hat{I} \hat{R} \hat{R}^\top$
Output: $\hat{M}^{(k)} = [\hat{U} \hat{V}^\top]_0^1$

Algorithm 2 Spectral PCA (SPCA)
Input: $C \in [0,1]^{m \times l}$, $k$
$\Omega \leftarrow l \times k$ Gaussian random matrix
Trimming: $\bar{C} \leftarrow$ set the entries of the rows of $C$ with more than 10 non-zero entries to 0
$\Phi \leftarrow \bar{C}^\top \bar{C} - \mathrm{diag}(\bar{C}^\top \bar{C})$
Power Iteration: $QR \leftarrow$ QR decomposition of $\Phi^{\lceil 5 \log(l) \rceil} \Omega$
Output: $Q$

In this section, we present the Streaming Low-rank Approximation (SLA) algorithm and analyze its performance. SLA makes one pass on the matrix $M$, and is provided with the columns of $M$ one after the other in a streaming manner. The SVD of $M$ is $M = U \Sigma V^\top$, where $U$ and $V$ are $(m \times m)$ and $(n \times n)$ unitary matrices and $\Sigma$ is the $(m \times n)$ matrix $\mathrm{diag}(s_1(M), \dots, s_n(M))$. We assume (or impose by design of SLA) that the $l$ (specified below) first observed columns of $M$ are chosen uniformly at random among all columns. An extension of SLA to scenarios where columns are observed in an arbitrary order is presented in Section 3.5, but this extension requires two passes on $M$. To be memory efficient, SLA uses sampling. Each observed entry of $M$ is erased (i.e., set equal to 0) with probability $1 - \delta$, where $\delta > 0$ is referred to as the sampling rate. The algorithm, whose pseudo-code is presented in Algorithm 1, proceeds in three steps:

1. In the first step, we observe $l = \frac{1}{\delta \log(m)}$ columns of $M$ chosen uniformly at random. These columns form the matrix $M^{(B)} = U \Sigma (V^{(B)})^\top$, where $B$ denotes the ordered set of the indexes of the $l$ first observed columns. $M^{(B)}$ is sampled at rate $\delta$. More precisely, we apply two independent sampling procedures, where in each of them every entry of $M^{(B)}$ is sampled at rate $\delta$. The two resulting independent random matrices $A^{(B_1)}$ and $A^{(B_2)}$ are stored in memory. $A^{(B_1)}$, referred to as $A^{(B)}$ to simplify the notations, is used in this first step, whereas $A^{(B_2)}$ will be used in subsequent steps. Next, through a spectral decomposition of $A^{(B)}$, we derive an $(l \times k)$ orthonormal matrix $Q$ such that the span of its column vectors approximates that of the column vectors of $V^{(B)}$. The first step corresponds to Lines 1 and 2 in the pseudo-code of SLA.

2. In the second step, we complete the construction of our estimator of the top $k$ right singular vectors $V$ of $M$. Denote by $\hat{V}$ the $(n \times k)$ matrix formed by these estimated vectors. We first compute the components of these vectors corresponding to the set of indexes $B$ as $\hat{V}^{(B)} = (A^{(B_1)})^\top W$ with $W = A^{(B_2)} Q$. Then for $t = l+1, \dots, n$, after receiving the $t$-th column $M_t$ of $M$, we set $\hat{V}_t = A_t^\top W$, where $A_t$ is obtained by sampling entries of $M_t$ at rate $\delta$. Hence after one pass on $M$, we get $\hat{V} = \tilde{A}^\top W$, where $\tilde{A} = [A^{(B_1)}, A_{l+1}, \dots, A_n]$. As it turns out, multiplying $W$ by $\tilde{A}^\top$ amplifies the useful signal contained in $W$, and yields an accurate approximation of the span of the top $k$ right singular vectors $V$ of $M$. The second step is presented in Lines 3, 4, 5, 7 and 8 in the SLA pseudo-code.

3. In the last step, we deduce from $\hat{V}$ a set of column vectors gathered in a matrix $\hat{U}$ such that $\hat{U}\hat{V}^\top$ provides an accurate approximation of $M^{(k)}$. First, using the Gram-Schmidt process, we find $\hat{R}$ such that $\hat{V}\hat{R}$ is an orthonormal matrix, and compute $\hat{U} = \frac{1}{\delta} \tilde{A} \hat{V} \hat{R} \hat{R}^\top$ in a streaming manner as in Step 2. Then, $\hat{U}\hat{V}^\top = \frac{1}{\delta} \tilde{A}\, \hat{V}\hat{R}(\hat{V}\hat{R})^\top$, where $\hat{V}\hat{R}(\hat{V}\hat{R})^\top$ approximates the projection matrix onto the linear span of the top $k$ right singular vectors of $M$. Thus, $\hat{U}\hat{V}^\top$ is close to $M^{(k)}$. This last step is described in Lines 6, 9, 10 and 11 in the SLA pseudo-code.

In the next subsections, we present in more detail the rationale behind the three steps of SLA, and provide a performance analysis of the algorithm.
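As a concrete illustration of the first step, here is a minimal NumPy sketch of the SPCA subroutine (Algorithm 2). It follows the pseudo-code above (row trimming, removal of the diagonal of the covariance matrix, roughly $5\log l$ power iterations); function and variable names are ours, and dense arrays are used for readability even though an actual implementation would exploit the sparsity of the sampled matrix.

```python
import numpy as np

def spca(C, k, row_limit=10, seed=0):
    """Spectral PCA on a sampled m x l batch C, in the spirit of Algorithm 2."""
    m, l = C.shape
    C_bar = C.copy()
    # Trimming: zero out rows with more than `row_limit` observed (non-zero) entries.
    heavy_rows = (C_bar != 0).sum(axis=1) > row_limit
    C_bar[heavy_rows, :] = 0.0
    # Covariance-like matrix with its diagonal removed: the diagonal scales as
    # delta*m while off-diagonal entries scale as delta^2*m, so it would dominate.
    Phi = C_bar.T @ C_bar
    np.fill_diagonal(Phi, 0.0)
    # Power (subspace) iteration against an l x k Gaussian test matrix; re-orthonormalizing
    # at every step is numerically safer than forming Phi^(5 log l) explicitly.
    Q = np.random.default_rng(seed).standard_normal((l, k))
    for _ in range(int(np.ceil(5 * np.log(max(l, 2))))):
        Q, _ = np.linalg.qr(Phi @ Q)
    return Q  # l x k orthonormal matrix whose span estimates that of V^(B)
```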

3.1 Step 1: Estimating right singular vectors of the first batch of columns

The objective of the first step is to estimate $V^{(B)}$, i.e., those components of the top $k$ right singular vectors of $M$ whose indexes are in the set $B$ (remember that $B$ is the set of indexes of the $l$ first observed columns). This estimator, denoted by $Q$, is obtained by applying the power method to extract the top $k$ right singular vectors of $M^{(B)}$, as described in Algorithm 2. In the design of this algorithm and its performance analysis, we face two challenges: (i) we only have access to a sampled version $A^{(B)}$ of $M^{(B)}$; and (ii) $U\Sigma(V^{(B)})^\top$ is not the SVD of $M^{(B)}$, since the column vectors of $V^{(B)}$ are not orthonormal in general (we keep the components of these vectors corresponding to the set of indexes $B$). Hence, the top $k$ right singular vectors of $M^{(B)}$ that we extract in Algorithm 2 do not necessarily correspond to $V^{(B)}$.

To address (i), in Algorithm 2, we do not directly extract the top $k$ right singular vectors of $A^{(B)}$. We first remove the rows of $A^{(B)}$ with too many non-zero entries (i.e., too many observed entries from $M^{(B)}$), since these rows would perturb the SVD of $A^{(B)}$. Let us denote by $\bar{A}$ the obtained trimmed matrix. We then form the covariance matrix $\bar{A}^\top \bar{A}$, and remove its diagonal entries to obtain the matrix $\Phi = \bar{A}^\top \bar{A} - \mathrm{diag}(\bar{A}^\top \bar{A})$. Removing the diagonal entries is needed because of the sampling procedure. Indeed, the diagonal entries of $\bar{A}^\top \bar{A}$ scale as $\delta m$, whereas its off-diagonal entries scale as $\delta^2 m$. Hence, when $\delta$ is small, the diagonal entries would clearly become dominant in the spectral decomposition. We finally apply the power method to $\Phi$ to obtain $Q$.

In the analysis of the performance of Algorithm 2, the following lemma will be instrumental; it provides an upper bound on the gap between $\Phi$ and $\delta^2 (M^{(B)})^\top M^{(B)}$ using the matrix Bernstein inequality (Theorem 6.1 in [Tro12]). All proofs are detailed in the Appendix.

Lemma 1 If $\delta \le \frac{8}{9}$, with probability $1 - \frac{1}{l^2}$, $\|\Phi - \delta^2 (M^{(B)})^\top M^{(B)}\|_2 \le c_1 \delta \sqrt{m l \log(l)}$, for some constant $c_1 > 1$.

To address (ii), we first establish in Lemma 2 that, for an appropriate choice of $l$, the column vectors of $V^{(B)}$ are approximately orthonormal. This lemma is of independent interest, and relates the SVD of a truncated matrix, here $M^{(B)}$, to that of the initial matrix $M$. More precisely:

Lemma 2 If $\delta \le 8/9$, there exists an $l \times k$ matrix $\bar{V}^{(B)}$ such that its column vectors are orthonormal, and with probability $1 - \exp(-m^{1/7})$, for all $i \le k$ satisfying $s_i^2(M) \ge \frac{n}{\delta l}\sqrt{m l \log(l)}$, $\bar{V}^{(B)}_{1:i}$ is close to $\sqrt{\tfrac{n}{l}}\, V^{(B)}_{1:i}$ (the explicit bound is stated in the Appendix).

Note that, as suggested by the above lemma, it might be impossible to recover $V^{(B)}_i$ when the corresponding singular value $s_i(M)$ is small (more precisely, when $s_i^2(M) \le \frac{n}{\delta l}\sqrt{m l \log(l)}$).
However, the singular vectors corresponding to such small singular values generate very little error for low-rank approximation. Thus, we are only interested in singular vectors whose singular values are above the threshold $\big( \frac{n}{\delta l}\sqrt{m l \log(l)} \big)^{1/2}$. Let $k' = \max\{ i : s_i^2(M) \ge \frac{n}{\delta l}\sqrt{m l \log(l)},\ i \le k \}$. Now, to analyze the performance of Algorithm 2 when applied to $A^{(B)}$, we decompose $\Phi$ as $\Phi = \delta^2 \frac{l}{n} \bar{V}^{(B)} \Sigma^2 (\bar{V}^{(B)})^\top + Y$, where $Y = \Phi - \delta^2 \frac{l}{n} \bar{V}^{(B)} \Sigma^2 (\bar{V}^{(B)})^\top$ is a noise matrix.

The following lemma quantifies how noise may affect the performance of the power method, i.e., it provides an upper bound on the gap between $Q$ and $\bar{V}^{(B)}$ as a function of the operator norm of the noise matrix $Y$:

Lemma 3 With probability $1 - \frac{1}{l^2}$, the output $Q$ of SPCA when applied to $A^{(B)}$ satisfies, for all $i \le k'$: $\big\| (\bar{V}^{(B)}_{1:i})^\top Q_\perp \big\|_2 \le \frac{3 \|Y\|_2}{\delta^2 \frac{l}{n} s_i(M)^2}$.

In the proof, we analyze the power iteration algorithm building on results from [HMT11]. To complete the performance analysis of Algorithm 2, it remains to upper bound $\|Y\|_2$. To this aim, we decompose $Y$ into three terms:
$$Y = \big( \Phi - \delta^2 (M^{(B)})^\top M^{(B)} \big) + \delta^2 (M^{(B)})^\top (I - U_{1:k} U_{1:k}^\top) M^{(B)} + \big( \delta^2 (M^{(B)})^\top U_{1:k} U_{1:k}^\top M^{(B)} - \delta^2 \tfrac{l}{n} \bar{V}^{(B)} \Sigma^2 (\bar{V}^{(B)})^\top \big).$$
The first term can be controlled using Lemma 1, and the last term is upper bounded using Lemma 2. Finally, the second term corresponds to the error made by ignoring the singular vectors which are not within the top $k$. To estimate this term, we use the matrix Chernoff bound (Theorem 2.2 in [Tro11]), and prove that:

Lemma 4 With probability $1 - \exp(-m^{1/4})$, $\big\| (I - U_{1:k} U_{1:k}^\top) M^{(B)} \big\|_2^2 \le \frac{1}{\delta}\sqrt{m l \log(l)} + \frac{l}{n} s_{k+1}^2(M)$.

In summary, combining the four above lemmas, we can establish that $Q$ accurately estimates $\bar{V}^{(B)}$:

Theorem 5 If $\delta \le 8/9$, with probability $1 - \frac{3}{l^2}$, the output $Q$ of Algorithm 2 when applied to $A^{(B)}$ satisfies, for all $i \le k'$:
$$\big\| (\bar{V}^{(B)}_{1:i})^\top Q_\perp \big\|_2 \le \frac{3 \delta^2 \frac{l}{n} \big( s_{k+1}^2(M) + 2\sqrt{mn} \big) + 3(2 + c_1)\,\delta \sqrt{m l \log(l)}}{\delta^2 \frac{l}{n} s_i^2(M)},$$
where $c_1$ is the constant from Lemma 1.

3.2 Step 2: Estimating the principal right singular vectors of M

In this step, we aim at estimating the top $k$ right singular vectors $V$, or at least at producing $k$ vectors whose linear span approximates that of $V$. Towards this objective, we start from $Q$ derived in the previous step, and define the $(m \times k)$ matrix $W = A^{(B_2)} Q$. $W$ is stored and kept in memory for the remainder of the algorithm. It is tempting to directly read from $W$ the top $k$ left singular vectors $U$. Indeed, we know that $Q \approx \sqrt{\tfrac{n}{l}}\, V^{(B)}$ and $\mathbb{E}[A^{(B_2)}] = \delta U \Sigma (V^{(B)})^\top$, and hence $\mathbb{E}[W] \approx \delta \sqrt{\tfrac{l}{n}}\, U \Sigma$. However, the level of the noise in $W$ is too high to accurately extract $U$. Indeed, $W$ can be written as $\delta U \Sigma (V^{(B)})^\top Q + Z$, where $Z = (A^{(B_2)} - \delta U \Sigma (V^{(B)})^\top) Q$ partly captures the noise in $W$. It is then easy to see that the level of the noise $Z$ satisfies $\mathbb{E}[\|Z\|_2] \ge \mathbb{E}[\|Z\|_F / \sqrt{k}] = \Omega(\sqrt{\delta m})$. Indeed, first observe that $Z$ is of rank $k$. Then $\mathbb{E}[\|Z\|_F^2] = \sum_{i=1}^m \sum_{j=1}^k \mathbb{E}[Z_{ij}^2] \approx k \delta m$: this is due to the facts that (i) $Q$ and $A^{(B_2)} - \delta U \Sigma (V^{(B)})^\top$ are independent (since $A^{(B_1)}$ and $A^{(B_2)}$ are independent), (ii) $\|Q_j\|_2^2 = 1$ for all $j \le k$, and (iii) the entries of $A^{(B_2)}$ are independent with variance $\Theta(\delta(1-\delta))$. However, for all $j \le k$, the $j$-th singular value of $\delta U \Sigma (V^{(B)})^\top Q$ scales as $O(\delta \sqrt{m l}) = O\big(\sqrt{\tfrac{\delta m}{\log(m)}}\big)$, since $s_j(M) \le \sqrt{mn}$ and $s_j(M^{(B)}) \approx \sqrt{\tfrac{l}{n}}\, s_j(M)$ for $j \le k'$ from Lemma 2.

Instead, from $W$, $A^{(B_1)}$ and the subsequently sampled arriving columns $A_t$, $t > l$, we produce an $(n \times k)$ matrix $\hat{V}$ whose linear span approximates that of $V$. More precisely, we first let $\hat{V}^{(B)} = (A^{(B_1)})^\top W$. Then for all $t = l+1, \dots, n$, we define $\hat{V}_t = A_t^\top W$, where $A_t$ is obtained from the $t$-th observed column of $M$ after sampling each of its entries at rate $\delta$. Multiplying $W$ by $\tilde{A}^\top = [A^{(B_1)}, A_{l+1}, \dots, A_n]^\top$ amplifies the useful signal in $W$, so that $\hat{V} = \tilde{A}^\top W$ constitutes a good approximation of $V$.
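A minimal NumPy sketch of this second step (together with the accumulation of $\hat{I}$ used later in Step 3), processing the remaining columns one at a time. The helper names are ours; `Q` is assumed to be the output of SPCA on the first batch, and `A_B1`, `A_B2` the two independent samples of that batch.

```python
import numpy as np

def stream_step2(columns, A_B1, A_B2, Q, delta, rng):
    """Step 2 of SLA (sketch): V_hat = A_tilde^T W and I_hat = A_tilde V_hat.

    `columns` iterates over the remaining columns M_{l+1}, ..., M_n (1-D arrays),
    A_B1 / A_B2 are the two independent m x l samples of the first batch, and
    Q is the l x k output of SPCA.
    """
    W = A_B2 @ Q                       # m x k, kept in memory for the whole pass
    V_rows = [A_B1.T @ W]              # rows of V_hat indexed by the first batch B
    I_hat = A_B1 @ V_rows[0]           # m x k accumulator; equals A_tilde @ V_hat at the end
    for M_t in columns:
        # sample each entry of the incoming column with probability delta
        A_t = np.where(rng.random(M_t.shape) < delta, M_t, 0.0)
        v_t = A_t @ W                  # the row of V_hat associated with column t
        V_rows.append(v_t[None, :])
        I_hat += np.outer(A_t, v_t)    # rank-one update; the column is then discarded
    V_hat = np.vstack(V_rows)          # n x k
    return V_hat, I_hat
```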

To understand why, we can rewrite $\hat{V}$ as follows:
$$\hat{V} = \delta^2 M^\top M^{(B)} Q + \delta M^\top (A^{(B_2)} - \delta M^{(B)}) Q + (\tilde{A} - \delta M)^\top W.$$
In the above equation, the first term corresponds to the useful signal and the two remaining terms constitute noise matrices. From Theorem 5, the linear span of the columns of $Q$ approximates that of the columns of $V^{(B)}$, and thus, for $j \le k'$, $s_j(\delta^2 M^\top M^{(B)} Q) \ge \delta^2 s_j^2(M) \sqrt{\tfrac{l}{n}} - \delta\sqrt{mn\log(l)}$. The spectral norms of the noise matrices are bounded using random matrix arguments, and the fact that $(A^{(B_2)} - \delta M^{(B)})$ and $(\tilde{A} - \delta M)$ are zero-mean random matrices with independent entries. We can show (see Lemma 14 given in the supplementary material), using the independence of $A^{(B_1)}$ and $A^{(B_2)}$, that with high probability $\|\delta M^\top (A^{(B_2)} - \delta M^{(B)}) Q\|_2 = O(\delta \sqrt{mn})$. We may also establish that with high probability $\|(\tilde{A} - \delta M)^\top W\|_2 = O(\delta \sqrt{m(m+n)})$. This is a consequence of a result derived in [AM07] (quoted in Lemma 13 in the supplementary material) stating that with high probability $\|\tilde{A} - \delta M\|_2 = O(\sqrt{\delta(m+n)})$, and of the fact that, due to the trimming process presented in Line 3 of Algorithm 1, $\|W\|_2 = O(\sqrt{\delta m})$. In summary, as soon as $n$ scales at least as $m$, the noise level becomes negligible, and the span of $\hat{V}$ provides an accurate approximation of that of $V$. The above arguments are made precise and rigorous in the supplementary material. The following theorem summarizes the accuracy of our estimator of $V$.

Theorem 6 With $\frac{\log^4(m)}{m} \le \delta \le \frac{8}{9}$, for all $i \le k$, there exists a constant $c_2$ such that with probability $1 - \frac{k}{\delta m}$,
$$\big\| V_i^\top \hat{V}_\perp \big\|^2 \le c_2\, \frac{s_{k+1}^2(M) + n\sqrt{m \log(m)/\delta} + \sqrt{mn}\,\log(m)/\delta}{s_i^2(M)}.$$

3.3 Step 3: Estimating the principal left singular vectors of M

In the last step, we estimate the principal left singular vectors of $M$ to finally derive an estimator of $M^{(k)}$, the optimal rank-$k$ approximation of $M$. The construction of this estimator is based on the observation that $M^{(k)} = M P_V$, where $P_V = V_{1:k} V_{1:k}^\top$ is an $(n \times n)$ matrix representing the projection onto the linear span of the top $k$ right singular vectors of $M$. Hence, to estimate $M^{(k)}$, we try to approximate the matrix $P_V$. To this aim, we construct a $(k \times k)$ matrix $\hat{R}$ so that the column vectors of $\hat{V}\hat{R}$ form an orthonormal basis whose span corresponds to that of the column vectors of $\hat{V}$. This construction is achieved using the Gram-Schmidt process. We then approximate $P_V$ by $P_{\hat{V}} = \hat{V}\hat{R}\hat{R}^\top\hat{V}^\top$, and finally our estimator $\hat{M}^{(k)}$ of $M^{(k)}$ is $\frac{1}{\delta}\tilde{A} P_{\hat{V}}$ (clipped entry-wise to $[0,1]$).

The construction of $\hat{M}^{(k)}$ can be made in a memory-efficient way, accommodating our streaming model where the columns of $M$ arrive one after the other, as described in the pseudo-code of SLA. First, after constructing $\hat{V}^{(B)}$ in Step 2, we build the matrix $\hat{I} = A^{(B_1)} \hat{V}^{(B)}$. Then, for $t = l+1, \dots, n$, after constructing the $t$-th line $\hat{V}_t$ of $\hat{V}$, we update $\hat{I}$ by adding to it the matrix $A_t \hat{V}_t$, so that after all columns of $M$ have been observed, $\hat{I} = \tilde{A}\hat{V}$. Hence we can build an estimator $\hat{U}$ of the principal left singular vectors of $M$ as $\hat{U} = \frac{1}{\delta}\hat{I}\hat{R}\hat{R}^\top$, and finally obtain $\hat{M}^{(k)} = [\hat{U}\hat{V}^\top]_0^1$.

To quantify the estimation error of $\hat{M}^{(k)}$, we decompose $M^{(k)} - \hat{M}^{(k)}$ as:
$$M^{(k)} - \hat{M}^{(k)} = M^{(k)}(I - P_{\hat{V}}) + (M^{(k)} - M)P_{\hat{V}} + (M - \tfrac{1}{\delta}\tilde{A})P_{\hat{V}}.$$
The first term of the r.h.s. of the above equation can be bounded using Theorem 6: for $i \le k$, we have $s_i(M)^2 \|V_i^\top \hat{V}_\perp\|^2 \le z := c_2\big( s_{k+1}^2(M) + n\sqrt{m\log(m)/\delta} + \sqrt{mn}\,\log(m)/\delta \big)$, and hence we can conclude that for all $i \le k$, $\| s_i(M) U_i V_i^\top (I - P_{\hat{V}}) \|_2^2 \le z$. The second term can be easily bounded by observing that the matrix $(M^{(k)} - M)P_{\hat{V}}$ is of rank $k$: $\|(M^{(k)} - M)P_{\hat{V}}\|_F^2 \le k \|(M^{(k)} - M)P_{\hat{V}}\|_2^2 \le k \|M^{(k)} - M\|_2^2 = k\, s_{k+1}(M)^2$. The last term of the r.h.s. can be controlled as in the performance analysis of Step 2, observing that $(\frac{1}{\delta}\tilde{A} - M)P_{\hat{V}}$ is of rank $k$: $\|(\frac{1}{\delta}\tilde{A} - M)P_{\hat{V}}\|_F^2 \le k \|\frac{1}{\delta}\tilde{A} - M\|_2^2 = O\big(k\,\frac{m+n}{\delta}\big)$.
It is then easy to remark that, for the range of the parameter $\delta$ we are interested in, the upper bound $z$ of the first term dominates the upper bounds of the two other terms. Finally, we obtain the following result (see the supplementary material for a complete proof):

Theorem 7 When $\frac{\log^4(m)}{m} \le \delta \le \frac{8}{9}$, with probability $1 - \frac{k}{\delta m}$, the output of the SLA algorithm satisfies, for some constant $c_3$:
$$\frac{\big\| M^{(k)} - [\hat{U}\hat{V}^\top]_0^1 \big\|_F^2}{mn} \le c_3\, k^2 \left( \frac{s_{k+1}^2(M)}{mn} + \sqrt{\frac{\log(m)}{\delta m}} + \frac{\log(m)}{\delta\sqrt{mn}} \right).$$
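Concretely, Step 3 only involves a few dense operations on the small matrices $\hat{V}$ ($n \times k$) and $\hat{I}$ ($m \times k$) accumulated during the pass. A minimal NumPy sketch follows; it uses a QR factorization in place of an explicit Gram-Schmidt pass (both yield an $\hat{R}$ such that $\hat{V}\hat{R}$ is orthonormal), names are ours, and we assume $\hat{V}$ has full column rank $k$.

```python
import numpy as np

def step3_estimate(V_hat, I_hat, delta):
    """Step 3 of SLA (sketch): U_hat = (1/delta) I_hat R R^T, output clipped to [0,1]."""
    # Find R such that V_hat @ R has orthonormal columns (QR in place of Gram-Schmidt);
    # V_hat is assumed to have full column rank so that r is invertible.
    _, r = np.linalg.qr(V_hat)                   # V_hat = Q_v r
    R = np.linalg.inv(r)
    U_hat = (I_hat @ (R @ R.T)) / delta
    M_hat = np.clip(U_hat @ V_hat.T, 0.0, 1.0)   # the [.]_0^1 truncation of the output
    return M_hat
```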

Note that if $\frac{\log^4(m)}{m} \le \delta \le \frac{8}{9}$, then $\sqrt{\frac{\log(m)}{\delta m}} = o(1)$. Hence, if $m \le n$, the SLA algorithm provides an asymptotically accurate estimate of $M^{(k)}$ as soon as $k^2 s_{k+1}(M)^2 = o(mn)$.

3.4 Required Memory and Running Time

Required memory. Lines 1-6 of the SLA pseudo-code: $A^{(B_1)}$ and $A^{(B_2)}$ have $O(\delta m l)$ non-zero entries, and we need $O(\delta m l \log m)$ bits to store the identities of these entries. Similarly, the memory required to store $\Phi$ is $O(\delta^2 m l^2 \log(l))$. Storing $Q$ further requires $O(lk)$ memory. Finally, $\hat{V}^{(B_1)}$ and $\hat{I}$ computed in Line 6 require $O(lk)$ and $O(km)$ memory space, respectively. Thus, when $l = \frac{1}{\delta \log m}$, this first part of the algorithm requires $O(k(m+n))$ memory. Lines 7-9: before we treat the remaining columns, $A^{(B_1)}$, $A^{(B_2)}$, and $Q$ are removed from memory. Using this released memory, when the $t$-th column arrives, we can store it, compute $\hat{V}_t$ and $\hat{I}$, and remove the column to save memory. Therefore, we do not need additional memory to treat the remaining columns. Lines 10 and 11: from $\hat{I}$ and $\hat{V}$, we compute $\hat{U}$. To this aim, the memory required is $O(k(m+n))$.

Running time. Lines 1 to 6: the SPCA algorithm requires $O(lk(\delta^2 m l + k)\log(l))$ floating-point operations to compute $Q$. $W$, $\hat{V}^{(B_1)}$, and $\hat{I}$ are inner products, and their computation requires $O(\delta k m l)$ operations. With $l = \frac{1}{\delta\log(m)}$, the number of operations to treat the first $l$ columns is $O(lk(\delta^2 m l + k)\log(l) + k\delta m l) = O(km) + O(\frac{k^2}{\delta})$. Lines 7 to 9: to compute $\hat{V}_t$ and $\hat{I}$ when the $t$-th column arrives, we need $O(\delta k m)$ operations. Since there are $n - l$ remaining columns, the total number of operations is $O(\delta k m n)$. Lines 10 and 11: $\hat{R}$ is computed from $\hat{V}$ using the Gram-Schmidt process, which requires $O(k^2 m)$ operations; we then compute $\hat{I}\hat{R}\hat{R}^\top$ using $O(k^2 m)$ operations. In summary, we have shown that:

Theorem 8 The memory required to run the SLA algorithm is $O(k(m+n))$. Its running time is $O(\delta k m n + \frac{k^2}{\delta} + k^2 m)$.

Observe that when $\delta \ge \max\big( \frac{(\log m)^4}{m}, \frac{(\log m)^2 m}{n} \big)$ and $k \le \frac{m}{(\log m)^6}$, we have $\delta k m n \ge \frac{k^2}{\delta}$ and $\delta k m n \ge k^2 m$, and therefore the running time of SLA is $O(\delta k m n)$.

3.5 General Streaming Model

SLA is a one-pass low-rank approximation algorithm, but the set of the $l$ first observed columns of $M$ needs to be chosen uniformly at random. We can readily extend SLA to deal with scenarios where the columns of $M$ are observed in an arbitrary order. This extension requires two passes on $M$, but otherwise performs exactly the same operations as SLA. In the first pass, we extract a set of $l$ columns chosen uniformly at random, and in the second pass, we deal with all other columns. To extract $l$ randomly selected columns in the first pass, we proceed as follows. Assume that, when the $t$-th column of $M$ arrives, we have already extracted $l'$ columns. Then the $t$-th column is extracted with probability $\frac{l - l'}{n - t + 1}$. This two-pass version of SLA enjoys the same performance guarantees as those of SLA.
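The only non-trivial ingredient of the first pass of this two-pass variant is the uniform selection of $l$ column indices from a stream of known length $n$. A small sketch of the selection rule above (names are ours):

```python
import numpy as np

def select_l_columns(n, l, rng=None):
    """Pick l column indices uniformly at random while scanning columns once."""
    rng = rng or np.random.default_rng()
    selected = []
    for t in range(n):                        # t is 0-indexed here
        remaining_slots = l - len(selected)
        remaining_cols = n - t                # equals n - t + 1 in 1-indexed terms
        # select column t with probability (l - #already selected) / (n - t + 1)
        if rng.random() < remaining_slots / remaining_cols:
            selected.append(t)
    return selected                           # always exactly l indices, uniform over subsets

print(select_l_columns(n=10, l=3))
```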

4 Conclusion

This paper revisited the low-rank approximation problem. We proposed a streaming algorithm that samples the data and produces a near-optimal solution with a vanishing mean-square error. The algorithm uses a memory space scaling linearly with the ambient dimension of the matrix, i.e., the memory required to store the output alone. Its running time scales as the number of sampled entries of the input matrix. The algorithm is relatively simple and, in particular, does not exploit elaborate techniques (such as sparse embedding techniques) recently developed to reduce the memory requirement and complexity of algorithms addressing various problems in linear algebra.

References

[AM07] Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM (JACM), 54(2):9, 2007.

[BJS15] Srinadh Bhojanapalli, Prateek Jain, and Sujay Sanghavi. Tighter low-rank approximation via sampling the leveraged element. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2015.

[CW09] Kenneth L. Clarkson and David P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. ACM, 2009.

[CW13] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing. ACM, 2013.

[GP14] Mina Ghashami and Jeff M. Phillips. Relative errors for deterministic low-rank matrix approximations. In SODA. SIAM, 2014.

[HMT11] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.

[Lib13] Edo Liberty. Simple and deterministic matrix sketching. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013.

[MCJ13] Ioannis Mitliagkas, Constantine Caramanis, and Prateek Jain. Memory limited, streaming PCA. In Advances in Neural Information Processing Systems, 2013.

[Tro11] Joel A. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Advances in Adaptive Data Analysis, 3(1-2), 2011.

[Tro12] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.

[Woo14] David Woodruff. Low rank approximation lower bounds in row-update streams. In Advances in Neural Information Processing Systems, 2014.


More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Reed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product.

Reed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product. Coding Theory Massoud Malek Reed-Muller Codes An iportant class of linear block codes rich in algebraic and geoetric structure is the class of Reed-Muller codes, which includes the Extended Haing code.

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

Using a De-Convolution Window for Operating Modal Analysis

Using a De-Convolution Window for Operating Modal Analysis Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis

More information

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe PROPERTIES OF MULTIVARIATE HOMOGENEOUS ORTHOGONAL POLYNOMIALS Brahi Benouahane y Annie Cuyt? Keywords Abstract It is well-known that the denoinators of Pade approxiants can be considered as orthogonal

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting

LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting LogLog-Beta and More: A New Algorith for Cardinality Estiation Based on LogLog Counting Jason Qin, Denys Ki, Yuei Tung The AOLP Core Data Service, AOL, 22000 AOL Way Dulles, VA 20163 E-ail: jasonqin@teaaolco

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements 1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information