Learning Mixtures of Gaussians with Maximum-a-posteriori Oracle


Satyaki Mahalanabis
Dept. of Computer Science, University of Rochester

(Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors.)

Abstract

We consider the problem of estimating the parameters of a mixture of distributions, where each component distribution is from a given parametric family (e.g. exponential, Gaussian, etc.). We define a learning model in which the learner has access to a maximum-a-posteriori oracle which, given any sample from a mixture of distributions, tells the learner which component distribution was the most likely to have generated it. We describe a learning algorithm in this setting which accurately estimates the parameters of a mixture of k spherical Gaussians in R^d, assuming the component Gaussians satisfy a mild separation condition. Our algorithm uses only polynomially many (in d, k) samples and oracle calls, and our separation condition is much weaker than those required by unsupervised learning algorithms like [Arora 01, Vempala 02].

1 Introduction

Learning mixtures of distributions, such as mixtures of Gaussians, is one of the most well-studied problems in machine learning (see e.g. [Melnykov 10]). However, most of the known learning algorithms work in an unsupervised setting, and not much is known theoretically about how side information (e.g. in the form of class labels [Basu 02] or pairwise constraints [Shental 03]) can help the learner. In this paper, we define a learning model in which the learner has access to a maximum-a-posteriori oracle, which for any sample from a mixture of distributions tells the learner which component distribution was most likely to have generated it. Our maximum-a-posteriori oracle can be thought of as an expert who can be asked to guess the most likely class label for a sample point. We then demonstrate the advantage provided by this oracle by giving an algorithm (Algorithm 1) which recovers the parameters of a mixture of k d-dimensional spherical Gaussians under a weaker separation assumption, and more efficiently, than known unsupervised algorithms. The number of oracle calls made by our algorithm is polynomial in d, k and is independent of the desired error.

The problem of learning mixtures of Gaussians has a long history, and various learning models have been considered in the literature. There is the clustering framework [Dasgupta 99, Arora 01], in which the goal is to correctly label each sample generated by the mixture with the Gaussian that generated it. Then there is the distribution learning model [Feldman 06], in which the goal is to output a hypothesis mixture which is as close to the unknown mixture as possible. Probably the most popular model, however, is one in which the parameters of the original mixture need to be recovered (e.g. [Dempster 77, Belkin 10]), and this is the model we use; EM [Dempster 77] is perhaps the best known algorithm in this model. Note that for the clustering problem it is necessary to have a separation assumption [Arora 01], unlike for the problem of recovering parameters. Note also that our model is more difficult than learning the distribution. Almost all algorithms for clustering or parameter learning which give provable guarantees also require that each Gaussian in the mixture be separated from the others [Vempala 02, Arora 01]. For the clustering framework, [Arora 01] give a lower bound on the separation for the case of spherical Gaussians. We also need to make a separation assumption (see (7)), which is much weaker than those required by e.g. [Arora 01, Vempala 02]. However, we have access to a maximum-a-posteriori oracle, whereas the settings in [Arora 01, Vempala 02, Dasgupta 99, Belkin 10, Moitra 10] are unsupervised. Recently [Belkin 10, Moitra 10] have given algorithms for learning parameters which require no separation assumption. However, the absence of a separation assumption means that their algorithms, unlike ours or those of [Arora 01, Vempala 02], may require exponentially many samples in

the number of mixture components. Our algorithm is also, not surprisingly, simpler than theirs. We also note that our results are for spherical Gaussians only.

The algorithm in [Dasgupta 99] works only when all Gaussians have the same covariance matrix; this was extended to Gaussians having arbitrary covariance matrices by [Arora 01]. The results in [Vempala 02], like ours, are for spherical Gaussians. They were the first to use spectral methods, and their technique has since been extended to arbitrary log-concave distributions [Kannan 08, Achlioptas 05]. The algorithms in [Belkin 10, Moitra 10] can learn mixtures of arbitrary Gaussians.

Even though semi-supervised clustering has been a popular area of research (e.g. [Basu 02, Xing 02, Basu 04]), not much is known theoretically about semi-supervised or active learning of mixtures of Gaussians. A variant of EM using pairwise constraints is given by [Shental 03]. They use two types of constraints: pairs of points which were generated by the same Gaussian, and pairs which were not. These types of pairwise relations seem to be the most popular model of semi-supervised learning of mixture models [Melnykov 10]. In our model we consider a maximum-a-posteriori oracle, and as we point out in Section 2, learning with this oracle can be more challenging than with pairwise constraints. Our setting is also different from semi-supervised learning because the learner is allowed to choose the points for which it wants labels. However, the algorithm we give for mixtures of Gaussians queries the label of every sample it draws initially, and later on uses only unlabeled samples.

Finally, while our aim in this paper is to theoretically demonstrate the advantage of the maximum-a-posteriori oracle, the oracle models an expert in a natural way and should find practical applications. The various UCI datasets discussed in the Experimental Results section of [Shental 03] are examples where our oracle learning model could be applicable. Our Algorithm 1 can be used whenever a dataset can be modeled reasonably accurately using a mixture of Gaussians and side information is available in the form of which clusters selected data points belong to.

The paper is organized as follows. In Section 2 we first define our learning model (Subsection 2.1), and then state our result about learning mixtures of Gaussians in this model (Subsection 2.2). In Section 3 we justify the separation assumption that we require, and then present our algorithm and analyze it. Finally, in Section 4 we discuss how our model and our algorithm could be extended.

2 Summary of Contributions

We first define the maximum-a-posteriori oracle and then give our result for mixtures of Gaussians.

2.1 Learning with a Maximum-a-posteriori Oracle

We first define the learning model. Consider a mixture density in R^d defined as μ(x) = Σ_{i=1}^k w_i μ_i(x), where Σ_i w_i = 1 and w_min = min_i w_i > 0. The densities {μ_i}_{i=1}^k belong to a parametric family (e.g. uniform, Gaussian, etc.) and have parameters {θ_i}_{i=1}^k respectively. Given an unknown mixture density, the goal of a learner is to output estimates {(θ̂_i, ŵ_i)}_{i=1}^k of {(θ_i, w_i)}_{i=1}^k. The learner can draw independent (unlabeled) samples from μ by invoking an oracle called SAMP. For any sample x returned by SAMP, the learner can also choose to invoke an oracle, MAP, which labels x with the most likely density that could have generated x, i.e.

    MAP(x) = argmax_i p_i(x),  where  p_i(x) = w_i μ_i(x) / Σ_j w_j μ_j(x)    (1)

is the posterior distribution on {1, ..., k}. For the case of a mixture of Gaussians (with no two Gaussians being concentric), note that the set of points where two or more densities are equally likely has measure 0, and hence for a random sample from μ there is almost surely a unique most likely (a posteriori) density, as defined by (1). We point out that in our model the number of mixture components k is known to the learner; typically k ≪ d.
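To make the two oracles concrete, the following sketch (ours, not part of the paper; all function and variable names are illustrative) shows how SAMP and MAP would behave for a mixture of spherical Gaussians: MAP simply returns the index maximizing the weighted component log-density in (1).

import numpy as np

def map_oracle(x, weights, means, sigmas):
    """Return argmax_i of w_i * N(x; u_i, sigma_i^2 I), computed via log-densities.

    weights: (k,) mixing weights; means: (k, d) component means;
    sigmas: (k,) standard deviations of the spherical components.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    sq_dist = np.sum((means - x) ** 2, axis=1)
    # log of w_i * (2*pi*sigma_i^2)^(-d/2) * exp(-||x - u_i||^2 / (2*sigma_i^2))
    log_post = (np.log(weights)
                - 0.5 * d * np.log(2 * np.pi * sigmas ** 2)
                - sq_dist / (2 * sigmas ** 2))
    return int(np.argmax(log_post))

def samp_oracle(rng, weights, means, sigmas):
    """Draw one unlabeled sample from the mixture (the SAMP oracle)."""
    i = rng.choice(len(weights), p=weights)
    return means[i] + sigmas[i] * rng.standard_normal(means.shape[1])

With two well-separated components, map_oracle almost always returns the index of the component that actually generated the point, which is exactly the behavior the model relies on.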
We next compare our model with other similar settings. We point out that learning using the MAP oracle is more difficult than in a semi-supervised setting where each point in a random subset of samples is labeled according to which component actually generated it (i.e. the label is drawn randomly from the posterior p_i). To see this, consider a mixture of two or more identical, almost concentric Gaussians in which one has a much greater weight than the others. If points are labeled according to the posterior, enough samples from each Gaussian can be obtained to estimate the parameters with arbitrary accuracy, whereas the MAP oracle would only ever return the label of the heaviest Gaussian, and hence MAP does not help recover the parameters of the other Gaussians. Learning with the MAP oracle is more challenging than with pairwise constraints [Shental 03] as well, because with sufficiently many constraints one can discover the actual label of every sample and recover all the parameters with arbitrary accuracy. Our model is also different from that of [Castelli 96], where it is assumed that the parameters of each component are known (only the mixing weights are unknown) and the learner just needs to be able to approximate the posterior.

The example of a mixture of two or more identical, almost concentric Gaussians given above shows that, in order for the MAP oracle to help the learner, the component densities need to be sufficiently separated. Such separation assumptions are often required for unsupervised learning (e.g. see

Table 1 for the case of mixtures of Gaussians), though the separation we require should be smaller because we have help from MAP. Finally, note that any learning algorithm requires Ω(1/w_min) unlabeled samples so that there are sufficiently many samples from each component from which to estimate its parameters.

2.2 Mixture of Gaussians

Let N_{u,Σ} denote a Gaussian with mean u and covariance matrix Σ. The spherical Gaussian in R^d has density

    N_{u,σ²I}(x) = (2πσ²)^{-d/2} e^{-‖x-u‖²/(2σ²)},

where I is the identity matrix and ‖·‖ denotes the Euclidean norm.

Table 1: Sample complexity and separation for learning mixtures of Gaussians (all algorithms except ours are unsupervised).

                 Separation                                Complexity
[Dasgupta 99]    Ω(d^{1/2}) max{σ_i, σ_j}                  Õ(d poly(k))
[Dasgupta 07]    Ω(d^{1/4}) max{σ_i, σ_j}                  Õ(d k²)
[Arora 01]       Ω(d^{1/4} ln^{1/4} d) max{σ_i, σ_j}       Õ(d² poly(k))
[Vempala 02]     Ω(k^{1/4} ln^{1/4}(dk)) max{σ_i, σ_j}     Õ(d³ k²)
[Belkin 10]      none                                      poly(d) e^{poly(k)}
[Moitra 10]      none                                      poly(d) e^{poly(k)}
Ours             Ω(√(ln(dk))) max{σ_i, σ_j}                Õ(k)

(Table 1 does not explicitly show the dependence on the parameter w_min and instead assumes, for the sake of easy comparison, that w_min ≥ 1/k^c for some constant c > 0. The separation in [Arora 01] is more complicated and can allow for concentric Gaussians if σ_i, σ_j are different.)

We will demonstrate the advantage provided by the MAP oracle by giving an algorithm (Algorithm 1) which learns a mixture of Gaussians μ(x) = Σ_{i=1}^k w_i N_{u_i,σ_i²I}(x) at a smaller separation than previous unsupervised algorithms like [Dasgupta 99, Arora 01] (see Table 1). Our main result and separation condition are as follows.

Theorem 1 (Restated from Theorem 11, Section 3). There is an algorithm which, for any ε, 0 < ε < 1/20, any 0 < δ < 1, and any mixture of k Gaussians μ = Σ_i w_i N_{u_i,σ_i²I} satisfying the following separation condition

    ∀ i ≠ j:  ‖u_i − u_j‖ ≥ 30 √( ln( 44dk w_j / (εδ w_min w_i) ) + 3 ) (σ_i + σ_j),    (2)

draws at most (10⁶ + 2000/ε²) ln(4(k+1)/δ) (unlabeled) samples, makes 10⁶ ln(4(k+1)/δ) calls to MAP, and returns {(û_i, σ̂_i, ŵ_i)}_{i=1}^k such that, with probability at least 1 − δ, for all i,

    ‖u_i − û_i‖ ≤ ε σ_i,  |σ̂_i − σ_i| ≤ ε σ_i  and  |ŵ_i − w_i| ≤ ε w_i.    (3)

Table 1 compares our separation assumption (2) with those required by previous unsupervised algorithms. We want to be able to learn the mixture parameters for as small a separation as possible, ideally at most polylogarithmic in d, k, as in (2). Our required separation is smaller than that required by e.g. [Vempala 02] (O(max{σ_i, σ_j} k^{1/4} polylog(d, k))). Note that under separation (2), the overlap between the Gaussians is such that unsupervised distance-based clustering algorithms (like [Arora 01, Vempala 02]) may not work. We point out that for spherical Gaussians [Arora 01] show that a separation of Ω(d^{1/4}) max{σ_i, σ_j} is necessary in order to tell, with high probability, which Gaussian actually generated each sample from μ. Recently [Belkin 10, Moitra 10] have given algorithms for mixtures of Gaussians which do not require any separation assumption. However, [Moitra 10] prove that any (unsupervised) learning algorithm for mixtures of Gaussians with no separation assumption will require exponentially many (in k) samples. In contrast, our algorithm (provably) requires only polynomially many (in k) samples under a (weak) separation assumption, and is much simpler than theirs.

2.3 Detailed Results

See Theorem 11 and separation (25) for a more precise statement. Theorem 1 essentially says that the number of calls made by our algorithm to the MAP oracle in order to achieve an error of ε is independent of ε, i.e. only O(ln(k/δ)), provided the separation between the Gaussians is only polylogarithmic in d, k, 1/ε. The accuracy (3) achieved by our algorithm implies that each estimated component is statistically close to the true component: for each i, ‖N_{u_i,σ_i²I} − N_{û_i,σ̂_i²I}‖_1 = O(ε). This means the hypothesis mixture our algorithm outputs is within O(ε) (with respect to L1, or total variation, distance) of the true mixture distribution. (We will use N_{u,σ²} to denote both the density and the probability measure.)
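As a concrete illustration (ours, not the paper's; the function name is hypothetical and the constants simply mirror condition (2) as written above), one can check whether a given spherical mixture satisfies a pairwise separation of this form:

import numpy as np

def satisfies_separation(means, sigmas, weights, eps, delta):
    """Check the pairwise separation condition of Theorem 1, i.e. for all i != j:
    ||u_i - u_j|| >= 30*sqrt(ln(44*d*k*w_j/(eps*delta*w_min*w_i)) + 3) * (sigma_i + sigma_j).
    """
    k, d = means.shape
    w_min = weights.min()
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            lhs = np.linalg.norm(means[i] - means[j])
            rhs = 30.0 * np.sqrt(
                np.log(44.0 * d * k * weights[j] / (eps * delta * w_min * weights[i])) + 3.0
            ) * (sigmas[i] + sigmas[j])
            if lhs < rhs:
                return False
    return True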
2.4 Notations and Auxiliary Lemmas

We will use e_1, e_2, ..., e_d to denote the usual orthonormal basis of R^d. Given vectors u, v ∈ R^d, we will use u·v for their inner product. Φ will denote the cdf of the standard normal distribution N(0, 1).

Lemma 2 (Chernoff Bound, see e.g. [Dubhashi 09]). Let X = Σ_{i=1}^n X_i, where {X_i}_{i=1}^n are independently distributed in [0, 1]. Then for any 1 > ε > 0,

    Pr[ X ≤ (1 − ε) E[X]  or  X ≥ (1 + ε) E[X] ] ≤ 2 e^{−ε² E[X]/3}.    (4)

Fact 3. The KL-divergence between two Gaussians N_{u_1,σ_1²I}, N_{u_2,σ_2²I} is given by

    KL( N_{u_1,σ_1²I} ‖ N_{u_2,σ_2²I} ) = ( ‖u_1 − u_2‖² + d(σ_1² − σ_2²) ) / (2σ_2²) + d ln(σ_2/σ_1).

We will use the following property of Gaussians in proving Lemma 10.

Fact 4. Let ρ = Φ^{-1}(3/4). For any r ≥ 0, Δ > 0 and x ∈ R,

    | N_{u,σ²}((−∞, x]) − 1/2 | ≤ r  ⟹  |x − u| ≤ 5rσ,  and    (5)
    | N_{u,σ²}([u − ρσ, u + ρσ]) − N_{u,σ²}([u − Δ, u + Δ]) | ≤ r  ⟹  |Δ − ρσ| ≤ 2rσ.    (6)

Definition 5. We will say that a mixture of k Gaussians Σ_{i=1}^k w_i N_{u_i,σ_i²I} in R^d is β-separated, where β > 0, if the following pairwise separation condition holds:

    ∀ i ≠ j:  ‖u_i − u_j‖ ≥ 30 √( ln( w_j / (w_i β) ) + 3 ) (σ_i + σ_j).    (7)

Our required separation condition (2) implies that the mixture to be learned is β-separated for β = εδw_min/(44dk). We also need to define, for all i, j with i ≠ j,

    A_ij = { x : w_i N_{u_i,σ_i²I}(x) ≥ w_j N_{u_j,σ_j²I}(x) }  and  A_i = ∩_{j≠i} A_ij,    (8)

i.e. A_ij is the set where Gaussian i is more likely than j, and A_i is the set where i is the most likely Gaussian. Note that A_i is precisely the set of points for which MAP will return label i.

Consider two Gaussians which are far apart, and a third Gaussian which is much closer to the first Gaussian than to the second. The following lemma shows that, with high probability, the first Gaussian is more likely than the second one at points drawn from the third Gaussian.

Lemma 6. Consider two Gaussians N_{u_1,σ_1²I}, N_{u_2,σ_2²I}. Let N_{ũ,σ̃²I} be another Gaussian such that

    ‖u_1 − ũ‖ ≤ (1/10) min{ σ̃, ‖u_1 − u_2‖ }  and  |σ̃ − σ_1| ≤ (1/10) min{ σ̃, σ_1 }.    (9)

Then for any t > 0,

    N_{ũ,σ̃²I}( { x : N_{u_1,σ_1²I}(x) ≤ t N_{u_2,σ_2²I}(x) } ) ≤ t e^{ −‖u_1−u_2‖² / (80(σ_1² + σ_2²)) + 3 }.    (10)

Proof (of Lemma 6). Assume w.l.o.g. that ũ = 0. By Markov's inequality, for X ~ N_{ũ,σ̃²I},

    N_{ũ,σ̃²I}( { x : N_{u_1,σ_1²I}(x) ≤ t N_{u_2,σ_2²I}(x) } )
        = Pr[ N_{u_2,σ_2²I}(X) / N_{u_1,σ_1²I}(X) ≥ 1/t ]
        ≤ t E[ N_{u_2,σ_2²I}(X) / N_{u_1,σ_1²I}(X) ].

Evaluating this Gaussian integral by completing the square, and using assumption (9) to bound the resulting terms, gives an upper bound of t e^{ −‖u_1−u_2‖²/(80(σ_1²+σ_2²)) + 3 }, which is (10).

Next, using Lemma 6, we show that in a β-separated mixture, with high probability, component i is more likely a posteriori than any other component j ≠ i at a random point generated by component i itself.

Lemma 7. For any β > 0 and any β-separated mixture of k Gaussians Σ_i w_i N_{u_i,σ_i²I}, for all i ≠ j,

    N_{u_i,σ_i²I}(A_ij) > 1 − β.    (11)

Proof (of Lemma 7). Applying Lemma 6 to Gaussians i, j (with t = w_j/w_i), we have

    N_{u_i,σ_i²I}(A_ij) ≥ 1 − (w_j/w_i) e^{ −‖u_j−u_i‖² / (80(σ_i² + σ_j²)) + 3 }.

Note that condition (9) required for Lemma 6 is satisfied trivially, since in this case N_{u_1,σ_1²I} and N_{ũ,σ̃²I} in Lemma 6 are the same. Lemma 7 now follows by substituting separation (7) in the bound above.
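To make Lemma 7 concrete, here is a small simulation sketch (ours, not part of the paper; all names are illustrative) that estimates N_{u_i,σ_i²I}(A_i) for a given mixture by sampling from component i and checking whether its MAP label is i:

import numpy as np

def estimate_map_accuracy(weights, means, sigmas, i, n_samples=10000, seed=0):
    """Monte-Carlo estimate of N_{u_i, sigma_i^2 I}(A_i): the probability that a
    point drawn from component i is MAP-labeled i (cf. Lemma 7)."""
    rng = np.random.default_rng(seed)
    k, d = means.shape
    x = means[i] + sigmas[i] * rng.standard_normal((n_samples, d))
    # log of w_l * N(x; u_l, sigma_l^2 I) for every component l
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)          # (n, k)
    log_post = (np.log(weights)[None, :]
                - 0.5 * d * np.log(2 * np.pi * sigmas ** 2)[None, :]
                - sq_dist / (2 * sigmas ** 2)[None, :])
    return float(np.mean(np.argmax(log_post, axis=1) == i))

Under a separation of the form (7) this fraction should exceed 1 − β for every i, whereas for nearly concentric components with very unequal weights it collapses toward 0 for the lighter components, which is the failure mode discussed in Section 2.1.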

Remark 8. The separation in (7) is the minimum required (up to a constant factor) such that (11) holds, i.e. such that the i-th Gaussian is more likely than the others at most points generated by itself. One can show this by using the following inequality, due to Bretagnolle and Huber (see e.g. [Devroye 01], Chapter 5, Exercise 5.6), for densities f_1, f_2:

    ∫ min{ f_1, f_2 } ≥ (1/2) e^{ −KL(f_1 ‖ f_2) }.

Applying this to N_{u_i,σ_i²I}, N_{u_j,σ_j²I} for the case w_i = w_j yields, using Fact 3,

    1 − N_{u_i,σ_i²I}(A_ij) ≥ (1/2) e^{ −KL( N_{u_i,σ_i²I} ‖ N_{u_j,σ_j²I} ) } ≥ (1/2) e^{ −‖u_i−u_j‖² / max{σ_i, σ_j}² },

which shows that separation (7) is tight.

3 Learning Algorithm and Analysis

We first state a subroutine, EstimateParameters, which takes a set of labeled samples as input and uses a sample-median-based estimator for computing the parameters of each Gaussian component. This subroutine is used by the main Algorithm 1.

The following lemma will be used to show that in step 4 of Algorithm 1, when the estimated parameters {(ū_i, σ̄_i, w̄_i)}_{i=1}^k are sufficiently close to the true parameters {(u_i, σ_i, w_i)}_{i=1}^k, component N_{ū_i,σ̄_i²I} is more likely than any other N_{ū_j,σ̄_j²I} at most points generated by the i-th true component N_{u_i,σ_i²I}. This means that once our estimates have converged close to the true parameters, we need not make any more calls to the MAP oracle and can instead approximate MAP using {(ū_i, σ̄_i, w̄_i)}_{i=1}^k.

Lemma 9. Let β > 0. Let Σ_i w_i N_{u_i,σ_i²I} be a β-separated mixture of k Gaussians, and let Σ_i w̄_i N_{ū_i,σ̄_i²I} be another mixture of k Gaussians such that, for all i,

    ‖ū_i − u_i‖ ≤ σ_i/8,  |σ̄_i − σ_i| ≤ σ_i/10  and  |w̄_i − w_i| ≤ w_i/10.    (12)

Then for all i ≠ j,

    N_{u_i,σ_i²I}( { x : w̄_i N_{ū_i,σ̄_i²I}(x) ≥ w̄_j N_{ū_j,σ̄_j²I}(x) } ) ≥ 1 − β.    (13)

Proof (of Lemma 9). Condition (12) and β-separation (7) imply that condition (9) required for Lemma 6 is satisfied for the Gaussians N_{ū_i,σ̄_i²I}, N_{ū_j,σ̄_j²I} and N_{u_i,σ_i²I}. Hence by Lemma 6,

    N_{u_i,σ_i²I}( { w̄_i N_{ū_i,σ̄_i²I} < w̄_j N_{ū_j,σ̄_j²I} } )
        ≤ (w̄_j/w̄_i) e^{ −‖ū_i−ū_j‖² / (80(σ̄_i² + σ̄_j²)) + 3 }
        ≤ 2 (w_j/w_i) e^{ −‖u_i−u_j‖² / (160(σ_i² + σ_j²)) + 4 },    (14)

where we have used the fact that, under separation (7) and condition (12), ‖ū_i − ū_j‖ ≥ (4/5)‖u_i − u_j‖, (9/10)w_i ≤ w̄_i ≤ (11/10)w_i, (9/10)w_j ≤ w̄_j ≤ (11/10)w_j and (9/10)σ_i ≤ σ̄_i ≤ (11/10)σ_i, (9/10)σ_j ≤ σ̄_j ≤ (11/10)σ_j. Substituting (7) into (14) gives the lemma.

Subroutine 1: EstimateParameters
Input: Set S of labeled samples
Output: { (ũ_i, σ̃_i, w̃_i) }_{i=1}^k
begin
    for i = 1 ... k do
        S_i ← { x : (x, i) ∈ S };
        ũ_i ← ( median_{x∈S_i} x·e_1, median_{x∈S_i} x·e_2, ..., median_{x∈S_i} x·e_d );
        σ̃_i ← (1/ρ) · median_{x∈S_i} | x·e_1 − ũ_i·e_1 |,  where ρ = Φ^{-1}(3/4);
        w̃_i ← |S_i| / ( |S_1| + |S_2| + ... + |S_k| );
    end
end

For estimating σ̃_i in Subroutine 1 we could have projected the points in S_i along any e_l instead of e_1. Intuitively, if the points in S_i were drawn from the exact distribution N_{u_i,σ_i²I}, then ũ_i = u_i and σ̃_i = σ_i in expectation. In our case, however, because of the overlap between the Gaussians there will be some samples in S_i which were generated by other components j ≠ i. We use the sample median to filter out these outliers.

A naïve learning algorithm would be to draw O(ln(k/δ)/ε²) unlabeled samples and then invoke MAP on each of them; the points labeled i could then be used to compute the parameters of the i-th Gaussian. However, we show below that Algorithm 1 actually makes far fewer calls to MAP by requesting labels only initially. The number of labels required thus decreases to O(ln(k/δ)), i.e. independent of ε.

Algorithm 1 is based on Lemma 7 and Lemma 9; the parameters m₁, m₂ are set as stated in Theorem 11. Algorithm 1 works in two phases. In the first phase (steps 1, 2) it uses the MAP oracle to label each random sample; the number of samples m₁ is sufficiently large for the estimated centres {ū_i}_{i=1}^k to be within constant distance O(1) of the respective true centres {u_i}_{i=1}^k. In the second phase (steps 3, 4, 5) Algorithm 1 does not need to use MAP at all. Instead it uses the estimates {(ū_i, σ̄_i, w̄_i)}_{i=1}^k from the first phase to approximate the MAP oracle (see Lemma 9), and the final estimated centres {û_i}_{i=1}^k are guaranteed to be within distance εσ_i of the respective true ones. We first analyze subroutine EstimateParameters before analyzing Algorithm 1.
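The median-based estimator is straightforward to express in code. The sketch below (ours, not the paper's; names are illustrative) mirrors Subroutine 1: coordinate-wise medians for the centre, a median absolute deviation along e_1 rescaled by Φ^{-1}(3/4) for the spread, and empirical label frequencies for the weights.

import numpy as np
from scipy.stats import norm

RHO = norm.ppf(0.75)  # Phi^{-1}(3/4), the rescaling constant in Subroutine 1

def estimate_parameters(samples, labels, k):
    """Median-based parameter estimates from labeled samples (cf. Subroutine 1).

    samples: (n, d) array; labels: (n,) array with values in {0, ..., k-1}.
    Assumes every label occurs at least once. Returns (means, sigmas, weights).
    """
    n, d = samples.shape
    means = np.zeros((k, d))
    sigmas = np.zeros(k)
    weights = np.zeros(k)
    for i in range(k):
        S_i = samples[labels == i]
        means[i] = np.median(S_i, axis=0)                               # coordinate-wise medians
        sigmas[i] = np.median(np.abs(S_i[:, 0] - means[i, 0])) / RHO    # MAD along e_1, rescaled
        weights[i] = len(S_i) / n
    return means, sigmas, weights

The median (rather than the mean) is what makes the estimates robust to the small fraction of points in S_i that were in fact generated by other components.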

Algorithm 1: Learning Algorithm for Mixture of Gaussians
Input: Sample sizes m₁, m₂
Output: { (û_i, σ̂_i, ŵ_i) }_{i=1}^k
begin
    S ← ∅; S′ ← ∅;
    1:  for i = 1 ... m₁ do
            x ← SAMP(); l ← MAP(x); S ← S ∪ {(x, l)};
        end
    2:  { (ū_i, σ̄_i, w̄_i) }_{i=1}^k ← EstimateParameters(S);
    3:  for i = 1 ... m₂ do
            x ← SAMP();
    4:      l′ ← argmax_l w̄_l N_{ū_l,σ̄_l²I}(x); S′ ← S′ ∪ {(x, l′)};
        end
    5:  { (û_i, σ̂_i, ŵ_i) }_{i=1}^k ← EstimateParameters(S′);
end
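A compact end-to-end sketch of the two-phase control flow (again ours, with hypothetical names; it assumes map_oracle, samp_oracle and estimate_parameters as sketched earlier and is only meant to illustrate the structure of Algorithm 1):

import numpy as np

def learn_mixture(samp, map_oracle, estimate_parameters, k, d, m1, m2):
    """Two-phase learner in the spirit of Algorithm 1.

    Phase 1: label m1 fresh samples with the MAP oracle, run the median estimator.
    Phase 2: label m2 fresh samples with the *plug-in* MAP rule built from the
    phase-1 estimates (no further oracle calls), then re-run the median estimator.
    Assumes every component receives at least one label in each phase.
    """
    # Phase 1: oracle-labeled samples.
    xs1 = np.array([samp() for _ in range(m1)])
    ls1 = np.array([map_oracle(x) for x in xs1])
    means1, sigmas1, weights1 = estimate_parameters(xs1, ls1, k)

    # Phase 2: plug-in MAP labels computed from the coarse phase-1 estimates.
    def plugin_map(x):
        sq = np.sum((means1 - x) ** 2, axis=1)
        score = (np.log(weights1) - 0.5 * d * np.log(2 * np.pi * sigmas1 ** 2)
                 - sq / (2 * sigmas1 ** 2))
        return int(np.argmax(score))

    xs2 = np.array([samp() for _ in range(m2)])
    ls2 = np.array([plugin_map(x) for x in xs2])
    return estimate_parameters(xs2, ls2, k)

The point of the split is that m₁, the only quantity that costs oracle calls, does not need to grow as the target accuracy ε shrinks; only m₂ does.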

Lemma 10. Let 1 > δ′ > 0 and 1/10 > ε′ > 0, 1/10 > β′ > 0. Let the input S to EstimateParameters be such that the points { x : (x, i) ∈ S } are independent samples from the mixture of k Gaussians Σ_i w_i N_{u_i,σ_i²I}. Assume |S| ≥ (16/(ε′² w_min)) ln(2d(k+1)/δ′). Further, in EstimateParameters assume that, for each i = 1 ... k, the set S_i (i.e. the points labeled i) consists of independent samples from a density μ′_i, where

    ‖ μ′_i − N_{u_i,σ_i²I} ‖_1 ≤ β′,    (15)

and that

    w_i (1 − β′) ≤ E[ |S_i| / |S| ] ≤ w_i (1 + β′).    (16)

Then with probability at least 1 − δ′, for all i,

    ‖ũ_i − u_i‖ ≤ 5(ε′ + β′) σ_i,  |σ̃_i − σ_i| ≤ 9(ε′ + β′) σ_i  and  |w̃_i − w_i| ≤ 2(ε′ + β′) w_i.

Proof (of Lemma 10). From the Chernoff bound (4) and assumption (16), it follows that, given |S| ≥ (16/(ε′² w_min)) ln(2d(k+1)/δ′), with probability at least 1 − δ′/(2(k+1)), for each i = 1 ... k, |S_i| ≥ (16/ε′²) ln(2d(k+1)/δ′) and |w̃_i − w_i| ≤ 2(ε′ + β′) w_i.

Consider any e_l. For each i, it follows from condition (15) that

    | Pr_{X∼μ′_i}[ X·e_l ≤ ũ_i·e_l ] − N_{u_i·e_l, σ_i²}( (−∞, ũ_i·e_l] ) | ≤ β′.    (17)

Let F_{i,l} denote the empirical distribution of the points in S_i projected onto e_l, i.e. F_{i,l}(A) = |{ x ∈ S_i : x·e_l ∈ A }| / |S_i|, and note that ũ_i·e_l is the median of F_{i,l}. If |S_i| ≥ (16/ε′²) ln(2d(k+1)/δ′), it follows from the Chernoff bound (4) that, with probability at least 1 − δ′/(2d(k+1)),

    | F_{i,l}( (−∞, ũ_i·e_l] ) − Pr_{X∼μ′_i}[ X·e_l ≤ ũ_i·e_l ] | ≤ ε′.    (18)

Combining (17), (18) and the fact that ũ_i·e_l is the median of F_{i,l}, we have

    | N_{u_i·e_l, σ_i²}( (−∞, ũ_i·e_l] ) − 1/2 | ≤ β′ + ε′.    (19)

Now (19) and bound (5) in Fact 4 together imply that |ũ_i·e_l − u_i·e_l| ≤ 5(β′ + ε′) σ_i. Hence, applying the union bound over all directions l = 1, ..., d, we have that with probability at least 1 − δ′/(2(k+1)), ‖ũ_i − u_i‖ ≤ 5(β′ + ε′) σ_i.

For σ̃_i a similar argument applies. Recall ρ = Φ^{-1}(3/4). From condition (15), it follows that

    | Pr_{X∼μ′_i}[ |X·e_1 − ũ_i·e_1| ≤ ρσ̃_i ] − N_{u_i·e_1, σ_i²}( [ũ_i·e_1 − ρσ̃_i, ũ_i·e_1 + ρσ̃_i] ) | ≤ β′.    (20)

Note that F_{i,1}( [ũ_i·e_1 − ρσ̃_i, ũ_i·e_1 + ρσ̃_i] ) = 1/2 by the definition of σ̃_i. If |S_i| ≥ (16/ε′²) ln(2d(k+1)/δ′), then by the Chernoff bound (4), with probability at least 1 − δ′/(2d(k+1)),

    | F_{i,1}( [ũ_i·e_1 − ρσ̃_i, ũ_i·e_1 + ρσ̃_i] ) − Pr_{X∼μ′_i}[ |X·e_1 − ũ_i·e_1| ≤ ρσ̃_i ] | ≤ ε′.    (21)

Since N_{u_i·e_1, σ_i²}( [u_i·e_1 − ρσ_i, u_i·e_1 + ρσ_i] ) = 1/2, combining (20), (21) we get

    | N_{u_i·e_1, σ_i²}( [u_i·e_1 − ρσ_i, u_i·e_1 + ρσ_i] ) − N_{u_i·e_1, σ_i²}( [ũ_i·e_1 − ρσ̃_i, ũ_i·e_1 + ρσ̃_i] ) | ≤ β′ + ε′.    (22)

Now it follows from (19) that

    | N_{u_i·e_1, σ_i²}( [ũ_i·e_1 − ρσ̃_i, ũ_i·e_1 + ρσ̃_i] ) − N_{u_i·e_1, σ_i²}( [u_i·e_1 − ρσ̃_i, u_i·e_1 + ρσ̃_i] ) | ≤ 2(β′ + ε′).    (23)

Combining (22), (23) we have, by the triangle inequality,

    | N_{u_i·e_1, σ_i²}( [u_i·e_1 − ρσ_i, u_i·e_1 + ρσ_i] ) − N_{u_i·e_1, σ_i²}( [u_i·e_1 − ρσ̃_i, u_i·e_1 + ρσ̃_i] ) | ≤ 3(β′ + ε′).    (24)

It follows from (24) and bound (6) in Fact 4 that |σ̃_i − σ_i| ≤ (6/ρ)(β′ + ε′) σ_i < 9(β′ + ε′) σ_i, with probability at least 1 − δ′/(2(k+1)). Hence, if |S_i| ≥ (16/ε′²) ln(2d(k+1)/δ′), then with probability at least 1 − δ′/(2k), ‖ũ_i − u_i‖ ≤ 5(ε′ + β′) σ_i, |σ̃_i − σ_i| ≤ 9(ε′ + β′) σ_i and |w̃_i − w_i| ≤ 2(ε′ + β′) w_i. The lemma now follows by applying the union bound over all i = 1 ... k.

We finally state our main theorem.

Theorem 11. Let 0 < ε < 1/20 and 0 < δ < 1. Then for any mixture of k Gaussians μ = Σ_i w_i N_{u_i,σ_i²I} satisfying the following separation condition,

    ∀ i ≠ j:  ‖u_i − u_j‖ ≥ 30 √( ln( 44dk w_j / (εδ w_min w_i) ) + 3 ) (σ_i + σ_j),    (25)

Algorithm 1 with m₁ = 10⁶ ln(4(k+1)/δ) and m₂ = (2000/ε²) ln(4(k+1)/δ) returns {(û_i, σ̂_i, ŵ_i)}_{i=1}^k such that, with probability at least 1 − δ, for all i,

    ‖u_i − û_i‖ ≤ ε σ_i,  |σ̂_i − σ_i| ≤ ε σ_i  and  |ŵ_i − w_i| ≤ ε w_i.    (26)

Proof (of Theorem 11). Consider running Algorithm 1 with parameters m₁ = 10⁶ ln(4(k+1)/δ) and m₂ = (2000/ε²) ln(4(k+1)/δ). Note that Algorithm 1 makes only m₁ calls to MAP.

We first analyze steps 1, 2. Under separation (25), our mixture μ is β₁-separated, where β₁ = εδw_min/(44dk). Therefore by Lemma 7 we have

    ∀ i ≠ j:  N_{u_i,σ_i²I}(A_ij) ≥ 1 − β₁,    (27)

which implies, using the union bound, that for all i and all j ≠ i,

    N_{u_j,σ_j²I}(A_i) ≤ β₁  and  N_{u_i,σ_i²I}(A_i) ≥ 1 − kβ₁.    (28)

From (28) it follows that

    w_i (1 − β₂) ≤ Σ_j w_j N_{u_j,σ_j²I}(A_i) ≤ w_i (1 + β₂),  where β₂ := kβ₁/w_min ≤ ε/44.    (29)

Now note that, for each i, the set of points S_i which are labeled i in step 1 is drawn independently from μ′_i, where

    μ′_i(x) = ( Σ_j w_j N_{u_j,σ_j²I}(x) ) / ( Σ_j w_j N_{u_j,σ_j²I}(A_i) )  if x ∈ A_i,  and 0 otherwise.    (30)

Using (27), (28) and (29) we have that, for each i,

    ‖ μ′_i − N_{u_i,σ_i²I} ‖_1 ≤ N_{u_i,σ_i²I}( R^d \ A_i ) + ∫_{A_i} | μ′_i − N_{u_i,σ_i²I} | ≤ β₂ + β₂/(1 − β₂) + 2β₂ < 4β₂.    (31)

Now |S| = m₁ = 10⁶ ln(4(k+1)/δ) and, by (29), w_i(1 − β₂) ≤ E[ |S_i|/|S| ] ≤ w_i(1 + β₂). Hence we can apply Lemma 10 (with ε′ = 1/80, β′ = 4β₂ ≤ ε/8 and δ′ = δ/2) to the call to EstimateParameters in step 2. Thus Lemma 10 implies that, with probability at least 1 − δ/2, for all i,

    ‖ū_i − u_i‖ ≤ σ_i/8,  |σ̄_i − σ_i| ≤ σ_i/10  and  |w̄_i − w_i| ≤ w_i/45.    (32)

Next we analyze steps 3, 4 and 5. For each i, the set of samples S′_i which are labeled i in step 3 is drawn independently from the distribution μ̄_i, defined as

    μ̄_i(x) = ( Σ_j w_j N_{u_j,σ_j²I}(x) ) / ( Σ_j w_j N_{u_j,σ_j²I}(Ā_i) )  if x ∈ Ā_i,  and 0 otherwise,    (33)

where Ā_i = ∩_{j≠i} Ā_ij and Ā_ij = { x : w̄_i N_{ū_i,σ̄_i²I}(x) ≥ w̄_j N_{ū_j,σ̄_j²I}(x) }.

If (32) holds, then by Lemma 9 we have, for each i ≠ j, N_{u_i,σ_i²I}(Ā_ij) ≥ 1 − β₁. This implies (the proof is similar to that of (31), (29) above) that, for each i, ‖ μ̄_i − N_{u_i,σ_i²I} ‖_1 ≤ 4β₂ ≤ ε/8 and w_i(1 − β₂) ≤ E[ |S′_i|/|S′| ] ≤ w_i(1 + β₂). Also |S′| = m₂ = (2000/ε²) ln(4(k+1)/δ). So if (32) holds, we can apply Lemma 10 to the call to EstimateParameters in step 5, with ε′ = ε/18, β′ = 4β₂ and δ′ = δ/2. This gives us that, with probability at least 1 − δ/2, for all i,

    ‖û_i − u_i‖ < (5/9) ε σ_i,  |σ̂_i − σ_i| < ε σ_i  and  |ŵ_i − w_i| < (4/9) ε w_i.    (34)

Finally, by the union bound (over the two calls to EstimateParameters), we have that (34), and hence (26), holds with probability at least 1 − δ.

4 Conclusion

We have described an algorithm, Algorithm 1, which, given random samples generated by an unknown mixture of Gaussians and the maximum-a-posteriori oracle (1), is able to recover its parameters assuming only a mild separation condition (25) between the component Gaussians holds. There are a number of ways of extending our algorithm. For instance, we can prove the correctness of our algorithm only for spherical Gaussians, and it is likely that one can modify our algorithm and the separation condition to work for arbitrary Gaussians as well. One can also ask how our algorithm could better utilize unlabeled samples; this could probably give a better error bound and a better label complexity. Finally, one can try to give algorithms for more realistic settings by allowing the unlabeled samples or the oracle to be noisy.

Acknowledgment

I am grateful to Daniel Štefankovič for helpful discussions and for suggesting improvements to an earlier version of this paper. I would also like to thank the anonymous reviewers for helpful comments. This material is based upon work supported by National Science Foundation grants CCF-09045, IIS-04800, ISS and University of Pennsylvania subcontract NSF #. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the above named organizations.

References

[Achlioptas 05] Dimitris Achlioptas & Frank McSherry. On Spectral Learning of Mixtures of Distributions. In COLT, 2005.
[Arora 01] Sanjeev Arora & Ravi Kannan. Learning mixtures of arbitrary Gaussians. In STOC, 2001.
[Basu 02] Sugato Basu, Arindam Banerjee & Raymond J. Mooney. Semi-supervised Clustering by Seeding. In ICML, 2002.
[Basu 04] Sugato Basu, Arindam Banerjee & Raymond J. Mooney. Active Semi-Supervision for Pairwise Constrained Clustering. In SDM, 2004.
[Belkin 10] Mikhail Belkin & Kaushik Sinha. Polynomial Learning of Distribution Families. In IEEE FOCS, 2010.
[Castelli 96] Vittorio Castelli & Thomas M. Cover. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Transactions on Information Theory, vol. 42, no. 6, 1996.
[Dasgupta 99] Sanjoy Dasgupta. Learning Mixtures of Gaussians. In FOCS, 1999.
[Dasgupta 07] Sanjoy Dasgupta & Leonard J. Schulman. A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians. Journal of Machine Learning Research, vol. 8, pages 203-226, 2007.
[Dempster 77] A. P. Dempster, N. M. Laird & D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, vol. 39, pages 1-38, 1977.
[Devroye 01] Luc Devroye & Gábor Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics, Springer-Verlag, New York, 2001.
[Dubhashi 09] Devdatt Dubhashi & Alessandro Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009.
[Feldman 06] Jon Feldman, Rocco A. Servedio & Ryan O'Donnell. PAC Learning Axis-Aligned Mixtures of Gaussians with No Separation Assumption. In COLT, 2006.

[Kannan 08] Ravindran Kannan, Hadi Salmasian & Santosh Vempala. The Spectral Method for General Mixture Models. SIAM J. Comput., vol. 38, no. 3, 2008.
[Melnykov 10] Volodymyr Melnykov & Ranjan Maitra. Finite mixture models and model-based clustering. Statist. Surv., vol. 4, 2010.
[Moitra 10] Ankur Moitra & Gregory Valiant. Settling the Polynomial Learnability of Mixtures of Gaussians. In IEEE FOCS, 2010.
[Shental 03] Noam Shental, Aharon Bar-Hillel, Tomer Hertz & Daphna Weinshall. Computing Gaussian Mixture Models with EM Using Equivalence Constraints. In NIPS, 2003.
[Vempala 02] Santosh Vempala & Grant Wang. A Spectral Algorithm for Learning Mixtures of Distributions. In IEEE FOCS, 2002.
[Xing 02] Eric P. Xing, Andrew Y. Ng, Michael I. Jordan & Stuart J. Russell. Distance Metric Learning with Application to Clustering with Side-Information. In NIPS, 2002.
