Learning Mixtures of Gaussians with Maximum-a-posteriori Oracle
|
|
- Gloria Hutchinson
- 6 years ago
- Views:
Transcription
1 Satyaki Mahalanabis Dept of Computer Science University of Rochester Abstract We consier the problem of estimating the parameters of a mixture of istributions, where each component istribution is from a given parametric family eg exponential, Gaussian etc We efine a learning moel in which the learner has access to a maximum-a-posteriori oracle which given any sample from a mixture of istributions, tells the learner which component istribution was the most likely to have generate it We escribe a learning algorithm in this setting which accurately estimates the parameters of a mixture of k spherical Gaussians in R assuming the component Gaussians satisfy a mil separation conition Our algorithm uses only polynomially many in, k) samples an oracle calls, an our separation conition is much weaker than those require by unsupervise learning algorithms like [Arora 0, Vempala 0] Introuction Learning mixtures of istributions, such as mixture of Gaussians, is one of the most well-stuie problems in machine learning see eg [Melnykov 0]) However, most of the known learning algorithms work in an unsupervise setting an not much is known theoretically about how sie information eg in the form of class labels [Basu 0] or pairwise constraints [Shental 03]) can help the learner In this paper, we efine a learning moel in which the learner has access to a maximum-a-posteriori oracle, which for any sample from a mixture of istributions, tells the learner which component istribution was most likely to have generate it Our maximum-a-posteriori oracle can be thought of as an expert who can be aske to guess the most likely Appearing in Proceeings of the 4 th International Conference on Artificial Intelligence an Statistics AISTATS) 0, Fort Lauerale, FL, USA Volume 5 of JMLR: W&CP 5 Copyright 0 by the authors class label for a sample point Then we emonstrate the avantage provie by this oracle by giving an algorithm Algorithm ) which recovers the parameters of a mixture of k, -imensional spherical Gaussians uner a weaker separation assumption an more efficiently than known unsupervise algorithms The number of oracle calls mae by our algorithm is polynomial in, k an is inepenent of the esire error The problem of learning mixture of Gaussians has a long history Various learning moels have been consiere in literature There is the clustering framework [Dasgupta 99, Arora 0] in which the goal is to correctly label each sample generate by the mixture with the Gaussian that generate it Then there is the istribution learning moel [Felman 06] in which the goal is to output a hypothesis mixture which is close to the unknown mixture as possible Probably the most popular moel however is one in which the parameters of the original mixture nee to be recovere eg [Dempster 77, Belkin 0] an this is the moel we use EM [Dempster 77] is perhaps the most well known algorithm in this moel Note that for the clustering problem, it is necessary to have a separation assumption [Arora 0], unlike for the problem of recovering parameters Note also that our moel is more ifficult than learning the istribution Almost all algorithms for clustering or parameter learning which give provable guarantees also require that each Gaussian in the hypothesis mixture be separate from others [Vempala 0, Arora 0] For the clustering framework, [Arora 0] give a lower boun on the separation for the case of spherical Gaussians We also nee to make an separation assumption see 7)) which is much weaker than those require by eg [Arora 0, Vempala 0] However we have access to a maximum-a-posteriori oracle whereas the settings in [Arora 0, Vempala 0, Dasgupta 99, Belkin 0, Moitra 0] are unsupervise Recently [Belkin 0, Moitra 0] have given algorithms for learning parameters which require no separation assumption However the absence of a separation assumption means that their algorithms, unlike ours or [Arora 0, Vempala 0], may require exponentially many samples in 489
2 the number of mixture components Our algorithm is also, not surprisingly, simpler than theirs We also note that our results are for spherical Gaussians only The algorithm in [Dasgupta 99] works only when all Gaussians have the same covariance matrix This was extene to Gaussians having arbitrary covariance matrices by [Arora 0] The results in [Vempala 0], like ours, are for spherical Gaussians They were the first to use spectral methos, an their technique has since been extene to arbitrary log-concave istributions [Kannan 08, Achlioptas 05] Algorithms in [Belkin 0, Moitra 0] can learn mixtures of arbitrary gaussians Even though semi-supervise clustering has been a popular area of research eg [Basu 0, Xing 0, Basu 04], not much is known theoretically about semi-supervise or active learning of mixture of Gaussians A variant of EM using pairwise constraints is given by [Shental 03] They have two types of constraints: pairs of points which were generate by the same Gaussian, an pairs which were not These type of pairwise relations seem to be the most popular moel of semi-supervise learning of mixture moels [Melnykov 0] In our moel we consier a maximum-a-posteriori oracle an as we point out in Section, learning with this oracle can be more challenging than with pairwise constraints Our setting is ifferent from semi-supervise learning because the learner is allowe to choose the points for which it wants labels However the algorithm we give for mixtures of Gaussians queries the label of every sample it raws initially an later on uses only unlabele samples Finally, while our aim in this paper is to theoretically emonstrate the avantage of the maximum-a-posteriori oracle, the oracle moels an expert in a natural way an shoul fin practical applications The various UCI atasets iscusse in the Experimental Results section of [Shental 03] are examples where our oracle learning moel coul be applicable Our Algorithm can be use whenever a ataset can be moele reasonably accurately using a mixture of Gaussians an sie information is available in the form of which clusters selecte ata points belong to The paper is organize as follows In Section we first efine our learning moel subsection ), an then we state our result about learning mixture of Gaussians in this moel subsection ) Then in Section 3 we justify the separation assumption that we require, an then present our algorithm an analyze it Finally, in Section 4 we iscuss how our moel an our algorithm coul be extene Summary of Contributions We first efine the maximum-a-posteriori oracle an then give our result for mixtures of gaussians Learning with Maximum-a-posteriori Oracle We first efine the learning moel Consier a mixture ensity in R efine as µx) = k i= w iµ i x), where i w i = an = min w i > 0 The ensities {µ i } k i= belong to a parametric family eg uniform, i Gaussian etc) an have parameters {θ i } k i= respectively Given an unknown mixture ensity, the goal of a learner is to output estimates {ˆθ i, ŵ i )} k i= of {θ i, w i )} k i= respectively The learner can raw inepenent unlabele) samples from µ by invoking an oracle calle SAMP For any sample x returne by SAMP, the learner can also choose to invoke an oracle, MAP, which labels x with the most likely ensity that coul have generate x ie MAPx) = argmax p i x) where i i, p i x) = w i µ i x) / w j µ j x) is the posterior istribution on { k} For the case of mixture of Gaussians with no two Gaussians being concentric), note that the set of points where or more ensities are equally likely has measure 0 an hence for a ranom sample from µ almost surely there is a unique most likely a posteriori) ensity as efine by ) We point out that in our moel the number of mixture components k is known to the learner Typically we have k We next compare our moel with other similar settings We point out that learning using the MAP oracle is more ifficult than in a semi-supervise setting where each point in a ranom subset of samples is labele accoring to which component actually generate it ie the label is rawn ranomly from posterior p i ) To see this consier a mixture of or more ientical almost concentric Gaussians in which one has a much greater weight than the others If points are labele accoring to the posterior, enough samples from each Gaussian can obtaine to estimate the parameters with arbitrary accuracy whereas the MAP oracle woul only return the label of the heaviest Gaussian an hence MAP oes not help recover the parameters of the other Gaussians Learning with the MAP oracle is more challenging than with pairwise constraints [Shental 03] as well because with sufficiently many constraints one can iscover the actual label of every sample an recover all the parameters with arbitrary accuracy Our moel is ifferent from that of [Castelli 96] where it is assume that the parameters of each component are known the mixing weights are unknown), an the learner just nees to be able to approximate the posterior The example of a mixture of or more ientical almost concentric Gaussians given above shows that in orer for the MAP oracle to help the learner, the component ensities nee to be sufficiently separate Such separation assumptions are often require for unsupervise learning eg see j ) 490
3 Satyaki Mahalanabis Table for the case of mixture of Gaussians), though the separation we require shoul be smaller because we have help from MAP Finally note that any learning algorithm requires Ω/ ) unlabele samples so that there are sufficiently many samples from each component from which to estimate their parameter Mixture of Gaussians Let N u,σ to enote a Gaussian with mean u an covariance matrix Σ The spherical Gaussian in R has ensity N u,σ I x) = πσ ) e x u /σ Table : Sample complexity an separation for learning mixture of Gaussians all algorithms except ours are unsupervise) Separation Complexity [Dasgupta 99] Ω / ) max{σ i, σ j} Õ polyk)) [Dasgupta 07] Ω /4 ) max{σ i, σ j} Õk ) [Arora 0] Ω /4 ln ) max{σ i, σ j} Õ polyk)) [Vempala 0] Ωk /4 lnk)) max{σ i, σ j} Õ 3 k ) [Belkin 0] none poly) e polyk) [Moitra 0] none poly) e polyk) Ours Ω p lnk)) max{σ i, σ j} Õk) where I is the ientity matrix, an enotes the eucliean norm We will emonstrate the avantage provie by the MAP oracle by giving an algorithm Algorithm ) which learns a mixture of Gaussians µx) = k i= w in ui,σ i I x) at a smaller separation than previous unsupervise algorithms like [Dasgupta 99, Arora 0] see Table ) Our main result an separation conition are as follows Theorem Restate from Theorem, Section 3) There is an algorithm which, for any, 0 < < 0, 0 < δ < an for any mixture of k Gaussians µ = i w i N uiσi I satisfying the following separation conition u i u j i j, 30 ln 44 ) / + 3 ) σi + σ ln, j raws at most 0 6 ln 4k+) δ ) 4k+) δ ) ) unlabele) samples, makes 06 ln calls to MAP, an returns {û i, ˆσ i, ŵ i )} k i= such that with probability at least δ, i, u i û i σ i, ˆσ i σ i σ i an ŵ i w i w i 3) Table compares our separation assumption ) with that require by previous unsupervise algorithms We want to be able to learn the mixture parameters for as small a separation as possible, ieally at most polylogarithmic in, k as in ) Our require separation is smaller than that require by eg [Vempala 0] Omax{σ i, σ j }k /4 polylog, k))) Note that uner separation ), the overlap between the Gaussians is such that unsupervise istance base clustering algorithms like [Arora 0, Vempala 0]) may not work We point out that for spherical Gaussians [Arora 0] show that a separation of Ω /4 ) max{σ i, σ j } is necessary such that one can tell with high probability which Gaussian actually generate each sample from µ Recently [Belkin 0, Moitra 0] have given algorithms for mixture of Gaussians which o not require any separation assumption However [Moitra 0] prove that any unsupervise) learning algorithm for mixture of Gaussians with no separation assumption will require exponentially many samples in k In contrast, our algorithm provably) requires only polynomially many in k) samples uner a weak) separation assumption an is much simpler than theirs 3 Detaile Results See Theorem an separation 5) for a more precise statement Theorem essentially says that the number of calls mae by our algorithm to the MAP oracle in orer to achieve an error of is inepenent of ie only O lnk)/ ), provie the separation between the Gaussians is only polylogarithmic in, / The accuracy achieve 3) by our algorithm implies that each estimate component is statistically close to the true component : for each i, N ui,σi I N û i,ˆσ i I This means the hypothesis mxture our algorithm outputs is within O) wrt L or total variation istance) of the true mixture istribution We will use N u,σ to enote both the ensity an the probability measure 3 Notations an Auxiliary Lemmas We will use e, e,, e to enote the usual orthonormal basis of R Given vectors u, v R, we will use u v for their inner prouct Φ will enote the cf of the stanar normal istribution N 0, Lemma Chernoff Boun, see eg [Dubhashi 09]) Let X = n i= X i where {X i } n i= are inepenently is- Table oes not explicitly show epenence on parameter an instea assumes, for the sake of easy comparison, that k wmin c for some constant c > 0 The separation k in [Arora 0] is more complicate an can allow for concentric gaussians if σ i, σ j are ifferent 49
4 tribute in [0, ] Then for any > > 0, [ Pr )E [ X ] X +)E [ X ]] e E[X]/3 4) Fact 3 The KL-ivergence between two Gaussians N u,σ I, N u,σ I is given by KL N u,σ I N u,σ I ) = u u +σ σ + ln ) σ σ We will use the following property of Gaussians in proving Lemma 0 Fact 4 Let ρ = Φ 3/4) Φ /4) For any r 0, > 0 an x, N u,σ, x]) r x u 5 rσ, an 5) Nu,σ [u ρσ, u + ρσ]) N u,σ [u, u + ]) r ρσ rσ 6) Assume wlg that ũ = 0 We have from Markov s inequality that for X Nũ, σ I, }) Nũ, σ I {N u,σ I x) t N u,σ I x) [ N u = Pr,σ I X) N u,σ I X) [ N u t E,σ I X) N u,σ I X) = e x u 4σ t = σr t σ σ e x u 4σ ) / e v r 8 a, ] t ] ) σ/ e x σ σ / ) x π σ ) Definition 5 We will say that a mixture of k Gaussians k i= w in ui,σi I in R is β-separate, where β > 0, if the following pairwise separation conition hols - ) u i u j i j 30 ln + ) σi + σ β ln wj w i j 7) Our require separation conition ) implies that the mixture to be learne is β = wmin 7 -separate We also nee to efine, for all i, j, i j A ij = { w i N ui,σ i I x) w jn uj,σ j I x)} ana i = j i A ij 8) ie A ij is the set where Gaussian i is more likely than j an A i is the set where i is the most likely Gaussian Note that A i is precisely the set of points for which MAP will return label i Consier two Gaussians which are far apart, an a thir Gaussian which is much closer to the first Gaussian than the secon The following Lemma shows that with high probability, the first Gaussian is more likely than the secon one at points rawn from the thir Gaussian Lemma 6 Consier two Gaussians N u,σ I, N u,σ I Let Nũ, σ I be another Gaussian such that u ũ 0 min{ σ, u u } an } σ σ min { σ, σ 0 Then for any t > 0, }) Nũ, σ I {N u,σ I x) t N u,σ I x) t e u u 80σ +σ +3 ) 9) 0) where v = u σ u σ, u 4σ r = σ + σ σ an a = u 4σ For σ σ σ σr, σ σ 5/ Also, uner assumption 9), v r 8 a u u 80σ +σ ) + 8 On substituting these in ), we get }) Nũ, σ I {N u,σ I x) t N u,σ I x) which gives us 0) te u u 80σ +σ +3 ), Next using Lemma 6 we show that in a β-separate mixture, with high probability, component i is more likely a posteriori than any other component j i at a ranom point generate by component i itself Lemma 7 For any β > 0 an any β-separate mixture of k Gaussians i w in ui,σi I, for all i, j, i j, N ui,σ i I A ij) > β ) Proof of Lemma 7) Applying Lemma 6 to Gaussians i, j, we have N ui,σi I A wj ij) e w i u j u i 80σ i +σ j ) +3 Note that conition 9) require for Lemma 6 is satisfie trivially since in this case N u,σ I, Nũ, σ I in Lemma 6 are the same Lemma 7 now follows by substituting separation 7) in the boun above Remark 8 The separation in 7) is the minimum require up to a constant factor) such that ) hols ie such that the i th Gaussian is more likely than others at most points generate by itself One can show this by using 49
5 Satyaki Mahalanabis the following inequality ue to Bretagnolle an Huber see eg [Devroye 0] Chapter 5, Exercise 56) for ensities f, f - f f ) e KLf f) 3 Learning Algorithm an Analysis We first state a subroutine EstimateParameters which takes a set of labele samples as input an uses a sample meianbase estimator for computing the parameters of each Gaussian component This subroutine is use by the main Algorithm Applying this to N ui,σi I, N u j,σj I for the case w i = w j yiels, using Fact 3, N ui,σi I A ij) = Nui,σi I N u j,σj I e ui u j max{σ i,σ j } which shows that separation 7) is tight The following Lemma will be use to show that in Algorithm step, when the estimate parameters {ū i, σ i, w i } k i= are sufficiently close to the true parameters {u i, σ i, w i )} k i=, component N ū i, σ i I is more likely than any other Nūj, σ j I at most points generate by the ith component N ui,σi I This means that once our estimates have converge close to the true parameters, we nee not make any more calls to the MAP oracle an instea approximate MAP using {ū i, σ i, w i } k i= Lemma 9 Let β > 0 Let i w i N ui,σi I be a β- separate mixture of k Gaussians, an i w i Nūi, σ i I be another mixture of k Gaussians such that i ū i u i σ i 0, σ i σ i σ i an w i w i w i 0 Then for all i, j, i j, { N ui,σi I wi Nūi, σ i I w } ) jnūj, σ j I β 3) Proof of Lemma 9) Conition 3) an β-separation 7) imply that conition 9) require for Lemma 6 is satisifie for Gaussians Nūi, σ i I, N ū j, σ j I an N u i,σi I Hence by Lemma 6, { N ui,σi I wi Nūi, σ i I w } ) jnūj, σ j I wj e ū i ū j 80 σ +3 wj i + σ j ) w i w i e u i u j 60σ i +σ j ) +4 4) where we have use the fact that uner separation 7) an 3), ū i ū j 4 5 u i u j an 9 0 wi w i, wj w j 0 an 9 0 σi σ i, σj σ j 0 Substituting 7) in 4) gives the Lemma Subroutine : EstimateParameters Input: Set S of labele samples Output: { ũ i, σ i, w i ) } k i= begin for i = k o S i { x x, i) S } ; ũ i meian x S i x e, meian x S i ) x e,, meian x e ; x S i σ i Φ 3/4) Φ /4) meian x S i x e ũ i e ; w i S i / S + S + + S k ) ; en en For estimating σ i in step of EstimateParameters, we coul have projecte the points in S i along any e l instea of e Intuitively, if the points in S i are rawn from the exact istribution N ui,σ I, then ũ i = u i, σ i = σ i in expectation However in our case, because of the overlap between Gaussians there will be some samples in S i which were generate by other components j i We use the sample meian to filter out these outliers A naïve way learning algorithm woul be to raw ) lnk) O unlabele samples an then invoke MAP on each of them The points labele i can then be use to compute the parameters of the i th Gaussian However we show below that Algorithm actually makes fewer calls to MAP by requesting labels only initially The ) number of labels require thus ecreases to O lnk) ie inepenent of ) Algorithm is base on Lemma 7 an Lemma 9 The parameters m, m are set as state in Theorem Algorithm works in two phases In the first phase steps, ) it uses the MAP oracle to label each ranom samplethe number of samples m is sufficiently large for the estimate centres {ū i } k i= to be within constant istance O) of the respective true centres {u i } k i= In the secon phase steps 3, 4, 5) Algorithm oes not nee to use MAP at all Instea it uses estimates {ū i, σ i, w i )} k i= from the first phase to approximate the MAP oracle see Lemma 9), an the final estimate centres {ū i } k i= are guarantee to be within istance of the respective true ones We first analyze subroutine EstimateParameters before an- 493
6 Algorithm : Learning Algorithm for Mixture of Gaussians Input: Sample sizes m, m Output: { û i, ˆσ i, ŵ i ) } k i= begin S ; S ; for i = m o x SAMP) ; l MAPx) ; S S {x, l)} ; en { ūi, σ i, w i ) } k i= EstimateParametersS) ; for i = m o 3 x SAMP) ; l argmax w l Nūl, σ l l I x) ; 4 S S {x, l )} ; en { 5 ûi, ˆσ i, ŵ i ) } k i= EstimateParametersS ) ; en alyzing Algorithm Lemma 0 Let > δ > 0 an 00 > > 0, 00 > β > 0 Let the input S to EstimateParameters be such that the points { x i x, i) S } are inepenent samples from the mixture of k Gaussians w i N ui,σi I Assume S 6 lnk + )/δ ) Further, in EstimateParameters assume that for each i = k, the set S i ie points labele i) consists of inepenent samples from a ensity µ i, where µ i N ui,σ i I β, an that 5) w i β ) E [ Si / S ] wi + β ) 6) Then with probability at least δ, i ũ i u i 5 + β ) σ i, σ i σ i 9 + β )σ i an w i w i + β )w i Proof of Lemma 0) From Chernoff boun 4) an assumption 6), it follows that given S 6 lnk+)/δ ), with probability at least δ +, for each i k, S i 6 ln+)k/δ ) an w i w i + β )w i Consier any e l For each i, it follows from conition 5) that Pr [ ] X el ũ i e l Nui e l,σi, ũi e l ] ) β X µ i 7) Let F i,l enote the empirical istribution of points in S i projecte onto e l ie F i,l A) = {x Si x e l A} / Si Now ũi e l is the meian of F i,l If S i 6 ln+)k/δ ), it follows from the Chernoff boun 4) that with probability at least δ k+), F i,l, ũ i e l ]) Pr X el ũ i e X µ i[ ] l = Pr [ ] 8) X el ũ i e l X µ i Combining 7), 8) we have N ]) u i e l,σi, ũi e l β + 9) Now 9) an boun 5) in Fact 4 together imply that ũ i e l u i e l 5β + )σ i Hence by applying the union boun over all irections l =, we have that with probability at least δ k+), ũ i u i 5β + ) σ i For σ i a similar argument applies Define ρ = Φ 3/4) Φ /4) From conition 5), it follows that Pr [ ] X e ũ i e ρ σ i X µ i N ui e,σ i [ũi e ρ σ i, ũ i e + ρ σ i ] ) β 0) Note that F i,l [ũi e ρ σ i, ũ i e + ρ σ i ] ) = by efinition If S i 6 lnk + )/δ ) then by the Chernoff boun 4), with probability at least δ k+), F i,l [ũi e ρ σ i, ũ i e + ρ σ i ] ) Pr X e ũ i e ρ σ X µ i[ ] i = Pr [ ] X e ũ i e ρ σi X µ i Combining 0), ) we get N u i e,σi [ui e ρσ i, u i e + ρσ i ] ) N ui e,σi [ũi e ρ σ i, ũ i e + ρ σ i ] ) = N u i e,σi [ũi e ρ σ i, ũ i e + ρ σ i ] ) β + Now it follows from 9) that N u i e,σi [ũi e ρ σ i, ũ i e + ρ σ i ] ) N ui e,σ i [ui e ρ σ i, u i e + ρ σ i ] ) β + ) ) ) 3) 494
7 Satyaki Mahalanabis Combining ),3) we have N u i e,σi [ui e ρσ i, u i e + ρσ i ] ) N ui e,σi [ui e ρ σ i, u i e + ρ σ i ] ) N u i e,σi [ui e ρσ i, u i e + ρσ i ] ) N ui e,σi [ũi e ρ σ i, ũ i e + ρ σ i ] ) + N u i e,σi [ũi e ρ σ i, ũ i e + ρ σ i ] ) N ui e,σ i [ui e ρ σ i, u i e + ρ σ i ] ) β + ) + β + ) = 3β + ) 4) It follows from 4) an boun 6), Fact 4 that σ i σ i 6 ρ β + )σ i < 9β + )σ i with probability at least δ k+) Hence if S i 6 lnk + )/δ ) then with probability at least δ k, ũ i u i 5 + β ) σ i, σ i σ i 9 + β )σ i an w i w i + β )w i The Lemma now follows by applying the union boun to all i = k We finally state our main theorem Theorem Let 0 < < 0, 0 < δ < Then for any mixture of k Gaussians µ = i w in uiσi I satisfying the following separation conition u i u j i j, 30 ln 44 ) / + 3 ) σi + σ ln j ) 5) Algorithm with m = 06 ln 4k+) δ an m = ) 000 ln 4k+) δ returns {û i, ˆσ i, ŵ i )} k i= such that with probability at least δ, i, u i û i σ i, ˆσ i σ i σ i an ŵ i w i w i 6) Proof of Theorem ) Consier running Algorithm with parameters m = 06 ln4k + )/δ), m = 000 ln4k+)/δ) Note that Algorithm makes only m calls to MAP We first analyze steps, Uner separation 5), our mixture µ is β )-separate where β = 7 Therefore by Lemma 7 we have i j N ui,σ i I A ij) β β k 7) which implies that i j, N uj,σ j I A i) β an N ui,σ i I A i) β, 8) where we have use the union boun From 8) it follows that w i β) j w j N uj,σ j I A i) w i + β) 9) Now note that for each i, set of points S i which are labele i in step are rawn inepenently from µ i where µ ix) = P j wjn u j,σ j I x) P j wjn u j,σ j I Ai) if x A i 0 otherwise 30) Using 7), 8) an 9) we have that for each i, µ i N ui,σ i I = N ui,σi I R ) \ A i + µ i N ui,σ i A I i N ui,σi I R ) w j N \ A i + uj,σ j I j i A i j w jn uj,σj I A i) + w i A i j w jn uj,σj I A i) N u i,σi I β + β w i β) + β β < 4β 3) Now S = m 06 ln4k + )/δ) an by 9), w i β) E [ ] Si / S = j w jn uj,σj I A i) w i + β) Hence we can apply Lemma 0 with = 80, β = 4β = 8, δ = δ ) to the call to EstimateParameters in step Thus Lemma 0 implies that with probability at least δ i ū i u i 8 σ i, σ i σ i 0 σ i, an w i w i 45 w i 3) Next we analyze steps 3, 4 an 5 For each i, the set of samples S i which are labele i in step 3 are rawn inepenently from istribution µ i, efine as P j wjn u j,σ µ j I x) i x) = Pj wjn Āi) if x Āi u j,σ j I 33) 0 otherwise, where Āi = j i Āij an Āij = { w i Nūj, σ j I x) w j Nūj, σ j I x)} 495
8 If 3) hols, ) by Lemma 9 we have for each i j, N ui,σi I Āij β This implies proof is similar to that of 3), 9) above) that for each i, µ i N ui,σ i I 4β = 8 an w i β) E [ S i / S ] w i + β) Also S = m 000 ln4k + )/δ) So if 3) hols, we can apply Lemma 0 to the call to EstimateParameters in step 5 with = 8, β = 4β = 8, δ = δ This gives us that with probability at least δ, i û i u i < 5 9 σ i, ˆσ i σ i < σ i an ŵ i w i < 4 w i 34) Finally by the union boun over the two calls to EstimateParameters) we have that 34) hols with probability at least δ 4 Conclusion We have escribe an algorithm, Algorithm which, given ranom samples generate by a unknown mixture of Gaussians an the maximum-a-posteriori oracle ), is able to recover its parameters assuming only a mil separation conition ) between the component Gaussians hols There are a number of ways of extening our algorithm For instance we can prove the correctness of our algorithm only for spherical Gaussians, an it is likely that one can moify our algorithm an the separation conition to work for arbitrary Gaussians as well One can also ask how our algorithm coul better utilize unlabele samples - this coul probably give a better error boun an a better label complexity Finally, one can try to give algorithms for more realistic settings by allowing for the unlabele samples or the oracle to be noisy Acknowlegment I am grateful to Daniel Štefankovič for helpful iscussions an for suggesting improvements to an earlier version of this paper I woul also like to thank the anonymous reviewers for helpful comments This material is base upon work supporte by National Science Founation grants CCF-09045, IIS-04800, ISS an University of Pennsylvania subcontract NSF # Any opinions, finings, an conclusions or recommenations expresse in this material are those of the author an o not necessarily reflect the views of above name organizations References [Achlioptas 05] Dimitris Achlioptas & Frank McSherry On Spectral Learning of Mixtures of Distributions In COLT, pages , 005 [Arora 0] [Basu 0] [Basu 04] [Belkin 0] [Castelli 96] Sanjeev Arora & Ravi Kannan Learning mixtures of arbitrary gaussians In STOC, pages 47 57, 00 Sugato Basu, Arinam Banerjee & Raymon J Mooney Semi-supervise Clustering by Seeing In ICML, pages 7 34, 00 Sugato Basu, Arinam Banerjee & Raymon J Mooney Active Semi-Supervision for Pairwise Constraine Clustering In SDM, 004 Mikhail Belkin & Kaushik Sinha Polynomial Learning of Distribution Families In IEEE FOCS, 00 Vittorio Castelli & Thomas M Cover The relative value of labele an unlabele samples in pattern recognition with an unknown mixing parameter IEEE Transactions on Information Theory, vol 4, no 6, pages 0 7, 996 [Dasgupta 99] Sanjoy Dasgupta Learning Mixtures of Gaussians In FOCS, pages , 999 [Dasgupta 07] [Dempster 77] Sanjoy Dasgupta & Leonar J Schulman A Probabilistic Analysis of EM for Mixtures of Separate, Spherical Gaussians Journal of Machine Learning Research, vol 8, pages 03 6, 007 A P Dempster, N M Lair & D B Rubin Maximum likelihoo from incomplete ata via the EM algorithm J Roy Statist Soc Ser B, vol, pages 38, 977 [Devroye 0] Luc Devroye & Gabor Lugosi Combinatorial methos in ensity estimation Springer Series in Statistics Springer- Verlag, New York, 00 [Dubhashi 09] [Felman 06] Devatt Dubhashi & Alessanro Panconesi Concentration of measure for the analysis of ranomize algorithms Cambrige University Press, 009 Jon Felman, Rocco A Serveio & Ryan O Donnell PAC Learning Axis-Aligne Mixtures of Gaussians with No Separation Assumption In COLT, pages 0 34,
9 Satyaki Mahalanabis [Kannan 08] [Melnykov 0] [Moitra 0] [Shental 03] [Vempala 0] Ravinran Kannan, Hai Salmasian & Santosh Vempala The Spectral Metho for General Mixture Moels SIAM J Comput, vol 38, no 3, pages 4 56, 008 Voloymyr Melnykov & Ranjan Maitra Finite mixture moels an moel-base clustering Statist Surv, vol 4, 00 Ankur Moitra & Gregory Valiant Settling the Polynomial Learnability of Mixtures of Gaussians In IEEE FOCS, 00 Noam Shental, Aharon Bar-Hillel, Tomer Hertz & Daphna Weinshall Computing Gaussian Mixture Moels with EM Using Equivalence Constraints In NIPS, 003 Santosh Vempala & Grant Wang A Spectral Algorithm for Learning Mixtures of Distributions In IEEE FOCS, 00 [Xing 0] Eric P Xing, Anrew Y Ng, Michael I Joran & Stuart J Russell Distance Metric Learning with Application to Clustering with Sie-Information In NIPS, pages 505 5,
Multi-View Clustering via Canonical Correlation Analysis
Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in
More informationLecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012
CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration
More informationA PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,
More informationMulti-View Clustering via Canonical Correlation Analysis
Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms
More informationLeast-Squares Regression on Sparse Spaces
Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction
More informationLower Bounds for the Smoothed Number of Pareto optimal Solutions
Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.
More informationLectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs
Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent
More informationRobust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k
A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine
More informationTime-of-Arrival Estimation in Non-Line-Of-Sight Environments
2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor
More informationTopic 7: Convergence of Random Variables
Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information
More informationMulti-View Clustering via Canonical Correlation Analysis
Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu
More informationAnalyzing Tensor Power Method Dynamics in Overcomplete Regime
Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical
More informationComputational Lower Bounds for Statistical Estimation Problems
Computational Lower Bounds for Statistical Estimation Problems Ilias Diakonikolas (USC) (joint with Daniel Kane (UCSD) and Alistair Stewart (USC)) Workshop on Local Algorithms, MIT, June 2018 THIS TALK
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More informationINDEPENDENT COMPONENT ANALYSIS VIA
INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION Truth Rotate S 2 2 1 0 1 2 3 4 X 2 2 1 0 1 2 3 4 4 2 0 2 4 6 4 2 0 2 4 6 S 1 X 1 Reconstructe S^ 2 2 1 0 1 2 3 4 Marginal
More informationu!i = a T u = 0. Then S satisfies
Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace
More informationTable of Common Derivatives By David Abraham
Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec
More informationLower bounds on Locality Sensitive Hashing
Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,
More informationMulti-View Clustering via Canonical Correlation Analysis
Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu
More informationTutorial on Maximum Likelyhood Estimation: Parametric Density Estimation
Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationarxiv: v1 [cs.lg] 22 Mar 2014
CUR lgorithm with Incomplete Matrix Observation Rong Jin an Shenghuo Zhu Dept. of Computer Science an Engineering, Michigan State University, rongjin@msu.eu NEC Laboratories merica, Inc., zsh@nec-labs.com
More informationChromatic number for a generalization of Cartesian product graphs
Chromatic number for a generalization of Cartesian prouct graphs Daniel Král Douglas B. West Abstract Let G be a class of graphs. The -fol gri over G, enote G, is the family of graphs obtaine from -imensional
More informationSchrödinger s equation.
Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of
More informationLECTURE NOTES ON DVORETZKY S THEOREM
LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.
More informationLinear First-Order Equations
5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)
More informationSurvey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013
Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing
More informationThe total derivative. Chapter Lagrangian and Eulerian approaches
Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function
More informationLecture 6: Calculus. In Song Kim. September 7, 2011
Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear
More informationDatabase-friendly Random Projections
Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional
More informationPermanent vs. Determinant
Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an
More informationSettling the Polynomial Learnability of Mixtures of Gaussians
Settling the Polynomial Learnability of Mixtures of Gaussians Ankur Moitra Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 moitra@mit.edu Gregory Valiant
More informationComputing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions
Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5
More informationLecture 6 : Dimensionality Reduction
CPS290: Algorithmic Founations of Data Science February 3, 207 Lecture 6 : Dimensionality Reuction Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will consier the roblem of maing
More information7.1 Support Vector Machine
67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to
More informationThe derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)
Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)
More informationRobust Low Rank Kernel Embeddings of Multivariate Distributions
Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions
More informationRelative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation
Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University
More informationIntegration Review. May 11, 2013
Integration Review May 11, 2013 Goals: Review the funamental theorem of calculus. Review u-substitution. Review integration by parts. Do lots of integration eamples. 1 Funamental Theorem of Calculus In
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationPDE Notes, Lecture #11
PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =
More informationEntanglement is not very useful for estimating multiple phases
PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The
More informationA new proof of the sharpness of the phase transition for Bernoulli percolation on Z d
A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition
More informationA note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz
A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae
More informationLower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners
Lower Bouns for Local Monotonicity Reconstruction from Transitive-Closure Spanners Arnab Bhattacharyya Elena Grigorescu Mahav Jha Kyomin Jung Sofya Raskhonikova Davi P. Wooruff Abstract Given a irecte
More informationDiophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations
Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything
More informationTOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH
English NUMERICAL MATHEMATICS Vol14, No1 Series A Journal of Chinese Universities Feb 2005 TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH He Ming( Λ) Michael K Ng(Ξ ) Abstract We
More informationA Review of Multiple Try MCMC algorithms for Signal Processing
A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications
More information6 General properties of an autonomous system of two first order ODE
6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x
More informationOn the Surprising Behavior of Distance Metrics in High Dimensional Space
On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com
More informationA Sketch of Menshikov s Theorem
A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p
More informationMaximum Likelihood Estimation for Mixtures of Spherical Gaussians is NP-hard
Journal of Machine Learning Research 18 018 1-11 Submitted 1/16; Revised 1/16; Published 4/18 Maximum Likelihood Estimation for Mixtures of Spherical Gaussians is NP-hard Christopher Tosh Sanjoy Dasgupta
More informationNecessary and Sufficient Conditions for Sketched Subspace Clustering
Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This
More informationConcentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification
Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,
More informationAlgorithms and matching lower bounds for approximately-convex optimization
Algorithms an matching lower bouns for approximately-convex optimization Yuanzhi Li Department of Computer Science Princeton University Princeton, NJ, 08450 yuanzhil@cs.princeton.eu Anrej Risteski Department
More informationIterated Point-Line Configurations Grow Doubly-Exponentially
Iterate Point-Line Configurations Grow Doubly-Exponentially Joshua Cooper an Mark Walters July 9, 008 Abstract Begin with a set of four points in the real plane in general position. A to this collection
More informationLecture 2: Correlated Topic Model
Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables
More informationCalculus in the AP Physics C Course The Derivative
Limits an Derivatives Calculus in the AP Physics C Course The Derivative In physics, the ieas of the rate change of a quantity (along with the slope of a tangent line) an the area uner a curve are essential.
More informationConservation Laws. Chapter Conservation of Energy
20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action
More informationPLAL: Cluster-based Active Learning
JMLR: Workshop an Conference Proceeings vol 3 (13) 1 22 PLAL: Cluster-base Active Learning Ruth Urner rurner@cs.uwaterloo.ca School of Computer Science, University of Waterloo, Canaa, ON, N2L 3G1 Sharon
More informationParameter estimation: A new approach to weighting a priori information
Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a
More informationCUSTOMER REVIEW FEATURE EXTRACTION Heng Ren, Jingye Wang, and Tony Wu
CUSTOMER REVIEW FEATURE EXTRACTION Heng Ren, Jingye Wang, an Tony Wu Abstract Popular proucts often have thousans of reviews that contain far too much information for customers to igest. Our goal for the
More information3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes
Fin these erivatives of these functions: y.7 Implicit Differentiation -- A Brief Introuction -- Stuent Notes tan y sin tan = sin y e = e = Write the inverses of these functions: y tan y sin How woul we
More informationLecture 10: October 30, 2017
Information an Coing Theory Autumn 2017 Lecturer: Mahur Tulsiani Lecture 10: October 30, 2017 1 I-Projections an applications In this lecture, we will talk more about fining the istribution in a set Π
More informationarxiv: v4 [cs.ds] 7 Mar 2014
Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning
More informationExpected Value of Partial Perfect Information
Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo
More informationOn combinatorial approaches to compressed sensing
On combinatorial approaches to compresse sensing Abolreza Abolhosseini Moghaam an Hayer Raha Department of Electrical an Computer Engineering, Michigan State University, East Lansing, MI, U.S. Emails:{abolhos,raha}@msu.eu
More informationPerfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs
Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite
More informationQuantum mechanical approaches to the virial
Quantum mechanical approaches to the virial S.LeBohec Department of Physics an Astronomy, University of Utah, Salt Lae City, UT 84112, USA Date: June 30 th 2015 In this note, we approach the virial from
More informationSeparation of Variables
Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical
More informationTHE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS
THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,
More informationOptimization of Geometries by Energy Minimization
Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.
More informationIsotropic PCA and Affine-Invariant Clustering
Isotropic PCA and Affine-Invariant Clustering (Extended Abstract) S. Charles Brubaker Santosh S. Vempala Georgia Institute of Technology Atlanta, GA 30332 {brubaker,vempala}@cc.gatech.edu Abstract We present
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More information19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control
19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior
More informationLecture 5. Symmetric Shearer s Lemma
Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here
More informationInternational Journal of Pure and Applied Mathematics Volume 35 No , ON PYTHAGOREAN QUADRUPLES Edray Goins 1, Alain Togbé 2
International Journal of Pure an Applie Mathematics Volume 35 No. 3 007, 365-374 ON PYTHAGOREAN QUADRUPLES Eray Goins 1, Alain Togbé 1 Department of Mathematics Purue University 150 North University Street,
More information26.1 Metropolis method
CS880: Approximations Algorithms Scribe: Dave Anrzejewski Lecturer: Shuchi Chawla Topic: Metropolis metho, volume estimation Date: 4/26/07 The previous lecture iscusse they some of the key concepts of
More informationBinary Discrimination Methods for High Dimensional Data with a. Geometric Representation
Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas
More information. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.
S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial
More informationConstruction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems
Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu
More informationChapter 6: Energy-Momentum Tensors
49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.
More informationPseudo-Free Families of Finite Computational Elementary Abelian p-groups
Pseuo-Free Families of Finite Computational Elementary Abelian p-groups Mikhail Anokhin Information Security Institute, Lomonosov University, Moscow, Russia anokhin@mccme.ru Abstract We initiate the stuy
More informationALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS
ALGEBRAIC AND ANALYTIC PROPERTIES OF ARITHMETIC FUNCTIONS MARK SCHACHNER Abstract. When consiere as an algebraic space, the set of arithmetic functions equippe with the operations of pointwise aition an
More informationSome Examples. Uniform motion. Poisson processes on the real line
Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]
More informationOn colour-blind distinguishing colour pallets in regular graphs
J Comb Optim (2014 28:348 357 DOI 10.1007/s10878-012-9556-x On colour-blin istinguishing colour pallets in regular graphs Jakub Przybyło Publishe online: 25 October 2012 The Author(s 2012. This article
More informationLinear Regression with Limited Observation
Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear
More informationThe chromatic number of graph powers
Combinatorics, Probability an Computing (19XX) 00, 000 000. c 19XX Cambrige University Press Printe in the Unite Kingom The chromatic number of graph powers N O G A A L O N 1 an B O J A N M O H A R 1 Department
More informationSharp Thresholds. Zachary Hamaker. March 15, 2010
Sharp Threshols Zachary Hamaker March 15, 2010 Abstract The Kolmogorov Zero-One law states that for tail events on infinite-imensional probability spaces, the probability must be either zero or one. Behavior
More informationMath Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors
Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+
More informationMath 1B, lecture 8: Integration by parts
Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores
More informationA Modification of the Jarque-Bera Test. for Normality
Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam
More informationNested Saturation with Guaranteed Real Poles 1
Neste Saturation with Guarantee Real Poles Eric N Johnson 2 an Suresh K Kannan 3 School of Aerospace Engineering Georgia Institute of Technology, Atlanta, GA 3332 Abstract The global stabilization of asymptotically
More informationCapacity Analysis of MIMO Systems with Unknown Channel State Information
Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,
More informationEquilibrium in Queues Under Unknown Service Times and Service Value
University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University
More informationAll s Well That Ends Well: Supplementary Proofs
All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee
More informationGeneralized Tractability for Multivariate Problems
Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,
More informationAn Analytical Expression of the Probability of Error for Relaying with Decode-and-forward
An Analytical Expression of the Probability of Error for Relaying with Decoe-an-forwar Alexanre Graell i Amat an Ingmar Lan Department of Electronics, Institut TELECOM-TELECOM Bretagne, Brest, France Email:
More informationELEC3114 Control Systems 1
ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.
More informationDiagonalization of Matrices Dr. E. Jacobs
Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is
More information