Versatility of Singular Value Decomposition (SVD) January 7, 2015


1 Versatility of Singular Value Decomposition (SVD) January 7, 2015

8 Assumption: Data = Real Data + Noise

Each data point is a column of the n × d data matrix A.

A = B + C, where B is the Real Data and C is the Noise.

rank(B) ≤ k. ‖C‖ (= max_{‖u‖=1} ‖Cu‖).

k << n, d. ‖C‖ small.

Caution: ‖C‖_F (= (Σ_ij C_ij²)^{1/2}) need not be smaller than, for example, ‖B‖_F. In words, the overall noise can be larger than the overall real data.

Given any A, Singular Value Decomposition (SVD) finds the B of rank k (or less) for which ‖A − B‖ is minimum. The space spanned by the columns of B is the best-fit subspace for A, in the sense of the least sum, over all data points, of squared distances to the subspace.

A very powerful tool. Decades of theory, algorithms. Here: example applications.
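The best-fit claim can be checked directly with NumPy. This is a minimal sketch with illustrative sizes (not from the talk): build a rank-k "real data" matrix plus noise, truncate the SVD to k terms, and verify the Eckart–Young guarantee that the truncation is at least as close to A as B is.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 50, 3                       # illustrative sizes

# Low-rank real data B plus dense noise C
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
C = 0.1 * rng.standard_normal((n, d))
A = B + C

# Truncated SVD: keep only the top-k singular triples
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# A_k is the best rank-k approximation: ||A - A_k|| = sigma_{k+1}(A),
# which by Weyl's inequality is at most ||C|| since rank(B) = k
print(np.linalg.matrix_rank(A_k))                                   # 3
print(np.linalg.norm(A - A_k, 2) <= np.linalg.norm(C, 2) + 1e-9)    # True
```

So even without knowing the decomposition A = B + C, the truncated SVD recovers a rank-k matrix at least as close to A (in spectral norm) as the hidden real data B.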

13 Example I: Mixture of Spherical Gaussians

F(x) = w_1 N(µ_1, σ_1²) + w_2 N(µ_2, σ_2²) + ··· + w_k N(µ_k, σ_k²), in d dimensions.

Learning problem: given i.i.d. samples from F(·), find the components (µ_i, σ_i, w_i). Really a clustering problem.

In 1 dimension, we can solve the learning problem if the means of the component densities are Ω(1) standard deviations apart.

But in d dimensions: approximate k-means fails. A pair of samples from different clusters may be closer than a pair from the same cluster!

17 SVD to the Rescue

For a mixture of k spherical Gaussians (with different variances), the best-fit k-dimensional subspace (found by SVD) passes through all k centers. [Vempala, Wang]

Beautiful proof: for one spherical Gaussian with non-zero mean, the best-fit 1-dimensional subspace passes through the mean. And any k-dimensional subspace containing the mean is a best-fit k-dimensional subspace.

So now, if a k-dimensional subspace contains all k means, it is individually the best for each component Gaussian!

Simple observation to finish: given the k-dimensional space containing the means, we need only solve a k-dimensional problem. This can be done in time exponential only in k.
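The Vempala–Wang fact can be seen numerically. The sketch below uses illustrative dimensions and separation (not the paper's algorithm): draw samples from a two-component spherical mixture, compute the top-2 right singular subspace of the sample matrix, and check that the true means lie almost entirely inside it.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, k = 200, 500, 2                  # dimension, samples per component
mu1 = np.zeros(d); mu1[0] = 5.0        # two unit-variance components,
mu2 = np.zeros(d); mu2[1] = 5.0        # means 5*sqrt(2) apart

X = np.vstack([mu1 + rng.standard_normal((m, d)),
               mu2 + rng.standard_normal((m, d))])

# Best-fit k-dim subspace for the rows of X = span of top-k right singular vectors
_, _, Vt = np.linalg.svd(X, full_matrices=False)

# Both true centers lie (almost) inside that subspace: projecting a center
# onto it preserves nearly all of its norm
for mu in (mu1, mu2):
    frac = np.linalg.norm(Vt[:k] @ mu) / np.linalg.norm(mu)
    print(round(frac, 3))              # close to 1
```

Projecting the samples into this subspace reduces the clustering problem from d dimensions to k, after which distance-based clustering becomes viable again.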

20 Planted Clique Problem

Given G = G(n, 1/2) + a clique on S × S (S unknown, |S| = s), find S in polynomial time. Best known: s ∈ Ω(√n).

[The slide shows the n × n ±1 matrix A: random ±1 entries, with an all-ones s × s block on S × S.]

‖Planted Clique block‖ = s. Random matrix theory: a random ±1 matrix has norm at most 2√n. So SVD finds S when s ≳ √n. [Alon; Boppana 1985]

Feldman, Grigorescu, Reyzin, Vempala, Xia (2014): cannot be beaten by statistical learning algorithms.
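A minimal numerical sketch of the spectral argument (sizes illustrative, with s taken well above √n): plant an all-ones block in a random symmetric ±1 matrix and read S off the top eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 500, 100                        # s = 100 vs 2*sqrt(n) ~ 45
S = rng.choice(n, size=s, replace=False)

A = rng.choice([-1.0, 1.0], size=(n, n))
A = np.triu(A, 1); A = A + A.T         # random symmetric ±1, zero diagonal
A[np.ix_(S, S)] = 1.0                  # planted all-ones block

# The planted block contributes an eigenvalue ~ s, which beats the random
# part's norm ~ 2*sqrt(n), so the top eigenvector concentrates on S
vals, vecs = np.linalg.eigh(A)
v = vecs[:, -1]                        # eigenvector of the largest eigenvalue
guess = np.argsort(-np.abs(v))[:s]     # take the s largest coordinates
print(len(set(guess) & set(S)) / s)    # close to 1.0
```

In practice one cleans up the candidate set with a local step (e.g. keep vertices adjacent to most of the guess), but the spectral step already recovers almost all of S once s clears the √n barrier.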

25 Planted Gaussians: Signal and Noise

A is an n × n matrix and S ⊆ [n], |S| = k. The A_ij are all independent random variables.

For i, j ∈ S, Pr(A_ij ≥ µ) ≥ 1/2 (e.g. N(µ, σ²)). Signal = µ.

For other i, j, A_ij is N(0, σ²). Noise = σ.

Given A, µ, σ, find S. [Recall Planted Clique.]

[The slide shows A with a mean-µ block on S × S and N(0, σ²) entries elsewhere.]

32 Exponential Advantage in SNR by Thresholding

Brave new step: threshold the entries of A at µ to get a 0-1 matrix B.

[The slide shows E(B): entries ≥ 1/2 on the S × S block, ≈ exp(−µ²/2σ²) elsewhere.]

Subtract the baseline exp(−µ²/2σ²): the remaining block has norm ≥ k/4, while the rest is a random matrix of norm about √(n exp(−cµ²/σ²)).

So SVD finds S provided k/4 > √(n exp(−cµ²/σ²)), i.e. exp(c(µ/σ)²) ≳ n/k².

Cf.: ordinary SVD succeeds only if µ/σ > √n / k.
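A numeric sketch of the thresholding step (sizes and constants are illustrative, and the baseline is estimated empirically from B rather than via exp(−µ²/2σ²)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma, mu = 400, 40, 1.0, 2.5
S = np.arange(k)                          # planted set (known here only to check)

A = sigma * rng.standard_normal((n, n))
A[np.ix_(S, S)] += mu                     # Pr(A_ij >= mu) = 1/2 on the block

B = (A >= mu).astype(float)               # threshold at mu -> 0-1 matrix
print(round(B[np.ix_(S, S)].mean(), 2))   # about 0.5 inside the block
print(round(B[k:, k:].mean(), 3))         # exponentially small outside

# Subtract the (empirical) baseline; the top singular vector of the
# centered 0-1 matrix then concentrates on the rows of S
u = np.linalg.svd(B - B.mean())[0][:, 0]
guess = np.argsort(-np.abs(u))[:k]
print(len(set(guess) & set(S)) / k)       # close to 1.0
```

After thresholding, the block stands at height ≈ 1/2 over an exponentially sparse background, which is why the SNR requirement improves from polynomial to logarithmic in n/k.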

39 Thresholding: Second Plus

Data points {A_1, A_2, ..., A_j, ...} in R^d; d features.

Data points are in 2 SOFT clusters: data point j belongs with weight w_j to cluster 1 and 1 − w_j to cluster 2. (More generally, k clusters.)

Each cluster has some dominant features and each data point has a dominant cluster.

A_ij ≈ µ if feature i is a dominant feature of the dominant topic of data point j. A_ij ≈ σ otherwise.

If the variance above µ is larger than the gap between µ and σ, a 2-clustering criterion (like 2-means) may split the high-weight cluster instead of separating it from the others.

Two differences from mixtures: soft clusters, and high variance in dominant features.

45 Topic Modeling: The Problem

Joint work with T. Bansal and C. Bhattacharyya.

d features: the words in the dictionary. A document is a d-dimensional (column) vector.

k topics. Topic l is a d-vector (probabilities of words in the topic).

To generate document j, generate a random convex combination of topic vectors. Then generate the words of document j in i.i.d. trials, each from the multinomial with probabilities given by that convex combination.

***DRAW PICTURE ON BOARD WITH SPORTS, POLITICS, WEATHER***

The Topic Modeling Problem: given only A, find an approximation to all topic vectors so that the l_1 error in each topic vector is at most ε. The l_1 error is crucial (l_2 misses small words).

Generally NP-hard.
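The generative process just described can be sketched in a few lines. All sizes and Dirichlet parameters below are illustrative assumptions, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, n_docs, doc_len = 1000, 3, 200, 400

# k topic vectors, each a probability distribution over the d words
topics = rng.dirichlet(np.full(d, 0.05), size=k)        # shape (k, d)

# Document j: a random convex combination of topics, then i.i.d. word draws
weights = rng.dirichlet(np.full(k, 0.3), size=n_docs)   # shape (n_docs, k)
word_probs = weights @ topics                           # each row sums to 1
A = np.array([rng.multinomial(doc_len, p / p.sum())     # renormalize vs float drift
              for p in word_probs]).T / doc_len
print(A.shape)            # (1000, 200): column j = word frequencies of doc j
```

Each column of A is a noisy (multinomial) observation whose expectation is the document's convex combination of topic vectors; the problem is to invert this process from A alone.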

48 Topic Modeling is Soft Clustering

Topic vectors ↔ cluster centers.

Each data point (document) belongs to a weighted combination of clusters. It is generated from a distribution (which happens to be multinomial) with expectation equal to the weighted combination.

Even if we manage to solve the clustering problem somehow, it is not true that cluster centers are averages of documents. Big distinction from learning mixtures, which is hard clustering.

49 Geometry: Topic Modeling = Soft Clustering

Given the documents (noisy versions of their means), find the µ_l. It helps to find nearly pure documents (X near a corner).

[Figure: a simplex with corners µ_1, µ_2, µ_3, where µ_l is the l-th topic vector; each X is a weighted combination of the µ_l, and the words of a document are i.i.d. choices with mean X.]

56 Prior Results and Assumptions

Under the Pure Topics and Primary Words (1 − ε of the words are primary) assumptions, SVD solves it. [Papadimitriou, Raghavan, Tamaki, Vempala]

Belief: SVD cannot do the non-pure-topic case.

LDA: the most used model. [Blei, Ng, Jordan] Multiple topics per document.

Anandkumar, Foster, Hsu, Kakade, Liu do topic modeling under LDA, to l_2 error, using clever tensor methods. Parameters.

Arora, Ge, Moitra assume Anchor Words + other parameters: each topic has one word (a) occurring only in that topic and (b) with high frequency. First provable algorithm.

Our goals: intuitive, empirically verified assumptions; a natural, provable algorithm.

61 Our Assumptions

Intuitive to topic modeling, not numerical parameters like condition number.

Catchwords: each topic has a set of words such that (a) each occurs more frequently in that topic than in the others and (b) together, they have high frequency.

Dominant Topics: each document has a dominant topic, which has weight (in that document) of at least some α, whereas non-dominant topics have weight at most some β.

Nearly Pure Documents: each topic has a (small) fraction of documents that are 1 − δ pure for that topic.

No Local Min: for every word, the plot of the number of documents versus the number of occurrences of the word (conditioned on the dominant topic) has no local minimum. [Zipf's law, or unimodal.]

67 The Algorithm: Threshold SVD (TSVD)

s = number of documents. For this talk, the probability that each topic is dominant is 1/k.

Threshold: compute a threshold for each word i. First gap: the largest ζ such that A_ij ≥ ζ for at least s/(2k) of the documents j, and A_ij = ζ for at most εs of them.

SVD: use SVD on the thresholded matrix to get starting centers for the k-means algorithm.

k-means: run k-means. Will show: this identifies the dominant topic of each document.

Identify Catchwords: find the set of high-frequency words in each cluster. Will show: this is the set of catchwords for the topic.

Identify Pure Documents: find the set of documents with the highest total number of occurrences of the set of catchwords. Will show: these are nearly pure documents. Their average gives the topic vector.
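The first three steps (threshold → SVD → k-means) can be sketched as follows. This is an assumption-laden toy, not the paper's algorithm: the first-gap rule is replaced by a caller-supplied per-word threshold `zeta`, and the catchword and pure-document steps are omitted.

```python
import numpy as np

def tsvd_cluster(A, k, iters=20, *, zeta):
    """A: d x s word-document matrix. Returns a dominant-topic label per doc."""
    B = (A >= zeta[:, None]).astype(float)        # threshold each word (row)
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    P = (s[:k, None] * Vt[:k]).T                  # docs in top-k SVD coordinates
    centers = [P[0]]                              # farthest-first initialization
    for _ in range(k - 1):
        dists = np.min([((P - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(P[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):                        # plain Lloyd's k-means
        labels = np.argmin(((P[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([P[labels == c].mean(0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels

# Toy corpus: two topics with disjoint high-count word blocks
A = np.ones((20, 60))
A[:10, :30] = 6.0                                 # topic-0 docs use words 0..9
A[10:, 30:] = 6.0                                 # topic-1 docs use words 10..19
labels = tsvd_cluster(A, k=2, zeta=np.full(20, 3.0))
print(len(set(labels[:30])), len(set(labels[30:])), labels[0] != labels[30])
```

On this toy input the thresholded matrix is exactly block-structured, so the SVD projection collapses each topic's documents to a single point and k-means separates them cleanly.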

68 The Advantage of Thresholding

[Figure: in the thresholded matrix, the diagonal blue blocks are the catchwords for each topic; black entries are non-catchwords.]

69 [Figure: documents X near the simplex corners µ_1, µ_2, µ_3; Threshold + SVD + k-means recovers the dominant topics.]

75 Properties of Thresholding

Using No Local Min, show: no threshold splits any dominant topic in the middle. So the thresholded matrix is a block matrix for catchwords. But for non-catchwords, it can be high on several topics.

PICTURE ON THE BOARD OF A BLOCK MATRIX.

Done? No. We need inter-cluster separation ≥ intra-cluster spread (the variance inside a cluster).

Catchwords provide sufficient inter-cluster separation.

The inside-cluster variance is bounded with machinery from random matrix theory. Beware: only the columns are independent; the rows are not.

Appeal to a result on k-means [Kumar, K.]: if the inter-cluster separation exceeds the inside-cluster directional standard deviation, then SVD followed by k-means clusters correctly.

76 Getting Topic Vectors

PICTURE OF SIMPLEX with the columns of M as extreme points and a cluster of documents for each dominant topic. Taking the average of the documents in T_l is no good.

77 Empirical Results: Datasets

NIPS: 1,500 NIPS full papers.
NYT: random subset of 30,000 documents from the New York Times dataset.
Pubmed: random subset of 30,000 documents from the Pubmed abstracts dataset.
20NG: 13,389 documents from the 20NewsGroup dataset.

78 Empirical Results: Assumptions

[Table: fraction of documents satisfying the dominant topic assumption, per corpus (NIPS, NYT, Pubmed, 20NG) with document count and K, at α = 0.4, 0.8, 0.9.]

[Table: Catchwords (CW) assumption with ρ = 1.1, ε = 0.25: mean per-topic frequency of CW and % of topics with CW, per corpus.]

79 Empirical Results: Semi-synthetic Data

Generate semi-synthetic corpora from an LDA model trained by MCMC, to ensure that the synthetic corpora retain the characteristics of real data.

Gibbs sampling is run for 1000 iterations on all four datasets, and the final word-topic distribution is used to generate varying numbers (s) of synthetic documents, with the document-topic distribution drawn from a symmetric Dirichlet with hyper-parameter 0.01.

Note that the synthetic data is not guaranteed to satisfy the dominant topic assumption for every document; on average about 80% of the documents satisfy it.

80 Empirical Results: L1 Reconstruction Error

And percent improvement over Recover-KL. The total average improvement over R-KL is 20%.

[Table: L1 reconstruction error of Tensor, Recover-L2, Recover-KL, and TSVD, with % improvement, on semi-synthetic corpora from NIPS, Pubmed, 20NG, and NYT with 40,000–200,000 documents.]

81 Empirical Results: L1 Reconstruction Error

Histogram of L1 error across topics for 40k synthetic documents. On the majority of topics (> 90%), the recovery error for TSVD is significantly smaller than for Recover-KL.

[Figure: histograms of L1 reconstruction error per topic for algorithms R-KL and TSVD, on NIPS, NYT, Pubmed, and 20NG.]

82 Empirical Results: Perplexity & Topic Coherence

[Figure: perplexity and topic coherence of TSVD, Recover-L2, and Recover-KL on 20NG, NIPS, NYT, and Pubmed.]

83 Top 5 words of some topics on the real NYT dataset. Catchwords and anchor words highlighted. (zzz: identifier placed by the NYT dataset.)

TSVD: cup minutes add tablespoon oil | Recover-KL: cup minutes tablespoon add oil | Gibbs: cup minutes add tablespoon oil
TSVD: team season coach zzz_ram game | Recover-KL: game team season play zzz_ram | Gibbs: team season game coach zzz_nfl
TSVD: patient doctor drug cancer study | Recover-KL: patient drug doctor percent fund | Gibbs: patient doctor drug medical cancer
TSVD: zzz_john_mccain zzz_mccain zzz_bush zzz_george_bush campaign | Recover-KL: zzz_john_mccain zzz_george_bush campaign republican voter | Gibbs: zzz_john_mccain zzz_george_bush campaign zzz_bush zzz_mccain
TSVD: house room building wall floor | Recover-KL: room show look home house | Gibbs: room look water house hand
TSVD: film movie actor character zzz_oscar | Recover-KL: zzz_god christian religious zzz_jesus church | Gibbs: film show movie music book
TSVD: pope church book jewish religious | Recover-KL: film movie character play director | Gibbs: religious church jewish jew zzz_god


More information

AP Statistics Notes Unit Two: The Normal Distributions

AP Statistics Notes Unit Two: The Normal Distributions AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).

More information

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares

More information

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department

More information

The blessing of dimensionality for kernel methods

The blessing of dimensionality for kernel methods fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented

More information

Chemistry 20 Lesson 11 Electronegativity, Polarity and Shapes

Chemistry 20 Lesson 11 Electronegativity, Polarity and Shapes Chemistry 20 Lessn 11 Electrnegativity, Plarity and Shapes In ur previus wrk we learned why atms frm cvalent bnds and hw t draw the resulting rganizatin f atms. In this lessn we will learn (a) hw the cmbinatin

More information

What is Statistical Learning?

What is Statistical Learning? What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,

More information

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t

More information

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

MATHEMATICS SYLLABUS SECONDARY 5th YEAR Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information

Figure 1a. A planar mechanism.

Figure 1a. A planar mechanism. ME 5 - Machine Design I Fall Semester 0 Name f Student Lab Sectin Number EXAM. OPEN BOOK AND CLOSED NOTES. Mnday, September rd, 0 Write n ne side nly f the paper prvided fr yur slutins. Where necessary,

More information

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic.

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic. Tpic : AC Fundamentals, Sinusidal Wavefrm, and Phasrs Sectins 5. t 5., 6. and 6. f the textbk (Rbbins-Miller) cver the materials required fr this tpic.. Wavefrms in electrical systems are current r vltage

More information

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large

More information

Chapter 8: The Binomial and Geometric Distributions

Chapter 8: The Binomial and Geometric Distributions Sectin 8.1: The Binmial Distributins Chapter 8: The Binmial and Gemetric Distributins A randm variable X is called a BINOMIAL RANDOM VARIABLE if it meets ALL the fllwing cnditins: 1) 2) 3) 4) The MOST

More information

Determining the Accuracy of Modal Parameter Estimation Methods

Determining the Accuracy of Modal Parameter Estimation Methods Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system

More information

5 th grade Common Core Standards

5 th grade Common Core Standards 5 th grade Cmmn Cre Standards In Grade 5, instructinal time shuld fcus n three critical areas: (1) develping fluency with additin and subtractin f fractins, and develping understanding f the multiplicatin

More information

Homology groups of disks with holes

Homology groups of disks with holes Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.

More information

IAML: Support Vector Machines

IAML: Support Vector Machines 1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

1 The limitations of Hartree Fock approximation

1 The limitations of Hartree Fock approximation Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants

More information

Support-Vector Machines

Support-Vector Machines Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material

More information

Equilibrium of Stress

Equilibrium of Stress Equilibrium f Stress Cnsider tw perpendicular planes passing thrugh a pint p. The stress cmpnents acting n these planes are as shwn in ig. 3.4.1a. These stresses are usuall shwn tgether acting n a small

More information

Multiple Source Multiple. using Network Coding

Multiple Source Multiple. using Network Coding Multiple Surce Multiple Destinatin Tplgy Inference using Netwrk Cding Pegah Sattari EECS, UC Irvine Jint wrk with Athina Markpulu, at UCI, Christina Fraguli, at EPFL, Lausanne Outline Netwrk Tmgraphy Gal,

More information

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*

More information

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design

More information

Source Coding and Compression

Source Coding and Compression Surce Cding and Cmpressin Heik Schwarz Cntact: Dr.-Ing. Heik Schwarz heik.schwarz@hhi.fraunhfer.de Heik Schwarz Surce Cding and Cmpressin September 22, 2013 1 / 60 PartI: Surce Cding Fundamentals Heik

More information

Physical Layer: Outline

Physical Layer: Outline 18-: Intrductin t Telecmmunicatin Netwrks Lectures : Physical Layer Peter Steenkiste Spring 01 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital Representatin f Infrmatin Characterizatin f Cmmunicatin

More information

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression 4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw

More information

Chapter 2 GAUSS LAW Recommended Problems:

Chapter 2 GAUSS LAW Recommended Problems: Chapter GAUSS LAW Recmmended Prblems: 1,4,5,6,7,9,11,13,15,18,19,1,7,9,31,35,37,39,41,43,45,47,49,51,55,57,61,6,69. LCTRIC FLUX lectric flux is a measure f the number f electric filed lines penetrating

More information

COMP 551 Applied Machine Learning Lecture 4: Linear classification

COMP 551 Applied Machine Learning Lecture 4: Linear classification COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted

More information

More Tutorial at

More Tutorial at Answer each questin in the space prvided; use back f page if extra space is needed. Answer questins s the grader can READILY understand yur wrk; nly wrk n the exam sheet will be cnsidered. Write answers,

More information

Comparing Several Means: ANOVA. Group Means and Grand Mean

Comparing Several Means: ANOVA. Group Means and Grand Mean STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal

More information

Pattern Recognition 2014 Support Vector Machines

Pattern Recognition 2014 Support Vector Machines Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft

More information

SPH3U1 Lesson 06 Kinematics

SPH3U1 Lesson 06 Kinematics PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.

More information

CHM112 Lab Graphing with Excel Grading Rubric

CHM112 Lab Graphing with Excel Grading Rubric Name CHM112 Lab Graphing with Excel Grading Rubric Criteria Pints pssible Pints earned Graphs crrectly pltted and adhere t all guidelines (including descriptive title, prperly frmatted axes, trendline

More information

Experiment #3. Graphing with Excel

Experiment #3. Graphing with Excel Experiment #3. Graphing with Excel Study the "Graphing with Excel" instructins that have been prvided. Additinal help with learning t use Excel can be fund n several web sites, including http://www.ncsu.edu/labwrite/res/gt/gt-

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

Physics 2010 Motion with Constant Acceleration Experiment 1

Physics 2010 Motion with Constant Acceleration Experiment 1 . Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin

More information

BLAST / HIDDEN MARKOV MODELS

BLAST / HIDDEN MARKOV MODELS CS262 (Winter 2015) Lecture 5 (January 20) Scribe: Kat Gregry BLAST / HIDDEN MARKOV MODELS BLAST CONTINUED HEURISTIC LOCAL ALIGNMENT Use Cmmnly used t search vast bilgical databases (n the rder f terabases/tetrabases)

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

Hiding in plain sight

Hiding in plain sight Hiding in plain sight Principles f stegangraphy CS349 Cryptgraphy Department f Cmputer Science Wellesley Cllege The prisners prblem Stegangraphy 1-2 1 Secret writing Lemn juice is very nearly clear s it

More information

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge

More information

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d) COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise

More information

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins

More information

Probability, Random Variables, and Processes. Probability

Probability, Random Variables, and Processes. Probability Prbability, Randm Variables, and Prcesses Prbability Prbability Prbability thery: branch f mathematics fr descriptin and mdelling f randm events Mdern prbability thery - the aximatic definitin f prbability

More information

Collocation Map for Overcoming Data Sparseness

Collocation Map for Overcoming Data Sparseness Cllcatin Map fr Overcming Data Sparseness Mnj Kim, Yung S. Han, and Key-Sun Chi Department f Cmputer Science Krea Advanced Institute f Science and Technlgy Taejn, 305-701, Krea mj0712~eve.kaist.ac.kr,

More information

Statistics, Numerical Models and Ensembles

Statistics, Numerical Models and Ensembles Statistics, Numerical Mdels and Ensembles Duglas Nychka, Reinhard Furrer,, Dan Cley Claudia Tebaldi, Linda Mearns, Jerry Meehl and Richard Smith (UNC). Spatial predictin and data assimilatin Precipitatin

More information

Name AP CHEM / / Chapter 1 Chemical Foundations

Name AP CHEM / / Chapter 1 Chemical Foundations Name AP CHEM / / Chapter 1 Chemical Fundatins Metric Cnversins All measurements in chemistry are made using the metric system. In using the metric system yu must be able t cnvert between ne value and anther.

More information

Kinetic Model Completeness

Kinetic Model Completeness 5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins

More information

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm

More information

[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y )

[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y ) (Abut the final) [COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t m a k e s u r e y u a r e r e a d y ) The department writes the final exam s I dn't really knw what's n it and I can't very well

More information

Lecture 17: Free Energy of Multi-phase Solutions at Equilibrium

Lecture 17: Free Energy of Multi-phase Solutions at Equilibrium Lecture 17: 11.07.05 Free Energy f Multi-phase Slutins at Equilibrium Tday: LAST TIME...2 FREE ENERGY DIAGRAMS OF MULTI-PHASE SOLUTIONS 1...3 The cmmn tangent cnstructin and the lever rule...3 Practical

More information

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are: Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track

More information

You need to be able to define the following terms and answer basic questions about them:

You need to be able to define the following terms and answer basic questions about them: CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f

More information

Lecture 5: Equilibrium and Oscillations

Lecture 5: Equilibrium and Oscillations Lecture 5: Equilibrium and Oscillatins Energy and Mtin Last time, we fund that fr a system with energy cnserved, v = ± E U m ( ) ( ) One result we see immediately is that there is n slutin fr velcity if

More information

NAME: Prof. Ruiz. 1. [5 points] What is the difference between simple random sampling and stratified random sampling?

NAME: Prof. Ruiz. 1. [5 points] What is the difference between simple random sampling and stratified random sampling? CS4445 ata Mining and Kwledge iscery in atabases. B Term 2014 Exam 1 Nember 24, 2014 Prf. Carlina Ruiz epartment f Cmputer Science Wrcester Plytechnic Institute NAME: Prf. Ruiz Prblem I: Prblem II: Prblem

More information

Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter

Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter kbrabant@iastate.edu Iwa State University Department f Statistics Department f Cmputer Science June 24, 2016 1/24 Outline

More information

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use

More information

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression 3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets

More information

Resampling Methods. Chapter 5. Chapter 5 1 / 52

Resampling Methods. Chapter 5. Chapter 5 1 / 52 Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and

More information

Chapter 8 Predicting Molecular Geometries

Chapter 8 Predicting Molecular Geometries Chapter 8 Predicting Mlecular Gemetries 8-1 Mlecular shape The Lewis diagram we learned t make in the last chapter are a way t find bnds between atms and lne pais f electrns n atms, but are nt intended

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

initially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur

initially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract

More information

SIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST. Mark C. Otto Statistics Research Division, Bureau of the Census Washington, D.C , U.S.A.

SIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST. Mark C. Otto Statistics Research Division, Bureau of the Census Washington, D.C , U.S.A. SIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST Mark C. Ott Statistics Research Divisin, Bureau f the Census Washingtn, D.C. 20233, U.S.A. and Kenneth H. Pllck Department f Statistics, Nrth Carlina State

More information

SOME CONSTRUCTIONS OF OPTIMAL BINARY LINEAR UNEQUAL ERROR PROTECTION CODES

SOME CONSTRUCTIONS OF OPTIMAL BINARY LINEAR UNEQUAL ERROR PROTECTION CODES Philips J. Res. 39, 293-304,1984 R 1097 SOME CONSTRUCTIONS OF OPTIMAL BINARY LINEAR UNEQUAL ERROR PROTECTION CODES by W. J. VAN OILS Philips Research Labratries, 5600 JA Eindhven, The Netherlands Abstract

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION

NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION NUROP Chinese Pinyin T Chinese Character Cnversin NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION CHIA LI SHI 1 AND LUA KIM TENG 2 Schl f Cmputing, Natinal University f Singapre 3 Science

More information

MATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank

MATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Céline Ferré The Wrld Bank When can we use matching? What if the assignment t the treatment is nt dne randmly r based n an eligibility index, but n the basis

More information

READING STATECHART DIAGRAMS

READING STATECHART DIAGRAMS READING STATECHART DIAGRAMS Figure 4.48 A Statechart diagram with events The diagram in Figure 4.48 shws all states that the bject plane can be in during the curse f its life. Furthermre, it shws the pssible

More information

Cambridge Assessment International Education Cambridge Ordinary Level. Published

Cambridge Assessment International Education Cambridge Ordinary Level. Published Cambridge Assessment Internatinal Educatin Cambridge Ordinary Level ADDITIONAL MATHEMATICS 4037/1 Paper 1 Octber/Nvember 017 MARK SCHEME Maximum Mark: 80 Published This mark scheme is published as an aid

More information

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw: In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin

More information

Bayesian nonparametric modeling approaches for quantile regression

Bayesian nonparametric modeling approaches for quantile regression Bayesian nnparametric mdeling appraches fr quantile regressin Athanasis Kttas Department f Applied Mathematics and Statistics University f Califrnia, Santa Cruz Department f Statistics Athens University

More information

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm

More information

Lecture 3: Principal Components Analysis (PCA)

Lecture 3: Principal Components Analysis (PCA) Lecture 3: Principal Cmpnents Analysis (PCA) Reading: Sectins 6.3.1, 10.1, 10.2, 10.4 STATS 202: Data mining and analysis Jnathan Taylr, 9/28 Slide credits: Sergi Bacallad 1 / 24 The bias variance decmpsitin

More information

Reinforcement Learning" CMPSCI 383 Nov 29, 2011!

Reinforcement Learning CMPSCI 383 Nov 29, 2011! Reinfrcement Learning" CMPSCI 383 Nv 29, 2011! 1 Tdayʼs lecture" Review f Chapter 17: Making Cmple Decisins! Sequential decisin prblems! The mtivatin and advantages f reinfrcement learning.! Passive learning!

More information

Why Don t They Get It??

Why Don t They Get It?? Why Dn t They Get It?? A 60-minute Webinar NEURO LINGUISTIC PROGRAMMING NLP is the way we stre and prcess infrmatin in ur brains, and then frm the wrds we use t cmmunicate. By learning abut NLP, yu can

More information