Versatility of Singular Value Decomposition (SVD) January 7, 2015
2 Assumption: Data = Real Data + Noise. Each data point is a column of the n × d data matrix A.
A = B + C, where B is the real data and C is the noise.
rank(B) ≤ k, with k << n, d. The spectral norm ||C|| (= max_{|u|=1} |Cu|) is small.
Caution: the Frobenius norm ||C||_F (= sqrt(Σ_ij C_ij²)) need not be smaller than, for example, ||B||_F. In words, the overall noise can be larger than the overall real data.
Given any A, Singular Value Decomposition (SVD) finds the B of rank k (or less) for which ||A − B|| is minimum. The space spanned by the columns of B is the best-fit subspace for A, in the sense of the least sum, over all data points, of squared distances to the subspace.
A very powerful tool, with decades of theory and algorithms. Here: example applications.
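The best-fit claim is the Eckart-Young theorem, and it is easy to check numerically. A minimal sketch (assuming NumPy is available; the sizes and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank "real data" plus full-rank noise: A = B + C
n, d, k = 50, 30, 3
B = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # rank k
C = 0.1 * rng.standard_normal((n, d))                          # noise
A = B + C

# Truncated SVD: keep only the top-k singular triplets
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: no rank-k matrix is closer to A in spectral norm than A_k.
# Compare against a random rank-k competitor.
competitor = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
err_svd = np.linalg.norm(A - A_k, 2)
err_other = np.linalg.norm(A - competitor, 2)
print(err_svd <= err_other)  # True
```

The truncation A_k is, by Eckart-Young, at least as close to A as any other rank-k matrix, including the random competitor built here; its error equals the (k+1)-st singular value of A.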
9 Example I: Mixture of Spherical Gaussians
F(x) = w_1 N(µ_1, σ_1²) + w_2 N(µ_2, σ_2²) + ... + w_k N(µ_k, σ_k²), in d dimensions.
Learning problem: given i.i.d. samples from F(·), find the components (µ_i, σ_i, w_i). Really a clustering problem.
In 1 dimension, we can solve the learning problem if the means of the component densities are Ω(1) standard deviations apart.
But in d dimensions: approximate k-means fails. A pair of samples from different clusters may be closer than a pair from the same cluster!
14 SVD to the Rescue
For a mixture of k spherical Gaussians (with different variances), the best-fit k-dimensional subspace (found by SVD) passes through all k centers. [Vempala, Wang]
Beautiful proof: For one spherical Gaussian with non-zero mean, the best-fit 1-dimensional subspace passes through the mean. And any k-dimensional subspace containing the mean is a best-fit k-dimensional subspace.
So, if a k-dimensional subspace contains all k means, it is individually the best for each component Gaussian!
Simple observation to finish: given the k-dimensional space containing the means, we need only solve a k-dimensional problem. This can be done in time exponential only in k.
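A quick numerical sketch of why projecting helps (hypothetical sizes and separation, assuming NumPy): in high dimensions the inter-point distances are dominated by noise, but the best-fit direction found by SVD aligns with the line through the centers, so a low-dimensional split separates the clusters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two spherical Gaussians in d = 200 dimensions, centers 6 standard deviations apart
d, n_per = 200, 100
mu = np.zeros(d); mu[0] = 3.0
A = np.vstack([ mu + rng.standard_normal((n_per, d)),
               -mu + rng.standard_normal((n_per, d))])   # rows are samples

# Inter-center distance (6) is far below the typical inter-point
# noise distance (~ sqrt(2d) = 20), so raw k-means struggles...
U, s, Vt = np.linalg.svd(A, full_matrices=False)
proj = A @ Vt[0]          # coordinate along the top singular direction

# ...but the top singular direction aligns with the mean direction,
# so a 1-dimensional split now separates the clusters.
pred = proj > 0
acc = max(np.mean(pred[:n_per]) + np.mean(~pred[n_per:]),
          np.mean(~pred[:n_per]) + np.mean(pred[n_per:])) / 2
print(acc)   # close to 1.0
```

The `max(...)` handles the sign ambiguity of the singular vector; with k > 2 components one would run k-means in the projected k-dimensional coordinates instead.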
18 Planted Clique Problem
Given G = G(n, 1/2) + S × S (S unknown, |S| = s), find S in polynomial time. Best known: s ≥ Ω(√n).
[Picture: A is the n × n ±1 adjacency matrix, with +1 on the planted s × s block and random ±1 entries elsewhere.]
Planted clique: the block has norm s. Random matrix theory: a random ±1 matrix has norm at most 2√n. So SVD finds S when s ≥ c√n for a constant c. [Alon; Boppana, 1985]
Feldman, Grigorescu, Reyzin, Vempala, Xia (2014): cannot be beaten by statistical learning algorithms.
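The spectral argument can be exercised directly (a sketch with made-up sizes, assuming NumPy; s is taken well above 2√n so the planted block dominates the noise norm):

```python
import numpy as np

rng = np.random.default_rng(2)

n, s = 400, 100                       # s well above 2*sqrt(n) = 40
S = set(range(s))                     # the planted set (hidden from the algorithm)

# Symmetric ±1 matrix with an all-ones s x s planted block
A = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), 1)
A = A + A.T
A[:s, :s] = 1.0
np.fill_diagonal(A, 0.0)

# The block contributes norm ~ s = 100; the random part has norm <= 2*sqrt(n) = 40,
# so the top eigenvector concentrates on the planted coordinates.
w, V = np.linalg.eigh(A)
v = V[:, np.argmax(np.abs(w))]
guess = set(np.argsort(np.abs(v))[-s:].tolist())
print(len(guess & S))   # typically recovers (almost) all of S
```

In graph terms one would use the 0-1 adjacency matrix shifted to ±1; the recovery step (take the s largest coordinates of the top eigenvector) is the same.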
21 Planted Gaussians: Signal and Noise
A is an n × n matrix and S ⊆ [n], |S| = k. The A_ij are all independent random variables.
For i, j ∈ S, Pr(A_ij ≥ µ) ≥ 1/2 (e.g., N(µ, σ²)). Signal = µ.
For the other i, j, A_ij is N(0, σ²). Noise = σ.
Given A, µ, σ, find S. [Recall Planted Clique.]
[Picture: A has a k × k block of mean-µ entries; the remaining entries are N(0, σ²).]
26 Exponential Advantage in SNR by Thresholding
Brave new step: threshold the entries of A at µ to get a 0-1 matrix B.
E(B): entries are ≥ 1/2 on the k × k block and exp(−µ²/2σ²) elsewhere.
Subtract exp(−µ²/2σ²) from every entry. The block part then has norm at least k/4, while the rest is a random matrix of norm at most √n · exp(−c µ²/σ²).
So SVD finds S provided exp(c (µ/σ)²) > √n / k.
Cf.: ordinary SVD succeeds only if µ/σ > √n / k.
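The source of the exponential advantage can be seen with a one-line Gaussian tail computation (standard library only; µ and σ here are made-up illustrative values): after thresholding at µ, an off-block entry survives with probability P(N(0, σ²) ≥ µ) ≈ exp(−µ²/2σ²), while a block entry survives with probability at least 1/2.

```python
import math

def upper_tail(mu, sigma):
    """P(N(0, sigma^2) >= mu) via the complementary error function."""
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))

mu, sigma = 4.0, 1.0
p_noise = upper_tail(mu, sigma)   # ~3.2e-5: off-block entries rarely survive
p_signal = 0.5                    # block entries survive with prob >= 1/2 by assumption
print(p_signal / p_noise)         # an SNR boost of over 10^4 already at mu/sigma = 4
```

Before thresholding, the per-entry signal-to-noise ratio was only µ/σ = 4; after it, the surviving-entry rates differ by a factor exponential in (µ/σ)².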
33 Thresholding: Second Plus
Data points {A_1, A_2, ..., A_j, ...} in R^d, with d features.
Data points are in 2 SOFT clusters: data point j belongs with weight w_j to cluster 1 and weight 1 − w_j to cluster 2. (More generally, k clusters.)
Each cluster has some dominant features and each data point has a dominant cluster.
A_ij ≈ µ if feature i is a dominant feature of the dominant topic of data point j; A_ij ≈ σ otherwise.
If the variance above µ is larger than the gap between µ and σ, a 2-clustering criterion (like 2-means) may split the high-weight cluster instead of separating it from the others.
Two differences from mixtures: soft clusters, and high variance in the dominant features.
40 Topic Modeling: The Problem
Joint work with T. Bansal and C. Bhattacharyya.
d features: the words in the dictionary. A document is a d-dimensional (column) vector.
k topics. Topic l is a d-vector (the probabilities of words in the topic).
To generate document j, generate a random convex combination of the topic vectors. Then generate the words of document j in i.i.d. trials, each from the multinomial with probabilities given by that convex combination. ***DRAW PICTURE ON BOARD WITH SPORTS, POLITICS, WEATHER***
The Topic Modeling Problem: given only A, find an approximation to all topic vectors so that the l_1 error in each topic vector is at most ε. The l_1 error is crucial (l_2 misses small words). Generally NP-hard.
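The generative process just described can be sketched in a few lines (hypothetical tiny sizes, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical tiny vocabulary and k = 3 topics
d, k, n_docs, doc_len = 8, 3, 5, 50
M = rng.random((d, k))
M /= M.sum(axis=0)        # topic vectors: each column is a word distribution

docs = np.zeros((d, n_docs))
for j in range(n_docs):
    w = rng.dirichlet(np.ones(k))        # random convex combination of topics
    p = M @ w                            # word distribution for this document
    counts = rng.multinomial(doc_len, p) # doc_len i.i.d. word draws
    docs[:, j] = counts / doc_len        # column of A: empirical word frequencies

print(docs.sum(axis=0))   # each column sums to 1
```

Given only the matrix `docs` (the A of the problem statement), the task is to recover the columns of M to small l_1 error; the Dirichlet weights here are one convenient choice of "random convex combination", as in LDA.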
46 Topic Modeling is Soft Clustering
Topic vectors ↔ cluster centers.
Each data point (document) belongs to a weighted combination of clusters. It is generated from a distribution (which happens to be multinomial) with expectation equal to the weighted combination.
Even if we manage to solve the clustering problem somehow, it is not true that cluster centers are averages of documents. Big distinction from learning mixtures, which is hard clustering.
49 Geometry: Topic Modeling = Soft Clustering
Given documents (means of documents), find the µ_l. It helps to find nearly pure documents (an X near a corner).
[Picture: a simplex with vertices µ_1, µ_2, µ_3, where µ_l is the l-th topic vector; each X is a weighted combination of the µ_l's; the words in a document are i.i.d. choices with mean X.]
50 Prior Results and Assumptions
Under the Pure Topics and Primary Words (1 − ε of the words are primary) assumptions, SVD solves it. [Papadimitriou, Raghavan, Tamaki, Vempala]
Belief: SVD cannot do the non-pure-topic case.
LDA: the most used model. [Blei, Ng, Jordan] Multiple topics per document.
Anandkumar, Foster, Hsu, Kakade, Liu do topic modeling under LDA, to l_2 error, using clever tensor methods. Parameters.
Arora, Ge, Moitra assume an Anchor Word + other parameters: each topic has one word (a) occurring only in that topic and (b) with high frequency. First provable algorithm.
Our goals: intuitive, empirically verified assumptions; a natural, provable algorithm.
57 Our Assumptions
Intuitive to topic modeling, not numerical parameters like the condition number.
Catchwords: each topic has a set of words such that (a) each occurs more frequently in that topic than in the others and (b) together, they have high frequency.
Dominant topics: each document has a dominant topic with weight (in that document) of at least some α, whereas the non-dominant topics have weight at most some β.
Nearly pure documents: each topic has a (small) fraction of documents which are 1 − δ pure for that topic.
No local min: for every word, the plot of the number of documents versus the number of occurrences of the word (conditioned on the dominant topic) has no local minimum. [Zipf's law, or unimodal.]
62 The Algorithm: Threshold SVD (TSVD)
s = number of documents. For this talk, the probability that each topic is dominant is 1/k.
Threshold: compute the threshold for each word i. First gap: the maximum ζ such that A_ij ≥ ζ for at least s/2k documents j and A_ij = ζ for at most εs documents j.
SVD: use SVD on the thresholded matrix to get starting centers for the k-means algorithm.
k-means: run k-means. Will show: this identifies the dominant topic.
Identify catchwords: find the set of high-frequency words in each cluster. Will show: this is the set of catchwords for the topic.
Identify pure documents: find the set of documents with the highest total number of occurrences of the set of catchwords. Will show: these are nearly pure documents. Their average is the topic vector.
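A compressed end-to-end sketch of the first three steps (threshold, SVD, k-means) on synthetic counts. The fixed cutoff and the sizes are made up for illustration and stand in for the paper's per-word first-gap rule; assumes NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic corpus: k = 2 dominant topics, each with its own block of catchwords
d, s, k = 20, 100, 2
labels = np.repeat([0, 1], s // 2)               # true dominant topic per document
A = rng.poisson(1.0, size=(d, s)).astype(float)  # background word counts
A[:5, labels == 0] += rng.poisson(6.0, size=(5, s // 2))    # topic-0 catchwords
A[5:10, labels == 1] += rng.poisson(6.0, size=(5, s // 2))  # topic-1 catchwords

# Step 1: threshold the counts (a fixed cutoff stands in for the first-gap rule)
B = (A >= 4.0).astype(float)

# Step 2: SVD of the thresholded matrix; represent each document (column)
# by its coordinates in the top-k singular subspace
U, sv, Vt = np.linalg.svd(B, full_matrices=False)
P = Vt[:k].T * sv[:k]

# Step 3: a few Lloyd (k-means) iterations in the projected coordinates
centers = P[[0, s - 1]]                          # crude start: one doc from each end
for _ in range(5):
    assign = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([P[assign == c].mean(axis=0) for c in range(k)])

agree = max((assign == labels).mean(), (assign != labels).mean())
print(agree)   # dominant topics recovered on (nearly) all documents
```

In the actual algorithm, the per-word first-gap thresholds, SVD-derived starting centers, and the catchword and pure-document steps replace the shortcuts taken here.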
68 The Advantage of Thresholding
[Picture: the diagonal blue blocks are the catchwords for each topic; black entries are non-catchwords.]
69 [Picture: Thresh + SVD + k-means identifies the dominant topics; the points X cluster around µ_1, µ_2, µ_3.]
70 Properties of Thresholding
Using no-local-min, show: no threshold splits any dominant topic in the middle. So the thresholded matrix is a block matrix for the catchwords. But for non-catchwords, it can be high on several topics. PICTURE ON THE BOARD OF A BLOCK MATRIX.
Done? No. We need inter-cluster separation >> intra-cluster spread (the variance inside a cluster).
Catchwords provide sufficient inter-cluster separation.
The inside-cluster variance is bounded with machinery from random matrix theory. Beware: only the columns are independent; the rows are not.
Appeal to a result on k-means [Kumar, Kannan]: if the inter-cluster separation exceeds the inside-cluster directional standard deviation, then SVD followed by k-means clusters correctly.
76 Getting Topic Vectors
[Picture: a simplex with the columns of M as extreme points and a cluster of documents for each dominant topic.]
Taking the average of the documents in T_l is no good.
77 Empirical Results: Datasets
NIPS: 1,500 NIPS full papers.
NYT: random subset of 30,000 documents from the New York Times dataset.
Pubmed: random subset of 30,000 documents from the Pubmed abstracts dataset.
20NG: 13,389 documents from the 20NewsGroup dataset.
78 Empirical Results: Assumptions
Table: fraction of documents satisfying the dominant topic assumption, per corpus, at α = 0.4, 0.8, 0.9. Recoverable values: NIPS 10.7% and 4.8% (α = 0.8, 0.9); NYT 20.9% and 12.7%; Pubmed 20.3% and 10.7%; 20NG 54.4% and 44.3%. [The α = 0.4 column is lost in this transcription.]
Table: Catchwords (CW) assumption with ρ = 1.1, ε = 0.25: mean per-topic frequency of CW and % of topics with CW, per corpus. [Numeric entries lost in this transcription.]
79 Empirical Results: Semi-synthetic Data
Generate semi-synthetic corpora from an LDA model trained by MCMC, to ensure that the synthetic corpora retain the characteristics of real data.
Gibbs sampling is run for 1000 iterations on all four datasets, and the final word-topic distribution is used to generate varying numbers (s) of synthetic documents, with document-topic distributions drawn from a symmetric Dirichlet with hyper-parameter 0.01.
Note that the synthetic data is not guaranteed to satisfy the dominant topic assumption for every document; on average, about 80% of the documents satisfy the assumption.
80 Empirical Results: L1 Reconstruction Error
And percent improvement over Recover-KL. The total average improvement over R-KL is 20%.
[Table: for each corpus (NIPS, Pubmed, 20NG, NYT) and 40,000 to 200,000 synthetic documents, the L1 error of Tensor, Recover-L2, Recover-KL, and TSVD, with % improvement; numeric entries lost in this transcription.]
81 Empirical Results: L1 Reconstruction Error
Histogram of L1 error across topics for 40k synthetic documents. On a majority of the topics (> 90%), the recovery error for TSVD is significantly smaller than for Recover-KL.
[Figure: histograms of L1 reconstruction error (number of topics vs. error) for NIPS, NYT, Pubmed, and 20NG; algorithms R-KL and TSVD.]
82 Empirical Results: Perplexity & Topic Coherence
[Figure: perplexity and topic coherence of TSVD, Recover-L2, and Recover-KL on 20NG, NIPS, NYT, and Pubmed.]
83 Top 5 words of some topics on the real NYT dataset (catchwords and anchor words highlighted; zzz is an identifier placed by the NYT dataset):
TSVD: cup minutes add tablespoon oil | Recover-KL: cup minutes tablespoon add oil | Gibbs: cup minutes add tablespoon oil
TSVD: team season coach zzz_ram game | Recover-KL: game team season play zzz_ram | Gibbs: team season game coach zzz_nfl
TSVD: patient doctor drug cancer study | Recover-KL: patient drug doctor percent fund | Gibbs: patient doctor drug medical cancer
TSVD: zzz_john_mccain zzz_mccain zzz_bush zzz_george_bush campaign | Recover-KL: zzz_john_mccain zzz_george_bush campaign republican voter | Gibbs: zzz_john_mccain zzz_george_bush campaign zzz_bush zzz_mccain
TSVD: house room building wall floor | Recover-KL: room show look home house | Gibbs: room look water house hand
TSVD: film movie actor character zzz_oscar | Recover-KL: zzz_god christian religious zzz_jesus church | Gibbs: film show movie music book
TSVD: pope church book jewish religious | Recover-KL: film movie character play director | Gibbs: religious church jewish jew zzz_god
What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,
More informationk-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels
Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t
More informationMATHEMATICS SYLLABUS SECONDARY 5th YEAR
Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE
More informationCOMP 551 Applied Machine Learning Lecture 11: Support Vector Machines
COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse
More informationFigure 1a. A planar mechanism.
ME 5 - Machine Design I Fall Semester 0 Name f Student Lab Sectin Number EXAM. OPEN BOOK AND CLOSED NOTES. Mnday, September rd, 0 Write n ne side nly f the paper prvided fr yur slutins. Where necessary,
More informationSections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic.
Tpic : AC Fundamentals, Sinusidal Wavefrm, and Phasrs Sectins 5. t 5., 6. and 6. f the textbk (Rbbins-Miller) cver the materials required fr this tpic.. Wavefrms in electrical systems are current r vltage
More informationSlide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons
Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large
More informationChapter 8: The Binomial and Geometric Distributions
Sectin 8.1: The Binmial Distributins Chapter 8: The Binmial and Gemetric Distributins A randm variable X is called a BINOMIAL RANDOM VARIABLE if it meets ALL the fllwing cnditins: 1) 2) 3) 4) The MOST
More informationDetermining the Accuracy of Modal Parameter Estimation Methods
Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system
More information5 th grade Common Core Standards
5 th grade Cmmn Cre Standards In Grade 5, instructinal time shuld fcus n three critical areas: (1) develping fluency with additin and subtractin f fractins, and develping understanding f the multiplicatin
More informationHomology groups of disks with holes
Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.
More informationIAML: Support Vector Machines
1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int
More informationA Matrix Representation of Panel Data
web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins
More information1 The limitations of Hartree Fock approximation
Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants
More informationSupport-Vector Machines
Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material
More informationEquilibrium of Stress
Equilibrium f Stress Cnsider tw perpendicular planes passing thrugh a pint p. The stress cmpnents acting n these planes are as shwn in ig. 3.4.1a. These stresses are usuall shwn tgether acting n a small
More informationMultiple Source Multiple. using Network Coding
Multiple Surce Multiple Destinatin Tplgy Inference using Netwrk Cding Pegah Sattari EECS, UC Irvine Jint wrk with Athina Markpulu, at UCI, Christina Fraguli, at EPFL, Lausanne Outline Netwrk Tmgraphy Gal,
More information1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp
THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*
More informationThe general linear model and Statistical Parametric Mapping I: Introduction to the GLM
The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design
More informationSource Coding and Compression
Surce Cding and Cmpressin Heik Schwarz Cntact: Dr.-Ing. Heik Schwarz heik.schwarz@hhi.fraunhfer.de Heik Schwarz Surce Cding and Cmpressin September 22, 2013 1 / 60 PartI: Surce Cding Fundamentals Heik
More informationPhysical Layer: Outline
18-: Intrductin t Telecmmunicatin Netwrks Lectures : Physical Layer Peter Steenkiste Spring 01 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital Representatin f Infrmatin Characterizatin f Cmmunicatin
More information4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression
4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw
More informationChapter 2 GAUSS LAW Recommended Problems:
Chapter GAUSS LAW Recmmended Prblems: 1,4,5,6,7,9,11,13,15,18,19,1,7,9,31,35,37,39,41,43,45,47,49,51,55,57,61,6,69. LCTRIC FLUX lectric flux is a measure f the number f electric filed lines penetrating
More informationCOMP 551 Applied Machine Learning Lecture 4: Linear classification
COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted
More informationMore Tutorial at
Answer each questin in the space prvided; use back f page if extra space is needed. Answer questins s the grader can READILY understand yur wrk; nly wrk n the exam sheet will be cnsidered. Write answers,
More informationComparing Several Means: ANOVA. Group Means and Grand Mean
STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal
More informationPattern Recognition 2014 Support Vector Machines
Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft
More informationSPH3U1 Lesson 06 Kinematics
PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.
More informationCHM112 Lab Graphing with Excel Grading Rubric
Name CHM112 Lab Graphing with Excel Grading Rubric Criteria Pints pssible Pints earned Graphs crrectly pltted and adhere t all guidelines (including descriptive title, prperly frmatted axes, trendline
More informationExperiment #3. Graphing with Excel
Experiment #3. Graphing with Excel Study the "Graphing with Excel" instructins that have been prvided. Additinal help with learning t use Excel can be fund n several web sites, including http://www.ncsu.edu/labwrite/res/gt/gt-
More informationInterference is when two (or more) sets of waves meet and combine to produce a new pattern.
Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme
More informationPhysics 2010 Motion with Constant Acceleration Experiment 1
. Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin
More informationBLAST / HIDDEN MARKOV MODELS
CS262 (Winter 2015) Lecture 5 (January 20) Scribe: Kat Gregry BLAST / HIDDEN MARKOV MODELS BLAST CONTINUED HEURISTIC LOCAL ALIGNMENT Use Cmmnly used t search vast bilgical databases (n the rder f terabases/tetrabases)
More informationFall 2013 Physics 172 Recitation 3 Momentum and Springs
Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.
More informationHiding in plain sight
Hiding in plain sight Principles f stegangraphy CS349 Cryptgraphy Department f Cmputer Science Wellesley Cllege The prisners prblem Stegangraphy 1-2 1 Secret writing Lemn juice is very nearly clear s it
More informationThe Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition
The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge
More informationCOMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)
COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise
More informationthe results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must
M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins
More informationProbability, Random Variables, and Processes. Probability
Prbability, Randm Variables, and Prcesses Prbability Prbability Prbability thery: branch f mathematics fr descriptin and mdelling f randm events Mdern prbability thery - the aximatic definitin f prbability
More informationCollocation Map for Overcoming Data Sparseness
Cllcatin Map fr Overcming Data Sparseness Mnj Kim, Yung S. Han, and Key-Sun Chi Department f Cmputer Science Krea Advanced Institute f Science and Technlgy Taejn, 305-701, Krea mj0712~eve.kaist.ac.kr,
More informationStatistics, Numerical Models and Ensembles
Statistics, Numerical Mdels and Ensembles Duglas Nychka, Reinhard Furrer,, Dan Cley Claudia Tebaldi, Linda Mearns, Jerry Meehl and Richard Smith (UNC). Spatial predictin and data assimilatin Precipitatin
More informationName AP CHEM / / Chapter 1 Chemical Foundations
Name AP CHEM / / Chapter 1 Chemical Fundatins Metric Cnversins All measurements in chemistry are made using the metric system. In using the metric system yu must be able t cnvert between ne value and anther.
More informationKinetic Model Completeness
5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins
More informationModule 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics
Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm
More information[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y )
(Abut the final) [COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t m a k e s u r e y u a r e r e a d y ) The department writes the final exam s I dn't really knw what's n it and I can't very well
More informationLecture 17: Free Energy of Multi-phase Solutions at Equilibrium
Lecture 17: 11.07.05 Free Energy f Multi-phase Slutins at Equilibrium Tday: LAST TIME...2 FREE ENERGY DIAGRAMS OF MULTI-PHASE SOLUTIONS 1...3 The cmmn tangent cnstructin and the lever rule...3 Practical
More informationand the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:
Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track
More informationYou need to be able to define the following terms and answer basic questions about them:
CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f
More informationLecture 5: Equilibrium and Oscillations
Lecture 5: Equilibrium and Oscillatins Energy and Mtin Last time, we fund that fr a system with energy cnserved, v = ± E U m ( ) ( ) One result we see immediately is that there is n slutin fr velcity if
More informationNAME: Prof. Ruiz. 1. [5 points] What is the difference between simple random sampling and stratified random sampling?
CS4445 ata Mining and Kwledge iscery in atabases. B Term 2014 Exam 1 Nember 24, 2014 Prf. Carlina Ruiz epartment f Cmputer Science Wrcester Plytechnic Institute NAME: Prf. Ruiz Prblem I: Prblem II: Prblem
More informationMidwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter
Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter kbrabant@iastate.edu Iwa State University Department f Statistics Department f Cmputer Science June 24, 2016 1/24 Outline
More informationMATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank
MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use
More information3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression
3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets
More informationResampling Methods. Chapter 5. Chapter 5 1 / 52
Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and
More informationChapter 8 Predicting Molecular Geometries
Chapter 8 Predicting Mlecular Gemetries 8-1 Mlecular shape The Lewis diagram we learned t make in the last chapter are a way t find bnds between atms and lne pais f electrns n atms, but are nt intended
More informationDepartment of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets
Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0
More informationinitially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur
Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract
More informationSIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST. Mark C. Otto Statistics Research Division, Bureau of the Census Washington, D.C , U.S.A.
SIZE BIAS IN LINE TRANSECT SAMPLING: A FIELD TEST Mark C. Ott Statistics Research Divisin, Bureau f the Census Washingtn, D.C. 20233, U.S.A. and Kenneth H. Pllck Department f Statistics, Nrth Carlina State
More informationSOME CONSTRUCTIONS OF OPTIMAL BINARY LINEAR UNEQUAL ERROR PROTECTION CODES
Philips J. Res. 39, 293-304,1984 R 1097 SOME CONSTRUCTIONS OF OPTIMAL BINARY LINEAR UNEQUAL ERROR PROTECTION CODES by W. J. VAN OILS Philips Research Labratries, 5600 JA Eindhven, The Netherlands Abstract
More informationModule 4: General Formulation of Electric Circuit Theory
Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated
More informationBiplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint
Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:
More informationCS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007
CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is
More informationNUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION
NUROP Chinese Pinyin T Chinese Character Cnversin NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION CHIA LI SHI 1 AND LUA KIM TENG 2 Schl f Cmputing, Natinal University f Singapre 3 Science
More informationMATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank
MATCHING TECHNIQUES Technical Track Sessin VI Céline Ferré The Wrld Bank When can we use matching? What if the assignment t the treatment is nt dne randmly r based n an eligibility index, but n the basis
More informationREADING STATECHART DIAGRAMS
READING STATECHART DIAGRAMS Figure 4.48 A Statechart diagram with events The diagram in Figure 4.48 shws all states that the bject plane can be in during the curse f its life. Furthermre, it shws the pssible
More informationCambridge Assessment International Education Cambridge Ordinary Level. Published
Cambridge Assessment Internatinal Educatin Cambridge Ordinary Level ADDITIONAL MATHEMATICS 4037/1 Paper 1 Octber/Nvember 017 MARK SCHEME Maximum Mark: 80 Published This mark scheme is published as an aid
More informationIn SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:
In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin
More informationBayesian nonparametric modeling approaches for quantile regression
Bayesian nnparametric mdeling appraches fr quantile regressin Athanasis Kttas Department f Applied Mathematics and Statistics University f Califrnia, Santa Cruz Department f Statistics Athens University
More informationSUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis
SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm
More informationLecture 3: Principal Components Analysis (PCA)
Lecture 3: Principal Cmpnents Analysis (PCA) Reading: Sectins 6.3.1, 10.1, 10.2, 10.4 STATS 202: Data mining and analysis Jnathan Taylr, 9/28 Slide credits: Sergi Bacallad 1 / 24 The bias variance decmpsitin
More informationReinforcement Learning" CMPSCI 383 Nov 29, 2011!
Reinfrcement Learning" CMPSCI 383 Nv 29, 2011! 1 Tdayʼs lecture" Review f Chapter 17: Making Cmple Decisins! Sequential decisin prblems! The mtivatin and advantages f reinfrcement learning.! Passive learning!
More informationWhy Don t They Get It??
Why Dn t They Get It?? A 60-minute Webinar NEURO LINGUISTIC PROGRAMMING NLP is the way we stre and prcess infrmatin in ur brains, and then frm the wrds we use t cmmunicate. By learning abut NLP, yu can
More information