Model-based clustering and validation techniques for gene expression data
1 Model-based clustering and validation techniques for gene expression data. Ka Yee Yeung, Department of Microbiology, University of Washington, Seattle WA
2 Overview: Introduction to microarrays; Clustering 101; Validating clustering results on microarray data; Model-based clustering using microarray data; Co-expression to co-regulation?? 6/8/ Ka Yee Yeung - Lipari
3 Introduction to Microarrays
4 DNA Arrays Measure the Concentration of 1000s of Genes Simultaneously. [Figure panels: on the surface; in solution; after hybridization. The sample contains copies of gene A and 1 copy of gene B.]
5 Two-color analysis allows for comparative studies to be done. [Figure panels: in solution #1 (copies of gene A, 1 copy of gene B); in solution #2 (copies of gene A, copies of gene B); after hybridization.]
6 Cell Population #1: extract mRNA, make cDNA, label with green fluor. Cell Population #2: extract mRNA, make cDNA, label with red fluor. Co-hybridize, then scan. Slide with DNA from different genes.
7 The Gene Chip from Affymetrix
8 Affymetrix chips: oligonucleotide arrays. PM (Perfect Match) vs. MM (mismatch)
9 A gene expression data set: a snapshot of activities in the cell. Each chip represents an experiment: time course, tissue samples (normal/cancer). The data form a matrix of n genes by p experiments with entries X_ij.
10 What is clustering? Group similar objects together: clustering experiments, clustering genes. [Figure: example matrix of genes by experiments.]
11 Applications of clustering gene expression data. Guilt by association: e.g. cluster the genes -> functionally related genes. Class discovery: e.g. cluster the experiments -> discover new subtypes of tissue samples.
12 Clustering 101
13 What is clustering? Group similar objects together: objects in the same cluster (group) are more similar to each other than objects in different clusters. Data exploratory tool. Unsupervised method: does not assume any knowledge of the genes or experiments.
14 How to define similarity? A similarity measure turns the raw matrix (n genes by p experiments) into a similarity matrix (n genes by n genes). Similarity measure: a measure of pairwise similarity or dissimilarity. Examples: correlation coefficient, Euclidean distance.
15 Similarity measures. Euclidean distance: $d(X,Y) = \sqrt{\sum_{j=1}^{p} (X[j] - Y[j])^2}$. Correlation coefficient: $r(X,Y) = \frac{\sum_{j=1}^{p} (X[j]-\bar{X})(Y[j]-\bar{Y})}{\sqrt{\sum_{j=1}^{p} (X[j]-\bar{X})^2 \sum_{j=1}^{p} (Y[j]-\bar{Y})^2}}$, where $\bar{X} = \frac{1}{p}\sum_{j=1}^{p} X[j]$.
16 Example: four expression profiles X, Y, Z and W. Correlation(X,Y) = 1; Correlation(X,Z) = -1; Correlation(X,W) = 1. [Table of profile values and pairwise Euclidean distances.]
17 Lessons from the example. Correlation: direction only. Euclidean distance: magnitude & direction.
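A minimal sketch of the two similarity measures, written to make the lesson concrete. The profiles `X`, `Z` and `W` are hypothetical values chosen for illustration, not the values from the example slide, and the function names are assumptions of this sketch.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences over the p experiments."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical profiles chosen for illustration (not the slide's values).
X = [1, 0, -1, 0]
Z = [-1, 0, 1, 0]  # X mirrored: opposite direction
W = [2, 0, -2, 0]  # X doubled: same direction, larger magnitude

print(correlation(X, Z), euclidean_distance(X, Z))
print(correlation(X, W), euclidean_distance(X, W))
```

W points in the same direction as X, so its correlation with X is 1 even though the Euclidean distance between them is not zero; Z has the same magnitude as X but the opposite direction, which only the correlation reveals.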
18 Clustering algorithms. Inputs: raw data matrix or similarity matrix; number of clusters or some other parameters. Hierarchical vs. partitional algorithms.
19 Hierarchical Clustering [Hartigan 197]. Agglomerative (bottom-up); produces a dendrogram. Algorithm: Initialize: each item is a cluster. Iterate: select the two most similar clusters and merge them. Halt: when the required number of clusters is reached.
20 Hierarchical: Single Link. Cluster similarity = similarity of the two most similar members. (-) Potentially long and skinny clusters. (+) Fast.
21 Example: single link. After two items are merged into one cluster, the distance from that cluster to each remaining item is the minimum of the member distances, e.g. min{9,8} = 8.
22 Example: single link (continued). After the next merge, the distance matrix is updated again with min{...}, e.g. min{9,7} = 7.
23 Example: single link (continued). The last update takes the minimum over the distances between the remaining clusters.
24 Hierarchical: Complete Link. Cluster similarity = similarity of the two least similar members. (+) Tight clusters. (-) Slow.
25 Example: complete link. After two items are merged into one cluster, the distance from that cluster to each remaining item is the maximum of the member distances, e.g. max{9,8} = 9.
26 Example: complete link (continued). A second pair of items is merged, and distances between clusters are updated with max{...}.
27 Example: complete link (continued). The last update takes the maximum over the distances between the remaining clusters.
28 Hierarchical: Average Link. Cluster similarity = average similarity of all pairs. (+) Tight clusters. (-) Slow.
29 Example: average link. After two items are merged into one cluster, the distance from that cluster to each remaining item is the average of the member distances.
30 Example: average link (continued). A second pair of items is merged, and distances between clusters are updated with the average over all member pairs.
31 Example: average link (continued). The last update averages the distances over all pairs between the remaining clusters.
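The agglomerative procedure and the three linkage rules can be sketched in a few lines. This is an illustrative, unoptimized version that works on a dissimilarity matrix (so the most similar clusters are the ones with the smallest distance); the function name `agglomerative` and the five points on a line are assumptions of this sketch, not data from the worked examples.

```python
def agglomerative(dist, k, linkage="single"):
    """Merge clusters bottom-up until k clusters remain.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    linkage: "single" (min), "complete" (max) or "average" (mean).
    """
    combine = {
        "single": min,
        "complete": max,
        "average": lambda values: sum(values) / len(values),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]  # initialize: each item a cluster
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = combine(pair_dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best  # the two closest clusters under this linkage
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical items on a line with slowly growing gaps between neighbours.
points = [0, 2, 4.5, 7.5, 11.5]
dist = [[abs(p - q) for q in points] for p in points]

single = agglomerative(dist, 2, "single")
complete = agglomerative(dist, 2, "complete")
average = agglomerative(dist, 2, "average")
print(single, complete, average)
```

On this toy input single link chains the first four items into one long cluster, while complete link splits the items into more compact groups, which matches the trade-offs listed on the linkage slides.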
32 Partitional: K-Means [MacQueen 196]
33 Details of k-means. Iterate until convergence: assign each data point to the closest centroid; compute new centroids. Properties: converges to a local optimum; in practice, converges quickly; tends to produce spherical, equal-sized clusters.
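The two alternating steps of k-means can be sketched as follows. This is a minimal version under stated assumptions: squared Euclidean distance, caller-supplied initial centroids, and an empty cluster simply keeping its old centroid. The function name `kmeans` and the toy points are hypothetical.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain k-means on points given as tuples of floats."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids did not move
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

Because the result depends on the starting centroids, the procedure only reaches a local optimum; the results slides initialize k-means from average-link for this reason.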
34 2D-clustering: cluster both genes and experiments
35 Summary. Definition of clustering. Pairwise similarity: correlation, Euclidean distance. Clustering algorithms: hierarchical (single-link, complete-link, average-link), k-means.
36 Clustering microarray data
37 What has been done? Many clustering algorithms have been proposed for gene expression data. For example: hierarchical clustering algorithms, e.g. [Eisen et al. 1998]; k-means, e.g. [Tavazoie et al. 1999]; self-organizing maps (SOM), e.g. [Tamayo et al. 1999]; Cluster Affinity Search Technique (CAST) [Ben-Dor, Yakhini 1999]; and many others.
38 Common questions: 1. How can I choose between all these clustering methods? 2. Is there a clustering algorithm that works better than the others? 3. How to choose the number of clusters? 4. How often do I get biologically meaningful clusters? 5. How many microarray experiments do I need?
39 Validating clustering results [Yeung, Haynor, Ruzzo 2001]. FOM idea; data sets; results. ISI most cited paper in Computer Science (Dec )
40 Validation techniques [Jain and Dubes 1988]. External validation: requires external knowledge. Internal validation: does not require external knowledge; goodness of fit between the data and the clusters.
41 Comparing different heuristic-based algorithms. Apply a clustering algorithm to all but one experiment. Use the left-out experiment to check the predictive power of the algorithm.
42 Figure of Merit (FOM). [Figure: data matrix of genes 1..n by experiments 1..m with left-out experiment e; genes grouped into clusters C_1, ..., C_i, ..., C_k.] FOM estimates predictive power: it measures the uniformity of gene expression levels in the left-out condition within the clusters formed. Low FOM -> high predictive power.
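43 FOM. [Figure: data matrix of genes 1..n by experiments 1..m with entry R(g,e); genes grouped into clusters C_1, ..., C_i, ..., C_k.] With $R(g,e)$ the expression level of gene g in experiment e and $\mu_{C_i}(e)$ the mean of experiment e over the genes in cluster $C_i$: $FOM(e,k) = \sqrt{\frac{1}{n} \sum_{i=1}^{k} \sum_{g \in C_i} (R(g,e) - \mu_{C_i}(e))^2}$; $FOM(k) = \sum_{e=1}^{m} FOM(e,k)$; adjusted $FOM(e,k) = FOM(e,k) / \sqrt{\frac{n-k}{n}}$.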
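A sketch of the leave-one-experiment-out computation, assuming the data are a list of per-gene rows and that the clustering algorithm is passed in as a function returning one label per gene. The names `fom_for_condition`, `adjusted_fom`, `sign_clusters` and `alternating` are hypothetical, and the two toy "algorithms" are stand-ins for real ones such as CAST or k-means.

```python
import math


def fom_for_condition(data, labels, e, k):
    """FOM(e, k): root mean squared deviation of the left-out condition e
    from its cluster means, for clusters formed without condition e."""
    n = len(data)
    total = 0.0
    for c in range(k):
        values = [row[e] for row, lab in zip(data, labels) if lab == c]
        if not values:
            continue  # an empty cluster contributes nothing
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return math.sqrt(total / n)


def adjusted_fom(data, k, cluster_fn):
    """Sum over all left-out conditions e of the adjusted FOM(e, k)."""
    n, m = len(data), len(data[0])
    total = 0.0
    for e in range(m):
        reduced = [row[:e] + row[e + 1:] for row in data]  # drop condition e
        labels = cluster_fn(reduced, k)
        total += fom_for_condition(data, labels, e, k) / math.sqrt((n - k) / n)
    return total


def sign_clusters(rows, k):
    """Toy stand-in for a clustering algorithm: split genes by sign of row sum."""
    return [0 if sum(row) < 0 else 1 for row in rows]


def alternating(rows, k):
    """Toy control that ignores the data, like a random assignment."""
    return [i % k for i in range(len(rows))]


# Hypothetical data: 4 genes by 3 conditions, two obvious groups.
data = [[1, 1, 1], [1, 1, 1], [-1, -1, -1], [-1, -1, -1]]
good = adjusted_fom(data, 2, sign_clusters)
control = adjusted_fom(data, 2, alternating)
print(good, control)
```

The clustering that recovers the two groups predicts the left-out condition perfectly and gets a FOM of 0, while the data-blind control gets a higher FOM, which is the comparison the results slides draw against the random baseline.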
44 Clustering Algorithms. Partitional: CAST (Ben-Dor et al. 1999), k-means (Hartigan 197). Hierarchical: single-link, average-link, complete-link. Random (as a control): randomly assign genes to clusters.
45 Gene expression data sets. Ovary data (Michel Schummer, Institute of Systems Biology): subset of data: clones by experiments (cancer/normal tissue samples); clones correspond to genes (classes). Yeast cell cycle data (Cho et al. 1998): 17 time points; subset of 8 genes correspond to phases of cell cycle.
46 Synthetic data sets. Mixture of normal distributions based on the ovary data: generate a multivariate normal distribution with the sample covariance matrix and mean vector of each class in the ovary data. Randomly resampled ovary data: for each class, randomly sample the expression levels in each experiment; near-diagonal covariance matrix.
47 Randomly resampled synthetic data set. [Figure: ovary data and synthetic data shown as matrices of genes, grouped by class, by experiments.]
48 Results: ovary data. [Plot: FOM vs. number of clusters for CAST, random, average-link, single-link, complete-link, k-means (init avglink).] CAST, k-means and complete-link: best performance.
49 Results: yeast cell cycle data. [Plot: FOM vs. number of clusters for CAST, random, avglink, complete-link, single-link, k-means (init avglink).] CAST, k-means: best performance.
50 Results: mixture of normals based on ovary data. [Plot: FOM vs. number of clusters for CAST, random, average-link, single-link, complete-link, k-means (init avglink).] At clusters: CAST has the lowest FOM.
51 Results: randomly resampled ovary data. [Plot: FOM vs. number of clusters for CAST, random, average-link, single-link, complete-link, k-means (init avglink).] At clusters: CAST has the lowest FOM.
52 Summary of Results. CAST and k-means produce higher quality clusters than the hierarchical algorithms. Single-link has the worst performance among the hierarchical algorithms.
53 Software Implementation. Command-line C code: not very user-friendly at the moment. Third-party implementation: MEV from TIGR.
55 Thank-yous. FOM work: David Haynor (Radiology, UW), Larry Ruzzo (Computer Science, UW). Ovary data: Michel Schummer (Institute of Systems Biology).
56 Ready for a break?
57 Overview: Introduction to microarrays; Clustering 101; Validating clustering results on microarray data; Model-based clustering using microarray data; Co-expression to co-regulation??
58 Common questions: 1. How can I choose between all these clustering methods? 2. Is there a clustering algorithm that works better than the others? 3. How to choose the number of clusters? 4. How often do I get biologically meaningful clusters? 5. How many microarray experiments do I need?
59 Model-based clustering [Yeung, Fraley, Murua, Raftery, Ruzzo 2001]. Overview of model-based clustering; data sets; results; summary and future work. ISI most cited paper in Computer Science (Jan )
60 Model-based clustering. Gaussian mixture model: assume each cluster is generated by a multivariate normal distribution. Each cluster k has parameters: mean vector $\mu_k$, covariance matrix $\Sigma_k$. Likelihood for the mixture model: $L_{mix}(\theta_1, \ldots, \theta_G \mid y) = \prod_{i=1}^{n} \sum_{k=1}^{G} \tau_k f_k(y_i \mid \theta_k)$, where $\tau_k$ is the probability that an observation belongs to cluster k. Data transformations & normality assumption.
61 EM algorithm. General approach to maximum likelihood. Iterate between E and M steps. E step: compute the probability of each observation belonging to each cluster using the current parameter estimates. M step: estimate model parameters using the current group membership probabilities.
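To make the E and M steps concrete, here is a minimal sketch of EM for a Gaussian mixture, simplified to one dimension so that each cluster has a scalar mean and variance (an assumption of this sketch; the slides use multivariate normals). The function names, the fixed iteration count, the small variance floor and the toy data are all hypothetical choices.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gaussian_mixture(xs, means, n_iter=50):
    """EM for a 1-D Gaussian mixture with per-cluster weight, mean and variance."""
    g = len(means)
    means = list(means)
    variances = [1.0] * g
    weights = [1.0 / g] * g
    resp = []
    for _ in range(n_iter):
        # E step: probability of each observation belonging to each cluster.
        resp = []
        for x in xs:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k])
                     for k in range(g)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M step: re-estimate parameters from the membership probabilities.
        for k in range(g):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances, resp


# Hypothetical observations drawn around 0 and around 5.
xs = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
weights, means, variances, resp = em_gaussian_mixture(xs, [0.0, 5.0])
hard_labels = [max(range(2), key=r.__getitem__) for r in resp]
print(weights, means, hard_labels)
```

Unlike k-means, every observation keeps a membership probability for every cluster; hard cluster labels are obtained at the end by taking the most probable cluster.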
62 Parameterization of the covariance matrix: $\Sigma_k = \lambda_k D_k A_k D_k^T$ (Banfield & Raftery 199), where $\lambda_k$ controls the variance, $D_k$ the orientation and $A_k$ the shape. Equal variance spherical model (EI), similar to k-means: $\Sigma_k = \lambda I$. Unequal variance spherical model (VI): $\Sigma_k = \lambda_k I$.
63 Covariance matrix: $\Sigma_k = \lambda_k D_k A_k D_k^T$ (variance, orientation, shape). Unconstrained model (VVV): $\Sigma_k = \lambda_k D_k A_k D_k^T$. EEE elliptical model: $\Sigma_k = \lambda D A D^T$. Diagonal model: $\Sigma_k = \lambda_k B_k$, where $B_k$ is diagonal with $|B_k| = 1$.
64 Key advantage of the model-based approach: choose the model and the number of clusters. Bayesian Information Criterion (BIC): a large BIC score indicates strong evidence for the corresponding model.
65 Definition of the BIC score: $2 \log p(D \mid M_k) \approx 2 \log p(D \mid \hat{\theta}_k, M_k) - \nu_k \log(n) = BIC_k$. The integrated likelihood $p(D \mid M_k)$ is hard to evaluate, where D is the data and $M_k$ is the model. BIC is an approximation to $2 \log p(D \mid M_k)$. $\nu_k$: number of parameters to be estimated in model $M_k$.
66 Overall Clustering Approach. For a given range of # clusters (G), for each model: apply EM to estimate model parameters and cluster memberships; compute BIC.
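The selection loop can be sketched as below, assuming EM has already produced a maximized log-likelihood for every (model, G) pair. The parameter counts follow the usual convention of G-1 mixing proportions plus G mean vectors plus the covariance parameters of each model; the helper names and the log-likelihood values in `fits` are made-up illustrations, not results from the talk.

```python
import math


def n_parameters(model, g, d):
    """Free parameters for g clusters in d dimensions under each covariance model."""
    covariance = {
        "EI": 1,                          # one shared variance
        "VI": g,                          # one variance per cluster
        "diagonal": g * d,                # one diagonal matrix per cluster
        "EEE": d * (d + 1) // 2,          # one shared full matrix
        "VVV": g * d * (d + 1) // 2,      # one full matrix per cluster
    }[model]
    return (g - 1) + g * d + covariance


def bic(log_likelihood, model, g, d, n):
    """BIC = 2 log-likelihood - (number of parameters) * log(n)."""
    return 2.0 * log_likelihood - n_parameters(model, g, d) * math.log(n)


def select_model(fits, d, n):
    """fits maps (model, g) -> maximized log-likelihood; return the best key."""
    return max(fits, key=lambda key: bic(fits[key], key[0], key[1], d, n))


# Made-up log-likelihoods for n = 100 observations in d = 4 dimensions.
fits = {("EI", 2): -520.0, ("EI", 3): -500.0, ("VVV", 3): -495.0}
best = select_model(fits, d=4, n=100)
print(best)
```

In this toy table VVV fits slightly better than EI at three clusters, but its many extra covariance parameters are penalized, so the simpler model has the larger BIC.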
67 Our Approach. Our goal: to show the model-based approach has superior performance on: quality of clusters; number of clusters and model chosen (BIC). To compare clusters with classes: adjusted Rand index (Hubert and Arabie 198); high adjusted Rand index -> high agreement. Compare the quality of clusters with a leading heuristic-based algorithm: CAST (Ben-Dor & Yakhini 1999).
68 Adjusted Rand index. Compare clusters to classes. Consider the # pairs of objects:
| | Same class | Different class |
| Same cluster | a | b |
| Different cluster | c | d |
69 Example (Adjusted Rand). [Contingency table of four classes against four clusters, from which the pair counts a, b, c and d are computed as sums of binomial coefficients.] Rand index $R = \frac{a+d}{a+b+c+d}$ = .789; Adjusted Rand $= \frac{R - E(R)}{1 - E(R)}$ = .69
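A compact sketch of the adjusted Rand index in the Hubert and Arabie form, computed from pair counts over the class-by-cluster contingency table. The function name and the toy label vectors are assumptions of this sketch; `math.comb` needs Python 3.8 or later, and the function does not handle the degenerate case where both partitions put every object in a single group.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(classes, clusters):
    """Agreement between two partitions, corrected for chance."""
    n = len(classes)
    # Pairs placed together in both partitions (the count "a").
    same_both = sum(comb(c, 2) for c in Counter(zip(classes, clusters)).values())
    same_class = sum(comb(c, 2) for c in Counter(classes).values())
    same_cluster = sum(comb(c, 2) for c in Counter(clusters).values())
    expected = same_class * same_cluster / comb(n, 2)
    maximum = (same_class + same_cluster) / 2
    return (same_both - expected) / (maximum - expected)


# Hypothetical labels: cluster names differ from class names, partitions agree.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Hypothetical labels: every cluster mixes the two classes.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index is 1 for identical partitions regardless of how the groups are named, and it can drop below 0 when the agreement is worse than expected by chance.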
70 Methodology for BIC users: does not require classes; choose the number of clusters and model. Our evaluation methodology (adjusted Rand): needs classes; assesses the agreement of clusters with the classes.
71 Gene expression data sets. Ovary data (Michel Schummer, Institute of Systems Biology): subset of data: clones by experiments (cancer/normal tissue samples); clones correspond to genes (classes). Yeast cell cycle data (Cho et al. 1998): 17 time points; subset of 8 genes correspond to phases of cell cycle.
72 Synthetic data sets. Mixture of normal distributions based on the ovary data: generate a multivariate normal distribution with the sample covariance matrix and mean vector of each class in the ovary data. Randomly resampled ovary data: for each class, randomly sample the expression levels in each experiment; near-diagonal covariance matrix.
73 Randomly resampled synthetic data set. [Figure: ovary data and synthetic data shown as matrices of genes, grouped by class, by experiments.]
74 Results: mixture of normal distributions based on ovary data ( genes). [Plots: adjusted Rand and BIC vs. number of clusters for EI, VI, VVV, diagonal and CAST.] At clusters, VVV, diagonal, CAST: high adjusted Rand. BIC selects VVV at clusters.
75 Results: randomly resampled ovary data. [Plots: adjusted Rand vs. number of clusters for EI, VI, VVV, diagonal, CAST, EEE; BIC vs. number of clusters for EI, VI, diagonal, EEE.] Diagonal model achieves the max adjusted Rand and BIC score (higher than CAST). BIC max at clusters. Confirms expected result.
76 Results: square root ovary data. [Plots: adjusted Rand vs. number of clusters for EI, VI, VVV, diagonal, CAST, EEE; BIC vs. number of clusters for EI, VI, diagonal, EEE.] Adjusted Rand: max at EEE clusters (> CAST). BIC analysis: EEE and diagonal models -> first local max at clusters. Global max -> VI at 8 clusters.
77 Results: standardized yeast cell cycle data. [Plots: adjusted Rand vs. number of clusters for EI, VI, VVV, diagonal, CAST, EEE; BIC vs. number of clusters for EI, VI, diagonal, EEE.] Adjusted Rand: EI slightly > CAST at clusters. BIC: selects EEE at clusters.
78 Results: log yeast cell cycle data. [Plots: adjusted Rand vs. number of clusters for EI, VI, VVV, diagonal, CAST, EEE; BIC vs. number of clusters for EI, VI, diagonal, EEE.] CAST achieves much higher adjusted Rand indices than most model-based approaches (except EEE). BIC scores of EEE much higher than the other models.
79 Log yeast cell cycle data
80 Standardized yeast cell cycle data
81 Summary. Synthetic data sets: with the correct model, the model-based approach excels over CAST; BIC selects the right model at the correct number of clusters. Real expression data sets: comparable quality of clusters to CAST; advantage: BIC gives a hint to the number of clusters.
82 Software Implementation. Software: Mclust, available in Splus (Chris Fraley and Adrian Raftery), R (Ron Wehrens), Matlab (Angel Martinez and Wendy Martinez).
83 Future Work. Custom refinements to the model-based implementation: design models that incorporate specific information about the experiments, e.g. block diagonal covariance matrix; missing data; outliers.
84 Thank-yous. Model-based work: Chris Fraley (Statistics, UW), Alejandro Murua (Statistics, UW), Adrian Raftery (Statistics, UW), Larry Ruzzo (Computer Science, UW). Ovary data: Michel Schummer (Institute of Systems Biology).
85 Common questions: 1. How can I choose between all these clustering methods? 2. Is there a clustering algorithm that works better than the others? 3. How to choose the number of clusters? 4. How often do I get biologically meaningful clusters? 5. How many microarray experiments do I need?
86 From co-expression to co-regulation: How many microarray experiments do we need? [Yeung, Medvedovic, Bumgarner: To appear in Genome Biology]
87 From co-expression to co-regulation. Motivation: genes sharing the same transcriptional modules are expected to produce similar expression patterns. Cluster analysis is often used to identify genes that have similar expression patterns. Questions: 1. How likely are co-expressed genes regulated by the same transcription factors? 2. What is the effect of the following factors on the likelihood: a. number of microarray experiments; b. clustering algorithm used.
88 [Flow diagram] Transcription factor databases -> yeast genes regulated by the same TFs. Yeast microarray data -> pre-processing -> randomly sample subsets with E experiments -> cluster. For each pair of genes in the same cluster, evaluate if they share the same TFs.
89 Yeast transcription factors. SCPD (Saccharomyces Cerevisiae Promoter Database) [Zhang et al. 1999]: lists ~ genes that are regulated by 9 transcription factors (TFs). YPD (Yeast Protein Database): commercial, UW does not have access; the appendix of [Lee et al. ] lists genes regulated by each TF from the literature as of Nov 1; lists ~8 genes that are regulated by 1 TFs.
90 Comparing YPD and SCPD. [Table: # distinct ORFs, # distinct TFs and # gene-TF interactions for SCPD, YPD and the entries common to both.] SCPD and YPD each contain TFs that regulate only 1 gene. In general, the YPD list contains TFs that regulate a higher # genes.
91 Yeast microarray data. Rosetta's yeast compendium data [Hughes et al. ]: knockout 2-color experiments. Stanford: Gasch et al. data [ and 1]: cDNA array data under a variety of environmental stresses (e.g. heat shock). Total concatenated time course experiments.
92 Evaluation. For each clustering result, count the number of pairs of genes that belong to the same cluster and share a common TF (true positive, TP). TP rate = (# gene pairs from the same clusters that share at least 1 common TF) / (# gene pairs from the same clusters). Because the TP rate may change as a function of # clusters, we compare the TP rate to the TP rate of random partitions: randomly partition the set of genes 1 times; distribution of TP rate ~ Normal with mean $\mu$ and standard deviation $\sigma$; z-score = (TP rate - $\mu$) / $\sigma$. A high z-score -> TP rate is significantly higher than those of random partitions.
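The evaluation can be sketched as follows, assuming each gene comes with a set of the TFs known to regulate it. Two details are assumptions of this sketch rather than statements from the slide: random partitions are generated by shuffling the cluster labels (which preserves cluster sizes), and the number of random partitions `n_random` defaults to 1000. The gene and TF names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def tp_rate(labels, tfs):
    """Fraction of same-cluster gene pairs that share at least one TF.

    labels: cluster label per gene; tfs: set of TF names per gene.
    """
    same_cluster = shared_tf = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            same_cluster += 1
            if tfs[i] & tfs[j]:
                shared_tf += 1
    return shared_tf / same_cluster if same_cluster else 0.0


def z_score(labels, tfs, n_random=1000, seed=0):
    """Compare the observed TP rate with TP rates of random partitions."""
    rng = random.Random(seed)
    observed = tp_rate(labels, tfs)
    shuffled = list(labels)
    rates = []
    for _ in range(n_random):
        rng.shuffle(shuffled)  # random partition with the same cluster sizes
        rates.append(tp_rate(shuffled, tfs))
    return (observed - mean(rates)) / stdev(rates)


# Hypothetical example: two clusters that match the TF annotation exactly.
labels = [0, 0, 0, 1, 1, 1]
tfs = [{"TF_A"}, {"TF_A"}, {"TF_A"}, {"TF_B"}, {"TF_B"}, {"TF_B"}]
print(tp_rate(labels, tfs), z_score(labels, tfs))
```

Normalizing by the random partitions is what makes results comparable across different numbers of clusters, since the raw TP rate alone shifts as clusters get smaller.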
93 Results: compendium data using all experiments. To compare the performance of different clustering algorithms.
94 Compendium data & SCPD (7 E). [Plot: z-score vs. # clusters for average-link (corr), IMM, average-link (dist), complete-link (corr), Mclust, complete-link (dist).] MCLUST and complete-link (corr) produce relatively high z-scores.
95 IMM (Infinite Mixture Model-based). Infinite mixture model: each cluster is assumed to follow a multivariate normal distribution; do NOT assume # clusters. Use a Gibbs sampler to estimate the pairwise probabilities (P_ij) for two genes (i,j) to belong to the same cluster. To form clusters: cluster P_ij with a heuristic clustering algorithm (e.g. complete-link). Built-in error model: assume the repeated measurements are generated from another multivariate Gaussian distribution.
96 Results: compendium data: effect of # experiments
97 Compendium data & SCPD: hierarchical complete-link over a range of # clusters. [Plot: median z-score vs. # clusters, one curve per number of experiments.] Observation: median z-score increases as # experiments increases over different # clusters.
98 Compendium data & SCPD: different clustering algorithms at clusters. [Plot: median z-score vs. # experiments for average-link (corr), complete-link (corr), Mclust, IMM.] Proportions of co-regulated genes increase as # experiments increases. Mclust: highest proportions of co-regulated genes.
99 Compendium data & YPD (7G): different clustering algorithms at clusters. [Plot: median z-score vs. # experiments for average-link (corr), complete-link (corr), Mclust, IMM.]
100 Summary of Results. More microarray experiments -> more likely to find co-regulated genes!! SCPD/YPD produce similar results. Euclidean distance tends to produce relatively low z-scores compared to correlation using the same algorithm. Standardization greatly improves the performance of model-based methods. Mclust (EI model) produces relatively high z-scores. IMM doesn't work as well as Mclust. Why??
101 ChIP-chip: the methodology. Transcription factors are crosslinked to genomic DNA. DNA is sheared. Antibodies immunoprecipitate a specific transcription factor. DNA is un-linked, labeled and used to interrogate arrays.
102 ChIP data: 3rd gold standard [Lee et al. Science ]. Chromatin Immunoprecipitation (ChIP): to detect the binding of TFs of interest to intergenic sequences in yeast in vivo. 16 TFs from YPD (11 TFs in their raw data). Adopted error model from [Hughes et al. ]. Raw data (log ratios and p-values for genes/intergenic regions to TFs) available. p-value cutoff .1
103 Comparing ChIP data and YPD. 791 gene-TF interactions from YPD have a common gene and TF in the ChIP data. [Tables: # and % of gene-TF interactions by p-value range; TP and FN counts and percentages by p-value cutoff, with the total # ChIP interactions and the # detected by ChIP but not in YPD.]
104 Results: compendium data & ChIP (1 genes). [Plot: median z-score vs. # experiments for average-link (corr), complete-link (corr), Mclust, IMM.] Very similar results on other datasets as well.
105 Take-home message: in order to reliably infer co-regulation from cluster analysis, we need lots of data.
106 Limitations. Very naïve assumption of co-regulation: genes sharing at least one common transcription factor. Yeast data only. Does not take the information limit of microarray datasets into consideration. Considers only clustering algorithms in which each gene is assigned to only one cluster. Our current study does not provide completely quantitative results: how many experiments are sufficient to achieve x% co-regulation?
107 Thank-yous. Roger Bumgarner (Microbiology, UW). Bumgarner Lab, UW: Tanveer Haider, Tao Peng, Mette Peters, Kyle Serikawa, Caimiao Wei. Ted Young (Biochemistry, UW). IMM: Mario Medvedovic (Univ. of Cincinnati). Mclust: Adrian Raftery + Chris Fraley (Statistics, UW).
108 Summary. Questions: 1. How can I choose between different clustering methods? 2. Is there a clustering algorithm that works better than the others? 3. How to choose the number of clusters? 4. How often do I get biologically meaningful clusters? 5. How many experiments do I need? Answers: FOM: compare any clustering algorithms on any dataset. Model-based clustering algorithm: high cluster quality, estimates the number of clusters. More experiments -> more likely to find co-regulated genes. Model-based method :-) In yeast, ~ experiments.
109 Meet my collaborators and mentors: David Haynor, Roger Bumgarner, Mario Medvedovic, Adrian Raftery, Larry Ruzzo
110 Key References. [Yeung, Haynor, Ruzzo 2001] Validating clustering for gene expression data. Bioinformatics 17:9-18. [Yeung, Fraley, Murua, Raftery, Ruzzo 2001] Model-based clustering and data transformations for gene expression data. Bioinformatics 17. [Yeung, Medvedovic, Bumgarner] From co-expression to co-regulation: how many microarray experiments do we need? To appear in Genome Biology.
More informationA Hybrid Approach for Modeling High Dimensional Medical Data
A Hybri Approach for Moeling High Dimensional Meical Data Alok Sharma 1, Gofrey C. Onwubolu 1 1 University of the South Pacific, Fii sharma_al@usp.ac.f, onwubolu_g@usp.ac.f Abstract. his work presents
More informationBioinformatics. Transcriptome
Bioinformatics Transcriptome Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe) http://www.bigre.ulb.ac.be/ Bioinformatics
More informationMODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS
MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS BY BENJAMIN AVANZI, LUKE C. CASSAR AND BERNARD WONG ABSTRACT In this paper we investigate the potential of Lévy copulas as a tool for
More informationarxiv: v1 [hep-ex] 4 Sep 2018 Simone Ragoni, for the ALICE Collaboration
Prouction of pions, kaons an protons in Xe Xe collisions at s =. ev arxiv:09.0v [hep-ex] Sep 0, for the ALICE Collaboration Università i Bologna an INFN (Bologna) E-mail: simone.ragoni@cern.ch In late
More informationLevel Construction of Decision Trees in a Partition-based Framework for Classification
Level Construction of Decision Trees in a Partition-base Framework for Classification Y.Y. Yao, Y. Zhao an J.T. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canaa S4S
More informationCONFIRMATORY FACTOR ANALYSIS
1 CONFIRMATORY FACTOR ANALYSIS The purpose of confirmatory factor analysis (CFA) is to explain the pattern of associations among a set of observe variables in terms of a smaller number of unerlying latent
More informationOn the Surprising Behavior of Distance Metrics in High Dimensional Space
On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com
More informationDiscovering molecular pathways from protein interaction and ge
Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why
More informationComputing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions
Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5
More informationBinary Discrimination Methods for High Dimensional Data with a. Geometric Representation
Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas
More informationTutorial on Maximum Likelyhood Estimation: Parametric Density Estimation
Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing
More informationSponsors. AMIA 4915 St. Elmo Ave, Suite 401 Bethesda, MD tel fax
http://www.amia.org/meetings/f05/ CD Orer Form Post-conference Summary General Information Searchable Program Scheule At-A-Glance Tutorials Nursing Informatics Symposium Exhibitor Information Exhibitor
More informationHomework 2 EM, Mixture Models, PCA, Dualitys
Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The
More informationMulti-View Clustering via Canonical Correlation Analysis
Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu
More informationLecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012
CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration
More informationEstimation of District Level Poor Households in the State of. Uttar Pradesh in India by Combining NSSO Survey and
Int. Statistical Inst.: Proc. 58th Worl Statistical Congress, 2011, Dublin (Session CPS039) p.6567 Estimation of District Level Poor Househols in the State of Uttar Praesh in Inia by Combining NSSO Survey
More informationA Review of Multiple Try MCMC algorithms for Signal Processing
A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications
More informationRobust Low Rank Kernel Embeddings of Multivariate Distributions
Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions
More informationMulti-View Clustering via Canonical Correlation Analysis
Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationWESD - Weighted Spectral Distance for Measuring Shape Dissimilarity
1 WESD - Weighte Spectral Distance for Measuring Shape Dissimilarity Ener Konukoglu, Ben Glocker, Antonio Criminisi an Kilian M. Pohl Abstract This article presents a new istance for measuring shape issimilarity
More informationBayesian Hierarchical Classification. Seminar on Predicting Structured Data Jukka Kohonen
Bayesian Hierarchical Classification Seminar on Predicting Structured Data Jukka Kohonen 17.4.2008 Overview Intro: The task of hierarchical gene annotation Approach I: SVM/Bayes hybrid Barutcuoglu et al:
More informationInferring Transcriptional Regulatory Networks from Gene Expression Data II
Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday
More informationA New Minimum Description Length
A New Minimum Description Length Soosan Beheshti, Munther A. Dahleh Laboratory for Information an Decision Systems Massachusetts Institute of Technology soosan@mit.eu,ahleh@lis.mit.eu Abstract The minimum
More informationNew Statistical Test for Quality Control in High Dimension Data Set
International Journal of Applie Engineering Research ISSN 973-456 Volume, Number 6 (7) pp. 64-649 New Statistical Test for Quality Control in High Dimension Data Set Shamshuritawati Sharif, Suzilah Ismail
More informationIntroduction. A Dirichlet Form approach to MCMC Optimal Scaling. MCMC idea
Introuction A Dirichlet Form approach to MCMC Optimal Scaling Markov chain Monte Carlo (MCMC quotes: Metropolis et al. (1953, running coe on the Los Alamos MANIAC: a feasible approach to statistical mechanics
More informationAnalyzing Tensor Power Method Dynamics in Overcomplete Regime
Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical
More informationLecture 5: November 19, Minimizing the maximum intracluster distance
Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction
More informationData Exploration and Unsupervised Learning with Clustering
Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a
More informationSimilarity Measures for Categorical Data A Comparative Study. Technical Report
Similarity Measures for Categorical Data A Comparative Stuy Technical Report Department of Computer Science an Engineering University of Minnesota 4-92 EECS Builing 200 Union Street SE Minneapolis, MN
More informationLeast-Squares Regression on Sparse Spaces
Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction
More informationGene mapping, linkage analysis and computational challenges. Konstantin Strauch
Gene mapping, linkage analysis an computational challenges Konstantin Strauch Institute for Meical Biometry, Informatics, an Epiemiology (IMBIE) University of Bonn E-mail: strauch@uni-bonn.e Genetics an
More informationConcentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification
Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,
More informationBayesian Estimation of the Entropy of the Multivariate Gaussian
Bayesian Estimation of the Entropy of the Multivariate Gaussian Santosh Srivastava Fre Hutchinson Cancer Research Center Seattle, WA 989, USA Email: ssrivast@fhcrc.org Maya R. Gupta Department of Electrical
More informationBohr Model of the Hydrogen Atom
Class 2 page 1 Bohr Moel of the Hyrogen Atom The Bohr Moel of the hyrogen atom assumes that the atom consists of one electron orbiting a positively charge nucleus. Although it oes NOT o a goo job of escribing
More informationLecture 6 : Dimensionality Reduction
CPS290: Algorithmic Founations of Data Science February 3, 207 Lecture 6 : Dimensionality Reuction Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will consier the roblem of maing
More informationGaussian processes with monotonicity information
Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process
More informationDeriving ARX Models for Synchronous Generators
Deriving AR Moels for Synchronous Generators Yangkun u, Stuent Member, IEEE, Zhixin Miao, Senior Member, IEEE, Lingling Fan, Senior Member, IEEE Abstract Parameter ientification of a synchronous generator
More informationSpatio-temporal fusion for reliable moving vehicle classification in wireless sensor networks
Proceeings of the 2009 IEEE International Conference on Systems, Man, an Cybernetics San Antonio, TX, USA - October 2009 Spatio-temporal for reliable moving vehicle classification in wireless sensor networks
More informationFast Resampling Weighted v-statistics
Fast Resampling Weighte v-statistics Chunxiao Zhou Mar O. Hatfiel Clinical Research Center National Institutes of Health Bethesa, MD 20892 chunxiao.zhou@nih.gov Jiseong Par Dept of Math George Mason Univ
More informationMore on Unsupervised Learning
More on Unsupervised Learning Two types of problems are to find association rules for occurrences in common in observations (market basket analysis), and finding the groups of values of observational data
More informationMath Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors
Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+
More informationnetworks in molecular biology Wolfgang Huber
networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic
More informationarxiv: v4 [cs.ds] 7 Mar 2014
Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning
More informationensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y
Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay
More informationMixture models for analysing transcriptome and ChIP-chip data
Mixture models for analysing transcriptome and ChIP-chip data Marie-Laure Martin-Magniette French National Institute for agricultural research (INRA) Unit of Applied Mathematics and Informatics at AgroParisTech,
More information19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control
19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior
More informationFLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction
FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number
More informationSituation awareness of power system based on static voltage security region
The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran
More informationCost-based Heuristics and Node Re-Expansions Across the Phase Transition
Cost-base Heuristics an Noe Re-Expansions Across the Phase Transition Elan Cohen an J. Christopher Beck Department of Mechanical & Inustrial Engineering University of Toronto Toronto, Canaa {ecohen, jcb}@mie.utoronto.ca
More informationHybrid Fusion for Biometrics: Combining Score-level and Decision-level Fusion
Hybri Fusion for Biometrics: Combining Score-level an Decision-level Fusion Qian Tao Raymon Velhuis Signals an Systems Group, University of Twente Postbus 217, 7500AE Enschee, the Netherlans {q.tao,r.n.j.velhuis}@ewi.utwente.nl
More informationEstimating Causal Direction and Confounding Of Two Discrete Variables
Estimating Causal Direction an Confouning Of Two Discrete Variables This inspire further work on the so calle aitive noise moels. Hoyer et al. (2009) extene Shimizu s ientifiaarxiv:1611.01504v1 [stat.ml]
More informationOptimal Variable-Structure Control Tracking of Spacecraft Maneuvers
Optimal Variable-Structure Control racking of Spacecraft Maneuvers John L. Crassiis 1 Srinivas R. Vaali F. Lanis Markley 3 Introuction In recent years, much effort has been evote to the close-loop esign
More informationPart I: Web Structure Mining Chapter 1: Information Retrieval and Web Search
Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim
More informationRegularized extremal bounds analysis (REBA): An approach to quantifying uncertainty in nonlinear geophysical inverse problems
GEOPHYSICAL RESEARCH LETTERS, VOL. 36, L03304, oi:10.1029/2008gl036407, 2009 Regularize extremal bouns analysis (REBA): An approach to quantifying uncertainty in nonlinear geophysical inverse problems
More informationA COMPARISON OF SMALL AREA AND CALIBRATION ESTIMATORS VIA SIMULATION
SAISICS IN RANSIION new series an SURVEY MEHODOLOGY 133 SAISICS IN RANSIION new series an SURVEY MEHODOLOGY Joint Issue: Small Area Estimation 014 Vol. 17, No. 1, pp. 133 154 A COMPARISON OF SMALL AREA
More informationHomework 2 Solutions EM, Mixture Models, PCA, Dualitys
Homewor Solutions EM, Mixture Moels, PCA, Dualitys CMU 0-75: Machine Learning Fall 05 http://www.cs.cmu.eu/~bapoczos/classes/ml075_05fall/ OUT: Oct 5, 05 DUE: Oct 9, 05, 0:0 AM An EM algorithm for a Mixture
More informationPLAL: Cluster-based Active Learning
JMLR: Workshop an Conference Proceeings vol 3 (13) 1 22 PLAL: Cluster-base Active Learning Ruth Urner rurner@cs.uwaterloo.ca School of Computer Science, University of Waterloo, Canaa, ON, N2L 3G1 Sharon
More informationTHE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE
Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek
More informationCONTROL CHARTS FOR VARIABLES
UNIT CONTOL CHATS FO VAIABLES Structure.1 Introuction Objectives. Control Chart Technique.3 Control Charts for Variables.4 Control Chart for Mean(-Chart).5 ange Chart (-Chart).6 Stanar Deviation Chart
More informationA Randomized Approximate Nearest Neighbors Algorithm - a short version
We present a ranomize algorithm for the approximate nearest neighbor problem in - imensional Eucliean space. Given N points {x } in R, the algorithm attempts to fin k nearest neighbors for each of x, where
More informationExploratory statistical analysis of multi-species time course gene expression
Exploratory statistical analysis of multi-species time course gene expression data Eng, Kevin H. University of Wisconsin, Department of Statistics 1300 University Avenue, Madison, WI 53706, USA. E-mail:
More informationA Fay Herriot Model for Estimating the Proportion of Households in Poverty in Brazilian Municipalities
Int. Statistical Inst.: Proc. 58th Worl Statistical Congress, 2011, Dublin (Session CPS016) p.4218 A Fay Herriot Moel for Estimating the Proportion of Househols in Poverty in Brazilian Municipalities Quintaes,
More informationConstruction of Equiangular Signatures for Synchronous CDMA Systems
Construction of Equiangular Signatures for Synchronous CDMA Systems Robert W Heath Jr Dept of Elect an Comp Engr 1 University Station C0803 Austin, TX 78712-1084 USA rheath@eceutexaseu Joel A Tropp Inst
More informationCS9840 Learning and Computer Vision Prof. Olga Veksler. Lecture 2. Some Concepts from Computer Vision Curse of Dimensionality PCA
CS9840 Learning an Computer Vision Prof. Olga Veksler Lecture Some Concepts from Computer Vision Curse of Dimensionality PCA Some Slies are from Cornelia, Fermüller, Mubarak Shah, Gary Braski, Sebastian
More informationDatabase-friendly Random Projections
Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional
More informationNecessary and Sufficient Conditions for Sketched Subspace Clustering
Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This
More informationProof of SPNs as Mixture of Trees
A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a
More information