Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset


Gene Expression

Liu, X.1,2, Sivaganesan, S.3, Yeung, K.Y.4, Guo, J.1, Bumgarner, R.E.4, Medvedovic, Mario1,2,*
1 Department of Environmental Health, University of Cincinnati, 3223 Eden Av. ML 56, Cincinnati OH 45267
2 Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, Cincinnati, OH 45229
3 Mathematical Sciences Department, University of Cincinnati, Cincinnati, OH 45221
4 Department of Microbiology, University of Washington, Seattle, WA

ABSTRACT
Motivation: Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes, because non-informative measurements introduce additional noise.
Results: We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that account for the context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. By examining the specificity and sensitivity of clusters in microarray data, we demonstrate that explicit modeling of context-specificity increases the accuracy of the cluster analysis. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of the statistical significance of the created clusters.
Availability: The open-source package gimm is available at
Contact: Mario.Medvedovic@uc.edu
Supplementary information:

1 INTRODUCTION
Identifying and interpreting gene expression patterns, and characterizing the groups of co-expressed genes that define these patterns through cluster analysis, has been a productive approach to learning from DNA microarray data.
The results of such analyses have served to dissect regulatory mechanisms underlying co-expression, identify pathways involved in biological processes and annotate gene function. The quality of these results and conclusions depends directly on the quality of the clustering procedure used in the analysis. Since the advent of microarray technology, virtually all traditional clustering approaches have been applied in this context, and numerous new approaches have been developed (Yeung and Bumgarner 2004). To identify subsets of co-expressed genes, most clustering procedures depend either on visual identification of clusters from patterns in a color-coded display (such as hierarchical clustering) or on the correct specification of the number of patterns present in the data prior to the analysis (k-means and Self Organizing Maps). The most commonly used clustering procedures are ad hoc by nature and incapable of separating statistically significant clusters from artifacts of random fluctuations in the data. On the other hand, clustering approaches based on statistical modeling of the data often require the number of clusters to be specified in advance (Barash and Friedman 2002; McLachlan et al. 2002; Segal et al. 2003). When the number of clusters is estimated from the data, traditional methods fail to account for this significant source of variability when assessing the statistical significance of detected patterns (Medvedovic and Guo 2004).

Assessing the function of a gene product is a multidimensional endeavor whereby one may ascertain a number of properties, including structure, the low-level function of a protein (e.g. kinase, protease) and the higher-level function describing the biological processes in which the protein participates. Identifying groups of co-expressed genes across diverse microarray datasets is a very promising strategy for assessing the higher-level function of gene products. Such analysis is complicated by the fact that co-regulation is often condition-specific and may not extend across all conditions.

*To whom correspondence should be addressed.
The problem of context-specificity can be particularly pronounced when combining gene expression profiles across different experiments, tissue types or even different organisms to perform a meta-cluster analysis (Segal et al. 2004; Segal et al. 2003; Stuart et al. 2003). In these situations, measurements of gene expression under all conditions are not necessarily informative with regard to co-regulation. Ignoring the local nature of co-regulation significantly reduces one's ability to detect co-regulated genes, due to the noise introduced by non-informative measurements. Previously proposed solutions to this problem, in terms of context-specific Bayesian networks (Barash and Friedman 2002) and the more general module networks (Segal et al. 2003), rely on the specification or estimation of the correct number of patterns. In this respect, they suffer from the same problems related to estimating the correct number of patterns as finite-mixture-based clustering.

We developed a context-specific infinite mixture model (CSIMM) that allows clusters of co-expressed genes to be further grouped locally on subsets of experimental conditions that do not contribute any information about their differences. This approach makes use of the Bayesian infinite mixture framework (Medvedovic et al. 2004; Medvedovic and Sivaganesan 2002) to circumvent the issue of identifying the correct number of global and local patterns in the data. Infinite mixtures are one possible parametrization of semi-parametric Bayesian models with Dirichlet process priors (Neal, 2000), and the CSIMM described here can be thought of as a hierarchical Dirichlet process. The infinite mixtures framework facilitates averaging over models with different numbers of patterns, and the posterior distribution of clusterings incorporates the uncertainty of not knowing the correct number of clusters, either globally or locally. Consequently, the resulting posterior probabilities of co-expression offer a reliable assessment of the statistical significance of the groupings. We demonstrate the ability of the procedure to integrate information from diverse microarray experiments through a simulation study and by assessing the performance of the algorithms in the context of functional annotation of genes based on their co-expression.

© Oxford University Press 2005

2 METHODS
2.1 Motivation
Our goal is to identify clusters of genes exhibiting similar expression patterns across multiple microarray datasets. Each dataset, or context, consists of a number of closely related microarray experiments that share a biological relationship and sample a limited range of perturbations to the system under study. For example, one dataset may consist of measurements of gene expression at different time points after heat shock, while another may consist of measurements at different stages of the cell cycle. For the sake of discussion, we will refer to each dataset as a context and to the entire collection of datasets as the global dataset.

[Fig. 1. Simple context-specificity of expression patterns. Experiments E1 to E15 are grouped into Context 1, Context 2 and Context 3; rows are grouped into Clusters 1 to 4.]

It is reasonable to assume that different regulatory programs are employed by different biological processes and that specific subsets of regulatory programs are needed to respond to a given type of perturbation. Some regulatory programs will respond to all the perturbations available within the global dataset, while others will respond to none, one or a limited number of perturbations.
That is, some genes will be co-regulated on a global scale, while others (perhaps most) will be co-regulated on a local scale. We define the global clustering structure on a set of gene expression profiles by saying that two genes belong to the same global cluster if they share a common pattern of expression in all of the examined datasets. On the other hand, we define the local clustering structure as groups of genes that share a common expression pattern within a subset (a context) of the data but which do not group together when examined globally.

Figure 1 shows an example of the type of structure we might reasonably expect within gene expression data and the information (groupings) we would like to recover. For clarity, the example is overly simplified: it shows only three datasets (contexts), 20 genes, 4 global clusters and two local clusters within each context. Expression level is either high (coded light gray) or low (black). Local clusters within each context consist of genes that are either high or low within this context. For example, global Cluster 1 is formed of genes that are high within Context 1 and low within Contexts 2 and 3. On the other hand, Context 1 does not contribute any information about differences between global Clusters 1 and 4. We wish to be able to separate the patterns of such global clusters even in the presence of data from many other noisy or non-informative contexts. We construct a Bayesian hierarchical model that describes the probability distribution of the data so that local and global clusters can be identified.

2.2 Context Specific Infinite Mixture Model (CSIMM)
Suppose that expression was measured for T genes across M experimental conditions. If $x_{ij}$ is the expression level of the $i$th gene for the $j$th experimental condition, then $x_i = (x_{i1}, x_{i2}, \ldots, x_{iM})$ denotes the complete expression profile of the $i$th gene. Each gene expression profile can be viewed as being generated by one out of Q different underlying expression patterns.
Expression profiles generated by the same pattern form a cluster of similar expression profiles. If $c_i$ is the classification variable indicating the pattern that generates the $i$th expression profile ($c_i = q$ means that the $i$th expression profile was generated by the $q$th pattern), then a clustering is defined by the set of classification variables for all gene expression profiles, $C = (c_1, c_2, \ldots, c_T)$. In our model, the $q$th pattern is represented by the mean vector of an M-dimensional Gaussian distribution, $\mu_q = (\mu_{q1}, \ldots, \mu_{qM})$. Profiles clustering together (i.e. belonging to the same pattern) are assumed to be a random sample from the same multivariate Gaussian distribution. That is, $c_i = q$ implies that $x_i \sim N_M(\mu_q, \Sigma_q)$, where $\Sigma_q$ is the variance-covariance matrix of the M-dimensional multivariate Gaussian distribution.

Suppose further that each gene profile is partitioned into R sub-profiles. Without loss of generality we can assume that the first $r_1$ experimental conditions form the first sub-profile, experimental conditions $r_1 + 1$ to $r_1 + r_2$ form the second sub-profile, etc. That is, $x_i = (x_i^1, \ldots, x_i^R)$, where

$$x_i^j = \left(x_{i,\,r'_j + 1}, \ldots, x_{i,\,r'_j + r_j}\right), \qquad r'_j = r_1 + \cdots + r_{j-1}.$$

The two most extreme cases are R = M and R = 1. The case R = 1 is equivalent to simple clustering in which no context structure is defined. The case R = M represents the situation in which each microarray hybridization is a separate context. The local structure of the co-expression patterns is specified by the Q × R matrix $L = (L_{qf})$, where $L_{qf} = t$ if global cluster q is placed in local cluster t within context f. Thus, within each context, we create groups of locally indistinguishable global clusters. All gene expression profiles contained in global clusters that are indistinguishable within a context form a local cluster of genes that are co-expressed within this context. The joint posterior distribution of all parameters in the model, including the global and local clustering variables C and L, given the data is estimated using the Gibbs sampler (Gelfand and Smith 1990).
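The bookkeeping behind the sub-profiles $x_i^j$ and the local structure matrix L can be illustrated with a short Python/NumPy sketch. It is not taken from the gimm package: the function names and the toy matrix L are hypothetical, and indices are 0-based, so the local cluster of gene i in context f is simply looked up as L[c_i, f].

```python
import numpy as np

def context_offsets(sizes):
    """Return r'_j = r_1 + ... + r_{j-1} for each context (0-based start index)."""
    sizes = np.asarray(sizes, dtype=int)
    return np.concatenate(([0], np.cumsum(sizes)[:-1]))

def split_profile(x, sizes):
    """Split an M-dimensional profile x_i into its R context sub-profiles."""
    x = np.asarray(x)
    if x.size != int(np.sum(sizes)):
        raise ValueError("context sizes must add up to M")
    return [x[o:o + r] for o, r in zip(context_offsets(sizes), sizes)]

def local_labels(C, L):
    """Map global labels to local labels (hypothetical helper).

    C : length-T sequence of global cluster labels in 0..Q-1
    L : Q x R integer matrix, L[q, f] = local cluster of global cluster q in context f
    Returns a T x R matrix whose (i, f) entry is the local cluster of gene i in context f.
    """
    return np.asarray(L)[np.asarray(C, dtype=int), :]

# Hypothetical toy structure: 4 global clusters, 3 contexts of 5 conditions each.
L = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0],
              [0, 0, 0]])
genes = [0, 3, 2]                       # global labels of three genes
print(context_offsets([5, 5, 5]))
print(local_labels(genes, L))
```

In this toy matrix, genes from global clusters 0 and 3 fall into the same local cluster in the first context, mirroring how Context 1 cannot distinguish Clusters 1 and 4 in Figure 1.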
The clusters of globally and locally co-expressed genes are formed based on the marginal posterior distributions of the classification variables C and L. Summarizing the sample of clusterings generated by the Gibbs sampler in mixture models is generally a non-trivial problem. We circumvent this problem by calculating the posterior pairwise probability (PPP) of co-expression for genes i and j as the proportion of samples in which these two genes are clustered together (Medvedovic and Sivaganesan 2002). We then use these probabilities as the similarity measure to hierarchically cluster gene expression profiles by applying the average linkage principle. The mathematical specification of the model describing the distribution of the data and the specifics of the Gibbs sampler are given in the Appendix. All conditional probability distributions needed to run the Gibbs sampler are given in the Supplemental Materials.

2.3 Implementation
Computational procedures for performing CSIMM-based clustering are implemented in the standalone package gimm. The package consists of C++ code, a simple Java-based GUI and installation scripts, and it is available for both Linux/Unix and Windows platforms. The Windows version is available as a self-installing package. The software generates .cdt and .gtr files defining the hierarchical clustering, which can be viewed and analyzed using the TreeView program (Eisen et al. 1998). The Linux C++ code is designed to exploit OpenMP parallelization when an appropriate compiler is installed. For Linux, we also developed an R package wrapper that facilitates using gimm within R. All packages and the source code can be freely downloaded from
We discuss the computational complexity of the algorithm in the Web Supplement.

3 RESULTS
3.1 Simulation Study
The study was designed to compare different clustering procedures based on their ability to correctly separate simulated expression profiles into different clusters in repeated experiments. The problem is treated in the traditional statistical hypothesis-testing framework of assessing the probability that a procedure will correctly conclude that two expression profiles are generated by distinct patterns of expression (i.e.
belong to two different clusters) while controlling the probability of falsely concluding that two profiles belong to different clusters when they are actually generated by the same pattern. Unlike a traditional statistical hypothesis-testing procedure, we do not supply the labels of the profiles being compared. We simulated data representing the structure depicted in Figure 1, where the heat map was taken to represent the values of the mean expression profiles in the corresponding clusters. Low expression levels (black) were set to 0 and high expression levels (gray) were set to 1. For example, in each dataset, profile g1 was randomly drawn from the 15-dimensional Gaussian distribution whose mean vector is equal to 1 in the first 5 dimensions (e1, ..., e5) and 0 in the other 10 dimensions (e6, ..., e15). Data were simulated for different levels of noise (σ). The selected range of random noise allowed us to assess the performance of different approaches in easy and progressively more difficult (i.e. noisier) situations. 100 datasets were generated for each noise level. We focus on the ability of a method to separate profiles in Cluster 1 from profiles in Cluster 2. This is the most difficult aspect of the analysis, since Cluster 1 has only two profiles and differs from Cluster 2 on only 5 out of 15 experimental conditions. Methods are tested on their ability to correctly conclude that profiles in Cluster 1 are different from profiles in Cluster 2; in this sense, we are assessing the power of the clustering procedures. If we knew which profiles came from which cluster, we could perform a simple test of hypothesis to decide one way or the other. In the unsupervised situation, we do not supply the membership, but the goal is still the same.
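A minimal sketch of the simulation design, assuming Python with NumPy: each profile is its cluster's 0/1 mean template plus independent N(0, σ²) noise. Only the template of Cluster 1 (high in Context 1, low elsewhere) is taken from the text; the other three templates and the cluster sizes are hypothetical choices that merely respect the stated constraints (20 genes, two profiles in Cluster 1, Cluster 2 differing from Cluster 1 on 5 of 15 conditions, Context 1 not distinguishing Clusters 1 and 4).

```python
import numpy as np

CONTEXT_SIZE = 5  # conditions per context (e1..e5, e6..e10, e11..e15)

# Per-context high (1) / low (0) codes. Cluster 1 follows the text;
# the remaining templates are hypothetical.
TEMPLATES = [(1, 0, 0),   # Cluster 1: high in Context 1 only
             (1, 1, 0),   # Cluster 2: differs from Cluster 1 on Context 2 only
             (0, 1, 1),   # Cluster 3 (hypothetical)
             (1, 0, 1)]   # Cluster 4: same as Cluster 1 in Context 1
SIZES = [2, 6, 6, 6]      # hypothetical sizes; 20 genes in total

def mean_vector(codes):
    """Expand per-context codes into a 15-dimensional mean vector."""
    return np.repeat(np.asarray(codes, dtype=float), CONTEXT_SIZE)

def simulate(sigma, rng):
    """Return (X, labels): one dataset of 20 profiles at noise level sigma."""
    blocks, labels = [], []
    for k, (codes, n) in enumerate(zip(TEMPLATES, SIZES)):
        mu = mean_vector(codes)
        blocks.append(mu + sigma * rng.standard_normal((n, mu.size)))
        labels += [k] * n
    return np.vstack(blocks), np.array(labels)

rng = np.random.default_rng(0)
datasets = [simulate(0.5, rng) for _ in range(100)]  # 100 datasets per noise level
```

Changing the `sigma` argument reproduces the easy-to-hard progression of noise levels; the clustering methods are then run on each dataset without access to `labels`.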
The performance of different clustering procedures was assessed by constructing Receiver Operating Characteristic (ROC) curves that relate the probability that a clustering method correctly separates profiles from different clusters to the probability that it incorrectly separates profiles from the same cluster.

[Fig. 2. ROC curves for different clustering approaches (simple infinite mixtures, Euclidean distance, context-specific IMM): true positive rate versus false positive rate at several noise levels, including σ = 0.3 and σ = 0.5.]

Let X be the posterior probability cut-off for separating profiles in Cluster 1 from Cluster 2. For a fixed cut-off point X, we consider that the clustering procedure correctly concludes that a profile from Cluster 1 does not belong to Cluster 2 if its posterior pairwise probability of co-clustering with every profile from Cluster 2 is less than X. That is, $\max\{p(c_i = c_j) : j \in \text{Cluster 2}\} < X$, where $p(c_i = c_j)$ denotes the posterior pairwise probability of co-expression for profiles i and j. We consider that the clustering procedure incorrectly concludes that profile 1 and profile 2 from Cluster 1 do not belong in the same cluster if $p(c_1 = c_2) < X$. The true positive rate (TPR) is the proportion of times that a correct decision is made, and the false positive rate (FPR) is the proportion of times that an incorrect decision is made. As the cut-off X is increased from 0 to 1, both the TPR and the FPR increase. The area under the curve relating TPR and FPR as X increases from 0 to 1 describes the efficiency of a statistical procedure: random decision-making has an area of 0.5, while an ideal statistical procedure has an area equal to 1. The ROC curves in Figure 2 indicate that the context-specific infinite mixture model significantly outperforms the simple mixture model in its ability to separate different patterns of expression while controlling the false positive rate. The difference between the simple infinite mixture and the context-specific infinite mixture is attributable to the better representation of the underlying patterns offered by the context-specific model. Furthermore, over- and under-fitting the data by specifying too many or too few contexts has the expected consequences on the clustering results (Figure S1 in the Supplement). When each experiment is placed in a separate context (over-fitting), the performance is actually worse than for the simple model. Specifying only two of the three contexts (under-fitting) reduces the performance of the context-specific model, but it still outperforms the simple model.

Posterior pairwise probabilities are valid measures of statistical significance: In Figure 3 we plotted observed false positive rates against the corresponding statistical significance levels from the CSIMM analysis. Given a significance level α, all gene pairs whose PPP was lower than α were assumed to belong to different clusters. As the empirical false positive rates are always less than α, we conclude that PPPs based on CSIMM are valid measures of statistical significance at all noise levels. This is also true for the simple IMM, but not for the finite mixture model (Figure S2 in the Supplement). Additionally, we performed a similar analysis on 100 datasets in which all data points were generated from a single probability distribution, representing the situation in which there is no clustering structure in the data (Random). As can be seen from the virtually perfect 45-degree line, PPPs correctly protect against Type I errors when there are no patterns in the data.

[Fig. 3. Posterior probabilities as measures of statistical significance: observed false positive rate versus significance level α for several noise levels (including σ = 0.3 and σ = 0.5) and for the Random scenario, in which all profiles were generated by the same multivariate normal probability distribution.]

[Fig. 4. ROC curves comparing the performance of different clustering approaches on the joint sporulation and cell cycle dataset. A) The curve relating true positive and false positive rates. B) The curve relating actual numbers of true positive and false positive pairs of co-clustered genes.]

3.2 Yeast cell-cycle and sporulation data
Comparing the performance of different clustering procedures on real-world data is complicated by the lack of a gold standard (i.e. the "correct clustering"). We assessed our clustering results by forming functional groupings of genes based on the information available in the KEGG database of biological pathways (Kanehisa et al. 2004). The strength of association between such functional clusters and the clusters of co-expressed genes formed by a clustering procedure was interpreted as a measure of the precision of that procedure. We constructed the test dataset by combining two microarray experiments assessing two distinct yet related biological processes. The first dataset is the yeast sporulation dataset (Primig et al. 2000), consisting of gene expression measurements at 8 and 7 time points throughout the sporulation process for the two sporulation-competent yeast strains SK1 and W303, respectively. The second dataset is the cell-cycle dataset (Cho et al. 1998), consisting of gene expression measurements at 17 time points spanning two complete yeast cell cycles. The two datasets were matched by identifying 6044 ORFs represented on both versions of the Affymetrix microarrays used in these experiments. The data were mildly processed by setting any measurement below 1 to 1, log-transforming it and centering each gene's expression profile around zero for the two experiments separately. Genes that never reached a signal of 100 were excluded from the analysis, leaving a total of 5685 genes. Of these, 1044 ORFs represented genes associated with at least one KEGG pathway.
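The preprocessing steps just described can be sketched in Python/NumPy as follows, assuming raw (unlogged) intensities in a genes × arrays matrix and an array of experiment identifiers, one per column. The log base is not stated in the text, so base 2 is an assumption, and "never reached a signal of 100" is interpreted here as a maximum raw signal below 100 across all arrays.

```python
import numpy as np

def preprocess(raw, experiment_of_column, min_signal=100.0, floor=1.0):
    """Floor, log-transform, center per experiment and drop low-signal genes.

    raw : (genes x arrays) array of raw intensities
    experiment_of_column : experiment id for every column
                           (e.g. 0 = sporulation, 1 = cell cycle)
    Returns (processed matrix, boolean mask of retained genes).
    """
    raw = np.asarray(raw, dtype=float)
    keep = raw.max(axis=1) >= min_signal            # genes that reach the threshold somewhere
    logged = np.log2(np.maximum(raw[keep], floor))  # values below 1 set to 1; log base 2 assumed
    out = np.empty_like(logged)
    groups = np.asarray(experiment_of_column)
    for e in np.unique(groups):
        cols = groups == e
        block = logged[:, cols]
        # center each gene around zero separately within each experiment
        out[:, cols] = block - block.mean(axis=1, keepdims=True)
    return out, keep
```

Because centering is done per experiment, a gene's sporulation and cell-cycle sub-profiles each have mean zero, which matches the separate treatment of the two contexts.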

[Fig. 5. Gene expression levels (green-red heat map) and KEGG pathway memberships (blue heat map) for 54 genes that were co-clustered with at least one other gene after cutting the CSIMM-based tree at the average linkage distance of 5. Expression columns: SK1 and W303 sporulation time courses and the cell cycle (0 to 160 min); pathway columns: Ribosome, Cell cycle, Purine metabolism, Pyrimidine metabolism, DNA Polymerase.]

Data from the two experiments were clustered separately and jointly using the simple IMM approach, CSIMM and Euclidean distance-based hierarchical clustering (EDHC). For each hierarchical clustering, the tree was cut to create 1 to 5685 clusters. For a fixed number of clusters, a pair of genes (from the 1044 genes assigned to at least one pathway) belonging to the same cluster was considered a true positive if the two genes both belonged to at least one common KEGG pathway, and a false positive if they did not share a single KEGG pathway. True and false positive rates were then obtained by dividing the numbers of true and false positives by the total number of gene pairs sharing a common KEGG pathway and the total number of gene pairs not sharing a KEGG pathway, respectively. When all genes are placed in their own individual clusters (5685 clusters), both true and false positive rates are equal to zero. As we reduce the number of clusters, both rates increase, defining a ROC curve. At the extreme, when all genes are placed in the same cluster, both true and false positive rates are equal to one.

ROC curves derived in this way for each dataset/method combination, for the statistically relevant false positive rates (less than 5), are shown in Figure 4A. The global clustering methods, IMM and EDHC, both performed worse in the joint data analysis than when using the cell-cycle data alone. The ROC curve for CSIMM indicated that this method was able to integrate information from both datasets into a single, more precise analysis.
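The pair-counting evaluation can be sketched in plain Python for a single cut of a tree. The input layout is an assumption made for illustration: one cluster label per pathway-annotated gene and, aligned with it, a set of KEGG pathway identifiers per gene. Sweeping the number of clusters from 5685 down to 1 and recording each (FPR, TPR) pair traces the ROC curve.

```python
from itertools import combinations

def pair_rates(labels, pathways):
    """Return (TPR, FPR, TP, FP) for one clustering.

    labels   : cluster label of each pathway-annotated gene
    pathways : set of KEGG pathway ids for each gene, aligned with labels
    A co-clustered pair is a true positive if the genes share at least one
    pathway and a false positive otherwise.
    """
    tp = fp = pos = neg = 0
    for i, j in combinations(range(len(labels)), 2):
        shared = bool(pathways[i] & pathways[j])
        if shared:
            pos += 1
        else:
            neg += 1
        if labels[i] == labels[j]:
            if shared:
                tp += 1
            else:
                fp += 1
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr, tp, fp
```

With all genes in singleton clusters the function returns zero rates, and with a single cluster it returns rates of one, matching the two extremes of the curve. For roughly a thousand genes the double loop visits about half a million pairs, which is acceptable for a sketch.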
The behavior of the overfitted model, in which each microarray is treated as a separate context, is in line with the behavior observed in the analysis of simulated data. Due to the strong imbalance between the total number of positive pairs (30,336) and negative pairs (513,067), a relatively low FPR still results in a large number of false positive pairs in comparison to the number of true positive pairs. Therefore, we examined the behavior of the different clustering procedures more closely by relating the absolute numbers of false and true positive pairs of genes (Figure 4B). When looking at this outcome for fewer than 10 false positive pairs, the improvement in precision of CSIMM over competing approaches is dramatic. Clusters of co-expressed genes at the highest ratio of true to false positives (191.6, with 5 false and 958 true positive pairs), along with the corresponding KEGG pathways, are displayed in Figure 5. The highest ratio of true to false positives was achieved when cutting the tree at the average linkage distance of 5. Genes were included in the heat map if they were assigned to at least one KEGG pathway and were co-clustered with at least one other gene from KEGG. The KEGG pathways implicated by these patterns are clearly related to the biological processes under investigation (sporulation and cell cycle). Although our KEGG-based gold standard implicated 5 false positive pairs, closer examination of the genes in the two clusters at the top of the heat map (RNR1, MCD1, POL1, CDC45 and POL1) reveals that the activity of all these genes is tightly regulated during DNA replication. This indicates that, at this level of resolution, CSIMM creates perfect groupings of functionally related genes from KEGG. Interestingly, a context-specific model for the cell-cycle data, in which gene expression profiles are split into the two distinct cell cycles, performed better than the simple model when analyzing the cell-cycle data alone (Figure S3 in the supplement).
This could be a consequence of issues previously raised about the synchronization of cells in different microarray experiments characterizing gene expression signatures of the cell cycle (Cooper and Shedden 2003). We examined broader patterns of expression implicated at this level of significance by clustering all 135 genes that were co-clustered with at least one other gene after cutting the tree at the average linkage distance of 5, regardless of their KEGG membership (Figure S4 in the supplement). In addition to KEGG pathway memberships, we examined correlations of the clusters generated by CSIMM with transcription factors shown to bind their promoters in ChIP-on-chip experiments (Lee et al. 2002). The hierarchical tree in Figure S4 was cut into 8 clusters. Six of these clusters had more than two genes, and they were tested, using Fisher's exact test, for overrepresentation of genes whose promoters are bound by any single transcription factor. Eight transcription factors were significantly associated with at least one of the clusters, and their functional roles are closely related to the biological processes examined as well as to the KEGG pathway associations, lending additional credibility to the clusters identified by CSIMM. In comparison, cutting the tree formed by Euclidean-distance-based hierarchical clustering to obtain 135 genes that were
co-clustered with at least one other gene yielded diffuse patterns without any obvious clustering structure (Figure S5 in the supplement). Separate analyses of the cell-cycle and sporulation data offered a similar picture (Figures S6, S7, S8 and S9 in the supplement).

[Fig. 6. Empirical false discovery rate as a function of the average linkage distance used to cut the CSIMM-based hierarchical clustering tree.]

Finally, we investigated the validity of PPP-derived significance levels in deciding which clusters of genes are statistically significant. This was assessed by examining the proportion of false positive co-clusterings in clusters obtained by cutting the hierarchical tree at different levels of the average-linkage distance derived from posterior pairwise probabilities of co-expression. If the tree is cut at distance level d, the average PPP between each gene in a cluster and all other genes in the same cluster is greater than 1 − d. At the same time, the false discovery rate (FDR) is calculated as the proportion of pairs of co-expressed genes implicated when the tree is cut at the average-linkage distance d that do not share any KEGG pathway, out of all pairs implicated. Plotting the false discovery rates against different values of d (Figure 6) indicates that d closely approximates the empirical false discovery rate.

4 DISCUSSION
The most important distinguishing feature of the model described here lies in its ability to circumvent the difficult problem of identifying the correct number of local and/or global patterns in the data. Previously described context-specific models relied on different versions of penalized likelihood scores to estimate the correct number of patterns in the data. There are some obvious advantages to being able to identify the single most likely number of clusters. However, we previously demonstrated that our model-averaging approach results in a more accurate analysis than a clustering procedure in which the correct number of clusters is estimated from the data.
Here we further demonstrate that the posterior distribution of clusterings offers a credible assessment of the statistical significance of identified clusters, and we devise a practical approach for identifying statistically significant patterns in the data. This also simplifies the use of model-based clustering, since the whole procedure resembles simple hierarchical approaches.

The notion of context-specificity introduced in our model is different from the two previously proposed context-specificity definitions. In the context-specific finite mixture model introduced by Barash and Friedman (2002), all uninformative measurements within a context are placed into a single default cluster. CSIMM instead forms distinct groups of global clusters within each context. The module network described by Segal et al. (2003) introduces a notion of context-specificity in which contexts are defined differently for different clusters, and the distribution of all measurements within the same cluster and context is represented by a univariate Gaussian distribution. These two methods also facilitate estimation and modeling of the most likely context structure, whereas CSIMM at this point requires the context structure to be specified in advance. On the other hand, CSIMM uses globally defined contexts that are identical for all clusters, and the patterns within different contexts are described by multivariate Gaussian distributions. The distinction between univariate and multivariate definitions of local patterns seems to be particularly important in situations where distinct local clusters describe complex patterns, such as time series or dose-response data.

ACKNOWLEDGEMENTS
The development of the statistical models presented here has been supported by the grant 1R1HG00849 from NHGRI. Yeung is supported by NIH-NCI 1K5CA

REFERENCES
Yeung,K.Y. et al. (2004) Pattern recognition in gene expression data. Recent Devel. Nucleic Acids Res. 1.
Barash,Y. et al. (2002) Context-specific Bayesian clustering for gene expression data. J. Comput. Biol. 9[2].
McLachlan,G.J. et al. (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18[3].
Segal,E. et al. (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34.
Medvedovic,M. et al. (2004) Bayesian model-averaging in unsupervised learning from microarray data. BIOKDD 2004.
Segal,E. et al. (2004) A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36[10].
Stuart,J.M. et al. (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302[5643].
Medvedovic,M. et al. (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18[9].
Medvedovic,M. et al. (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20[8].
Neal,R.M. (2000) Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9.
Gelfand,A.E. et al. (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85.
Eisen,M.B. et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95[25].
Kanehisa,M. et al. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res. 32 Database issue, D277-D280.
Primig,M. et al. (2000) The core meiotic transcriptome in budding yeasts. Nat. Genet. 26[4].
Cho,R.J. et al. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell 2[1].
Cooper,S. et al. (2003) Microarray analysis of gene expression during the cell cycle. Cell Chromosome 2[1], 1.
Lee,T.I. et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298[5594].
Gelman,A. et al. (2003) Bayesian Data Analysis. Second edition.
Cowell,R.G. et al. (1999) Probabilistic Networks and Expert Systems.

APPENDIX: CSIMM MODEL

7 Context-specific infinite mixtures for clustering gene expression profiles cross diverse microrry dtset The sttisticl model describing the distribution of the dt is given in the form of Byesin hierrchicl model (Gelmn et l. 003). Dependencies between vrious model prmeters nd the dt re defined by the Directed Acyclic Network (Cowell et l. 1999) in Figure 7. Nodes in the network represent rndom vribles nd rcs define the independence structure of the joint probbility distribution function. Assuming tht the probbility distribution of ny node is independent of its non-descendnts if vlues of the prent nodes re given (Directed Mrkov Assumption), the joint probbility distribution of ll prmeters nd dt is given by the product of the locl probbility distributions of individul rndom vribles given their prents. p(x, C, L, M, M, S, α,, λ, τ, β, φ) = p(x C, M, S)p(C α)p(m L,M )p(s β, φ) p(l C,)p(M λ, τ)p(α)p()p(λ)p(τ)p(β)p(φ) M={µ 1,,µ Q } is the set of ll men vectors ssocited with Q globl ptterns, S={Σ 1,,Σ Q } is the set of corresponding vrincecovrince mtrices, M = {( µ,..., µ ),..., ( µ,..., µ )} is 11 K 1 1 1R KRR { 1,..., Σ R the set of ll locl men vectors, S = Σ } is the set of corresponding vrince-covrince mtrices nd K f is the number of locl groupings of globl clusters within context f. α,, λ, τ, β nd φ re hyperprmeters for C, L, M, nd S respectively. The probbility distribution of the expression dt vector for gene i, given its clssifiction vrible c i, globl mens M nd the vrincecovrince mtrices S is p ( xi ci = q, M, S) = f N ( xi µ q,σq ), where f N (. µ,σ) is the multivrite Gussin probbility distribution function with men µ nd vrince-covrince mtrix Σ. All vrince-covrince mtrices in the model re context-specific nd digonl. Tht is Σ q is the block digonl mtrix with contextspecific digonl mtrices σ I on the digonl. α C Fig. 7. 
The probability distribution of the global mean vector $\mu_q$, given the local structure $L$ and the local parameters $M^*$ and $S^*$, is

$$p\big((\mu_{q1}, \mu_{q2}, \dots, \mu_{qR}) \mid L, M^*, S^*\big) = f_N(\mu_{q1} \mid \mu^*_{L_{q1}1}, \Sigma^*_1)\, f_N(\mu_{q2} \mid \mu^*_{L_{q2}2}, \Sigma^*_2) \cdots f_N(\mu_{qR} \mid \mu^*_{L_{qR}R}, \Sigma^*_R),$$

where $\mu_{qf}$ is the subvector of the global mean $\mu_q$ on the $f$th context and $L_{qf}$ is the local group to which global cluster $q$ belongs within context $f$.

Prior distributions for the local groupings $L$ are defined following the infinite mixtures approach, which avoids specifying the correct number of groups of local clusters for each context (Medvedovic et al. 2004; Medvedovic and Sivaganesan 2002). The probability of assigning global cluster $q$ to an already existing group of clusters $t$ within context $f$, given $C$ and $a$, is

$$p(L_{qf} = t \mid C, a) \propto n_{-qft},$$

where $n_{-qft}$ is the number of global clusters currently placed in local cluster $t$ within context $f$, not counting global cluster $q$. The probability of assigning global cluster $q$ to a new local group is

$$p(L_{qf} \neq L_{q'f}\ \forall q' \neq q \mid C, a) \propto a.$$

The remaining local conditional probability distributions, the structure of the variance-covariance matrices and the hyperparameters are stated in the supplemental material.

The joint posterior distribution of all parameters in the model given the data is estimated using a Gibbs sampler. The Gibbs sampler (Gelfand and Smith 1990) is a general procedure for sampling observations from a multivariate distribution. It proceeds by iteratively drawing observations from the complete conditional distribution of each component given the current values of all other components. Under mild conditions, the distribution of the generated multivariate observations converges to the target multivariate distribution. The Gibbs sampler employed here is derived from previously described algorithms for fitting infinite mixture models. Conditional posterior distributions for $M$, $M^*$ and $L$ are derived assuming that $\Sigma^*_f = \sigma^{*2}_f I$ and letting $\sigma^*_f \to 0$ within all contexts. This effectively forces all global cluster means grouped together within a context to be identical within that context.
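Within one context, the prior on local groupings behaves like a Chinese restaurant process over the global clusters: an existing group $t$ is weighted by $n_{-qft}$ and a new group by $a$. The sketch below assumes the standard Dirichlet-process normalizer $Q - 1 + a$ (the sum of the weights; the text only gives the weights), and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Optional, Sequence


def local_prior_probs(labels: Sequence[int], q: int, a: float) -> Dict[Optional[int], float]:
    """Prior probabilities for L_qf within one context f.

    labels[q2] is the current local group of global cluster q2 in context f.
    Key t -> probability of joining existing group t; key None -> new group.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    # n_{-qft}: group sizes after removing global cluster q itself.
    counts = Counter(t for q2, t in enumerate(labels) if q2 != q)
    total = (len(labels) - 1) + a  # Q - 1 + a (assumed normalizer)
    probs: Dict[Optional[int], float] = {t: n / total for t, n in counts.items()}
    probs[None] = a / total
    return probs


def sample_local_label(labels: Sequence[int], q: int, a: float, rng: random.Random) -> int:
    """Draw L_qf from the prior; a new group gets one more than the largest label in use."""
    probs = local_prior_probs(labels, q, a)
    keys = list(probs)
    choice = rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]
    if choice is None:
        used = {t for q2, t in enumerate(labels) if q2 != q}
        choice = max(used, default=-1) + 1
    return choice


print(local_prior_probs([0, 0, 1, 2], q=3, a=1.0))
print(sample_local_label([0, 0, 1, 2], q=3, a=1.0, rng=random.Random(1)))
```

Note that a group containing only cluster $q$ disappears from the weights once $q$ is removed, so reassigning $q$ to it is handled by the new-group branch.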
Consequently, instead of estimating means and variances for each of the $Q$ global clusters within each context $f$, we estimate only $K_f < Q$ local parameters. The posterior distributions for the local classification variables, conditional on all other parameters in the model, are

$$p(L_{qf} = t \mid C, X, a) \propto n_{-qft}\, f_N\!\left(\bar{x}_{qf} \,\middle|\, \mu^*_{tf}, \frac{\sigma^2_{tf}}{n_q} I\right),$$

$$p(L_{qf} \neq L_{q'f}\ \forall q' \neq q \mid C, X, a) \propto a \int f_N\!\left(\bar{x}_{qf} \,\middle|\, \mu^*_{tf}, \frac{\sigma^2_{tf}}{n_q} I\right) p(\mu^*_{tf}, \sigma_{tf})\, d(\mu^*_{tf}, \sigma_{tf}),$$

where $n_q$ is the number of genes in global cluster $q$ and

$$\bar{x}_{qf} = \frac{1}{n_q} \sum_{i:\, c_i = q} x_{if}$$

is the average of their expression profiles on context $f$. All other conditional posterior distributions are similar to those of the simple infinite mixture models (Medvedovic et al. 2004; Medvedovic and Sivaganesan 2002) and are given in the web supplement. The Gibbs sampler is initialized by sampling all model parameters from their respective prior distributions and placing all gene expression profiles into a single global cluster. In each iteration, the Gibbs sampler first samples the global clusters, then the local groupings of global clusters within each context, and then the remaining parameters in the model. To alleviate the problem of slow mixing, we apply the heuristic annealing adjustment described in the Web Supplement. Previously, we demonstrated that such modifications preserve the topology of the posterior distribution of clusterings (Medvedovic et al. 2004).
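With $\sigma^*_f \to 0$, the update for $L_{qf}$ needs only the cluster average $\bar{x}_{qf}$. The sketch below computes the normalized conditional probabilities for one global cluster in one context. To keep the new-group integral in closed form it adds assumptions the text does not state: a conjugate prior $\mu^*_{tf} \sim N(\lambda, \tau^2 I)$ and a fixed variance `new_var` for a new group in place of integrating over $\sigma_{tf}$. Under those assumptions the integral equals $f_N(\bar{x}_{qf} \mid \lambda, (\tau^2 + \texttt{new\_var}/n_q) I)$. All function and parameter names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def log_iso_normal(x: Sequence[float], mean: Sequence[float], var: float) -> float:
    """log f_N(x | mean, var * I) for an isotropic Gaussian."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * len(x) * math.log(2.0 * math.pi * var) - 0.5 * sq / var


def cluster_mean(profiles: Sequence[Sequence[float]]) -> List[float]:
    """x-bar_qf: coordinate-wise mean of the context-f sub-profiles of the n_q genes in cluster q."""
    n_q = len(profiles)
    return [sum(col) / n_q for col in zip(*profiles)]


def local_posterior_probs(
    xbar: Sequence[float],
    n_q: int,
    labels: Sequence[int],
    q: int,
    local_means: Dict[int, Sequence[float]],
    local_vars: Dict[int, float],
    a: float,
    lam: Sequence[float],
    tau2: float,
    new_var: float,
) -> Dict[Optional[int], float]:
    """Normalized conditional probabilities for L_qf in one context (key None = new group)."""
    counts = Counter(t for q2, t in enumerate(labels) if q2 != q)
    log_w: Dict[Optional[int], float] = {}
    for t, n in counts.items():
        # n_{-qft} * f_N(xbar | mu*_tf, sigma_tf^2 / n_q * I)
        log_w[t] = math.log(n) + log_iso_normal(xbar, local_means[t], local_vars[t] / n_q)
    # Assumed prior mu*_tf ~ N(lam, tau2 * I) with variance fixed at new_var.
    log_w[None] = math.log(a) + log_iso_normal(xbar, lam, tau2 + new_var / n_q)
    top = max(log_w.values())  # subtract the max before exponentiating for stability
    w = {k: math.exp(v - top) for k, v in log_w.items()}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}
```

A full sweep would apply this to every global cluster $q$ and context $f$ after the global classifications $c_i$ have been updated, drawing each $L_{qf}$ from the returned probabilities.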


More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Section 11.5 Estimation of difference of two proportions

Section 11.5 Estimation of difference of two proportions ection.5 Estimtion of difference of two proportions As seen in estimtion of difference of two mens for nonnorml popultion bsed on lrge smple sizes, one cn use CLT in the pproximtion of the distribution

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees

A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees A generl frmework for estimting similrity of dtsets nd decision trees: exploring semntic similrity of decision trees Irene Ntoutsi Alexndros Klousis Ynnis Theodoridis Abstrct Decision trees re mong the

More information

Physics 201 Lab 3: Measurement of Earth s local gravitational field I Data Acquisition and Preliminary Analysis Dr. Timothy C. Black Summer I, 2018

Physics 201 Lab 3: Measurement of Earth s local gravitational field I Data Acquisition and Preliminary Analysis Dr. Timothy C. Black Summer I, 2018 Physics 201 Lb 3: Mesurement of Erth s locl grvittionl field I Dt Acquisition nd Preliminry Anlysis Dr. Timothy C. Blck Summer I, 2018 Theoreticl Discussion Grvity is one of the four known fundmentl forces.

More information

Scientific notation is a way of expressing really big numbers or really small numbers.

Scientific notation is a way of expressing really big numbers or really small numbers. Scientific Nottion (Stndrd form) Scientific nottion is wy of expressing relly big numbers or relly smll numbers. It is most often used in scientific clcultions where the nlysis must be very precise. Scientific

More information

Estimation on Monotone Partial Functional Linear Regression

Estimation on Monotone Partial Functional Linear Regression A^VÇÚO 1 33 ò 1 4 Ï 217 c 8 Chinese Journl of Applied Probbility nd Sttistics Aug., 217, Vol. 33, No. 4, pp. 433-44 doi: 1.3969/j.issn.11-4268.217.4.8 Estimtion on Monotone Prtil Functionl Liner Regression

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

A new algorithm for generating Pythagorean triples 1

A new algorithm for generating Pythagorean triples 1 A new lgorithm for generting Pythgoren triples 1 RH Dye 2 nd RWD Nicklls 3 The Mthemticl Gzette (1998; 82 (Mrch, No. 493, pp. 86 91 http://www.nicklls.org/dick/ppers/mths/pythgtriples1998.pdf 1 Introduction

More information

Generalized Fano and non-fano networks

Generalized Fano and non-fano networks Generlized Fno nd non-fno networks Nildri Ds nd Brijesh Kumr Ri Deprtment of Electronics nd Electricl Engineering Indin Institute of Technology Guwhti, Guwhti, Assm, Indi Emil: {d.nildri, bkri}@iitg.ernet.in

More information

Section 14.3 Arc Length and Curvature

Section 14.3 Arc Length and Curvature Section 4.3 Arc Length nd Curvture Clculus on Curves in Spce In this section, we ly the foundtions for describing the movement of n object in spce.. Vector Function Bsics In Clc, formul for rc length in

More information

CS667 Lecture 6: Monte Carlo Integration 02/10/05

CS667 Lecture 6: Monte Carlo Integration 02/10/05 CS667 Lecture 6: Monte Crlo Integrtion 02/10/05 Venkt Krishnrj Lecturer: Steve Mrschner 1 Ide The min ide of Monte Crlo Integrtion is tht we cn estimte the vlue of n integrl by looking t lrge number of

More information