Performance of Different Algorithms on Clustering Molecular Dynamics Trajectories

Size: px
Start display at page:

Download "Performance of Different Algorithms on Clustering Molecular Dynamics Trajectories"

Transcription

1 Performance of Dfferent Algorthms on Clusterng Molecular Dynamcs Trajectores Chenchen Song Abstract Dfferent types of clusterng algorthms are appled to clusterng molecular dynamcs trajectores to get nsght about possble conformatons for molecules.. The algorthms covered nclude, (multvarate Gaussan), snglelnage, centrod-lnage, average-lnage, complete-lnage. Root-Mean-Square-Devaton( RMSD) s used as metrc. Performances of algorthms are analyzed and compared based on Daves-Bouldn ndex(dbi) and Pseudo-F statc (psf). 1. Introducton Molecular dynamcs smulaton methods produce trajectores of molecule confguraton snapshots as a functon of tme. The tme scale of chemcal process s ns, whle the tme scale of molecule nternal freedom s fs, thus a suffcent smulaton trajectory wll need to contan at least O(N 6 ) confguratons. For such large amounts of data, machne learnng algorthm becomes helpful to extract useful nformaton from the datasets. One type of nformaton we hope to get s the conformaton substates of a molecule, whch falls nto the clusterng problem. Usng clusterng algorthm to help analyze molecular dynamcs trajectores s actually not a new dea, and can date bac to Snce then, a number of papers about applyng dfferent types of clusterng algorthms on MD have been publshed. Thus, n ths project, I wll focus on comparng the performance of dfferent clusterng algorthm. 2. Method Detals of the methods have been carefully dscussed n the mlestone report. Here we only gve a bref revew. (1) Smlarty Metrc Instead of normal Eucldean norm, RMSD wll be used as metrc, whch frst tres to algn two molecule as much as possble before calculatng Eucldean dstance. By usng RMSD, we can elmnate the effect from translatonal and rotatonal moton of molecule. (2) Algorthm and multvarate Gaussan(wll be called for short) have been ntroduced n class. The lnage methods are dfferent n how the dstance between clusters s defned. Sngle(edge)-lnage uses the shortest nter-cluster pont-to-pont dstance. Centrod-lnage uses the dstance between cluster centrods. Average-lnage: uses average dstances between ndvdual ponts of the two clusters. Completelnage uses maxmal pont-to-pont dstance. Due to ther dfferent defnton of crtcal dstance, latter we wll see that they sometmes can have very dfferent behavor. Sngle-lnage, complete-lnage, and average-lnage only need to calculate a metrc matrx of sze N 2 at the very begnnng. Other methods need to update the postons of centrods and relatve dstances from ponts to centrods durng each teraton. Because our metrc s not a smple Eucldean dstance, the latter methods wll be more tme-consumng. (3) Performance Metrc DBI s defned as 1 d dj DB D D (1.1) max j j, j 1 d, j where d j s dstance between centrods and d s the average dstance between ponts n some cluster wth the centrod of that cluster. psf s defned as: SS N B psf SS 1 w 2 2 B, w 1 1 xc SS n m m SS x m (1.2) where m s centrod and m s the overall mean of data. Usually, lower DBI and hgher psf reflects compact and well-separated DBI. But one should be careful when usng these ndces. For example, DBI s affected by cluster count, we should only compare DBI values when the number of clusters s smlar. 3. Results and Analyss Molecular dynamcs trajectores are generated by Terachem pacage. RMSD calculaton and molecular algnment s performed usng VMD pacage. If the methods only requres N 2 metrc matrx as dscussed n the prevous sectons, then the clusterng s performed by MATLAB statstc box. Otherwse, the clusterng s performed by Cluster3.0, where we have added our metrc nto the clusterng lbrary. DBI and psf are calculated by MATLAB. 3.1 Clusterng ponts n 2D-plane from unform dstrbuton Frst, dfferent clusterng methods are appled to a hundred ponts n 2D plane whch are sampled from unform dstrbuton. The ponts don t have any nternal

2 Orgnal Ln-complete Ln-sngle Ln-complete Ln-sngle Ln-average Ln-centrod Ln-average Ln-centrod -means -means structures, thus the clusterng results wll only reflect the propertes of dfferent algorthm. Fgure 1. Clusterng results for ponts n 2D-plane drawn from unform dstrbuton From Fg.3, the followng propertes can be observed. (1) Most of the methods tend to naturally and equally partton the ponts nto four blocs. Ths s especally true for -means. (2) Sngle-lnage (or edge-lnage) almost classfes all the ponts nto a same cluster. Ths mght because the crtcal dstance of sngle-lnage s defned as the nearest ponts between two clusters, thus sngle-lnage may be very senstve to cases where ponts are close to each other and no clear border exsts. (3) method s the only one that produces clusters wth very dfferent shapes. Ths mght be method does clusterng based on probablstc assumptons whle all other methods are based on geometrc structures. Fgure 2. Clusterng results for ponts n 2D-plane drawn from three equally szed overlapped Gaussan dstrbuton. Compared to the orgnal fgure, the followng propertes can be notced. (1) The success of -means may agan due to the nternal property of -means to produce clusters wth same sze and same shape. (2) can get pretty good result f the underlyng probablty dstrbuton s very close to Gaussan. (3) Centrod lnage doesn t wor well, perhaps because the centrods are ll-defned. (4) Sngle-lnage seems to fal for ths crcumstance agan where ponts have no clear borders Clusterng Artfcal MD data: Four equally szed clusters 3.2 Clusterng ponts on 2D-plane from overlapped Gaussan dstrbuton. In the second step, clusterng methods are appled to ponts sampled from three ndependent Gaussan dstrbutons. The mean and covarance of 2D Gaussan s tuned so that the three dstrbutons are overlapped. The reason to test on overlapped ponts s that n MD smulatons, the trajectores are generated consequently, thus adjacent confguratons are usually very smlar. Fgure 3. Illustratons of typcal cyclo-hexane conformatons. Cyclo-hexane s used as a test model. Four clusters are char, twsted boat, boat, and half-char. To generate each cluster, tae char as an example, we start from a

3 L-sngle L-centrod L-Complete L-sngle L-average L-centrod L-Complete L-average char confguraton, control the tmestep to 0.01fs, control the temperature to very low and only proceed 100 steps. Repeat ths procedure several tmes, we can guarantee to generate a cluster wth typcal char confguratons. Because the fve dfferent clusters are generated ndependently and artfcally, there s actually a very clear border between dfferent clusters. The fgure shows the expected dealzed behavor ncludng mnma n the DBI, maxma n psf when clusterng number s equal to the optmal value of 4. DBI psf Ths set s much dffcult to cluster as t has both very small clusters wth small varance and relatve large clusters wth large varance. DBI psf Fgure 4.DBI and psf for clusterng artfcal MD trajectores of cyclo-hexane. Optmal clusters are four equally szed clusters. X-axs range from 3 to 8. By checng the assgnments, t s found that for ths test wth equal clusterng sze and clear border, most of the algorthm perform very well except. Ths mght be because Gaussan dstrbuton s not a good assumpton for how confguratons dstrbute wthn each cluster around the centrod. 3.4 Clusterng artfcal MD data: Fve dfferentally szed clusters. In real MD smulatons, the szes of clusters can be very dfferent, because the lower the energy s, the hgher probablty t wll appear durng smulaton. To mmc ths property, n ths test, we buld an artfcal MD data by combnng: 2 planar structures, 15 half-char structures, 30 boat structures, 50 twst-boat structures, and 100 chars. The order s also consstent wth the energy order. Fgure 5.DBI and psf for clusterng artfcal MD trajectores of cyclo-hexane. Optmal clusters are fve dstnct szed clusters. X-axs range from 3 to 8. Table 1. Sze of clusters for dfferent algorthm wth dfferent number of clusters. Method #cluster Cluster sze Kmeans L-sngle (2, 28) L-average (12, 18) L-centrod (12, 18) L-complete (5, 25) It can be notced that when confguratons are not unformly separated, the metrcs are less consstent and not necessary gets optmal value at optmal cluster numbers.

4 Ln-Centrods Ln-Complete From the table and fgure the followng propertes could be observed (1) For ths tests where clear border exsts but cluster szes are very dfferent, all the lnage methods are able to recover the correct assgnment at optmal cluster number 5. When cluster number ncreases, dfferent methods then splt clusters n dfferent way due to ther dfferent dstance defnton. (2) K-mean exhbts a strong tendency to cluster ponts nto equal sze, thus doesn t gve good performance for ths tests. (3) also gves pretty bad result, possbly due to same reason as prevous tests. Fgure 7. Centrods for each cluster. To explan ths behavor, the energy profle for a contnuous changng cyclohexane s checed. 3.5 Clusterng real MD data. Unle the artfcal trajectores n whch confguratons are clearly separated, confguratons from adjacent steps n real MD smulatons are very smlar and wll mae clusterng to be more dffcult. Cyclohexane s agan used as the test examples. Multple trajectores are generated startng from half-char structure. After clusterng each trajectores ndvdually, two dstnct typcal types of behavor s shown below. Fgure 8. Cyclohexane energy profle. It can be seen from the fgure that, f startng from halfchar, t can ether goes to the left, the char regon( f the plat part flps downwards) or goes to the rght, the boat regon(f the plat fart flps upwards). Once t steps to the left, t wll be separated from the other part by a hgh energy barrer. Smlar for steppng nto the rght regon. All methods gve the almost the same clusterng when cluster number s three. If we set cluster number to two, methods wll have dstnct results. Fgure 6. Two dstnct types of behavor and clusterng results when clusterng real MD trajectores of cyclo-hexane startng from half-char conformaton. (0) Char (2) Upward twst-boat (3) boat (4) Downward twst-boat

5 RMSD Ln-sngle Ln-Average Fgure 8. Dfferent clusterng results from dfferent algorthm when number of cluster s 2. It can be seen from the fgure that : (1) and lnage-complete method merges upward and downward twst-boat. These two methods emphasze more on the dfference between boat and twst-boat. (2) Ln-centrods and ln-average merge boat and upward twst boat. and wegh more on the dfference between reversed confguratons. (3) Ln-sngle doesn t preform very well for ths tests, perhaps because t cannot handle crcumstances where clear border s absent. 3.6 Clusterng proten trajectores Fnally, clusterng method s appled to proten trajectores. Complete-lnage method s used, because t performs well n the prevous tests and only requres a N 2 metrc matrx, wthout necessty to compute updated centrods durng each teraton. Optmal clusterng number s pced out by psf ndces. Consderng the avalable computng ablty, we apply a pullng force to the proten so that the structure wll change more rapdly. Only the coordnates of bacbones (carbon, ntrogen, oxygen) are passed nto the clusterng program. Results are shown below. Tme Fgure 9. Clusterng results for proten. The three clusters clearly show the transformaton from folded, half-unfolded to completely unfolded under the pullng force we apply to the system. Fgure 10. Centrods for three clusters. Summary In ths project, we use several tests to compare and analyze the performance of dfferent clusterng methods under varous condtons. The followng propertes can be summarzed from our observatons: (1) tends to produce blocy clusters of smlar sze. Thus when cluster szes are smlar, gves very good performance. But t usually fals for clusters wth dstnct szes. (2) Sngle-lnage s very senstve to closely spaced ponts. As a result t may be fragle to the presence or absence of sngle pont. (3) Multvarate Gaussan doesn t wor well for most of the tests, perhaps because the assumpton of Gaussan dstrbuton s not good n our problem (4) Centrod-, average-, and complete-average gves qute consstency good results through the results. The latter two doesn t requre updatng centrods durng each teraton, thus may be canddates for clusterng molecular dynamcs trajectores. References 1. Karpen, M. E.; Tobas, D. J.; Broos, C. L..Bochemstry, 1993, 32, Kabsch, W..Acta Crystallogr. 1976, 32: Ramanathan,A; Yoo, J.O.; Langmead, C.J.; J. Chem. Theory Comput.2011, 7, Shao, E, et.al. J. Chem. Theory Comput.2007,3,

18.1 Introduction and Recap

18.1 Introduction and Recap CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng

More information

Cluster Validation Determining Number of Clusters. Umut ORHAN, PhD.

Cluster Validation Determining Number of Clusters. Umut ORHAN, PhD. Cluster Analyss Cluster Valdaton Determnng Number of Clusters 1 Cluster Valdaton The procedure of evaluatng the results of a clusterng algorthm s known under the term cluster valdty. How do we evaluate

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

Chapter 3 Describing Data Using Numerical Measures

Chapter 3 Describing Data Using Numerical Measures Chapter 3 Student Lecture Notes 3-1 Chapter 3 Descrbng Data Usng Numercal Measures Fall 2006 Fundamentals of Busness Statstcs 1 Chapter Goals To establsh the usefulness of summary measures of data. The

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics )

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics ) Ismor Fscher, 8//008 Stat 54 / -8.3 Summary Statstcs Measures of Center and Spread Dstrbuton of dscrete contnuous POPULATION Random Varable, numercal True center =??? True spread =???? parameters ( populaton

More information

Calculation of time complexity (3%)

Calculation of time complexity (3%) Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

The optimal delay of the second test is therefore approximately 210 hours earlier than =2.

The optimal delay of the second test is therefore approximately 210 hours earlier than =2. THE IEC 61508 FORMULAS 223 The optmal delay of the second test s therefore approxmately 210 hours earler than =2. 8.4 The IEC 61508 Formulas IEC 61508-6 provdes approxmaton formulas for the PF for smple

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Spatial Statistics and Analysis Methods (for GEOG 104 class).

Spatial Statistics and Analysis Methods (for GEOG 104 class). Spatal Statstcs and Analyss Methods (for GEOG 104 class). Provded by Dr. An L, San Dego State Unversty. 1 Ponts Types of spatal data Pont pattern analyss (PPA; such as nearest neghbor dstance, quadrat

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

Physics 5153 Classical Mechanics. Principle of Virtual Work-1

Physics 5153 Classical Mechanics. Principle of Virtual Work-1 P. Guterrez 1 Introducton Physcs 5153 Classcal Mechancs Prncple of Vrtual Work The frst varatonal prncple we encounter n mechancs s the prncple of vrtual work. It establshes the equlbrum condton of a mechancal

More information

I + HH H N 0 M T H = UΣV H = [U 1 U 2 ] 0 0 E S. X if X 0 0 if X < 0 (X) + = = M T 1 + N 0. r p + 1

I + HH H N 0 M T H = UΣV H = [U 1 U 2 ] 0 0 E S. X if X 0 0 if X < 0 (X) + = = M T 1 + N 0. r p + 1 Homework 4 Problem Capacty wth CSI only at Recever: C = log det I + E )) s HH H N M T R SS = I) SVD of the Channel Matrx: H = UΣV H = [U 1 U ] [ Σr ] [V 1 V ] H Capacty wth CSI at both transmtter and

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

AS-Level Maths: Statistics 1 for Edexcel

AS-Level Maths: Statistics 1 for Edexcel 1 of 6 AS-Level Maths: Statstcs 1 for Edecel S1. Calculatng means and standard devatons Ths con ndcates the slde contans actvtes created n Flash. These actvtes are not edtable. For more detaled nstructons,

More information

Lecture 10: May 6, 2013

Lecture 10: May 6, 2013 TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,

More information

Definition. Measures of Dispersion. Measures of Dispersion. Definition. The Range. Measures of Dispersion 3/24/2014

Definition. Measures of Dispersion. Measures of Dispersion. Definition. The Range. Measures of Dispersion 3/24/2014 Measures of Dsperson Defenton Range Interquartle Range Varance and Standard Devaton Defnton Measures of dsperson are descrptve statstcs that descrbe how smlar a set of scores are to each other The more

More information

Supplement to Clustering with Statistical Error Control

Supplement to Clustering with Statistical Error Control Supplement to Clusterng wth Statstcal Error Control Mchael Vogt Unversty of Bonn Matthas Schmd Unversty of Bonn In ths supplement, we provde the proofs that are omtted n the paper. In partcular, we derve

More information

Lecture Space-Bounded Derandomization

Lecture Space-Bounded Derandomization Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval

More information

STATISTICAL MECHANICS

STATISTICAL MECHANICS STATISTICAL MECHANICS Thermal Energy Recall that KE can always be separated nto 2 terms: KE system = 1 2 M 2 total v CM KE nternal Rgd-body rotaton and elastc / sound waves Use smplfyng assumptons KE of

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Expectation Maximization Mixture Models HMMs

Expectation Maximization Mixture Models HMMs -755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

Lossy Compression. Compromise accuracy of reconstruction for increased compression. Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications Durban Watson for Testng the Lack-of-Ft of Polynomal Regresson Models wthout Replcatons Ruba A. Alyaf, Maha A. Omar, Abdullah A. Al-Shha ralyaf@ksu.edu.sa, maomar@ksu.edu.sa, aalshha@ksu.edu.sa Department

More information

One-sided finite-difference approximations suitable for use with Richardson extrapolation

One-sided finite-difference approximations suitable for use with Richardson extrapolation Journal of Computatonal Physcs 219 (2006) 13 20 Short note One-sded fnte-dfference approxmatons sutable for use wth Rchardson extrapolaton Kumar Rahul, S.N. Bhattacharyya * Department of Mechancal Engneerng,

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Split alignment. Martin C. Frith April 13, 2012

Split alignment. Martin C. Frith April 13, 2012 Splt algnment Martn C. Frth Aprl 13, 2012 1 Introducton Ths document s about algnng a query sequence to a genome, allowng dfferent parts of the query to match dfferent parts of the genome. Here are some

More information

Lecture Nov

Lecture Nov Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances

More information

Mathematical Preparations

Mathematical Preparations 1 Introducton Mathematcal Preparatons The theory of relatvty was developed to explan experments whch studed the propagaton of electromagnetc radaton n movng coordnate systems. Wthn expermental error the

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Clustering gene expression data & the EM algorithm

Clustering gene expression data & the EM algorithm CG, Fall 2011-12 Clusterng gene expresson data & the EM algorthm CG 08 Ron Shamr 1 How Gene Expresson Data Looks Entres of the Raw Data matrx: Rato values Absolute values Row = gene s expresson pattern

More information

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,

More information

Problem Points Score Total 100

Problem Points Score Total 100 Physcs 450 Solutons of Sample Exam I Problem Ponts Score 1 8 15 3 17 4 0 5 0 Total 100 All wor must be shown n order to receve full credt. Wor must be legble and comprehensble wth answers clearly ndcated.

More information

Outline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1]

Outline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1] DYNAMIC SHORTEST PATH SEARCH AND SYNCHRONIZED TASK SWITCHING Jay Wagenpfel, Adran Trachte 2 Outlne Shortest Communcaton Path Searchng Bellmann Ford algorthm Algorthm for dynamc case Modfcatons to our algorthm

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Lecture 6 More on Complete Randomized Block Design (RBD)

Lecture 6 More on Complete Randomized Block Design (RBD) Lecture 6 More on Complete Randomzed Block Desgn (RBD) Multple test Multple test The multple comparsons or multple testng problem occurs when one consders a set of statstcal nferences smultaneously. For

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Introduction to Algorithms

Introduction to Algorithms Introducton to Algorthms 6.046J/8.40J Lecture 7 Prof. Potr Indyk Data Structures Role of data structures: Encapsulate data Support certan operatons (e.g., INSERT, DELETE, SEARCH) Our focus: effcency of

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

ANOVA. The Observations y ij

ANOVA. The Observations y ij ANOVA Stands for ANalyss Of VArance But t s a test of dfferences n means The dea: The Observatons y j Treatment group = 1 = 2 = k y 11 y 21 y k,1 y 12 y 22 y k,2 y 1, n1 y 2, n2 y k, nk means: m 1 m 2

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

Hashing. Alexandra Stefan

Hashing. Alexandra Stefan Hashng Alexandra Stefan 1 Hash tables Tables Drect access table (or key-ndex table): key => ndex Hash table: key => hash value => ndex Man components Hash functon Collson resoluton Dfferent keys mapped

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

Analysis of Discrete Time Queues (Section 4.6)

Analysis of Discrete Time Queues (Section 4.6) Analyss of Dscrete Tme Queues (Secton 4.6) Copyrght 2002, Sanjay K. Bose Tme axs dvded nto slots slot slot boundares Arrvals can only occur at slot boundares Servce to a job can only start at a slot boundary

More information

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting Onlne Appendx to: Axomatzaton and measurement of Quas-hyperbolc Dscountng José Lus Montel Olea Tomasz Strzaleck 1 Sample Selecton As dscussed before our ntal sample conssts of two groups of subjects. Group

More information

Lecture 4: September 12

Lecture 4: September 12 36-755: Advanced Statstcal Theory Fall 016 Lecture 4: September 1 Lecturer: Alessandro Rnaldo Scrbe: Xao Hu Ta Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer: These notes have not been

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Week 9 Chapter 10 Section 1-5

Week 9 Chapter 10 Section 1-5 Week 9 Chapter 10 Secton 1-5 Rotaton Rgd Object A rgd object s one that s nondeformable The relatve locatons of all partcles makng up the object reman constant All real objects are deformable to some extent,

More information

Physics 181. Particle Systems

Physics 181. Particle Systems Physcs 181 Partcle Systems Overvew In these notes we dscuss the varables approprate to the descrpton of systems of partcles, ther defntons, ther relatons, and ther conservatons laws. We consder a system

More information

p 1 c 2 + p 2 c 2 + p 3 c p m c 2

p 1 c 2 + p 2 c 2 + p 3 c p m c 2 Where to put a faclty? Gven locatons p 1,..., p m n R n of m houses, want to choose a locaton c n R n for the fre staton. Want c to be as close as possble to all the house. We know how to measure dstance

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

} Often, when learning, we deal with uncertainty:

} Often, when learning, we deal with uncertainty: Uncertanty and Learnng } Often, when learnng, we deal wth uncertanty: } Incomplete data sets, wth mssng nformaton } Nosy data sets, wth unrelable nformaton } Stochastcty: causes and effects related non-determnstcally

More information

Lecture Note 3. Eshelby s Inclusion II

Lecture Note 3. Eshelby s Inclusion II ME340B Elastcty of Mcroscopc Structures Stanford Unversty Wnter 004 Lecture Note 3. Eshelby s Incluson II Chrs Wenberger and We Ca c All rghts reserved January 6, 004 Contents 1 Incluson energy n an nfnte

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Image Processing for Bubble Detection in Microfluidics

Image Processing for Bubble Detection in Microfluidics Image Processng for Bubble Detecton n Mcrofludcs Introducton Chen Fang Mechancal Engneerng Department Stanford Unverst Startng from recentl ears, mcrofludcs devces have been wdel used to buld the bomedcal

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

Tracking with Kalman Filter

Tracking with Kalman Filter Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,

More information

An (almost) unbiased estimator for the S-Gini index

An (almost) unbiased estimator for the S-Gini index An (almost unbased estmator for the S-Gn ndex Thomas Demuynck February 25, 2009 Abstract Ths note provdes an unbased estmator for the absolute S-Gn and an almost unbased estmator for the relatve S-Gn for

More information

A FAST HEURISTIC FOR TASKS ASSIGNMENT IN MANYCORE SYSTEMS WITH VOLTAGE-FREQUENCY ISLANDS

A FAST HEURISTIC FOR TASKS ASSIGNMENT IN MANYCORE SYSTEMS WITH VOLTAGE-FREQUENCY ISLANDS Shervn Haamn A FAST HEURISTIC FOR TASKS ASSIGNMENT IN MANYCORE SYSTEMS WITH VOLTAGE-FREQUENCY ISLANDS INTRODUCTION Increasng computatons n applcatons has led to faster processng. o Use more cores n a chp

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD

The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s

More information

Lecture 17 : Stochastic Processes II

Lecture 17 : Stochastic Processes II : Stochastc Processes II 1 Contnuous-tme stochastc process So far we have studed dscrete-tme stochastc processes. We studed the concept of Makov chans and martngales, tme seres analyss, and regresson analyss

More information

Computational Biology Lecture 8: Substitution matrices Saad Mneimneh

Computational Biology Lecture 8: Substitution matrices Saad Mneimneh Computatonal Bology Lecture 8: Substtuton matrces Saad Mnemneh As we have ntroduced last tme, smple scorng schemes lke + or a match, - or a msmatch and -2 or a gap are not justable bologcally, especally

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

The Study of Teaching-learning-based Optimization Algorithm

The Study of Teaching-learning-based Optimization Algorithm Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute

More information