Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R
Statistical Clustering of Vesicle Patterns: Practical Aspects of the Analysis of Large Datasets with R
Mirko Birbaumer, birbaumer@imsb.biol.ethz.ch
Rmetrics Workshop, 3rd July
Introduction: Pattern Recognition in Biological Systems
What happens if we silence a gene in a cell and add infectious viruses?
RNA interference enables the silencing of single genes (Nobel Prize in Physiology or Medicine, 2006).
Library for genes in human cells: exploited to study the function of genes.
Clinical and pharmaceutical purpose: which proteins are essential for virus entry? Are there drugs that target these proteins?
Required technology and techniques: fully automated microscopy, image analysis and statistical data analysis.
Bioconductor: R project for the analysis and comprehension of genomic data.
Answer: for some genes we observe strikingly different patterns!
Introduction: RNA Interference Experiment
In each well there are thousands of cells, and the expression of a particular gene is knocked down.
Of interest: how a virus enters a cell and how it is transported within the cell.
High-Throughput Screening: Vesicle Patterns
Patterns of transferrin in Hep2Beta cells upon silencing of a gene.
High-Throughput Screening: Overview of the Image Analysis Pipeline
High-Throughput Screening: Cell Classification
GUI for cell classification based on nuclei features and an SVM (R package e1071). The GUI was written in Python (Tkinter) in combination with rpy, which is used as an interface to R.
High-Throughput Screening: Cell Classification
9 cell types are distinguished and classified according to 52 intensity and texture features (CellProfiler) via the GUI.
Based on the SVM, all detected cells are classified, and in-focus interphase cells are selected.
5 classes: Interphase (1), Prophase/Prometaphase/Apoptotic (2), Metaphase (3), Anaphase I/Anaphase II/Telophase (4) and Artefact (5).
Total accuracy of 10-fold cross-validation: 96.63%.
Confusion matrix based on 4000 training data points (nuclei) and tested on 2800 nuclei.
[Confusion matrix: predicted class versus Class 1 to Class 5]
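The evaluation reported on this slide (a 10-fold cross-validation accuracy together with a confusion matrix) can be sketched as follows. This is only an illustration in plain Python, not the pipeline used in the talk: a nearest-centroid rule stands in for the SVM from e1071, the data are synthetic two-feature points rather than the 52 CellProfiler features, and all function names are hypothetical.

```python
import random

def nearest_centroid_fit(X, y):
    """Return {label: centroid}; a simple stand-in for training an SVM."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(model, row):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))

def cross_validate(X, y, k=10, seed=0):
    """k-fold CV; returns (accuracy, confusion) with confusion[true][predicted] counts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    labels = sorted(set(y))
    confusion = {t: {p: 0 for p in labels} for t in labels}
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            confusion[y[i]][nearest_centroid_predict(model, X[i])] += 1
    correct = sum(confusion[lab][lab] for lab in labels)
    return correct / len(X), confusion

# Synthetic data (an assumption for illustration): two well-separated classes.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)] + \
    [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)]
y = [1] * 100 + [2] * 100
accuracy, confusion = cross_validate(X, y, k=10)
print(accuracy)
print(confusion)
```

Each nucleus is predicted exactly once, by a model that never saw it during training, so the confusion matrix entries sum to the number of nuclei.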
Case Study: Transferrin Phenotypes
How can we classify these patterns in an unsupervised manner?
Case Study: Feature Selection
Vesicle features: relative distance to nucleus, radius, ellipticity, flux, area of vesicles, number of vesicles per cell area.
Spatial point patterns: number of vesicles within 18/25 pixels around each vesicle; radius [pixels] within which 40 and 60 percent of all vesicles are contained around each vesicle.
Clustering tendency (Ripley's K function).
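The spatial point-pattern features listed on this slide can be illustrated with a short sketch. It assumes vesicle centres are given as (x, y) pixel coordinates, interprets "percent of all vesicles" as a fraction of the other vesicles in the cell, and uses a naive Ripley's K estimate without edge correction. The function names and these conventions are assumptions, not taken from the talk.

```python
import math

def pairwise_distances(points):
    """Full matrix of Euclidean distances between vesicle centres."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def neighbour_counts(points, radius):
    """Number of other vesicles within `radius` pixels of each vesicle."""
    d = pairwise_distances(points)
    return [sum(1 for j, dij in enumerate(row) if j != i and dij <= radius)
            for i, row in enumerate(d)]

def radius_containing_fraction(points, fraction):
    """For each vesicle, smallest radius containing `fraction` of the other vesicles."""
    d = pairwise_distances(points)
    radii = []
    for i, row in enumerate(d):
        others = sorted(dij for j, dij in enumerate(row) if j != i)
        needed = max(1, math.ceil(fraction * len(others)))
        radii.append(others[needed - 1])
    return radii

def ripley_k(points, r, area):
    """Naive Ripley's K estimate: area * (ordered pairs within r) / (n * (n - 1))."""
    n = len(points)
    pairs = sum(neighbour_counts(points, r))
    return area * pairs / (n * (n - 1))

# Toy coordinates (hypothetical): three nearby vesicles and one far away.
points = [(0, 0), (3, 4), (6, 8), (100, 100)]
counts_18 = neighbour_counts(points, 18)
radii_60 = radius_containing_fraction(points, 0.6)
k_value = ripley_k(points, 5, area=12.0)
print(counts_18, radii_60, k_value)
```

For a completely random pattern K(r) is approximately pi * r^2, so values well above that indicate clustering at scale r.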
Case Study: Outline of Statistical Analysis
All detected vesicles (11 features) are written into one file.
Principal component analysis of this data file: 3 principal components are chosen.
Parameterized Gaussian mixture modeling (package mclust) on data files containing all vesicles of a single cell with 3 principal components:
$$f(x_i) = \sum_{k=1}^{G} \tau_k \, \Phi_k(x_i \mid \mu_k, \Sigma_k) \qquad (1)$$
where $x_i$ corresponds to the feature vector of a vesicle, $G$ is the number of components, $\tau_k$ is the probability that an observation belongs to the $k$th component ($\sum_{k=1}^{G} \tau_k = 1$), and $\Phi_k$ is a normal distribution with mean $\mu_k$ and covariance matrix $\Sigma_k$.
The number of components is determined based on BIC.
The symmetrized Kullback-Leibler divergence is used as a distance (dissimilarity) measure between two vesicle distributions.
The distance matrix is represented via hierarchical clustering.
In collaboration with Prof. P. Buehlmann and Dr. M. Kalisch from the SFS, ETH Zurich.
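A Kullback-Leibler divergence between two Gaussian mixtures has no closed form, so it has to be approximated. The sketch below uses a Monte Carlo estimate on one-dimensional mixtures to keep the code short. The talk works with 3 principal components and does not say how the divergence was evaluated, so the estimator, the choice of summing both directions, and all names are assumptions.

```python
import math
import random

def mixture_pdf(x, weights, means, sds):
    """Density f(x) = sum_k tau_k * Normal(x | mu_k, sd_k) of a 1-D Gaussian mixture."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, sds))

def mixture_sample(rng, weights, means, sds):
    """Draw one value: pick a component with probability tau_k, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])

def kl_monte_carlo(p, q, n=20000, seed=0):
    """Estimate KL(p || q) = E_p[log p(x) - log q(x)] by sampling from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = mixture_sample(rng, *p)
        total += math.log(mixture_pdf(x, *p)) - math.log(mixture_pdf(x, *q))
    return total / n

def symmetrized_kl(p, q, n=20000, seed=0):
    """Symmetrized divergence, here taken as KL(p || q) + KL(q || p)."""
    return kl_monte_carlo(p, q, n, seed) + kl_monte_carlo(q, p, n, seed + 1)

# Each mixture is (weights, means, standard deviations); values are hypothetical.
cell_a = ([0.5, 0.5], [0.0, 3.0], [1.0, 1.0])
cell_b = ([0.3, 0.7], [0.0, 4.0], [1.0, 1.5])
distance_ab = symmetrized_kl(cell_a, cell_b)
print(distance_ab)

# Sanity check on single Gaussians N(0, 1) and N(2, 1): the analytic value is 4.
check = symmetrized_kl(([1.0], [0.0], [1.0]), ([1.0], [2.0], [1.0]))
print(check)
```

Evaluating this quantity for every pair of cells yields the distance matrix that is passed on to hierarchical clustering.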
Case Study: Vesicle Classification
Based on the Bayesian Information Criterion (BIC), 5 vesicle classes are assumed; vesicles are classified according to these 5 groups.
Case Study: Principal Component Analysis
Coordinate projection of vesicles and projected covariance matrices of vesicle groups.
Case Study: Principal Component Analysis
Biplot: original features of all vesicles are projected onto the first 2 principal components.
Case Study: Vesicle Groups
All vesicles are marked according to the vesicle group they belong to.
Case Study: Clustering Tree
Hierarchical clustering tree: cells with similar patterns (and biological perturbation) group together.
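How a distance matrix between cells turns into a clustering tree can be shown with a small agglomerative clustering sketch. The talk does not state which linkage was used; average linkage is assumed here, and the cell labels and distances are made up for illustration.

```python
def average_linkage(dist, labels):
    """Agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two clusters with the smallest average pairwise
    distance and returns the merge history as (left, right, distance) tuples.
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for a_pos, a in enumerate(keys):
            for b in keys[a_pos + 1:]:
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]],
                       [labels[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical dissimilarities between four cells.
labels = ["cell_A", "cell_B", "cell_C", "cell_D"]
dist = [[0, 1, 8, 9],
        [1, 0, 7, 10],
        [8, 7, 0, 2],
        [9, 10, 2, 0]]
merges = average_linkage(dist, labels)
for left, right, height in merges:
    print(left, right, height)
```

The merge heights are what a dendrogram draws on its vertical axis: cells that merge at a low height have similar vesicle distributions.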
Case Study: Evaluation of Kinome Screens
Aim: find, in an unsupervised manner, well-distinguishable patterns and the corresponding proteins, and characterize these patterns: functional modules of proteins.
Ultimate reduction of data sets: distance matrix.
Phylogenetic tree of the most distant wells within a kinome screen.
Distributed Computing: Large Dataset Handling
Principal component analysis of a data matrix of 3 GB size.
Solution: calculate the covariance matrix with a Fortran program; eigenvectors and principal components are then calculated in R.
Question: how flexible are R functions for large datasets?
2500 data files need to be processed by the mclust package; processing time per file: minutes (at least 3 weeks in total).
Solution: Condor. Distributing the jobs among 60 to 70 computers gives less than 2 days of processing time!
Condor is an open-source software framework for distributed parallelization of computationally intensive tasks (runs under Linux, Mac and Windows). It can be used to manage workload on a dedicated cluster of computers or on non-dedicated desktop machines.
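The workaround for the 3 GB matrix (compute the covariance matrix outside R, then do the eigen-decomposition on the small covariance matrix) can be sketched as follows. Plain Python stands in for the Fortran program, power iteration stands in for a full eigen-decomposition, and the data arrive as chunks of rows. All names and the toy data are assumptions.

```python
import math

def streaming_covariance(chunks, n_features):
    """Accumulate sums and cross-products chunk by chunk.

    Only an n_features x n_features accumulator is kept in memory, never the
    full data matrix. Note: this textbook formula can lose precision when the
    means are large relative to the spread of the data.
    """
    n = 0
    sums = [0.0] * n_features
    cross = [[0.0] * n_features for _ in range(n_features)]
    for chunk in chunks:
        for row in chunk:
            n += 1
            for a in range(n_features):
                sums[a] += row[a]
                for b in range(a, n_features):
                    cross[a][b] += row[a] * row[b]
    means = [s / n for s in sums]
    cov = [[0.0] * n_features for _ in range(n_features)]
    for a in range(n_features):
        for b in range(a, n_features):
            value = (cross[a][b] - n * means[a] * means[b]) / (n - 1)
            cov[a][b] = cov[b][a] = value
    return means, cov

def leading_eigenvector(cov, iterations=200):
    """Power iteration for the first principal axis of a covariance matrix."""
    size = len(cov)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(size)) for a in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(size))
                     for a in range(size))
    return eigenvalue, v

# Toy data in two chunks; every row lies on the line y = 2x.
chunks = [[[1.0, 2.0], [2.0, 4.0]], [[3.0, 6.0], [4.0, 8.0]]]
means, cov = streaming_covariance(chunks, n_features=2)
eigenvalue, axis = leading_eigenvector(cov)
print(means, eigenvalue, axis)
```

With 11 features the covariance matrix has only 11 x 11 entries, so the second step is cheap no matter how many vesicles were streamed through the first.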
Distributed Computing: How to Submit a Condor Job
A job, described in the description file Genname_descr.txt, is submitted for execution under Condor by:
    condor_submit Genname_descr.txt
Organisation of the description file Genname_descr.txt:
    getenv = True
    when_to_transfer_output = ON_EXIT_OR_EVICT
    notification = never
    universe = vanilla
    Executable = condor.py
    log = condorlogs/genname.log
    output = condorlogs/genname.out
    error = condorlogs/genname.error
    transfer_input_files = Genname.mat,mcl.R
    arguments = Genname.mat
    queue
Contents of the executable file condor.py:
    #!/usr/bin/env python
    import os
    import sys

    # The data file name is passed in by Condor via "arguments".
    condorfile = sys.argv[1]
    # Hand the file name to the R script as the variable "filemat".
    command = "R CMD BATCH --no-save --no-restore '--args filemat=\"" + condorfile + "\"' mcl.R"
    os.system(command)
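With 2500 data files, one description file per data file is needed. A minimal sketch of generating them is shown below. The keys mirror the description file above, written with the underscores that HTCondor submit files use; the function name, the gene names and the output directory are hypothetical.

```python
import tempfile
from pathlib import Path

# Template for one job; {name} is replaced by the base name of the data file.
TEMPLATE = """\
universe = vanilla
executable = condor.py
getenv = True
notification = never
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = {name}.mat,mcl.R
arguments = {name}.mat
log = condorlogs/{name}.log
output = condorlogs/{name}.out
error = condorlogs/{name}.error
queue
"""

def write_description_files(names, out_dir):
    """Write one <name>_descr.txt per data file and return the created paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        path = out_dir / f"{name}_descr.txt"
        path.write_text(TEMPLATE.format(name=name))
        paths.append(path)
    return paths

# Example with two made-up gene names, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_description_files(["GeneA", "GeneB"], tmp)
    first = paths[0].read_text()
    print(first)
```

Each generated file would then be submitted with condor_submit, for example from a shell loop over the files.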
Outlook: Pattern Matching
Michael Rissi's ideas about a vectorised pattern matching algorithm using NVidia's GPU programming tool CUDA.
GPU: vector processor with fast 64k shared memory access (shareable between threads).
CUDA: NVidia's programming tools for their 8xxx series graphics cards. Standard C with some extensions for low-level programming, as well as an API for high-level programming (e.g. math libraries, FFT). Announced: double precision.
General idea: produce a database of vesicle patterns within a cell. Compare cells from a dataset with these patterns. This is a trivially vectorizable problem, so a huge acceleration is possible on GPUs (estimate: ).
Contact: rissim@particle.phys.ethz.ch
Thanks for your attention!
in biomedicine and epidemiology Katja Ickstadt and Leo N. Geppert Faculty of Statistics, TU Dortmund, Germany Stakeholder Workshop 12 13 December 2017, PTB Berlin Supported by Deutsche Forschungsgemeinschaft
More informationImproving Satellite Data Utilization Through Deep Learning
September 26th, 2018 Improving Satellite Data Utilization Through Deep Learning Jebb Stewart *, Christina Bonfanti **, David M. Hall ***, Isidora Jankov *, Lidia Trailovic **, Stevan Maksimovic *, Mark
More informationBIOLIGHT STUDIO IN ROUTINE UV/VIS SPECTROSCOPY
BIOLIGHT STUDIO IN ROUTINE UV/VIS SPECTROSCOPY UV/Vis Spectroscopy is a technique that is widely used to characterize, identify and quantify chemical compounds in all fields of analytical chemistry. The
More informationScikit-learn. scikit. Machine learning for the small and the many Gaël Varoquaux. machine learning in Python
Scikit-learn Machine learning for the small and the many Gaël Varoquaux scikit machine learning in Python In this meeting, I represent low performance computing Scikit-learn Machine learning for the small
More informationSubcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network
Subcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network Marko Tscherepanow and Franz Kummert Applied Computer Science, Faculty of Technology, Bielefeld
More informationOECD QSAR Toolbox v.4.0. Tutorial on how to predict Skin sensitization potential taking into account alert performance
OECD QSAR Toolbox v.4.0 Tutorial on how to predict Skin sensitization potential taking into account alert performance Outlook Background Objectives Specific Aims Read across and analogue approach The exercise
More informationArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde
ArcGIS Enterprise: What s New Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde ArcGIS Enterprise is the new name for ArcGIS for Server ArcGIS Enterprise Software Components ArcGIS Server Portal
More informationClassification Techniques with Applications in Remote Sensing
Classification Techniques with Applications in Remote Sensing Hunter Glanz California Polytechnic State University San Luis Obispo November 1, 2017 Glanz Land Cover Classification November 1, 2017 1 /
More informationAutomated Analysis of the Mitotic Phases of Human Cells in 3D Fluorescence Microscopy Image Sequences
Automated Analysis of the Mitotic Phases of Human Cells in 3D Fluorescence Microscopy Image Sequences Nathalie Harder 1, Felipe Mora-Bermúdez 2, William J. Godinez 1, Jan Ellenberg 2, Roland Eils 1, and
More informationCSD. Unlock value from crystal structure information in the CSD
CSD CSD-System Unlock value from crystal structure information in the CSD The Cambridge Structural Database (CSD) is the world s most comprehensive and up-todate knowledge base of crystal structure data,
More informationSmartDairy Catalog HerdMetrix Herd Management Software
SmartDairy Catalog HerdMetrix Herd Management Quality Milk Through Technology Sort Gate Hoof Care Feeding Station ISO RFID SmartControl Meter TouchPoint System Management ViewPoint Catalog March 2011 Quality
More informationHomology and Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The
More informationAnalysis of Software Artifacts
Analysis of Software Artifacts System Performance I Shu-Ngai Yeung (with edits by Jeannette Wing) Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 2001 by Carnegie Mellon University
More informationFundamentals of Computational Science
Fundamentals of Computational Science Dr. Hyrum D. Carroll August 23, 2016 Introductions Each student: Name Undergraduate school & major Masters & major Previous research (if any) Why Computational Science
More informationCSC 411 Lecture 12: Principal Component Analysis
CSC 411 Lecture 12: Principal Component Analysis Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 12-PCA 1 / 23 Overview Today we ll cover the first unsupervised
More informationWord-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator
Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical & Electronic
More informationKarhunen-Loève Transform KLT. JanKees van der Poel D.Sc. Student, Mechanical Engineering
Karhunen-Loève Transform KLT JanKees van der Poel D.Sc. Student, Mechanical Engineering Karhunen-Loève Transform Has many names cited in literature: Karhunen-Loève Transform (KLT); Karhunen-Loève Decomposition
More informationOECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance
OECD QSAR Toolbox v.4.1 Tutorial on how to predict Skin sensitization potential taking into account alert performance Outlook Background Objectives Specific Aims Read across and analogue approach The exercise
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.
More information10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationNuclear Data Uncertainty Analysis in Criticality Safety. Oliver Buss, Axel Hoefer, Jens-Christian Neuber AREVA NP GmbH, PEPA-G (Offenbach, Germany)
NUDUNA Nuclear Data Uncertainty Analysis in Criticality Safety Oliver Buss, Axel Hoefer, Jens-Christian Neuber AREVA NP GmbH, PEPA-G (Offenbach, Germany) Workshop on Nuclear Data and Uncertainty Quantification
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationBMD645. Integration of Omics
BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study
More informationAn IDL Based Image Deconvolution Software Package
An IDL Based Image Deconvolution Software Package F. Városi and W. B. Landsman Hughes STX Co., Code 685, NASA/GSFC, Greenbelt, MD 20771 Abstract. Using the Interactive Data Language (IDL), we have implemented
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationDATA ANALYTICS IN NANOMATERIALS DISCOVERY
DATA ANALYTICS IN NANOMATERIALS DISCOVERY Michael Fernandez OCE-Postdoctoral Fellow September 2016 www.data61.csiro.au Materials Discovery Process Materials Genome Project Integrating computational methods
More informationHybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS
Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures
More informationCPU Scheduling. CPU Scheduler
CPU Scheduling These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of each
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationPrincipal Component Analysis, A Powerful Scoring Technique
Principal Component Analysis, A Powerful Scoring Technique George C. J. Fernandez, University of Nevada - Reno, Reno NV 89557 ABSTRACT Data mining is a collection of analytical techniques to uncover new
More informationTowards Automatic Nanomanipulation at the Atomic Scale
Towards Automatic Nanomanipulation at the Atomic Scale Bernd Schütz Department of Computer Science University of Hamburg, Germany Department of Computer Science Outline Introduction System Overview Workpackages
More information