Graph Theoretic Latent Class Discovery
1 Graph Theoretic Latent Class Discovery Jeff Solka NSWCDD/GMU GMU BINF Colloquium 2/24/04 p.1/28
2 Agenda
- What is latent class discovery?
- What are some approaches to the latent class discovery process?
- The class cover catch digraph (CCCD) classifier.
- Latent class discovery results on a gene expression data set.
- Wrap-up and conclusions.
3 Acknowledgments
- John Grefenstette
- Office of Naval Research, through their ILIR Program, for funding this effort
4 What is Latent Class Discovery?
- A latent class is a class of observations that resides undiscovered within a known class of observations.
- Develop a general methodology for the discernment of latent class structure during discriminant analysis.
- Moderately large hyperdimensional data sets.
- During training or testing.
- Explore applications of the developed methodologies to the analysis of data sets in the areas of hyperdimensional image analysis, artificial olfactory systems, computer security data, gene expression data, and text data mining.
5 Flow Chart
[Figure: flow chart with boxes labeled hyperdimensional data, multidimensional scaling, nonlinear dimensionality reduction, metric space adaptation, graph theoretic discriminant analysis, latent classes, and insights]
6 Dominating Set
[Figure: two-class data and covering discs; dominating set]
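The covering-disc and dominating-set idea on this slide can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes Euclidean distance, open discs whose radius is the distance to the nearest point of the other class, and a simple greedy selection rule. The function names `cccd_radii`, `cccd_digraph`, and `greedy_dominating_set` are hypothetical.

```python
import math

def cccd_radii(target, other):
    """Radius of each target point's disc: distance to the nearest non-target point (assumed rule)."""
    return [min(math.dist(x, y) for y in other) for x in target]

def cccd_digraph(target, radii):
    """covers[i] = indices of target points strictly inside the disc centred at point i.

    A point always covers itself, so the greedy loop below terminates.
    """
    n = len(target)
    return [
        {i} | {j for j in range(n) if math.dist(target[i], target[j]) < radii[i]}
        for i in range(n)
    ]

def greedy_dominating_set(covers):
    """Repeatedly pick the vertex whose disc covers the most still-uncovered points."""
    uncovered = set(range(len(covers)))
    chosen = []
    while uncovered:
        best = max(range(len(covers)), key=lambda i: len(covers[i] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

# Toy one-dimensional example written as 2-D points (illustrative data only).
target = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
other = [(3.0, 0.0)]
radii = cccd_radii(target, other)
dominating = greedy_dominating_set(cccd_digraph(target, radii))
```

The greedy rule gives one approximate dominating set; ties are broken by index order, so other equally small dominating sets may exist.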
7 A Brief Movie
8 CCCD-Based Latent Class Discovery
9 Quadratic Classifier-Based Latent Class Discovery
10 ALL/AML Leukemia Gene Expression Analysis
- 72 patients, 7129 genes
- Apply CCCD to ALL observations
- Cluster CCCD solution based on radii
- Ascertain significance of latent class structure
- Examine clusters for latent class structure
[Figure legend: AML, ALL B cell, ALL T cell]
11 Resubstitution Error Rate Estimate
- An empirical risk (resubstitution error rate estimate) is calculated for each cluster map dimension.
[Equation not recoverable from the transcription]
12 Classification Dimension
We proceed by defining the scale dimension to be the cluster map dimension that minimizes a dimensionality-penalized empirical risk: $\hat{d} = \arg\min_{d} \left( \hat{L}_d + \delta\, d \right)$, where $\hat{L}_d$ is the empirical risk at cluster map dimension $d$, for some penalty coefficient $\delta$.
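As a rough sketch of this selection rule, the snippet below picks the dimension that minimizes a penalized empirical risk. The exact penalty form did not survive the transcription, so a linear penalty (penalty coefficient times dimension) is assumed here, and ties go to the smallest dimension. The function name `classification_dimension` and the risk values are hypothetical.

```python
def classification_dimension(risks, penalty):
    """Return the dimension k (1-based) minimizing risks[k-1] + penalty * k.

    risks[k-1] is the empirical risk at cluster map dimension k.
    The linear penalty is an assumption, not taken from the slides.
    """
    penalized = [risk + penalty * k for k, risk in enumerate(risks, start=1)]
    # list.index returns the first match, i.e. the smallest minimizing dimension.
    return penalized.index(min(penalized)) + 1

# Illustrative risk curve (made-up numbers, not results from the talk).
risks = [0.30, 0.10, 0.08, 0.08]
with_penalty = classification_dimension(risks, penalty=0.05)
without_penalty = classification_dimension(risks, penalty=0.0)
```

With no penalty the rule simply picks the first dimension at which the risk bottoms out; a positive penalty pushes the choice toward lower dimensions.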
13 ALL/AML Classification Dimension Plot
14 Gene Latent Class Discovery
15 ALL/AML MDS Plot
16 How Robust is the Methodology?
- One other success story using artificial nose data.
- What if we had used another dominating set in our analysis?
- Is the discovered latent class structure independent of the dominating set used?
17 An Exhaustive Enumeration of All Possible Dominating Sets for the Gene Data
- [...] node solutions
- 16 of the nodes remain fixed across the solutions
- 14 greedy solutions
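One way to picture such an enumeration is the brute-force sketch below. It assumes that "all possible dominating sets" means all dominating sets of minimum size, which the slide does not state explicitly, and it is only practical for small digraphs because the search is exponential. The names `minimum_dominating_sets` and `fixed_vertices` are hypothetical.

```python
from itertools import combinations

def minimum_dominating_sets(covers):
    """Enumerate every smallest vertex subset whose covered sets union to all vertices.

    covers[i] is the set of vertices dominated by vertex i (including i itself).
    Brute force: tries subset sizes 1, 2, ... and stops at the first size that works.
    """
    n = len(covers)
    everything = set(range(n))
    for size in range(1, n + 1):
        found = [
            combo
            for combo in combinations(range(n), size)
            if set().union(*(covers[i] for i in combo)) == everything
        ]
        if found:
            return found
    return []

def fixed_vertices(solutions):
    """Vertices that appear in every solution."""
    if not solutions:
        return set()
    return set.intersection(*(set(s) for s in solutions))

# Toy digraph: vertices 0 and 1 cover each other, vertex 2 covers only itself.
covers = [{0, 1}, {0, 1}, {2}]
solutions = minimum_dominating_sets(covers)
fixed = fixed_vertices(solutions)
```

In the toy digraph, vertex 2 is forced into every solution while vertices 0 and 1 are interchangeable, which mirrors the slide's distinction between nodes that stay fixed and nodes that vary across solutions.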
18 Classification Space Curves for the 180 Solutions
19 Classification Dimension for the 180 Solutions (red o: greedy solutions; green *: previous solution)
20 Classification Dimension for the 180 Solutions
21 Number of Dominating Sets for Each Vertex
[Figure: number of dominating sets for each vertex; axis labels: # dominating sets, in-degree, vertex; legend: T cell, B cell]
22 Digraph Analysis
23 Latent Class Discovery Figures of Merit
- How can we be assured that all of the greedy dominating set solutions discover the same latent classes?
- The previous greedy solution had 3 clusters that are pure B and 1 cluster that contained 8/9 of the T observations.
- Figures of merit: the percentage of B points that are in pure B clusters, and the highest percentage of T points in any one cluster.
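The two figures of merit described on this slide can be computed as in the following sketch. It assumes each observation carries a cluster id and a label of "B" or "T"; the function name `purity_figures` and the toy data are hypothetical, while `bpercent` and `tpercent` follow the axis labels of the purity plot.

```python
def purity_figures(cluster_ids, labels):
    """Return (bpercent, tpercent) for parallel lists of cluster ids and 'B'/'T' labels.

    bpercent: percentage of B points that fall in clusters containing only B points.
    tpercent: highest percentage of all T points found in any single cluster.
    Assumes at least one B point and at least one T point.
    """
    clusters = {}
    for cid, label in zip(cluster_ids, labels):
        clusters.setdefault(cid, []).append(label)
    n_b = labels.count("B")
    n_t = labels.count("T")
    b_in_pure = sum(
        len(members) for members in clusters.values()
        if all(label == "B" for label in members)
    )
    t_max = max(members.count("T") for members in clusters.values())
    return 100.0 * b_in_pure / n_b, 100.0 * t_max / n_t

# Toy clustering (illustrative only): cluster 0 is pure B, clusters 1 and 2 are mixed.
cluster_ids = [0, 0, 1, 1, 2, 2, 2]
labels = ["B", "B", "B", "T", "T", "T", "B"]
bpercent, tpercent = purity_figures(cluster_ids, labels)
```

High values of both numbers together indicate that the B cell observations separate cleanly and the T cell observations concentrate in one cluster.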
24 Purity (Latent Class Discovery) for the Golub Gene Data
[Figure: axis labels tpercent and bpercent; red triangles are the greedy solutions]
25 Remaining Questions
- Demonstrated similar latent class discovery among all of the greedy dominating set solutions.
- Many of the 7129 variates (genes) are superfluous to the discriminant analysis problem.
- Work is ongoing to examine the discovered latent classes based on subsets of the genes.
- Various figures of merit have been used to choose the subsets of the genes.
26 Conclusions
- Developed a new concept for latent class discovery during discriminant analysis.
- Illustrated one graph theoretic methodology for the discovery of the latent classes.
- Illustrated this methodology with a gene expression data set.
- Presented some preliminary results examining the robustness of the discovery process to the CCCD process.
27 Readings
- C. E. Priebe, J. L. Solka, D. J. Marchette, and B. T. Clark, 2003, Class Cover Catch Digraphs for Latent Class Discovery in Gene Expression Monitoring by DNA Microarrays, Computational Statistics and Data Analysis on Statistical, Vol. 43, pp.
- J. L. Solka, C. E. Priebe, and B. T. Clark, 2002, A Visualization Framework for the Analysis of Hyperdimensional Data, International Journal of Image and Graphics, Special Issue on Graphical Methods in Data Mining, pp.
- D. J. Marchette and C. E. Priebe, 2002, Characterizing the Scale Dimension of a High-Dimensional Classification Problem, Pattern Recognition, Vol. 36, pp.
28 Questions?