Marginal Balance of Spread Designs
1 Marginal Balance of Spread Designs for High Dimensional Binary Data. Joe Verducci, Ohio State; Mike Fligner, Ohio State; Paul Blower, Leadscope.
2 Motivation. Database: M x N array of 0-1 bits, where M = number of compounds and N = number of structural features. Objective: select a subset of m compounds for further testing, so that we can learn which combinations of structural features are associated with biological response.
3 Complication. Testing for biological activity typically involves in vitro (or in silico) bioassays. Efficient designs use compounds containing about ½ of all features. For in vivo testing (e.g., gene expression in mice), compounds should contain close to a target proportion ($p_0 < 1/2$) of features, because large compounds tend to be broken down.
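4 Notation. $x$ = original binary string; $z$ = expanded binary string. $V = x x^T$, $v$ = V[lower.tri(V)], $z^T = [x^T, v^T]$. Design matrices: $X = \begin{bmatrix} x_1^T \\ \vdots \\ x_m^T \end{bmatrix}$, $Z = \begin{bmatrix} z_1^T \\ \vdots \\ z_m^T \end{bmatrix}$.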
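A minimal sketch of the expansion from x to z, assuming that V[lower.tri(V)] follows the R convention (strictly below the diagonal, read column by column). The function name `expand` is hypothetical and used only for illustration.

```python
def expand(x):
    """Return z = [x, v], where v holds the strictly lower-triangular
    entries of V = x x^T, i.e. the products x_j * x_k for j > k."""
    n = len(x)
    # Column index k is the outer loop, matching R's column-major order.
    v = [x[j] * x[k] for k in range(n) for j in range(k + 1, n)]
    return list(x) + v


x = [1, 0, 1, 1]
z = expand(x)
print(z)  # [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]
```

A string with k bits on yields k + k(k-1)/2 = k(k+1)/2 bits on in z: each feature contributes one bit, and each pair of features present contributes one interaction bit.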
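5 Modulated Response Model for In Vivo Activity. $Y = \alpha(x)\, z^T \beta + \varepsilon$, where $\alpha(x) = \frac{1}{\sqrt{2\pi}\, h} \exp\left( -\frac{\left(\sum_j x_j - N p_0\right)^2}{2h^2} \right)$ and $\varepsilon \sim N(0,1)$.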
6 Information Matrix. $I(\beta) = \sum_{i=1}^{m} \alpha(x_i)^2\, z_i z_i^T = Z^T A_X^2 Z$, where $A_X = \mathrm{diag}(\alpha(x_1), \dots, \alpha(x_m))$.
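As a rough illustration, the information matrix can be accumulated compound by compound. The sketch assumes a Gaussian-shaped modulation $\alpha(x) = \frac{1}{\sqrt{2\pi} h} \exp(-(\sum_j x_j - N p_0)^2 / (2h^2))$ and the expansion of x into z by pairwise products. The function names and the toy design are hypothetical.

```python
import math


def expand(x):
    """z = [x, pairwise products x_j * x_k for j > k]."""
    n = len(x)
    return list(x) + [x[j] * x[k] for k in range(n) for j in range(k + 1, n)]


def alpha(x, p0, h):
    """Assumed modulation: Gaussian in the number of features present."""
    dev = sum(x) - len(x) * p0
    return math.exp(-dev ** 2 / (2 * h ** 2)) / (math.sqrt(2 * math.pi) * h)


def information_matrix(design, p0, h):
    """Sum over compounds of alpha(x_i)^2 * z_i z_i^T, as nested lists."""
    zs = [expand(x) for x in design]
    q = len(zs[0])
    info = [[0.0] * q for _ in range(q)]
    for x, z in zip(design, zs):
        w = alpha(x, p0, h) ** 2
        for r in range(q):
            for c in range(q):
                info[r][c] += w * z[r] * z[c]
    return info


# Toy design: 3 compounds, 4 features (illustrative values only).
design = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
info = information_matrix(design, p0=0.5, h=1.0)
trace = sum(info[r][r] for r in range(len(info)))
print(len(info), trace)
```

Because z is binary, each compound contributes $\alpha(x_i)^2$ times the number of bits on in $z_i$ to the trace.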
7 General Optimality. Size (magnitude of eigenvalues): maximize $\mathrm{tr}[I(\beta)]$. Balance: equality of eigenvalues. Both (for designs of full rank): minimize $\mathrm{tr}[I(\beta)^{-1}]$; maximize $\det[I(\beta)]$.
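8 Total Information. $\mathrm{tr}[I(\beta)] = \sum_{i=1}^{m} \alpha(x_i)^2\, z_i^T z_i \propto \sum_{i=1}^{m} p_i (N p_i + 1) \exp\left( -\frac{(p_i - p_0)^2}{(h/N)^2} \right)$, where $p_i$ is the proportion of features present in compound $i$. This is maximized when each $p_i = p_1 > p_0$.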
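A small numerical check of the claim that the maximizing proportion exceeds the target. The sketch assumes the per-compound information is $\alpha^2$ times the number of bits on in z, that is $k(k+1)/2$ for a compound with $k = Np$ features, with a Gaussian-shaped $\alpha$. The values N = 186, $p_0$ = 0.12 and h = N/10 are taken from elsewhere in the slides purely for illustration, and the grid search is an arbitrary choice.

```python
import math


def info_per_compound(p, p0, h, n):
    """alpha^2 * z^T z for a compound with proportion p of n features on."""
    k = n * p                                  # number of features present
    ones_in_z = k * (k + 1) / 2                # k main bits + k(k-1)/2 pair bits
    alpha_sq = math.exp(-((k - n * p0) ** 2) / h ** 2) / (2 * math.pi * h ** 2)
    return alpha_sq * ones_in_z


n, p0 = 186, 0.12
h = n / 10
grid = [i / 1000 for i in range(1, 1000)]      # candidate proportions
p1 = max(grid, key=lambda p: info_per_compound(p, p0, h, n))
print(p1, p1 > p0)
```

The growing factor $k(k+1)/2$ pulls the maximum to the right of the peak of $\alpha$, which is why $p_1 > p_0$ under these assumptions.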
9 Information from Each Compound. [Figure: information per compound versus proportion of features, for target proportion $p_0 = 0.12$; curves for h = n/10 and h = n/…]
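10 Marginal Balance. Two-way margins of features, with $p_1$ = optimal proportion of compounds per feature. Target 2 x 2 table for any pair of features (rows: first feature 0, 1; columns: second feature 0, 1): $\begin{bmatrix} M(1-p_1)^2 & M p_1 (1-p_1) \\ M p_1 (1-p_1) & M p_1^2 \end{bmatrix}$.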
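One way to check a design against this target is to tabulate each pair of features and compare the counts with the target table. The slides do not spell out the balance criterion, so the `imbalance` score below (mean squared deviation over all feature pairs) is a hypothetical stand-in, and `m` denotes the number of compounds in the design being checked.

```python
from itertools import combinations


def two_way_table(design, j, k):
    """table[a][b] = number of compounds with feature j == a and feature k == b."""
    table = [[0, 0], [0, 0]]
    for x in design:
        table[x[j]][x[k]] += 1
    return table


def target_table(m, p1):
    """Target counts when both features occur independently at rate p1."""
    q = 1 - p1
    return [[m * q * q, m * p1 * q], [m * p1 * q, m * p1 * p1]]


def imbalance(design, p1):
    """Hypothetical score: mean squared deviation from the target table,
    averaged over all pairs of features (0 means perfectly balanced)."""
    m, n = len(design), len(design[0])
    target = target_table(m, p1)
    total, pairs = 0.0, 0
    for j, k in combinations(range(n), 2):
        obs = two_way_table(design, j, k)
        total += sum((obs[a][b] - target[a][b]) ** 2 for a in (0, 1) for b in (0, 1))
        pairs += 1
    return total / pairs


# A full 2 x 2 factorial is perfectly balanced at p1 = 0.5.
design = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(imbalance(design, 0.5))
```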
11 Insufficiency of Marginal Balance Criterion. Need an additional criterion to spread 1s among different compounds; avoid near duplication of compounds.
12 Example: 8 x 20 Design. $p_1$ = [value missing]. Spread design contains 76 different pairs of features; full 20 x 20 cyclic design contains 105 different pairs of features.
13 Spread Design. Select a subset S of fixed size m so as to maximize the minimum distance between points in S. Higgs Algorithm: choose points sequentially; at each step, maximize the minimum distance to the already selected points. This leads to a near-optimal solution. The choice of distance greatly affects the resulting design.
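The sequential rule described on this slide can be sketched as a greedy maximin selection. This is an illustrative sketch, not the authors' implementation: the function names, the default Hamming distance, the tie-breaking (first candidate wins) and the toy data are all assumptions.

```python
def hamming(a, b):
    """Number of positions where two bit strings differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def greedy_spread(points, m, seed=0, dist=hamming):
    """Pick m point indices: start at `seed`, then repeatedly add the point
    whose distance to its nearest already-selected point is largest."""
    selected = [seed]
    # min_dist[i] = distance from point i to the closest selected point
    min_dist = [dist(p, points[seed]) for p in points]
    while len(selected) < m:
        chosen = set(selected)
        candidates = (i for i in range(len(points)) if i not in chosen)
        best = max(candidates, key=lambda i: min_dist[i])
        selected.append(best)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], dist(p, points[best]))
    return selected


points = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
print(greedy_spread(points, m=3))  # [0, 2, 3]
```

Passing a different `dist` function changes which compounds are picked, which is the point of the slide's closing remark about the choice of distance.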
14 XOR (Hamming Distance). Only accounts for bits that don't match. For bit strings A and B: $d^2_{XOR} = \sum_k (A_k \,\mathrm{XOR}\, B_k)$. Larger structures have more bits that don't match each other. Diversity result: tends to favor larger structures with a lot of features.
15 Tanimoto Coefficient. a = # bits on in A; b = # bits on in B; c = # bits on in both A and B; d = # bits off in both A and B. Tanimoto coefficient: $T_1 = \frac{c}{a + b - c}$; measures similarity using on bits. Tanimoto coefficient complement: $T_0 = \frac{d}{a + b - 2c + d}$; measures similarity using off bits.
16 MT = Modified Tanimoto. Measure similarity based on both the presence (on bits) and absence (off bits) of features: $MT = \alpha T_1 + (1 - \alpha) T_0$, where $\alpha = \frac{2 - p}{3}$ and $p = \frac{a + b}{2n}$. When there are fewer on bits, $T_1$ is weighted more heavily. When there are fewer off bits, $T_0$ is weighted more heavily. As a variation, p may be fixed by external considerations. The result is called the P-Modified Tanimoto distance.
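A short sketch of the three similarity measures, assuming $T_1 = c/(a+b-c)$, $T_0 = d/(a+b-2c+d)$, $\alpha = (2-p)/3$ and $p = (a+b)/(2n)$. The function names are hypothetical, and returning 1.0 when a denominator is zero (two all-off or two all-on strings) is an assumption made only to keep the sketch total.

```python
def bit_counts(A, B):
    """Return (a, b, c, d): on in A, on in B, on in both, off in both."""
    a, b = sum(A), sum(B)
    c = sum(1 for x, y in zip(A, B) if x == 1 and y == 1)
    d = sum(1 for x, y in zip(A, B) if x == 0 and y == 0)
    return a, b, c, d


def tanimoto(A, B):
    a, b, c, _ = bit_counts(A, B)
    denom = a + b - c                  # bits on in at least one string
    return c / denom if denom else 1.0


def tanimoto_complement(A, B):
    a, b, c, d = bit_counts(A, B)
    denom = a + b - 2 * c + d          # bits off in at least one string
    return d / denom if denom else 1.0


def modified_tanimoto(A, B, p=None):
    """Weighted mix of T1 and T0; pass p to fix it externally (P-modified)."""
    a, b, _, _ = bit_counts(A, B)
    if p is None:
        p = (a + b) / (2 * len(A))     # average proportion of on bits
    alpha = (2 - p) / 3
    return alpha * tanimoto(A, B) + (1 - alpha) * tanimoto_complement(A, B)


A = [1, 1, 0, 0, 0, 0]
B = [1, 0, 1, 0, 0, 0]
print(tanimoto(A, B), tanimoto_complement(A, B), modified_tanimoto(A, B))
```

For this pair, a = b = 2, c = 1 and d = 3, so $T_1 = 1/3$, $T_0 = 3/5$, $p = 1/3$, $\alpha = 5/9$, and MT = 61/135. A spread design needs a distance, for which 1 - MT is one natural choice.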
17 Implementing Spread Designs. Maximin vs. average distance; Higgs algorithm; stochastic searches; near-optimal solutions.
18 Medicinal Drug Database. 186 Leadscope features. Prevalence: range [missing]; median [missing]; mean [missing]. Drugs now on the market. Range: 5-70 distinct features per compound. Median: 24 (12.8%) features per compound. Mean: 26.4 (14.2%) features per compound.
19 Procedure. Use the Higgs algorithm. Apply it with 4 different metrics. Use each of the 1089 compounds as the initial seed. Pick the best (maximin distance) 150 designs for each metric. Evaluate the balance criterion for all designs. Summarize.
20 Average Number of Distinct Features of Sampled Compounds (population median 24 features/cmpd). [Table: rows by sample size; columns by distance: Hamming, Tanimoto, Mod. Tan., P-Mod. Tan. (P = …); values missing.]
21 Balances of Best Spread Design (of size 20) for Each Distance. [Figure: balance criterion versus $p_1$ for Tanimoto, modified Tanimoto, p-modified Tanimoto, and Hamming.]
22 Balances with $p_1 = 0.14$ for Size 20. [Figure: uniform balances for the best 150 samples (sample size missing), grouped by ham, tan, mod tan, and random.]
23 Balances with $p_1 = 0.20$ for Size 20. [Figure: uniform balances for the best 150 samples (sample size missing), grouped by ham, tan, mod tan, and random.]
24 Balance Results for Medicinals. Hamming distance gives the worst balance whenever $p_1 < 0.25$. Random selection tends to be very erratic. Both Tanimoto and Modified Tanimoto produce good balance for $p_1$ near the database median. Modified Tanimoto produces better balance than Tanimoto does for $p_1$ larger than the database median.
25 Conclusion. Using Modified Tanimoto in a spread design tends to produce the best balance over a wide operating range of compound sizes.
26 Recap. Modulated Response Model: includes all two-way interaction terms; peak response for compounds of size $p_0$; response falls off outside a window of width h. Information: total information is maximized at $p_1 > p_0$; balance is defined in terms of 2-way margins. Spread Designs: Higgs algorithm; metrics: Hamming picks large compounds, Tanimoto picks small compounds, Modified Tanimoto picks medium compounds. Medicinal Database Example.