Efficient discovery of the top-k optimal dependency rules with Fisher s exact test of significance

Size: px
Start display at page:

Download "Efficient discovery of the top-k optimal dependency rules with Fisher s exact test of significance"

Transcription

1 Efficient discovery of the top-k optimal dependency rules with Fisher s exact test of significance Wilhelmiina Hämäläinen epartment of omputer Science University of Helsinki Finland whamalai@cs.helsinki.fi IM p./23

2 Problem Given a set of binary attributes R = {,..., k }. Search for the most significant positive and negative dependency rules X, where X R, R \ X! notations: = and = negative dependency between X and, if P(X) < P(X)P() P(X ) > P(X)P( ) pos. dependency between X and enough to search for positive dependencies for any consequence = a, R, a {, } IM p.2/23

3 Requirements for X = a sufficiently significant by Fisher s exact test: p F (X = a) = i ( m(x) m(x=a)+i ( ) ( n m(=a) m( X) m( X a)+i non-redundant (X contains no extra attributes which do not improve the dependency): Y X such that p F (X = a) p F (Y = a) No other restrictions (like minimum frequency thresholds)! ) ) IM p.3/23

4 Example ata R = {,,,} n = set freq est non-redundant rules rule p F e.g. redundant IM p.4/23

5 lgorithm: the main idea Generate an enumeration tree and keep record on possible consequences. onsequence i = a i is impossible in set X, if for all sets Q rule X \ { i }Q i = a i is insignificant or redundant. Possible consequences in node : pos neg IM p.5/23

6 lgorithm: the stopping criterion node (and its subtree) can be pruned out, when no possible consequences left. The algorithm stops, when no more nodes can be created. Note: fr= is not a sufficient condition for pruning out a node! However, no children are created for it. IM p.6/23

7 lgorithm: possible consequences Given node (set) X, its parents are Y i = X \ { i } for all i X.. Initialization: combine parents consequence vectors (bit-and). 2. Updating: estimate lower bounds for p F (X \ { i }Q i = a i ) and decide if i = a i impossible in node X. L, when only m( i = a i ) known; L2, when m( i = a i ) and m(x) known, i / X; L3, when m( i = a i ) and m(x) known, i X. IM p.7/23

8 lgorithm: possible consequences 3. Minimality condition: If P( i = a i Y i ) =., then i, i, and all j = a j, j / X, are impossible. 4. Lapis philosophorum principle: If i = a i is impossible in X, it is impossible in the parent node Y i = X \ { i } and all its children. efficient pruning! (hess, ccidents and Pumsb could not be handled without LP) IM p.8/23

9 Simulation (level ) Search for the best rules, when p max =.2 8. L: and are impossible consequences Otherwise, possible consequences are determined by L2 IM p.9/23

10 Simulation (level 2) ombine parents bitvectors IM p./23

11 Simulation (level 2) Rule minimal and become impossible consequences (more specific rules would be redundant) IM p./23

12 Simulation (level 2) LP: set as impossible in node IM p.2/23

13 Simulation (level 2) Rules and minimal, and impossible cons. Node is removed IM p.3/23

14 Simulation (level 2) Node removed LP: set as impossible in node and and as impossible in node IM p.4/23

15 Simulation (level 2) Node : no rules found IM p.5/23

16 Simulation (level 2) L3: impossible cons. Rule (not min.) IM p.6/23

17 Simulation (level 2) LP: impossible cons. also in node IM p.7/23

18 Simulation (level 2) and : no rules found IM p.8/23

19 Simulation (level 3) Rule minimal impossible consequence Node is removed LP IM p.9/23

20 Simulation (level 3) Rules and minimal Node is removed LP IM p.2/23

21 Results Implementation kingfisher the algorithm itself suits for all common goodness measures for statistical dependencies currently only p F and χ 2, but easy to add new measures very efficient without any extra restrictions (G memory suffices for Pumsb, ccidents, Retail, hess, etc.) p F is more efficient and produces more reliable results than χ 2! IM p.2/23

22 Thank you! IM p.22/23

23 Possible questions Why the rules are called dependency rules and not association rules? re there any previous solutions to the problem? Why the consequence vectors contain all attributes in all nodes? Wouldn t it be enough to store only those attributes, which can be added under a node? IM p.23/23

Efficient search for statistically significant dependency rules in binary data

Efficient search for statistically significant dependency rules in binary data Department of Computer Science Series of Publications A Report A-2010-2 Efficient search for statistically significant dependency rules in binary data Wilhelmiina Hämäläinen To be presented, with the permission

More information

IE418 Integer Programming

IE418 Integer Programming IE418: Integer Programming Department of Industrial and Systems Engineering Lehigh University 2nd February 2005 Boring Stuff Extra Linux Class: 8AM 11AM, Wednesday February 9. Room??? Accounts and Passwords

More information

where X is the feasible region, i.e., the set of the feasible solutions.

where X is the feasible region, i.e., the set of the feasible solutions. 3.5 Branch and Bound Consider a generic Discrete Optimization problem (P) z = max{c(x) : x X }, where X is the feasible region, i.e., the set of the feasible solutions. Branch and Bound is a general semi-enumerative

More information

Pulse characterization with Wavelet transforms combined with classification using binary arrays

Pulse characterization with Wavelet transforms combined with classification using binary arrays Pulse characterization with Wavelet transforms combined with classification using binary arrays Overview Wavelet Transformation Creating binary arrays out of and how to deal with them An estimator for

More information

Efficient search of association rules with Fisher s exact test

Efficient search of association rules with Fisher s exact test fficient search of association rules with Fisher s exact test W. Hämäläinen Abstract limination of spurious and redundant rules is an important problem in the association rule discovery. Statistical measure

More information

Integer Programming. Wolfram Wiesemann. December 6, 2007

Integer Programming. Wolfram Wiesemann. December 6, 2007 Integer Programming Wolfram Wiesemann December 6, 2007 Contents of this Lecture Revision: Mixed Integer Programming Problems Branch & Bound Algorithms: The Big Picture Solving MIP s: Complete Enumeration

More information

SF2930 Regression Analysis

SF2930 Regression Analysis SF2930 Regression Analysis Alexandre Chotard Tree-based regression and classication 20 February 2017 1 / 30 Idag Overview Regression trees Pruning Bagging, random forests 2 / 30 Today Overview Regression

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2

More information

STATISTICALLY SOUND PATTERN DISCOVERY

STATISTICALLY SOUND PATTERN DISCOVERY Tutorial KDD 14 New York STATISTICALLY SOUND PATTERN DISCOVERY Wilhelmiina Hämäläinen University of Eastern Finland whamalai@cs.uef.fi Geoff Webb Monash University Australia geoff.webb@monash.edu http://www.cs.joensuu.fi/pages/whamalai/kdd14/

More information

Amortized analysis. Amortized analysis

Amortized analysis. Amortized analysis In amortized analysis the goal is to bound the worst case time of a sequence of operations on a data-structure. If n operations take T (n) time (worst case), the amortized cost of an operation is T (n)/n.

More information

Guaranteeing the Accuracy of Association Rules by Statistical Significance

Guaranteeing the Accuracy of Association Rules by Statistical Significance Guaranteeing the Accuracy of Association Rules by Statistical Significance W. Hämäläinen Department of Computer Science, University of Helsinki, Finland Abstract. Association rules are a popular knowledge

More information

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009 Branch-and-Bound Algorithm November 3, 2009 Outline Lecture 23 Modeling aspect: Either-Or requirement Special ILPs: Totally unimodular matrices Branch-and-Bound Algorithm Underlying idea Terminology Formal

More information

Module 5 EMBEDDED WAVELET CODING. Version 2 ECE IIT, Kharagpur

Module 5 EMBEDDED WAVELET CODING. Version 2 ECE IIT, Kharagpur Module 5 EMBEDDED WAVELET CODING Lesson 13 Zerotree Approach. Instructional Objectives At the end of this lesson, the students should be able to: 1. Explain the principle of embedded coding. 2. Show the

More information

Decision Procedures An Algorithmic Point of View

Decision Procedures An Algorithmic Point of View An Algorithmic Point of View ILP References: Integer Programming / Laurence Wolsey Deciding ILPs with Branch & Bound Intro. To mathematical programming / Hillier, Lieberman Daniel Kroening and Ofer Strichman

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

The CPLEX Library: Mixed Integer Programming

The CPLEX Library: Mixed Integer Programming The CPLEX Library: Mixed Programming Ed Rothberg, ILOG, Inc. 1 The Diet Problem Revisited Nutritional values Bob considered the following foods: Food Serving Size Energy (kcal) Protein (g) Calcium (mg)

More information

Streaming Algorithms for Optimal Generation of Random Bits

Streaming Algorithms for Optimal Generation of Random Bits Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou Electrical Engineering Department California Institute of echnology Pasadena, CA 925 Email: hzhou@caltech.edu Jehoshua Bruck Electrical

More information

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan

More information

Extending the Associative Rule Chaining Architecture for Multiple Arity Rules

Extending the Associative Rule Chaining Architecture for Multiple Arity Rules Extending the Associative Rule Chaining Architecture for Multiple Arity Rules Nathan Burles, James Austin, and Simon O Keefe Advanced Computer Architectures Group Department of Computer Science University

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Decision Trees. CS 341 Lectures 8/9 Dan Sheldon

Decision Trees. CS 341 Lectures 8/9 Dan Sheldon Decision rees CS 341 Lectures 8/9 Dan Sheldon Review: Linear Methods Y! So far, we ve looked at linear methods! Linear regression! Fit a line/plane/hyperplane X 2 X 1! Logistic regression! Decision boundary

More information

k-protected VERTICES IN BINARY SEARCH TREES

k-protected VERTICES IN BINARY SEARCH TREES k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from

More information

Algorithm Analysis Divide and Conquer. Chung-Ang University, Jaesung Lee

Algorithm Analysis Divide and Conquer. Chung-Ang University, Jaesung Lee Algorithm Analysis Divide and Conquer Chung-Ang University, Jaesung Lee Introduction 2 Divide and Conquer Paradigm 3 Advantages of Divide and Conquer Solving Difficult Problems Algorithm Efficiency Parallelism

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

HW #4. (mostly by) Salim Sarımurat. 1) Insert 6 2) Insert 8 3) Insert 30. 4) Insert S a.

HW #4. (mostly by) Salim Sarımurat. 1) Insert 6 2) Insert 8 3) Insert 30. 4) Insert S a. HW #4 (mostly by) Salim Sarımurat 04.12.2009 S. 1. 1. a. 1) Insert 6 2) Insert 8 3) Insert 30 4) Insert 40 2 5) Insert 50 6) Insert 61 7) Insert 70 1. b. 1) Insert 12 2) Insert 29 3) Insert 30 4) Insert

More information

Multivalued Decision Diagrams. Postoptimality Analysis Using. J. N. Hooker. Tarik Hadzic. Cork Constraint Computation Centre

Multivalued Decision Diagrams. Postoptimality Analysis Using. J. N. Hooker. Tarik Hadzic. Cork Constraint Computation Centre Postoptimality Analysis Using Multivalued Decision Diagrams Tarik Hadzic Cork Constraint Computation Centre J. N. Hooker Carnegie Mellon University London School of Economics June 2008 Postoptimality Analysis

More information

State of the art Image Compression Techniques

State of the art Image Compression Techniques Chapter 4 State of the art Image Compression Techniques In this thesis we focus mainly on the adaption of state of the art wavelet based image compression techniques to programmable hardware. Thus, an

More information

Problem: Data base too big to fit memory Disk reads are slow. Example: 1,000,000 records on disk Binary search might take 20 disk reads

Problem: Data base too big to fit memory Disk reads are slow. Example: 1,000,000 records on disk Binary search might take 20 disk reads B Trees Problem: Data base too big to fit memory Disk reads are slow Example: 1,000,000 records on disk Binary search might take 20 disk reads Disk reads are done in blocks Example: One block read can

More information

AM 121: Intro to Optimization! Models and Methods! Fall 2018!

AM 121: Intro to Optimization! Models and Methods! Fall 2018! AM 121: Intro to Optimization Models and Methods Fall 2018 Lecture 13: Branch and Bound (I) Yiling Chen SEAS Example: max 5x 1 + 8x 2 s.t. x 1 + x 2 6 5x 1 + 9x 2 45 x 1, x 2 0, integer 1 x 2 6 5 x 1 +x

More information

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018 Data Mining CS57300 Purdue University Bruno Ribeiro February 8, 2018 Decision trees Why Trees? interpretable/intuitive, popular in medical applications because they mimic the way a doctor thinks model

More information

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna. onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

CS 395T Computational Learning Theory. Scribe: Mike Halcrow. x 4. x 2. x 6

CS 395T Computational Learning Theory. Scribe: Mike Halcrow. x 4. x 2. x 6 CS 395T Computational Learning Theory Lecture 3: September 0, 2007 Lecturer: Adam Klivans Scribe: Mike Halcrow 3. Decision List Recap In the last class, we determined that, when learning a t-decision list,

More information

C4.5 - pruning decision trees

C4.5 - pruning decision trees C4.5 - pruning decision trees Quiz 1 Quiz 1 Q: Is a tree with only pure leafs always the best classifier you can have? A: No. Quiz 1 Q: Is a tree with only pure leafs always the best classifier you can

More information

Efficient discovery of statistically significant association rules

Efficient discovery of statistically significant association rules fficient discovery of statistically significant association rules Wilhelmiina Hämäläinen Department of Computer Science University of Helsinki Finland whamalai@cs.helsinki.fi Matti Nykänen Department of

More information

CHAPTER-17. Decision Tree Induction

CHAPTER-17. Decision Tree Induction CHAPTER-17 Decision Tree Induction 17.1 Introduction 17.2 Attribute selection measure 17.3 Tree Pruning 17.4 Extracting Classification Rules from Decision Trees 17.5 Bayesian Classification 17.6 Bayes

More information

Digital search trees JASS

Digital search trees JASS Digital search trees Analysis of different digital trees with Rice s integrals. JASS Nicolai v. Hoyningen-Huene 28.3.2004 28.3.2004 JASS 04 - Digital search trees 1 content Tree Digital search tree: Definition

More information

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Decision Trees CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Classification without Models Well, partially without a model } Today: Decision Trees 2015 Bruno Ribeiro 2 3 Why Trees? } interpretable/intuitive,

More information

Optimal codes - I. A code is optimal if it has the shortest codeword length L. i i. This can be seen as an optimization problem. min.

Optimal codes - I. A code is optimal if it has the shortest codeword length L. i i. This can be seen as an optimization problem. min. Huffman coding Optimal codes - I A code is optimal if it has the shortest codeword length L L m = i= pl i i This can be seen as an optimization problem min i= li subject to D m m i= lp Gabriele Monfardini

More information

Advanced Analysis of Algorithms - Midterm (Solutions)

Advanced Analysis of Algorithms - Midterm (Solutions) Advanced Analysis of Algorithms - Midterm (Solutions) K. Subramani LCSEE, West Virginia University, Morgantown, WV {ksmani@csee.wvu.edu} 1 Problems 1. Solve the following recurrence using substitution:

More information

Ordered Dictionary & Binary Search Tree

Ordered Dictionary & Binary Search Tree Ordered Dictionary & Binary Search Tree Finite map from keys to values, assuming keys are comparable (). insert(k, v) lookup(k) aka find: the associated value if any delete(k) (Ordered set: No values;

More information

Binary Decision Diagrams. Graphs. Boolean Functions

Binary Decision Diagrams. Graphs. Boolean Functions Binary Decision Diagrams Graphs Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant

More information

Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups

Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups Contemporary Mathematics Pattern Recognition Approaches to Solving Combinatorial Problems in Free Groups Robert M. Haralick, Alex D. Miasnikov, and Alexei G. Myasnikov Abstract. We review some basic methodologies

More information

Disjoint Sets : ADT. Disjoint-Set : Find-Set

Disjoint Sets : ADT. Disjoint-Set : Find-Set Disjoint Sets : ADT Disjoint-Set Forests Make-Set( x ) Union( s, s ) Find-Set( x ) : π( ) : π( ) : ν( log * n ) amortized cost 6 9 {,, 9,, 6} set # {} set # 7 5 {5, 7, } set # Disjoint-Set : Find-Set Disjoint

More information

Machine Learning 3. week

Machine Learning 3. week Machine Learning 3. week Entropy Decision Trees ID3 C4.5 Classification and Regression Trees (CART) 1 What is Decision Tree As a short description, decision tree is a data classification procedure which

More information

Personal Project: Shift-Reduce Dependency Parsing

Personal Project: Shift-Reduce Dependency Parsing Personal Project: Shift-Reduce Dependency Parsing 1 Problem Statement The goal of this project is to implement a shift-reduce dependency parser. This entails two subgoals: Inference: We must have a shift-reduce

More information

CS 240 Data Structures and Data Management. Module 4: Dictionaries

CS 240 Data Structures and Data Management. Module 4: Dictionaries CS 24 Data Structures and Data Management Module 4: Dictionaries A. Biniaz A. Jamshidpey É. Schost Based on lecture notes by many previous cs24 instructors David R. Cheriton School of Computer Science,

More information

Streaming Algorithms for Optimal Generation of Random Bits

Streaming Algorithms for Optimal Generation of Random Bits Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou, and Jehoshua Bruck, Fellow, IEEE arxiv:09.0730v [cs.i] 4 Sep 0 Abstract Generating random bits from a source of biased coins (the

More information

Using an innovative coding algorithm for data encryption

Using an innovative coding algorithm for data encryption Using an innovative coding algorithm for data encryption Xiaoyu Ruan and Rajendra S. Katti Abstract This paper discusses the problem of using data compression for encryption. We first propose an algorithm

More information

Further analysis. Objective Based on the understanding of the LV model from last report, further study the behaviours of the LV equations parameters.

Further analysis. Objective Based on the understanding of the LV model from last report, further study the behaviours of the LV equations parameters. Further analysis Objective Based on the understanding of the LV model from last report, further study the behaviours of the LV equations parameters. Changes have been made Method of finding the mean is

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification Clustering

More information

Advanced Implementations of Tables: Balanced Search Trees and Hashing

Advanced Implementations of Tables: Balanced Search Trees and Hashing Advanced Implementations of Tables: Balanced Search Trees and Hashing Balanced Search Trees Binary search tree operations such as insert, delete, retrieve, etc. depend on the length of the path to the

More information

Introduction to Mathematical Programming IE406. Lecture 21. Dr. Ted Ralphs

Introduction to Mathematical Programming IE406. Lecture 21. Dr. Ted Ralphs Introduction to Mathematical Programming IE406 Lecture 21 Dr. Ted Ralphs IE406 Lecture 21 1 Reading for This Lecture Bertsimas Sections 10.2, 10.3, 11.1, 11.2 IE406 Lecture 21 2 Branch and Bound Branch

More information

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Guimei Liu 1,2 Jinyan Li 1 Limsoon Wong 2 Wynne Hsu 2 1 Institute for Infocomm Research, Singapore 2 School

More information

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 CS17 Integrated Introduction to Computer Science Klein Contents Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 1 Tree definitions 1 2 Analysis of mergesort using a binary tree 1 3 Analysis of

More information

Decision Trees Entropy, Information Gain, Gain Ratio

Decision Trees Entropy, Information Gain, Gain Ratio Changelog: 14 Oct, 30 Oct Decision Trees Entropy, Information Gain, Gain Ratio Lecture 3: Part 2 Outline Entropy Information gain Gain ratio Marina Santini Acknowledgements Slides borrowed and adapted

More information

5. Density evolution. Density evolution 5-1

5. Density evolution. Density evolution 5-1 5. Density evolution Density evolution 5-1 Probabilistic analysis of message passing algorithms variable nodes factor nodes x1 a x i x2 a(x i ; x j ; x k ) x3 b x4 consider factor graph model G = (V ;

More information

Assignment 5: Solutions

Assignment 5: Solutions Comp 21: Algorithms and Data Structures Assignment : Solutions 1. Heaps. (a) First we remove the minimum key 1 (which we know is located at the root of the heap). We then replace it by the key in the position

More information

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees Francesc Rosselló 1, Gabriel Valiente 2 1 Department of Mathematics and Computer Science, Research Institute

More information

Exploiting Statistically Significant Dependent Rules for Associative Classification

Exploiting Statistically Significant Dependent Rules for Associative Classification Exploiting Statistically Significant Depent Rules for Associative Classification Jundong Li Computer Science and Engineering Arizona State University, USA jundongl@asuedu Osmar R Zaiane Department of Computing

More information

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Binary Search Introduction Problem Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Strategy 1: Random Search Randomly select a page until the page containing

More information

Lecture 2 September 4, 2014

Lecture 2 September 4, 2014 CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the

More information

Fibonacci Heaps These lecture slides are adapted from CLRS, Chapter 20.

Fibonacci Heaps These lecture slides are adapted from CLRS, Chapter 20. Fibonacci Heaps These lecture slides are adapted from CLRS, Chapter 20. Princeton University COS 4 Theory of Algorithms Spring 2002 Kevin Wayne Priority Queues Operation mae-heap insert find- delete- union

More information

Information Theory, Statistics, and Decision Trees

Information Theory, Statistics, and Decision Trees Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010

More information

Parameter Estimation, Sampling Distributions & Hypothesis Testing

Parameter Estimation, Sampling Distributions & Hypothesis Testing Parameter Estimation, Sampling Distributions & Hypothesis Testing Parameter Estimation & Hypothesis Testing In doing research, we are usually interested in some feature of a population distribution (which

More information

Machine Learning and Data Mining. Decision Trees. Prof. Alexander Ihler

Machine Learning and Data Mining. Decision Trees. Prof. Alexander Ihler + Machine Learning and Data Mining Decision Trees Prof. Alexander Ihler Decision trees Func-onal form f(x;µ): nested if-then-else statements Discrete features: fully expressive (any func-on) Structure:

More information

Predictive Modeling: Classification. KSE 521 Topic 6 Mun Yi

Predictive Modeling: Classification. KSE 521 Topic 6 Mun Yi Predictive Modeling: Classification Topic 6 Mun Yi Agenda Models and Induction Entropy and Information Gain Tree-Based Classifier Probability Estimation 2 Introduction Key concept of BI: Predictive modeling

More information

Meronymy-based Aggregation of Activities in Business Process Models

Meronymy-based Aggregation of Activities in Business Process Models Meronymy-based Aggregation of Activities in Business Process Models Sergey Smirnov 1, Remco Dijkman 2, Jan Mendling 3, and Mathias Weske 1 1 Hasso Plattner Institute, Germany 2 Eindhoven University of

More information

CS Data Structures and Algorithm Analysis

CS Data Structures and Algorithm Analysis CS 483 - Data Structures and Algorithm Analysis Lecture VII: Chapter 6, part 2 R. Paul Wiegand George Mason University, Department of Computer Science March 22, 2006 Outline 1 Balanced Trees 2 Heaps &

More information

Inge Li Gørtz. Thank you to Kevin Wayne for inspiration to slides

Inge Li Gørtz. Thank you to Kevin Wayne for inspiration to slides 02110 Inge Li Gørtz Thank you to Kevin Wayne for inspiration to slides Welcome Inge Li Gørtz. Reverse teaching and discussion of exercises: 3 teaching assistants 8.00-9.15 Group work 9.15-9.45 Discussions

More information

Lecture 7. Union bound for reducing M-ary to binary hypothesis testing

Lecture 7. Union bound for reducing M-ary to binary hypothesis testing Lecture 7 Agenda for the lecture M-ary hypothesis testing and the MAP rule Union bound for reducing M-ary to binary hypothesis testing Introduction of the channel coding problem 7.1 M-ary hypothesis testing

More information

Variants of Turing Machine (intro)

Variants of Turing Machine (intro) CHAPTER 3 The Church-Turing Thesis Contents Turing Machines definitions, examples, Turing-recognizable and Turing-decidable languages Variants of Turing Machine Multi-tape Turing machines, non-deterministic

More information

Advanced Data Structures

Advanced Data Structures Simon Gog gog@kit.edu - Simon Gog: KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu Predecessor data structures We want to support

More information

Scout, NegaScout and Proof-Number Search

Scout, NegaScout and Proof-Number Search Scout, NegaScout and Proof-Number Search Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction It looks like alpha-beta pruning is the best we can do for a generic searching

More information

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence Artificial Intelligence (AI) Artificial Intelligence AI is an attempt to reproduce intelligent reasoning using machines * * H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, 1993,

More information

Notes on Logarithmic Lower Bounds in the Cell Probe Model

Notes on Logarithmic Lower Bounds in the Cell Probe Model Notes on Logarithmic Lower Bounds in the Cell Probe Model Kevin Zatloukal November 10, 2010 1 Overview Paper is by Mihai Pâtraşcu and Erik Demaine. Both were at MIT at the time. (Mihai is now at AT&T Labs.)

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

Integer Programming for Bayesian Network Structure Learning

Integer Programming for Bayesian Network Structure Learning Integer Programming for Bayesian Network Structure Learning James Cussens Helsinki, 2013-04-09 James Cussens IP for BNs Helsinki, 2013-04-09 1 / 20 Linear programming The Belgian diet problem Fat Sugar

More information

Efficient Spatial Data Structure for Multiversion Management of Engineering Drawings

Efficient Spatial Data Structure for Multiversion Management of Engineering Drawings Efficient Structure for Multiversion Management of Engineering Drawings Yasuaki Nakamura Department of Computer and Media Technologies, Hiroshima City University Hiroshima, 731-3194, Japan and Hiroyuki

More information

Extended breadth-first search algorithm in practice

Extended breadth-first search algorithm in practice Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 59 66 doi: 10.14794/ICAI.9.2014.1.59 Extended breadth-first search algorithm

More information

If the above problem is hard to solve, we might try to break it into smaller problems which are easier to solve.

If the above problem is hard to solve, we might try to break it into smaller problems which are easier to solve. 8 Branch and Bound 8.1 Divide and Conquer uppose that we would like to solve the problem { z = max c T x : x. If the above problem is hard to solve, we might try to break it into smaller problems which

More information

Machine Learning 2010

Machine Learning 2010 Machine Learning 2010 Decision Trees Email: mrichter@ucalgary.ca -- 1 - Part 1 General -- 2 - Representation with Decision Trees (1) Examples are attribute-value vectors Representation of concepts by labeled

More information

Advanced Data Structures

Advanced Data Structures Simon Gog gog@kit.edu - Simon Gog: KIT The Research University in the Helmholtz Association www.kit.edu Predecessor data structures We want to support the following operations on a set of integers from

More information

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine)

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Mihai Pǎtraşcu, MIT, web.mit.edu/ mip/www/ Index terms: partial-sums problem, prefix sums, dynamic lower bounds Synonyms: dynamic trees 1

More information

CSE 202: Design and Analysis of Algorithms Lecture 3

CSE 202: Design and Analysis of Algorithms Lecture 3 CSE 202: Design and Analysis of Algorithms Lecture 3 Instructor: Kamalika Chaudhuri Announcement Homework 1 out Due on Tue Jan 24 in class No late homeworks will be accepted Greedy Algorithms Direct argument

More information

CSC 411 Lecture 3: Decision Trees

CSC 411 Lecture 3: Decision Trees CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning

More information

Decision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology

Decision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology Decision trees Special Course in Computer and Information Science II Adam Gyenge Helsinki University of Technology 6.2.2008 Introduction Outline: Definition of decision trees ID3 Pruning methods Bibliography:

More information

CS 584 Data Mining. Association Rule Mining 2

CS 584 Data Mining. Association Rule Mining 2 CS 584 Data Mining Association Rule Mining 2 Recall from last time: Frequent Itemset Generation Strategies Reduce the number of candidates (M) Complete search: M=2 d Use pruning techniques to reduce M

More information

with Binary Decision Diagrams Integer Programming J. N. Hooker Tarik Hadzic IT University of Copenhagen Carnegie Mellon University ICS 2007, January

with Binary Decision Diagrams Integer Programming J. N. Hooker Tarik Hadzic IT University of Copenhagen Carnegie Mellon University ICS 2007, January Integer Programming with Binary Decision Diagrams Tarik Hadzic IT University of Copenhagen J. N. Hooker Carnegie Mellon University ICS 2007, January Integer Programming with BDDs Goal: Use binary decision

More information

Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007.

Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007. Spring 2007 / Page 1 Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007. Don t panic. Be sure to write your name and student ID number on every page of the exam. The only materials

More information

Computation Theory Finite Automata

Computation Theory Finite Automata Computation Theory Dept. of Computing ITT Dublin October 14, 2010 Computation Theory I 1 We would like a model that captures the general nature of computation Consider two simple problems: 2 Design a program

More information

ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October

ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October ENS Lyon Camp. Day 2. Basic group. Cartesian Tree. 26 October Contents 1 Cartesian Tree. Definition. 1 2 Cartesian Tree. Construction 1 3 Cartesian Tree. Operations. 2 3.1 Split............................................

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

5 Spatial Access Methods

5 Spatial Access Methods 5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and

More information

Lecture 7 Decision Tree Classifier

Lecture 7 Decision Tree Classifier Machine Learning Dr.Ammar Mohammed Lecture 7 Decision Tree Classifier Decision Tree A decision tree is a simple classifier in the form of a hierarchical tree structure, which performs supervised classification

More information

Dan Roth 461C, 3401 Walnut

Dan Roth   461C, 3401 Walnut CIS 519/419 Applied Machine Learning www.seas.upenn.edu/~cis519 Dan Roth danroth@seas.upenn.edu http://www.cis.upenn.edu/~danroth/ 461C, 3401 Walnut Slides were created by Dan Roth (for CIS519/419 at Penn

More information

Shannon-Fano-Elias coding

Shannon-Fano-Elias coding Shannon-Fano-Elias coding Suppose that we have a memoryless source X t taking values in the alphabet {1, 2,..., L}. Suppose that the probabilities for all symbols are strictly positive: p(i) > 0, i. The

More information

Models for representing sequential circuits

Models for representing sequential circuits Sequential Circuits Models for representing sequential circuits Finite-state machines (Moore and Mealy) Representation of memory (states) Changes in state (transitions) Design procedure State diagrams

More information

PROBABILISTIC REASONING SYSTEMS

PROBABILISTIC REASONING SYSTEMS PROBABILISTIC REASONING SYSTEMS In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory. Outline Knowledge in uncertain

More information

Learning Partial Lexicographic Preference Trees over Combinatorial Domains

Learning Partial Lexicographic Preference Trees over Combinatorial Domains Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Learning Partial Lexicographic Preference Trees over Combinatorial Domains Xudong Liu and Miroslaw Truszczynski Department of

More information