Mining State Dependencies Between Multiple Sensor Data Sources
1 Mining State Dependencies Between Multiple Sensor Data Sources C. Robardet Co-Authored with Marc Plantevit and Vasile-Marian Scuturici April / 27
4 Mining Sensor Data: A Timely Challenge?
Why is there such keen interest? Sensor technology monitors many events in real time. It produces multiple heterogeneous data streams that generate very large volumes of information.
Data stream challenges:
- Handling infinite sequences of events that occur at a steady pace
- Providing actionable insights to end users
- The mining step has to be faster than the data acquisition process.
New challenges:
- Identifying temporal dependencies between data sources
- Events last for a period of time
- What is the time lag between dependent events?
6 Our Proposal
Identify temporal dependencies between data streams:
- Data stream events change the internal state of the sensor.
- Each state has a duration, represented as a set of disjoint time intervals.
- Temporal relations between two interval sets imply dependencies between the corresponding sensors.
Main idea: discover temporal dependencies and their associated time-delay intervals in a way that
- is robust to the temporal variability of events,
- characterizes the time intervals during which the events are related,
- requires no user-defined threshold.
7 Examples
- Organizing deicing operations on the basis of freezing alert predictions
- Sensor anomaly detection
8 Overview of our Approach
TEDDY: TEmporal Dependency DiscoverY
[Diagram labels: Data Streams ($A_1, A_2, \ldots, A_i, \ldots, A_k$); State Streams ($B_1, B_2, \ldots, B_l$ and $C_1, C_2, \ldots, C_i, \ldots, C_n$); Discovery of valid and significant temporal state dependencies. Application labels: failure detection, structural health monitoring, smart home, camera indexing, object tracking, smart environment.]
9 Step 1: Converting a data stream into a state stream
Batch of events: to process the infinite set of events, we use batches of events defined by a time interval $[t_{begin}, t_{end})$.

timestamp   event
1           open
2           close
4           open
5           close
8           open
9           close

Active time period: the main property of a state is the period of time during which it is active. The importance of a state $A$ is evaluated by the sum of the lengths of its active time intervals: $\mathrm{len}(A)$.
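The conversion in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `events_to_intervals` and `length` are hypothetical, and cutting a still-open state at the end of the batch is an assumption.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [start, end)


def events_to_intervals(events: List[Tuple[float, str]], t_end: float) -> List[Interval]:
    """Pair each 'open' with the next 'close' to obtain the active intervals.

    Assumption: a state that is still open when the batch ends is cut at t_end.
    """
    intervals: List[Interval] = []
    start = None
    for timestamp, name in sorted(events):
        if name == "open" and start is None:
            start = timestamp
        elif name == "close" and start is not None:
            intervals.append((start, timestamp))
            start = None
    if start is not None:
        intervals.append((start, t_end))
    return intervals


def length(intervals: List[Interval]) -> float:
    """len(A): total active time of a state given as disjoint intervals."""
    return sum(end - start for start, end in intervals)


# The event table from the slide, processed as one batch [0, 10).
events = [(1, "open"), (2, "close"), (4, "open"), (5, "close"), (8, "open"), (9, "close")]
state_open = events_to_intervals(events, t_end=10)
print(state_open)          # [(1, 2), (4, 5), (8, 9)]
print(length(state_open))  # 3
```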
10 Step 2: Measuring state dependency
Active time period intersection: the dependency of two states $A$ and $B$ is evaluated on the basis of the intersection of their active time intervals: $\mathrm{len}(A \cap B)$.
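As an illustration of $\mathrm{len}(A \cap B)$, the sketch below sweeps two sorted lists of disjoint half-open intervals with two pointers. The name `intersection_length` and the example intervals are assumptions made for this example only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [start, end)


def intersection_length(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which both states are active.

    Both inputs must be sorted lists of pairwise disjoint intervals.
    """
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        low = max(a[i][0], b[j][0])
        high = min(a[i][1], b[j][1])
        if low < high:
            total += high - low
        # Advance whichever interval finishes first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return total


state_a = [(1, 2), (4, 5), (8, 9)]
state_b = [(1.5, 4.5), (8, 10)]
print(intersection_length(state_a, state_b))  # 2.0
```

The sweep visits each interval once, so the cost is linear in the number of intervals of the two states.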
11 Step 2: Measuring time shift state dependency
$B$ can be transformed to maximize its intersection with $A$:
- $B$ can be shifted into the past: shifting by $t$ time units gives $B^{[t,t]}$.
- $B$ can be slightly extended: a slight extension of $t$ time units gives $B^{[0,t]}$.
Extending the intervals makes the temporal dependency measure more robust to the inherent variability of the data.
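12 Step 2: Measuring the confidence of a time shift state dependency
Time shift intervals: we explore all possible time shift intervals $[\alpha, \beta]$ included in $[t_{min}, t_{max}]$:
$$B^{[\alpha,\beta]} = \{[b_j + \alpha,\; b_{j+1} + \beta) \mid j = 1, \ldots, \#B\}$$
with $\alpha \le \beta \le 0$ and $\mathrm{len}(B) \le \mathrm{len}(B^{[\alpha,\beta]})$.
Confidence of the state dependency $A \xrightarrow{[\alpha,\beta]} B$: the proportion of time during which the two states are active, relative to the active time period of $A$:
$$\mathrm{conf}(A \xrightarrow{[\alpha,\beta]} B) = \frac{\mathrm{len}(A \cap B^{[\alpha,\beta]})}{\mathrm{len}(A)}$$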
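The confidence of a time shift dependency can be sketched as below. The helper names are hypothetical, and merging intervals that overlap after the extension is an assumption added so that the active time is not counted twice; the slide does not say how overlaps are handled.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [start, end)


def shift(b: List[Interval], alpha: float, beta: float) -> List[Interval]:
    """Build B^[alpha,beta]: starts move by alpha, ends move by beta.

    Assumption: intervals that overlap or touch after the extension are merged.
    """
    if alpha > beta:
        raise ValueError("expected alpha <= beta")
    merged: List[Interval] = []
    for start, end in sorted((s + alpha, e + beta) for s, e in b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def length(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> float:
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        low, high = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if low < high:
            total += high - low
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return total


def confidence(a: List[Interval], b: List[Interval], alpha: float, beta: float) -> float:
    """conf(A -[alpha,beta]-> B) = len(A ∩ B^[alpha,beta]) / len(A)."""
    return intersection_length(a, shift(b, alpha, beta)) / length(a)


state_a = [(1, 2), (4, 5), (8, 9)]
state_b = [(3, 4), (6, 7), (10, 11)]  # B becomes active 2 time units after A
print(confidence(state_a, state_b, 0, 0))    # 0.0
print(confidence(state_a, state_b, -2, -2))  # 1.0
print(confidence(state_a, state_b, -3, -1))  # 1.0
```

In this toy example the pure shift $[-2,-2]$ already reaches a confidence of 1, and the wider interval $[-3,-1]$ cannot do better, which is the kind of redundancy Step 4 is meant to remove.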
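13 Step 3: Statistical assessment of the confidence measure
Pearson's chi-squared test of independence: are the occurrences of $A$ and $B^{[\alpha,\beta]}$ statistically independent?
$$\begin{array}{c|cc} & B^{[\alpha,\beta]} & \overline{B^{[\alpha,\beta]}} \\ \hline A & P(A \cap B^{[\alpha,\beta]}) & P(A \cap \overline{B^{[\alpha,\beta]}}) \\ \overline{A} & P(\overline{A} \cap B^{[\alpha,\beta]}) & P(\overline{A} \cap \overline{B^{[\alpha,\beta]}}) \end{array}$$
$(O)_{ij}$ is the contingency table that crosses the observed outcomes of $A$ and $B^{[\alpha,\beta]}$. $(E)_{ij}$ gives the expected outcomes under the null hypothesis, where $A$ and $B^{[\alpha,\beta]}$ occur uniformly over $T = [t_{begin}, t_{end})$:
$$X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \ge \chi \iff \mathrm{conf}(A \xrightarrow{[\alpha,\beta]} B) \ge \mathrm{minconf}(\mathrm{len}(B^{[\alpha,\beta]}))$$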
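A minimal sketch of the test statistic follows. It assumes the contingency table is filled with time lengths (time units in which each combination of states holds) rather than probabilities, and it uses 3.841, the chi-squared critical value for one degree of freedom at the 5% level, as an illustrative threshold; the slide does not state which significance level is used.

```python
def chi_squared(len_a: float, len_b: float, len_ab: float, total: float) -> float:
    """Pearson's X^2 for the 2x2 table crossing A and the shifted state B.

    len_a, len_b: active time of A and of the shifted B; len_ab: time where
    both are active; total: length of the batch. Assumes 0 < len_a < total and
    0 < len_b < total so that no expected cell is zero.
    """
    observed = [
        [len_ab, len_a - len_ab],
        [len_b - len_ab, total - len_a - len_b + len_ab],
    ]
    row_sums = [len_a, total - len_a]
    col_sums = [len_b, total - len_b]
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected cell under independence of the two states.
            expected = row_sums[i] * col_sums[j] / total
            x2 += (observed[i][j] - expected) ** 2 / expected
    return x2


CRITICAL_VALUE = 3.841  # assumed: chi-squared, 1 degree of freedom, 5% level

x2 = chi_squared(len_a=20, len_b=25, len_ab=15, total=100)
print(x2 >= CRITICAL_VALUE)  # True: independence is rejected for this example
```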
14 Step 4: Significant time shift intervals selection
Limiting the redundancy between time shift intervals of a state dependency:
- A huge number of time shift intervals may result in valid temporal dependencies.
- Many of them are redundant, depicting the same phenomenon several times.
Property: if $[\alpha_1, \beta_1] \subseteq [\alpha_2, \beta_2]$, then $\mathrm{conf}(A \xrightarrow{[\alpha_1,\beta_1]} B) \le \mathrm{conf}(A \xrightarrow{[\alpha_2,\beta_2]} B)$.
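15 Step 4: Significant time shift intervals selection
The most interesting time shift intervals
1. have a high confidence value,
2. are as specific as possible with respect to the inclusion relation.
Dominance relationship on the time shift intervals: if $[\alpha_1, \beta_1] \subseteq [\alpha_2, \beta_2]$ and
$$1 - \frac{\mathrm{conf}(A \xrightarrow{[\alpha_1,\beta_1]} B)}{\mathrm{conf}(A \xrightarrow{[\alpha_2,\beta_2]} B)} < 1 - \frac{\mathrm{len}(B^{[\alpha_1,\beta_1]})}{\mathrm{len}(B^{[\alpha_2,\beta_2]})}$$
then $A \xrightarrow{[\alpha_1,\beta_1]} B$ dominates $A \xrightarrow{[\alpha_2,\beta_2]} B$.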
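The dominance condition compares the relative loss in confidence with the relative loss in active length. A hedged sketch, with hypothetical function names and with the convention (an assumption) that nothing dominates a dependency of zero confidence or zero length:

```python
from typing import Tuple

Shift = Tuple[int, int]  # time shift interval [alpha, beta]


def contains(outer: Shift, inner: Shift) -> bool:
    """True if inner is included in outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def dominates(specific: Shift, general: Shift,
              conf_specific: float, conf_general: float,
              len_specific: float, len_general: float) -> bool:
    """Check whether the dependency on `specific` dominates the one on `general`.

    conf_*: confidence of each dependency; len_*: len(B^[alpha,beta]) for each.
    """
    if not contains(general, specific) or conf_general == 0 or len_general == 0:
        return False
    confidence_loss = 1 - conf_specific / conf_general
    length_loss = 1 - len_specific / len_general
    return confidence_loss < length_loss


# Narrowing [-3,-1] to [-2,-2] removes two thirds of the active length of B
# but only 10% of the confidence, so the narrower interval dominates.
print(dominates((-2, -2), (-3, -1), 0.9, 1.0, 3, 9))  # True
# Here 70% of the confidence is lost, more than the loss in length.
print(dominates((-2, -2), (-3, -1), 0.3, 1.0, 3, 9))  # False
```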
16 Step 4: Significant time shift intervals selection
Dominance relationship on the time shift intervals (continued): therefore $\mathrm{len}(B^{[\alpha_2,\beta_2] \setminus [\alpha_1,\beta_1]} \cap A)$ is almost …
17 Step 4: Significant time shift intervals selection
Significant temporal dependency: the most specific temporal dependency that dominates all of its supersets, where each of these supersets in turn dominates its own supersets.
Lattice of time shift intervals:
[-4,0]
[-4,-1] [-3,0]
[-4,-2] [-3,-1] [-2,0]
[-4,-3] [-3,-2] [-2,-1] [-1,0]
[-4,-4] [-3,-3] [-2,-2] [-1,-1] [0,0]
18 TEDDY: TEmporal Dependency DiscoverY
Enumeration principle of the time shift intervals of $A \xrightarrow{[\alpha,\beta]} B$:
- Level-wise enumeration computes the confidence value of each interval at most once.
- The monotonicity of the confidence measure and a lower bound on $\mathrm{minconf}(\mathrm{len}(B^{[\alpha,\beta]}))$ are used to prune the search space early.
- An upper bound on the confidence measure, computable in $O(1)$, avoids unnecessary computations of the confidence.
- The transitivity of the dominance relationship is exploited when identifying significant temporal dependencies.
19 TEDDY: Candidate time shift intervals generation
Level-wise enumeration:
[-4,0]
[-4,-1] [-3,0]
[-4,-2] [-3,-1] [-2,0]
[-4,-3] [-3,-2] [-2,-1] [-1,0]
[-4,-4] [-1,-1] [0,0]
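The lattice of candidate intervals can be generated level by level, as sketched below. The function names are hypothetical, and starting from the narrowest intervals and widening by one unit per level is an assumption about the traversal order; the slide only shows the lattice itself.

```python
from typing import Iterator, List, Tuple

Shift = Tuple[int, int]  # time shift interval [alpha, beta]


def levels(t_min: int, t_max: int) -> Iterator[List[Shift]]:
    """Yield the lattice one level at a time, from width 0 up to the full range."""
    for width in range(t_max - t_min + 1):
        yield [(alpha, alpha + width) for alpha in range(t_min, t_max - width + 1)]


def parents(alpha: int, beta: int, t_min: int, t_max: int) -> List[Shift]:
    """Direct supersets of [alpha, beta]: extend by one unit on either side."""
    result: List[Shift] = []
    if alpha - 1 >= t_min:
        result.append((alpha - 1, beta))
    if beta + 1 <= t_max:
        result.append((alpha, beta + 1))
    return result


for level in levels(-4, 0):
    print(level)
# The first level printed is [(-4, -4), (-3, -3), (-2, -2), (-1, -1), (0, 0)]
# and the last one is [(-4, 0)].

print(parents(-2, -1, -4, 0))  # [(-3, -1), (-2, 0)]
```

Because every interval appears in exactly one level, a confidence value computed for it can be cached and never recomputed.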
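20 TEDDY: Pruning based on the confidence measure
$$\mathrm{minconf}(x) = \frac{\lambda x + \sqrt{\frac{\chi}{T}\, \lambda (T - \lambda)\, x (T - x)}}{\lambda T}$$
with $x = \mathrm{len}(B^{[\alpha,\beta]})$ and $\lambda = \mathrm{len}(A)$.
[Figure: MinConfidence(x) and its lower bound, plotted against $x$ up to $T$.]
Lower bound on minconf of $A \xrightarrow{[\alpha,\beta]} B$:
$$\mathrm{minconf}(\mathrm{len}(B^{[\alpha,\beta]})) \ge \min\left(1,\; \mathrm{minconf}(\mathrm{len}(B^{[0,0]}))\right)$$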
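The threshold on this slide can be read as the confidence at which the 2x2 chi-squared statistic reaches its critical value. The sketch below assumes that closed form, obtained by solving $X^2 \ge \chi$ for the confidence in the case of a positive dependency, with $T$ taken as the length of the batch. The function names and the critical value 3.841 (one degree of freedom, 5% level) are illustrative assumptions.

```python
import math

CHI = 3.841  # assumed critical value: chi-squared, 1 degree of freedom, 5% level


def min_conf(x: float, lam: float, total: float, chi: float = CHI) -> float:
    """Smallest confidence that passes the chi-squared test.

    x: len(B^[alpha,beta]); lam: len(A); total: length T of the batch.
    """
    root = math.sqrt(chi / total * lam * (total - lam) * x * (total - x))
    return (lam * x + root) / (lam * total)


def min_conf_lower_bound(len_b: float, lam: float, total: float, chi: float = CHI) -> float:
    """Lower bound from the slide: min(1, minconf(len(B^[0,0])))."""
    return min(1.0, min_conf(len_b, lam, total, chi))


# Threshold for len(A) = 20 and a shifted B of length 25 in a batch of length 100.
threshold = min_conf(x=25, lam=20, total=100)
print(threshold)

# Bound computed once from the unshifted B (here len(B) = 15).
print(min_conf_lower_bound(len_b=15, lam=20, total=100))
```

A candidate whose confidence cannot reach the lower bound can be discarded without evaluating the exact threshold for its own shifted length.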
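21 TEDDY: Avoiding unnecessary computation of the confidence
Upper bound on confidence:
$$\mathrm{conf}(A \xrightarrow{[\alpha_1,\beta_1]} B) \le \mathrm{conf}(A \xrightarrow{[\alpha_2,\beta_2]} B) + \frac{(|\alpha_1 - \alpha_2| + |\beta_1 - \beta_2|)\, \#B}{\mathrm{len}(A)}$$
where $\#B$ represents the number of intervals of $B$. The bound is computed in $O(1)$ instead of $O(\#B)$.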
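A small sketch of how the upper bound could be used to skip a confidence computation. The function names and the numbers are hypothetical, and capping the bound at 1 is an added assumption (a confidence never exceeds 1).

```python
from typing import Tuple

Shift = Tuple[int, int]  # time shift interval [alpha, beta]


def conf_upper_bound(conf_known: float, known: Shift, candidate: Shift,
                     n_intervals_b: int, len_a: float) -> float:
    """O(1) bound on the confidence of `candidate` from an already computed one.

    Each unit of change in alpha or beta alters every interval of B by at most
    one time unit, hence the slack (|a1 - a2| + |b1 - b2|) * #B / len(A).
    """
    slack = (abs(candidate[0] - known[0]) + abs(candidate[1] - known[1])) * n_intervals_b / len_a
    return min(1.0, conf_known + slack)


def can_skip(conf_known: float, known: Shift, candidate: Shift,
             n_intervals_b: int, len_a: float, threshold: float) -> bool:
    """True when the candidate cannot reach the threshold, so conf need not be computed."""
    return conf_upper_bound(conf_known, known, candidate, n_intervals_b, len_a) < threshold


# Known: conf = 0.2 for [-2,-2]; candidate [-3,-1]; B has 3 intervals; len(A) = 30.
bound = conf_upper_bound(0.2, (-2, -2), (-3, -1), n_intervals_b=3, len_a=30)
print(bound)
print(can_skip(0.2, (-2, -2), (-3, -1), 3, 30, threshold=0.5))  # True
print(can_skip(0.2, (-2, -2), (-3, -1), 3, 30, threshold=0.3))  # False
```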
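22 TEDDY: Pruning based on dominance relationships
Dominance transitivity: for $[\alpha_1, \beta_1] \subseteq [\alpha, \beta] \subseteq [\alpha_2, \beta_2]$, if $A \xrightarrow{[\alpha_1,\beta_1]} B$ dominates $A \xrightarrow{[\alpha,\beta]} B$ and $A \xrightarrow{[\alpha,\beta]} B$ dominates $A \xrightarrow{[\alpha_2,\beta_2]} B$, then $A \xrightarrow{[\alpha_1,\beta_1]} B$ dominates $A \xrightarrow{[\alpha_2,\beta_2]} B$.
Pruning: if, $\forall x, y$ such that $t_{min} \le x \le \alpha$ and $\beta \le y \le t_{max}$, $A \xrightarrow{[x,y]} B$ dominates $A \xrightarrow{[x-1,y]} B$ and $A \xrightarrow{[x,y]} B$ dominates $A \xrightarrow{[x,y+1]} B$, then $A \xrightarrow{[\alpha,\beta]} B$ also dominates all its ancestors.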
23 TEDDY: Identification of valid and significant dependencies
Final step: because a lower bound on the confidence threshold was used, the final step must check whether
$$\mathrm{conf}(A \xrightarrow{[\alpha,\beta]} B) \ge \mathrm{minconf}(\mathrm{len}(B^{[\alpha,\beta]}))$$
If not, $A \xrightarrow{[\alpha-1,\beta]} B$ and $A \xrightarrow{[\alpha,\beta+1]} B$ are recursively considered.
24 Example
$\text{M011.ON} \xrightarrow{[2,4]} \text{M014.ON}$: if the sensor M011 produces the event ON, then the sensor M014 will produce a similar event within the interval of 2 to 4 seconds.
25 Experiments: Evaluation of the pruning efficiency
[Figure: two plots of the running-time ratio between the version without pruning (WP) and TEDDY, one against TMAX and one against the batch size (in seconds). Series: SYNT02, SYNT04, SYNT08, SYNT16, and a curve f(x).]
26 Experiments: impact of each pruning technique on search space exploration
[Figure: panels showing the percentage of the search space explored against the batch size (in minutes) and against TMAX; additional label: average number of events per minute and per stream.]
27 Results from Milan dataset
Temporal dependencies that occur on 26 October (left) and 1 January (right).
[Figure: two dependency graphs over the rooms Master Bedroom, Office, Master Bathroom, Guest Bedroom, Living Room, Kitchen, Dining Room, and Entrance.]
28 Results from Foxstream
29 Results from Foxstream
30 Anomaly detection
[Figure: Jaccard index plotted against T, with a curve f(x); labels: Tuesday, Spider, Wednesday.]
31 Conclusion and Future Work
Temporal dependencies mining: we investigate a new direction in data stream mining, namely mining relations between data sources that produce multiple heterogeneous data.
Future directions:
- Analyzing the dynamics of dependency graphs through time.
- Studying the potential use of temporal dependencies from a database perspective: potential integration into continuous query engines as a basis for semantic indexing of data sources.
More informationSpatial Data Science. Soumya K Ghosh
Workshop on Data Science and Machine Learning (DSML 17) ISI Kolkata, March 28-31, 2017 Spatial Data Science Soumya K Ghosh Professor Department of Computer Science and Engineering Indian Institute of Technology,
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationOverall Plan of Simulation and Modeling I. Chapters
Overall Plan of Simulation and Modeling I Chapters Introduction to Simulation Discrete Simulation Analytical Modeling Modeling Paradigms Input Modeling Random Number Generation Output Analysis Continuous
More informationCS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014
CS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof.
More informationWAM-Miner: In the Search of Web Access Motifs from Historical Web Log Data
WAM-Miner: In the Search of Web Access Motifs from Historical Web Log Data Qiankun Zhao a, Sourav S Bhowmick a and Le Gruenwald b a School of Computer Engineering, Division of Information Systems, Nanyang
More informationQUANTITATIVE TECHNIQUES
UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION (For B Com. IV Semester & BBA III Semester) COMPLEMENTARY COURSE QUANTITATIVE TECHNIQUES QUESTION BANK 1. The techniques which provide the decision maker
More informationSlides 8: Statistical Models in Simulation
Slides 8: Statistical Models in Simulation Purpose and Overview The world the model-builder sees is probabilistic rather than deterministic: Some statistical model might well describe the variations. An
More informationIndependence Solutions STAT-UB.0103 Statistics for Business Control and Regression Models
Independence Solutions STAT-UB.003 Statistics for Business Control and Regression Models The Birthday Problem. A class has 70 students. What is the probability that at least two students have the same
More informationCPSC 340: Machine Learning and Data Mining. Linear Least Squares Fall 2016
CPSC 340: Machine Learning and Data Mining Linear Least Squares Fall 2016 Assignment 2 is due Friday: Admin You should already be started! 1 late day to hand it in on Wednesday, 2 for Friday, 3 for next
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More informationLatent Geographic Feature Extraction from Social Media
Latent Geographic Feature Extraction from Social Media Christian Sengstock* Michael Gertz Database Systems Research Group Heidelberg University, Germany November 8, 2012 Social Media is a huge and increasing
More informationAnnouncements. Unit 3: Foundations for inference Lecture 3: Decision errors, significance levels, sample size, and power.
Announcements Announcements Unit 3: Foundations for inference Lecture 3:, significance levels, sample size, and power Statistics 101 Mine Çetinkaya-Rundel October 1, 2013 Project proposal due 5pm on Friday,
More informationMining bi-sets in numerical data
Mining bi-sets in numerical data Jérémy Besson, Céline Robardet, Luc De Raedt and Jean-François Boulicaut Institut National des Sciences Appliquées de Lyon - France Albert-Ludwigs-Universitat Freiburg
More information