Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach


Yinghui (Catherine) Yang
Graduate School of Management, University of California, Davis, AOB IV, One Shields Ave., Davis, CA 95616, USA, yiyang@ucdavis.edu

Balaji Padmanabhan
ISDS Department, College of Business, University of South Florida, 4202 East Fowler Ave., CIS 1040, Tampa, FL, USA, bpadmana@coba.usf.edu

Hongyan Liu, Xiaoyu Wang
Department of Management Science and Engineering, School of Economics and Management, Tsinghua University, Beijing, China, {liuhy,wangxy3}@sem.tsinghua.edu.cn

More Related Work for Section 1

Most previous work on mining sequence data falls into two categories: discovering sequential patterns (Agrawal and Srikant 1995, Ayres et al. 2002, Garofalakis et al. 1999, Srikant and Agrawal 1996) and mining periodic patterns (Han et al. 1998, 1999; Ozden et al. 1998; Yang et al. 2003, 2004). Full cyclic patterns were first studied in Ozden et al. (1998). The input data to Ozden et al. (1998) is a set of transactions, each of which consists of a set of items. In addition, each transaction is tagged with an execution time. The goal is to find association rules that repeat themselves throughout the input data.

Han et al. (1998, 1999) presented algorithms for efficiently mining partial periodic patterns. In practice, not every portion of the time series may contribute to the periodicity. For example, a company's stock may often gain a couple of points at the beginning of each trading session but show little regularity later in the session. This type of periodicity is often referred to as partial periodicity (we will discuss this in greater detail in the next section). Han et al. focused on frequent periodic patterns. Yang et al. (2004) address the mining of surprising periodic patterns and also allow partial periodicity.

As pointed out in Ma and Hellerstein (2001) and Han et al. (1999), the fast Fourier transform (FFT) (Brigham 1988) can also be used to identify periodicity. There are two problems, though. First, the FFT does not cope well with random off-segments in periodic patterns. Second, the computational complexity of the FFT is O(T log T), where T is the number of time units. In most applications, T is large even though events are sparse.

Most of the research studying frequent or periodic sequential patterns used support as the measure of interestingness and addressed the discovery of frequent patterns. Yang et al. (2004) instead used an information gain metric to mine surprising periodic patterns. Some work treats the input data as one long sequence (Yang et al. 2003), and most work within the bioinformatics field belongs to this category. Other work treats the input data as a set of transactions, each of which consists of a set of items (Ozden et al. 1998, Han et al. 1998, 1999). While related to the broader topic of periodicity, Elfeky et al. (2004), Funda et al. (2004), Vlachos et al. (2005) and Yeh and Lin (2009) do not specifically study partial periodicity and thus are less related to our paper. For example, Elfeky et al. (2004) develop an algorithm that mines periodic patterns with unknown or obscure periods, and Funda et al. (2004) present algorithms that use fewer resources to discover periodicities in data streams.

References

Agrawal, R. and R. Srikant. 1995. Mining Sequential Patterns. Proc. 11th Int'l Conf. Data Eng.
Ayres, J., J. Gehrke, T. Yiu, and J. Flannick. 2002. Sequential Pattern Mining Using a Bitmap Representation. Proc. Eighth Int'l Conf. Knowledge Discovery and Data Mining.
Brigham, E. 1988. Fast Fourier Transform and Its Applications. Prentice Hall.
Elfeky, M.G., W.G. Aref, and A.K. Elmagarmid. 2004. Using Convolution to Mine Obscure Periodic Patterns in One Pass. Proc. 9th Int'l Conf. Extending Database Technology (EDBT).
Funda, E., S. Muthukrishnan, and S.C. Sahinalp. 2004. Sublinear Methods for Detecting Periodic Trends in Data Streams. Proc. Latin American Symposium on Theoretical Informatics.
Garofalakis, M., R. Rastogi, and K. Shim. 1999. SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. Proc. 25th Int'l Conf. Very Large Data Bases.
Ozden, B., S. Ramaswamy, and A. Silberschatz. 1998. Cyclic Association Rules. Proc. ICDE '98.
Srikant, R. and R. Agrawal. 1996. Mining Sequential Patterns: Generalizations and Performance Improvements. Proc. Fifth Int'l Conf. Extending Data Base Technology.
Vlachos, M., P.S. Yu, and V. Castelli. 2005. On Periodicity Detection and Structural Periodic Similarity. Proc. SIAM Conf. Data Mining.
Yeh, J.S. and S.C. Lin. 2009. A New Data Structure for Asynchronous Periodic Pattern Mining. Proc. 3rd Int'l Conf. Ubiquitous Information Management and Communication.

Formal Presentation of the Algorithms for Section 3

Inputs:
  1. Dataset D with specific time stamps 1 to T associated with each transaction
  2. Pattern discovery algorithm, R, that discovers patterns that can be evaluated to hold or not at each time stamp
  3. Threshold or_ratio c (e.g. 5%)
  4. Minimum length b
Output:
  1. A ranked list of periodic patterns

L = {}   // output
Generate a set of patterns P = {P_1, P_2, ..., P_M} by applying R to D.
for each e ∈ P do {
    Let Q be the inter-arrival time sequence of e in D.
    Compute F as the number of time stamps when pattern e holds in D.
    V_0 = (T/F)^2   // variance of the exponential distribution
    V = variance of inter-arrival times of e in D
    if (V/V_0 < c) and Length(Q) > b, then L.append([e, or_ratio])
}
Print sorted list of patterns in L according to the or_ratio (i.e. V/V_0 score) for each pattern.

Figure A1. Basic Method - Identifying Type 1 Patterns
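The Basic Method of Figure A1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `or_ratio` and `basic_method`, the dictionary input that maps each pattern to the time stamps at which it holds, and the use of the population variance are all choices made for this sketch.

```python
from statistics import pvariance


def or_ratio(inter_arrivals, total_time, occurrences):
    """Observed variance V over the exponential-model variance V_0 = (T/F)^2."""
    v0 = (total_time / occurrences) ** 2
    v = pvariance(inter_arrivals)  # assumption: population variance
    return v / v0


def basic_method(pattern_times, total_time, c, b):
    """Return (pattern, or_ratio) pairs that pass both tests, most periodic first.

    pattern_times is a hypothetical input format: it maps each pattern to the
    time stamps at which the pattern holds.
    """
    ranked = []
    for pattern, times in pattern_times.items():
        times = sorted(times)
        # Inter-arrival sequence Q: gaps between consecutive occurrences.
        q = [later - earlier for earlier, later in zip(times, times[1:])]
        if len(q) <= b:
            continue  # Length(Q) > b is required
        ratio = or_ratio(q, total_time, len(times))
        if ratio < c:
            ranked.append((pattern, ratio))
    return sorted(ranked, key=lambda pair: pair[1])


if __name__ == "__main__":
    data = {
        "every_other_step": list(range(1, 100, 2)),
        "irregular": [1, 2, 3, 50, 51, 99],
    }
    print(basic_method(data, total_time=100, c=0.25, b=3))
```

A pattern that holds at perfectly regular intervals has zero variance and therefore ranks first, while the irregular pattern in the example has a ratio above the threshold and is filtered out.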

Inputs:
  1. Dataset D
  2. Pattern discovery algorithm, R
  3. Threshold or_ratio c
  4. Minimum length b
  5. Equal mean threshold q
  6. On-segment ratio r
Output: Periodic patterns with their type.

Define S, E as stacks
Generate a set of patterns P = {P_1, P_2, ..., P_M} by applying R to D.
L = {}   // output
for each e ∈ P do {
    Set S, E to be empty stacks
    node = sequence of inter-arrival times of e
    S.push(node)
    While not_empty(S) {
        node = S.pop()
        if or_ratio(node) <= c
            children = null
            E.push(node)
        ElseIf node is longer than b
            choose split point k such that p_L*or_ratio(left) + p_R*or_ratio(right) is minimized
            split(node, children, k)
            S.push(children(right))
            S.push(children(left))
    } // end while
    Get the max mean value m_max and minimum mean value m_min from all the subsequences in E.
    Leng = the sum of the length of all subsequences in E.
    LengO = the length of the original inter-arrival sequence.
    If m_max <= m_min*(1+q)
        If Leng = LengO
            Output e as periodic with equal periods
        Else if Leng/LengO >= r
            Output e as partially periodic with equal periods
    Else
        If Leng = LengO
            Output e as periodic with unequal periods
        Elseif Leng/LengO >= r
            Output e as partially periodic with unequal periods
} // end for

Figure A2. A Unified Approach: The Division Method
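The Division Method can be sketched for a single inter-arrival sequence as below. This is an illustrative sketch under assumptions: the function names are hypothetical, or_ratio is computed per segment as N*A/B^2 - 1 (A the sum of squares, B the sum of the segment), p_L and p_R are taken to be the fractions of points on each side of the split, and the split search recomputes the ratios from scratch for clarity rather than in linear time.

```python
def or_ratio(seq):
    """Variance of seq divided by its squared mean, i.e. N*A/B^2 - 1."""
    n = len(seq)
    a = sum(x * x for x in seq)
    b = sum(seq)
    return n * a / (b * b) - 1.0


def best_split(seq):
    """Split point k minimizing p_L*or_ratio(left) + p_R*or_ratio(right)."""
    n = len(seq)
    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = (k / n) * or_ratio(seq[:k]) + ((n - k) / n) * or_ratio(seq[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def division_method(seq, c, b):
    """Return the accepted subsequences (stack E in the figure), left to right."""
    accepted = []
    stack = [list(seq)]
    while stack:
        node = stack.pop()
        if or_ratio(node) <= c:
            accepted.append(node)
        elif len(node) > b:
            k = best_split(node)
            stack.append(node[k:])  # right child is pushed first
            stack.append(node[:k])  # so the left child is processed first
    return accepted


def classify(seq, c, b, q, r):
    """Label a sequence with one of the four pattern types, or None."""
    segments = division_method(seq, c, b)
    if not segments:
        return None
    means = [sum(s) / len(s) for s in segments]
    covered = sum(len(s) for s in segments)
    kind = "equal" if max(means) <= min(means) * (1 + q) else "unequal"
    if covered == len(seq):
        return f"periodic with {kind} periods"
    if covered / len(seq) >= r:
        return f"partially periodic with {kind} periods"
    return None


if __name__ == "__main__":
    print(classify([5] * 10 + [20] * 10, c=0.05, b=4, q=0.1, r=0.5))
```

As in the figure, a node is accepted whenever its ratio is at most c, so a short child produced by a split can still be accepted; a stricter variant would also test the length of accepted nodes against b.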

Inputs:
  1. Dataset D
  2. Pattern discovery algorithm, R
  3. Threshold or_ratio c
  4. Minimum length b
  5. Equal mean threshold q
  6. On-segment ratio r
Output: Periodic patterns with their type.

Generate a set of patterns P = {P_1, P_2, ..., P_M} by applying R to D.
E = {}   // subsequences
for each e ∈ P do {
    Q = inter-arrival time sequence of e, with Q = {m_1, m_2, ..., m_N}, where N is the size of Q
    For i from 1 to N-b+1,
        For j from N down to i+b-1,
            Q' = {m_i, m_{i+1}, ..., m_j}
            If length of Q' is smaller than b or Q' is a subsequence of any subsequence in E:
                Break
            If or_ratio(Q') <= c:
                E.append(Q')
                Break
    Get the maximum mean value m_max and minimum mean value m_min from all the subsequences in E.
    Leng = sum of length of all subsequences in E
    LengO = length of the original sequence
    If m_max <= m_min*(1+q)
        If Leng = LengO
            Output e as periodic with equal periods
        Else if Leng/LengO >= r
            Output e as partially periodic with equal periods
    Else
        If Leng = LengO
            Output e as periodic with unequal periods
        Elseif Leng/LengO >= r
            Output e as partially periodic with unequal periods
} // end for

Figure A3. The Complete Method

A note on the complexity of the different methods. Given an inter-arrival time sequence S with N inter-arrival times, Ma and Hellerstein (2001) need a list of counters to record the frequency of each potential period. For each inter-arrival time in S, they first look for the counter for that inter-arrival time (as a potential period), and then either increase the counter or create a new counter. After all inter-arrival times in S have been read, they check the counters for all the possible periods, calculate the total frequency for each possible period subject to tolerance, and compare that with the corresponding threshold. Therefore, the complexity of Ma and Hellerstein's method is O(N). The Basic method needs to read all the inter-arrival times in S and calculate the variance ratio. Therefore, the complexity of the Basic method is also O(N). The Division method has log N levels of division, and each level takes O(N) to find the optimal division. Therefore, the complexity of the Division method is O(N log N). The Complete method checks at most N(N+1)/2 subsequences of S. Thus the complexity of the Complete method is O(N^2).

Proof of the Range Result for Section 3.4

Proof: By "at least as periodic" we mean or_ratio(Q') ≤ or_ratio(Q), where Q' is Q extended by the next point. Solving this inequality yields the range, as we show below.

Let $A = \sum_{i=1}^{N} x_i^2$ and $B = \sum_{i=1}^{N} x_i$, so that the mean inter-arrival time is $\bar{x} = B/N$. Define

$$r = \frac{V}{V_0} = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}{\bar{x}^2} = \frac{\sum_{i=1}^{N} x_i^2 - 2\bar{x}\sum_{i=1}^{N} x_i + N\bar{x}^2}{N\bar{x}^2} = \frac{N\left(A - \frac{2B^2}{N} + \frac{B^2}{N}\right)}{B^2} = \frac{NA}{B^2} - 1$$

Hence, or_ratio(Q') ≤ or_ratio(Q) is equivalent to:

$$\frac{(N+1)(A + u^2)}{(B + u)^2} - 1 \le \frac{NA}{B^2} - 1$$

where u is the next point ($x_{N+1}$) in the sequence. Solving for u in the quadratic inequality will provide the bounds.

The graph in Figure A4 shows how different values of the next point, u, affect the inequality. Since inter-arrival times are positive, the range u > 0 (the right quadrant) is the one to focus on. Within this, there is a range around A/B where the new ratio is less than or equal to the old ratio in the sequence. The point A/B can be determined by calculating the derivative of $f(u) = (N+1)(A + u^2)/(B + u)^2$, the left-hand side of the inequality before subtracting 1. The derivative is positive when u is greater than A/B, so the function is increasing in this range (otherwise it is decreasing). The second derivative can also be used to determine the inflection point further right in the figure.
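The bounds that come out of the quadratic inequality can be checked numerically. The sketch below is an illustration, not part of the original appendix: clearing denominators in the inequality gives p*u^2 - 2*N*A*B*u + A*B^2 ≤ 0 with p = (N+1)*B^2 - N*A, and the function name `next_point_range` is hypothetical. It assumes strictly positive inter-arrival times, for which A ≤ B^2 and therefore p > 0, so the admissible values of u lie between the two roots.

```python
import math


def or_ratio(seq):
    """or_ratio written as N*A/B^2 - 1, with A the sum of squares and B the sum."""
    n = len(seq)
    a = sum(x * x for x in seq)
    b = sum(seq)
    return n * a / (b * b) - 1.0


def next_point_range(seq):
    """Return (lo, hi): appending any u in [lo, hi] does not increase or_ratio.

    Assumes all values in seq are strictly positive.
    """
    n = len(seq)
    a = sum(x * x for x in seq)
    b = sum(seq)
    p = (n + 1) * b * b - n * a               # leading coefficient, positive here
    disc = a * (n + 1) * (n * a - b * b)      # non-negative by Cauchy-Schwarz
    root = b * math.sqrt(max(disc, 0.0))
    lo = (n * a * b - root) / p
    hi = (n * a * b + root) / p
    return lo, hi


if __name__ == "__main__":
    q = [4, 6]
    lo, hi = next_point_range(q)
    print(lo, hi)                              # the interval contains A/B = 5.2
    print(or_ratio(q), or_ratio(q + [5]))      # 5 lies inside the interval
```

For a perfectly regular sequence the discriminant is zero and the interval collapses to the single point A/B, which equals the period.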

[Figure A4: plot of (N+1)(A+u^2)/(B+u)^2 against u. Levels marked on the vertical axis: N+1, (N+1)A/B^2, NA/B^2 and (N+1)A/(A+B^2). Points marked on the horizontal axis: -B, 0 and A/B.]

Figure A4. The range result graph

Summary Statistics for Section 4.1

Figures A5a-A5c plot the histograms of the percentage of periodic patterns among all patterns considered for each user when the variance threshold c takes three different values: 100%, 30% and 5%. For example, in Figure A5a (histogram on the far left), the first bar shows that there are approximately 7 users for whom 3% or less of all their patterns are periodic. The second bar shows the number of users with 3% ~ 5% of their patterns being periodic. As expected, tightening the variance threshold results in a smaller percentage of user patterns being flagged as periodic. The histogram at the far right, for instance, shows that, under a very tight threshold (5%), eighty users seem to have % or less of their patterns periodic.

Figures A5a-A5c. Histogram of the % of periodic patterns (c = 100%, 30%, 5%)

Figures A6a-A6c plot the histograms of the period length among all periodic patterns. The first bar in Figure A6c represents the number of periodic patterns with periods between 0 and 1. The averages of the period under these three threshold values are 1.69, 5.39 and 1.0. This suggests that when the search is restricted to strictly periodic patterns (variance close to zero), the patterns identified tend to be those which hold in every session, such as a user's unique homepage. As the threshold is loosened, it is possible to identify patterns that hold across larger periods (as some of our examples in the next section will show).

Figures A6a-A6c. Histograms of period length (c = 100%, 30%, 5%)

Predictive Accuracy for Section

Figure A7. Predictive Accuracy Varying the Length of the Sequences

Synthetic Data Generator and Parameter Tables for Section 4.

Inputs:
  The total time, T
  Maximum mean value of any segment, M
  Periodic type, TY (1-periodic, 2-partial, 3-unequal, 4-partial unequal)
  Threshold or_ratio c
  Minimum length b
  Equal mean threshold q

1. Randomly set the period value for an on-segment m < M, and randomly generate the first inter-arrival time.
2. While sum(inter-arrival times) < T:
3.   If TY=1, add a new inter-arrival time to satisfy c (similar to Theorem 1, we can calculate a range for the new inter-arrival time to satisfy c).
4.   If TY=2, randomly decide whether to switch to an off-segment or to continue on the current on-segment; if TY=3, randomly decide whether to continue on the current on-segment or to start a new on-segment; if TY=4, randomly decide whether to continue on the current on-segment, switch to an off-segment or start a new on-segment.
5.   If continue on the current on-segment, add a new inter-arrival time to satisfy c.
6.   If switch to an off-segment, generate an off-segment and go to step .
7.   If change to a new on-segment, go to step .
8. End-while.
9. Check the sequence generated to see if it satisfies c, b, and q. If not, abandon this sequence and go back to step 1 to generate the desired number of sequences.

Figure A8. Data Generator
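The simplest case of the generator (TY=1, a single on-segment) can be sketched as follows. This is only an illustration under assumptions that are not in the original: candidates are drawn with Gaussian jitter around the period and rejected when they would push the or_ratio above c, which stands in for the closed-form range mentioned in step 3; the function name `generate_periodic`, the `jitter` parameter and the uniform draw of the period are hypothetical. Off-segments, new on-segments and the final check against b and q are omitted.

```python
import random


def or_ratio(seq):
    """Variance of seq divided by its squared mean, i.e. N*A/B^2 - 1."""
    n = len(seq)
    a = sum(x * x for x in seq)
    b = sum(seq)
    return n * a / (b * b) - 1.0


def generate_periodic(total_time, max_mean, c, rng, jitter=0.1):
    """Generate inter-arrival times for a TY=1 sequence with or_ratio <= c."""
    # Step 1: pick the period (assumption: uniform between 1 and M).
    period = rng.uniform(1.0, max_mean)

    def draw():
        # Assumption: Gaussian noise around the period, redrawn until positive.
        while True:
            u = rng.gauss(period, jitter * period)
            if u > 0:
                return u

    seq = [draw()]
    elapsed = seq[0]
    # Steps 2-3: keep adding inter-arrival times that satisfy c.
    while elapsed < total_time:
        u = draw()
        if or_ratio(seq + [u]) <= c:  # rejection test replaces the closed-form range
            seq.append(u)
            elapsed += u
    return seq


if __name__ == "__main__":
    sample = generate_periodic(500, 20, 0.05, random.Random(0))
    print(len(sample), or_ratio(sample))
```

Because every accepted point keeps the running ratio at or below c, the finished sequence satisfies the threshold by construction; passing a seeded `random.Random` makes a run reproducible.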

Table A1. Notations

Notation    Description
E           A set of patterns
e           A pattern
t_i^e       The i-th occurrence time of pattern e
N           The number of times a pattern occurred in a sequence
τ_i^e       The i-th inter-arrival time of pattern e
T           Total time over which events arrive
p           Period
λ           Mean of the exponential distribution
V_0         The variance of the exponential distribution, which equals (T/N)^2
V_1         The observed variance of the inter-arrival times of a sequence
or_ratio    V_1 / V_0
D           Dataset
R           Pattern discovery technique
M           Maximum mean value of any segment
c           Threshold or_ratio
b           Minimum length of an on-segment
q           Equal mean threshold
r           On-segment ratio

Table A2. Parameters used in the Experiments

Parameter   Description                            Value
T           Total time over which events arrive    500
M           Maximum mean value of any segment      Varies across simulated data sets, as shown in the corresponding table in the main paper
c           Threshold or_ratio                     Varies across simulated data sets, as shown in the corresponding table in the main paper
b           Minimum length of an on-segment        10
q           Equal mean threshold                   0.1
r           On-segment ratio                       Not used in the synthetic data (i.e. set to r = 0)


More information

Reductions for Frequency-Based Data Mining Problems

Reductions for Frequency-Based Data Mining Problems Reductions for Frequency-Based Data Mining Problems Stefan Neumann University of Vienna Vienna, Austria Email: stefan.neumann@univie.ac.at Pauli Miettinen Max Planck Institute for Informatics Saarland

More information

A Reservoir Sampling Algorithm with Adaptive Estimation of Conditional Expectation

A Reservoir Sampling Algorithm with Adaptive Estimation of Conditional Expectation A Reservoir Sampling Algorithm with Adaptive Estimation of Conditional Expectation Vu Malbasa and Slobodan Vucetic Abstract Resource-constrained data mining introduces many constraints when learning from

More information

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Anomaly Detection via Over-sampling Principal Component Analysis

Anomaly Detection via Over-sampling Principal Component Analysis Anomaly Detection via Over-sampling Principal Component Analysis Yi-Ren Yeh, Zheng-Yi Lee, and Yuh-Jye Lee Abstract Outlier detection is an important issue in data mining and has been studied in different

More information

Computing Correlation Anomaly Scores using Stochastic Nearest Neighbors

Computing Correlation Anomaly Scores using Stochastic Nearest Neighbors Computing Correlation Anomaly Scores using Stochastic Nearest Neighbors Tsuyoshi Idé IBM Research, Tokyo Research Laboratory Yamato, Kanagawa, Japan goodidea@jp.ibm.com Spiros Papadimitriou Michail Vlachos

More information

A METHOD OF FINDING IMAGE SIMILAR PATCHES BASED ON GRADIENT-COVARIANCE SIMILARITY

A METHOD OF FINDING IMAGE SIMILAR PATCHES BASED ON GRADIENT-COVARIANCE SIMILARITY IJAMML 3:1 (015) 69-78 September 015 ISSN: 394-58 Available at http://scientificadvances.co.in DOI: http://dx.doi.org/10.1864/ijamml_710011547 A METHOD OF FINDING IMAGE SIMILAR PATCHES BASED ON GRADIENT-COVARIANCE

More information

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Exploring Spatial Relationships for Knowledge Discovery in Spatial Norazwin Buang

More information

Maintaining Frequent Itemsets over High-Speed Data Streams

Maintaining Frequent Itemsets over High-Speed Data Streams Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,

More information

Outlier Detection Using Rough Set Theory

Outlier Detection Using Rough Set Theory Outlier Detection Using Rough Set Theory Feng Jiang 1,2, Yuefei Sui 1, and Cungen Cao 1 1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences,

More information

Heuristics for The Whitehead Minimization Problem

Heuristics for The Whitehead Minimization Problem Heuristics for The Whitehead Minimization Problem R.M. Haralick, A.D. Miasnikov and A.G. Myasnikov November 11, 2004 Abstract In this paper we discuss several heuristic strategies which allow one to solve

More information

Mining Correlated High-Utility Itemsets using the Bond Measure

Mining Correlated High-Utility Itemsets using the Bond Measure Mining Correlated High-Utility Itemsets using the Bond Measure Philippe Fournier-Viger 1, Jerry Chun-Wei Lin 2, Tai Dinh 3, Hoai Bac Le 4 1 School of Natural Sciences and Humanities, Harbin Institute of

More information

Estimating Dominance Norms of Multiple Data Streams Graham Cormode Joint work with S. Muthukrishnan

Estimating Dominance Norms of Multiple Data Streams Graham Cormode Joint work with S. Muthukrishnan Estimating Dominance Norms of Multiple Data Streams Graham Cormode graham@dimacs.rutgers.edu Joint work with S. Muthukrishnan Data Stream Phenomenon Data is being produced faster than our ability to process

More information

ANÁLISE DOS DADOS. Daniela Barreiro Claro

ANÁLISE DOS DADOS. Daniela Barreiro Claro ANÁLISE DOS DADOS Daniela Barreiro Claro Outline Data types Graphical Analysis Proimity measures Prof. Daniela Barreiro Claro Types of Data Sets Record Ordered Relational records Video data: sequence of

More information

130 Important Questions for XI

130 Important Questions for XI 130 Important Questions for XI E T V A 1 130 Important Questions for XI PREFACE Have you ever seen a plane taking off from a runway and going up and up, and crossing the clouds but just think again that

More information

Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data

Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data 999 Prentice-Hall, Inc. Chap. 9 - Chapter Topics Comparing Two Independent Samples: Z Test for the Difference

More information

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,

More information

Searching Dimension Incomplete Databases

Searching Dimension Incomplete Databases IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 6, NO., JANUARY 3 Searching Dimension Incomplete Databases Wei Cheng, Xiaoming Jin, Jian-Tao Sun, Xuemin Lin, Xiang Zhang, and Wei Wang Abstract

More information

Prime Analysis in Binary

Prime Analysis in Binary Prime Analysis in Binary Brandynne Cho Saint Mary s College of California September 17th, 2012 The best number is 73. [...] 73 is the 21st prime number. Its mirror, 37, is the 12th, and its mirror, 21,

More information

Part 1: Hashing and Its Many Applications

Part 1: Hashing and Its Many Applications 1 Part 1: Hashing and Its Many Applications Sid C-K Chau Chi-Kin.Chau@cl.cam.ac.u http://www.cl.cam.ac.u/~cc25/teaching Why Randomized Algorithms? 2 Randomized Algorithms are algorithms that mae random

More information

Rare Event Discovery And Event Change Point In Biological Data Stream

Rare Event Discovery And Event Change Point In Biological Data Stream Rare Event Discovery And Event Change Point In Biological Data Stream T. Jagadeeswari 1 M.Tech(CSE) MISTE, B. Mahalakshmi 2 M.Tech(CSE)MISTE, N. Anusha 3 M.Tech(CSE) Department of Computer Science and

More information

Algorithms for Characterization and Trend Detection in Spatial Databases

Algorithms for Characterization and Trend Detection in Spatial Databases Published in Proceedings of 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) Algorithms for Characterization and Trend Detection in Spatial Databases Martin Ester, Alexander

More information

User-Driven Ranking for Measuring the Interestingness of Knowledge Patterns

User-Driven Ranking for Measuring the Interestingness of Knowledge Patterns User-Driven Ranking for Measuring the Interestingness of Knowledge s M. Baumgarten Faculty of Informatics, University of Ulster, Newtownabbey, BT37 QB, UK A.G. Büchner Faculty of Informatics, University

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

CS570 Data Mining. Anomaly Detection. Li Xiong. Slide credits: Tan, Steinbach, Kumar Jiawei Han and Micheline Kamber.

CS570 Data Mining. Anomaly Detection. Li Xiong. Slide credits: Tan, Steinbach, Kumar Jiawei Han and Micheline Kamber. CS570 Data Mining Anomaly Detection Li Xiong Slide credits: Tan, Steinbach, Kumar Jiawei Han and Micheline Kamber April 3, 2011 1 Anomaly Detection Anomaly is a pattern in the data that does not conform

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements .. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related

More information

CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication

CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform

More information

Session-Based Queueing Systems

Session-Based Queueing Systems Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the

More information

Multi-scale anomaly detection algorithm based on infrequent pattern of time series

Multi-scale anomaly detection algorithm based on infrequent pattern of time series Journal of Computational and Applied Mathematics 214 (2008) 227 237 www.elsevier.com/locate/cam Multi-scale anomaly detection algorithm based on infrequent pattern of time series Xiao-yun Chen, Yan-yan

More information

Analysis of Variance and Co-variance. By Manza Ramesh

Analysis of Variance and Co-variance. By Manza Ramesh Analysis of Variance and Co-variance By Manza Ramesh Contents Analysis of Variance (ANOVA) What is ANOVA? The Basic Principle of ANOVA ANOVA Technique Setting up Analysis of Variance Table Short-cut Method

More information

Hierarchies of sustainability in a catchment

Hierarchies of sustainability in a catchment Sustainable Development and Planning IV, Vol. 2 635 Hierarchies of sustainability in a catchment N. Dunstan School of Science and Technology, University of New England, Australia Abstract This paper investigates

More information

Approximate counting: count-min data structure. Problem definition

Approximate counting: count-min data structure. Problem definition Approximate counting: count-min data structure G. Cormode and S. Muthukrishhan: An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms 55 (2005) 58-75. Problem

More information

Statistics 3 WEDNESDAY 21 MAY 2008

Statistics 3 WEDNESDAY 21 MAY 2008 ADVANCED GCE 4768/01 MATHEMATICS (MEI) Statistics 3 WEDNESDAY 1 MAY 008 Additional materials: Answer Booklet (8 pages) Graph paper MEI Examination Formulae and Tables (MF) Afternoon Time: 1 hour 30 minutes

More information

Mining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany

Mining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany Mining Rank Data Sascha Henzgen and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {sascha.henzgen,eyke}@upb.de Abstract. This paper addresses the problem of mining rank

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 05 Sequential Pattern Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

SUFFIX TREE. SYNONYMS Compact suffix trie

SUFFIX TREE. SYNONYMS Compact suffix trie SUFFIX TREE Maxime Crochemore King s College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix

More information

On Multi-Class Cost-Sensitive Learning

On Multi-Class Cost-Sensitive Learning On Multi-Class Cost-Sensitive Learning Zhi-Hua Zhou and Xu-Ying Liu National Laboratory for Novel Software Technology Nanjing University, Nanjing 210093, China {zhouzh, liuxy}@lamda.nju.edu.cn Abstract

More information

Real-time Sentiment-Based Anomaly Detection in Twitter Data Streams

Real-time Sentiment-Based Anomaly Detection in Twitter Data Streams Real-time Sentiment-Based Anomaly Detection in Twitter Data Streams Khantil Patel, Orland Hoeber, and Howard J. Hamilton Department of Computer Science University of Regina, Canada patel26k@uregina.ca,

More information

Scalable Hierarchical Recommendations Using Spatial Autocorrelation

Scalable Hierarchical Recommendations Using Spatial Autocorrelation Scalable Hierarchical Recommendations Using Spatial Autocorrelation Ayushi Dalmia, Joydeep Das, Prosenjit Gupta, Subhashis Majumder, Debarshi Dutta Ayushi Dalmia, JoydeepScalable Das, Prosenjit Hierarchical

More information

Constructing comprehensive summaries of large event sequences

Constructing comprehensive summaries of large event sequences Constructing comprehensive summaries of large event sequences JERRY KIERNAN IBM Silicon Valley Lab and EVIMARIA TERZI IBM Almaden Research Center Event sequences capture system and user activity over time.

More information

Comprehensive Evaluation of Social Benefits of Mineral Resources Development in Ordos Basin

Comprehensive Evaluation of Social Benefits of Mineral Resources Development in Ordos Basin Studies in Sociology of Science Vol. 4, No. 1, 2013, pp. 25-29 DOI:10.3968/j.sss.1923018420130401.2909 ISSN 1923-0176 [Print] ISSN 1923-0184 [Online] www.cscanada.net www.cscanada.org Comprehensive Evaluation

More information

CPT+: A Compact Model for Accurate Sequence Prediction

CPT+: A Compact Model for Accurate Sequence Prediction CPT+: A Compact Model for Accurate Sequence Prediction Ted Gueniche 1, Philippe Fournier-Viger 1, Rajeev Raman 2, Vincent S. Tseng 3 1 University of Moncton, Canada 2 University of Leicester, UK 3 National

More information

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Anomaly Detection for the CERN Large Hadron Collider injection magnets Anomaly Detection for the CERN Large Hadron Collider injection magnets Armin Halilovic KU Leuven - Department of Computer Science In cooperation with CERN 2018-07-27 0 Outline 1 Context 2 Data 3 Preprocessing

More information

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional -Series Data Xiaolei Li, Jiawei Han University of Illinois at Urbana-Champaign VLDB 2007 1 Series Data Many applications produce time series

More information

CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication

CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication CPSC 518 Introduction to Computer Algebra Asymptotically Fast Integer Multiplication 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform polynomial multiplication

More information

Models, Data, Learning Problems

Models, Data, Learning Problems Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Models, Data, Learning Problems Tobias Scheffer Overview Types of learning problems: Supervised Learning (Classification, Regression,

More information

Compression in the Space of Permutations

Compression in the Space of Permutations Compression in the Space of Permutations Da Wang Arya Mazumdar Gregory Wornell EECS Dept. ECE Dept. Massachusetts Inst. of Technology Cambridge, MA 02139, USA {dawang,gww}@mit.edu University of Minnesota

More information