Detection of Highly Correlated Live Data Streams

BIRTE '17
Detection of Highly Correlated Live Data Streams
R. Alseghayer, Daniel Petrov, P.K. Chrysanthis, M. Sharaf, A. Labrinidis
University of Pittsburgh, The University of Queensland

Motivation
[Figure: repeated monitoring tuples of the form <CPU, Mem, Net1, Net2, Net3>]

System Model
tuple t = (timestamp, value)
[Figure labels: set of data streams, micro-batch, sliding window, interval, deadline]

Problem Definition
For each micro-batch B of a set of data streams DS with an arrival interval I and deadline d, detect the pairs from DS that each have at least A correlated sliding windows, i.e., windows whose Pearson Correlation Coefficient (PCC) meets the threshold τ, by the deadline d.
Challenges:
- deadline d = interval I
- number of pairs ~ $|DS|^2$
- data produced at high velocity, in real time
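To make the problem statement concrete, here is a minimal brute-force sketch in Python that ignores the deadline entirely. It is not the authors' method. The function names, the dict-of-lists input format, and the reading of "correlated" as PCC >= τ are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Two-pass Pearson correlation of two equal-length sequences."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant window counts as uncorrelated
    return cov / sqrt(vx * vy)


def correlated_pairs(batch, w, tau, A):
    """Return the stream pairs with at least A windows whose PCC >= tau.

    batch: dict mapping a stream name to its list of values in one
    micro-batch (hypothetical format; all lists have the same length).
    """
    result = []
    for a, b in combinations(sorted(batch), 2):
        xs, ys = batch[a], batch[b]
        hits = sum(
            1
            for start in range(len(xs) - w + 1)
            if pearson(xs[start:start + w], ys[start:start + w]) >= tau
        )
        if hits >= A:
            result.append((a, b))
    return result
```

Every pair is examined at every window position, so the work grows with the square of the number of streams, which is the cost the challenges above point to.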

Goal and Approach
Goal: maximize the number of correlated pairs identified within a deadline.
DCS_Precision = (# identified correlated pairs) / (total # correlated pairs)
Approach:
- scheduling principles
- early termination
- pruning
- caching

Outline
- Background
- Detection of Correlated Streams (DCS) mode
- Experimental Evaluation
- Conclusions

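Pearson Correlation Coefficient (PCC)
Two-pass:
$$corr(x, y) = \frac{\sum_{i=1}^{m} (x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y}$$
One-pass (5 sufficient statistics):
$$corr(x, y) = \frac{cov}{\sqrt{varx \cdot vary}}$$
where
$$cov = sumprodxy - \frac{sumx \cdot sumy}{m}, \qquad varx = sumxx - \frac{(sumx)^2}{m}, \qquad vary = sumyy - \frac{(sumy)^2}{m}$$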
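The one-pass form can be sketched in Python as below. This is an illustrative sketch, not the authors' code. It assumes the denominator is the square root of varx times vary, and the function name and the choice to return 0.0 for a constant input are assumptions.

```python
from math import sqrt


def pcc_one_pass(xs, ys):
    """Pearson correlation from five running sums, in a single data pass."""
    m = 0
    sumx = sumy = sumxx = sumyy = sumprodxy = 0.0
    for x, y in zip(xs, ys):
        m += 1
        sumx += x
        sumy += y
        sumxx += x * x
        sumyy += y * y
        sumprodxy += x * y
    cov = sumprodxy - sumx * sumy / m
    varx = sumxx - sumx ** 2 / m
    vary = sumyy - sumy ** 2 / m
    if varx <= 0 or vary <= 0:
        return 0.0  # assumption: constant input is reported as uncorrelated
    return cov / sqrt(varx * vary)
```

For example, `pcc_one_pass([1, 2, 3, 4], [2, 4, 6, 8])` yields a value equal to 1 up to floating-point error, since the second series is an exact multiple of the first.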

Basic Algorithms
Caching: 1 data pass with incremental computation of PCC
iBRAID*: round-robin scheduler, fair
PriCe*: priority scheduler, informed; priority function $Pr = corr(M) / C_{totalexp}$
* D. Petrov et al., Interactive Exploration of Correlated Time Series, ExploreDB '17
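The difference between a fair round-robin scheduler and a priority scheduler can be illustrated with a small Python sketch. It is a generic illustration, not an implementation of iBRAID or PriCe. The `work` and `priority` dictionaries are hypothetical inputs, and the priorities here are static numbers rather than the priority function from the slide.

```python
from collections import deque


def round_robin(work):
    """Serve pairs in rotation, one window computation per turn.

    work: dict mapping a pair to the number of window computations it
    still needs (hypothetical format; every count is at least 1).
    """
    queue = deque(work.items())
    order = []
    while queue:
        pair, remaining = queue.popleft()
        order.append(pair)
        if remaining > 1:
            queue.append((pair, remaining - 1))
    return order


def by_priority(work, priority):
    """Always serve the unfinished pair with the highest priority.

    priority: dict mapping a pair to a static score (an assumption made
    for this sketch; a real scheduler would update it as results arrive).
    """
    remaining = dict(work)
    order = []
    while remaining:
        pair = max(remaining, key=lambda p: priority[p])
        order.append(pair)
        remaining[pair] -= 1
        if remaining[pair] == 0:
            del remaining[pair]
    return order
```

Round-robin interleaves all pairs evenly, while the priority variant finishes the most promising pair first, which matters when a deadline cuts processing short.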

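DCS Mode
Early termination: stop processing a pair once it has A correlated windows.
Pruning: drop a pair when A - correlatedWindows > (I - slidingWindowPosition), i.e., when it can no longer reach A correlated windows in the remaining sliding window positions.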
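The two checks can be sketched as a single Python function. This is a hedged illustration of the conditions on the slide, not the authors' implementation. The function name, the returned labels, and the use of `total_positions` as a stand-in for I are assumptions.

```python
def pair_status(correlated_windows, position, A, total_positions):
    """Classify a pair after `position` sliding windows have been examined.

    total_positions plays the role of I in the pruning condition
    (an assumption about how the interval is counted).
    """
    if correlated_windows >= A:
        return "terminate"  # target reached; no further work is needed
    remaining = total_positions - position
    if A - correlated_windows > remaining:
        return "prune"  # target can no longer be reached
    return "continue"
```

With A = 112 and 900 positions, a pair that has 10 correlated windows at position 800 still needs 102 but has only 100 positions left, so it is pruned.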

Start Phase (S)

Cold Start Phase (S)

Warm Start Phase

Warm High Phase
Scheduler: Promising, Non-Promising [figure]

Warm Low Phase
Scheduler: Promising, Non-Promising [figure]

Evaluation Metrics
Execution cost: # of operations to produce a result
Precision (optimization criterion): DCS_Precision = (# identified correlated pairs) / (total # correlated pairs)
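As a small worked example, the precision metric can be computed as below. The function name is hypothetical. Counting only identified pairs that are truly correlated, and returning 1.0 when no correlated pairs exist, are assumptions not stated in the source.

```python
def dcs_precision(identified, all_correlated):
    """Fraction of the truly correlated pairs that were identified."""
    identified = set(identified)
    all_correlated = set(all_correlated)
    if not all_correlated:
        return 1.0  # assumption: nothing to find counts as full precision
    return len(identified & all_correlated) / len(all_correlated)
```

If two pairs are correlated in total and one of them is identified by the deadline, the metric is 0.5.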

Dataset
Yahoo Financial Historical Data
- 53 companies on the NYSE for the last 28 years
- each has 6 data streams (318 in total)
- each of length 7100 tuples
- data granularity is a day
Values for each company: opening price, closing price, highest price, lowest price, amount of shares traded, and the adjusted close for that day (labeled on the slide as CPU, Memory, Net1, Net2, Net3, and Net4, respectively).

Parameters

Parameter                             Value(s)
PCC threshold (τ)                     [0.75, 0.90]
Target # of correlated windows (A)    [112, 225, 450]
Interval (I)                          900 tuples (180 seconds)
Deadline (d)                          [25%, 50%, 75%] * Interval
Window length (w)                     8
# of data streams                     72
# of micro-batches                    4

Exp1 (A=450, d=, τ = 0.75)
[Figure; annotations: "35% decrease", "Baseline"; bar values: 1812, 1822, 1822, 1787, 429, 580, 236, 234]

Exp2 (A=112, d=25%, τ = 0.9)
[Figure; annotation: "5x"]

Exp2 (A=450, d=25%, τ = 0.9)
[Figure]

Exp3 (A=112, d=50%, τ = 0.9)
[Figure; annotation: "30%"]

Exp3 (A=112, d=50%, τ = 0.75)
[Figure]

Conclusions
We proposed the DCS mode of operation, which
- combines scheduling, early termination, pruning, and caching
- avoids unnecessary computations and produces at least twice as many results at reduced cost
Future work: investigate new methods (exploitation vs. diversity), perform sensitivity analysis, and experiment with more datasets.

5 Sufficient Statistics
$$sumx = \sum_{i=1}^{w} x_i, \qquad sumxx = \sum_{i=1}^{w} x_i^2,$$
$$sumy = \sum_{i=1}^{w} y_i, \qquad sumyy = \sum_{i=1}^{w} y_i^2,$$
$$sumprodxy = \sum_{i=1}^{w} x_i y_i$$
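Because each statistic is a plain sum over the window, sliding the window by one tuple only requires adding the new tuple's contribution and subtracting the oldest one. The Python sketch below illustrates that idea under stated assumptions. The class name, its methods, and returning `None` for an incomplete or constant window are hypothetical choices, not the authors' implementation.

```python
from collections import deque
from math import sqrt


class SlidingPCC:
    """Keeps the five sufficient statistics for a window of length w."""

    def __init__(self, w):
        self.w = w
        self.window = deque()
        self.sumx = self.sumy = 0.0
        self.sumxx = self.sumyy = self.sumprodxy = 0.0

    def _apply(self, x, y, sign):
        # Add (sign=+1) or remove (sign=-1) one tuple's contribution.
        self.sumx += sign * x
        self.sumy += sign * y
        self.sumxx += sign * x * x
        self.sumyy += sign * y * y
        self.sumprodxy += sign * x * y

    def push(self, x, y):
        """Append the newest values and evict the oldest if the window is full."""
        self.window.append((x, y))
        self._apply(x, y, +1)
        if len(self.window) > self.w:
            old_x, old_y = self.window.popleft()
            self._apply(old_x, old_y, -1)

    def corr(self):
        """PCC of the current window, or None if it is incomplete or constant."""
        if len(self.window) < self.w:
            return None
        cov = self.sumprodxy - self.sumx * self.sumy / self.w
        varx = self.sumxx - self.sumx ** 2 / self.w
        vary = self.sumyy - self.sumy ** 2 / self.w
        if varx <= 0 or vary <= 0:
            return None
        return cov / sqrt(varx * vary)
```

Each update costs a constant number of operations regardless of w. Over very long streams the running sums can accumulate floating-point error, so a practical version might recompute them periodically.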