STATISTICAL PERFORMANCE PROVISIONING AND ENERGY EFFICIENCY IN DISTRIBUTED COMPUTING SYSTEMS
Nikzad Babaii Rizvandi
Supervisors: Prof. Albert Y. Zomaya, Prof. Aruna Seneviratne

OUTLINE
Introduction: background and our research direction
Part 1 - Energy efficiency in hardware using DVFS
  Background
  Multiple frequencies selection algorithm
  Observations and mathematical proofs
Part 2 - Statistical performance provisioning for jobs on the MapReduce framework
  Background
  Statistical pattern matching for finding similarity between MapReduce applications (grouping/classification)
  Statistical learning for performance analysis and provisioning of MapReduce jobs (modeling)

INTRODUCTION
World data centres used* 1% of the total world electricity consumption in 2005, equivalent to about seventeen 1000 MW power plants, with growth of around 76% per five-year period (2000-2005, 2005-2010). Improper usage of the 44 million servers in the world** produces 11.8 million tons of carbon dioxide, the equivalent of 2.1 million cars.
* Koomey, J.G., "Worldwide electricity used in data centres," Environ. Res. Lett., 2008
** Alliance to Save Energy (ASE) and 1E, "Server Energy and Efficiency Report," 2009

INTRODUCTION
U.S. data centres used 61 billion kWh of electricity in 2006**, 1.5% of all U.S. electricity consumption and double the 2000 figure, with an estimated growth of 12% per year. Improper usage of servers in the U.S. produces 3.17 million tons of carbon dioxide, equal to 580,678 cars.
** Alliance to Save Energy (ASE) and 1E, "Server Energy and Efficiency Report," 2009

ENERGY CONSUMPTION IN DISTRIBUTED SYSTEMS

OUR RESEARCH

PART 1 - ENERGY EFFICIENCY IN DISTRIBUTED COMPUTING SYSTEMS USING DVFS

BACKGROUND - Dynamic Voltage Frequency Scaling
In CMOS circuits: E = k f v^2 t, with v proportional to f, so E is proportional to f^3 t.
Example: halving the frequency, f' = f/2, doubles the execution time, t' = 2t, so
E' = k f'^3 t' = k (f^3 / 8)(2t) = E / 4.
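The scaling above can be checked numerically; a minimal sketch, assuming voltage scales linearly with frequency (all constants below are illustrative, not measured values):

```python
# Sketch of the CMOS dynamic-energy relation E = k * f * v**2 * t.
# Assuming v scales linearly with f, energy per task is proportional
# to f**3 * t, so halving f (and v) while doubling t quarters E.

def dynamic_energy(k, f, v, t):
    """Dynamic energy of a CMOS circuit running at (f, v) for time t."""
    return k * f * v**2 * t

k, f, v, t = 1.0, 2.0e9, 1.2, 1.0   # illustrative constants
e_full = dynamic_energy(k, f, v, t)
# Halve frequency and (proportionally) voltage; execution time doubles.
e_half = dynamic_energy(k, f / 2, v / 2, 2 * t)
print(e_half / e_full)  # 0.25
```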

BACKGROUND - Dynamic Voltage Frequency Scaling
Each task has a maximum time restriction. In a hypothetical world, processors have a continuous range of frequencies. For the k-th task, the optimum continuous frequency f_optimum^(k) is the frequency that exactly uses the task's maximum time restriction.

BACKGROUND - Dynamic Voltage Frequency Scaling
In reality, processors have a discrete set of frequencies f_1 > ... > f_N. The best single frequency is the one slightly above f_optimum^(k).

BACKGROUND - Energy-Aware Task Scheduling Using DVFS
Research question: what frequency selection schedules tasks on a set of processors so as to (1) meet the tasks' time restrictions and (2) consume less energy?
Energy-aware scheduling scheme: optimize a new cost function that includes energy:
Cost function = f(makespan, energy), e.g., Cost function = alpha * Makespan + beta * Energy

BACKGROUND - Energy-Aware Task Scheduling Using DVFS
Slack reclamation: on top of the already-scheduled tasks, any slack on a processor is used to reduce the speed of the running task.

OUR WORK - Multiple Frequency Selection (MFS-DVFS) for Slack Reclamation
Current scheme: use one frequency per task (the RDVFS algorithm).
Our idea: use a linear combination of all of the processor's frequencies for each task.

OUR WORK - Optimization Problem in MFS-DVFS
In the literature: P_d(f, v) = k f v^2.
For the k-th task in the schedule:
Minimize E^(k) = sum_{i=1..N} t_i^(k) P_d(f_i, v_i) + P_Idle (T^(k) - sum_{i=1..N} t_i^(k))
subject to:
1. sum_{i=1..N} t_i^(k) f_i = K^(k)
2. sum_{i=1..N} t_i^(k) <= T^(k)
3. t_i^(k) >= 0, for i = 1, 2, ..., N

MFS-DVFS Algorithm
Input: the scheduled tasks on a set of P processors
1. For the k-th task A^(k) scheduled on processor P_j:
2.   Solve the optimization problem by linear programming.
3. end for
4. return the voltages and frequencies for optimal execution of each task
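A dependency-free sketch of the per-task step: instead of calling an LP solver, it enumerates the LP's vertex candidates (a single frequency with idle slack, or two frequencies that fill the deadline exactly), which is equivalent for this small problem since an LP optimum lies at a vertex. The frequency set, power numbers, and task parameters are made up; the function name is ours.

```python
# Vertex enumeration for the per-task MFS-DVFS linear program:
#   minimize sum t_i * P_d(f_i) + P_idle * (T - sum t_i)
#   s.t. sum t_i * f_i = K, sum t_i <= T, t_i >= 0.
# At a vertex, at most two t_i are nonzero.

def mfs_dvfs_task(freqs, powers, p_idle, K, T):
    """Return (best_energy, {freq: time}) for a task of K cycles, deadline T."""
    best_e, best_s = float("inf"), None
    n = len(freqs)
    # Vertex type 1: one frequency, remaining deadline spent idle.
    for i in range(n):
        t = K / freqs[i]
        if t <= T:
            e = t * powers[i] + p_idle * (T - t)
            if e < best_e:
                best_e, best_s = e, {freqs[i]: t}
    # Vertex type 2: two frequencies filling the deadline exactly.
    for i in range(n):
        for j in range(i + 1, n):
            fi, fj = freqs[i], freqs[j]
            ti = (T * fj - K) / (fj - fi)
            tj = (K - T * fi) / (fj - fi)
            if ti >= 0 and tj >= 0:
                e = ti * powers[i] + tj * powers[j]
                if e < best_e:
                    best_e, best_s = e, {fi: ti, fj: tj}
    return best_e, best_s

# Hypothetical processor: P_d ~ f**3 (normalized), small idle power.
freqs = [0.6, 0.8, 1.0]
powers = [f**3 for f in freqs]
energy, schedule = mfs_dvfs_task(freqs, powers, p_idle=0.01, K=0.7, T=1.0)
print(energy, schedule)
```

With these numbers the optimum splits the task between 0.6 and 0.8, the two frequencies surrounding f_optimum = K/T = 0.7, which previews the observations below.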

OBSERVATION - Mathematical Proofs*
In the literature: P_d(f, v) = k f v^2. Generalization: any P_d(f, v) such that if (f_i, v_i) < (f_j, v_j) then P_d(f_i, v_i) < P_d(f_j, v_j).
Observation #1: In a hypothetical world, the continuous frequency that exactly uses the maximum time restriction of the k-th task gives the optimum energy saving (f_optimum^(k)).
Observation #2: For the k-th task, at most two voltage-frequency pairs are involved in the optimal energy consumption (f_i < f_j). Proof: Theorems 1, 2.
* N. B. Rizvandi, et al., "Some observations on optimal frequency selection in DVFS-based energy consumption minimization," J. Parallel Distrib. Comput. 71(8): 1154-1164 (2011)

OBSERVATION - Mathematical Proofs*
Observation #3: f_i < f_optimum^(k) < f_j. Proof: Lemmas 1, 2.
Observation #4: The times associated with these two frequencies are
t_i^(k) = (T^(k) f_j - K^(k)) / (f_j - f_i)
t_j^(k) = (K^(k) - T^(k) f_i) / (f_j - f_i)
Proof: Corollary 1.
* N. B. Rizvandi, et al., "Some observations on optimal frequency selection in DVFS-based energy consumption minimization," J. Parallel Distrib. Comput. 71(8): 1154-1164 (2011)

OBSERVATION - Mathematical Proofs*
Observation #5: The energy the processor consumes for this task with these two voltage-frequency pairs is
E^(k) = ((T^(k) f_j - K^(k)) / (f_j - f_i)) P_d(f_i, v_i) + ((K^(k) - T^(k) f_i) / (f_j - f_i)) P_d(f_j, v_j)
Proof: Corollary 2.
Observation #6: Using less time to execute the task results in more energy consumption. Proof: Theorem 3.
* N. B. Rizvandi, et al., "Some observations on optimal frequency selection in DVFS-based energy consumption minimization," J. Parallel Distrib. Comput. 71(8): 1154-1164 (2011)

OBSERVATION - Mathematical Proofs*
In a simplified version, with v proportional to f, P_d(f, v) = lambda f^3.
Observation #7: These two frequencies are neighbors, i.e., the two immediate frequencies around f_optimum^(k). Proof: Theorems 4, 5.
* N. B. Rizvandi, et al., "Some observations on optimal frequency selection in DVFS-based energy consumption minimization," J. Parallel Distrib. Comput. 71(8): 1154-1164 (2011)
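The closed-form split of Observations #4 and #5 can be checked numerically; a sketch with illustrative numbers, taking P_d = f^3 (the simplified model with lambda = 1):

```python
# Numeric check of Observations #3-#7: split the k-th task between the
# two discrete frequencies adjacent to f_optimum = K / T using the
# closed-form times, then confirm the cycle and deadline constraints.

def two_frequency_split(fi, fj, K, T):
    """Times (ti, tj) at neighbor frequencies fi < fj for K cycles, deadline T."""
    ti = (T * fj - K) / (fj - fi)
    tj = (K - T * fi) / (fj - fi)
    return ti, tj

K, T = 0.7, 1.0          # cycles and deadline (normalized units)
f_opt = K / T            # 0.7, which lies between the available 0.6 and 0.8
fi, fj = 0.6, 0.8        # immediate neighbors of f_opt
ti, tj = two_frequency_split(fi, fj, K, T)

assert abs(ti * fi + tj * fj - K) < 1e-9   # constraint 1: all cycles done
assert abs(ti + tj - T) < 1e-9             # deadline used exactly
energy = ti * fi**3 + tj * fj**3           # Observation #5 with P_d = f**3
print(ti, tj, energy)
```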

MFS-DVFS - New Algorithm
Input: the scheduled tasks on a set of P processors
1. For the k-th task A^(k) scheduled on processor P_j:
2.   Calculate f_optimum^(k).
3.   Select the neighbour frequencies in the processor's frequency set immediately below and above f_optimum^(k); these are f_RD^(k) and f_RD-1^(k).
4.   Calculate the associated times and energy consumption.
5.   Select the pair (f_RD^(k), f_RD-1^(k)) associated with the lowest energy for this task.
6. end for
return the individual frequency pair for execution of each task

EXPERIMENTAL RESULTS - Simulator
Simulation settings:
Schedulers: list scheduler; list scheduler with Longest Processing Time First (LPT); list scheduler with Shortest Processing Time First (SPT)
Processor power models: Transmeta Crusoe, Intel XScale, and two synthetic processors

EXPERIMENTAL RESULTS - Simulator
Task graphs (DAGs): random, LU-decomposition, Gauss-Jordan
Assumption: switching time between frequencies can be ignored compared to task execution time.
Experimental parameters:
# of tasks: 100, 200, 300, 400, 500
# of processors: 2, 4, 8, 16, 32

EXPERIMENTAL RESULTS - Results (1)

EXPERIMENTAL RESULTS - Results (2)

EXPERIMENTAL RESULTS - Results (3)
Experiment                    | Random tasks | Gauss-Jordan | LU-decomposition
RDVFS                         | 13.00%       | 0.1%         | 24.8%
MFS-DVFS                      | 14.40%       | 0.11%        | 27.0%
Optimum Continuous Frequency  | 14.84%       | 0.14%        | 27.81%

PART 2 - PERFORMANCE ANALYSIS AND PROVISIONING IN MAPREDUCE CLUSTERS

BACKGROUND - MapReduce
A parametric distributed processing framework for processing large data sets, introduced by Google in 2004. Typically used for distributed computing on clusters of computers, and widely used at Google, Yahoo, Facebook and LinkedIn. Hadoop is the best-known open-source implementation of MapReduce, developed by Apache.

MOTIVATION
Many users have job completion time goals, and there is a strong correlation between completion time and the values of the configuration parameters. Yet current service providers (e.g., Amazon Elastic MapReduce) offer no support: it is solely the user's responsibility to set values for these parameters.
Our research work:
Calculate similarity between MapReduce applications; similar applications are likely to show similar performance for the same values of the configuration parameters.
Estimate a function that models the dependency between completion time and configuration parameters by applying machine learning techniques to historical data.

CLASSIFICATION
Two MapReduce applications belong to the same group if they have similar CPU utilization patterns across several identical jobs. An identical job in two applications means they run with the same values for the configuration parameters and the same size of small input data.
Hypothesis: similar applications share the same optimal values for the configuration parameters, so we can obtain the optimal configuration values for one application and use them for the others in the same group.

CLASSIFICATION - Uncertainty
Uncertainty in the CPU utilization pattern: the variation in the value of each point of the CPU utilization pattern across identical jobs of an application.

CLASSIFICATION - Similarity (1)
High similarity is equal to low distance.
Current scheme:
Average the values of each point across patterns.
Apply Dynamic Time Warping (DTW) to the average CPU patterns, so the patterns become the same length.
Calculate the Euclidean distance between the two average patterns phi_avr and chi_avr:
sqrt( sum_{i=1..N} (phi_avr[i] - chi_avr[i])^2 ) <= r
where r is a predefined distance threshold.
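The baseline scheme above can be sketched as follows, assuming classical DTW with a full warping path; the helper names and sample CPU patterns are ours, not from the thesis:

```python
# Align two average CPU utilization patterns with dynamic time warping
# (DTW), then take the Euclidean distance over the aligned, equal-length
# pair and compare it against a predefined threshold r.

import math

def dtw_align(a, b):
    """Return the two sequences expanded along the optimal DTW path."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack the warping path from (n, m) to (1, 1).
    i, j, path = n, m, []
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(moves, key=lambda p: cost[p[0]][p[1]])
    path.append((0, 0))
    path.reverse()
    return [a[i] for i, _ in path], [b[j] for _, j in path]

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

phi = [0.1, 0.5, 0.9, 0.4]       # average CPU pattern of application 1
chi = [0.1, 0.5, 0.5, 0.9, 0.4]  # application 2: same shape, longer run
pa, pb = dtw_align(phi, chi)
r = 0.2                          # predefined distance threshold
print(euclidean(pa, pb) <= r)    # similar if within the threshold
```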

CLASSIFICATION - Similarity (2)
Our idea* comes from a computational finance background: each point in the pattern has a Gaussian distribution. After DTW between two patterns phi_u and chi_u, use the uncertain Euclidean distance
P( DST(phi_u, chi_u) = sqrt( sum_{i=1..N} (phi_u[i] - chi_u[i])^2 ) <= r ) >= tau
where phi_u[i] = N(mean(phi_u[i]), var(phi_u[i])) = phi_c[i] + e_phi[i], chi_u[j] = N(mean(chi_u[j]), var(chi_u[j])) = chi_c[j] + e_chi[j], and 0 < tau <= 1 is a predefined probability threshold.
* N. B. Rizvandi, et al., "A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications," Concurrency and Computation: Practice and Experience, 2012

CLASSIFICATION - Similarity (3)
Calculate r_boundary as
r_boundary^2 = sum_{i=1..N} E(D^2[i]) + r_boundary,norm * sqrt( sum_{i=1..N} Var(D^2[i]) )
r_boundary,norm = sqrt(2) * erf^-1(2 tau - 1)
If we choose r >= r_boundary, this guarantees that P( DST(phi_u, chi_u) <= r ) >= tau. So r_boundary is the minimum distance between two patterns with probability tau:
P( DST(phi_u, chi_u) <= r_boundary ) = tau
* N. B. Rizvandi, et al., "A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications," Concurrency and Computation: Practice and Experience, 2012
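A sketch of the r_boundary computation under the Gaussian approximation of the squared-distance sum. It uses the standard moments of a squared Gaussian, E(D^2) = mu^2 + sigma^2 and Var(D^2) = 2 sigma^4 + 4 mu^2 sigma^2, and the stdlib NormalDist quantile, which equals sqrt(2) * erf^-1(2 tau - 1); the per-point means and variances below are made up:

```python
# Approximate sum_i D^2[i] (D[i] = phi_u[i] - chi_u[i], Gaussian) as a
# normal variable and take its tau-quantile to obtain r_boundary.

import math
from statistics import NormalDist

def r_boundary(mean_phi, var_phi, mean_chi, var_chi, tau):
    """Minimum distance threshold r with P(DST(phi, chi) <= r) = tau."""
    e_d2, var_d2 = 0.0, 0.0
    for mp, vp, mc, vc in zip(mean_phi, var_phi, mean_chi, var_chi):
        mu = mp - mc                          # D[i] ~ N(mu, s2)
        s2 = vp + vc
        e_d2 += mu**2 + s2                    # E(D^2[i])
        var_d2 += 2 * s2**2 + 4 * mu**2 * s2  # Var(D^2[i])
    z = NormalDist().inv_cdf(tau)             # sqrt(2) * erfinv(2*tau - 1)
    return math.sqrt(e_d2 + z * math.sqrt(var_d2))

r = r_boundary([0.2, 0.5, 0.7], [0.01, 0.02, 0.01],
               [0.25, 0.45, 0.8], [0.01, 0.01, 0.02], tau=0.95)
print(r)
```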

CLASSIFICATION - Technique
* N. B. Rizvandi, et al., "A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications," Concurrency and Computation: Practice and Experience, 2012

EXPERIMENTAL RESULTS - Setting
Hadoop cluster settings: five dual-core servers; Xen Cloud Platform (XCP) with the Xen-API to measure performance statistics; 10 virtual machines.
Application settings: four legacy applications (WordCount, TeraSort, Exim Mainlog parsing, Distributed Grep); input data sizes 5 GB, 10 GB, 15 GB, 20 GB; # of map/reduce tasks 4, 8, 12, 16, 20, 24, 28, 32.
Total number of runs in our experiments: 8 x 8 x 5 x 4 x 10 x 4 = 51200.

EXPERIMENTAL RESULTS - Results (1)
r_boundary between the applications for tau = 0.95, for 5 GB of input data on 10 virtual nodes.

EXPERIMENTAL RESULTS - Results (2)
WordCount vs. Exim Mainlog parsing; TeraSort vs. Exim Mainlog parsing (Set-1 through Set-8): the size of input data vs. r_boundary for tau = 0.95, for 5 GB, 10 GB, 15 GB and 20 GB of input data on 10 virtual nodes.

HOW TO USE THIS TECHNIQUE
For frequently run jobs (e.g., indexing, sorting, searching), run the jobs with different values of the configuration parameters. For a new job, find the most similar application in the database (using this algorithm), then use its optimal parameter values to run the new job.


PERFORMANCE MODELLING AND PROVISIONING - Our Idea (1)
Profiling historical data: run applications for different values of the configuration parameters (# of map tasks, # of reduce tasks, input data size) and profile the completion time.
Model selection (completion time vs. parameters): polynomial regression has been widely used in the literature, but it estimates a global function that may not be accurate enough at many points, suffers from over-fitting and under-fitting, and needs user observation of the data to find a suitable polynomial fit function.
(Figure: cholesterol level (y-axis) vs. age (x-axis).)

PERFORMANCE MODELLING AND PROVISIONING - Our Idea (2)
We use a combination of K-NN and regression to improve accuracy: the K neighbours of a point are chosen, and a linear regression function is fitted to that data. This generally gives higher accuracy than global regression, and K can be chosen automatically.
(Figure: cholesterol level (y-axis) vs. age (x-axis).)
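The K-NN + regression idea can be sketched in a one-dimensional, pure-Python form; the history of (parameter, completion time) points is synthetic and the function name is ours:

```python
# K-NN + local linear regression: for a query configuration, take its
# K nearest historical runs and fit an ordinary least-squares line on
# just those neighbors, then evaluate the line at the query point.

def knn_linear_predict(history, x_query, k=3):
    """Predict y at x_query from the k nearest (x, y) pairs in history."""
    nearest = sorted(history, key=lambda p: abs(p[0] - x_query))[:k]
    n = len(nearest)
    sx = sum(x for x, _ in nearest)
    sy = sum(y for _, y in nearest)
    sxx = sum(x * x for x, _ in nearest)
    sxy = sum(x * y for x, y in nearest)
    denom = n * sxx - sx * sx
    if denom == 0:                      # all neighbors share the same x
        return sy / n
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope * x_query + intercept

# Synthetic profile: completion time (s) vs. number of map tasks.
history = [(4, 410), (8, 230), (12, 170), (16, 150), (20, 160), (24, 185)]
print(knn_linear_predict(history, x_query=14, k=3))
```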

PERFORMANCE MODELLING AND PROVISIONING - Our Idea (3)
Prediction accuracy, where t_exc^(i) is the measured completion time and t̂_exc^(i) the predicted one over M experiments:
Mean Absolute Percentage Error (MAPE):
MAPE = (1/M) sum_{i=1..M} |t_exc^(i) - t̂_exc^(i)| / t_exc^(i)
R^2 prediction accuracy:
R^2 = 1 - [ sum_{i=1..M} (t_exc^(i) - t̂_exc^(i))^2 ] / [ sum_{i=1..M} (t_exc^(i) - (1/M) sum_{r=1..M} t_exc^(r))^2 ]
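Both metrics are direct to implement; a sketch over a few invented (actual, predicted) completion-time pairs:

```python
# The two accuracy metrics from the slide: mean absolute percentage
# error and R^2 prediction accuracy, over M held-out experiments.

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """R^2: 1 - residual sum of squares / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1600, 1700, 1750, 1820]      # invented completion times (s)
predicted = [1580, 1690, 1800, 1790]
print(mape(actual, predicted), r2(actual, predicted))
```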

EXPERIMENTAL RESULTS - Setting
Hadoop cluster settings: five dual-core servers; SysStat to measure performance statistics; Hadoop 0.20.0.
Application settings: three applications (WordCount, TeraSort, Exim Mainlog parsing); input data sizes 3 GB, 6 GB, 8 GB, 10 GB; # of map/reduce tasks: 80 values randomly chosen between 4 and 100.
56 experiments were used for model estimation, the rest for model testing.

EXPERIMENTAL RESULTS - Results (1)
Completion time in TeraSort: actual data vs. predicted values (time in seconds vs. experiment ID).

EXPERIMENTAL RESULTS - Results (3)
Application           | MAPE  | R^2 prediction accuracy
WordCount             | 1.51% | 0.83
Exim MainLog parsing  | 3.1%  | 0.76
TeraSort              | 2.33% | 0.79

HOW TO USE THESE TECHNIQUES
For frequently run jobs (e.g., indexing, sorting, searching), run the jobs with different values of the configuration parameters. Fit a model (based on the second technique), calculate the optimal parameter values that minimize completion time, and put the application plus its optimal values into a database. For a new job, find the most similar application in the database (using the first technique), then use its optimal parameter values to run the new job. Reducing completion time indirectly reduces energy consumption.

References
1) Koomey, J.G., "Worldwide electricity used in data centres," Environ. Res. Lett., 2008
2) Alliance to Save Energy (ASE) and 1E, "Server Energy and Efficiency Report," 2009
3) Nikzad Babaii Rizvandi, et al., "Multiple Frequency Selection in DVFS-Enabled Processors to Minimize Energy Consumption," in Energy Efficient Distributed Computing, John Wiley, 2011
4) Nikzad Babaii Rizvandi, et al., "Some Observations on Optimal Frequency Selection in DVFS-based Energy Consumption Minimization in Clusters," Journal of Parallel and Distributed Computing (JPDC), 2011
5) Nikzad Babaii Rizvandi, et al., "Linear Combinations of DVFS-enabled Processor Frequencies to Modify the Energy-Aware Scheduling Algorithms," CCGrid 2010
6) Nikzad Babaii Rizvandi, et al., "A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications," Concurrency and Computation: Practice and Experience, 2012
7) Nikzad Babaii Rizvandi, et al., "Network Load Analysis and Provisioning for MapReduce Applications," PDCAT 2012
8) "Data-Intensive Workload Consolidation on Hadoop Distributed File System," Grid 2012
9) Nikzad Babaii Rizvandi, et al., "On Modelling and Prediction of Total CPU Utilization for MapReduce Applications," ICA3PP 2012