Scalability Evaluation of Big Data Processing Services in Clouds


Bench 2018@Seattle
Scalability Evaluation of Big Data Processing Services in Clouds
Wei Huang 1,2, Congfeng Jiang 1,2, Zujie Ren 1,2, Huayu Si 1,2, Jian Wan 3
1 Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, Hangzhou 310018, China
2 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
3 Department of Software Engineering, Zhejiang University of Science and Technology, Hangzhou, China
2018/12/29

Outline
Introduction
Related Work
Experiment and Analysis
Implications

Introduction
Typical examples of cloud-based big data processing services include Amazon EMR, Microsoft Azure HDInsight, and AliCloud E-MapReduce.
Among the various cloud-based data processing services, how to scale the system is still challenging.
How should the scalability of a big data processing system be evaluated?
Given a group of workloads, should users scale up or scale out their deployed cluster? That is, how should they select the cluster configuration, or rent a pre-configured big data processing platform, for better performance?

Related Work
Big data benchmarks: CloudSuite, BigDataBench, HiBench
Some research efforts have evaluated big data systems.
A comparison of scalability across different service providers is still missing.

Our Work
We proposed an evaluation model for the scalability of big data processing systems in clouds.
We evaluated the performance of Hadoop and Spark on AliCloud's and Baidu Cloud's big data processing platforms in two dimensions: scale-out and scale-up configurations.

Evaluation model
Speedup measurement: $S_n$ represents the speed-up ratio: $S_n = M_1 / M_n$ (i.e., 1 node over multiple nodes)
Scalability can be divided into three categories:
1. Linear acceleration
2. Sub-linear acceleration
3. Super-linear acceleration

Evaluation model
Acceleration classification
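The speed-up ratio and the three acceleration categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not define M, so the sketch takes M_n to be the execution time on n nodes; the 5% tolerance used to decide what counts as "linear" and the sample timings are hypothetical and do not come from the experiments.

```python
from typing import Dict


def speedup(times: Dict[int, float]) -> Dict[int, float]:
    """Return S_n = M_1 / M_n for every node count n.

    Assumption: M_n is the execution time measured on n nodes.
    """
    base = times[1]
    return {n: base / t for n, t in sorted(times.items())}


def classify(n: int, s: float, tol: float = 0.05) -> str:
    """Compare S_n with the ideal value n.

    The relative tolerance `tol` is a hypothetical choice, not from the slides.
    """
    if abs(s - n) <= tol * n:
        return "linear"
    return "super-linear" if s > n else "sub-linear"


# Hypothetical execution times in seconds (not measured data).
times = {1: 800.0, 2: 400.0, 4: 250.0, 8: 90.0}
ratios = speedup(times)
labels = {n: classify(n, s) for n, s in ratios.items()}

for n in ratios:
    print(n, round(ratios[n], 2), labels[n])
```

With these made-up timings, the 4-node run falls short of the ideal ratio of 4 and the 8-node run exceeds the ideal ratio of 8, so the two land in different categories.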

Evaluation model
Fit the speed-up ratio curve: $S = f(p)$
Measure the scalability of the system by: $Q = \int f(p)\,dp$
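One possible way to carry out the fit-then-integrate step is sketched below. The slides do not state the functional form of f or the integration limits, so this sketch assumes a power law f(p) = a * p^b fitted by least squares in log-log space, and integrates over the measured range of node counts. The data points are hypothetical.

```python
import math


def fit_power_law(p, s):
    """Least-squares fit of log s = log a + b * log p; returns (a, b).

    The power-law form is an assumption made for this sketch.
    """
    xs = [math.log(v) for v in p]
    ys = [math.log(v) for v in s]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def scalability(a, b, p_min, p_max):
    """Q = integral of a * p**b dp from p_min to p_max (requires b != -1)."""
    return a / (b + 1) * (p_max ** (b + 1) - p_min ** (b + 1))


# Hypothetical (node count, speed-up ratio) pairs, not measured data.
p = [1, 2, 4, 8, 16]
s = [1.0, 1.9, 3.6, 6.5, 11.0]

a, b = fit_power_law(p, s)
q = scalability(a, b, min(p), max(p))

# Reference value: ideal linear speed-up f(p) = p over the same range.
q_ideal = scalability(1.0, 1.0, min(p), max(p))

print(f"a={a:.3f} b={b:.3f} Q={q:.2f} Q_ideal={q_ideal:.2f}")
```

Dividing Q by the value for ideal linear speed-up gives a single number that can be compared across platforms, provided the same node range is used for each.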

Experiment and Analysis
Platforms: AliCloud E-MapReduce, Baidu Cloud MRS
Workloads: TeraSort, WordCount
System configuration for the host

Experiment and Analysis
Scale-out on AliCloud (TeraSort)
[Figures: AliCloud TeraSort execution time; AliCloud TeraSort speed-up ratio]

Experiment and Analysis
Scale-out on AliCloud (WordCount)
[Figures: WordCount execution time; WordCount speed-up ratio]

Experiment and Analysis
Scale-out on Baidu MRS
[Figures: TeraSort execution time; TeraSort speed-up ratio]

Experiment and Analysis
Summary of the scale-out comparison:
1. In the comparison of speed-up ratios on AliCloud, Spark scales better than Hadoop with fewer than 8 nodes, but worse than Hadoop with more than 8 nodes.
2. When Hadoop and Spark scale out to 16 nodes, scale-out performance is good, and Hadoop's overall performance (execution time) is better than Spark's on AliCloud.

Experiment and Analysis
Scale-up experiment (only on AliCloud)

Experiment and Analysis
Execution time for scale-up configurations

Experiment and Analysis
Comparison between scale-out and scale-up

Implication #1
The scalability of Hadoop and Spark is good enough on AliCloud and Baidu Cloud.
Hadoop's scalability is slightly better than Spark's on AliCloud.
Spark is faster than Hadoop on AliCloud under the WordCount workload.
Hadoop scales better on Baidu Cloud than on AliCloud.

Implication #2
For Hadoop, scale-up is better than scale-out in terms of processing performance (execution time). However, this is not true for Spark, which means that scaling up a Spark cluster may not achieve the expected performance improvement.
A dirty little secret here is that scale-out is not more expensive than scale-up.
The results presented here can serve as suggestions for cloud service providers to design more scalable big data processing services and avoid losing customers.

Conclusions
Different big data processing systems have different scalability.
Users should choose between scale-out and scale-up wisely.
Cloud service providers can do more to provide more scalable big data processing services.

Thanks!