Bench 2018@Seattle
Scalability Evaluation of Big Data Processing Services in Clouds
Wei Huang 1,2, Congfeng Jiang 1,2, Zujie Ren 1,2, Huayou Si 1,2, Jian Wan 3
1 Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, Hangzhou 310018, China
2 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
3 Department of Software Engineering, Zhejiang University of Science and Technology, Hangzhou, China
2018/12/29
Outline
Introduction
Related Work
Experiment and Analysis
Implications
Introduction
Typical examples of cloud-based big data processing services include Amazon EMR, Microsoft Azure HDInsight, and AliCloud E-MapReduce.
Across the various cloud-based data processing services, how to scale the system is still a challenge.
How should the scalability of a big data processing system be evaluated?
Given a group of workloads, should users scale up or scale out their deployed cluster? That is, how should they select the cluster configuration, or rent a pre-configured big data processing platform, for better performance?
Related Work
Big data benchmarks: CloudSuite, BigDataBench, HiBench
Some research efforts have been made to evaluate big data systems.
A comparison of the scalability of different service providers is still missing.
Our Work
We proposed an evaluation model for the scalability of big data processing systems in clouds.
We evaluated the performance of Hadoop and Spark on AliCloud's and Baidu Cloud's big data processing platforms along two dimensions: scale-out and scale-up configurations.
Evaluation model
Speedup measurement: $S_p$ represents the speed-up ratio: $S_p = M_1 / M_p$ (i.e., 1 node over multiple nodes)
Scalability can be divided into three categories:
1. Linear acceleration
2. Sub-linear acceleration
3. Super-linear acceleration
Evaluation model
[Figure: Acceleration classification]
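The speed-up ratio and the three categories can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual convention that ideal linear speed-up on p nodes is S_p = p, the tolerance `tol` is a hypothetical parameter introduced here, and the execution times are made up for the example rather than taken from the experiments.

```python
def speedup(m_one, m_p):
    """Speed-up ratio S_p = M_1 / M_p (1-node measurement over p-node measurement)."""
    if m_one <= 0 or m_p <= 0:
        raise ValueError("measurements must be positive")
    return m_one / m_p


def classify(p, s_p, tol=0.05):
    """Label a speed-up relative to the ideal linear value S_p = p.

    tol is a hypothetical relative tolerance band (an assumption, not from
    the slides) so that small measurement noise still counts as linear.
    """
    if abs(s_p - p) <= tol * p:
        return "linear"
    return "super-linear" if s_p > p else "sub-linear"


# Made-up execution times in seconds, keyed by node count (illustrative only).
times = {1: 1000.0, 2: 600.0, 4: 250.0, 8: 100.0}

# Speed-up and category for every cluster size, relative to the 1-node run.
labels = {p: classify(p, speedup(times[1], t)) for p, t in times.items()}

for p, label in labels.items():
    print(p, round(speedup(times[1], times[p]), 2), label)
```

A tolerance band is one possible design choice; a strict comparison against p would label almost every real measurement as sub-linear or super-linear.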
Evaluation model
Fit the speed-up ratio curve: $S = f(p)$
Measure the scalability of the system by: $Q = \int f(p)\,dp$
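One way to turn the fitted curve into a single number is sketched below. The slides do not state the functional form of f or the integration bounds, so this sketch assumes a power law f(p) = a * p^b fitted by least squares in log-log space, and integrates from the smallest to the largest measured node count. The speed-up values are made up for illustration.

```python
import math


def fit_power_law(nodes, speedups):
    """Least-squares fit of S = a * p**b in log-log space; returns (a, b).

    The power-law form is an assumption of this sketch, not from the slides.
    """
    xs = [math.log(p) for p in nodes]
    ys = [math.log(s) for s in speedups]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def scalability_q(a, b, p_min, p_max):
    """Closed-form integral of a * p**b over [p_min, p_max] (assumed bounds)."""
    if math.isclose(b, -1.0):
        return a * math.log(p_max / p_min)
    return a * (p_max ** (b + 1) - p_min ** (b + 1)) / (b + 1)


# Made-up speed-up ratios per node count (illustrative only).
nodes = [1, 2, 4, 8, 16]
speedups = [1.0, 1.9, 3.6, 6.5, 11.0]

a, b = fit_power_law(nodes, speedups)
q = scalability_q(a, b, nodes[0], nodes[-1])

# Reference value for ideal linear speed-up, f(p) = p, over the same range.
q_ideal = scalability_q(1.0, 1.0, nodes[0], nodes[-1])

print(f"a={a:.3f} b={b:.3f} Q={q:.1f} Q_ideal={q_ideal:.1f}")
```

Comparing Q against the value for ideal linear speed-up over the same range gives a rough sense of how far a system falls short of, or exceeds, linear scaling.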
Experiment and Analysis
Platforms: AliCloud E-MapReduce, Baidu Cloud MRS
Workloads: TeraSort, WordCount
[Table: System configuration for the host]
Experiment and Analysis
Scale-out on AliCloud (TeraSort)
[Figures: AliCloud TeraSort execution time; AliCloud TeraSort speed-up ratio]
Experiment and Analysis
Scale-out on AliCloud (WordCount)
[Figures: WordCount execution time; WordCount speed-up ratio]
Experiment and Analysis
Scale-out on Baidu MRS
[Figures: TeraSort execution time; TeraSort speed-up ratio]
Experiment and Analysis
Summary of scale-out comparison:
1. In the speed-up ratio comparison on AliCloud, Spark scales better than Hadoop with fewer than 8 nodes, but worse than Hadoop with more than 8 nodes.
2. When Hadoop and Spark scale out to 16 nodes, the scale-out performance is good, and Hadoop's overall performance (execution time) is better than Spark's on AliCloud.
Experiment and Analysis
Scale-up experiment (only on AliCloud)
Experiment and Analysis
[Figure: Execution time for scale-up configurations]
Experiment and Analysis
[Figure: Comparison between scale-out and scale-up]
Implication #1
The scalability of Hadoop and Spark is good enough on AliCloud and Baidu Cloud.
Hadoop's scalability is slightly better than Spark's on AliCloud.
Spark is faster than Hadoop on AliCloud under the WordCount workload.
The scalability of Hadoop on Baidu Cloud is better than that on AliCloud.
Implication #2
For Hadoop, scale-up is better than scale-out under the metric of processing performance (execution time). However, this is not true for Spark.
This means that scaling up a Spark cluster may not achieve the expected performance improvement.
A dirty little secret here is that scale-out is not more expensive than scale-up.
The results presented here can serve as suggestions for cloud service providers to design more scalable big data processing services and avoid losing customers.
Conclusions
Different big data processing systems have different scalability.
Users should choose scale-out or scale-up wisely.
Cloud service providers can do more to provide more scalable big data processing services.
Thanks!