Progressive Computation for Data Exploration

Florin Rusu
University of California Merced and Lawrence Berkeley National Laboratory

Progressive Computation
Online aggregation (OLA) in a DBMS: while the query runs, return an estimate of the result together with ε confidence bounds that shrink over time until the exact result is reached.
[Figure: query result estimate with ε confidence bounds narrowing over time.]
- How to derive confidence bounds that shrink progressively?
- How to generate a meaningful estimate and confidence bounds as early as possible?
- How to minimize the estimation overhead?

Outline
1. PF-OLA: Parallel Framework for Online Aggregation
2. OLA-GD: Online Aggregation for Gradient Descent Optimization
3. OLA-RAW: Online Aggregation over Raw Data

GLADE
[Figure: GLADE evaluation of a GLA (Generalized Linear Aggregate). On each node, worker GLAs process their chunks through BeginChunk, Accumulate, and EndChunk; the per-worker GLAs are combined by Local Merge and Local Terminate into one GLA per node; the per-node GLAs are combined across nodes by Remote Merge and Terminate to produce the result.]

GLADE Architecture
[Figure: the coordinator runs a Communication Manager, Query Manager, Code Generator, GLA Manager, and Catalog. Each of the n worker nodes runs a Communication Manager, Query Manager, GLA Manager, and Storage Manager on top of the DataPath execution engine and a Code Loader.]

PF-OLA API
Basic interface:
- Init()
- Accumulate(Item d)
- Merge(UDA input1, UDA input2, UDA output) (local and remote)
- Terminate() (local and final)
Chunk processing:
- BeginChunk()
- EndChunk()
Transfer UDA across processes:
- Serialize()
- Deserialize()
Progressive computation:
- EstimatorTerminate()
- EstimatorMerge(UDA input1, UDA input2, UDA output)
OLA estimation:
- Estimate(estimator, lower, upper, confidence)
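To make the interface concrete, here is a minimal Python sketch of a SUM UDA that follows the shape of this API. It is illustrative only: the actual framework is C++ and code-generated, and the estimation path (EstimatorMerge/Estimate) is deferred to the estimator sketch later in this section.

```python
# Illustrative Python sketch of a SUM UDA following the PF-OLA API shape.
class SumUDA:
    # --- basic interface ---
    def init(self):
        self.total = 0.0
        self.count = 0          # items seen so far, used later by the estimator

    def accumulate(self, d):    # called once per item
        self.total += d
        self.count += 1

    def merge(self, other):     # used for both local (thread) and remote (node) merging
        self.total += other.total
        self.count += other.count

    def terminate(self):        # exact result once all data is consumed
        return self.total

    # --- chunk processing ---
    def begin_chunk(self):      # per-chunk setup (a no-op for SUM)
        pass

    def end_chunk(self):        # per-chunk teardown (a no-op for SUM)
        pass

    # --- transfer across processes ---
    def serialize(self):
        return {"total": self.total, "count": self.count}

    def deserialize(self, state):
        self.total, self.count = state["total"], state["count"]
```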

Partial Aggregation
[Figure: inside the execution engine, a waypoint issues work units to a thread pool. Accumulate work units fold chunks into GLAs taken from a GLA list; Merge work units combine pairs of GLAs. When all GLAs have been merged, one copy is put back on the GLA list and a partial result over the chunks processed so far is emitted.]
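A sketch of this work-unit pattern, reusing the SumUDA class from the previous sketch; ThreadPoolExecutor stands in for the engine's thread pool, and the sequential pairwise merge stands in for the Merge work units:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of partial aggregation over chunks with a thread pool; illustrative only.
def partial_aggregate(chunks, num_threads=4):
    def accumulate(chunk):           # one Accumulate work unit per chunk
        gla = SumUDA()
        gla.init()
        gla.begin_chunk()
        for item in chunk:
            gla.accumulate(item)
        gla.end_chunk()
        return gla

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        glas = list(pool.map(accumulate, chunks))

    result = glas[0]                 # Merge work units combine the GLAs pairwise
    for gla in glas[1:]:
        result.merge(gla)
    return result                    # partial result over the chunks seen so far
```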

Parallel Sampling
Centralized random shuffling:
- Permute the data randomly at loading; a sequential scan then produces ever larger random samples.
Stratified sampling:
- Permute the data randomly in each partition; a direct extension of random shuffling to partitioned data.
Global data randomization at loading:
- Split the data randomly across nodes (standard hash-based data partitioning), then permute all received data randomly at each node.

Sample Aggregation
[Figure: two strategies for merging per-chunk partial aggregates (PartAgg over AGG) into the estimator (EST). Centralized: all partial aggregates are merged at a single point before estimation. Distributed tree: partial aggregates are first combined into local estimators (LocEst) on each node and then merged up a tree.]

Generic Sampling Estimator
AGG = SELECT SUM(f(d)) FROM D WHERE P(d)
S is a simple random sample taken without replacement from D.
Estimator:
$$X = \frac{|D|}{|S|} \sum_{s \in S,\, P(s)} f(s), \qquad \mathrm{E}[X] = \mathrm{AGG}$$
Variance and its unbiased estimate computed from the sample:
$$\mathrm{Var}[X] = \frac{|D| - |S|}{|S|\,(|D| - 1)} \Bigg[\, |D| \sum_{d \in D,\, P(d)} f^2(d) - \Big( \sum_{d \in D,\, P(d)} f(d) \Big)^{2} \Bigg]$$
$$\widehat{\mathrm{Var}}[X] = \frac{|D|\,(|D| - |S|)}{|S|^2\,(|S| - 1)} \Bigg[\, |S| \sum_{s \in S,\, P(s)} f^2(s) - \Big( \sum_{s \in S,\, P(s)} f(s) \Big)^{2} \Bigg]$$
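As a concrete rendering of these formulas, the following Python sketch (not the PF-OLA implementation) computes the estimate and a normal-approximation confidence interval from a sample; the 95% z-value is an illustrative choice:

```python
import math, random

# Generic sampling estimator: estimate SUM(f(d)) over D subject to P(d) from a
# simple random sample S drawn without replacement; normal-approximation bounds.
def ola_estimate(sample, f, p, table_size, z=1.96):
    vals = [f(s) for s in sample if p(s)]
    n, N = len(sample), table_size          # |S| counts all sampled items
    total, total_sq = sum(vals), sum(v * v for v in vals)
    est = N / n * total
    est_var = (N * (N - n)) / (n * n * (n - 1)) * (n * total_sq - total ** 2)
    half = z * math.sqrt(max(est_var, 0.0))
    return est, (est - half, est + half)

# Usage: estimate SUM(x) over a table of 1M values from a 1% sample.
table = [random.random() for _ in range(1_000_000)]
sample = random.sample(table, 10_000)
est, (lo, hi) = ola_estimate(sample, f=lambda x: x, p=lambda x: True,
                             table_size=len(table))
```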

Parallel Sampling Estimators
Data are partitioned across N nodes: $D = D_1 \cup D_2 \cup \dots \cup D_N$. Samples $S_i$, $1 \le i \le N$, are taken independently at each node.
Single estimator:
- Guarantee that $S = S_1 \cup S_2 \cup \dots \cup S_N$ is a random sample from D.
- Synchronized estimator: $|S_i| / |D_i| = k$ (constant), $1 \le i \le N$.
- Asynchronous estimator: requires global data randomization.
Multiple estimators (stratified sampling):
- Build an estimator $X_i$ for each partition $D_i$, $1 \le i \le N$: $X_i = \frac{|D_i|}{|S_i|} \sum_{s \in S_i,\, P(s)} f(s)$.
- $X = \sum_{i=1}^{N} X_i$ is unbiased, with $\mathrm{Var}\big[\sum_{i=1}^{N} X_i\big] = \sum_{i=1}^{N} \mathrm{Var}[X_i]$.
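A sketch of the multiple-estimators case; the per-partition computation mirrors the single-sample estimator above, and the variances add because the per-node samples are independent:

```python
import math

# Stratified combination: one estimator per partition; estimates and variances
# both add because the per-node samples are drawn independently.
def stratified_estimate(partitions, f, p, z=1.96):
    # partitions: list of (sample, partition_size) pairs, one per node
    est, var = 0.0, 0.0
    for sample, part_size in partitions:
        vals = [f(s) for s in sample if p(s)]
        n, N = len(sample), part_size
        total, total_sq = sum(vals), sum(v * v for v in vals)
        est += N / n * total
        var += (N * (N - n)) / (n * n * (n - 1)) * (n * total_sq - total ** 2)
    half = z * math.sqrt(max(var, 0.0))
    return est, (est - half, est + half)
```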

Time to Convergence
SELECT n_name, SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))
FROM lineitem, supplier, nation
WHERE l_shipdate = '1993-02-26' AND l_quantity = 1
  AND l_discount BETWEEN 0.02 AND 0.03
  AND l_suppkey = s_suppkey AND s_nationkey = n_nationkey
GROUP BY n_name
[Plot: relative error (%) vs. time (seconds) for the single estimator with high selectivity, on 1, 2, 4, and 8 nodes.]
Setup: TPC-H scale factor 8,000 (8 TB). Per node: 16 cores @ 2 GHz, 16 GB RAM, 4 disks @ 110 MB/s throughput per disk. Cluster: 8 workers + 1 coordinator (9 nodes), Gigabit Ethernet, same rack.

Estimation Overhead
Query execution time (seconds):
Query        No estimation   OLA
Aggregate    222             222
Group small  344             345
Group large  404             407
Join         409             411

References
- Chengjie Qin and Florin Rusu. Sampling Estimators for Parallel Online Aggregation. BNCOD 2013, pp. 204-217.
- Chengjie Qin and Florin Rusu. Parallel Online Aggregation in Action. SSDBM 2013, pp. 383-386. [Demo]
- Chengjie Qin and Florin Rusu. PF-OLA: A High-Performance Framework for Parallel Online Aggregation. Distributed and Parallel Databases (DAPD), August 2013.

Outline
1. PF-OLA: Parallel Framework for Online Aggregation
2. OLA-GD: Online Aggregation for Gradient Descent Optimization
3. OLA-RAW: Online Aggregation over Raw Data

Gradient Descent Optimization

Gradient Descent as GLADE GLA
[Figure: on each node, worker GLAs process chunks of training examples η; at EndChunk each worker updates its local model, $w \leftarrow w - \beta_k \nabla f_\eta(w)$; Local Terminate combines the per-worker models into a per-node model $w_i$, and Terminate combines the per-node models $w_1, \dots, w_n$ into the global model w for the next iteration.]
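A sketch of this pattern in Python with NumPy. The squared-loss gradient and merging by weighted average are illustrative assumptions, not necessarily the exact scheme in the figure:

```python
import numpy as np

# Sketch of one gradient-descent iteration expressed as a GLA: each worker
# updates a local model over its chunks; models are merged by weighted average.
class GradientDescentGLA:
    def __init__(self, w, step):
        self.w = w.copy()       # local copy of the model
        self.step = step        # step size beta_k for this iteration
        self.weight = 1         # number of models folded in, for averaging

    def end_chunk(self, X, y):  # one model update per chunk of examples
        grad = 2 * X.T @ (X @ self.w - y) / len(y)   # squared-loss gradient
        self.w -= self.step * grad

    def merge(self, other):     # local and remote merging
        total = self.weight + other.weight
        self.w = (self.weight * self.w + other.weight * other.w) / total
        self.weight = total

    def terminate(self):
        return self.w           # global model for the next iteration
```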

Approximate Gradient Descent
Main idea: apply online aggregation (OLA) sampling to speed up the execution of a speculative iteration.
Speculative gradient descent: select s step sizes α_1, α_2, ..., α_s; compute the gradient and loss exactly for the candidate models w_1, ..., w_s; update the model with the best candidate; repeat until convergence.
Approximate gradient descent: select s step sizes α_1, α_2, ..., α_s; estimate the gradient and loss of the candidates with OLA sampling; once the estimators have converged, update the model; repeat until the model converges.
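A sketch of one speculative iteration with sampled loss estimation. The squared loss, the fixed sample fraction, and the plain mean estimate are illustrative assumptions; the actual system maintains confidence bounds and stops sampling as soon as they separate the best candidate from the rest:

```python
import numpy as np

# Speculative iteration with OLA sampling: try s step sizes, estimate each
# candidate model's loss on a random sample, and keep the best candidate.
def speculative_step(w, grad, steps, X, y, sample_frac=0.05, rng=np.random):
    candidates = [w - alpha * grad for alpha in steps]
    idx = rng.choice(len(y), size=max(1, int(sample_frac * len(y))),
                     replace=False)
    est_losses = [np.mean((X[idx] @ c - y[idx]) ** 2) for c in candidates]
    return candidates[int(np.argmin(est_losses))]
```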

Approximate BGD
[Figure: illustration of approximate BGD, built up across three animation slides.]

Train SVM Model
[Plot: loss per example vs. time (seconds) for BGD with 2, 4, and 8 speculative step sizes, each with and without OLA.]
Dataset: 50M examples, 200 dimensions, 136 GB.

Sample Percentage
[Plot: percentage of data sampled at each iteration, for iterations 1 through 20.]
Dataset: 50M examples, 200 dimensions, 136 GB.

References
- Chengjie Qin and Florin Rusu. Speculative Approximations for Terascale Analytics. CoRR abs/1501.00255, December 2014.
- Chengjie Qin and Florin Rusu. Speculative Approximations for Terascale Distributed Gradient Descent Optimization. DanaC@SIGMOD 2015.
- Florin Rusu, Chengjie Qin, and Martin Torres. Scalable Analytics Model Calibration with Online Aggregation. IEEE Data Engineering Bulletin, Vol. 38, No. 3, pp. 30-44, September 2015.

Outline
1. PF-OLA: Parallel Framework for Online Aggregation
2. OLA-GD: Online Aggregation for Gradient Descent Optimization
3. OLA-RAW: Online Aggregation over Raw Data

Motivation & Approach
[Figure: time to answer queries Q1-Q4 under four approaches. A DBMS must load the data before answering Q1-Q4; OLA must shuffle the data first; RAW queries the raw file directly; OLA-RAW starts producing estimates immediately. Plot: error ratio vs. elapsed time (seconds) for DBMS, OLA, RAW, and OLA-RAW.]

OLA-RAW Architecture
[Figure: the raw file is read as text chunks; parallel code-generated Shuffle/Extract operators convert them into binary chunk samples; the samples feed an in-memory sample synopsis and an OLA sample estimator that produces estimates with confidence bounds.]

Parallel Bi-Level Sampling
[Figure: an example table T with columns A and B, split into four chunks, processed under three strategies. Exact computation reads every chunk in full. Chunk-level sampling processes a random subset of chunks in full. Bi-level sampling samples chunks first and then samples tuples within each selected chunk.]
Inspection paradox: the order in which chunk results arrive is not a random chunk order.
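A sketch of the two sampling levels (fractions are illustrative; OLA-RAW adapts them and corrects for the inspection paradox, which this sketch ignores):

```python
import random

# Bi-level sampling: first sample chunks, then sample tuples within each
# selected chunk.
def bi_level_sample(chunks, chunk_frac=0.5, tuple_frac=0.1, rng=random):
    k = max(1, int(chunk_frac * len(chunks)))
    for chunk in rng.sample(chunks, k):            # level 1: chunks
        m = max(1, int(tuple_frac * len(chunk)))
        yield from rng.sample(chunk, m)            # level 2: tuples
```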

Experimental Results
[Plots: error ratio vs. elapsed time (seconds), and the ratio of sampled chunks and sampled tuples vs. number of threads (1, 4, 16), for chunk-level sampling (C), bi-level sampling (BI), and full extraction (EXT), at selectivities of 100%, 50%, and 10%.]

References
- Yu Cheng, Weijie Zhao, and Florin Rusu. OLA-RAW: Scalable Exploration over Raw Data. CoRR abs/1702.00358, February 2017.
- Yu Cheng, Weijie Zhao, and Florin Rusu. Bi-Level Online Aggregation on Raw Data. SSDBM 2017.

Thank you! Questions?