Clustering algorithms distributed over a Cloud Computing Platform.
1 Clustering algorithms distributed over a Cloud Computing Platform. September 28th, 2012. Ph.D. thesis supervised by Pr. Fabrice Rossi. Matthieu Durut (Telecom/Lokad) 1 / 55
2 Outline. 1 Introduction to Cloud Computing 2 Context 3 Distributed Batch K-Means 4 Distributed Vector Quantization algorithms
3 Outline: Introduction to Cloud Computing. 1 Introduction to Cloud Computing 2 Context 3 Distributed Batch K-Means 4 Distributed Vector Quantization algorithms
4 Introduction to Cloud Computing. What is Cloud Computing? Some features: 1 Abstraction of commodity hardware that can be rented on demand on an hourly basis. 2 Quasi-infinite hardware scale-up. 3 Virtualization, which makes the maintenance of web applications easier. Grid vs Cloud: ownership; intensive use of Virtual Machines (VM); elasticity; hardware administration and maintenance.
5 Introduction to Cloud Computing. Everything as a Service. 1 Software as a Service (SaaS): Gmail, Salesforce, Lokad API, etc. 2 Platform as a Service (PaaS): Azure, Amazon S3, etc. 3 Infrastructure as a Service (IaaS): Amazon EC2, etc. Stack of Azure. Storage level: BlobStorage, TableStorage, QueueStorage, SQLAzure. Execution level: Dryad. Domain-Specific Language level: DryadLINQ, Scope.
6 Introduction to Cloud Computing. Figure: Illustration of the Google, Hadoop and Microsoft technology stacks for building cloud applications.
7 Introduction to Cloud Computing. MapReduce.
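The MapReduce model that the slide refers to can be sketched in a few lines of Python. This is an illustrative toy (word counting), not the Hadoop or Dryad API: the framework's real contribution is running the map and reduce phases on many machines with fault tolerance.

```python
from collections import defaultdict

def map_phase(chunk):
    # Emit (word, 1) pairs for one input chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate all values emitted for one key.
    return key, sum(values)

def mapreduce(chunks):
    pairs = [p for chunk in chunks for p in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = mapreduce(["the cloud", "the grid and the cloud"])
# counts == {"the": 3, "cloud": 2, "grid": 1, "and": 1}
```

Each chunk is processed independently in the map phase, which is exactly the data-level parallelism exploited later for Batch K-Means.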
8 Introduction to Cloud Computing. The Windows Azure Storage (WAS). BlobStorage: key-value pair (blob name/blob) storage. No more ACID, but atomicity, strong persistence and strong consistency per blob. Optimistic Read-Modify-Write primitive (RMW). QueueStorage: set of scalable queues; asynchronous message delivery mechanism; approximately FIFO; messages returned at least once => idempotency required.
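The optimistic Read-Modify-Write primitive can be illustrated with an ETag-style conditional write. The `FakeBlobStorage` class below is a hypothetical in-memory stand-in, not the Azure SDK: a conditional put fails when the blob changed since it was read, and the caller retries.

```python
import itertools

class FakeBlobStorage:
    """In-memory stand-in for a blob container with per-blob ETags
    (illustrative only; not the Azure Storage API)."""
    def __init__(self):
        self._blobs = {}                 # name -> (etag, value)
        self._etags = itertools.count()

    def get(self, name):
        return self._blobs[name]         # returns (etag, value)

    def put(self, name, value, expected_etag=None):
        # Conditional write: succeeds only if the ETag is unchanged.
        if name in self._blobs and expected_etag is not None:
            if self._blobs[name][0] != expected_etag:
                return False             # a concurrent write happened: retry
        self._blobs[name] = (next(self._etags), value)
        return True

def read_modify_write(storage, name, update):
    # Optimistic RMW loop: read, transform, conditional write, retry on conflict.
    while True:
        etag, value = storage.get(name)
        if storage.put(name, update(value), expected_etag=etag):
            return

storage = FakeBlobStorage()
storage.put("counter", 0)
read_modify_write(storage, "counter", lambda v: v + 1)
```

Per-blob atomicity plus this retry loop is what replaces locks when the storage is the only shared medium between workers.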
9 Introduction to Cloud Computing. Elements of Azure application architecture: no communication framework such as MPI; the WAS is used as a shared-memory abstraction; no affinity between storage and processing units; task agnosticity of workers (at least in the beginning); idempotence.
10 Outline: Context. 1 Introduction to Cloud Computing 2 Context 3 Distributed Batch K-Means 4 Distributed Vector Quantization algorithms
11 Context. Why clustering? One of Lokad's abilities is to deal with large-scale data. Need to group client data (clustering) to extract information from complex objects (e.g. time-series seasonality). Problem set-up: the data set is composed of N points {z_t}_{t=1}^N in R^d. Clustering point of view: find a simplified representation with κ vectors of R^d. These vectors are called prototypes/centroids and gathered in a quantization scheme w = (w_1, ..., w_κ) ∈ (R^d)^κ.
12 Context. Objective: the clustering challenge can be expressed as the minimization of the empirical distortion C_N, where C_N(w) = Σ_{t=1}^N min_{l=1,...,κ} ||z_t − w_l||², for w ∈ (R^d)^κ. Initial challenge: exact minimization is computationally intractable. Some approximate algorithms: Batch K-Means, Vector Quantization (Online K-Means), Neural Gas, Kohonen Maps.
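The empirical distortion C_N translates directly to a few lines of NumPy (a minimal sketch of the formula above; the sample points and prototypes are illustrative):

```python
import numpy as np

def empirical_distortion(points, prototypes):
    """C_N(w) = sum over t of min over l of ||z_t - w_l||^2."""
    # Pairwise squared distances between the N points and the kappa prototypes.
    d2 = ((points[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
prototypes = np.array([[1.0, 0.0], [10.0, 0.0]])
# nearest squared distances are 1.0, 1.0, 0.0, so C_N(w) = 2.0
```

All the algorithms listed above only approximate the minimizer of this quantity; evaluating C_N itself stays cheap, which is why it is used later as the performance measure.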
13 Context. Why distributed? A suitable way to get more computing resources: faster serial computers are increasingly expensive and hit physical limits. Cloud computing was adopted by Lokad (MS Azure); by early 2012, all apps ran on the cloud, scaled up to 300 VMs. Consequences: communication delays and the lack of efficient shared-memory asynchronous schemes.
14 Outline: Distributed Batch K-Means. 1 Introduction to Cloud Computing 2 Context 3 Distributed Batch K-Means 4 Distributed Vector Quantization algorithms
15 Distributed Batch K-Means. Sequential Batch K-Means.
Algorithm 1 Sequential Batch K-Means:
Select κ initial prototypes (w_k)_{k=1}^κ
repeat
  for t = 1 to N do
    for k = 1 to κ do
      compute ||z_t − w_k||²
    end for
    find the closest centroid w_{k*(t)} to z_t
  end for
  for k = 1 to κ do
    w_k = (1 / #{t : k*(t) = k}) Σ_{t : k*(t) = k} z_t
  end for
until the stopping criterion is met
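Algorithm 1 can be sketched in Python as follows. This is a minimal NumPy version (Lloyd iterations), not the thesis implementation; initialization and data are illustrative.

```python
import numpy as np

def batch_kmeans(points, kappa, iterations, seed=0):
    """Plain sequential Batch K-Means, as in Algorithm 1."""
    rng = np.random.default_rng(seed)
    # Select kappa initial prototypes among the data points.
    prototypes = points[rng.choice(len(points), kappa, replace=False)].copy()
    for _ in range(iterations):
        # Assignment phase: closest prototype for every point.
        d2 = ((points[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Recalculation phase: each prototype becomes the mean of its cluster.
        for k in range(kappa):
            members = points[assign == k]
            if len(members):
                prototypes[k] = members.mean(axis=0)
    return prototypes

data = np.array([[0.0], [1.0], [10.0], [11.0]])
centers = batch_kmeans(data, kappa=2, iterations=10)
# on this toy data the centers converge to {0.5, 10.5} (in some order)
```

The two inner loops are exactly the assignment and recalculation phases that the distribution scheme of the next slides will split across machines.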
16 Distributed Batch K-Means. Characteristics: relatively fast: Batch WallTime_seq = (3Nκd + Nκ + Nd + κd) I T_flop, where I refers to the number of iterations and T_flop to the time needed to evaluate one floating-point operation. Deterministic. Easy to set up. Results become stationary from a certain iteration on. Suited for parallelization? Obvious data-level parallelism. Same result as the sequential algorithm. Excellent speed-up efficiency already achieved.
17 Distributed Batch K-Means. Distribution scheme: data-level parallelism suggests an iterated Map-Reduce distribution. The data set {z_t}_{t=1}^N is homogeneously split into M chunks (one per processing unit): S_i, i ∈ {1..M}. Processing unit i computes the distances ||z_t^i − w_k||² for z_t^i ∈ S_i and k ∈ {1..κ} (Map phase). Then the new prototype version is recomputed by one or several machines (Reduce phase).
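The Map and Reduce phases above can be sketched as follows: each chunk returns per-prototype partial sums and counts, and merging them reproduces exactly the prototypes a sequential iteration would compute. Illustrative Python run on one machine, not the Azure implementation.

```python
import numpy as np

def map_chunk(chunk, prototypes):
    """Map phase: assign each point of one chunk, return partial sums/counts."""
    d2 = ((chunk[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    kappa, d = prototypes.shape
    sums, counts = np.zeros((kappa, d)), np.zeros(kappa)
    for point, k in zip(chunk, assign):
        sums[k] += point
        counts[k] += 1
    return sums, counts

def reduce_results(partials, prototypes):
    """Reduce phase: merge the partial results and recompute the prototypes."""
    sums = sum(s for s, _ in partials)
    counts = sum(c for _, c in partials)
    new = prototypes.copy()
    nonempty = counts > 0
    new[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new

data = np.array([[0.0], [1.0], [10.0], [11.0]])
w = np.array([[0.0], [11.0]])
chunks = np.array_split(data, 2)          # one chunk per processing unit
w_next = reduce_results([map_chunk(c, w) for c in chunks], w)
# identical to one sequential Batch K-Means iteration on the full data set
```

Because the merge only needs sums and counts, the Reduce phase moves O(κd) numbers per worker regardless of N, which is what makes the scheme communication-efficient.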
18 Distributed Batch K-Means. Batch K-Means distributed over a DMM architecture.
19 Distributed Batch K-Means. Wall time: Batch WallTime_DMM = T_M^comp + T_M^comm, where T_M^comp refers to the wall time of the assignment phase and T_M^comm to the wall time of the recalculation phase (mostly spent in communications). Assignment phase: T_M^comp = 3INκd T_flop / M.
20 Distributed Batch K-Means. Recalculation phase, DMM architecture with MPI: T_M^comm = log₂(M) IκdS / B, where S refers to the size of a double in memory (8 bytes in the following) and B to the communication bandwidth per machine. Wall time, DMM architecture with MPI: Batch WallTime_DMM = 3INκd T_flop / M + log₂(M) IκdS / B.
21 Distributed Batch K-Means. Speed-up, DMM architecture with MPI: SpeedUp_DMM(M, N) = 3N T_flop / (3N T_flop / M + (S / B) log₂(M)). Optimal number of processing units: M*_DMM = 3N T_flop B / S.
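The closed-form optimum can be checked numerically against the model. The hardware constants below are assumed orders of magnitude (roughly 1 GFlops and 100 MB/s of bandwidth), not measurements; minimizing 3NT_flop/M + (S/B)log₂(M) exactly yields M = 3NT_flop·B·ln(2)/S, i.e. the slide's closed form up to a ln(2) factor.

```python
import math

def speedup_dmm(M, N, T_flop, S, B):
    """SpeedUp_DMM(M, N) from the DMM/MPI communication model above."""
    return (3 * N * T_flop) / (3 * N * T_flop / M + (S / B) * math.log2(M))

# Assumed orders of magnitude: 1 Gflops per core, 100 MB/s, 8-byte doubles.
N, T_flop, S, B = 10**6, 1e-9, 8, 1e8
M_star = 3 * N * T_flop * B / S          # the slide's closed form
best = max(range(2, 10**5), key=lambda M: speedup_dmm(M, N, T_flop, S, B))
# best sits at M_star * ln(2), confirming the shape of the closed form
```

The takeaway of the model survives the ln(2) constant: the optimal machine count grows linearly with N and with the compute-to-bandwidth ratio.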
22 Distributed Batch K-Means. Batch K-Means distributed over Azure.
23 Distributed Batch K-Means. Figure: Distribution scheme of our cloud Batch K-Means. Each worker pushes its blob into the BlobStorage and pings the storage until it finds the expected blob, then downloads it. Map results (prototypes) flow from the mappers (Mapper 1 to Mapper 6) to partial reducers; the partial reduce results (prototypes) are merged by a final reducer into the final reduce result (prototypes).
24 Distributed Batch K-Means. Communication modeling: T_M^comm = I √M κdS (2T_Blob^read + T_Blob^write), where T_Blob^read (resp. T_Blob^write) refers to the time needed by a given processing unit to download (resp. upload) a blob from (resp. to) the storage, per memory unit. Speed-up, cloud architecture: SpeedUp(M, N) = 3N T_flop / (3N T_flop / M + √M S (2T_Blob^read + T_Blob^write)). Optimal number of workers: M*(N) = (6N T_flop / (S (2T_Blob^read + T_Blob^write)))^{2/3}.
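The (·)^{2/3} closed form follows from a communication term growing like √M (the two-level reduce tree of the figure). This can be checked numerically; the storage timing constants below are assumed for illustration, not measured values from the experiments.

```python
def cloud_walltime(M, N, T_flop, S, C):
    """Per-iteration cost model: compute term plus sqrt(M) communication term,
    with C standing for 2*T_read + T_write (sec per byte, assumed)."""
    return 3 * N * T_flop / M + (M ** 0.5) * S * C

# Assumed constants: 1 Gflops, and storage latencies on the order of 10^-8 s/byte.
N, T_flop, S = 10**7, 1e-9, 8
C = 2e-8 + 1e-8                     # assumed 2*T_read + T_write
M_star = (6 * N * T_flop / (S * C)) ** (2 / 3)
# M_star is the stationary point of cloud_walltime: moving away from it
# in either direction increases the modeled wall time
```

Setting the derivative −3NT_flop/M² + SC/(2√M) to zero gives M^{3/2} = 6NT_flop/(SC), hence the 2/3 exponent: the cloud optimum grows more slowly with N than the linear DMM/MPI optimum.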
25 Distributed Batch K-Means. Figure: Time to execute the Reduce phase per unit of memory (2T_Blob^read + T_Blob^write), in 10^-7 sec/byte, as a function of the number of communicating units.
26 Distributed Batch K-Means. Figure: Speedup as a function of the number of mappers — observed and theoretical speedup curves for different data set sizes N.
27 Distributed Batch K-Means. Table: Comparison between the effective optimal number of processing units M*_eff and the theoretical optimal number of processing units M* for different data set sizes (columns: N, M*_eff, M*, wall time, sequential theoretic time, effective speedup, theoretical speedup; the numeric entries are not recoverable from the transcription).
28 Distributed Batch K-Means. Figure: Speedup as a function of the number of mappers — observed and theoretical speedup curves for different numbers of processing units. For each value of M, the value of N is set so that the processing units are heavily loaded with data and computations.
29 Distributed Batch K-Means. Figure: Distribution of the processing time (in seconds) for multiple runs of the same computation task on multiple VMs.
30 Outline: Distributed Vector Quantization algorithms. 1 Introduction to Cloud Computing 2 Context 3 Distributed Batch K-Means 4 Distributed Vector Quantization algorithms
31 Distributed Vector Quantization algorithms. Asynchronous clustering: motivation. Joint work with Benoît Patra. Every action should be accounted for exactly once: no calculation should be discarded; no calculation should be used more than once; all writes should result in prototype updates everywhere; all reads should be used locally. (On War, Clausewitz.) Saturate bandwidth, memory, CPU, etc. => asynchronism => online, or at least mini-batch (no more batch).
32 Distributed Vector Quantization algorithms. Sequential VQ algorithm: consists of incremental updates of the (R^d)^κ-valued prototypes {w(t)}_{t≥0}, initiated from a random initial w(0) ∈ (R^d)^κ. Given a sequence of positive steps (ε_t)_{t>0}, it produces a sequence w(t) by updating w at each step with a descent term: H(z, w) = ( (w_l − z) 1_{{l = argmin_{i=1,...,κ} ||z − w_i||²}} )_{1 ≤ l ≤ κ}, and w(t + 1) = w(t) − ε_{t+1} H(z_{{t+1 mod n}}, w(t)), t ≥ 0.
33 Distributed Vector Quantization algorithms.
Algorithm 2 Sequential VQ algorithm:
Select κ initial prototypes (w_k)_{k=1}^κ
Set t = 0
repeat
  for k = 1 to κ do
    compute ||z_{{t+1 mod n}} − w_k||²
  end for
  deduce H(z_{{t+1 mod n}}, w)
  set w(t + 1) = w(t) − ε_{t+1} H(z_{{t+1 mod n}}, w(t))
  increment t
until the stopping criterion is met
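Algorithm 2 translates directly to Python. A minimal sketch: only the prototype closest to the current sample moves; the data, initialization and step sequence are illustrative.

```python
import numpy as np

def H(z, w):
    """Descent term: only the closest prototype moves toward the sample z."""
    g = np.zeros_like(w)
    l = ((w - z) ** 2).sum(axis=1).argmin()
    g[l] = w[l] - z
    return g

def sequential_vq(data, w0, steps):
    """Sequential VQ: w(t+1) = w(t) - eps * H(z_{t mod n}, w(t))."""
    w = w0.copy()
    n = len(data)
    for t, eps in enumerate(steps):
        w = w - eps * H(data[t % n], w)
    return w

data = np.array([[0.0], [1.0]])
w0 = np.array([[0.0], [10.0]])
w = sequential_vq(data, w0, steps=[1.0 / (t + 1) for t in range(100)])
# the prototype at 10.0 is never the closest one, so it never moves
```

Unlike Batch K-Means, each update touches a single prototype and uses a single point, which is what makes the procedure online and amenable to asynchronous distribution.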
34 Distributed Vector Quantization algorithms. Our context: we assume that a satisfactory VQ implementation has been found, but it is too slow. We are not concerned with the optimization of the various parameters (initialization, sequence of steps, etc.). We have access to a finite data set {z_t^i}_{t=0}^n, i ∈ {1,...,M}, distributed over M processing units. When does a distributed VQ implementation perform better than the corresponding sequential one?
35 Distributed Vector Quantization algorithms. Definition of speed-up for VQ algorithms: a reference prototype version is made available in the shared memory (BlobStorage), referred to as the prototypes shared version w_srd. Performance is measured with the corresponding empirical distortion: for all w ∈ (R^d)^κ, L_N(w) = (1/(nM)) Σ_{i=1}^M Σ_{t=1}^n min_{l=1,...,κ} ||z_t^i − w_l||². After any t wall-time seconds, the empirical distortion of the prototypes shared version should be lower than that of the prototype version produced by the sequential algorithm.
36 Distributed Vector Quantization algorithms. Previous work: VQ as a stochastic gradient descent method. With shared memory: interleaving the prototype version updates. Without shared memory but with loss convexity: averaging the prototype versions. In our case: no efficient shared memory, and no convexity of the loss function. Organization of our work: simulated distributed architecture on a single machine, then cloud implementation.
37 Distributed Vector Quantization algorithms. First distributed scheme: all the versions are set equal at time t = 0, w^1(0) = ... = w^M(0). For all i ∈ {1,...,M} and all t ≥ 0, we have the following iterations:
w_temp^i = w^i(t) − ε_{t+1} H(z^i_{{t+1 mod n}}, w^i(t)),
w^i(t + 1) = w_temp^i if t mod τ ≠ 0 or t = 0,
w_srd = (1/M) Σ_{j=1}^M w_temp^j and w^i(t + 1) = w_srd if t mod τ = 0 and t ≥ τ.
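The first scheme can be simulated on a single machine. Illustrative Python sketch: M local versions evolve independently for τ steps, then are all replaced by their average.

```python
import numpy as np

def H(z, w):
    # Descent term of the VQ iteration: only the closest prototype moves.
    g = np.zeros_like(w)
    l = ((w - z) ** 2).sum(axis=1).argmin()
    g[l] = w[l] - z
    return g

def parallel_vq_averaging(datasets, w0, steps, tau):
    """First scheme: every tau steps, all local versions are averaged."""
    M = len(datasets)
    workers = [w0.copy() for _ in range(M)]
    for t, eps in enumerate(steps, start=1):
        workers = [w - eps * H(data[t % len(data)], w)
                   for data, w in zip(datasets, workers)]
        if t % tau == 0:
            shared = sum(workers) / M        # reduce: average the versions
            workers = [shared.copy() for _ in range(M)]
    return workers

out = parallel_vq_averaging(
    [np.array([[0.0], [1.0]]), np.array([[2.0], [3.0]])],
    np.array([[0.0], [10.0]]), steps=[0.1] * 20, tau=10)
# after the final averaging phase all local versions coincide
```

As the next slides show, averaging divides each descent term by M, which is why this scheme fails to deliver the expected speed-up.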
38 Distributed Vector Quantization algorithms. A first basic parallelization scheme. Figure: A simple (and synchronous) scheme: whenever τ points are processed, an averaging phase occurs (global time reference).
39 Distributed Vector Quantization algorithms. Figure: Empirical distortion over t (iterations) with different numbers of computing entities: M = 1, 2, 10 and τ = 10.
40 Distributed Vector Quantization algorithms. A comparison between the previous parallel scheme and the sequential VQ, for t mod τ = 0 and t > 0. For all i ∈ {1,...,M}:
w^i(t + 1) = w^i(t − τ + 1) − Σ_{t'=t−τ+1}^{t} ε_{t'+1} (1/M) Σ_{j=1}^M H(z^j_{t'+1}, w^j(t')) (parallel),
w(t + 1) = w(t − τ + 1) − Σ_{t'=t−τ+1}^{t} ε_{t'+1} H(z_{{t'+1 mod n}}, w(t')) (sequential).
The averaged terms (highlighted in blue on the slide) are estimators of the gradient.
41 Distributed Vector Quantization algorithms. Two SGD algorithms with the same sequence of steps have similar convergence speed. The sequence of steps (learning rate) sets the trade-off between exploration and convergence. Introducing displacement/descent terms: for all j ∈ {1,...,M} and t₂ ≥ t₁ ≥ 0, set Δ^j_{t₁→t₂} = Σ_{t'=t₁+1}^{t₂} ε_{t'+1} H(z^j_{{t'+1 mod n}}, w^j(t')). It corresponds to the displacement of the prototypes computed by unit j during (t₁, t₂).
42 Distributed Vector Quantization algorithms. Second distributed scheme:
w_temp^i = w^i(t) − ε_{t+1} H(z^i_{{t+1 mod n}}, w^i(t)),
w^i(t + 1) = w_temp^i if t mod τ ≠ 0 or t = 0,
w_srd = w_srd − Σ_{j=1}^M Δ^j_{t−τ→t} and w^i(t + 1) = w_srd if t mod τ = 0 and t ≥ τ.
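The second scheme differs from the first only in the reduce step, which subtracts the sum of the displacement terms Δ^j from the shared version instead of averaging the local versions. Illustrative sketch; with M = 1 it reproduces the sequential VQ trajectory exactly.

```python
import numpy as np

def H(z, w):
    # Descent term: only the closest prototype moves toward the sample z.
    g = np.zeros_like(w)
    l = ((w - z) ** 2).sum(axis=1).argmin()
    g[l] = w[l] - z
    return g

def parallel_vq_displacement(datasets, w0, steps, tau):
    """Second scheme: the shared version accumulates the *sum* of the
    displacement terms Delta^j rather than an average of the versions."""
    M = len(datasets)
    w_srd = w0.copy()
    workers = [w0.copy() for _ in range(M)]
    deltas = [np.zeros_like(w0) for _ in range(M)]   # Delta^j over the window
    for t, eps in enumerate(steps, start=1):
        for j in range(M):
            d = eps * H(datasets[j][t % len(datasets[j])], workers[j])
            workers[j] = workers[j] - d
            deltas[j] = deltas[j] + d                # accumulate displacement
        if t % tau == 0:
            w_srd = w_srd - sum(deltas)              # reduce: sum, not average
            workers = [w_srd.copy() for _ in range(M)]
            deltas = [np.zeros_like(w0) for _ in range(M)]
    return w_srd
```

Summing keeps every worker's full displacement in the shared version, so no computation is discarded or scaled down, which is the motto of slide 48.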
43 Distributed Vector Quantization algorithms. Displacement terms. Figure: Illustration of the parallelization scheme of VQ procedures described by equations (43).
44 Distributed Vector Quantization algorithms. Figure: Empirical distortion over t (iterations) for the revised scheme, with M = 1, 2, 10 and τ = 10.
45 Distributed Vector Quantization algorithms. Delayed distributed scheme:
w_temp^i = w^i(t) − ε_{t+1} H(z^i_{{t+1 mod n}}, w^i(t)),
w^i(t + 1) = w_temp^i if t mod τ ≠ 0 or t = 0,
w_srd = w_srd − Σ_{j=1}^M Δ^j_{t−2τ→t−τ} if t mod τ = 0 and t ≥ 2τ,
w^i(t + 1) = w_srd − Δ^i_{t−τ→t} if t mod τ = 0 and t ≥ τ.
46 Distributed Vector Quantization algorithms. Figure: Illustration of the delayed parallelization scheme described by equations (46). The reducing phase is only drawn for processor 1, where t = 2τ, and processor 4, where t = 4τ.
47 Distributed Vector Quantization algorithms. Figure: Empirical distortion over t (iterations) for iterations (46) with different numbers of computing entities, M = 1, 2, 10 and τ = 10.
48 Distributed Vector Quantization algorithms. Simulated parallelization schemes, first conclusions. Motto: sum displacement terms rather than averaging versions. Experimental results: satisfactory speed-ups are recovered for the latter simulated parallel schemes. Delays (deterministic and random) are also studied: reasonable random delays do not have a severe impact on the convergence. Good perspectives for a true implementation on a cloud computing platform.
49 Distributed Vector Quantization algorithms. The CloudDALVQ project: a scientific project for testing new large-scale clustering/quantization algorithms distributed on a cloud platform (MS Azure). Open source, written in C#/.NET, released under the new BSD license.
50 Distributed Vector Quantization algorithms. Figure: Distribution scheme of our cloud VQ implementation. Each worker pushes its blob into the BlobStorage and pings the storage until it finds the expected blob, then downloads it. Map results (displacement terms) flow from the mappers (Mapper 1 to Mapper 6) to partial reducers; the partial reduce results (displacement terms) are merged by a final reducer into the final reduce result (prototypes).
51 Distributed Vector Quantization algorithms. Figure: Internal architecture of a ProcessService worker. A pull thread downloads the shared version from the BlobStorage into a read buffer; process threads (process actions 1, 2, 3) apply it to the local version, consume the local data and accumulate a displacement term; a push thread uploads the displacement term from the write buffer back to the BlobStorage.
52 Distributed Vector Quantization algorithms. Figure: Normalized quantization curves (empirical distortion over seconds) with M = 1, 2, 4, 8, 16. Troubles appear with M = 16 because the ReduceService is overloaded.
53 Distributed Vector Quantization algorithms. Figure: Normalized quantization curves (empirical distortion over seconds) with M = 8, 16, 32, 64, with an extra layer for the reducing task.
54 Distributed Vector Quantization algorithms. Figure: Competition between our cloud DALVQ algorithm and the cloud Batch K-Means: empirical distortion of both algorithms over time (seconds).
More informationIntroduction to Optimization
Introduction to Optimization Konstantin Tretyakov (kt@ut.ee) MTAT.03.227 Machine Learning So far Machine learning is important and interesting The general concept: Fitting models to data So far Machine
More informationProgressive & Algorithms & Systems
University of California Merced Lawrence Berkeley National Laboratory Progressive Computation for Data Exploration Progressive Computation Online Aggregation (OLA) in DB Query Result Estimate Result ε
More informationLeveraging Web GIS: An Introduction to the ArcGIS portal
Leveraging Web GIS: An Introduction to the ArcGIS portal Derek Law Product Management DLaw@esri.com Agenda Web GIS pattern Product overview Installation and deployment Configuration options Security options
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationA Reconfigurable Quantum Computer
A Reconfigurable Quantum Computer David Moehring CEO, IonQ, Inc. College Park, MD Quantum Computing for Business 4-6 December 2017, Mountain View, CA IonQ Highlights Full Stack Quantum Computing Company
More informationHigh-Performance Scientific Computing
High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org
More informationEnergy-efficient Mapping of Big Data Workflows under Deadline Constraints
Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute
More informationINF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)
INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder
More informationPresentation in Convex Optimization
Dec 22, 2014 Introduction Sample size selection in optimization methods for machine learning Introduction Sample size selection in optimization methods for machine learning Main results: presents a methodology
More informationOptimization for neural networks
0 - : Optimization for neural networks Prof. J.C. Kao, UCLA Optimization for neural networks We previously introduced the principle of gradient descent. Now we will discuss specific modifications we make
More informationVisualizing Big Data on Maps: Emerging Tools and Techniques. Ilir Bejleri, Sanjay Ranka
Visualizing Big Data on Maps: Emerging Tools and Techniques Ilir Bejleri, Sanjay Ranka Topics Web GIS Visualization Big Data GIS Performance Maps in Data Visualization Platforms Next: Web GIS Visualization
More informationElectrical and Computer Engineering Department University of Waterloo Canada
Predicting a Biological Response of Molecules from Their Chemical Properties Using Diverse and Optimized Ensembles of Stochastic Gradient Boosting Machine By Tarek Abdunabi and Otman Basir Electrical and
More informationPI SERVER 2012 Do. More. Faster. Now! Copyr i g h t 2012 O S Is o f t, L L C. 1
PI SERVER 2012 Do. More. Faster. Now! Copyr i g h t 2012 O S Is o f t, L L C. 1 AUGUST 7, 2007 APRIL 14, 2010 APRIL 24, 2012 Copyr i g h t 2012 O S Is o f t, L L C. 2 PI Data Archive Security PI Asset
More informationPredictive analysis on Multivariate, Time Series datasets using Shapelets
1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,
More informationParallel Performance Theory
AMS 250: An Introduction to High Performance Computing Parallel Performance Theory Shawfeng Dong shaw@ucsc.edu (831) 502-7743 Applied Mathematics & Statistics University of California, Santa Cruz Outline
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Map-Reduce Denis Helic KTI, TU Graz Oct 24, 2013 Denis Helic (KTI, TU Graz) KDDM1 Oct 24, 2013 1 / 82 Big picture: KDDM Probability Theory Linear Algebra
More informationTelecommunication Services Engineering (TSE) Lab. Chapter IX Presence Applications and Services.
Chapter IX Presence Applications and Services http://users.encs.concordia.ca/~glitho/ Outline 1. Basics 2. Interoperability 3. Presence service in clouds Basics 1 - IETF abstract model 2 - An example of
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationLogic Design II (17.342) Spring Lecture Outline
Logic Design II (17.342) Spring 2012 Lecture Outline Class # 10 April 12, 2012 Dohn Bowden 1 Today s Lecture First half of the class Circuits for Arithmetic Operations Chapter 18 Should finish at least
More informationImportance Sampling for Minibatches
Importance Sampling for Minibatches Dominik Csiba School of Mathematics University of Edinburgh 07.09.2016, Birmingham Dominik Csiba (University of Edinburgh) Importance Sampling for Minibatches 07.09.2016,
More informationNICTA Short Course. Network Analysis. Vijay Sivaraman. Day 1 Queueing Systems and Markov Chains. Network Analysis, 2008s2 1-1
NICTA Short Course Network Analysis Vijay Sivaraman Day 1 Queueing Systems and Markov Chains Network Analysis, 2008s2 1-1 Outline Why a short course on mathematical analysis? Limited current course offering
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson
ArcGIS Enterprise: What s New Philip Heede Shannon Kalisky Melanie Summers Sam Williamson ArcGIS Enterprise is the new name for ArcGIS for Server What is ArcGIS Enterprise ArcGIS Enterprise is powerful
More informationArcGIS is Advancing. Both Contributing and Integrating many new Innovations. IoT. Smart Mapping. Smart Devices Advanced Analytics
ArcGIS is Advancing IoT Smart Devices Advanced Analytics Smart Mapping Real-Time Faster Computing Web Services Crowdsourcing Sensor Networks Both Contributing and Integrating many new Innovations ArcGIS
More informationScikit-learn. scikit. Machine learning for the small and the many Gaël Varoquaux. machine learning in Python
Scikit-learn Machine learning for the small and the many Gaël Varoquaux scikit machine learning in Python In this meeting, I represent low performance computing Scikit-learn Machine learning for the small
More informationSpatial Analytics Workshop
Spatial Analytics Workshop Pete Skomoroch, LinkedIn (@peteskomoroch) Kevin Weil, Twitter (@kevinweil) Sean Gorman, FortiusOne (@seangorman) #spatialanalytics Introduction The Rise of Spatial Analytics
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 12: Real-Time Data Analytics (2/2) March 31, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationQR Decomposition in a Multicore Environment
QR Decomposition in a Multicore Environment Omar Ahsan University of Maryland-College Park Advised by Professor Howard Elman College Park, MD oha@cs.umd.edu ABSTRACT In this study we examine performance
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 17: Stochastic Optimization Part II: Realizable vs Agnostic Rates Part III: Nearest Neighbor Classification Stochastic
More informationCSCI 1951-G Optimization Methods in Finance Part 12: Variants of Gradient Descent
CSCI 1951-G Optimization Methods in Finance Part 12: Variants of Gradient Descent April 27, 2018 1 / 32 Outline 1) Moment and Nesterov s accelerated gradient descent 2) AdaGrad and RMSProp 4) Adam 5) Stochastic
More informationOverview: Synchronous Computations
Overview: Synchronous Computations barriers: linear, tree-based and butterfly degrees of synchronization synchronous example 1: Jacobi Iterations serial and parallel code, performance analysis synchronous
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationLecture 1: Supervised Learning
Lecture 1: Supervised Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech ISYE6740/CSE6740/CS7641: Computational Data Analysis/Machine from Portland, Learning Oregon: pervised learning (Supervised)
More informationMaster thesis. Multi-class Fork-Join queues & The stochastic knapsack problem
Master thesis Multi-class Fork-Join queues & The stochastic knapsack problem Sihan Ding August 26th, 2011 Supervisor UL: Dr. Floske Spieksma Supervisors CWI: Drs. Chrétien Verhoef Prof.dr. Rob van der
More informationIntroduction to Machine Learning (67577)
Introduction to Machine Learning (67577) Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Deep Learning Shai Shalev-Shwartz (Hebrew U) IML Deep Learning Neural Networks
More informationStochastic Analogues to Deterministic Optimizers
Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured
More informationMachine Learning CS 4900/5900. Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Machine Learning CS 4900/5900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Machine Learning is Optimization Parametric ML involves minimizing an objective function
More informationDeep Learning & Neural Networks Lecture 4
Deep Learning & Neural Networks Lecture 4 Kevin Duh Graduate School of Information Science Nara Institute of Science and Technology Jan 23, 2014 2/20 3/20 Advanced Topics in Optimization Today we ll briefly
More informationOptimization for Machine Learning
Optimization for Machine Learning Elman Mansimov 1 September 24, 2015 1 Modified based on Shenlong Wang s and Jake Snell s tutorials, with additional contents borrowed from Kevin Swersky and Jasper Snoek
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationHow to deal with uncertainties and dynamicity?
How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling
More informationMotivation Subgradient Method Stochastic Subgradient Method. Convex Optimization. Lecture 15 - Gradient Descent in Machine Learning
Convex Optimization Lecture 15 - Gradient Descent in Machine Learning Instructor: Yuanzhang Xiao University of Hawaii at Manoa Fall 2017 1 / 21 Today s Lecture 1 Motivation 2 Subgradient Method 3 Stochastic
More informationOur Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering
Our Problem Global Predicate Detection and Event Ordering To compute predicates over the state of a distributed application Model Clock Synchronization Message passing No failures Two possible timing assumptions:
More informationPortal for ArcGIS: An Introduction. Catherine Hynes and Derek Law
Portal for ArcGIS: An Introduction Catherine Hynes and Derek Law Agenda Web GIS pattern Product overview Installation and deployment Configuration options Security options and groups Portal for ArcGIS
More informationWeb GIS & ArcGIS Pro. Zena Pelletier Nick Popovich
Web GIS & ArcGIS Pro Zena Pelletier Nick Popovich Web GIS Transformation of the ArcGIS Platform Desktop Apps GIS Web Maps Web Scenes Layers Evolution of the modern GIS Desktop GIS (standalone GIS) GIS
More information