Progressive & Algorithms & Systems
- Kimberly Holt
- 5 years ago
Transcription
1 University of California Merced; Lawrence Berkeley National Laboratory
2 Progressive Computation for Data Exploration
3 Progressive Computation: Online Aggregation (OLA) in DB
[Figure: the query result estimate and its confidence bounds (ε) converge to the exact result over time]
How to derive confidence bounds that shrink progressively?
How to generate a meaningful estimate and confidence bounds as early as possible?
How to minimize the estimation overhead?
4 Outline: (1) PF-OLA: Parallel Framework for Online Aggregation; (2) OLA-GD: Online Aggregation for Gradient Descent Optimization; (3) OLA-RAW: Online Aggregation over Raw Data
5 GLADE [Diagram: on each node (Node 1 ... Node n), every chunk is processed by its own GLA through Begin Chunk, Accumulate, and End Chunk; the per-chunk GLAs are combined by Local Merge followed by Local Term; the resulting per-node GLAs are combined by Remote Merge followed by Term, which produces the result]
6 GLADE Architecture
Coordinator: Comm Manager, Query Manager, Code Generator, GLA Manager, Catalog
Node 1 ... Node n (each): Comm Manager, Query Manager, GLA Manager, Storage Manager, DataPath Exec. Engine, Code Loader
7 PF-OLA API
Method | Usage
Init (), Accumulate (Item d), Merge (UDA input 1, UDA input 2, UDA output) [local and remote], Terminate () [local and final] | Basic interface
BeginChunk (), EndChunk () | Chunk processing
Serialize (), Deserialize () | Transfer UDA across processes
EstimatorTerminate (), EstimatorMerge (UDA input 1, UDA input 2, UDA output) | Progressive computation
Estimate (estimator, lower, upper, confidence) | OLA estimation
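As a rough illustration of how the basic and transfer methods in this API fit together, the sketch below implements an average aggregate behind the same method names. The class name `AverageUDA`, the in-place `Merge(other)` signature (the slide lists a three-argument form), the JSON encoding, and the `run` driver are all assumptions made for this example; the actual GLADE and PF-OLA interface may differ.

```python
import json


class AverageUDA:
    """Hypothetical UDA computing AVG(x); method names mirror the slide's API."""

    def Init(self):
        # Reset the aggregate state before any item is seen.
        self.total = 0.0
        self.count = 0

    def Accumulate(self, item):
        # Fold one data item into the state.
        self.total += item
        self.count += 1

    def Merge(self, other):
        # Combine the state of another UDA (local or remote) into this one.
        self.total += other.total
        self.count += other.count

    def Terminate(self):
        # Produce the final result from the merged state.
        return self.total / self.count if self.count else None

    def Serialize(self):
        # Encode the state so it can be shipped to another process.
        return json.dumps({"total": self.total, "count": self.count})

    def Deserialize(self, payload):
        # Restore the state received from another process.
        state = json.loads(payload)
        self.total, self.count = state["total"], state["count"]


def run(chunks):
    """Process each chunk with its own UDA, then merge everything into one."""
    merged = AverageUDA()
    merged.Init()
    for chunk in chunks:
        local = AverageUDA()
        local.Init()
        for item in chunk:
            local.Accumulate(item)
        # Round-trip through Serialize/Deserialize as if crossing processes.
        remote = AverageUDA()
        remote.Deserialize(local.Serialize())
        merged.Merge(remote)
    return merged.Terminate()
```

Because the state is only a sum and a count, merging is associative and commutative, so the per-chunk states can be combined in any order, whether locally or across nodes.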
8 Partial Aggregation [Diagram: the execution engine dispatches two kinds of work units from a waypoint to a thread pool. WorkUnit 1 accumulates chunks into a GLA taken from the GLA list; WorkUnit 2 merges GLAs. Once all GLAs have been merged, one copy is put back into the GLA list and a partial result is produced.]
9 Parallel Sampling
Centralized random shuffling: permute data randomly at loading; a scan produces larger samples.
Stratified sampling: permute data randomly in each partition; a direct extension of random shuffling to partitioned data.
Global data randomization at loading: split data randomly at each node; permute all received data randomly; standard hash-based data partitioning.
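The global data randomization strategy can be sketched in a few lines. This is a single-process simulation under stated assumptions: the function name `global_randomization`, the uniform choice of destination node, and the fixed seed are illustrative and not taken from the slides.

```python
import random


def global_randomization(node_data, seed=0):
    """Simulate global data randomization at loading across len(node_data) nodes.

    Each node sends every one of its items to a randomly chosen node
    (random split); each node then permutes everything it received.
    """
    rng = random.Random(seed)
    n = len(node_data)
    received = [[] for _ in range(n)]
    for local in node_data:
        for item in local:
            # Assumption: destinations are chosen uniformly at random.
            received[rng.randrange(n)].append(item)
    for bucket in received:
        # Random permutation of all data received by this node.
        rng.shuffle(bucket)
    return received
```

After this step, a sequential scan of any prefix on any node behaves like a random sample of the whole dataset, which is what allows the per-node samples to be combined into a single estimator.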
10 Sample Aggregation [Diagram: two aggregation topologies, Centralized and Distributed tree, built from Chunk, AGG, PartAgg, LocEst, and EST components]
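11 Generic Sampling Estimator
AGG = SELECT SUM(f(d)) FROM D WHERE P(d)
S is a simple random sample without replacement from D.
Estimator: $X = \frac{|D|}{|S|} \sum_{s \in S, P(s)} f(s)$
$E[X] = \mathrm{AGG}$
$\mathrm{Var}[X] = \frac{|D| - |S|}{(|D| - 1)\,|S|} \left[ |D| \sum_{d \in D, P(d)} f^2(d) - \left( \sum_{d \in D, P(d)} f(d) \right)^2 \right]$
$\mathrm{EstVar}[X] = \frac{|D|\,(|D| - |S|)}{|S|^2\,(|S| - 1)} \left[ |S| \sum_{s \in S, P(s)} f^2(s) - \left( \sum_{s \in S, P(s)} f(s) \right)^2 \right]$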
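A minimal sketch of this estimator in Python follows. It scales the sample sum by |D|/|S| and uses the sample-based variance estimate to build confidence bounds. The function name `estimate_sum` and the normal-approximation interval with `z = 1.96` are assumptions for illustration; the slides do not say how the bounds are derived from the variance.

```python
import math


def estimate_sum(sample, population_size, f, predicate, z=1.96):
    """Estimate SUM(f(d)) over the tuples of D that satisfy the predicate.

    `sample` is assumed to be a simple random sample without replacement
    from a dataset of `population_size` tuples.
    Returns (estimate, lower, upper).
    """
    n, big_n = len(sample), population_size
    # Tuples failing the predicate contribute 0 to both sums.
    values = [f(s) if predicate(s) else 0.0 for s in sample]
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    estimate = big_n / n * s1
    if n < 2:
        # The variance estimate is undefined for a single tuple.
        return estimate, -math.inf, math.inf
    est_var = big_n * (big_n - n) / (n * n * (n - 1)) * (n * s2 - s1 * s1)
    # Assumption: normal approximation for the confidence interval.
    half = z * math.sqrt(max(est_var, 0.0))
    return estimate, estimate - half, estimate + half
```

The factor (|D| - |S|) drives the variance estimate to zero as the sample approaches the full dataset, so the bounds collapse onto the exact answer when the scan completes.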
12 Parallel Sampling Estimators
Data are partitioned across N nodes: $D = D_1 \cup D_2 \cup \dots \cup D_N$
Take samples $S_i$, $1 \le i \le N$, independently at each node.
Single Estimator: guarantee that $S = S_1 \cup S_2 \cup \dots \cup S_N$ is a sample from D.
Synchronized estimator: $\frac{|S_i|}{|D_i|} = k$ (const), $1 \le i \le N$
Asynchronous estimator: global data randomization
Multiple Estimators: stratified sampling. Build an estimator $X_i$ for each partition $D_i$, $1 \le i \le N$:
$X_i = \frac{|D_i|}{|S_i|} \sum_{s \in S_i, P(s)} f(s)$
$X = \sum_{i=1}^{N} X_i$ is unbiased.
$\mathrm{Var}\left[ \sum_{i=1}^{N} X_i \right] = \sum_{i=1}^{N} \mathrm{Var}[X_i]$
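The multiple-estimators approach can be sketched as follows: one estimator per partition, with the estimates and their variances simply added. The function names are hypothetical, and applying the single-sample variance estimate to each partition separately is an assumption of this example.

```python
def partition_estimate(sample, partition_size, f, predicate):
    """Return (X_i, estimated Var[X_i]) for one partition D_i with sample S_i."""
    n, big_n = len(sample), partition_size
    values = [f(s) if predicate(s) else 0.0 for s in sample]
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    x_i = big_n / n * s1
    # Assumption: per-partition variance uses the single-sample estimate.
    var_i = big_n * (big_n - n) / (n * n * (n - 1)) * (n * s2 - s1 * s1)
    return x_i, var_i


def stratified_estimate(partitions, f, predicate):
    """Sum the per-partition estimators and their variance estimates.

    `partitions` is a list of (sample, partition_size) pairs; each sample
    must hold at least two tuples. Variances add because the samples are
    drawn independently at each node.
    """
    parts = [partition_estimate(s, size, f, predicate) for s, size in partitions]
    return sum(x for x, _ in parts), sum(v for _, v in parts)
```

No synchronization of sampling rates across nodes is needed here, since each partition is scaled by its own |D_i|/|S_i|.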
13 Time to Convergence
SELECT n_name, SUM(l_extendedprice*(1-l_discount)*(1+l_tax)) FROM lineitem, supplier, nation WHERE l_shipdate = [...] AND l_quantity = 1 AND l_discount between [0.02,0.03] AND l_suppkey = s_suppkey AND s_nationkey = n_nationkey GROUP BY n_name
[Figure: "Single Estimator, High Selectivity"; relative error (%) vs. time (seconds) for 1, 2, 4, and 8 nodes]
TPC-H scale 8,000 (8TB)
Single node: 16 2GHz; 16GB RAM; 4 110MB/s throughput/disk
Cluster: 8 X worker + coordinator (9 nodes); Gigabit Ethernet; same rack
14 Estimation Overhead [Figure: query execution time (seconds), No estimation vs. OLA, for the Aggregate, Group small, Group large, and Join queries]
15 References
Chengjie Qin and Florin Rusu. Sampling Estimators for Parallel Online Aggregation. BNCOD 2013.
Chengjie Qin and Florin Rusu. Parallel Online Aggregation in Action. SSDBM 2013. [Demo]
Chengjie Qin and Florin Rusu. PF-OLA: A High-Performance Framework for Parallel Online Aggregation. Distributed and Parallel Databases (DAPD), August 2013.
16 Outline: (1) PF-OLA: Parallel Framework for Online Aggregation; (2) OLA-GD: Online Aggregation for Gradient Descent Optimization; (3) OLA-RAW: Online Aggregation over Raw Data
17 Gradient Descent Optimization
18 Gradient Descent as GLADE GLA [Diagram: on Node 1, the GLAs hold model copies $w_{1i}$, $w_{1j}$ and process chunks 1 ... r with data $\eta_{11}, \dots, \eta_{1r}$, applying the updates $w_{1i} = w_{1i} - \beta_k \nabla f_{\eta_1}(w_{1i})$ and $w_{1j} = w_{1j} - \beta_k \nabla f_{\eta_1}(w_{1j})$ until End Chunk; Local Term combines them ($w_{1i} + w_{1j}$) into $w_1$. Node n does the same with $w_{ni}$, $w_{nj}$ over chunks 1 ... s, producing $w_n$. Term combines the node models ($w_1 + \dots + w_n$) into the global model $w$.]
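The accumulate-then-merge pattern for gradient descent can be sketched in plain Python. This is a single-process illustration under stated assumptions: the least-squares loss, the function names, the fixed step size, and merging by averaging (the slide only shows the models being added) are all choices made for this example.

```python
def gradient(w, chunk):
    """Gradient of the mean of 0.5 * (w.x - y)^2 over the examples in a chunk."""
    g = [0.0] * len(w)
    for x, y in chunk:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return [gj / len(chunk) for gj in g]


def accumulate(w, chunk, step):
    """One GLA-style update per chunk: w <- w - step * gradient(w, chunk)."""
    g = gradient(w, chunk)
    return [wi - step * gi for wi, gi in zip(w, g)]


def merge(models):
    """Assumption: models are combined by averaging (sum divided by count)."""
    k = len(models)
    return [sum(ws) / k for ws in zip(*models)]


def train(partitions, dim, step=0.1, iterations=200):
    """`partitions` holds one list of chunks per node; every node starts
    each iteration from the current global model."""
    w = [0.0] * dim
    for _ in range(iterations):
        local = []
        for chunks in partitions:
            w_node = list(w)
            for chunk in chunks:
                w_node = accumulate(w_node, chunk, step)
            local.append(w_node)
        w = merge(local)
    return w
```

For example, with two nodes whose examples all satisfy y = 2x, `train` converges to a weight close to 2.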
19 Approximate Gradient Descent
Main idea: apply online aggregation (OLA) sampling to speed up the execution of a speculative iteration.
[Flowcharts: Speculative Gradient Descent starts from w and Λ(w), selects s steps α_1, α_2, ..., α_s, computes the gradient and loss for the candidate models w_1, ..., w_s, updates the model, and repeats until convergence. Approximate Gradient Descent follows the same loop but estimates the gradient and loss, keeps estimating until the estimators converge, then updates the model, and repeats until the model converges.]
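The "estimate until the estimators converge" step can be illustrated with a small sketch that computes the mean per-example gradient from a growing random sample and stops once its confidence half-width is small enough. The function name, the stopping rule (`tol`, `z`, `min_n`), and the omission of a finite-population correction are assumptions of this example, not the method described in the slides.

```python
import math
import random


def estimate_gradient(examples, w, grad, tol=0.05, z=1.96, min_n=30, seed=0):
    """Estimate the mean per-example gradient from a growing random sample.

    Stops once every coordinate's confidence half-width is at most `tol`,
    or when the data is exhausted. Returns (gradient estimate, examples used).
    """
    order = list(examples)
    random.Random(seed).shuffle(order)
    dim = len(w)
    mean = [0.0] * dim
    m2 = [0.0] * dim  # running sums of squared deviations (Welford's method)
    n = 0
    for ex in order:
        n += 1
        g = grad(w, ex)
        for j in range(dim):
            delta = g[j] - mean[j]
            mean[j] += delta / n
            m2[j] += delta * (g[j] - mean[j])
        if n >= min_n:
            # Standard error of the mean, without finite-population correction.
            half = [z * math.sqrt(m2[j] / (n - 1) / n) for j in range(dim)]
            if max(half) <= tol:
                break
    return mean, n
```

The ratio of the returned example count to the dataset size is the sample percentage for that iteration: the lower the variance of the per-example gradients, the earlier the loop stops.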
20 Approximate BGD
21 Approximate BGD
22 Approximate BGD
23 Train SVM Model [Figure: loss per example vs. time (seconds) for BGD and for OLA with different numbers of step sizes] 50M examples, 200 dimensions, 136GB
24 Sample Percentage [Figure: sample percentage (%) vs. iteration #] 50M examples, 200 dimensions, 136GB
25 References
Chengjie Qin and Florin Rusu. Speculative Approximations for Terascale Analytics. CoRR abs/[...], December [...].
Chengjie Qin and Florin Rusu. Speculative Approximations for Terascale Distributed Gradient Descent Optimization. [...]
Florin Rusu, Chengjie Qin, and Martin Torres. Scalable Analytics Model Calibration with Online Aggregation. IEEE Data Engineering Bulletin, Vol. 38, No. 3, September 2015.
26 Outline: (1) PF-OLA: Parallel Framework for Online Aggregation; (2) OLA-GD: Online Aggregation for Gradient Descent Optimization; (3) OLA-RAW: Online Aggregation over Raw Data
27 Motivation & Approach [Figures: timelines of the load, shuffle, and query (Q1 to Q4) phases for DBMS, OLA, RAW, and OLA-RAW; error ratio vs. elapsed time [sec] for DBMS, OLA, RAW, and OLA-RAW]
28 OLA-RAW Architecture [Diagram components: Raw file, Read, Text chunks, four parallel Code-Generated Shuffle/Extract workers, Binary chunk sample, In-memory Sample Synopsis, OLA Estimator; outputs: Estimate and Bounds]
29 Parallel Bi-Level Sampling [Figure: three panels comparing Exact computation, Chunk-level sampling, and Bi-level sampling] Inspection paradox: result order ≠ random chunk order
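Bi-level sampling can be sketched as a two-stage estimator: sample some chunks, then sample tuples inside each sampled chunk, and scale up at both levels. The function name, the fixed sampling fractions, and the sequential single-process execution are assumptions of this example; it does not model the parallel processing or the inspection paradox mentioned on the slide.

```python
import random


def bi_level_estimate(chunks, f, chunk_frac, tuple_frac, seed=0):
    """Estimate SUM(f(t)) over all tuples by sampling chunks, then tuples.

    `chunks` is a list of non-empty lists of tuples. Both levels use
    simple random sampling without replacement.
    """
    rng = random.Random(seed)
    n_chunks = max(1, round(chunk_frac * len(chunks)))
    total = 0.0
    for chunk in rng.sample(chunks, n_chunks):
        m = max(1, round(tuple_frac * len(chunk)))
        tuples = rng.sample(chunk, m)
        # Scale the tuple sample up to the size of its chunk.
        total += len(chunk) / m * sum(f(t) for t in tuples)
    # Scale the sampled chunks up to the whole file.
    return len(chunks) / n_chunks * total
```

Setting `tuple_frac` to 1 reduces this to chunk-level sampling, and setting both fractions to 1 gives the exact computation.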
30 Experimental Results [Figures: panels for selectivity = 100%, 50%, and 10%; error ratio vs. elapsed time [sec] for C-1, C-4, C-16, BI-1, BI-4, BI-16, EXT-1, EXT-4, EXT; ratio of sampled chunks vs. # threads for C and BI; ratio of sampled tuples vs. # threads for C and BI] (Slide footer: Florin Rusu, Progressive & Algorithms & Systems)
31 References
Yu Cheng, Weijie Zhao, and Florin Rusu. OLA-RAW: Scalable Exploration over Raw Data. CoRR abs/[...], February [...].
Yu Cheng, Weijie Zhao, and Florin Rusu. Bi-Level Online Aggregation on Raw Data. SSDBM 2017.
32 Thank you! Questions?
More informationPerformance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6
Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6 Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration) Tae-Hun
More informationHigh Performance Computing
Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),
More informationParallelization of the Molecular Orbital Program MOS-F
Parallelization of the Molecular Orbital Program MOS-F Akira Asato, Satoshi Onodera, Yoshie Inada, Elena Akhmatskaya, Ross Nobes, Azuma Matsuura, Atsuya Takahashi November 2003 Fujitsu Laboratories of
More informationECEN 689 Special Topics in Data Science for Communications Networks
ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 5 Optimizing Fixed Size Samples Sampling as
More informationMinwise hashing for large-scale regression and classification with sparse data
Minwise hashing for large-scale regression and classification with sparse data Nicolai Meinshausen (Seminar für Statistik, ETH Zürich) joint work with Rajen Shah (Statslab, University of Cambridge) Simons
More informationAnalytical Modeling of Parallel Systems
Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing
More information5 Spatial Access Methods
5 Spatial Access Methods 5.1 Quadtree 5.2 R-tree 5.3 K-d tree 5.4 BSP tree 3 27 A 17 5 G E 4 7 13 28 B 26 11 J 29 18 K 5.5 Grid file 9 31 8 17 24 28 5.6 Summary D C 21 22 F 15 23 H Spatial Databases and
More informationEfficient Data Reduction and Summarization
Ping Li Efficient Data Reduction and Summarization December 8, 2011 FODAVA Review 1 Efficient Data Reduction and Summarization Ping Li Department of Statistical Science Faculty of omputing and Information
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Nov 2, 2016 Outline SGD-typed algorithms for Deep Learning Parallel SGD for deep learning Perceptron Prediction value for a training data: prediction
More information