Data Canopy. Accelerating Exploratory Statistical Analysis. Abdul Wasay Xinding Wei Niv Dayan Stratos Idreos


1 Data Canopy: Accelerating Exploratory Statistical Analysis. Abdul Wasay, Xinding Wei, Niv Dayan, Stratos Idreos

2 Statistics are everywhere! Algorithms Systems Analytic Pipelines

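3 [Plot: temperature, May 2017]

4 [Plot: temperature, May 2017, with the mean marked]

5 [Plot: temperature, May 2017, with the variance marked]

6 [Diagram: lemonade sales correlate positively (+) with temperature; hot chocolate sales correlate negatively (-) with temperature]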

7 Repetitive Statistics

8 Repetition takes multiple forms: sub-range. [Diagram: the query ranges of Q1, Q2, Q3 over a column, ordered by time]

9 Repetition takes multiple forms: overlap. [Same diagram]

10 Repetition takes multiple forms: different statistics. [Same diagram; Q1, Q2, Q3 compute statistics S1, S2, S3]

11 Repetition takes multiple forms: mixed. [Same diagram; Q1, Q2, Q3 compute statistics S1, S2, S3]

12 Exploratory Workloads Exhibit Repetition. [Bar chart: Queries (%) for the SQLShare and SDSS workloads, with bars for column set repeats*, template repeats*, and exact repeats*. * at least once.] Source: SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. Shrainik Jain, Dominik Moritz, Bill Howe, Ed Lazowska. SIGMOD 2016

13 Repetition is everywhere: between 50% and 99%.

14 How Do Existing Tools Perform? Tools compared: NumPy (Python), Modeltools (R), MonetDB

15 Sequence of queries: Q1 Mean, Q2 Var., Q3 Cov., Q4 Mean, Q5 Var., Q6 Cov.

17 [Plot: normalized execution time of NumPy, Modeltools, and MonetDB for Q1 through Q6]

19 Data

20 Existing systems always compute statistics from scratch Data

22 [Diagram, top to bottom: statistical queries; library of building blocks; data]

25 Avoid redundant data access to accelerate statistical analysis.

26 [Diagram: a statistic sits on top of basic aggregates; each basic aggregate covers one chunk of the data]

27 Q: Monthly Variance. Chunk size: 7 (a week). [Diagram: the month's values along time t, grouped into weekly chunks]

$$\mathrm{Variance} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2$$
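A short derivation, assuming the slide uses the standard sum-of-squares form of the population variance, shows why weekly chunks suffice for the monthly query. The symbols $C_1, \ldots, C_k$ (the chunks) and $S_1$, $S_2$ (the combined sums) are introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
S_1 &= \sum_{j=1}^{k} \sum_{x \in C_j} x, &
S_2 &= \sum_{j=1}^{k} \sum_{x \in C_j} x^2, \\
\mathrm{Variance} &= \frac{S_2}{n} - \left(\frac{S_1}{n}\right)^2 .
\end{align*}
```

Each inner sum depends on one chunk only, so it can be computed once per chunk and combined later without rereading the data.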

30 Reuse after the monthly variance query. Chunk size: 7 (a week).
Var (first week) = $\frac{1}{7}\sum_{i=1}^{7} x_i^2 - \left(\frac{1}{7}\sum_{i=1}^{7} x_i\right)^2$: reuse between ranges.
Monthly mean = $\frac{1}{31}\sum_{i=1}^{31} x_i$: reuse between statistics.
Mean (first week) = $\frac{1}{7}\sum_{i=1}^{7} x_i$: mixed.
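The three kinds of reuse on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names (`build_aggregates`, `combine`) and the synthetic temperatures are hypothetical, and a real system would keep the per-chunk aggregates in a more elaborate structure.

```python
import random

CHUNK = 7  # chunk size from the slide: one week


def build_aggregates(values, chunk=CHUNK):
    """Per-chunk (sum, sum of squares, count). Hypothetical helper."""
    aggs = []
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        aggs.append((sum(part), sum(v * v for v in part), len(part)))
    return aggs


def combine(aggs):
    """Add up the per-chunk aggregates of the chunks a query covers."""
    s1 = sum(a[0] for a in aggs)
    s2 = sum(a[1] for a in aggs)
    n = sum(a[2] for a in aggs)
    return s1, s2, n


def mean(aggs):
    s1, _, n = combine(aggs)
    return s1 / n


def variance(aggs):
    s1, s2, n = combine(aggs)
    return s2 / n - (s1 / n) ** 2


random.seed(0)
temps = [random.uniform(50, 90) for _ in range(31)]  # synthetic May data

weekly = build_aggregates(temps)       # the base data is scanned once, here
monthly_var = variance(weekly)         # the original query
first_week_var = variance(weekly[:1])  # reuse between ranges
monthly_mean = mean(weekly)            # reuse between statistics
first_week_mean = mean(weekly[:1])     # mixed
```

After `build_aggregates` runs, none of the four statistics touches `temps` again.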

34 Open questions: How to decompose a statistic into basic aggregates? How to synthesize? How to store? What chunk size? How to handle updates? How to build? How to handle memory pressure?

36 Questions covered in this talk: Decompose? Synthesize? Store? Chunk size?

37 Statistic Basic Aggregates

38 A basic aggregate: an aggregate function f applied to a data column after a transformation τ.

41 Example transformations: $\tau(x) = x$, $\tau(x) = x^2$

42 The aggregate function f is applied to each chunk of the transformed column.

43 Example aggregate functions: max, min, sum, product

44 Example basic aggregate, sum of squares: f = sum, $\tau(x) = x^2$

45 Example basic aggregate, maximum: f = max, $\tau(x) = x$

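46 Statistics are decomposed into building blocks: a statistic function F over basic aggregates, each an aggregate function f applied to a transformation τ of the data.

47 Arithmetic Mean: $F = \frac{1}{n}\sum_{i=1}^{n}\tau(x_i)$, with $\tau(x) = x$

48 Geometric Mean: $F = \left(\prod_{i=1}^{n}\tau(x_i)\right)^{1/n}$, with $\tau(x) = x$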
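The decomposition into a transformation τ, an aggregate function f, and a statistic function F can be written down directly. The sketch below is an illustration, not the authors' code: `basic_aggregate`, `identity`, and `square` are hypothetical names, and the three-value column is made up.

```python
import math


def identity(x):
    return x


def square(x):
    return x * x


def basic_aggregate(f, tau, column):
    """Apply aggregate function f to the tau-transformed column."""
    return f(tau(x) for x in column)


column = [2.0, 4.0, 8.0]
n = len(column)

# Basic aggregates: each is a pair (f, tau).
total = basic_aggregate(sum, identity, column)          # sum of x
sum_sq = basic_aggregate(sum, square, column)           # sum of x^2
product = basic_aggregate(math.prod, identity, column)  # product of x
largest = basic_aggregate(max, identity, column)        # max of x

# Statistic functions F, synthesized from basic aggregates only.
arithmetic_mean = total / n
geometric_mean = product ** (1.0 / n)
variance = sum_sq / n - (total / n) ** 2
```

Note that `total` feeds both the arithmetic mean and the variance, which is the sharing between statistics that the slides describe.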

49 [Diagram: a transformation τ and an aggregate function f applied to a data column]

50 [Diagram: f is applied per chunk of the transformed column]

51 Basic Aggregates. The data column is divided into chunks; a transformation τ is applied; an aggregate function f produces one basic aggregate per chunk; a statistic function F combines basic aggregates.

54 The same basic aggregates serve every form of repetition: different statistic types, overlapping ranges, sub-ranges, and mixed.

55 Roadmap. Covered: Decompose. Open: Synthesize? Store? Chunk size?

57 [Diagram: data column, transformation τ, per-chunk aggregate function f, basic aggregates]

58 Basic aggregates are stored in segment trees.

59 [Diagram, built up in steps: the data is divided into logical chunks; the basic aggregates of the chunks form the leaves of the segment tree]

66 One segment tree per basic aggregate per column. [Diagram: Column a and Column b, each with its own segment trees]
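A minimal sketch of such a structure follows, assuming a standard array-backed segment tree whose leaves are per-chunk basic aggregates. The class name, the `forest` dictionary, and the sample numbers are hypothetical; the slides do not show the actual layout.

```python
class SegmentTree:
    """Array-backed segment tree over per-chunk basic aggregates.

    Illustrative only. `combine` must be associative and commutative,
    which holds for sum, product, min, and max.
    """

    def __init__(self, leaves, combine):
        self.n = len(leaves)
        self.combine = combine
        self.tree = [None] * self.n + list(leaves)
        # Each internal node combines its two children.
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        """Combine the leaves lo..hi-1 in O(log n) steps."""
        result = None
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                node = self.tree[lo]
                result = node if result is None else self.combine(result, node)
                lo += 1
            if hi & 1:
                hi -= 1
                node = self.tree[hi]
                result = node if result is None else self.combine(result, node)
            lo //= 2
            hi //= 2
        return result


def add(x, y):
    return x + y


# One tree per (column, basic aggregate) pair; keys and values are made up.
forest = {
    ("a", "sum"): SegmentTree([10.0, 20.0, 30.0, 40.0, 50.0], add),
    ("a", "max"): SegmentTree([3.0, 9.0, 4.0, 7.0, 1.0], max),
}

sum_chunks_1_to_3 = forest[("a", "sum")].query(1, 4)
max_chunks_0_to_2 = forest[("a", "max")].query(0, 3)
```

Keeping the trees separate means a query only touches the aggregates its statistic needs, at the price of one tree for every pair that has been requested.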

69 Roadmap. Covered: Decompose, Store. Open: Synthesize? Chunk size?

71 Calculate variance in temperature between May 15 and Oct. 15

72 Parse the query. Statistic: variance. Column: temperature. Range: May 15 to Oct. 15.

73 Recipe: $\mathrm{variance} = \frac{1}{N}\sum x^2 - \left(\frac{1}{N}\sum x\right)^2$

74 Plan, with chunk size = 1 month: the fully covered chunks (June, July, August, Sept.) are read from segment trees; the partial chunks at the two ends (May and Oct.) are scanned from base data.
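The plan can be sketched as follows: take stored aggregates for every fully covered chunk and scan the base data only for the partial chunks at the two ends. This is a simplified illustration with assumptions: months are modeled as 30-value chunks, plain lists of per-chunk sums stand in for the segment trees, and `range_variance` is a hypothetical name.

```python
def range_variance(column, chunk_sums, chunk_sq_sums, chunk, lo, hi):
    """Variance of column[lo:hi], reusing per-chunk aggregates."""
    n = hi - lo
    first_full = -(-lo // chunk)  # ceiling division: first fully covered chunk
    last_full = hi // chunk       # one past the last fully covered chunk
    if first_full >= last_full:
        # The range covers no complete chunk: scan all of it.
        scans = [column[lo:hi]]
        s1 = s2 = 0.0
    else:
        # Residual ranges at both ends come from base data.
        scans = [column[lo:first_full * chunk], column[last_full * chunk:hi]]
        # Fully covered chunks come from stored aggregates.
        s1 = sum(chunk_sums[first_full:last_full])
        s2 = sum(chunk_sq_sums[first_full:last_full])
    for part in scans:
        s1 += sum(part)
        s2 += sum(v * v for v in part)
    return s2 / n - (s1 / n) ** 2


CHUNK = 30  # assumption: every month has 30 readings
column = [60.0 + (day * 37 % 25) for day in range(180)]  # synthetic, May to Oct.
chunk_sums = [sum(column[i:i + CHUNK]) for i in range(0, len(column), CHUNK)]
chunk_sq_sums = [
    sum(v * v for v in column[i:i + CHUNK]) for i in range(0, len(column), CHUNK)
]

# "May 15 to Oct. 15" in this toy calendar: positions 14 up to 165.
result = range_variance(column, chunk_sums, chunk_sq_sums, CHUNK, 14, 165)
```

Here the four middle chunks are answered from `chunk_sums` and `chunk_sq_sums`, and only 31 of the 151 values in the range are read from `column`.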

76 Roadmap. Covered: Decompose, Store, Synthesize. Open: Chunk size?

78 [Diagram: a query range over chunks of size 1 month; the interior chunks are answered from segment trees, the partial chunks at the two ends from base data]

80 Segment tree traversal: O(log(n/c))

81 Residual range scan: O(c)

82 Total cost = O(log(n/c) + c), where n is the column size and c is the chunk size.

83 [Plot: query cost against chunk size]

88 [Plot: query cost against chunk size, annotated with cache size and I/O]
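To see how the two terms pull in opposite directions, the cost expression can be evaluated over a sweep of chunk sizes. The sketch below is a toy model, not a measurement: the per-step weights `miss_cost` and `scan_cost` are assumed constants standing in for whatever big-O hides, and only n (40M rows) is taken from the slides.

```python
import math


def query_cost(n, c, miss_cost=100.0, scan_cost=1.0):
    """Toy model of O(log(n/c) + c) with assumed constant factors.

    miss_cost: assumed cost of one step of segment tree traversal.
    scan_cost: assumed cost of scanning one value of a residual range.
    """
    return miss_cost * math.log2(n / c) + scan_cost * c


n = 40_000_000  # column size used in the experiments
candidates = [2 ** k for k in range(0, 21)]  # chunk sizes 1 .. 2^20
costs = {c: query_cost(n, c) for c in candidates}
best = min(costs, key=costs.get)

# Setting the derivative of the model to zero gives the continuous optimum.
analytic_optimum = 100.0 / (1.0 * math.log(2))
```

Under these assumed weights the sweep bottoms out at a chunk size of 128, close to the analytic optimum of about 144; different weights shift the minimum, which is consistent with the slide tying the choice to cache size and I/O.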

89 Experimental Analysis

90 Setup. Query range distributions: Range-Uniform and Range-Zoom-in. 1 thread, in-memory, 40M rows, 100 columns, 5 statistic types.

91 Modes evaluated: Online, Offline

93 Range-Uniform workload. [Plot: latency (ms) over the query sequence for MonetDB, Modeltools (R), NumPy (Python), and Online]

94 DC is up to 2 orders of magnitude faster.

95 [Bar chart: total execution time (s) for Range-Uniform and Range-Zoom-in, 2000 queries each; bars: No DC, Online, Offline]

96 DC benefits regardless of the exploration scenario.

97 [Bar chart: total execution time (s) for Collaborative Filtering, Bayesian Classification, and Simple Linear Regression; legend: Online, No DC, Individual, Sequential]

98 Data Canopy: Accelerating Exploratory Statistical Analysis. Statistics are everywhere! Repetitive statistics and data access. Data Canopy synthesizes statistics from basic ingredients.

99 QUERIOSITY: Accelerate by synthesis. Provide hints. daslab.seas.harvard.edu/queriosity

100 daslab.seas.harvard.edu/data-canopy and daslab.seas.harvard.edu/queriosity. Thank You!

101 Range-Zoom-in. [Plot: response time (ms) over the query sequence for MonetDB, NumPy (Python), Modeltools (R), and Online DC]

102 Segment Tree in Data Canopy.
Root: $f(\{f(X_1), f(X_2)\}, \{f(X_3), f(X_4)\})$
Internal nodes: $f(\{f(X_1), f(X_2)\})$ and $f(\{f(X_3), f(X_4)\})$
Leaves: $f(X_1)$, $f(X_2)$, $f(X_3)$, $f(X_4)$, where $X_i = \{\tau(x_i)\}$ is the transformed data of chunk i.
Leaves: basic aggregate on every chunk. Internal node: aggregate function applied to its two children.
* $f(X) = f(\{f(X_1), f(X_2), \ldots, f(X_n)\})$

103 Operation Modes: Offline, Online, Speculative. [Timeline for each mode: Prep, Now, Future]

104 Inserts and updates: extend segment trees as needed. [Diagram: a column with existing rows and new rows, and a query range. Plots: response time (s) over the query sequence for Insert and Reconstruct with x2 rows, x2 columns, and x2 both; total execution time (s), split into Query and Update, against point updates (%)]

105 Performance under Memory Pressure. [Plot: average response time (ms) over the query sequence]

106 Handling Memory Pressure. [Plots: total execution time (s) against base data in memory (%); DC and StatSys, by phase, against number of rows (M)]

107 Memory Feasibility. [Plot: in-memory bivariate statistics against number of columns, one series per row count (including 10M, 100M, and 1T rows), with 8GB marked]

108 Memory Footprint. [Plot: memory footprint (GB) of univariate DC and bivariate DC; labels: s=s_o, s=64kb, Max U workload]

109 Scaling with Queries. [Plot: average response time (ms) over a query sequence of up to 1x10^6 queries; series U and Z]

110 Selecting the Chunk Size. [Plot: total execution time (s) against chunk size (bytes), one series per data size (including 10M, 100M, and 1B)]


More information

MANAGING STORAGE STRUCTURES FOR SPATIAL DATA IN DATABASES ABSTRACT

MANAGING STORAGE STRUCTURES FOR SPATIAL DATA IN DATABASES ABSTRACT Towards an Extended SQL for Treating Spatial Objects in: Y.C. Lee (ed.), Trends and Concerns of Spatial Sciences, Proceedings of the Second International Seminar, Fredericton, New Brunswick, Canada, June

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

Hardware Design I Chap. 4 Representative combinational logic

Hardware Design I Chap. 4 Representative combinational logic Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload

More information

CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION

CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION 4.1 Overview This chapter contains the description about the data that is used in this research. In this research time series data is used. A time

More information

STAT 520: Forecasting and Time Series. David B. Hitchcock University of South Carolina Department of Statistics

STAT 520: Forecasting and Time Series. David B. Hitchcock University of South Carolina Department of Statistics David B. University of South Carolina Department of Statistics What are Time Series Data? Time series data are collected sequentially over time. Some common examples include: 1. Meteorological data (temperatures,

More information

AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis

AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Ian Foster: Univ. of

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan ArcGIS GeoAnalytics Server: An Introduction Sarah Ambrose and Ravi Narayanan Overview Introduction Demos Analysis Concepts using GeoAnalytics Server GeoAnalytics Data Sources GeoAnalytics Server Administration

More information

Reliability and Risk Analysis. Time Series, Types of Trend Functions and Estimates of Trends

Reliability and Risk Analysis. Time Series, Types of Trend Functions and Estimates of Trends Reliability and Risk Analysis Stochastic process The sequence of random variables {Y t, t = 0, ±1, ±2 } is called the stochastic process The mean function of a stochastic process {Y t} is the function

More information

Ensemble Methods and Random Forests

Ensemble Methods and Random Forests Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization

More information

Stochastic Gradient Descent. CS 584: Big Data Analytics

Stochastic Gradient Descent. CS 584: Big Data Analytics Stochastic Gradient Descent CS 584: Big Data Analytics Gradient Descent Recap Simplest and extremely popular Main Idea: take a step proportional to the negative of the gradient Easy to implement Each iteration

More information

Global Optimization of Common Subexpressions for Multiplierless Synthesis of Multiple Constant Multiplications

Global Optimization of Common Subexpressions for Multiplierless Synthesis of Multiple Constant Multiplications Global Optimization of Common Subexpressions for Multiplierless Synthesis of Multiple Constant Multiplications Yuen-Hong Alvin Ho, Chi-Un Lei, Hing-Kit Kwan and Ngai Wong Department of Electrical and Electronic

More information

Data Exploration and Unsupervised Learning with Clustering

Data Exploration and Unsupervised Learning with Clustering Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a

More information

Enhancing Reuse of Constraint Solutions to Improve Symbolic Execution

Enhancing Reuse of Constraint Solutions to Improve Symbolic Execution Enhancing Reuse of Constraint Solutions to Improve Symbolic Execution Xiangyang Jia (Wuhan University) Carlo Ghezzi (Politecnico di Milano) Shi Ying (Wuhan University) Outline Motivation Logical Basis

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

Scalable 3D Spatial Queries for Analytical Pathology Imaging with MapReduce

Scalable 3D Spatial Queries for Analytical Pathology Imaging with MapReduce Scalable 3D Spatial Queries for Analytical Pathology Imaging with MapReduce Yanhui Liang, Stony Brook University Hoang Vo, Stony Brook University Ablimit Aji, Hewlett Packard Labs Jun Kong, Emory University

More information

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory Randomized Selection on the GPU Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory High Performance Graphics 2011 August 6, 2011 Top k Selection on GPU Output the top k keys

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3 Digital Logic: Boolean Algebra and Gates Textbook Chapter 3 Basic Logic Gates XOR CMPE12 Summer 2009 02-2 Truth Table The most basic representation of a logic function Lists the output for all possible

More information

Query Optimization: Exercise

Query Optimization: Exercise Query Optimization: Exercise Session 6 Bernhard Radke November 27, 2017 Maximum Value Precedence (MVP) [1] Weighted Directed Join Graph (WDJG) Weighted Directed Join Graph (WDJG) 1000 0.05 R 1 0.005 R

More information

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion

META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion : An Efficient Matching-Based Method for Error-Tolerant Autocompletion Dong Deng Guoliang Li He Wen H. V. Jagadish Jianhua Feng Department of Computer Science, Tsinghua National Laboratory for Information

More information

Predicting New Search-Query Cluster Volume

Predicting New Search-Query Cluster Volume Predicting New Search-Query Cluster Volume Jacob Sisk, Cory Barr December 14, 2007 1 Problem Statement Search engines allow people to find information important to them, and search engine companies derive

More information

Complex Dynamics of Microprocessor Performances During Program Execution

Complex Dynamics of Microprocessor Performances During Program Execution Complex Dynamics of Microprocessor Performances During Program Execution Regularity, Chaos, and Others Hugues BERRY, Daniel GRACIA PÉREZ, Olivier TEMAM Alchemy, INRIA, Orsay, France www-rocq.inria.fr/

More information

ECE521 W17 Tutorial 1. Renjie Liao & Min Bai

ECE521 W17 Tutorial 1. Renjie Liao & Min Bai ECE521 W17 Tutorial 1 Renjie Liao & Min Bai Schedule Linear Algebra Review Matrices, vectors Basic operations Introduction to TensorFlow NumPy Computational Graphs Basic Examples Linear Algebra Review

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Patent Searching using Bayesian Statistics

Patent Searching using Bayesian Statistics Patent Searching using Bayesian Statistics Willem van Hoorn, Exscientia Ltd Biovia European Forum, London, June 2017 Contents Who are we? Searching molecules in patents What can Pipeline Pilot do for you?

More information

SHIFT-SPLIT: I/O Efficient Maintenance of Wavelet-Transformed Multidimensional Data

SHIFT-SPLIT: I/O Efficient Maintenance of Wavelet-Transformed Multidimensional Data SHIFT-SPLIT: I/O Efficient aintenance of Wavelet-Transformed ultidimensional Data ehrdad Jahangiri University of Southern California Los Angeles, CA 90089-0781 jahangir@usc.edu Dimitris Sacharidis ational

More information

Belief Update in CLG Bayesian Networks With Lazy Propagation

Belief Update in CLG Bayesian Networks With Lazy Propagation Belief Update in CLG Bayesian Networks With Lazy Propagation Anders L Madsen HUGIN Expert A/S Gasværksvej 5 9000 Aalborg, Denmark Anders.L.Madsen@hugin.com Abstract In recent years Bayesian networks (BNs)

More information

CSE 4201, Ch. 6. Storage Systems. Hennessy and Patterson

CSE 4201, Ch. 6. Storage Systems. Hennessy and Patterson CSE 4201, Ch. 6 Storage Systems Hennessy and Patterson Challenge to the Disk The graveyard is full of suitors Ever heard of Bubble Memory? There are some technologies that refuse to die (silicon, copper...).

More information

Concurrent Divide-and-Conquer Library

Concurrent Divide-and-Conquer Library with Petascale Electromagnetics Applications, Tech-X Corporation CScADS Workshop on Libraries and Algorithms for Petascale Applications, 07/30/2007, Snowbird, Utah Background Particle In Cell (PIC) in

More information

Database Design and Normalization

Database Design and Normalization Database Design and Normalization Chapter 11 (Week 12) EE562 Slides and Modified Slides from Database Management Systems, R. Ramakrishnan 1 1NF FIRST S# Status City P# Qty S1 20 London P1 300 S1 20 London

More information

4.8 Efficiency Experts A Solidify Understanding Task

4.8 Efficiency Experts A Solidify Understanding Task 4.8 Efficiency Experts A Solidify Understanding Task In our work so far, we have worked with linear and exponential equations in many forms. Some of the forms of equations and their names are: 2012 www.flickr.com/photos/cannongod

More information