Efficient Processing of Large Graphs via Input Reduction

1 Efficient Processing of Large Graphs via Input Reduction
Amlan Kusum, Keval Vora, Rajiv Gupta, Iulian Neamtiu
HPDC, Kyoto, Japan, 04 June 2016

7 Graph Processing
- Iterative graph algorithms
- Vertices are processed continuously, iteration after iteration
- Highly parallel execution
- Challenging due to ever-growing graph sizes
- Convergence speed depends on initializations
[Figure: example graph whose vertex values (v…) evolve over iterations (t…)]
How to find better initializations?
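The dependence of convergence speed on initialization can be seen in a few lines. The following is a minimal Python sketch, assuming single-source shortest paths (SSSP) as the iterative algorithm, a dict-of-dicts weighted graph, and synchronous rounds; `sssp_iterative` and its `init` argument are hypothetical names, not the authors' implementation. The round count includes the final round that detects no change.

```python
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical representation: vertex -> {out-neighbor: edge weight}
Graph = Dict[Hashable, Dict[Hashable, float]]
INF = float("inf")


def sssp_iterative(graph: Graph, source: Hashable,
                   init: Optional[Dict[Hashable, float]] = None
                   ) -> Tuple[Dict[Hashable, float], int]:
    """Synchronous iterative SSSP: each round relaxes every edge using the
    previous round's values. Returns (distances, rounds executed)."""
    vertices = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    dist = {v: INF for v in vertices}
    if init:
        dist.update(init)  # caller-supplied starting values (assumed valid upper bounds)
    dist[source] = 0.0
    rounds = 0
    while True:
        rounds += 1
        new = dict(dist)
        for u, nbrs in graph.items():
            for v, w in nbrs.items():
                if dist[u] + w < new[v]:
                    new[v] = dist[u] + w
        if new == dist:  # nothing changed in this round: converged
            return dist, rounds
        dist = new


# Usage: a chain s -> a -> b -> c -> d with unit weights.
g = {"s": {"a": 1.0}, "a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"d": 1.0}}
cold, cold_rounds = sssp_iterative(g, "s")  # default (+inf) initialization
warm, warm_rounds = sssp_iterative(g, "s", init={"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0})
print(cold_rounds, warm_rounds)  # 5 1
```

Both runs end with the same distances; only the number of rounds differs.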

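9 Key Idea
- Compute initial values using a smaller signature of the original graph
- Generate the smaller graph using light-weight input reduction techniques
[Figure: workflow: (1) input reduction turns the original graph into a reduced graph; (2) the reduced graph is processed (Phase 1); (3) results are mapped back; (4) the original graph is processed (Phase 2)]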

10 Key Idea
$\mathrm{time}(\text{input reduction}) + \mathrm{time}(\text{Phase 1}) + \mathrm{time}(\text{Phase 2}) < \mathrm{time}(\text{original})$
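Read as a break-even rule, the inequality says the three steps together must beat a direct run on the original graph. A small sketch of that arithmetic, with hypothetical timings rather than measured ones; `end_to_end_speedup` is an illustrative name:

```python
def end_to_end_speedup(t_reduction: float, t_phase1: float,
                       t_phase2: float, t_original: float) -> float:
    """Speedup of reduce + Phase 1 + Phase 2 over processing the original
    graph directly; a value > 1 means the inequality on the slide holds."""
    return t_original / (t_reduction + t_phase1 + t_phase2)


# Hypothetical timings in seconds, not results from the talk.
print(end_to_end_speedup(2.0, 3.0, 5.0, 20.0))      # 2.0
print(end_to_end_speedup(5.0, 5.0, 5.0, 12.0) > 1)  # False
```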

11 Outline
- Input Reduction
- Vertex Transformations
- Correctness of Results
- Evaluation
- Conclusion

12 Input Reduction
- Must be light-weight and general
- Multilevel graph partitioning [SC 9, SC 0]
- Matching-based contraction [ICPP 9, JPDC 98]
- Pruning based on edge costs affecting paths [ICDM 0]
- Gate graph for shortest paths problem [ICDM ]
- Develop vertex-level transformations
  - Easily parallelizable using vertex-centric graph processing systems

13 Vertex Transformations
- Maintain structural integrity of the graph
  - Preserve the overall connectivity
- Light-weight
- Local
- Non-interfering

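15 Vertex Transformations
For every $v \in V$:
- if (indegree(v) = 0) then apply T: drop v→ (all out-edges of v): $E' \leftarrow E' \setminus \mathrm{outedges}(v)$
- if (outdegree(v) = 0) then apply T: drop →v (all in-edges of v): $E' \leftarrow E' \setminus \mathrm{inedges}(v)$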
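The two degree-based rules can be sketched in Python over an explicit edge set. This is one interpretation, not the authors' code: decisions use a degree snapshot taken before any edge is dropped, and the `keep` parameter is a hypothetical guard added here because an SSSP source typically has indegree 0 and would otherwise lose all its out-edges.

```python
from typing import AbstractSet, Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def drop_source_and_sink_edges(vertices: AbstractSet[Hashable],
                               edges: AbstractSet[Edge],
                               keep: AbstractSet[Hashable] = frozenset()) -> Set[Edge]:
    """One pass of the two degree-based rules, decided on a degree snapshot:
      indegree(v) == 0  -> E' = E' minus outedges(v)
      outdegree(v) == 0 -> E' = E' minus inedges(v)
    Vertices in `keep` (hypothetical guard) never trigger a rule."""
    out_edges: Dict[Hashable, List[Edge]] = {v: [] for v in vertices}
    in_edges: Dict[Hashable, List[Edge]] = {v: [] for v in vertices}
    for u, v in edges:
        out_edges[u].append((u, v))
        in_edges[v].append((u, v))
    reduced = set(edges)
    for v in vertices:
        if v in keep:
            continue
        if not in_edges[v]:       # no incoming edges: drop v's out-edges
            reduced.difference_update(out_edges[v])
        if not out_edges[v]:      # no outgoing edges: drop v's in-edges
            reduced.difference_update(in_edges[v])
    return reduced


vertices = {"s", "a", "b", "x", "y"}
edges = {("s", "a"), ("a", "b"), ("b", "a"), ("x", "a"), ("b", "y")}
print(sorted(drop_source_and_sink_edges(vertices, edges, keep={"s"})))
# [('a', 'b'), ('b', 'a'), ('s', 'a')]
```

Vertices that only become sources or sinks after this pass are left for a later pass in this sketch.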

16 Vertex Transformations
if (indegree(v) = outdegree(v) = 1) then apply T: bypass v
$E' \leftarrow (E' \setminus \{u \to v,\ v \to w\}) \cup \{u \to w\}$, where $\{u \to v,\ v \to w\} \subseteq E'$
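A minimal sketch of the bypass rule on a weighted adjacency map. It assumes an algorithm with additive path weights such as SSSP, so the new edge $u \to w$ gets weight $w(u,v) + w(v,w)$; the slide itself does not say how weights are combined. `bypass` is a hypothetical helper, and $v$ stays in the graph as an isolated vertex.

```python
from typing import Dict, Hashable

# Hypothetical representation: vertex -> {out-neighbor: edge weight}
WGraph = Dict[Hashable, Dict[Hashable, float]]


def bypass(graph: WGraph, v: Hashable) -> bool:
    """If indegree(v) == outdegree(v) == 1, replace u->v and v->w with u->w.
    Assumes additive path weights, so u->w gets w(u,v) + w(v,w); an existing
    u->w edge keeps the smaller weight. Returns True when applied."""
    preds = [u for u, nbrs in graph.items() if v in nbrs]
    succs = list(graph.get(v, {}))
    if len(preds) != 1 or len(succs) != 1:
        return False
    u, w = preds[0], succs[0]
    if v in (u, w) or u == w:
        return False  # self-loops and 2-cycles are left alone in this sketch
    weight = graph[u][v] + graph[v][w]
    del graph[u][v]
    del graph[v][w]
    graph[u][w] = min(weight, graph[u].get(w, float("inf")))
    return True


g = {"s": {"a": 1.0}, "a": {"b": 2.0}, "b": {"c": 3.0}, "c": {}}
print(bypass(g, "a"))  # True
print(g)  # {'s': {'b': 3.0}, 'a': {}, 'b': {'c': 3.0}, 'c': {}}
```

The path length from s to b is unchanged (1 + 2 = 3), which is what keeps shortest-path values meaningful on the reduced graph.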

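17 Vertex Transformations
For every $v \in V$: if ($\exists\, w \in \mathrm{outneighbors}(v)$ such that $w$ is unchanged and $\mathrm{outneighbors}(v) \cap \mathrm{inneighbors}(w) \neq \emptyset$) then apply T: drop v→w
$E' \leftarrow E' \setminus \{v \to w\}$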
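This rule removes $v \to w$ when a two-hop path $v \to x \to w$ exists, so $w$ stays reachable from $v$. The sketch below is an interpretation: "w is unchanged" is modeled as "w has not already lost an in-edge in this pass", and vertices are visited in sorted order so the result is deterministic. For SSSP the dropped edge may have been the shorter route, so values computed on the reduced graph may become over-estimates, never under-estimates. `drop_redundant_edges` is a hypothetical name.

```python
from typing import Dict, Set, Tuple


def drop_redundant_edges(out: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Drop v->w when some x gives a two-hop path v->x->w (i.e. the
    intersection of outneighbors(v) and inneighbors(w) is non-empty) and w is
    still unchanged, modeled as: w has not lost an in-edge in this pass.
    Mutates `out` in place and returns the dropped edges."""
    inn: Dict[str, Set[str]] = {v: set() for v in out}
    for v, nbrs in out.items():
        for w in nbrs:
            inn.setdefault(w, set()).add(v)
    changed: Set[str] = set()
    dropped: Set[Tuple[str, str]] = set()
    for v in sorted(out):            # fixed order keeps the sketch deterministic
        for w in sorted(out[v]):     # sorted() copies, so out[v] may shrink safely
            if w in changed or w == v:
                continue
            middle = (out[v] & inn[w]) - {v, w}  # intermediates x with v->x->w
            if middle:
                out[v].discard(w)
                inn[w].discard(v)
                changed.add(w)
                dropped.add((v, w))
    return dropped


g = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
print(drop_redundant_edges(g))  # {('a', 'c')}
print(g)  # {'a': {'b'}, 'b': {'c'}, 'c': set()}
```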

19 Other Details
- More vertex transformations
  - Some relax structural integrity
- Order of transformations
- Unified graph reduction algorithm
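The slide names an order of transformations and a unified reduction algorithm without spelling them out. The sketch below is one hypothetical way to organize such a driver: transformations are plain functions applied in a caller-chosen order, round after round, until a round removes nothing or a round limit is hit. `reduce_graph`, `degree_drops`, `keep`, and `max_rounds` are illustrative assumptions, not the paper's algorithm.

```python
from typing import AbstractSet, Callable, Dict, List, Set

Graph = Dict[str, Set[str]]         # every vertex is a key: vertex -> out-neighbors
Transform = Callable[[Graph], int]  # mutates the graph, returns #edges removed


def degree_drops(keep: AbstractSet[str]) -> List[Transform]:
    """Two example transformations: indegree-0 and outdegree-0 drops."""
    def drop_out_of_sources(g: Graph) -> int:
        has_in = {w for nbrs in g.values() for w in nbrs}
        removed = 0
        for v in g:
            if v not in keep and v not in has_in and g[v]:
                removed += len(g[v])
                g[v] = set()
        return removed

    def drop_in_of_sinks(g: Graph) -> int:
        sinks = {v for v in g if not g[v] and v not in keep}
        removed = 0
        for u in g:
            before = len(g[u])
            g[u] -= sinks
            removed += before - len(g[u])
        return removed

    return [drop_out_of_sources, drop_in_of_sinks]


def reduce_graph(g: Graph, transforms: List[Transform], max_rounds: int = 10) -> int:
    """Apply the transformations in the given order, round after round, until a
    round removes nothing or max_rounds is reached. Returns rounds executed."""
    for r in range(1, max_rounds + 1):
        removed = 0
        for t in transforms:  # order matters: each sees the previous one's output
            removed += t(g)
        if removed == 0:
            return r
    return max_rounds


g = {"s": {"a"}, "a": {"b"}, "b": {"a", "y"}, "x": {"a"}, "y": set()}
rounds = reduce_graph(g, degree_drops({"s"}))
print(rounds)  # 2
print(g)  # {'s': {'a'}, 'a': {'b'}, 'b': {'a'}, 'x': set(), 'y': set()}
```

The round limit is a safeguard of this sketch: repeated degree-based drops can cascade through tree-like parts of a graph and remove far more than intended.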

20 Processing Workflow
[Figure: the workflow (input reduction, Phase 1 on the reduced graph, mapping results, Phase 2 on the original graph) applied to an example graph with vertices s, a, b, c, d, e, f, g]

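27 Input Reduction
[Figure: input reduction of the example graph: the original graph over s, a, b, c, d, e, f, g is reduced to a graph over s, a, b, c, d]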

30 Processing Reduced Graph
- Use the original iterative algorithm
[Figure: the iterative algorithm runs on the reduced graph (vertices s, a, b, c, d), starting from source s with value 0]

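33 Mapping Results
- Use default values for missing vertices
[Figure: values computed on the reduced graph for s, a, b, c, d are copied to the same vertices of the original graph; the missing vertices e, f, g receive default values]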
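Mapping results amounts to a lookup with a fallback. A Python sketch, assuming values are kept in a dict keyed by vertex and that the default is the algorithm's usual initial value (+inf for SSSP); `map_results` and the Phase 1 values below are hypothetical, not the values in the slide's figure.

```python
from typing import Dict, Hashable, Iterable


def map_results(reduced_values: Dict[Hashable, float],
                original_vertices: Iterable[Hashable],
                default: float) -> Dict[Hashable, float]:
    """Build Phase 2 initial values: a vertex present in the reduced graph
    keeps its Phase 1 value; a vertex missing from it gets `default`."""
    return {v: reduced_values.get(v, default) for v in original_vertices}


# Hypothetical Phase 1 values for the reduced graph (s, a, b, c, d);
# e, f, g were removed by input reduction. +inf is the SSSP default.
phase1 = {"s": 0.0, "a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
init = map_results(phase1, ["s", "a", "b", "c", "d", "e", "f", "g"], float("inf"))
print(init["d"], init["e"])  # 4.0 inf
```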

34 Processing Original Graph
[Figure: the iterative algorithm runs on the original graph (vertices s, a, b, c, d, e, f, g), starting from the mapped values]
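Putting the steps together, here is an end-to-end sketch for SSSP under stated assumptions: a dict-of-dicts weighted graph, synchronous rounds, and an illustrative reduction that only removes outdegree-0 vertices (other than the source) together with their in-edges. None of these names (`sssp`, `two_phase_sssp`) come from the talk. Because the reduced graph is a subgraph, Phase 1 distances are valid upper bounds, and Phase 2 ends at the same result as a direct run.

```python
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, float]]  # every vertex is a key: vertex -> {out-neighbor: weight}
INF = float("inf")


def sssp(g: Graph, src: str, init: Dict[str, float]) -> Tuple[Dict[str, float], int]:
    """Synchronous iterative SSSP from the given initial values; returns (dist, rounds)."""
    dist = dict(init)
    dist[src] = 0.0
    rounds = 0
    while True:
        rounds += 1
        new = dict(dist)
        for u, nbrs in g.items():
            for v, w in nbrs.items():
                new[v] = min(new[v], dist[u] + w)
        if new == dist:
            return dist, rounds
        dist = new


def two_phase_sssp(g: Graph, src: str) -> Tuple[Dict[str, float], int]:
    # Input reduction (illustrative choice): remove outdegree-0 vertices other
    # than the source, together with their in-edges.
    sinks = {v for v, nbrs in g.items() if not nbrs and v != src}
    reduced = {u: {v: w for v, w in nbrs.items() if v not in sinks}
               for u, nbrs in g.items() if u not in sinks}
    # Phase 1: the unchanged algorithm on the reduced graph.
    phase1, _ = sssp(reduced, src, {v: INF for v in reduced})
    # Mapping: vertices missing from the reduced graph get the default (+inf).
    init = {v: phase1.get(v, INF) for v in g}
    # Phase 2: the unchanged algorithm on the original graph.
    return sssp(g, src, init)


g = {"s": {"a": 1.0, "b": 4.0}, "a": {"b": 1.0, "d": 5.0}, "b": {"c": 1.0},
     "c": {"d": 1.0, "f": 3.0}, "d": {"e": 2.0}, "e": {}, "f": {}}
baseline, baseline_rounds = sssp(g, "s", {v: INF for v in g})
result, phase2_rounds = two_phase_sssp(g, "s")
print(result == baseline, baseline_rounds, phase2_rounds)  # True 6 2
```

In this toy graph the reduction removes only two vertices, so Phase 1 still needs 5 rounds and the combined round count is not lower than the direct run's 6; any real gain depends on Phase 1 running on a much smaller graph. What the example does show is that Phase 2 starts close to the answer and finishes in 2 rounds with the exact result.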

37 Correctness: SSSP Example

42 Correctness
- Transformation properties
  - At the level of vertices, edges, and components
  - Allow developing and reasoning about new transformations
- Algorithm behavior can be reasoned about
  - Phase initializations
  - Properties of the aggregation function
- Correctness argued for the algorithms used, both accurate and approximate
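One way to see why the aggregation function matters: for SSSP, assuming min aggregation, a vertex value can only decrease. A Phase 1 value that is the length of some real path (an over-estimate) is therefore corrected in Phase 2, while an under-estimate would never be raised. The sketch below, with the hypothetical helper `relax_to_fixpoint`, shows both cases on a three-vertex chain.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # vertex -> {out-neighbor: weight}
INF = float("inf")


def relax_to_fixpoint(g: Graph, src: str, init: Dict[str, float]) -> Dict[str, float]:
    """SSSP with min as the aggregation: a value can only ever decrease."""
    dist = dict(init)
    dist[src] = 0.0
    changed = True
    while changed:
        changed = False
        for u, nbrs in g.items():
            for v, w in nbrs.items():
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
    return dist


g = {"s": {"a": 1.0}, "a": {"b": 1.0}, "b": {}}
# Over-estimate for b (e.g. a longer path that survived reduction): corrected.
print(relax_to_fixpoint(g, "s", {"s": 0.0, "a": INF, "b": 10.0}))
# {'s': 0.0, 'a': 1.0, 'b': 2.0}
# Under-estimate for b: min never raises it, so the wrong value sticks.
print(relax_to_fixpoint(g, "s", {"s": 0.0, "a": INF, "b": 0.5}))
# {'s': 0.0, 'a': 1.0, 'b': 0.5}
```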

43 Evaluation
- Techniques are independent of frameworks and processing environment
- Incorporated in Galois [PLDI ]
- Single machine: 4-core, ? GB RAM
- 6 benchmarks: PR, SSSP, SSWP, CC, GC, CD
- 4 input graphs: Friendster (|E| = ?.?B), Twitter (|E| = ?.?B), UKDomain (|E| = 9?M), RMAT-?4 (|E| = ?8M)

47 [Figure: execution time of the baseline on the original graph compared with the approach, broken down into input reduction, Phase 1 (reduced graph), and Phase 2 (original graph)]

48 Execution Time
- Speedups over parallel versions
- Speedups increase as ERP decreases, up to an extent: ?.?x to ?.7x for ERP 7?% to ?0%
- Structural dissimilarity for very low ERP
[Figure: normalized execution time, split into reduction, Phase 1, and Phase 2, for SSSP, SSWP, PR, GC, CC, and CD on Friendster at six ERP settings]

49 Input Reduction
- Transformations are local, i.e., parallelizable
- Higher reduction requires more work
[Figure: input-reduction speedup vs. number of threads, and normalized reduction time across ERP settings, both on Friendster]
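Because each degree-based rule only inspects one vertex's own adjacency, the per-vertex decisions can be evaluated independently and merged afterward. The sketch uses Python's `concurrent.futures.ThreadPoolExecutor` purely to show that structure; in CPython the global interpreter lock means threads will not speed up this CPU-bound loop, whereas a vertex-centric system such as Galois would run it in parallel. `parallel_degree_drops` and `keep` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import AbstractSet, Dict, List, Set, Tuple

Edge = Tuple[str, str]


def parallel_degree_drops(vertices: AbstractSet[str], edges: AbstractSet[Edge],
                          keep: AbstractSet[str] = frozenset(),
                          workers: int = 4) -> Set[Edge]:
    """Per-vertex indegree-0 / outdegree-0 decisions evaluated concurrently.
    Each decision reads only its own vertex's snapshot adjacency, so no
    locking is needed and the result does not depend on scheduling."""
    out_edges: Dict[str, List[Edge]] = {v: [] for v in vertices}
    in_edges: Dict[str, List[Edge]] = {v: [] for v in vertices}
    for u, v in edges:
        out_edges[u].append((u, v))
        in_edges[v].append((u, v))

    def decide(v: str) -> List[Edge]:
        """Local rule: which edges does vertex v ask to drop?"""
        if v in keep:
            return []
        drop: List[Edge] = []
        if not in_edges[v]:
            drop.extend(out_edges[v])
        if not out_edges[v]:
            drop.extend(in_edges[v])
        return drop

    to_drop: Set[Edge] = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for requested in pool.map(decide, sorted(vertices)):
            to_drop.update(requested)  # merge step: a plain set union
    return set(edges) - to_drop


V = {"s", "a", "b", "x", "y"}
E = {("s", "a"), ("a", "b"), ("b", "a"), ("x", "a"), ("b", "y")}
print(sorted(parallel_degree_drops(V, E, keep={"s"})))
# [('a', 'b'), ('b', 'a'), ('s', 'a')]
```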

50 Memory Overhead
- Tracking dissimilar elements
  - Newly added vertices and edges
[Figure: memory overhead (%) across ERP settings on Friendster]

51 Community Detection
[Figure: community detection on Friendster: execution time (sec) of the reduced-graph and original-graph phases for the baseline and several ERP settings, annotated with accuracy from 100.0% (baseline) down to 99.0%]

52 More Results
- Contribution of individual transformations
  - Some transformations are more useful than others
  - Different graphs benefit from different transformations
- Improvement in scalability
- Results for all inputs

53 Conclusion
- Input reduction using transformations that are
  - Light-weight
  - Parallelizable
  - General
- Correctness reasoned using fine-grained transformation properties
- Achieve ?.?x to ?.4x speedups

54 Thanks GRASP


More information

Drawing Large Graphs by Multilevel Maxent-Stress Optimization

Drawing Large Graphs by Multilevel Maxent-Stress Optimization Drawing Large Graphs by Multilevel Maxent-Stress Optimization Henning Meyerhenke, Martin Nöllenburg, Christian Schulz 23rd International Symposium on Graph Drawing & Network Visualization, Karlsruhe 1

More information

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Ichitaro Yamazaki University of Tennessee, Knoxville Xiaoye Sherry Li Lawrence Berkeley National Laboratory MS49: Sparse

More information

Marwan Burelle. Parallel and Concurrent Programming. Introduction and Foundation

Marwan Burelle.  Parallel and Concurrent Programming. Introduction and Foundation and and marwan.burelle@lse.epita.fr http://wiki-prog.kh405.net Outline 1 2 and 3 and Evolutions and Next evolutions in processor tends more on more on growing of cores number GPU and similar extensions

More information

More on NP and Reductions

More on NP and Reductions Indian Institute of Information Technology Design and Manufacturing, Kancheepuram Chennai 600 127, India An Autonomous Institute under MHRD, Govt of India http://www.iiitdm.ac.in COM 501 Advanced Data

More information

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009 Branch-and-Bound Algorithm November 3, 2009 Outline Lecture 23 Modeling aspect: Either-Or requirement Special ILPs: Totally unimodular matrices Branch-and-Bound Algorithm Underlying idea Terminology Formal

More information

The two-machine flowshop total completion time problem: A branch-and-bound based on network-flow formulation

The two-machine flowshop total completion time problem: A branch-and-bound based on network-flow formulation The two-machine flowshop total completion time problem: A branch-and-bound based on network-flow formulation Boris Detienne 1, Ruslan Sadykov 1, Shunji Tanaka 2 1 : Team Inria RealOpt, University of Bordeaux,

More information

UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement

UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement Wuxi Li, Meng Li, Jiajun Wang, and David Z. Pan University of Texas at Austin wuxili@utexas.edu November 14, 2017 UT DA Wuxi Li

More information

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons.

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons. Supplementary Figure 1 Fragment indexing allows efficient spectra similarity comparisons. The cost and efficiency of spectra similarity calculations can be approximated by the number of fragment comparisons

More information

Exact Algorithms for Dominating Induced Matching Based on Graph Partition

Exact Algorithms for Dominating Induced Matching Based on Graph Partition Exact Algorithms for Dominating Induced Matching Based on Graph Partition Mingyu Xiao School of Computer Science and Engineering University of Electronic Science and Technology of China Chengdu 611731,

More information

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne

More information

Concept of Statistical Model Checking. Presented By: Diana EL RABIH

Concept of Statistical Model Checking. Presented By: Diana EL RABIH Concept of Statistical Model Checking Presented By: Diana EL RABIH Introduction Model checking of stochastic systems Continuous-time Markov chains Continuous Stochastic Logic (CSL) Probabilistic time-bounded

More information

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations!

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations! Parallel Numerics Scope: Revise standard numerical methods considering parallel computations! Required knowledge: Numerics Parallel Programming Graphs Literature: Dongarra, Du, Sorensen, van der Vorst:

More information

A Note on Parallel Algorithmic Speedup Bounds

A Note on Parallel Algorithmic Speedup Bounds arxiv:1104.4078v1 [cs.dc] 20 Apr 2011 A Note on Parallel Algorithmic Speedup Bounds Neil J. Gunther February 8, 2011 Abstract A parallel program can be represented as a directed acyclic graph. An important

More information

Analytical Modeling of Parallel Programs. S. Oliveira

Analytical Modeling of Parallel Programs. S. Oliveira Analytical Modeling of Parallel Programs S. Oliveira Fall 2005 1 Scalability of Parallel Systems Efficiency of a parallel program E = S/P = T s /PT p Using the parallel overhead expression E = 1/(1 + T

More information

Parallel Local Graph Clustering

Parallel Local Graph Clustering Parallel Local Graph Clustering Kimon Fountoulakis, joint work with J. Shun, X. Cheng, F. Roosta-Khorasani, M. Mahoney, D. Gleich University of California Berkeley and Purdue University Based on J. Shun,

More information

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Raj Parihar Advisor: Prof. Michael C. Huang March 22, 2013 Raj Parihar Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

More information

Einstein-Podolsky-Rosen-like correlation on a coherent-state basis and Continuous-Variable entanglement

Einstein-Podolsky-Rosen-like correlation on a coherent-state basis and Continuous-Variable entanglement 12/02/13 Einstein-Podolsky-Rosen-like correlation on a coherent-state basis and Continuous-Variable entanglement Ryo Namiki Quantum optics group, Kyoto University 京大理 並木亮 求職中 arxiv:1109.0349 Quantum Entanglement

More information

5.5 Special Rights. A Solidify Understanding Task

5.5 Special Rights. A Solidify Understanding Task SECONDARY MATH III // MODULE 5 MODELING WITH GEOMETRY 5.5 In previous courses you have studied the Pythagorean theorem and right triangle trigonometry. Both of these mathematical tools are useful when

More information

MATH 120 Elementary Functions Test #3

MATH 120 Elementary Functions Test #3 MATH 10 Elementary Functions Test #3 There are two forms of the test; both are included below. 1. [0 points] On the axes on the left, graph the function f(x) = x 3 3x. Label the coordinates of the points

More information

DE [39] PSO [35] ABC [7] AO k/maxiter e-20

DE [39] PSO [35] ABC [7] AO k/maxiter e-20 3. Experimental results A comprehensive set of benchmark functions [18, 33, 34, 35, 36, 37, 38] has been used to test the performance of the proposed algorithm. The Appendix A (Table A1) presents the functions

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

DESPITE considerable progress in verification of random

DESPITE considerable progress in verification of random IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 1 Formal Analysis of Galois Field Arithmetic Circuits - Parallel Verification and Reverse Engineering Cunxi Yu Student Member,

More information

In 1980, the yield = 48% and the Die Area = 0.16 from figure In 1992, the yield = 48% and the Die Area = 0.97 from figure 1.31.

In 1980, the yield = 48% and the Die Area = 0.16 from figure In 1992, the yield = 48% and the Die Area = 0.97 from figure 1.31. CS152 Homework 1 Solutions Spring 2004 1.51 Yield = 1 / ((1 + (Defects per area * Die Area / 2))^2) Thus, if die area increases, defects per area must decrease. 1.52 Solving the yield equation for Defects

More information

A pruning pattern list approach to the permutation flowshop scheduling problem

A pruning pattern list approach to the permutation flowshop scheduling problem A pruning pattern list approach to the permutation flowshop scheduling problem Takeshi Yamada NTT Communication Science Laboratories, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, JAPAN E-mail :

More information

Axioms of Density: How to Define and Detect the Densest Subgraph

Axioms of Density: How to Define and Detect the Densest Subgraph RT097 Computer Science; Network pages Research Report March 31, 016 Axioms of Density: How to Define and Detect the Densest Subgraph Hiroki Yanagisawa and Satoshi Hara IBM Research - Tokyo IBM Japan, Ltd.

More information