Scalable Scheduling in Parallel Processors
2002 Conference on Information Sciences and Systems, Princeton University, March 20-22, 2002

Jui Tsun Hung (1), Hyoung Joong Kim (2), Thomas G. Robertazzi (3)

(1) Jui Tsun Hung, Department of Electrical and Computer Engineering, University at Stony Brook, trent@ece.sunysb.edu
(2) Hyoung Joong Kim, Department of Control and Instrumentation Engineering, Kangwon National University, hj@cc.kangwon.ac.kr
(3) Thomas G. Robertazzi, Senior Member, IEEE, Associate Professor, Department of Electrical and Computer Engineering, University at Stony Brook, Stony Brook, NY 11794, tom@ece.sunysb.edu

Abstract

A study of scalable data intensive scheduling involving load distribution on multi-level fat tree networks is presented. It is shown that the speedup of processing divisible loads concurrently on a homogeneous single level tree increases linearly as the number of processors is increased. This behavior is markedly different from that of past research on sequential scheduling, where speedup saturates beyond a certain number of processors. It is demonstrated that the ultimate limiting factor for speedup is due to hardware, not to the scheduling strategy on a multi-level fat tree.

I. Introduction

The processing of massive amounts of data on distributed and parallel networks is becoming more and more common. Applications include large scientific experiments, database applications, image processing, and sensor data processing. Over the past dozen years, a number of researchers have mathematically modeled such processing using a divisible load scheduling model [] that is useful for data parallelism applications. Divisible loads are ones that consist of data that can be arbitrarily partitioned among a number of processors interconnected through some network. Divisible load modeling usually assumes no precedence relations amongst the data. Due to the linearity of the divisible model, optimal scheduling strategies under a variety of environments have been devised.

The majority of the divisible load scheduling literature has appeared in computer engineering periodicals. However, divisible load modeling should be of interest to the networking community, as it models both computation and network communication in a completely seamless integrated manner, and it is tractable with its linearity assumption. It has been used to accurately and directly model such features as specific network topologies [-10], computation versus
communication load intensity [][], time varying inputs [10], multiple job submission [], and monetary cost optimization [11][12].

However, researchers in this field have noted an important performance saturation limit. If speedup (or solution time) is considered as a function of the number of processors, an asymptotic constant is reached as the number of processors is increased. Beyond a certain point, adding processors results in minimal performance improvement. In other words, the scheduling strategies considered to date have not been scalable.

For the first interconnection topology considered in the literature, the linear daisy chain [2], the saturation limit is usually explained by noting that, if load originates at a processor at a boundary of the chain, data must be transmitted and retransmitted i-1 times from processor to processor before it arrives at the ith processor (assuming nodes with store and forward transmission). However, for subsequent interconnection topologies considered (e.g. bus, single level tree, hypercube), the reason for this lack of scalability has been less obvious.

In this paper, it is demonstrated that the saturation occurs because of the assumption that a node distributes load sequentially, to one of its children at a time. This is true for both single and multi-installment scheduling strategies discussed to date [][5]. It is shown here that in a single level tree (star topology), if a processor can distribute load to all of its children concurrently, the speedup is a linear function of the number of processors. The only scalability
limitations are a proportionality constant which depends on system parameters and the ability of a processor to distribute loads concurrently to all of its outgoing links.

How might one implement the strategy that a processor distributes load concurrently to all its neighbors? A direct method, similar to what is done in packet switches, is to envision that a processor has a CPU and an output buffer for each output link. Scalability can be achieved as long as the CPU can effectively distribute load to all of its output buffers concurrently. Naturally, as the number of neighbors of a processor node is increased, a point will be reached where the CPU cannot load the buffers as fast as those buffers are being emptied. However, for a given CPU, this architecture allows a better use of the time shared CPU, up to a large number of children nodes, than the earlier assumption that a CPU distributes load sequentially and completely to one output buffer and link at a time. Certainly, one can also envision high performance architectures such as multiple CPUs within the same node. Through a consideration of concurrent load distribution by a node to its neighbors, one can see that the ultimate limitation is on hardware, not scheduling, as it should be.

Note that previous related work on scalability issues for parallel processing includes an experimental study of several real time load balancing schemes [13] and an experimental study of scalable scheduling for function parallelism on distributed memory multiprocessors [14]. It has been known on an intuitive basis that network elements should be kept constantly busy for good performance [15]. This research mathematically quantifies that.

This paper begins with the development of some notation and analytic background in Section II. Section III treats the case in which parent nodes do both load distribution and data computation. Kim type scheduling is noted in Section IV, and the conclusions appear in Section V.

II. Model, Notation and Load Distribution

In this paper, a homogeneous multi-level fat tree network where root processors are equipped with a front-end processor for off-loading communications is considered. Root nodes, called intelligent roots, process a fraction of the load as well as distribute the remaining load to their children processors (see Fig. 1).

[Figure 1: Homogeneous multi-level fat tree with intelligent root]

First of all, a heterogeneous single level fat tree, level i+1, with intelligent root is described as follows. All the children processors are connected to the root (parent) processor via communication links. Fig. 2 shows that an intelligent root processes a fraction of the load as well as distributes the remaining load to its children processors.

[Figure 2: A heterogeneous single level fat tree, level i+1, with intelligent root]

Note that each child processor starts computing and transmitting immediately after receiving its assigned fraction of load and continues without any interruption until all of its assigned load fraction has been processed. This is a store and forward mode of operation for computation and communication. The root can begin processing at time 0, the time when all the load is assumed to be present at the root.

The notations for a single heterogeneous tree are:

$\alpha_0$: The load fraction assigned to the root processor.
$\alpha_i$: The load fraction assigned to the ith link-processor pair.
$w_i$: The inverse of the computing speed of the ith processor.
$z_i$: The inverse of the link speed of the ith link.
$T_{cp}$: Computing intensity constant. The entire load can be processed in $w_i T_{cp}$ seconds on the ith processor.
$T_{cm}$: Communication intensity constant. The entire load can be transmitted in $z_i T_{cm}$ seconds over the ith link.
$T_f$: The finish time, i.e. the time at which the last processor accomplishes computation.

Therefore, $\alpha_i w_i T_{cp}$ is the time to process the fraction $\alpha_i$ of the entire load on the ith processor. Note that the units of $\alpha_i w_i T_{cp}$ are [load] × [sec/load] × [dimensionless quantity] = [sec].

For a multi-level homogeneous fat tree, the notations are
$\alpha^j_0$: The load fraction assigned to the root processor of an equivalent jth level tree.
$\alpha^j_i$: The load fraction assigned to the ith link-processor pair on an equivalent jth level tree.
$w_{eq_i}$: The inverse of the equivalent computing speed of the ith level tree (from level i descending to level 1).
$p_i$: The multiplier of the inverse of expanded capacity of the links of level i+1 with respect to the inverse of capacity of the links on level 1.

The value of the multiplier, $p_i$, is the inverse of the total number of children processors descended from this link. In other words,

$$p_i = \Big(\sum_{j=0}^{i} m^j\Big)^{-1}, \qquad 0 < p_i \le 1.$$

The following assumptions are initially made:

- The interconnection network used is a star network (single level tree network).
- The computing and communication loads are divisible (i.e. perfectly partitioned with no precedence constraints) [].
- Transmission and computation time are proportional (linear) to the size of the problem.
- Each node transmits load concurrently (simultaneously) to its children.
- Store and forward is the method of transmission from level to level.

III. Intelligent Root Fat Tree

Single Level Tree, Level i+1, with Intelligent Root

Consider a single level tree network, level i+1, with intelligent root, which has m+1 processors and m links (see Fig. 2). All children processors are connected to the root processor via direct communication links. The intelligent root processor, assumed to be the only processor at which the divisible load arrives, partitions a total processing load into m+1 fractions, keeps its own fraction $\alpha_0$, and distributes the other fractions $\alpha_1, \alpha_2, \ldots, \alpha_m$ to the children processors respectively and concurrently. Each processor begins computing immediately after receiving its assigned fraction of load and continues without any interruption until all of its assigned load fraction has been processed. In order to minimize the processing finish time, all of the utilized processors in the network must finish computing at the same time []. The process of load distribution can be represented by Gantt chart-like timing
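diagrams, as illustrated in Fig. 3. Note that this is a completely deterministic model. From the timing diagram, Fig. 3, an equation for the root and 1st child's solution time is

$$\alpha_0 w_0 T_{cp} = \alpha_1 p_i z_1 T_{cm} + \alpha_1 w_1 T_{cp} \tag{1}$$

[Figure 3: Timing diagram of single level fat tree, level i+1, with intelligent root]

The fundamental recursive equations of the system can be formulated as follows:

$$\alpha_1 p_i z_1 T_{cm} + \alpha_1 w_1 T_{cp} = \alpha_2 p_i z_2 T_{cm} + \alpha_2 w_2 T_{cp} \tag{2}$$

$$\alpha_{i-1} p_i z_{i-1} T_{cm} + \alpha_{i-1} w_{i-1} T_{cp} = \alpha_i p_i z_i T_{cm} + \alpha_i w_i T_{cp} \tag{3}$$

$$\alpha_{m-1} p_i z_{m-1} T_{cm} + \alpha_{m-1} w_{m-1} T_{cp} = \alpha_m p_i z_m T_{cm} + \alpha_m w_m T_{cp} \tag{4}$$

The normalization equation for the single level tree with intelligent root is

$$\alpha_0 + \alpha_1 + \alpha_2 + \cdots + \alpha_m = 1 \tag{5}$$

This gives m+1 linear equations with m+1 unknowns.

For a multi-level fat tree with intelligent root (see Fig. 1), the normalization equation for each level j (equivalent to a single level tree) is

$$\alpha^j_0 + \alpha^j_1 + \alpha^j_2 + \cdots + \alpha^j_m = 1, \qquad j = 1, 2, \ldots, k \tag{6}$$

Here $\alpha^j_i$ is the fraction of load which one of layer j's processors (one root node in level j) distributes to the ith child processor.

Now, one can manipulate equations (2)-(4) to yield the solution

$$\alpha_i = \frac{p_i z_{i-1} T_{cm} + w_{i-1} T_{cp}}{p_i z_i T_{cm} + w_i T_{cp}}\,\alpha_{i-1}, \qquad i = 2, 3, \ldots, m \tag{7}$$

Let

$$f_i = \frac{p_i z_{i-1} T_{cm} + w_{i-1} T_{cp}}{p_i z_i T_{cm} + w_i T_{cp}} \tag{8}$$

then

$$\alpha_i = f_i\,\alpha_{i-1} = \Big(\prod_{j=2}^{i} f_j\Big)\alpha_1 \tag{9}$$

$$\alpha_i = \frac{p_i z_1 T_{cm} + w_1 T_{cp}}{p_i z_i T_{cm} + w_i T_{cp}}\,\alpha_1, \qquad i = 2, 3, \ldots, m \tag{10}$$

From equation (8), the product $\prod_{l=2}^{k} f_l$ can be simplified as

$$\prod_{l=2}^{k} f_l = \frac{p_i z_1 T_{cm} + w_1 T_{cp}}{p_i z_k T_{cm} + w_k T_{cp}}, \qquad k = 2, 3, \ldots, m \tag{11}$$

In order to solve the set of equations, $\alpha_0$ is expressed from equation (1) as

$$\alpha_0 = \frac{p_i z_1 T_{cm} + w_1 T_{cp}}{w_0 T_{cp}}\,\alpha_1 \tag{12}$$

If one substitutes this equation into the normalization equation, the normalization equation becomes

$$\frac{p_i z_1 T_{cm} + w_1 T_{cp}}{w_0 T_{cp}}\,\alpha_1 + \alpha_1 + f_2\,\alpha_1 + \cdots + f_2 f_3 \cdots f_m\,\alpha_1 = 1 \tag{13}$$

Utilizing equation (11) and solving once again for $\alpha_1$:

$$\alpha_1 = \frac{1}{\dfrac{p_i z_1 T_{cm} + w_1 T_{cp}}{w_0 T_{cp}} + 1 + \sum_{k=2}^{m} \prod_{l=2}^{k} f_l} = \frac{1}{(p_i z_1 T_{cm} + w_1 T_{cp})\Big[\dfrac{1}{w_0 T_{cp}} + \sum_{k=1}^{m} \dfrac{1}{p_i z_k T_{cm} + w_k T_{cp}}\Big]}$$

Accordingly,

$$\alpha_0 = \frac{\dfrac{1}{w_0 T_{cp}}}{\dfrac{1}{w_0 T_{cp}} + \sum_{k=1}^{m} \dfrac{1}{p_i z_k T_{cm} + w_k T_{cp}}} \tag{14}$$

More generally, with the empty product (the case i = 1) taken as 1,

$$\alpha_i = \frac{\dfrac{1}{p_i z_i T_{cm} + w_i T_{cp}}}{\dfrac{1}{w_0 T_{cp}} + \sum_{k=1}^{m} \dfrac{1}{p_i z_k T_{cm} + w_k T_{cp}}} \tag{15}$$

for i = 1, 2, ..., m. From Fig. 4, the finish time at which a solution is achieved is

$$T_{f,m} = \alpha_1 (p_i z_1 T_{cm} + w_1 T_{cp}) = \frac{1}{\dfrac{1}{w_0 T_{cp}} + \sum_{k=1}^{m} \dfrac{1}{p_i z_k T_{cm} + w_k T_{cp}}} \tag{16}$$

As a special case, consider the situation of a homogeneous network where all children processors have the same inverse computing speed and all links have the same inverse transmission speed (i.e. $w_i = w$ and $z_i = z$ for i = 1, 2, ..., m). Therefore, from (8), $f_i$ is equal to 1 (for i = 2, 3, ..., m). Note that, for the root, $w_0$ can be different from $w_i$.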
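The closed-form solution for the single level tree can be checked numerically. The following is a minimal Python sketch, not the authors' code: it assumes the reading of equations (14)-(16) in which each processor receives load in proportion to the reciprocal of its per-unit-load time ($w_0 T_{cp}$ for the root, $p_i z_k T_{cm} + w_k T_{cp}$ for child k). The function name `single_level_fractions`, its argument names, and the sample parameter values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def single_level_fractions(
    w0: float,
    w: Sequence[float],
    z: Sequence[float],
    p: float,
    t_cp: float,
    t_cm: float,
) -> Tuple[float, List[float], float]:
    """Return (alpha_0, [alpha_1..alpha_m], finish_time) for concurrent distribution.

    w0   : inverse computing speed of the root
    w, z : inverse computing / link speeds of the m children
    p    : link multiplier p_i of this level (assumed value, 0 < p <= 1)
    """
    if len(w) != len(z):
        raise ValueError("w and z must have one entry per child")

    # Reciprocal of the time each processor would need for the entire load.
    root_rate = 1.0 / (w0 * t_cp)
    child_rates = [1.0 / (p * zk * t_cm + wk * t_cp) for wk, zk in zip(w, z)]

    total_rate = root_rate + sum(child_rates)
    alpha0 = root_rate / total_rate                      # equation (14)
    alphas = [rate / total_rate for rate in child_rates]  # equation (15)
    finish_time = 1.0 / total_rate                        # equation (16)
    return alpha0, alphas, finish_time


if __name__ == "__main__":
    # Illustrative (made-up) parameters: a root and two unequal children.
    a0, a, tf = single_level_fractions(
        w0=1.0, w=[1.0, 2.0], z=[1.0, 1.0], p=1.0, t_cp=1.0, t_cm=1.0
    )
    print("alpha_0 =", a0)
    print("alpha_i =", a)
    print("finish time =", tf)
```

Under these assumptions the fractions sum to one and every processor finishes at the same instant, which is the optimality condition the derivation starts from.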
For a single level tree, let $T^h_{f,0}$ be the solution time for the entire divisible load solved on the root processor and let $T^h_{f,m}$ be the solution time solved on the whole tree. Here,

$$T^h_{f,0} = w_0 T_{cp}, \qquad T^h_{f,m} = \frac{1}{\dfrac{1}{w_0 T_{cp}} + \dfrac{m}{p_i z T_{cm} + w T_{cp}}} \tag{17}$$

Consequently,

$$\text{Speedup} = \frac{T^h_{f,0}}{T^h_{f,m}} = w_0 T_{cp}\Big(\frac{1}{w_0 T_{cp}} + \frac{m}{p_i z T_{cm} + w T_{cp}}\Big) = 1 + \frac{w_0 T_{cp}}{p_i z T_{cm} + w T_{cp}}\,m \tag{18}$$

Here, speedup is the effective processing gain in using m+1 processors. Our finding is that the speedup of the single level homogeneous tree is equal to Θ(m), which is proportional to the number of children per node, m. Speedup is linear as long as the root CPU can concurrently (simultaneously) transmit load to all of its children. That is, the speedup of the single level tree does not saturate (in contrast to the sequential load distribution as in []).

Multiple Level Fat Tree with Intelligent Root

Consider a homogeneous multi-level fat tree network where all processors have the same inverse computing speed, $w$, and links of level i+1 have the inverse transmission speed $p_i z$ (see Fig. 1), with

$$p_i = \Big[\sum_{j=0}^{i} m^j\Big]^{-1} \tag{19}$$

The process of load distribution for the multi-level fat tree network using the store and forward strategy for computing and communicating can be represented by Gantt chart-like timing diagrams (see Fig. 4).

For the lowest single level tree, level 1 (see Fig. 5), the inverse computational speed of an equivalent processor is defined as $w_{eq_1}$. This is a valid concept as the model is a linear one (as in a Norton's equivalent queue, for instance [16]). Therefore, from equation (17), the computation time of level 1 is

$$w_{eq_1} T_{cp} = \frac{1}{\dfrac{1}{w T_{cp}} + \dfrac{m}{z T_{cm} + w T_{cp}}} = \frac{w T_{cp}}{1 + m q_0} \tag{20}$$

for $q_0 = w T_{cp} / (z T_{cm} + w T_{cp})$. Let $\sigma = z T_{cm} / (w T_{cp})$, then

$$q_0 = \frac{1}{1 + \sigma} \tag{21}$$

If $w_{eq_0}$ is defined as $w$, $\gamma_0$ can be defined as $w_{eq_0} / w = 1$. Hence, equation (21) can be transformed to

$$q_0 = \frac{1}{1 + \sigma} = \frac{1}{\gamma_0 + \sigma} \tag{22}$$

The goal here is to find an expression for an equivalent processor that has the same load processing characteristics as the entire homogeneous fat tree. Our strategy is to replace each of the lowest most single level tree networks (which we call level 1) with an equivalent processor.
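The level 1 equivalent processor can be illustrated numerically. The sketch below is a hedged example, not taken from the paper: it assumes the reconstruction $q_0 = 1/(1+\sigma)$ with $\sigma = z T_{cm} / (w T_{cp})$, a root with the same inverse speed $w$ as its children, and $p_0 = 1$. The function name `level1_equivalent` and the parameter values are hypothetical.

```python
from typing import Tuple


def level1_equivalent(
    m: int, w: float, z: float, t_cp: float, t_cm: float
) -> Tuple[float, float]:
    """Return (w_eq1 * T_cp, speedup) for a homogeneous level 1 tree with m children."""
    sigma = (z * t_cm) / (w * t_cp)          # communication-to-computation ratio
    q0 = 1.0 / (1.0 + sigma)                 # per-child contribution to speedup
    w_eq1_t_cp = (w * t_cp) / (1.0 + m * q0)  # equivalent processing time of the tree
    speedup = 1.0 + m * q0                   # equals (w * T_cp) / (w_eq1 * T_cp)
    return w_eq1_t_cp, speedup


if __name__ == "__main__":
    # Illustrative (made-up) values: unit speeds, so sigma = 1 and q0 = 0.5.
    for children in (1, 2, 4, 8, 16):
        eq_time, s = level1_equivalent(children, w=1.0, z=1.0, t_cp=1.0, t_cm=1.0)
        print(children, eq_time, s)
```

With these sample values the speedup grows by the same increment for every added child, which is the linear, non-saturating behavior the single level analysis describes.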
Proceeding recursively up the tree, we replace each of the current lowest most single level subtrees with an equivalent processor. This continues until the entire homogeneous fat tree network is replaced by a single equivalent processor, with inverse processing speed $w_{eq_k}$. Here, k is the kth level. Levels here are numbered from the bottom level upwards. In terms of notation, this is done from level 1 (this is the two bottom most layers), level 2 (currently next bottom most two layers), up to the top level k (top two layers) (see Fig. 1).

[Figure 4: Timing diagram of multi-level fat tree using store and forward strategy]

[Figure 5: Level 1 of multi-level fat tree with intelligent root]

[Figure 6: Level k of multi-level fat tree with intelligent root]

[Figure 7: Level 2 of multi-level fat tree with intelligent root]

Note that for the entire initial (1st level) equivalent processor replacement, both parent and children processors have the same inverse speed, $w$ (see Fig. 5). At the kth level (equivalent to a single level tree), the parent will have inverse speed $w$, and its children will have equivalent inverse speed $w_{eq_{k-1}}$ (see Fig. 6). Referring to equations (20) and (22), the equivalent computation time for the 1st level can be defined as:

$$w_{eq_1} T_{cp} = \frac{\gamma_0 + \sigma}{m + \gamma_0 + \sigma}\,w T_{cp} \tag{23}$$

For level 2 (see Fig. 7), the equivalent inverse computational speed is defined as $w_{eq_2}$. Therefore, from equation (17), the computation time is

$$w_{eq_2} T_{cp} = \frac{w T_{cp}}{1 + m q_1} \tag{24}$$

Here, from equation (17), $w_0 = w$ and $w_m = w_{eq_1}$, and

$$q_1 = \frac{w T_{cp}}{p_1 z T_{cm} + w_{eq_1} T_{cp}} \tag{25}$$

Let $\gamma_1 = w_{eq_1} / w$, then

$$q_1 = \frac{w}{w_{eq_1} + p_1 \sigma w} = \frac{1}{\gamma_1 + p_1 \sigma} \tag{26}$$

Referring to equation (24), the equivalent computation time of level 2 is given as follows:

$$w_{eq_2} T_{cp} = \frac{\gamma_1 + p_1 \sigma}{m + \gamma_1 + p_1 \sigma}\,w T_{cp} \tag{27}$$

Therefore, for a kth level subtree (see Fig. 1), the equivalent computation time is

$$w_{eq_k} T_{cp} = \frac{\gamma_{k-1} + p_{k-1} \sigma}{m + \gamma_{k-1} + p_{k-1} \sigma}\,w T_{cp} \tag{28}$$

Referring to equation (28),

$$\gamma_k = \frac{w_{eq_k}}{w} = \frac{\gamma_{k-1} + p_{k-1} \sigma}{m + \gamma_{k-1} + p_{k-1} \sigma} \tag{29}$$

$$\gamma_k = \frac{\gamma_{k-1} + \big(\sum_{j=0}^{k-1} m^j\big)^{-1} \sigma}{m + \gamma_{k-1} + \big(\sum_{j=0}^{k-1} m^j\big)^{-1} \sigma} \tag{30}$$

Consequently, $\gamma_k$ is a recursive function. The value $1/\gamma_k$ is the speedup of a multi-level fat tree network with concurrent load distribution on each level and with store and forward computation and communication from level to level.

Let $T^e_{f,0}$ be the equivalent solution time for the entire divisible load solved on only one processor and let $T^{e,k}_{f,m}$ be the equivalent solution time of a whole homogeneous k-level fat tree network, on which each level has m children processors as well as the root processor. Then $T^e_{f,0} = w T_{cp}$ and $T^{e,k}_{f,m} = w_{eq_k} T_{cp}$. Consequently,

$$\text{Speedup} = \frac{T^e_{f,0}}{T^{e,k}_{f,m}} = \frac{w T_{cp}}{w_{eq_k} T_{cp}} = \frac{1}{\gamma_k} \tag{31}$$

$$\text{Speedup} = \frac{m + \gamma_{k-1} + \big(\sum_{j=0}^{k-1} m^j\big)^{-1} \sigma}{\gamma_{k-1} + \big(\sum_{j=0}^{k-1} m^j\big)^{-1} \sigma} \tag{32}$$

Several special cases follow. If m = 1 and $p_i = 1$, this model is the same as a linear network with the store and forward strategy. If m = 2, this model is a binary fat tree. If m = 3, this model is a ternary fat tree. If $p_i = 1$, this model is not a fat tree: each link in this model has the same transmission speed.
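The recursion for $\gamma_k$ lends itself to a short numerical sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: it assumes $\gamma_0 = 1$, the link multiplier $p_{k-1} = (\sum_{j=0}^{k-1} m^j)^{-1}$, and the recursion $\gamma_k = (\gamma_{k-1} + p_{k-1}\sigma) / (m + \gamma_{k-1} + p_{k-1}\sigma)$. The function names `gamma_levels` and `fat_tree_speedup` and the sample values of m and sigma are hypothetical.

```python
from typing import List


def gamma_levels(m: int, sigma: float, k: int) -> List[float]:
    """Return [gamma_0, gamma_1, ..., gamma_k] for a homogeneous k-level fat tree."""
    gammas = [1.0]       # gamma_0 = w_eq0 / w = 1
    subtree_size = 0     # running value of sum_{j=0}^{level-1} m**j
    for level in range(1, k + 1):
        subtree_size += m ** (level - 1)
        p = 1.0 / subtree_size            # link multiplier p_{level-1}
        g = gammas[-1] + p * sigma        # gamma_{level-1} + p_{level-1} * sigma
        gammas.append(g / (m + g))
    return gammas


def fat_tree_speedup(m: int, sigma: float, k: int) -> float:
    """Speedup of the k-level tree, taken as 1 / gamma_k."""
    return 1.0 / gamma_levels(m, sigma, k)[-1]


if __name__ == "__main__":
    # Illustrative (made-up) setting: binary fat tree, sigma = 0.5.
    for levels in range(1, 6):
        print(levels, fat_tree_speedup(m=2, sigma=0.5, k=levels))
```

Printing the speedup level by level makes it easy to see how the communication term shrinks in importance higher in the tree, since the multiplier p decreases as the subtree below a link grows.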
If $\big(\sum_{j=0}^{i} m^j\big)^{-1} \sigma$ approaches zero, the model approaches an ideal case: each node can receive the load instantly and compute the data immediately. Under this assumption, the recursive function (30) can be simplified as

$$\gamma_k = \frac{\gamma_{k-1}}{m + \gamma_{k-1}} \tag{33}$$

A closed form solution is

$$\gamma_k = \frac{1}{m^0 + m^1 + m^2 + \cdots + m^k} \tag{34}$$

$$\text{Speedup} = \sum_{j=0}^{k} m^j \tag{35}$$

Speedup is proportional to the total number of nodes, which is $m^0 + m^1 + m^2 + \cdots + m^k$. Note that, from (33), we can derive

$$\text{Speedup} = \frac{1}{\gamma_k} = 1 + m\Big(\frac{1}{\gamma_{k-1}}\Big) \tag{36}$$

This equation expresses that the speedup of a k-level fat tree is the sum of the speedup of the root and the speedup from all m children. The speedup of the k-level equivalent tree is Θ(m), which is proportional to the number of children per node, m. As the number of levels of the tree increases, the speedup approaches a linear function. Therefore, the multi-level fat tree will not saturate.

IV. Kim Type Scheduling

We note that the use of Kim type scheduling [17], where processing at a child node commences as soon as load begins to be received, can be analyzed in a similar manner to that described here. Performance should improve somewhat because of the expedited computing in this case.

V. Discussion and Conclusions

This research confirms two important points. Firstly, up to the limit of CPU speed, concurrent load distribution for a single level tree leads to a linear speedup as a function of the number of children. Secondly, the use of store and forward load distribution for a fat tree leads to a speedup which approaches a linear speedup.

Acknowledgments

The support of NSF grant CCR-99-12331 in the course of this research is gratefully acknowledged.

References

[1] V. Bharadwaj, D. Ghose, V. Mani, and T. G. Robertazzi, Scheduling Divisible Loads in Parallel and Distributed Systems, IEEE Computer Society Press, Los Alamitos, CA, 1996.
[2] Y. C. Cheng and T. G. Robertazzi, "Distributed computation with communication delays," IEEE Transactions on Aerospace and Electronic Systems, 1988.
[3] G. D. Barlas, "Collection-aware optimum sequencing of operations and closed form solutions for the distribution of divisible load on arbitrary processor trees," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 5, May 1998.
[4] S. Bataineh and T. G. Robertazzi, "Bus oriented load sharing for a network of sensor driven processors," IEEE Transactions on Systems, Man and Cybernetics, 1991.
[5] V. Bharadwaj, D. Ghose, and V. Mani, "Multi-installment load distribution in tree networks with delay," IEEE Transactions on Aerospace and Electronic Systems, 1995.
[6] V. Bharadwaj, D. Ghose, and V. Mani, "An efficient load distribution strategy for a distributed linear network of processors with communication delays," Computers and Mathematics with Applications, 1995.
[7] J. Blazewicz and M. Drozdowski, "Scheduling divisible jobs on hypercubes," Parallel Computing, 1995.
[8] J. Blazewicz and M. Drozdowski, "The performance limits of a two dimensional network of load sharing processors," Foundations of Computing and Decision Sciences, 1996.
[9] J. Blazewicz and M. Drozdowski, "Distributed processing of divisible jobs with communication start-up costs," Discrete Applied Mathematics, vol. 76, May 1997.
[10] J. Sohn and T. G. Robertazzi, "Optimal time varying load sharing for divisible loads," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, July 1998.
[11] J. Sohn, T. G. Robertazzi, and S. Luryi, "Optimizing computing costs using divisible load analysis," IEEE Transactions on Parallel and Distributed Systems, vol. 9, March 1998.
[12] S. Charcranoon, T. G. Robertazzi, and S. Luryi, "Cost efficient load sequencing in single-level tree networks," in Proceedings of the 1998 Conference on Information Sciences and Systems, Princeton University, Princeton, NJ, March 1998.
[13] V. Kumar, A. Y. Grama, and N. R. Vempaty, "Scalable load balancing techniques for parallel computers," Journal of Parallel and Distributed Computing, 1994.
[14] S. Pande, D. P. Agrawal, and J. Mauney, "A scalable scheduling scheme for functional parallelism on distributed memory multiprocessor systems," IEEE Transactions on Parallel and Distributed Systems, vol. 6, 1995.
[15] David E. Culler and Jaswinder Pal Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, 1999.
[16] U. Herzog, L. Woo, and K. M. Chandy, "Solution of Queueing Problems by a Recursive Technique," IBM Journal of Research and Development, May 1975.
[17] H.-J. Kim, "A Novel Load Distribution Algorithm for Divisible Loads," Special Issue of Cluster Computing on Divisible Load Scheduling, Summer/Fall 2002.
Logic Effort Revisited Mark This note ill take another look at logical effort, first revieing the asic idea ehind logical effort, and then looking at some of the more sutle issues in sizing transistors..0
More informationLinear Time Computation of Moments in Sum-Product Networks
Linear Time Computation of Moments in Sum-Product Netorks Han Zhao Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213 han.zhao@cs.cmu.edu Geoff Gordon Machine Learning Department
More informationAnalytical Modeling of Parallel Systems
Analytical Modeling of Parallel Systems Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing
More informationBloom Filters and Locality-Sensitive Hashing
Randomized Algorithms, Summer 2016 Bloom Filters and Locality-Sensitive Hashing Instructor: Thomas Kesselheim and Kurt Mehlhorn 1 Notation Lecture 4 (6 pages) When e talk about the probability of an event,
More informationCALCULATION OF STEAM AND WATER RELATIVE PERMEABILITIES USING FIELD PRODUCTION DATA, WITH LABORATORY VERIFICATION
CALCULATION OF STEAM AND WATER RELATIVE PERMEABILITIES USING FIELD PRODUCTION DATA, WITH LABORATORY VERIFICATION Jericho L. P. Reyes, Chih-Ying Chen, Keen Li and Roland N. Horne Stanford Geothermal Program,
More informationN-bit Parity Neural Networks with minimum number of threshold neurons
Open Eng. 2016; 6:309 313 Research Article Open Access Marat Z. Arslanov*, Zhazira E. Amirgalieva, and Chingiz A. Kenshimov N-bit Parity Neural Netorks ith minimum number of threshold neurons DOI 10.1515/eng-2016-0037
More informationGiuseppe Bianchi, Ilenia Tinnirello
Capacity of WLAN Networs Summary Per-node throughput in case of: Full connected networs each node sees all the others Generic networ topology not all nodes are visible Performance Analysis of single-hop
More informationSTATE AND OUTPUT FEEDBACK CONTROL IN MODEL-BASED NETWORKED CONTROL SYSTEMS
SAE AND OUPU FEEDBACK CONROL IN MODEL-BASED NEWORKED CONROL SYSEMS Luis A Montestruque, Panos J Antsalis Abstract In this paper the control of a continuous linear plant where the sensor is connected to
More informationMinimax strategy for prediction with expert advice under stochastic assumptions
Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction
More information1 Effects of Regularization For this problem, you are required to implement everything by yourself and submit code.
This set is due pm, January 9 th, via Moodle. You are free to collaborate on all of the problems, subject to the collaboration policy stated in the syllabus. Please include any code ith your submission.
More informationParallel Numerics. Scope: Revise standard numerical methods considering parallel computations!
Parallel Numerics Scope: Revise standard numerical methods considering parallel computations! Required knowledge: Numerics Parallel Programming Graphs Literature: Dongarra, Du, Sorensen, van der Vorst:
More informationScheduling divisible loads with return messages on heterogeneous master-worker platforms
Scheduling divisible loads with return messages on heterogeneous master-worker platforms Olivier Beaumont, Loris Marchal, Yves Robert Laboratoire de l Informatique du Parallélisme École Normale Supérieure
More informationSUPPORTING INFORMATION. Line Roughness. in Lamellae-Forming Block Copolymer Films
SUPPORTING INFORMATION. Line Roughness in Lamellae-Forming Bloc Copolymer Films Ricardo Ruiz *, Lei Wan, Rene Lopez, Thomas R. Albrecht HGST, a Western Digital Company, San Jose, CA 9535, United States
More informationAdaptive Noise Cancellation
Adaptive Noise Cancellation P. Comon and V. Zarzoso January 5, 2010 1 Introduction In numerous application areas, including biomedical engineering, radar, sonar and digital communications, the goal is
More informationOperations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads
Operations Research Letters 37 (2009) 312 316 Contents lists available at ScienceDirect Operations Research Letters journal homepage: www.elsevier.com/locate/orl Instability of FIFO in a simple queueing
More informationOn A Comparison between Two Measures of Spatial Association
Journal of Modern Applied Statistical Methods Volume 9 Issue Article 3 5--00 On A Comparison beteen To Measures of Spatial Association Faisal G. Khamis Al-Zaytoonah University of Jordan, faisal_alshamari@yahoo.com
More informationTDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts.
TDDI4 Concurrent Programming, Operating Systems, and Real-time Operating Systems CPU Scheduling Overview: CPU Scheduling CPU bursts and I/O bursts Scheduling Criteria Scheduling Algorithms Multiprocessor
More informationOn queueing in coded networks queue size follows degrees of freedom
On queueing in coded networks queue size follows degrees of freedom Jay Kumar Sundararajan, Devavrat Shah, Muriel Médard Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,
More information2/5/07 CSE 30341: Operating Systems Principles
page 1 Shortest-Job-First (SJR) Scheduling Associate with each process the length of its next CPU burst. Use these lengths to schedule the process with the shortest time Two schemes: nonpreemptive once
More informationDynamic resource sharing
J. Virtamo 38.34 Teletraffic Theory / Dynamic resource sharing and balanced fairness Dynamic resource sharing In previous lectures we have studied different notions of fair resource sharing. Our focus
More information2 Introduction to Network Theory
Introduction to Network Theory In this section we will introduce some popular families of networks and will investigate how well they allow to support routing.. Basic network topologies The most basic
More informationAccurate and Estimation Methods for Frequency Response Calculations of Hydroelectric Power Plant
Accurate and Estimation Methods for Frequency Response Calculations of Hydroelectric Poer Plant SHAHRAM JAI, ABOLFAZL SALAMI epartment of Electrical Engineering Iran University of Science and Technology
More informationVapor Pressure Prediction for Stacked-Chip Packages in Reflow by Convection-Diffusion Model
Vapor Pressure Prediction for Stacked-Chip Packages in Reflo by Convection-Diffusion Model Jeremy Adams, Liangbiao Chen, and Xuejun Fan Lamar University, PO Box 10028, Beaumont, TX 77710, USA Tel: 409-880-7792;
More informationA set theoretic view of the ISA hierarchy
Loughborough University Institutional Repository A set theoretic view of the ISA hierarchy This item was submitted to Loughborough University's Institutional Repository by the/an author. Citation: CHEUNG,
More informationComputing Maximum Subsequence in Parallel
Computing Maximum Subsequence in Parallel C. E. R. Alves 1, E. N. Cáceres 2, and S. W. Song 3 1 Universidade São Judas Tadeu, São Paulo, SP - Brazil, prof.carlos r alves@usjt.br 2 Universidade Federal
More informationUC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara
Operating Systems Christopher Kruegel Department of Computer Science http://www.cs.ucsb.edu/~chris/ Many processes to execute, but one CPU OS time-multiplexes the CPU by operating context switching Between
More informationAn Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters
IEEE/ACM TRANSACTIONS ON NETWORKING An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters Mehrnoosh Shafiee, Student Member, IEEE, and Javad Ghaderi, Member, IEEE
More informationLong run input use-input price relations and the cost function Hessian. Ian Steedman Manchester Metropolitan University
Long run input use-input price relations and the cost function Hessian Ian Steedman Manchester Metropolitan University Abstract By definition, to compare alternative long run equilibria is to compare alternative
More informationRace car Damping Some food for thought
Race car Damping Some food for thought The first article I ever rote for Racecar Engineering as ho to specify dampers using a dual rate damper model. This as an approach that my colleagues and I had applied
More informationGiuseppe Bianchi, Ilenia Tinnirello
Capacity of WLAN Networs Summary Ł Ł Ł Ł Arbitrary networ capacity [Gupta & Kumar The Capacity of Wireless Networs ] Ł! Ł "! Receiver Model Ł Ł # Ł $%&% Ł $% '( * &%* r (1+ r Ł + 1 / n 1 / n log n Area
More informationBucket handles and Solenoids Notes by Carl Eberhart, March 2004
Bucket handles and Solenoids Notes by Carl Eberhart, March 004 1. Introduction A continuum is a nonempty, compact, connected metric space. A nonempty compact connected subspace of a continuum X is called
More informationDivisible Load Scheduling: An Approach Using Coalitional Games
Divisible Load Scheduling: An Approach Using Coalitional Games Thomas E. Carroll and Daniel Grosu Dept. of Computer Science Wayne State University 5143 Cass Avenue, Detroit, MI 48202 USA. {tec,dgrosu}@cs.wayne.edu
More informationPerformance and Scalability. Lars Karlsson
Performance and Scalability Lars Karlsson Outline Complexity analysis Runtime, speedup, efficiency Amdahl s Law and scalability Cost and overhead Cost optimality Iso-efficiency function Case study: matrix
More information上海超级计算中心 Shanghai Supercomputer Center. Lei Xu Shanghai Supercomputer Center San Jose
上海超级计算中心 Shanghai Supercomputer Center Lei Xu Shanghai Supercomputer Center 03/26/2014 @GTC, San Jose Overview Introduction Fundamentals of the FDTD method Implementation of 3D UPML-FDTD algorithm on GPU
More informationData analysis from capillary rheometry can be enhanced by a method that is an alternative to the Rabinowitsch correction
ANNUAL TRANSACTIONS OF THE NORDIC RHEOLOGY SOCIETY, VOL. 18, 2010 Data analysis from capillary rheometry can be enhanced by a method that is an alternative to the Rabinoitsch correction Carlos Salas-Bringas1,
More informationParallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems
Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems ANSHUL GUPTA IBM T.J.WATSON RESEARCH CENTER YORKTOWN HEIGHTS, NY 109 ANSHUL@WATSON.IBM.COM VIPIN KUMAR
More informationDivisible Load Scheduling
Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute
More informationTwo NP-hard Interchangeable Terminal Problems*
o NP-hard Interchangeable erminal Problems* Sartaj Sahni and San-Yuan Wu Universit of Minnesota ABSRAC o subproblems that arise hen routing channels ith interchangeable terminals are shon to be NP-hard
More informationTDDB68 Concurrent programming and operating systems. Lecture: CPU Scheduling II
TDDB68 Concurrent programming and operating systems Lecture: CPU Scheduling II Mikael Asplund, Senior Lecturer Real-time Systems Laboratory Department of Computer and Information Science Copyright Notice:
More informationA POPULATION-MIX DRIVEN APPROXIMATION FOR QUEUEING NETWORKS WITH FINITE CAPACITY REGIONS
A POPULATION-MIX DRIVEN APPROXIMATION FOR QUEUEING NETWORKS WITH FINITE CAPACITY REGIONS J. Anselmi 1, G. Casale 2, P. Cremonesi 1 1 Politecnico di Milano, Via Ponzio 34/5, I-20133 Milan, Italy 2 Neptuny
More informationExtending the Associative Rule Chaining Architecture for Multiple Arity Rules
Extending the Associative Rule Chaining Architecture for Multiple Arity Rules Nathan Burles, James Austin, and Simon O Keefe Advanced Computer Architectures Group Department of Computer Science University
More informationA Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch 1 and Srikanta Tirthapura 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY
More informationComparison of Estimators in Case of Low Correlation in Adaptive Cluster Sampling. Muhammad Shahzad Chaudhry 1 and Muhammad Hanif 2
ISSN 684-8403 Journal of Statistics Volume 3, 06. pp. 4-57 Comparison of Estimators in Case of Lo Correlation in Muhammad Shahad Chaudhr and Muhammad Hanif Abstract In this paper, to Regression-Cum-Eponential
More informationResponse of NiTi SMA wire electrically heated
, 06037 (2009) DOI:10.1051/esomat/200906037 Oned by the authors, published by EDP Sciences, 2009 Response of NiTi SMA ire electrically heated C. Zanotti a, P. Giuliani, A. Tuissi 1, S. Arnaboldi 1, R.
More informationOn improving matchings in trees, via bounded-length augmentations 1
On improving matchings in trees, via bounded-length augmentations 1 Julien Bensmail a, Valentin Garnero a, Nicolas Nisse a a Université Côte d Azur, CNRS, Inria, I3S, France Abstract Due to a classical
More information6 Solving Queueing Models
6 Solving Queueing Models 6.1 Introduction In this note we look at the solution of systems of queues, starting with simple isolated queues. The benefits of using predefined, easily classified queues will
More informationOn the Average Path Length of Complete m-ary Trees
1 2 3 47 6 23 11 Journal of Integer Sequences, Vol. 17 2014, Article 14.6.3 On the Average Path Length of Complete m-ary Trees M. A. Nyblom School of Mathematics and Geospatial Science RMIT University
More informationDivisible Load Scheduling on Heterogeneous Distributed Systems
Divisible Load Scheduling on Heterogeneous Distributed Systems July 2008 Graduate School of Global Information and Telecommunication Studies Waseda University Audiovisual Information Creation Technology
More informationParallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano
Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic
More informationFORCE NATIONAL STANDARDS COMPARISON BETWEEN CENAM/MEXICO AND NIM/CHINA
FORCE NATIONAL STANDARDS COMPARISON BETWEEN CENAM/MEXICO AND NIM/CHINA Qingzhong Li *, Jorge C. Torres G.**, Daniel A. Ramírez A.** *National Institute of Metrology (NIM) Beijing, 100013, China Tel/fax:
More informationFlow Algorithms for Parallel Query Optimization
Flow Algorithms for Parallel Query Optimization Amol Deshpande amol@cs.umd.edu University of Maryland Lisa Hellerstein hstein@cis.poly.edu Polytechnic University August 22, 2007 Abstract In this paper
More informationTowards a Theory of Societal Co-Evolution: Individualism versus Collectivism
Toards a Theory of Societal Co-Evolution: Individualism versus Collectivism Kartik Ahuja, Simpson Zhang and Mihaela van der Schaar Department of Electrical Engineering, Department of Economics, UCLA Theorem
More informationDefinition of a new Parameter for use in Active Structural Acoustic Control
Definition of a ne Parameter for use in Active Structural Acoustic Control Brigham Young University Abstract-- A ne parameter as recently developed by Jeffery M. Fisher (M.S.) for use in Active Structural
More informationLink Models for Packet Switching
Link Models for Packet Switching To begin our study of the performance of communications networks, we will study a model of a single link in a message switched network. The important feature of this model
More informationColor Reproduction Stabilization with Time-Sequential Sampling
Color Reproduction Stabilization ith Time-Sequential Sampling Teck-Ping Sim, and Perry Y. Li Department of Mechanical Engineering, University of Minnesota Church Street S.E., MN 55455, USA Abstract Color
More informationOnline Packet Routing on Linear Arrays and Rings
Proc. 28th ICALP, LNCS 2076, pp. 773-784, 2001 Online Packet Routing on Linear Arrays and Rings Jessen T. Havill Department of Mathematics and Computer Science Denison University Granville, OH 43023 USA
More informationA New Method for Calculating Oil-Water Relative Permeabilities with Consideration of Capillary Pressure
A Ne Method for Calculating Oil-Water Relative Permeabilities ith Consideration of Capillary Pressure K. Li, P. Shen, & T. Qing Research Institute of Petroleum Exploration and Development (RIPED), P.O.B.
More information