CSE 4502/5717 Big Data Analytics Spring 2018; Homework 1 Solutions

1. Consider the following algorithm:

   for i := 1 to α√n log_e n do
       Pick a random j ∈ [1, n];
       If a[j] = a[j + 1] or a[j] = a[j − 1] then output "Type II" and quit;
   Output "Type I";

Analysis: Note that if the array is of type I, the above algorithm will never give an incorrect answer. Thus assume that the array is of type II; we calculate the probability of an incorrect answer as follows. The probability of coming up with the correct answer in one iteration of the for loop is √n/n = 1/√n. Thus, the probability of failure in any one iteration is 1 − 1/√n. As a consequence, the probability of failure in q successive iterations is (1 − 1/√n)^q ≤ exp(−q/√n) (using the fact that (1 − 1/x)^x ≤ 1/e for any x > 0). This probability is ≤ n^(−α) when q ≥ α√n log_e n. Thus the output of this algorithm is correct with high probability.

Note: There exists a deterministic algorithm to solve this problem in O(√n) time.
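For concreteness, here is a minimal Python sketch of this test. The type definitions it relies on are assumptions consistent with the analysis above (a type II array has at least √n positions whose value equals an adjacent one; a type I array has none), and the names classify and alpha are ours, not part of the solutions.

    import math
    import random

    def classify(a, alpha=2.0):
        """Monte Carlo test from Problem 1 (0-indexed).

        Assumption: a Type II array has at least sqrt(n) indices j with
        a[j] equal to a neighbor; a Type I array has none, so "Type II"
        is never reported for a Type I input."""
        n = len(a)
        trials = int(alpha * math.sqrt(n) * math.log(n)) + 1
        for _ in range(trials):
            j = random.randrange(n)  # random index in [0, n - 1]
            if (j + 1 < n and a[j] == a[j + 1]) or (j > 0 and a[j] == a[j - 1]):
                return "Type II"     # a witness pair was found
        return "Type I"              # wrong with probability <= n^(-alpha)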
2. It was shown in class that the maximum of n elements can be found in O(1) time using n² common CRCW PRAM processors.

Consider the case when ε = 1/2. Divide the elements into groups of size √n. Assign the first √n elements to the first n processors, the second √n elements to the next n processors, and so on. The maximum element in each group can be found in O(1) time, since each group of √n elements has (√n)² = n processors. At this stage, we have √n elements and n√n processors. Hence, the maximum of these elements can be found in O(1) time. Total time = O(1).

Next, consider the case when ε = 1/3. Here, divide the elements into groups of size n^(1/3). Assign the first n^(1/3) elements to the first n^(2/3) processors, the second n^(1/3) elements to the next n^(2/3) processors, and so on. The maximum element of each group can be found in O(1) time, and using n^(4/3) processors the maximum of these n^(2/3) group maxima can also be found in O(1) time.

For the general case, partition the input into groups with n^ε elements in each group. Find the maximum of each group, assigning n^(2ε) processors to each group. This takes O(1) time. Now the problem reduces to finding the maximum of n^(1−ε) elements. Again, partition the elements with n^ε elements in each group and find the maximum of each group. There will be only n^(1−2ε) elements left. Proceed in a similar fashion until the number of remaining elements is at most √n; the maximum of these can be found in O(1) time with the n processors available. Clearly, the run time of this algorithm is O(1/ε). This will be a constant if ε is a constant.
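A sequential simulation may make the scheme concrete. This is only a sketch: the PRAM's concurrent comparisons and writes are serialized in plain Python, so it illustrates the structure (the p²-processor O(1) maximum primitive and the n^ε grouping), not the parallel running time.

    import math

    def crcw_max(vals):
        """Classroom primitive: max of p values with p^2 processors in
        O(1) PRAM time. Processor (i, j) compares vals[i] with vals[j];
        every element that loses a comparison has its flag overwritten
        (a common concurrent write). Serialized here for illustration."""
        flags = [True] * len(vals)
        for i in range(len(vals)):       # each pair (i, j) is one processor
            for j in range(len(vals)):
                if vals[i] < vals[j]:
                    flags[i] = False
        return next(v for k, v in enumerate(vals) if flags[k])

    def group_max(a, eps):
        """Shrink the input by a factor of n^eps per O(1) round; with
        n^(1+eps) processors, each group of n^eps elements gets the
        n^(2*eps) processors the primitive needs. O(1/eps) rounds."""
        n = len(a)
        g = max(2, round(n ** eps))      # group size n^eps
        while len(a) > math.isqrt(n):
            a = [crcw_max(a[i:i + g]) for i in range(0, len(a), g)]
        return crcw_max(a)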
3. The algorithm runs in phases. In each phase we eliminate a constant fraction of the input keys that cannot be the element of interest. When the number of remaining keys is at most √n, one of the processors performs an appropriate selection and outputs the right element. To begin with, all the keys are alive. In any phase of the algorithm, let N stand for the number of alive keys at the beginning of the phase; at the beginning of the first phase, N = n.

Consider a phase where the number of alive keys is N at the beginning of the phase, and let Y be the collection of alive keys. We employ √N processors in this phase. Partition the N keys into √N parts with √N keys in each part, and assign each processor a part. Each processor in parallel finds the median of its keys in O(√N) time. Let M_1, M_2, ..., M_√N be these group medians. One of the processors finds the median M of these √N group medians; this will also take O(√N) time. Now partition Y into Y_1 and Y_2, where Y_1 = {q ∈ Y : q < M} and Y_2 = {q ∈ Y : q > M}. There are 3 cases to consider:

Case 1: If |Y_1| = i − 1, M is the element of interest. In this case, we output M and quit.

Case 2: If |Y_1| ≥ i, Y_1 will constitute the alive keys for the next phase.

Case 3: If the above two cases do not hold, Y_2 will constitute the collection of alive keys for the next phase. In this case we set i := i − |Y_1| − 1.

In cases 2 and 3 we can perform the partitions using a prefix computation that can be done in O(√N) time using √N processors. It is easy to see that |Y_1| ≤ (3/4)N and |Y_2| ≤ (3/4)N. As a result, it follows that the number of alive keys at the end of this phase is at most (3/4)N. Since N = n in the first phase, we infer that the run time of the algorithm is O(√n + √((3/4)n) + √((3/4)²n) + ⋯) = O(√n).
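The following sequential Python sketch mirrors these phases; the per-group median finding and the partition are the steps that would run on √N processors. It assumes distinct keys, as the case analysis above implicitly does, and takes i as a 1-based rank.

    import math

    def select(keys, i):
        """Phase-based selection (serialized): return the i-th smallest.
        Assumes distinct keys. Each loop iteration is one phase."""
        Y = list(keys)
        while len(Y) > 1:
            N = len(Y)
            g = max(1, math.isqrt(N))                     # ~sqrt(N) keys/part
            groups = [Y[k:k + g] for k in range(0, N, g)]
            medians = [sorted(G)[len(G) // 2] for G in groups]
            M = sorted(medians)[len(medians) // 2]        # median of medians
            Y1 = [q for q in Y if q < M]                  # prefix computation
            Y2 = [q for q in Y if q > M]
            if len(Y1) == i - 1:                          # Case 1
                return M
            if len(Y1) >= i:                              # Case 2
                Y = Y1
            else:                                         # Case 3
                i -= len(Y1) + 1
                Y = Y2
        return Y[0]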
4. If we employ k-way merge where k = cM/B, the height of the merge tree will be log(N/M)/log(cM/B). However, in the worst case we may have to make c passes through the data at each level of the tree, since we can only keep B/c keys of each run in memory. Thus the worst-case number of I/O passes needed is

1 + c · log(N/M)/log(cM/B).
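As a quick sanity check of this formula, with hypothetical parameter values:

    import math

    def worst_case_passes(N, M, B, c):
        """Worst-case passes for (cM/B)-way merging when only B/c keys
        of each run fit in memory: one run-formation pass plus c passes
        per level of the merge tree."""
        levels = math.ceil(math.log(N / M) / math.log(c * M / B))
        return 1 + c * levels

    # Hypothetical parameters: N = 2^30 keys, M = 2^20, B = 2^10, c = 4.
    # One merge level suffices (log(2^10)/log(2^12) < 1), so 1 + 4*1 = 5.
    print(worst_case_passes(2**30, 2**20, 2**10, 4))  # -> 5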
5. Dijkstra's algorithm can be described as follows:

Algorithm 1: Dijkstra(V, E, s)
Data: (V, E): a graph; s: a source node; let w(u, v) be the weight of edge (u, v)
Result: array d where d_u is the length of the shortest path from s to u
begin
    for u in V do d_u := ∞;
    d_s := 0;
    Create a priority queue Q to store pairs of the form (node, distance);
    Insert the pair (s, 0) into Q;
    while Q not empty do
        (u, r) := ExtractMin(Q);
        for every child c of u do
            if d_c > d_u + w(u, c) then
                d_c := d_u + w(u, c);
                Insert(Q, (c, d_c));  // update distance if c present

We assume that we can store the priority queue in memory (O(|V|) space). The algorithm will read the neighbors of each node at most once. Therefore, the total number of I/Os is Σ_{u∈V} ⌈deg(u)/B⌉ = O(E/B + V).
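A minimal in-memory Python rendering of Algorithm 1. In place of an update-on-insert priority queue it uses heapq with lazy deletion (stale pairs are skipped on extraction); the adjacency-list representation is an assumption of this sketch, not part of the solution.

    import heapq

    def dijkstra(adj, s):
        """adj maps each node to a list of (neighbor, weight) pairs."""
        d = {u: float("inf") for u in adj}
        d[s] = 0
        Q = [(0, s)]                               # pairs (distance, node)
        while Q:
            r, u = heapq.heappop(Q)                # ExtractMin(Q)
            if r > d[u]:
                continue                           # stale entry, skip it
            for c, w in adj[u]:                    # every child c of u
                if d[c] > d[u] + w:
                    d[c] = d[u] + w
                    heapq.heappush(Q, (d[c], c))   # insert/update (c, d_c)
        return d

    # Example: dijkstra({'s': [('a', 2), ('b', 5)], 'a': [('b', 1)], 'b': []}, 's')
    # returns {'s': 0, 'a': 2, 'b': 3}.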
6. We apply the LMM algorithm with ℓ = m = √M. We take as known that we can merge √M sorted sequences of length M each in 3 passes through the data. The pseudocode of the algorithm is given below:
Algorithm 2: Sort(X, N)
Data: X: array of elements; N = M²: number of elements in X
Result: sorted array X
begin
    // First pass
    Split the input into M runs of length M each;
    Sort each run and unshuffle it into m = √M sequences of length √M each;
    // Second pass
    Merge groups of ℓ = √M unshuffled sequences (in memory);
    // Third pass
    Shuffle groups of m = √M merged sequences of length M each;
    At the same time, clean up the dirty regions;
    At this point we have √M sorted runs of length M√M each;
    // Third pass (can be done with the previous pass)
    Unshuffle each run of length M√M into m = √M sequences of length M each;
    // Fourth, fifth, and sixth passes
    Merge groups of ℓ = √M unshuffled sequences of length M each;
    // Seventh pass
    Shuffle groups of m = √M merged sequences of length M√M each;
    Clean up dirty regions;

For an arbitrary N, the general principle is to first merge √M sequences of length M each, then merge √M sequences of length M√M each, and so on. Let K stand for √M and let T(u, v) be the number of passes required to merge u sorted sequences of length v each. Then we have the familiar formulas:

T(K, M) = 3
T(K, K^i M) = 2 + T(K, K^(i−1) M) = 2i + 3
T(K^c, M) = T(K, M) + T(K, KM) + T(K, K²M) + ⋯ + T(K, K^(c−1) M)
          = Σ_{i=0}^{c−1} (2i + 3) = c² + 2c

However, as we saw in the pseudocode above, when we compute T(K^c, M) we can overlap the unshuffling at the beginning of a T(K, K^i M) computation with the shuffling done at the end of the previous T(K, K^(i−1) M) computation. Therefore, the last equation becomes:

T(K^c, M) = T(K, M) + ⋯ + T(K, K^(c−1) M) − (c − 1) = c² + c + 1

Therefore, the numbers of passes for M² and M³ elements are:

T(M²) = T(M, M) = T(K², M) = 2² + 2 + 1 = 7
T(M³) = T(M², M) = T(K⁴, M) = 4² + 4 + 1 = 21

In general, for a given N, K^c = N/M means that c = 2 log(N/M)/log M, and the number of passes to sort N elements is:

T(N) = T(K^c, M) = (2 log(N/M)/log M)² + 2 log(N/M)/log M + 1.
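The closed form is easy to check numerically against the two worked cases above (the value of M below is hypothetical):

    import math

    def lmm_passes(N, M):
        """Pass count for LMM sort with l = m = sqrt(M): with K = sqrt(M)
        and K^c = N/M, the overlapped schedule needs c^2 + c + 1 passes."""
        c = 2 * math.log(N / M) / math.log(M)
        return c * c + c + 1

    M = 2**20
    print(lmm_passes(M**2, M))  # c = 2 -> 7.0 passes
    print(lmm_passes(M**3, M))  # c = 4 -> 21.0 passes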