CSE 4095/5095 Topics in Big Data Analytics, Spring 2017; Homework 1 Solutions

Note: Solutions to problems 4, 5, and 6 are due to Marius Nicolae.

1. Consider the following algorithm:

    for i := 1 to $\alpha \sqrt{n} \log_e n$ do
        Pick a random $j \in [1, n]$;
        if a[j] = a[j + 1] or a[j] = a[j - 1] then
            output: Type II and quit;
    Output: Type I;

Analysis: Note that if the array is of type I, the above algorithm will never give an incorrect answer. Thus assume that the array is of type II. We'll calculate the probability of an incorrect answer as follows. The probability of coming up with the correct answer in one iteration of the for loop is $\frac{\sqrt{n}}{n} = \frac{1}{\sqrt{n}}$. Thus, the probability of failure in any iteration is $1 - \frac{1}{\sqrt{n}}$. As a consequence, the probability of failure in $q$ successive iterations is $\left(1 - \frac{1}{\sqrt{n}}\right)^q \le \exp(-q/\sqrt{n})$ (using the fact that $(1 - 1/x)^x \le 1/e$ for any $x \ge 1$). This probability will be $\le n^{-\alpha}$ when $q \ge \alpha \sqrt{n} \log_e n$. Thus the output of this algorithm is correct with high probability.

2. The algorithm runs in phases. In each phase we eliminate a constant fraction of the input keys that cannot be the element of interest. When the number of remaining keys is sufficiently small, one of the processors performs an appropriate selection and outputs the right element. To begin with, all the keys are alive. In any phase of the algorithm let $N$ stand for the number of alive keys at the beginning of the phase. At the beginning of the first phase, $N = n$.

Consider a phase where the number of alive keys is $N$ at the beginning of the phase. Let $Y$ be the collection of alive keys. We employ $\sqrt{N}$ processors in this phase. Partition the $N$ keys into $\sqrt{N}$ parts with $\sqrt{N}$ keys in each part. Each processor is assigned a part. Each processor in parallel finds the median of its keys in $O(\sqrt{N})$ time. Let $M_1, M_2, \ldots, M_{\sqrt{N}}$ be these group medians. One of the processors finds the median $M$ of these $\sqrt{N}$ group medians. This will take $O(\sqrt{N})$ time. Now partition $Y$ into $Y_1$ and $Y_2$, where $Y_1 = \{q \in Y \mid q < M\}$ and $Y_2 = \{q \in Y \mid q > M\}$. There are 3 cases to consider:

Case 1: If $|Y_1| = i - 1$, $M$ is the element of interest. In this case, we output
$M$ and quit.

Case 2: If $|Y_1| \ge i$, $Y_1$ will constitute the alive keys for the next phase.

Case 3: If the above two cases do not hold, $Y_2$ will constitute the collection of alive keys for the next phase. In this case we set $i := i - |Y_1| - 1$.

In cases 2 and 3 we can perform the partitions using a prefix computation that can be done in $O(\sqrt{N})$ time using $\sqrt{N}$ processors. It is easy to see that $|Y_1| \ge N/4$ and $|Y_2| \ge N/4$. As a result, it follows that the number of alive keys at the end of this phase is $\le 3N/4$. Since a phase that starts with $N$ alive keys takes $O(\sqrt{N})$ time, we infer that the run time of the algorithm is $O\left(\sqrt{n} + \sqrt{(3/4)\,n} + \sqrt{(3/4)^2\,n} + \cdots\right) = O(\sqrt{n})$.

3. If we employ $k$-way merge where $k = cM/B$, the height of the merge tree will be $\frac{\log(N/M)}{\log(cM/B)}$. However, in the worst case we may have to do $c$ passes through the data at each level of the tree, since we can only keep $B/c$ keys of each run. Thus the worst case number of I/O passes needed is $1 + c\,\frac{\log(N/M)}{\log(cM/B)}$.

4. The BFPRT algorithm for selection works as follows. Let $X = k_1, k_2, \ldots, k_n$; $i$ be the input for the selection problem. Here are the steps:

1) Partition $X$ into groups of size 5 each, and find the median of each group. Let the groups be $G_i$ for $1 \le i \le n/5$. Let the median of $G_i$ be $M_i$, for $1 \le i \le n/5$;
2) Find recursively the median $M$ of $M_1, M_2, \ldots, M_{n/5}$;
3) From $X$ get $X_1 = \{q \in X : q < M\}$ and $X_2 = \{q \in X : q > M\}$. Let $n_1 = |X_1|$;
4) If $i = n_1 + 1$ then output $M$ and quit, else if $i \le n_1$ then recursively find and output the $i$th smallest element of $X_1$, else recursively find and output the $(i - n_1 - 1)$st smallest element of $X_2$.

We can implement each of the above steps as it is. Let $T(n)$ be the number of I/O operations taken by the above algorithm on any input of size $n$. Step 1 can be done in one pass (i.e., $n/B$ I/O operations) through the data. Step 2 takes $T(n/5)$ I/O operations. Step 3 takes one pass through the data. We can show that the size of $X_1$ and the size of $X_2$ cannot be more than $\frac{7n}{10}$. As a result, step 4 takes no more than $T\left(\frac{7n}{10}\right)$ I/O operations. Thus we get the following recurrence relation for $T(n)$:

$T(n) \le T\left(\frac{n}{5}\right) + T\left(\frac{7n}{10}\right) + \Theta\left(\frac{n}{B}\right)$,

which solves to $T(n) = O\left(\frac{n}{B}\right)$.
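The four BFPRT steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not an external-memory implementation: it does not model blocks or count I/Os, it assumes the keys are distinct, and the function name `select` is a hypothetical choice made for this sketch.

```python
def select(xs, i):
    """Return the i-th smallest (1-indexed) element of xs.

    Assumption for this sketch: all keys in xs are distinct.
    """
    # Small inputs are solved directly by sorting.
    if len(xs) <= 5:
        return sorted(xs)[i - 1]

    # Step 1: partition into groups of size 5 and take each group's median.
    groups = [xs[j:j + 5] for j in range(0, len(xs), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # Step 2: recursively find the median M of the group medians.
    m = select(medians, (len(medians) + 1) // 2)

    # Step 3: split the keys around M.
    x1 = [q for q in xs if q < m]
    x2 = [q for q in xs if q > m]
    n1 = len(x1)

    # Step 4: either M is the answer, or recurse into one side.
    if i == n1 + 1:
        return m
    if i <= n1:
        return select(x1, i)
    return select(x2, i - n1 - 1)
```

In an external-memory setting, steps 1 and 3 would each correspond to one sequential scan of the data, which is where the $\Theta(n/B)$ term of the recurrence comes from.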
5. If a leaf can store more keys, insertion happens in a similar way; we just have to redefine what it means for a node to be full. Node $u$ is full if it's an internal node with $2t - 1$ keys or if it's a leaf with $4t - 3$ keys.

Algorithm 1: IsFull(u)
Data: u: a B-Tree node;
Result: True if node u is full, False otherwise;
    return ($leaf_u$ AND $n_u = 4t - 3$) OR (NOT $leaf_u$ AND $n_u = 2t - 1$);

Also, for simplicity, we will always make the root a non-leaf. The other thing to modify is how to split a full leaf. A full leaf, which has $4t - 3$ keys, will be split into two leaves with $2t - 2$ keys each. The middle key from the original leaf moves up and becomes a key in the parent node. Let SPLIT NODE be the algorithm discussed in class for splitting a full node. The following algorithm will split a node, taking into account splitting full leaves:

Algorithm 2: SplitNode(p, i, u)
Data: p, u: two nodes such that p = parent(u) and u is the i-th child of p;
Result: Splits the node u into two nodes;
    if $leaf_u$ then
        Create node $u'$;
        Copy last $2t - 2$ keys of $u$ to $u'$;
        Insert key $k^u_{2t-1}$ as the $i$-th key of $p$;
        Insert $u'$ as the $(i+1)$-th child of $p$;
        Remove last $2t - 1$ keys from $u$;
    else
        SPLIT NODE(p, i, u);

The insertion algorithm is then the following:
Algorithm 3: Insert(T, k)
Data: T: a B-Tree; k: a key;
Result: Inserts key k into T;
    r := root(T);
    if IsFull(r) then
        Create a new node s;
        $n_s := 0$;
        $leaf_s$ := False;
        $c^s_1 := r$;
        SplitNode(s, 1, r);
        root(T) := s;
        r := s;
    InsertNonFull(r, k);

Algorithm 4: InsertNonFull(u, k)
Data: u: a non-full B-Tree node; k: a key;
Result: Inserts key k into the subtree rooted at u;
    if $leaf_u$ then
        Insert k at the right place;
    else
        Choose i s.t. $k^u_{i-1} \le k < k^u_i$;
        if IsFull($c^u_i$) then
            SplitNode(u, i, $c^u_i$);
            Update i s.t. $k^u_{i-1} \le k < k^u_i$;
        InsertNonFull($c^u_i$, k);

6. Dijkstra's algorithm can be described as follows:
Algorithm 5: Dijkstra(V, E, s)
Data: (V, E): a graph; s: a source node; let w(u, v) be the weight of edge (u, v);
Result: array d where $d_u$ is the length of the shortest path from s to u;
    for u in V do
        $d_u := \infty$;
    $d_s := 0$;
    Create a priority queue Q to store pairs of the form (node, distance);
    Insert the pair (s, 0) into Q;
    while Q not empty do
        (u, r) := ExtractMin(Q);
        for every child c of u do
            if $d_c > d_u + w(u, c)$ then
                $d_c := d_u + w(u, c)$;
                Insert(Q, (c, $d_c$)); // update distance if c present

We assume that we can store the priority queue in memory ($O(|V|)$ space). The algorithm will read the neighbors of each node at most once. Therefore, the total number of I/Os is $\sum_{u \in V} \left\lceil \frac{\deg_u}{B} \right\rceil = O\left(\frac{|E|}{B} + |V|\right)$.
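Dijkstra's algorithm as described above, together with the I/O count, can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the graph is given as a dictionary `adj` that has an entry for every node, the block size `B` is measured in adjacency-list entries, and the name `dijkstra_io` is hypothetical. The sketch also swaps the "update distance if c present" step for lazy deletion (stale queue entries are skipped when extracted), because Python's `heapq` has no decrease-key operation. With lazy deletion the queue may hold up to $O(|E|)$ entries instead of $O(|V|)$.

```python
import heapq
import math


def dijkstra_io(adj, s, B):
    """Return (d, ios) for source s.

    adj: dict mapping each node to a list of (neighbor, weight) pairs;
         every node must appear as a key (assumption of this sketch).
    B:   number of adjacency-list entries per disk block (assumption).
    """
    d = {u: math.inf for u in adj}
    d[s] = 0
    queue = [(0, s)]          # in-memory priority queue of (distance, node)
    finished = set()
    ios = 0

    while queue:
        r, u = heapq.heappop(queue)
        if u in finished:
            continue          # stale entry: u was already extracted
        finished.add(u)

        # The adjacency list of u is read exactly once, block by block.
        ios += math.ceil(len(adj[u]) / B)

        for c, w in adj[u]:
            if d[c] > d[u] + w:
                d[c] = d[u] + w
                heapq.heappush(queue, (d[c], c))

    return d, ios
```

Summing `math.ceil(len(adj[u]) / B)` over all extracted nodes is the sum of $\lceil \deg_u / B \rceil$ from the analysis, which is at most $|E|/B + |V|$ for a directed graph.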