A recursive construction of efficiently decodable list-disjunct matrices

CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct mtrices The method described here is from Ngo-Port-Rudr [1]. 1 Min ide Let M be t N (d, l)-list-disjunct mtrix. Given test outcome vector y {0, 1} t, the nive decoder cn identify the set R y of items which contins ll the positives plus < l negtive items in time O(tN). From the probbilistic proof in the previous lecture, we lso know tht we cn chieve t = O(d 2 /l log(n/(d + l))). For smll d nd l, most of the running time of the nive decoding lgorithm is spent on the liner term N. N)? How do we reduce the running time down to sub-liner? Sy even s high s Õ(poly(d) For concreteness, let us consider specil cse when d = l. For every d N/2, there exists (d, d)- list-disjunct mtrix with t = O(d log(n/d)) rows which is decodble in time O(Nd log(n/d)). We shll use this knowledge to construct (d, d)-list-disjunct mtrix which is decodble in sub-liner time s follows. Imgine putting ll N blood smples on N N grid. Let us index the smples by pirs [r, c] for integers 1 c, r N. Every row nd every column contins exctly N blood smples ech. For ech column c, crete column smple lbeled [, c] by pooling together ll smples [r, c] for r [ N]. We thus hve N column smples [, c], ech of which is set of N individul smples. Similrly, we hve N row smples lbeled [r, ] for r [ N]. A column/row smple is sid to be positive if it contins t lest one positive smple. Now, let M h (h is for horizontl) nd M v (v) is for verticl) be two t 1 N (d, d)-list-disjunct mtrices. Let M be ny t 2 N (d, d)-list-disjunct mtrix. We use M v to test the column smples, use M h to test the row smples, nd use M to test the norml smples. In totl, we hve t = 2t 1 + t 2 (perhps dptive) tests. After running the nive decoding lgorithm on M v nd M h in time O(t 1 N), we cn identify set C of less thn 2d column smples [, c] which contin ll the positive column smples; nd, we cn identify set R of less thn 2d row smples [r, ] which contin ll the positive row smples. The positive individul smples must lie in the intersection of the columns C nd the rows R. There re totlly 2d 2d cndidte smples in the intersections. Thus, we hve nrrowed down N to set of t most 4d 2 smples. Finlly, we run the nive decoding lgorithm on M restricted only to these 4d 2 smples to get set of t most 2d cndidtes. The running time of the decoding lgorithm on M is only t (4d 2 ). In totl, the running time is O(t 1 N + td 2 ). Since we cn chieve t 1 = O(d log( N/d)) nd t 2 = O(d log(n/d)), the totl decoding time is bout O((d N + d 3 ) log(n/d)), which is subliner in N for smll d. 2 Turning the ide into non-dptive system We wnt non-dptive group testing strtegy. The ide described bove does not work s is. Wht we need is single t N mtrix which is (d, d)-list-disjunct, not few mtrices of different dimensions. How do we 1

relize the ide of testing column smples nd row smples? It turns out we only need the structures of the mtrices M v, M h, but we do not need to use them directly. From the mtrix M v which gin is of dimension t 1 N construct mtrix M v of dimension t 1 N s follows. Ech row i of M v represents test (pool) of M v. If the test i in M v contins column smple [, c] then we put ll items lbeled [r, c] in the ith test of M v. In other words, there is 1 in row i nd column [r, c] of M v if nd only if there is 1 in row i nd column [, c] of M v. It is not hrd to see tht test using M v turns positive if nd only if the corresponding test using M v (on the column smples) turns positive. Similrly, we construct the mtrix M h of dimension t 1 N. Our finl group testing mtrix M is constructed by stcking verticlly three mtrices M v, M h, nd M, for totl of t = 2t 1 + t 2 rows. The decoding strtegy is to use the rest results corresponding to the rows of M h nd M v to identify set of t msot 4d 2 cndidte columns of M, nd then run the nive decoder on the M prt restricted to these 4d 2 columns. The totl decoding time is O(t 1 N + d 2 t 2 ). Formlly, the bove resoning leds to the following lemm. Lemm 2.1. Let d be positive integer. Suppose for N d 2, there exists (d, d)-list-disjunct mtrix M N of size t(d, N) N which cn be decoded in time D(d, N); nd, there exists (d, d)-listdisjunct mtrix M of size t (d, N) N. Then, there exists (d, d)-list-disjunct mtrix M N with rows nd N columns which cn be decoded in time t(d, N) = 2t(d, N) + t (d, N) D(d, N) = 2D(d, N) + 4d 2 t (d, N). Next, we pply the bove lemm recursively. Let us pply recursion few times to see the pttern. First, from the probbilistic method proof we discussed in previous lecture, we know there exists n bsolute constnt c such tht, for every d N, there is (d, d)-list-disjunct mtrix M (d, N) of dimension t (d, N) N where t (d, N) = cd log 2 (N/d). When N = d 2, let M N = M d be the d d identity mtrix which is certinly (d, d)-list-disjunct with t(d, d) = d rows nd which cn be decoded in time D(d, d) = d. Now, using M (d, d) nd the lemm, we obtin mtrix M d 2 with dimension t(d, d 2 ) d 2 nd decoding time D(d, d 2 ), where t(d, d 2 ) = 2d + cd log d D(d, d 2 ) = 2d + (4d 2 ) cd log d When N = d 4, pplying the lemm using M N = M d2 just constructed bove, we obtin t(d, d 4 ) = 4d + 2cd log(d) + cd log(d 3 ) D(d, d 4 ) = 4d + 8cd 3 log d + (4d 2 ) cd log(d 3 ) Generlly, for N = d 2i we hve the following recurrences t(d, d 2i ) = 2t(d, d 2i 1 ) + cd log(d 2i 1 ) D(d, d 2i ) = 2D(d, d 2i 1 ) + 4cd 3 log(d 2i 1 ). The bse cse is when i = 0, where we hve t(d, d) = d nd D(d, d) = d. 2

When N = d 2i, note tht 2 i = log d N nd i = log log d N. To solve for t(d, d 2i ), we iterte the recursion s follows. t(d, d 2i ) = 2t(d, d 2i 1 ) + cd log(d 2i 1 ) = 2t(d, d 2i 1 ) + (2 i 1)cd log d [ ] = 2 2t(d, d 2i 2 ) + (2 i 1 1)cd log d + (2 i 1)cd log d = 4t(d, d 2i 2 ) + [2 2 i (1 + 2)]cd log d [ ] = 4 2t(d, d 2i 3 ) + (2 i 2 1)cd log d + [2 2 i (1 + 2)]cd log d = 8t(d, d 2i 3 ) + [3 2 i (1 + 2 + 4)]cd log d =... = 2 i t(d, d) + [i2 i (1 + 2 + 4 + + 2 i 1 )]cd log d = 2 i t(d, d) + [i2 i 2 i + 1]cd log d = 2 i t(d, d) + [i2 i i]cd log d = d log d N + (log log d N) cd log(n/d). For d 2, obviously log d N = O(log(N/d). Hence, t(d, d 2i ) = O ((log log d N) d log(n/d)). Wht bout the cse when N d 2i for some integer i? For N d 2, let i = log log d N, then d 2i 1 < N d 2i. This mens i < log log d N + 1. We construct M N by removing from M d 2 i rbitrrily d2i N columns. The resulting mtrix cn be operted on s if those removed columns were ll negtive items. It is not hrd to see tht t(d, N) = O ((log log d N) d log(n/d)). Wht is relly nice bout this result is tht we now hve n efficiently decodble (d, d)-list-disjunct mtrix M N for every N d 2 whose number of rows is only brown up from the optiml O(d log(n/d)) by doulbe-log fctor of log log d N. Exercise 1. Using the bove recurrence, show tht The bove nlysis leds to the following result. D(t, N) = O ( (log log d N)d 3 log(n/d) ). Theorem 2.2. For ny positive integers d < N, there exists (d, d)-list-disjunct mtrix M with t = O ((log log d N) d log(n/d)) rows nd N columns. Furthermore, M cn be decoded in time O ( (log log d N) d 3 log(n/d) ). 3

3 Generlizing the min ide The bove ides cn be gretly generlized. The following section cn be skipped without losing much of the intuition behind the results. Theorem 3.1. Let n d 1 be integers. Assume for every i d, there is (d, l)-list disjunct t(i) i mtrix M i for integers 1 l n d. Let 1 log n nd 1 b log n/ be integers. Then there exists t,b n mtrix M,b tht is (d, l)-list disjunct tht cn be decoded in time D,b where nd t,b = log b ( log n ( b j ) b t j n ( ( )) log n 2 D,b = O t,b + (d + l) b. (2) Finlly, if the fmily of mtrices {M i } i d is (strongly) explicit then so is M,b. Proof. We will construct the finl mtrix M,b recursively. In prticulr, let such mtrix in the recursion with N columns be denoted by M,b (N). Note tht the finl mtrix is M,b = M,b (n). (For nottionl convenience, we will define D,b (N) nd t,b (N) to be the decoding time for nd the number of rows in M,b (N) respectively). Next, we define the recursion. If N 2, then set M,b (N) = M N. Note tht in this cse, t,b (N) = t(n). Further, we will use the nive decoder in the bse cse, which implies tht D,b (N) = O(t,b (N) N) O(2 t,b (N)). It is esy to check tht both (1) nd (2) re stisfied. Finlly becuse M N is (d, l)-list disjunct mtrix, so is M,b (N). Now consider the cse when N > 2. For i [b], define M (i) to be the t,b ( b N) N mtrix whose kth column (for k [N]) is identicl to the mth column in M,b ( b N) where m is the ith chunk of 1 b log n bits in k (gin, we think of k nd m s their respective binry representtions). Define M,b (N) to be the stcking of M (1), M (2),..., M (b) nd M N. Since M N is (d, l)-list disjunct mtrix, so is M,b (N). Next, we verify tht (1) holds. To this end note tht In prticulr, (by induction) ll the M (i) contribute b log log b N b «ı 1 t,b (N) = b t,b ( b N) + t(n). (3) ( b j b t j ) b N = log b ( log N j=1 ( ) b j b t j N rows. Since M N dds nother t(n) rows, M,b (N) indeed stisfies (1). Finlly, we consider the decoding of M,b (N). The decoding lgorithm is nturl: we run the decoding lgorithm for M,b ( b N) (tht is gurnteed by induction) on the prt of the (erroneous) outcome vector corresponding to ech of the M (i) (i [b]) to compute sets S i with the following gurntee: ech of the t most d defective indices k [N] projected to the ith chunk of 1 b log n is contined in S i. Finlly, we run the nive decoding lgorithm for M N on S def = S 1 S 2 S b (note tht by definition of S i ll of the defective items will be in S). To complete the proof, we need to verify tht this lgorithm tkes time s climed in (2). Note tht D,b (N) = b D,b ( b N) + O ( S t(n)). (1) 4

By induction, we hve D,b ( b ( N) = O t,b ( b ( )) log N 2 N) + (d + l) b, nd since M,b ( b N) is (d, l)-list disjunct mtrix, we hve S (d + l) b. The three reltions bove long with (3) show tht D,b (N) stisfies (2), s desired. Finlly, the clim on explicitness follows from the construction. The bound in (1) is somewht unwieldy. We note in Corollry 3.2 tht when t(i) = d x log y i for some rels x, y 1, we cn chieve efficient decoding with only log-log fctor increse in number of tests. We will primrily use this result in our pplictions. Note tht in this cse the bound in (1) cn be bounded s log b ( log n ( log n b j d x b j ) y ( ) log n log b d x log y n. Theorem 3.1 with b = 2 nd = log d, long with the observtion bove, implies the following: Corollry 3.2. Let n d 1 be integers nd x, y 1 be rels. Assume for every i d, there is (d, l)-list disjunct O(d x log y i) i mtrix for integers 1 l n d. Then there exists t n mtrix tht is (d, l)-list disjunct tht cn be decoded in poly(t, l) time, where t O (d x log y n log log d n). Finlly, if the originl mtrices re (strongly) explicit then so is the new one. In other words, the bove result implies tht we cn chieve efficient decoding with only log-log fctor increse in number of tests. We will primrily use the bove result in our pplictions. To show the verstility of Theorem 3.1 we present nother instntition. For ny 0 ɛ 1, if we pick = b log(d + l) nd b = (log n/) ɛ, we get the following result: Corollry 3.3. Let n d 1 be integers nd x, y 1, 0 < ɛ 1 be rels. Assume for every i d, there is (d, l)-list disjunct O(d x log y i) i mtrix for integers 1 l n d. Then there exists t n mtrix tht is (d, l)-list disjunct tht cn be decoded in poly(t, l) 2 1+ɛ log ɛ n log(d+l) time, where ( ) 1 t O ɛ dx log y n. Finlly, if the originl mtrices re (strongly) explicit then so is the new one. Note tht the bove result implies tht with only constnt fctor blow-up in the number of tests, one cn perform sub-liner in n time decoding when (d + l) is polynomilly smll in n. References [1] H. Q. NGO, E. PORAT, AND A. RUDRA, Efficiently decodble error-correcting list disjunct mtrices nd pplictions - (extended bstrct), in ICALP (1), 2011, pp. 557 568. 5