Graphlet Screening (GS)


1 Graphlet Screening (GS)
Jiashun Jin, Carnegie Mellon University
April 11, 2014

2 Collaborators
Alphabetically: Zheng (Tracy) Ke (Princeton), Cun-Hui Zhang (Rutgers), Qi Zhang (University of Wisconsin)

3 Rare/Weak signals in Large-Scale Inference (LSI)
Rare: signals scatter sparsely across the observation units; there is no a priori knowledge of where they are
Weak: signals are individually weak (new)
Box and Meyer (1986)

4 A linear model
$Y = X\beta + z$, $X = X_{n,p}$, $z \sim N(0, I_n)$, $p \geq n \gg 1$
$\beta$ is sparse: many coordinates are 0
Gram matrix $G = X'X$ has unit diagonals and is sparse (few large coordinates in each row)

5 Linkage Disequilibrium (LD)

6 Idea

7 Univariate Screening
$Y = X\beta + z$, $X = X_{n,p} = [x_1, x_2, \ldots, x_p]$
For a threshold $t > 0$, apply hard thresholding:
$\hat\beta_j^{HT} = (Y, x_j)$ if $|(Y, x_j)| \geq t$, and $\hat\beta_j^{HT} = 0$ otherwise
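To make the rule concrete, here is a minimal numpy sketch of univariate screening by hard thresholding (the function name and interface are illustrative, not from the talk):

```python
import numpy as np

def univariate_screening(X, Y, t):
    """Hard-threshold the marginal regression coefficients (Y, x_j) = x_j' Y."""
    coeffs = X.T @ Y                                 # (Y, x_j) for each column x_j
    return np.where(np.abs(coeffs) >= t, coeffs, 0.0)
```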

8 Signal cancellation
Even if $\beta_j$ is large, $E[(x_j, Y)]$ could be small
Denote the support of $\beta$ by $S = S(\beta) = \{1 \leq j \leq p : \beta_j \neq 0\}$. Then
$(Y, x_j) = \sum_{l=1}^{p} (x_l, x_j)\beta_l + (z, x_j) = \beta_j + \sum_{l \in S(\beta) \setminus \{j\}} (x_j, x_l)\beta_l + N(0, 1)$
Signal cancellation may happen when $x_j$ is not orthogonal to $\{x_l : l \in S(\beta) \setminus \{j\}\}$

9 Exhaustive Multivariate Screening (EMS)
Fix $m_0 \geq 1$ and a small $\pi_0 > 0$
For $m = 1, 2, \ldots, m_0$ and any subset $\mathcal{J} = \{j_1, j_2, \ldots, j_m\}$, $j_1 < j_2 < \ldots < j_m$:
Project $Y$ onto $\{x_{j_1}, x_{j_2}, \ldots, x_{j_m}\}$: $Y \mapsto P^{\mathcal{J}} Y$
Retain the nodes in $\mathcal{J}$ if and only if $P\big(\chi^2_m(0) \geq \|P^{\mathcal{J}} Y\|^2\big) \leq \pi_0$
Fit the model with all retained nodes
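The following is a hedged sketch of EMS as described above, using the chi-square survival function for the retention test (all names are illustrative; the exhaustive loop over subsets is exactly what makes EMS infeasible for realistic p):

```python
import numpy as np
from itertools import combinations
from scipy.stats import chi2

def ems_retain(X, Y, m0, pi0):
    """Exhaustive Multivariate Screening: return the set of retained nodes."""
    p = X.shape[1]
    retained = set()
    for m in range(1, m0 + 1):
        for J in combinations(range(p), m):
            XJ = X[:, list(J)]
            # Project Y onto span{x_j : j in J}
            PJ_Y = XJ @ np.linalg.lstsq(XJ, Y, rcond=None)[0]
            # Retain J iff the chi-square(m) tail probability of ||P^J Y||^2 is small
            if chi2.sf(PJ_Y @ PJ_Y, df=m) <= pi0:
                retained |= set(J)
    return retained
```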

10 Rationale
Hope: maybe for some small-size set $\mathcal{J}$, $\{x_j : j \in \mathcal{J}\} \perp \{x_l : l \in S(\beta) \setminus \mathcal{J}\}$, so that
$Y = \sum_{j \in \mathcal{J}} \beta_j x_j + \sum_{j \in S(\beta) \setminus \mathcal{J}} \beta_j x_j + z$
Possible signal cancellation if we look at any single projected coefficient $(Y, x_j)$
No signal cancellation if we look at the projected coefficients $\{(Y, x_j) : j \in \mathcal{J}\}$ together

11 Problems
Computationally infeasible: $\sum_{m=1}^{m_0} \binom{p}{m}$ subsets to screen
Inefficient: too many candidates are included in the screening, so signals need to be stronger than necessary to survive it
Our proposal: Graphlet Screening (GS)

12 Graph of Strong Dependence (GOSD)
Define the GOSD as the graph $\mathcal{G} = (V, E)$:
$V = \{1, 2, \ldots, p\}$: each variable is a node
Nodes $i$ and $j$ have an edge iff $|(x_i, x_j)| \geq \delta$ (say $\delta = 1/\log(p)$)
$G = X'X$ sparse $\Rightarrow$ $\mathcal{G}$ sparse
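A small sketch of constructing the GOSD adjacency matrix from the design (assuming, as on the slide, columns normalized so that the Gram matrix has unit diagonals; the default $\delta = 1/\log p$ follows the slide):

```python
import numpy as np

def gosd_adjacency(X, delta=None):
    """Boolean adjacency matrix of the Graph of Strong Dependence:
    nodes i and j are linked iff |(x_i, x_j)| >= delta."""
    p = X.shape[1]
    if delta is None:
        delta = 1.0 / np.log(p)
    G = X.T @ X                      # Gram matrix
    A = np.abs(G) >= delta
    np.fill_diagonal(A, False)       # no self-loops
    return A
```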

13 Graphlet Screening (GS)
$\mathcal{A}(m_0) = \{$all connected subgraphs of $\mathcal{G}$ with size $\leq m_0\}$
GS is the same as EMS, except that where EMS exhaustively screens all $\mathcal{J} = \{j_1, j_2, \ldots, j_m\}$, $j_1 < j_2 < \ldots < j_m$, GS screens $\mathcal{J}$ if and only if $\mathcal{J} \in \mathcal{A}(m_0)$
Lemma. If $d$ is the maximum degree of $\mathcal{G}$, then $|\mathcal{A}(m_0)| \leq C p (ed)^{m_0}$, where $e = 2.71828\ldots$ is Euler's number
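The set $\mathcal{A}(m_0)$ can be enumerated by growing connected sets one neighbor at a time, since every connected subgraph arises this way. A sketch with illustrative names (by the lemma, the output has at most $Cp(ed)^{m_0}$ elements when the maximum degree $d$ is small):

```python
def connected_subgraphs(A, m0):
    """All connected subgraphs of size <= m0 of the graph with boolean
    adjacency matrix A, returned as sorted tuples of node indices."""
    p = A.shape[0]
    found = set()

    def grow(nodes):
        key = tuple(sorted(nodes))
        if key in found:
            return
        found.add(key)
        if len(nodes) == m0:
            return
        for u in nodes:                      # extend by any neighbor of the set
            for v in range(p):
                if A[u, v] and v not in nodes:
                    grow(nodes | {v})

    for j in range(p):
        grow({j})
    return found
```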

14 Mutual orthogonality
$\mathcal{G}$: there is an edge between $i$ and $j$ iff $|(x_i, x_j)| \geq 1/\log(p)$
$S = S(\beta) = \{1 \leq i \leq p : \beta_i \neq 0\}$
Restricting the nodes to $S$ forms a subgraph $\mathcal{G}_S$, which splits into components $\mathcal{G}_S = \mathcal{G}_{S,1} \cup \ldots \cup \mathcal{G}_{S,M}$
By how $\mathcal{G}$ is defined, approximately, $\{x_j : j \in \mathcal{G}_{S,1}\} \perp \{x_j : j \in \mathcal{G}_{S,2}\} \perp \ldots \perp \{x_j : j \in \mathcal{G}_{S,M}\}$

15 Maximum component size $m_0^*(\beta, \mathcal{G}, \delta)$
Surprise: $m_0^*(\beta, \mathcal{G}, \delta)$ is small in many cases, where $m_0^*(\beta, \mathcal{G}, \delta) = \max_{1 \leq l \leq M} |\mathcal{G}_{S,l}|$
Lemma. If $1\{\beta_j \neq 0\} \stackrel{iid}{\sim} \mathrm{Bernoulli}(\epsilon)$, then
$P\big(m_0^*(\beta, \mathcal{G}, \delta) > m\big) \leq p\,(ed\epsilon)^{m+1}$, $m > 1$,
where $d = d(\mathcal{G})$ is the maximum degree of $\mathcal{G}$

16 Signal archipelago
Signals split into many small-size disconnected islands
Columns indexed by different islands: mutually orthogonal
No signal cancellation when we screen each island individually

17 Why GS works
$\mathcal{A}(m_0) = \{$all connected subgraphs of $\mathcal{G}$ with size $\leq m_0\}$, and $\mathcal{G}_S = \mathcal{G}_{S,1} \cup \mathcal{G}_{S,2} \cup \ldots \cup \mathcal{G}_{S,M}$ with maximum component size $m_0^*(\beta, \mathcal{G}, \delta)$
GS screens a set $\mathcal{J}$ iff $\mathcal{J} \in \mathcal{A}(m_0)$
If $m_0 \geq m_0^*(\beta, \mathcal{G}, \delta)$, then every $\mathcal{G}_{S,l} \in \mathcal{A}(m_0)$: it is on our screening list!
Approximately, there is no signal cancellation, since $\{x_j : j \in \mathcal{G}_{S,l}\} \perp \{x_j : j \in \mathcal{G}_S \setminus \mathcal{G}_{S,l}\}$ and, approximately,
$Y = \sum_{j \in \mathcal{G}_{S,l}} \beta_j x_j + \sum_{j \in \mathcal{G}_S \setminus \mathcal{G}_{S,l}} \beta_j x_j + z$

18 Summary

Method                           US     EMS                               GS
Computation                      $p$    $\sum_{m=1}^{m_0} \binom{p}{m}$   $\leq Cp(ed)^{m_0}$
Efficiency                       yes    no                                yes
Robust to signal cancellation    no     yes                               yes

19 Methods

20 Application I: rank features
For feature $j$, find all $\mathcal{J} \ni j$, calculate the P-values, and use $\pi_j$ = the minimum of all such P-values for ranking
Example. Rows of $X = X_{n,p}$ are iid $N(0, \Omega)$, with $\Omega$ block-wise diagonal with $2 \times 2$ building blocks $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$
$\beta$: partitioned into blocks of 2, with
$(\beta_{2k-1}, \beta_{2k}) = (0, 0)$ w.p. $1-\epsilon$; $(0, \tau)$ w.p. $\epsilon/4$; $(\tau, 0)$ w.p. $\epsilon/4$; $(\tau, \tau)$ w.p. $\epsilon/2$
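A sketch of this ranking rule, reusing the chi-square screening p-values over the list $\mathcal{A}(m_0)$ produced by the enumerator above (hypothetical interface; `subgraphs` is any iterable of index sets):

```python
import numpy as np
from scipy.stats import chi2

def rank_features(X, Y, subgraphs):
    """Rank features by pi_j = min over {J : j in J} of the screening p-value of J."""
    p = X.shape[1]
    pi = np.ones(p)
    for J in subgraphs:
        idx = sorted(J)
        XJ = X[:, idx]
        PJ_Y = XJ @ np.linalg.lstsq(XJ, Y, rcond=None)[0]
        pval = chi2.sf(PJ_Y @ PJ_Y, df=len(idx))
        for j in idx:
            pi[j] = min(pi[j], pval)
    return np.argsort(pi), pi        # best-ranked features first, and the scores
```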

21 ROC of US vs. ROC of GS ($m_0 = 2$)
[Figure: ROC curves of US and GS with $p = 1000$, $n = 500$. Left panel: $\rho = 0.8$, $\tau = 4$. Right panel: $\rho = 0.8$, $\tau = 1.5$.]

22 SNP data (chromosome 21)
GS with $m_0 = 2$; $p = 3937$
[Figure: US vs. GS. Left panel: pair signals, $(\beta_{2k-1}, \beta_{2k}) = (\tau, \tau)$. Right panel: triplet signals, $(\beta_{3k-2}, \beta_{3k-1}, \beta_{3k}) = (\tau, \tau, \tau)$.]

23 Application II: Variable Selection
A two-stage screen-and-clean procedure:
Screen: apply GS, with small modifications
Clean

24 Screen step
$\mathcal{A}(m_0) = \{$all connected subgraphs of $\mathcal{G}$ with size $\leq m_0\}$
Arrange by size, breaking ties lexicographically: $\mathcal{A}(m_0) = \{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_T\}$
Initialize with $S_0 = \emptyset$
For $t = 1, 2, \ldots, T$, letting $S_{t-1}$ be the set of retained indices at stage $t-1$, update it by
$S_t = S_{t-1} \cup \mathcal{J}_t$ if $\|P^{\mathcal{J}_t} Y\|^2 - \|P^{\mathcal{J}_t \cap S_{t-1}} Y\|^2 \geq 2q\log(p)$, and $S_t = S_{t-1}$ otherwise
$\hat S = S_T$: all retained nodes
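A minimal sketch of the screen step as reconstructed above (the acceptance test compares the projection improvement against $2q\log p$; the sequential bookkeeping follows the slide, the helper names are illustrative):

```python
import numpy as np

def gs_screen(X, Y, subgraphs, q):
    """Sequentially visit connected subgraphs (smallest first) and retain J_t
    when projecting onto it improves the fit by at least 2*q*log(p)."""
    p = X.shape[1]
    thresh = 2.0 * q * np.log(p)

    def proj_sq_norm(idx):
        if not idx:                          # the empty set projects to zero
            return 0.0
        XJ = X[:, idx]
        PY = XJ @ np.linalg.lstsq(XJ, Y, rcond=None)[0]
        return float(PY @ PY)

    S = set()
    for J in sorted(subgraphs, key=lambda J: (len(J), tuple(sorted(J)))):
        J = set(J)
        gain = proj_sq_norm(sorted(J)) - proj_sq_norm(sorted(J & S))
        if gain >= thresh:
            S |= J
    return S
```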

25 Clean step
Fix tuning parameters $(u^{gs}, v^{gs})$
For $j \notin \hat S$: set $\hat\beta_j^{gs} = 0$
For $j \in \hat S$: decompose $\mathcal{G}_{\hat S} = \mathcal{G}_{\hat S,1} \cup \mathcal{G}_{\hat S,2} \cup \ldots \cup \mathcal{G}_{\hat S,\hat M}$, and estimate $\{\beta_j : j \in \mathcal{G}_{\hat S,l}\}$ by minimizing
$\big\|P^{\mathcal{G}_{\hat S,l}}\big(Y - \sum_{j \in \mathcal{G}_{\hat S,l}} \beta_j x_j\big)\big\|^2 + (u^{gs})^2 \|\beta\|_0$, subject to either $\beta_j = 0$ or $|\beta_j| \geq v^{gs}$
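Because the components of $\mathcal{G}_{\hat S}$ are small, the $L_0$-penalized fit can be brute-forced over all support patterns within each component. A sketch under that assumption (the clamping to $|\beta_j| \geq v^{gs}$ is a crude stand-in for the constrained minimizer, which the slide does not spell out):

```python
import numpy as np
from itertools import product

def gs_clean_component(X, Y, comp, u_gs, v_gs):
    """Clean step for one small component: brute-force the L0-penalized
    least squares over all support patterns within the component."""
    comp = sorted(comp)
    Xc = X[:, comp]
    P = Xc @ np.linalg.pinv(Xc)              # projection onto the component's span
    best, best_beta = np.inf, np.zeros(len(comp))
    for pattern in product([0, 1], repeat=len(comp)):
        idx = [i for i, on in enumerate(pattern) if on]
        beta = np.zeros(len(comp))
        if idx:
            beta[idx] = np.linalg.lstsq(Xc[:, idx], P @ Y, rcond=None)[0]
            # impose the minimum-magnitude constraint |beta_j| >= v_gs
            beta[idx] = np.sign(beta[idx]) * np.maximum(np.abs(beta[idx]), v_gs)
        resid = P @ (Y - Xc @ beta)
        obj = resid @ resid + u_gs**2 * len(idx)
        if obj < best:
            best, best_beta = obj, beta
    return dict(zip(comp, best_beta.tolist()))
```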

26 Tuning parameters of GS
Feature ranking: needs $(\delta, m_0)$; their choice can be guided by $\mathcal{G}$ and by computational capacity
Variable selection: also needs $(q, u^{gs}, v^{gs})$
$q$: flexible and insensitive
$u^{gs}$: relatively easy to estimate
$v^{gs}$: relatively hard to estimate

27 Minimax Theory

28 Sparse model
$Y = X\beta + z$, $z \sim N(0, I_n)$
$\beta = b \circ \mu$, $b_i \stackrel{iid}{\sim} \mathrm{Bernoulli}(\epsilon)$, $\mu \in M_p(\tau, a)$
$b \circ \mu \in \mathbb{R}^p$: $(b \circ \mu)_j = b_j \mu_j$
$M_p(\tau, a) = \{\mu \in \mathbb{R}^p : \tau \leq |\mu_j| \leq a\tau\}$
As $p \to \infty$, link $(\epsilon, \tau)$ to $p$ by fixed parameters: $\epsilon = \epsilon_p = p^{-\vartheta}$, $\tau = \tau_p = \sqrt{2r\log(p)}$

29 Sparse model, II. Random design
$Y = X\beta + z$, $X = [X_1, \ldots, X_n]'$, $X_i \stackrel{iid}{\sim} N(0, \frac{1}{n}\Omega)$
$\Omega$: unknown correlation matrix
$n = n_p = p^{\theta}$ with $(1 - \vartheta) < \theta < 1$, so that $p\epsilon_p \ll n_p \ll p$

30 Minimax Hamming distance
Measuring errors with the Hamming distance:
$H_p(\hat\beta, \epsilon_p, \mu; \Omega) = E\big[\sum_{j=1}^p 1\{\mathrm{sgn}(\hat\beta_j) \neq \mathrm{sgn}(\beta_j)\}\big]$
Minimax Hamming distance:
$\mathrm{Hamm}_p^*(\vartheta, \theta, r, a, \Omega) = \inf_{\hat\beta}\, \sup_{\mu \in M_p(\tau_p, a)} H_p(\hat\beta, \epsilon_p, \mu; \Omega)$

31 Exponent $\rho_j^* = \rho_j^*(\vartheta, r, \Omega)$
Define $\omega = \omega(S_0, S_1; \Omega) = \inf_{\delta} \{\delta' \Omega \delta\}$, where the infimum is over
$\delta = u^{(0)} - u^{(1)}$, with $u^{(k)}_i = 0$ for $i \notin S_k$ and $1 \leq |u^{(k)}_i| \leq a$ for $i \in S_k$, $k = 0, 1$
Define $\rho(S_0, S_1; \vartheta, r, a, \Omega) = \frac{(|S_0| + |S_1|)\vartheta}{2} + \frac{[\omega r - (|S_1| - |S_0|)\vartheta]^2}{4\omega r}$
The minimax rate critically depends on the exponents
$\rho_j^* = \rho_j^*(\vartheta, r; \Omega) = \min_{(S_0, S_1):\, j \in S_0 \cup S_1} \rho(S_0, S_1; \vartheta, r, a, \Omega)$
which do not depend on $(\theta, a)$ (under a mild regularity condition), are computable, and have an explicit form for some $\Omega$

32 Asymptotic minimaxity of GS
Assume $\sum_{j=1}^p |\Omega(i,j)|^{\gamma} \leq C$ for some $\gamma \in (0, 1)$ and all $1 \leq i \leq p$
Screen step: $q$ is a properly small number
Clean step: set $u^{gs} = \sqrt{2\vartheta \log p}$ and $v^{gs} = \tau_p$
Theorem. GS achieves the optimal rate of convergence:
$\sup_{\mu \in M_p(\tau_p, a)} H_p(\hat\beta^{gs}, \epsilon_p, \mu, \Omega) \leq L_p\big[\sum_{j=1}^p p^{-\rho_j^*} + p^{1-(m_0+1)\vartheta}\big] \leq L_p\big[\mathrm{Hamm}_p^*(\vartheta, \theta, r, a, \Omega) + p^{1-(m_0+1)\vartheta}\big]$
where $L_p$ is a generic multi-$\log(p)$ term

33 $L_0$-penalization method
Donoho and Stark (1989): minimize $\|Y - X\beta\|^2 + \lambda \|\beta\|_0$, where $\lambda > 0$ is a tuning parameter
Idea: when there is no noise, $Y = X\beta$ has infinitely many solutions, but only one is very sparse
Hope: the sparsest solution is the truth (Occam's Razor)
[Figure: noiseless case vs. strong-noise case]

34 $L_0$/$L_1$-penalization methods
[Figure: $L_1$-solution, $L_0$-solution, and the truth]

Method        $L_0$/$L_1$-penalization                                      CASE
Regime        Rare/Strong                                                   Rare/Weak
Loss          $1\{\mathrm{sgn}(\hat\beta) \neq \mathrm{sgn}(\beta)\}$       $\mathrm{Hamm}(\mathrm{sgn}(\hat\beta), \mathrm{sgn}(\beta))$
Optimality    Not                                                           Yes
Motivation    Imaging/Engineering                                           Genetics/Genomics
Design        Controllable/Nice                                             Uncontrollable/Bad
Key idea      One-stage global method                                       Multi-stage local method

Hamm: Hamming distance
Donoho and Stark (1989), Tibshirani (1996), and many others

35 Phase Diagram

36 A Corollary
Suppose the conditions of the theorem hold. If additionally $|\Omega(i, j)| \leq 4\sqrt{2} - 5$ for all $i \neq j$, then
$\frac{\mathrm{Hamm}_p^*(\vartheta, \theta, r, a, \Omega)}{p\epsilon_p} = 1 + o(1)$ if $r < \vartheta$, and $= L_p\, p^{-(r - \vartheta)^2/(4r)}$ if $r > \vartheta$
The right-hand side is the rate when $\Omega = I_p$

37 Phase Diagram
[Figure: two phase diagrams in the $(\vartheta, r)$ plane, each partitioned into No Recovery, Almost Full Recovery, and Exact Recovery regions. Left: $\Omega = I_p$, with boundary line $r = \vartheta$ and boundary curve $r = (1 + \sqrt{1 - \vartheta})^2$. Right: $\Omega$ as in the corollary, with the boundary shown as a green line.]

38 Simulation comparison
[Figure: Hamming errors of LASSO vs. Graphlet Screening as $\tau_p$ varies. Settings: $p = 5000$, $n = 4000$, $p\epsilon_p = 250$; $\tau_p = 6, 7, \ldots, 12$. Left to right: $G$ is 2-by-2 block-wise, penta-diagonal, and randomly generated (sprandsym in MATLAB).]

39 Take-home messages
GS is a computationally feasible approach that overcomes the challenge of signal cancellation
Key insight: the signal support splits into small-size signal islands $\mathcal{G}_{S,l}$, and the $\{x_j : j \in \mathcal{G}_{S,l}\}$ are (approximately) mutually orthogonal
The minimax rate depends on $X$ locally, so we have to act locally
Optimal for variable selection, while penalization methods are not
A flexible idea that is useful in many different situations

40 Main References
Genovese C, Jin J, Wasserman L, and Yao Z (2012). A comparison of the lasso and marginal regression. J. Mach. Learn. Res. 13, 2107-2143.
Ji P and Jin J (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40(1).
Jin J, Zhang C-H, and Zhang Q (2012). Optimality of Graphlet Screening in high dimensional variable selection. arXiv preprint.
Ke T, Jin J, and Fan J (2012). Covariance assisted screening and estimation. arXiv:1205.4645.
