A Power and Prediction Analysis for Knockoffs with Lasso Statistics

Size: px

Start display at page:

Download "A Power and Prediction Analysis for Knockoffs with Lasso Statistics"

Donald Powers
6 years ago
Views:

1 A Power and Prediction Analysis for Knockoffs with Lasso Statistics Asaf Weinstein a, Rina Barber b, and mmanuel Candès a a Stanford University b University of Chicago Abstract Knockoffs is a new framework for controlling the false discovery rate (FDR) in multile hyothesis testing roblems involving comlex statistical models. While there has been great emhasis on Tye- I error control, Tye-II errors have been far less studied. In this aer we analyze the false negative rate or, equivalently, the ower of a knockoff rocedure associated with the Lasso solution ath under an i.i.d. Gaussian design, and find that knockoffs asymtotically achieve close to otimal ower with resect to an omniscient oracle. Furthermore, we demonstrate that for sarse signals, erforming model selection via knockoff filtering achieves nearly ideal rediction errors as comared to a Lasso oracle equied with full knowledge of the distribution of the unknown regression coefficients. The i.i.d. Gaussian design is adoted to leverage results concerning the emirical distribution of the Lasso estimates, which makes ower calculation ossible for both knockoff and oracle rocedures. 1 Introduction Suose that for each of n subjects we have measured a set of features which we want to use in a linear model to exlain a resonse, and lan to use the Lasso to select relevant features. A desirable rocedure controls the (exected) roortion of false variables entering the exlanatory model; that is, the ratio between the number of null coefficients estimated as nonzero by the Lasso rocedure and the total number of coefficients estimated as nonzero. Concetually, then, we imagine roceeding along the Lasso ath by decreasing the enalty λ, and need to decide when to sto and reject the hyotheses corresonding to nonzero estimates, in such a way that the false discovery rate (FDR) does not exceed a reset level q. The setu above differs from many classical situations, where we know that the null statistics the statistics for which the associated null hyothesis is true are (aroximately) indeendent draws from a distribution that is (aroximately) known; in such situations, it is relatively easily to estimate the roortion of null hyotheses that are rejected. In stark contrast, in our setting we do not know the samling distribution of β j (λ), and, to further comlicate matters, this distribution generally deends on the entire vector β (this is unlike the distribution of the least-squares estimator, where β OLS j deends only on β j ). Barber and Candès (2015) have roosed a method that utilizes artificial knockoff features, and used it to circumvent the difficulties described above. The general idea behind the method is to introduce a set of fake variables (i.e., variables that are known to not belong in the model), in such a way that there is some form of exchangeability under swaing a true null variable and a knockoff counterart. This exchangeability essentially imlies that a true null variable and a corresonding knockoff variable are indistinguishable (in the sense that the sufficient statistic is insensitive to swaing the two), hence given the event that one of the two was selected, each is equally robable to be the selected. We can then use the number of knockoff selections to estimate the number of true nulls being selected.

2 In the original aer, Barber and Candès (2015) considered a Gaussian linear regression model with a fixed number of fixed (nonrandom) redictors, and n. Since then, extensions and variations on knockoffs have been roosed, which made the method alicable to a wide range of roblems. Barber and Candès (2016) considered the high dimensional (n < ) setting, where control over tye III error (sign) is also studied. Candès et al. (2016) constructed robabilistic model-x knockoffs for a nonarametric setting with random covariates. A grou knockoff filter has been roosed by Dai and Barber (2016) for detecting relevant grous of variables in a linear model. The focus in these works is on constructing knockoff variables with the necessary roerties to enable control over the false discovery rate. On the other hand, the Tye II error rate has been less carefully examined. Power calculations are in general hard, esecially when sohisticated test statistics, such as the Lasso estimates, are used. There is, however, a secific setting that enables us to exactly characterize (in some articular sense) the asymtotic distribution of the Lasso estimator relying on the theory of Aroximate Message-Passing (AMP). More secifically, we work in the setu of Bayati and Montanari (2012) which entails an n Gaussian design matrix with n, and n/ δ > 0; and the regression coefficients are i.i.d. coies from a mixture distribution Π = (1 ɛ)δ 0 ɛπ, where Π is the distribution of the nonzero comonents 1 (i.e., has no oint mass at zero). We consider a rocedure that enters variables into the model according to their order of (first) aearance on the Lasso ath. Aealing to the results in Bayati and Montanari (2012), we identify an oracle rocedure that, with the knowledge of the distribution Π (and the noise level σ 2 ), is able to asymtotically attain maximum true ositive rate subject to controlling the FDR. We show that a rocedure which uses knockoffs for calibration, and hence a different solution ath, is able to strictly control the FDR and at the same time asymtotically achieve ower very close to that of the oracle adatively in the unknown Π. Figure 1 shows ower of (essentially) the level-q knockoff rocedure roosed in Section 3 versus that of the level-q oracle rocedure, as q varies and for two different distributions Π. The to two lots are theoretical asymtotic redictions, and the bottom lots show simulation counterarts. As imlied above, the oracle gets to use the knowledge of Π and ɛ in choosing the value of the Lasso enalty arameter λ, so that the asymtotic false discovery roortion is q; by contrast, the knockoff rocedure is alied in exactly the same way in both situations, that is, it does not utilize any knowledge of Π or ɛ. Yet the to two lots in Figure 1 show that the knockoff rocedure adats well to the unknown true distribution of the coefficients, at least for FDR levels in the range that would tyically be of interest, in the sense that it blindly achieves close to otimal asymtotic ower. Figures 4 and 5 of Section 4 give another reresentation of these findings, which are the main results of the aer. While the main focus is on the testing roblem, we address also the issue of model selection and rediction. Motivated by the work of Abramovich et al. (2006), which formally connected FDR control and estimation in the sequence model, our aim is to investigate whether favorable roerties associated with FDR control carry over to the linear model. As Abramovich et al. (2006) remark at the outset, in the secial case of the sequence model, a full decision-theoretic analysis of the redictive risk is ossible. For the linear model case, Bogdan et al. (2015) roosed a enalized least squares estimator, SLOP, motivated from the ersective of multile testing and FDR control, and Su et al. (2016) even roved that this estimator is asymtotically minimax under the same design as that considered in the current aer, although in a different sarsity regime, namely ɛ 0. In truth, the linear model is substantially more difficult to analyze than the sequence model. Fortunately, however, the secific working assumtions we adot allow to analyze the (asymtotic) redictive risk, again relying on the elegant results of Bayati and Montanari (2012) for the Lasso estimates. While our results lack the rigor of those in, e.g., Abramovich et al. (2006) and Su et al. (2016), we think that the findings reorted in Section 5 are valuable in investigating a roblem for which 1 With some abuse of notation, in this aer we use Π and Π to refer to either a distribution or a random variable that has the corresonding distribution. 2

3 Theoretical: Π * = δ 50 Theoretical: Π * = ex(1) TPP (knockoff) q=0.05 q=0.1 q=0.125 TPP (knockoff) q=0.05 q=0.3 q= TPP (oracle) TPP (oracle) Simulation: Π * = δ 50 Simulation: Π * = ex(1) TPP (knockoff) TPP (knockoff) TPP (oracle) TPP (oracle) Figure 1: Asymtotic ower of the knockoff rocedure as comared to the oracle, for two different distributions Π : oint mass (left) and exonential (right). ɛ = 0.2, δ = 1, σ = 0.5. To: Theoretical rediction. ach of the oints marked on the to grahs corresonds to a articular value of q, and was generated as follows. (i) Fix q; (ii) find λ and t s.t. FDP (t) = q and FDP aug(t ) = q, where FDP and FDP aug are given in (5) and (31); (iii) lot the oint ( TPP (t), TPP aug(t ) ), where TPP and TPP aug are given in (6) and (29). The broken line is the 45 degrees line. The knockoff rocedure adats to an unknown Π : in both situations the asymtotic ower (TPP) of the knockoffs rocedure is very close to that of the oracle rocedure. Bottom: Simulation results, ɛ = 0.2; n = = 5000; σ = 0.5. ach of the two grahs is meant to be an analogue of the asymtotic limits above. 3

4 conclusive answers are currently not available. The rest of the aer is organized as follows. Section 2 reviews some relevant results from Su et al. (2017) and resents the oracle false discovery rate-ower tradeoff diagram for the Lasso. In Section 3 we resent a simle testing rocedure which uses knockoffs for FDR calibration, and rove some theoretical guarantees. A ower analysis, which comares the knockoff rocedure to the oracle rocedure, is carried out in Section 4. Section 5 is concerned with using knockoffs for rediction, and Section 6 concludes with a short discussion. 2 An oracle tradeoff diagram Adoting the setu from Su et al. (2017), we consider the linear model y = Xβ z (1) in which X R n has i.i.d. N (0, 1/n) entries and the errors z i are i.i.d. N (0, σ 2 ) with σ 0 fixed and arbitrary. We assume that the regression coefficients β j, j = 1,..., are i.i.d. coies of a random variable Π with (Π 2 ) < and P(Π 0) = ɛ (0, 1) for a fixed constant ɛ. In this section we will be interested in the case where n, with n/ δ for a ositive constant δ. Note that Π is assumed to not deend on, and so the exected number of nonzero elements β j is equal to ɛ. Let β(λ) be the Lasso solution, 1 β(λ) = argmin b R 2 y Xb 2 λ b 1, (2) and denote V (λ) = {j : βj (λ) 0, β j = 0}, T (λ) = {j : βj (λ) 0, β j 0}, R(λ) = {j : β j (λ) 0} and k = {j : β j 0}. Hence V (λ) is regarded as the number of false discoveries made by the Lasso; T (λ) is the number of true discoveries; R(λ) is the total number of discoveries, and k is the number of true signals. We would like to remark here that β j = 0 imlies that X j (the j-th variable) is indeendent of y marginally and conditionally on any subset of X j := {X 1,..., X j 1, X j1,..., X }; hence the interretation of rejecting the hyothesis β j = 0 as a false discovery is clear and unambiguous. The false discovery roortion (FDP) is defined as usual as FDP(λ) = and the true ositive roortion (TPP) is defined as V (λ) 1 R(λ) TPP(λ) = T (λ) 1 k. Su et al. (2017) build on the results in Bayati and Montanari (2012) to devise a tradeoff diagram between TPP and FDP in the linear sarsity regime. Lemma A.1 in Su et al. (2017), which is adoted from Bogdan et al. (2013) and is a consequence of Theorem 1.5 in Bayati and Montanari (2012), redicts the limits of FDP and TPP at a fixed value of λ. Throughout, let η t ( ) be the soft-thresholding oerator, given by η t (x) = sgn(x)( x t). Also, let Π denote a random variable distributed according to the conditional distribution of Π given Π 0; that is, { Π, w.. ɛ, Π = (3) 0, w.. 1 ɛ. Finally, denote by α 0 the unique root of the equation (1 t 2 )Φ( t) tφ(t) = δ/2. 4

5 Lemma 2.1 (Lemma A.1 in Su et al., 2017; Theorem 1 in Bogdan et al., 2013; see also Theorem 1.5 in Bayati and Montanari, 2012). The Lasso solution with a fixed λ > 0 obeys V (λ) T (λ) P 2(1 ɛ)φ( α), P P( Π τw > ατ, Π 0) = ɛp( Π τw > ατ), where W N (0, 1) indeendently of Π, and τ > 0, α > max{α 0, 0} is the unique solution to τ 2 = σ 2 1 δ (η ατ (Π τw ) Π) 2 λ = (1 1δ ) P( Π τw > ατ) ατ. (4) As exlained in Su et al., 2017, Lemma 2.1 is a consequence of the fact that, under the working assumtions, (β, β(λ)) is in some (limited) sense asymtotically distributed as (β, η ατ (β τw)), where W N (0, I) indeendently of β; here, the soft-thresholding oeration acts on each comonent of a vector. It follows immediately from Lemma 2.1 that for a fixed λ > 0, the limits of FDP and TPP are and FDP(λ) P 2(1 ɛ)φ( α) 2(1 ɛ)φ( α) ɛp( Π τw > ατ) FDP (λ) (5) TPP(λ) P P( Π τw > ατ) TPP (λ). (6) The arametric curve (FDP (λ), TPP (λ)) imlicity defines the FDP level attainable at a secific TPP level for the Lasso solution with a fixed λ as q Π (TPP (λ); ɛ, δ, σ) = FDP (λ). (7) Interreted in the oosite direction, for known Π and ɛ, we have an analytical exression for the asymtotic ower of a rocedure that, for a given q, chooses λ in (2) in such a way that FDP (λ) = q. We describe how to comute the tradeoff curve q Π ( ; ɛ, δ, σ) for a given Π and ɛ in the Aendix. Because in ractice Π and ɛ would usually be unknown, we refer to this rocedure as the oracle rocedure. The oracle rocedure is not a legal multile testing rocedure, as it requires knowledge of ɛ, Π and σ 2, which is of course realistically unavailable. Still, it serves as a reference for evaluating the erformance of valid testing rocedures. Cometing with this oracle is the subject of the current aer, and the articular working assumtions above were chosen simly because it is ossible under these assumtions to obtain a tractable exression for the emirical distribution of the Lasso solution. Before roceeding, we would like to mention another oracle rocedure. Hence, consider the rocedure which for λ uses the value λ = inf{λ : FDP(λ) q}. (8) This can be thought of as an oracle that is allowed to see the actual β but is still committed to the Lasso ath, and stos the last time the FDP is smaller than q. From now on, we think of λ as decreasing in time, and regard a variable j as entering before variable j if su{λ : β j (λ) 0} su{λ : β j (λ) 0}. Although this may aear as a stronger oracle in some sense, it is asymtotically equivalent to the fixed-λ oracle discussed earlier: by Theorem 3 in Su et al. (2017), the event { } FDP(λ) q Π (TPP(λ) 0.001; ɛ, δ, σ) (9) λ>0.01 has robability tending to one, hence the lower bound on FDP rovided by q Π ( ; ɛ, δ, σ) holds uniformly in λ. 5

6 3 Calibration using knockoffs If we limit ourselves to rocedures that use the Lasso with a fixed or even data-deendent value of λ, from the revious section we see that the achievable TPP can never asymtotically exceed that of the oracle rocedure. Hence the asymtotic ower of the oracle rocedure serves as an asymtotic benchmark. In this section we roose a cometitor to the oracle, which utilizes knockoffs for FDR calibration. While the rocedure resented below is sequential and formally does not belong to the class of Lasso estimates with a data-deendent choice of λ, it will be almost equivalent to the corresonding rocedure belonging to that class. We discuss this further later on. 3.1 A knockoff filter for the i.i.d. design Let the model be given by (1) as before. In this section, unless otherwise secified, it suffices to assume that the entries of X are i.i.d. from an arbitrary known continuous distribution G (not necessarily normal). Furthermore, unless otherwise indicated, in this section we treat n, as fixed and condition throughout on β. The definitions in the current section override the revious definitions, if there is any conflict. Let X R n r be a matrix with i.i.d. entries drawn from G indeendently of X and of all other variables, where r is a fixed ositive integer. Next, let X := [X X] R n (r) denote the augmented design matrix, and consider the Lasso solution for the augmented design, 1 β(λ) = argmin b R r 2 y Xb 2 λ b 1. (10) For simlicity we kee the notation β(λ) for the solution corresonding to the augmented design, although it is of course different from (2) (for one thing, (10) has r comonents versus for (2)). Let H = {1,..., }, H 0 = {j H : β j = 0}, K 0 = { 1,..., r} be the sets of indices corresonding to the original variables, the null variables and the knockoff variables, resectively. Note that, because we are conditioning on β, the set H 0 is nonrandom. Now define the statistics T j = su{λ : β j (λ) 0}, j = 1,..., r. (11) Informally, T j measures how early the jth variable enters the Lasso ath (the larger, the earlier). Let V 0 (λ) = {j H 0 : T j λ}, V 1 (λ) = {j K 0 : T j λ} be the number of true null and fake null variables, resectively, which enter the Lasso ath before time λ. Also, let R(λ) = {j H : T j λ} be the total number of original variables entering the Lasso ath before time λ. A visual reresentation of the quantities defined above is given in Figure 2. 6

7 nonnull: true null: fake null: R(t) V 0 (t)=6, R(s) V 0 (s)=7 V 0 (t)=3, V 0 (s)=5 V 1 (t)=2, V 1 (s)=3 0 s t T Figure 2: Reresentation of test statistics. ach T j is reresented by a marker: round markers corresond to original variables (H), triangles reresent fakes (knockoffs, K 0 ); black corresonds to nonnull (H \ H 0 ), red reresents null (true or fake, H 0 K 0 ). The class of rocedures we roose (henceforth, knockoff rocedures) reject the null hyotheses corresonding to {j H : T j λ}, where λ = inf λ Λ : (1 V 1(λ)) R(λ) H π 0 1 K 0 q (12) for an estimate π 0 of π 0 := H 0 / H and for a set Λ associated with π 0. Note that conditionally on β, π 0 is a (nonrandom) arameter, and in this section we seak of estimating π 0 rather than 1 ɛ. Hence the tests in this class are indexed by the estimate of π 0 (and by Λ). Denote by X j, j = 1,..., r, the columns of the augmented matrix X. To see why the rocedure makes sense, observe that, conditionally on β, the distribution of (X, y) is invariant under ermutations of the variables in the set {H 0 K 0 } (just by symmetry). Therefore, conditionally on {j H 0 K 0 : T j λ} = k, any subset of size k of {H 0 K 0 } is equally robable to be the selected set {j H 0 K 0 : T j λ}. It follows that Therefore, V 0 (λ)/ H 0 V 1 (λ)/ K 0. V 0 (λ) V 1 (λ) H π 0 K 0 V 1 (λ) H π 0 K 0 and hence (12) can be interreted as the smallest λ such that an estimate of FDP(λ) V 0(λ) 1 R(λ) is below q. The fact that the rocedure defined above, with π 0 = 1, controls the FDR, is a consequence of the more general result below. Theorem 3.1 (FDR control for knockoff rocedure with π 0 = 1). Let T j, j = 1,..., r be test statistics with a joint distribution that is invariant to ermutations in the set H 0 K 0 (that is, reordering the statistics with indices in the extended null set, leaves the joint distribution unchanged). Set V 0 (t) = {j H 0 : T j 7

8 t}, V 1 (t) = {j K 0 : T j t}, and R(t) = {j H : T j t}. Then the rocedure that rejects H j 0 if T j τ, where τ = inf t R : (1 V H 1(t)) 1 K 0 q (13) 1 R(t) controls the FDR at level q. In fact, [ ] V0 (τ) q H 0 1 R(τ) H. Proof. We will use the following lemma, which is roved in the Aendix. Lemma 3.2. Let X Hyer(n 0, n 1 ; m) be a hyergeometric random variable, i.e., ( n0 )( n1 ) P(X = k) = k m k ) with m > 0 and set Y = X/(1 m X). Then 2 In articular, for any value of m. Y = n 0 1 n 1 ( 1 ( n0 n 1 m Y n 0 1 n 1 ( n0 1 ( m n0 n 1 m We roceed to rove the theorem. Denote m 0 = H 0 and recall that H = and K 0 = r. Recall also that we are conditioning throughout on β (so that, in articular, H 0 is nonrandom). We will show that FDR control holds conditionally on β (it therefore also holds unconditionally). With the notation above, We have FDR = We will show that Consider the filtration [ ] V0 (τ) = 1 R(τ) { τ = inf t : (1 V 1(t)) 1 R(t) [ (1 V1 (τ)) 1r 1 R(τ) 1r ) ) ) V 0 (τ) (1 V 1 (τ)). } q. 1r ] [ V 0 (τ) q (1 V 1 (τ)) = q 1 r 1r [ V0 (τ) 1 V 1 (τ) [ ] V0 (τ) m 0 1 V 1 (τ) 1 r. (14) F t = σ ({V 0 (s)} 0 s t, {V 1 (s)} 0 s t, {R(s)} 0 s t ). That is, at all times s t, we have knowledge about the number of true null, fake null and nonnull variables with statistics larger than s. We claim that V 0 (t) 1 V 1 (t) 2 Here and below we adot the convention that ( 0 0) = 1, and ( n k) = 0 if n < 0 or if either k < 0 or k > n. 8 ] ].

9 is a suermartingale w.r.t. {F t }, and τ is a stoing time. Indeed, let t > s. We need to show that [ ] V0 (t) 1 V 1 (t) F s V 0(s) 1 V 1 (s). (15) The crucial observation is that, conditional on V 0 (s), V 1 (s) and V 0 (t) V 1 (t) = m, V 0 (t) Hyer(V 0 (s), V 1 (s); m). This follows from the exchangeability of the test statistics for the true and fake nulls (in other words, conditional on everything else, the order of the red markers in Figure 2 is random). Now, Lemma 3.2 assures us that [ V0 (t) 1 V 1 (t) V 0(s), V 1 (s), V 0 (t) V 1 (t) = m ] V 0(s) 1 V 1 (s) regardless of the value of m. Hence, the suermartingale roerty (15) holds (and it is also clear that τ is a stoing time). By the otional stoing theorem for suermartingales, [ V0 (t) 1 V 1 (t) ] [ V0 (0) 1 V 1 (0) ] = ( (16) [ ]) V0 (0) 1 V 1 (0) V 0(0) V 1 (0). (17) But conditional on V 0 (0) V 1 (0) = m we have that V 0 (0) Hyer(m 0, r; m), and so, invoking Lemma 3.2 once again, we have [ ] V0 (0) 1 V 1 (0) V 0(0) V 1 (0) = m m 0 1 r no matter the value of m. 3.2 stimating the roortion of nulls From our analysis, we see that FDR control would continue to hold if we relaced the number of hyotheses = H with the number of nulls m 0 = H 0 in the definition of τ. Of course, this quantity is unobservable, or, equivalently, π 0 = H 0 / H is unobservable. We can, however, estimate π 0 using a similar aroach to that of Storey (2002), excet that we relace calculations under the theoretical null with the counting of knockoff variables at the bottom of the Lasso ath: secifically, for any λ 0 we have {j H 0 : T j λ 0 } H 0 again due to the exchangeability of null features. It follows that {j K 0 : T j λ 0 }, K 0 H 0 K 0 {j H 0 : T j λ 0 } {j K 0 : T j λ 0 } ( K 0 1) 1 {j H 0 : T j λ 0 } {j K 0 : T j λ 0 } ( K 0 1) 1 {j H : T j λ 0 }. {j K 0 : T j λ 0 } For small λ 0, we exect only few non-nulls among {j H : T j λ 0 } so that {j H 0 : T j λ 0 } {j H : T j λ 0 }. Hence, we roose to estimate π 0 by π 0 = ( K 0 1) H We again state our result more generally. 1 {j H : T j λ 0 }. (18) {j K 0 : T j λ 0 } 9

10 Theorem 3.3 (FDR control for knockoff rocedure with an estimate of π 0 ). Set π 0 = ( K 0 1) H 1 {j H : T j t 0 } {j K 0 : T j t 0 } (19) ( π 0 is set to if the denominator vanishes). In the setting of Theorem 3.1, the rocedure that rejects H j 0 if T j τ, where τ = inf t t 0 : (1 V H π 1(t)) 0 1 K 0 q (20) 1 R(t) obeys FDR q. Proof. We use the same notation as in the roof of Theorem 3.1. As before, we have that [ ] V 0 (τ) FDR q (1 V 1 (τ)) π. 0 1r Again, the rocess {V 0 (t)/(1 V 1 (t))} t t0 starting at t 0, is a suermartingale w.r.t. {F t } t t0 and τ is a stoing time. Then the otional stoing theorem gives [ ] [ ] V 0 (τ) V 0 (t 0 ) (1 V 1 (τ)) π 0 1r (1 V 1 (t 0 )) π 0 1r [ ] V0 (t 0 ) = 1 V 1 (t 0 ) r V 1 (t 0 ) (21) 1 R(t 0 ) [ ] V0 (t 0 ) 1 V 1 (t 0 ) r V 1 (t 0 ). 1 m 0 V 0 (t 0 ) As before, V 0 (t 0 ) Hyer(m 0, r; m) conditional on V 0 (t 0 ) V 1 (t 0 ) = m. In the Aendix we rove that if a random variable X follows this distribution, then [ ] ( m0 )( ) r X 1 m X r X m m = 1 0 m m m 0 m ) (22) 1 m 0 X ( m0 r m In articular, for all m the result is no more than 1, thereby comleting the roof of the theorem. In ractice, we of course suggest to use the minimum between one and (18) as an estimator for π 0. While we cannot obtain a result similar to Theorem 3.3 when the truncated estimate of π 0 is used instead of (18), we can aeal to results from aroximate message assing, in the sirit of the calculations from Section 4, to obtain an aroximate asymtotic result. Thus, let π 0 = 1 ( ( K0 1) H 1 {j H : T j t 0 } {j K 0 : T j t 0 } and adot the working assumtions stated at the beginning of Section 4. As in the roof of the theorem above, [ ] V 0 (τ) FDR q (1 V 1 (τ)) π, 0 1r 10 ).

11 and by the same reasoning as above, [ ] [ ] V 0 (τ) V 0 (t 0 ) (1 V 1 (τ)) π 0 1r (1 V 1 (t 0 )) π 0 1r [ ( V0 (t 0 ) r = 1 V 1 (t 0 ) max V1 (t 0 ) 1 R(t 0 ), 1 r )] [ ( V0 (t 0 ) r 1 V 1 (t 0 ) max V1 (t 0 ) 1 m 0 V 0 (t 0 ), 1 r )]. Now, as in equation (26) below, we have (23) V 0 (t 0 ) = {j H 0 : T j t 0 } {j H 0 : β j (t 0 ) 0, β j = 0} 2(1 ɛ )Φ( α 1), where ɛ = ɛ/(1 ρ) and where α 1 is the same as in equation (31), when relacing λ 0 with t 0. Also, using equation (27), we have V 1 (t 0 ) r = {j K 0 : T j t 0 } r {j K 0 : β j (t 0 ) 0} r 2Φ( α 1). Treating the aroximations above as equalities (in Section 4 we argue that these should be good aroximations), we have / V 0 (t 0 ) 1 V 1 (t 0 ) r 2(1 ɛ )Φ( α 1 ) 2Φ( α 1 ) = 1 ɛ and Hence, / r V 1 (t 0 ) r 1 m 0 V 0 (t 0 ) [1 2Φ( α 1 )] (1 ɛ )[1 2Φ( α 1 )] = 1 / 1 ɛ, 1 r r 1. ( V 0 (t 0 ) r 1 V 1 (t 0 ) max V1 (t 0 ) 1 m 0 V 0 (t 0 ), 1 r ) We conclude that, u to the aroximations above, 4 Power Analysis lim su FDR q. n, (1 ɛ 1 ) max(, 1) = 1. 1 ɛ The main observation in this aer is that we can use the consequences of Theorem 1.5 in Bayati and Montanari (2012) to obtain an asymtotic TPP-FDP tradeoff diagram for the Lasso solution associated with the knockoff setu; that is, the Lasso solution corresonding to the augmented design X. Indeed, the working assumtions required for alying Lemma 2 continue to hold, only with a slightly different set of arameters. Recall that X R n (r) has i.i.d. N (0, 1/n) entries. Taking r = ρ for a constant ρ > 0, we have that n/( r) δ := δ/(1 ρ) as n,. Furthermore, because Y is related to X through Y = X where 0 = (0,..., 0) T R r, we have that the emirical distribution of the coefficients corresonding to the augmented design converges exactly to { Π Π, w.. ɛ, = 0, w.. 1 ɛ (24), 11 [ β 0 ],

12 where ɛ := ɛ/(1 ρ) = lim{ɛ/( ρ )}. Fortunately, it is only the emirical distribution of the coefficient vector that needs to have a limit of the form (24), for Lemma 2.1 to follow from Theorem 1.5 of Bayati and Montanari (2012); see also the remarks in Section 2 of Su et al. (2017). The working hyothesis of Section 2 therefore holds with δ and ɛ relaced by δ and ɛ. Recalling the definitions of H, H 0 and K 0 from Section 3, a counterart of Lemma 2.1 asserts that {j H : β j (λ) 0, β j = 0} {j H : β j (λ) 0, β j 0} 2(1 ɛ)φ( α ), (25) ɛp( Π τ W > α τ ) (26) and {j K 0 : β j (λ) 0} r 2Φ( α ), (27) where β j (λ) is the solution to (10); W is a N (0, 1) variable indeendent of Π; and τ > 0, α > max{α 0, 0} is the unique solution to (4) when δ is relaced by δ and ɛ is relaced by ɛ. Hence, FDP(λ) = {j H 0 : T j λ} 1 {j H : T j λ} = {j H 0 : β j (λ ) 0 for some λ λ} 1 {j H : β j (λ ) 0 for some λ λ} {j H 0 : β j (λ) 0} 1 {j H : β j (λ) 0} 2(1 ɛ)φ( α ) 2(1 ɛ)φ( α ) ɛp( Π τ W > α τ ) FDP aug(t). (28) Above, we allowed ourselves to aroximate {j H : β j (λ ) 0 for some λ λ} with {j H : β j (λ) 0}, and likewise for H 0. If least-angle regression (fron et al., 2004) were used instead of the Lasso, the two quantities would have been equal; but because the Lasso solution ath is considered here, we cannot comletely rule out the event in which a variable that entered the ath dros out, and the former quantity is in general larger. Still, we exect the difference to be very small, at least for large enough values of λ (corresonding to small FDP). This is verified by the simulation of Figure 3: the green circles, comuted using the statistics T j, are very close to the red circles, comuted when rejections corresond to variables with β j (λ) 0. All three simulation examles were comuted for a single realization on the original n design (not the augmented design). Similar results were obtained when we reeated the exeriment (i.e., for a different realization of the data). Similarly, TPP aug (λ) {j H \ H 0 : T j λ} 1 {j H : β j 0} {j H \ H 0 : β j (λ) 0} 1 {j H : β j 0} P( Π τ W > α τ ) TPP aug(λ) (29) 12

13 ε= TPP ε= FDP 0.4 FDP FDP ε= TPP TPP Figure 3: Discreancy between testing rocedures for three different values of. In all three examles n = = 5000, Π ex(1), and σ = 0.5; and TPP is lotted against FDP. Green circles corresond to the rocedure that uses the statistics Tj in (11). Red circles corresond to the rocedure that rejects when βbj (λ) 6= 0. Black circles are theoretical redictions from Section 4 with δ = 1. ach curve is is comuted from a single realization of the data, and uses an n design matrix (i.e., not the augmented design). and [ aug (λ) FDP (1 {j K0 : Tj λ} ) H b π0 1 K0 {j H : Tj λ} (1 {j K0 : βbj (λ) 6= 0} ) 2Φ( α0 ) 2(1 )Φ( α0 ) P( Π τ 0 W > α0 τ 0 ) H b π0 1 K0 {j H : βbj (λ) 6= 0} (30) lim π b0, where W, τ 0 > 0, α0 > max{α00, 0} are as before, and where π b0 is any estimate of π0 that has a limit. Taking π b0 to be the minimum between one and (19), we have that the limit in (30) can be aroximated by 2Φ( α0 ) 2(1 )Φ( α0 ) P( Π τ 0 W > α0 τ 0 ) [P( Π τ20 W > α20 τ20 ) P( Π τ10 W > α10 τ10 )] [ aug (λ), 1 1 FDP 0 0 2[Φ( α2 ) Φ( α1 )] (31) where τ10 > 0, α10 > max{α00, 0} is the unique solution to (4) when λ = λ0 and when δ is relaced by δ 0 and is relaced by 0 ; and where τ20 > 0, α20 > max{α00, 0} is the limit of the unique solution to (4) when λ 0 also with δ 0 (res. 0 ) in lieu of δ (res. ). Indeed, {j H : 0 < Tj < λ0 } {j H : Tj > 0} {j H : Tj λ0 } = {j H : βbj (0 ) 6= 0} {j H : βbj (λ0 ) 6= 0} 0 2(1 )Φ( α2 ) P( Π τ20 W > α20 τ20 ) (32) 2(1 )Φ( α10 ) P( Π τ10 W > α10 τ10 ) and, by a similar calculation, {j K0 : 0 < Tj λ0 } 2Φ( α20 ) 2Φ( α10 ); K0 13 (33)

14 δ=2, ε=0.2 δ=1, ε=0.2 FDP q=0.3 q=0.1 q=0.5 FDP q=0.1 q=0.3 q= TPP TPP δ=2, ε=0.4 δ=1, ε=0.4 FDP q=0.1 q=0.3 q=0.5 FDP q=0.1 q=0.3 q= TPP TPP Figure 4: Tradeoff diagrams for Π ex(1). σ = 0.5; ρ = 1. Different anels corresond to different combinations of δ and ɛ, as aears in the title of each of the lots. The black (res. light blue) curve corresonds to the oracle (res. knockoffs). Markers denote the air (t, fd) of asymtotic ower and tye I error, attained for three examle values of q: black for oracle, light blue for knockoff. The knockoff rocedure loses a little bit of ower due to the estimate of 1 ɛ (roortion of true nulls). 14

15 Theoretical: Π * = δ 50 Theoretical: Π * = ex(1) FDP q=0.1 q=0.5 q=0.3 FDP q=0.1 q=0.3 q= TPP TPP Theoretical: Π * = 0.7N(0,1)0.3N(2,1) Theoretical: Π * = 0.5δ δ 50 FDP q=0.1 q=0.3 q=0.5 FDP q=0.5 q=0.3 q= TPP TPP Figure 5: Tradeoff diagrams for different Π, with ɛ = 0.2, δ = 1 and σ = 0.5. using (32) and (33) to comute the limit of (19), we obtain (31). As λ > 0 varies, a arametric curve (FDP aug(λ), TPP aug(λ)) is traced which secifies the FDP level attainable at a secific TPP level for the Lasso solution corresonding to the augmented design, as q Π aug(tpp aug(λ); ɛ, δ, σ) = FDP aug(λ). (34) Figure 4 shows this curve against the analogous curve (FDP (λ), TPP (λ)), associated with the original design X and the oracle rocedure, when Π is exonential with mean 1. The different anels corresond to different combinations of δ and ɛ. The roortion ρ was taken to be 1 in all scenarios, in other words, the number of knockoff variables r was taken to be equal to the number of original variables ; we observed only very slight differences in the curves when ρ was varied between 0.1 and 1. Figure 5 shows the two curves, (FDP (λ), TPP (λ)) and (FDP aug(λ), TPP aug(λ)), for different distributions Π. In all situations deicted in Figures 4 and 5, the light blue curve, corresonding to the augmented setu, lies very close to the black curve, which corresonds to the original design. Hence, augmenting the design with knockoffs seems to have little effect on the tradeoff diagram, or, the achievable ower for a given FDP level, at least for FDP levels small enough to be of interest. For examle, if we set q = 0.05, then there exist values λ and t such that FDP (λ) = FDP aug(λ ) = 0.05 and TPP (λ) TPP aug(λ ). Of course, 15

16 the knockoff rocedure achieves this without knowledge of ɛ and Π, while the oracle relies on knowing these quantities in choosing λ. Note, still, that for a given inut q, the knockoff rocedure essentially (asymtotically) chooses λ such that FDP aug(λ ), not FDP aug(λ ), is equal to q. Because the former uses a conservative estimate of 1 ɛ, FDP aug(λ ) is usually larger than FDP aug(λ ) and, consequently, the ower attained by the knockoff rocedure is a little lower. This ga is deicted by the markers in the lots, each air of blue and black markers corresonding to a articular value of q. For examle, for δ = 1 and ɛ = 0.2, oracle ower is and knockoffs ower is 0.18 at q = 0.1. The value of t 0 used is 0.1. Figure 1 gives an alternative reresentation (see cation). 5 Prediction Armed with a rocedure that controls the FDR for the model (1), we now exlore utilizing knockoffs for estimation uroses (the rediction error and the estimation error are equivalent here because we have uncorrelated redictors). Our ursuit is to a large degree insired by the work of Abramovich et al. (2006), which for the first time rigorously linked FDR control and estimation error. At the same time, the working assumtions we adot in the current aer allow us to take further advantage of the results of Bayati and Montanari (2012), namely, to asymtotically analyze not only the ower or the knockoff rocedure but also the estimation error associated with it. As in the receding analysis of testing errors, the (normalized) risk of the Lasso estimator at a fixed λ tends in robability to a simle limit. Secifically, by Corollary 1.6 of Bayati and Montanari (2012), we have that, for a given air of ɛ and Π, and for fixed λ, lim 1 [ β(λ) β 2 (η ατ (Π τw ) Π) 2] R (λ; Π, ɛ), (35) where W N (0, 1) indeendently of Π, and τ > 0, α > max{α 0, 0} is the unique solution to (4). Because we have a tractable form for the limiting risk, we can minimize it over λ to obtain an oracle choice for λ, λ OL(Π, ɛ) = argmin R (λ; Π, ɛ). (36) λ The oracle choice of λ deends on Π and on ɛ and therefore does not roduce a legitimate estimator for β, but it sets a benchmark for evaluating the erformance of a rocedure that lugs in any (ossibly datadeendent) value for λ. Consider now a fixed q and the knockoff rocedure from Section 3.2, which selects the j-th variable if T j τ, where τ is given by (12) and π 0 by (19) and Λ = [t 0, ). This is a legal variable selection rocedure (it does not deend on Π or on ɛ) with the FDR control guarantees of Theorem 3.3. In the limit as n,, it corresonds to a choice of λ defined by FDP aug(λ) = q where FDP aug(λ) is given in (31). Call this the knockoff choice for λ and denote it by λ KO (q). Then the quantity { R (λ KO ν(q) su (q); } Π, ɛ) Π R (λ OL (Π, ɛ); Π = su, ɛ) Π { R (λ KO (q); } Π, ɛ) inf λ R (λ; Π, ɛ) is the worst-case (in Π ) inflation of the asymtotic risk due to using λ KO (q) instead of the otimal choice for λ (our use of the term risk inflation alludes to the similarity to the criterion roosed in Foster and George, 1994). It is not clear that ν(q) is at all finite; but if it were true that there exists a choice of q such that for all ɛ, δ, σ, ν(q) is bounded by 1 for small, it would be rather reassuring. ven for fixed ɛ, δ and σ, comuting (37) is not trivial because we need to maximize the ratio over all distributions Π ; neither were we able to bound (37) in general. The roblem is much simler in the case 16 (37)

17 where Π is a oint mass at some nonzero (ositive) value λ, because in this case the risk inflation is a univariate function. In Figure 6 we lot the risk inflation as a function of λ for ɛ = 0.1 and δ = 1, σ = 0.5. The maximum risk inflation is and attained at t = 1.9, which is quite interesting as this is a case of neither very low nor very high signal to noise ratio. Changing ɛ from 0.1 to 0.05 yielded maximum risk inflation of only risk inflation ε= t Figure 6: Risk inflation for oint mass at λ, as a function of λ. ɛ = 0.1, δ = 1, σ = 0.5. The maximum is attained at t = 1.9. We continue our investigation with an emirical study. Hence, we chose a large number of members of a flexible arametric family of distributions, and evaluate the risk inflation on each. Secifically, we took δ = 1, σ = 1 and ɛ = 0.1, and consider the family of mixtures of Gamma distributions { 8 } 8 w i Gam(α i ) : 0 w i 1, w i = 1 i=1 where α = (0.1, 0.8, 1.5, 2.2, 2.9, 3.6, 4.3, 5.0) and where Gam(α) is the Gamma distribution with shae arameter α and rate one. We then consider a restriction of the original family, namely, the subset { 8 (u i / } u j )Gam(α i ) : u i {0, 1, 2, 3, 4} i=1 which includes 383,809 unique members. A few of these distributions are lotted in color in Figure 7 (the black curves corresond to distributions which belong to the original mixture family but not the restricted one). Next, for each of these 383,809 distributions Π we comute the ratio R (λ KO (q); Π, ɛ) inf λ R (λ; Π, ɛ) numerically. For q = 0.7, this ratio was always below Figure 8 shows a histogram of the 383,809 values: the distribution with the maximum risk inflation aears in Figure 7 in red; the runner-us look similar. We also comuted the ratio for some distributions that belong to the original mixture family but not the restricted one, and these are shown in black in Figure 7; the ratio for both was less than To summarize, for q = 0.7 the risk inflation did not exceed 17% in any of our examles. i=1 17

18 density Figure 7: Gamma mixture distributions. The colored lines reresent distributions in the restricted family of 383,809 distributions. Black lines are distributions that belong to the original mixture family, but not the restricted one. The red curve corresonds to the distribution that attained the maximum ratio among the 383,809 distributions. Numbers in legend are risk inflation values corresonding to the different distributions in the lot. Frequency ratio Figure 8: Histogram of the 383,809 comuted risk ratios. 6 Discussion The Lasso is nowadays frequently used for selecting variables in a linear model. Here we have considered the controlled variable selection roblem, where a valid rocedure must satisfy that the exected roortion 18

19 of incorrectly selected variables (the FDR) is ket below a re-secified level q. This connects our work to that of Tibshirani et al. (2016) and Fithian et al. (2015), who consider hyothesis testing for rocedures which sequentially select variables into a model. However, the null hyothesis tested in these aers is different, corresonding to an incremental test versus a full-model test; see the elaborate and illuminating discussion in Fithian et al. (2015). We have roosed an adatation of the Lasso-based knockoff rocedure of Barber and Candès (2015), which has rovable FDR guarantees. Under the working assumtions, which include a Gaussian i.i.d. design, it is ossible to analyze the asymtotic tradeoff between the false discovery roortion and ower of the roosed rocedure. Also, while the i.i.d. N (0, 1) design is erhas not realistic, our aim is to comare knockoffs against the best ossible mechanism for selection over the lasso ath, and such a mechanism has been develoed exlicitly only for the i.i.d. N (0, 1) setting. We have demonstrated that the roosed rocedure comes close in erformance to an oracle rocedure, which uses knowledge about the underlying signal to choose the Lasso tuning arameter in such a way that FDP is controlled. Moving away from the testing roblem, we have also demonstrated that FDR calibration with knockoffs has otential in achieving good rediction erformance, adatively in the sarsity ɛ and in Π. In this article the focus was on rocedures tied to the Lasso ath. There is certainly room for investigating the use of other test statistics to measure feature imortance. Using the Lasso statistics T j in (11) for the knockoff rocedure might be subotimal, as can be seen in Figure 5 and as was imlied in Su et al. (2017): in the Lasso ath false discoveries are intersersed between true ones. As a matter of fact, an otimal oracle rocedure exists for the setu considered in the current aer. Namely, the rocedure which rejects the hyotheses with T j τ where T j = P(β j = 0 y, X) is the local false discovery rate at (y, X) and where τ is selected so that the rocedure has FDR control at q. Candès et al. (2016) have already roosed to use similar Bayesian statistics. Note that Tj defined above deends on Π and ɛ, hence we associate it with an oracle. It would be interesting to think if there is a way to mimic this oracle with a comutationally feasible emirical Bayes rocedure. ven if we commit to Lasso statistics, it might be ossible to increase ower by ordering the variables differently. For examle, Candès et al. (2016) suggested to use the absolute value of the Lasso coefficient, evaluated at the cross-validated λ, to measure feature imortance. We leave for future work the investigation of the otential gain in ower in using alternative knockoff statistics. Acknowledgements. C. was artially suorted by the Office of Naval Research under grant N , by the National Science Foundation via DMS , and by the Math X Award from the Simons Foundation. A. W. is suorted by the same Math X Award. A Proofs Proof of Lemma 3.2. By definition, Y = 1 k m Assuming n 0, n 1 > 0, for 1 k m we have ( ) ( ) n0 n0 1 k = n 0, k k 1 ( n0 k 1 m k 1 1 m k 19 )( ) n1 k m k ) ( n0 n 1 m ( ) n1 = 1 ( ) n1 1. m k n 1 1 m k 1

20 Therefore, Y = n 0 1 n 1 1 ( n0 n 1 m ) 1 k m ( ) ( ) ( n0 1 n1 1 = n 0 1 k 1 m k 1 1 n 1 ( n0 1 ( m n0 n 1 m If n 0 = 0, then X = Y = 0. If n 1 = 0, then X = Y = m. In both cases the claimed result still holds. ) ) ). Proof of identity (22). Assuming m 0, r > 0, we have ( = = X 1 m X m m 0 ) r X m = 1 m 0 X m 0! (k 1)!(m 0 k 1)! k=1 (m r1) m m 0 ( m0 )( r k 1 m k1 ( m0 r k=1 (m r1) m ) ) = 1 m m 0 k=1 (m r1) k 1 m k r k m 1 m 0 k ( m0 k )( r ) m k ) ( m0 r m r! (m k 1)!(r m k 1)! m!(m 0 r m)! (m 0 r)! )( ) r ( m0 m 0 m ( m0 r m m m 0 m If m 0 = 0, then X = 0 and the identity still holds. If r = 0, then X = m and the identity again continues to holds. ) B Comuting the curve q Π To comute q Π (TPP (λ); ɛ, δ, σ) for secific Π and ɛ, we need to be able to comute the air (FDP (λ), TPP (λ)) for each λ > 0. This, in turn, requires to find for each λ the air (α, τ) given by the system of equations (4). Hence, the functionals and f(π, ɛ) := (η ατ (Π τw ) Π) 2 = (1 ɛ)(η ατ (τw )) 2 ɛ(η ατ (Π τw ) Π ) 2 g(π, ɛ) := P( Π τw > ατ) = 2(1 ɛ)φ( α) ɛp( Π τw > ατ), which aear in (4), are evaluated numerically for a given CDF F Π of Π. For a fixed value of λ, our code takes as an inut ɛ and a CDF F Π of Π (as well as δ and σ 2 ) and returns a solution for (α, τ) in (4). Thus, for given values of α and τ, we need to be able to comute f(π, ɛ) and g(π, ɛ) defined in Section 2, as functionals of F Π. We exlain how we imlement this. Let R(µ) := (η ατ (µ τw ) µ) 2 and Q(µ) := P ( µ τw > ατ). Writing f(π, ɛ) as a Riemann- Stieltjes integral and using integration by arts, we have f(π, ɛ) := (η ατ (Π τw ) Π) 2 = = R( )F Π ( ) R( )F Π ( ) = R( )F Π ( ) R( )F Π ( ) = 1 α 2 τ 2 R(µ) df Π F Π (µ) dr(µ) F Π (µ)r (µ)dµ F Π (µ)[2µ (Φ (α µ/τ)) (Φ ( α µ/τ))]dµ 20

21 where we substituted a closed-form exression for the derivative of the risk of soft thresholding (see, e.g., Johnstone, 2011, Ch. 2.7). The last exression is evaluated using numerical integration. Similarly, g(π, ɛ) := P( Π τw > ατ) = Q(µ) df Π = Q( )F Π ( ) Q( )F Π ( ) = Q( )F Π ( ) Q( )F Π ( ) = 1 1/τ F Π (µ) dq(µ) F Π (µ)q (µ) dµ F Π (µ)[φ(α µ/τ) φ( α µ/τ)] dµ, and, again, the last exression is evaluated using numerical integration. References Felix Abramovich, Yoav Benjamini, David L Donoho, and Iain M Johnstone. Secial invited lecture: adating to unknown sarsity by controlling the false discovery rate. The Annals of Statistics, ages , Rina Foygel Barber and mmanuel J Candès. Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5): , Rina Foygel Barber and mmanuel J Candès. A knockoff filter for high-dimensional selective inference. arxiv rerint arxiv: , Mohsen Bayati and Andrea Montanari. The lasso risk for gaussian matrices. I Transactions on Information Theory, 58(4): , Małgorzata Bogdan, wout van den Berg, Weijie Su, and mmanuel J Candès. Sulementary materials for statistical estimation and testing via the sorted l1 norm Małgorzata Bogdan, wout van den Berg, Chiara Sabatti, Weijie Su, and mmanuel J Candès. Sloe adative variable selection via convex otimization. The annals of alied statistics, 9(3):1103, mmanuel Candès, Yingying Fan, Lucas Janson, and Jinchi Lv. Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection. arxiv rerint arxiv: , Ran Dai and Rina Foygel Barber. The knockoff filter for fdr control in grou-sarse and multitask regression. arxiv rerint arxiv: , Bradley fron, Trevor Hastie, Iain Johnstone, Robert Tibshirani, et al. Least angle regression. The Annals of statistics, 32(2): , William Fithian, Jonathan Taylor, Robert Tibshirani, and Ryan Tibshirani. Selective sequential model selection. arxiv rerint arxiv: , Dean P Foster and dward I George. The risk inflation criterion for multile regression. The Annals of Statistics, ages ,

22 Iain M Johnstone. Gaussian estimation: Sequence and wavelet models. Unublished manuscrit, John D Storey. A direct aroach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3): , Weijie Su, mmanuel Candès, et al. Sloe is adative to unknown sarsity and asymtotically minimax. The Annals of Statistics, 44(3): , Weijie Su, Małgorzata Bogdan, mmanuel Candès, et al. False discoveries occur early on the lasso ath. The Annals of Statistics, 45(5): , Ryan J Tibshirani, Jonathan Taylor, Richard Lockhart, and Robert Tibshirani. xact ost-selection inference for sequential regression rocedures. Journal of the American Statistical Association, 111(514): ,

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some