Knockoffs as Post-Selection Inference


Lucas Janson, Harvard University Department of Statistics
WHOA-PSI, August 12, 2017

Controlled Variable Selection

Conditional modeling setup:
- Y: an outcome of interest (AKA response or dependent variable).
- X_1, ..., X_p: a set of p potential explanatory variables (AKA covariates, features, or independent variables).
- We want to learn about the conditional distribution of Y given X_1, ..., X_p.

How can we select important explanatory variables with few mistakes?

Applications include:
- Medicine/genetics/health care
- Economics/political science
- Industry/technology

Controlled Variable Selection (cont'd)

What is an important variable?
- We consider X_j to be unimportant if the conditional distribution of Y given X_1, ..., X_p does not depend on X_j. Formally, X_j is unimportant if it is conditionally independent of Y given X_{-j}:

      Y ⊥ X_j | X_{-j}

- Markov blanket of Y: the smallest set S such that Y ⊥ X_{-S} | X_S.
- For GLMs with no stochastically redundant covariates, the Markov blanket is equivalent to {j : β_j ≠ 0}, i.e., the unimportant variables are exactly those with β_j = 0.

To make sure we do not make too many mistakes, we seek to select a set Ŝ that controls the false discovery rate (FDR):

      FDR(Ŝ) = E( #{j ∈ Ŝ : X_j unimportant} / #{j ∈ Ŝ} ) ≤ q   (e.g. 10%)

Interpretation: "Here is a set of variables Ŝ, 90% of which I expect to be important."
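To make the FDR definition concrete, here is a tiny Python sketch (the function name and inputs are mine, not from the talk) that computes the false discovery proportion of a selected set when the set of unimportant variables is known; the FDR is the expectation of this quantity over repeated experiments.

```python
import numpy as np

def fdp(selected, null_set):
    """False discovery proportion: the fraction of selected variables that are
    unimportant (null). The FDR is E[fdp] over repeated experiments."""
    selected = np.asarray(selected)
    if selected.size == 0:
        return 0.0
    return np.isin(selected, list(null_set)).mean()

# Example: fdp([1, 4, 5, 6, 7], null_set={5, 8, 9}) returns 0.2
```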

How to Incorporate Adaptivity/Selection?

Especially important for complex, high-dimensional problems.

One approach: post-selection inference. Idea: give the analyst unrestricted access to the data, and adjust the inference afterwards.
- Berk et al. (2013): the adjustment does not depend on what the analyst does; this generality leads to conservativeness in practice.
- Lee et al. (2016) and many others, at/from Stanford and elsewhere: track what the analyst does/looks at and condition on it for inference. Inference is exact! But it is not general: the adjustment must be specifically Taylored to what the analyst looks at.

How to Incorporate Adaptivity/Selection? (cont'd)

Second approach: slightly restrict the analyst's access to the data so that inference is unaffected by adaptivity/selection.

Reusable Holdout, from the differential privacy literature (Dwork et al., 2015):
- Constrains access to the data by adding noise to queries in a specific way.
- Subject to that constraint, the analyst can use the query outcomes arbitrarily for selection.
- Bounds the bias of naïve inference regardless of what the analyst does or queries (the bound depends only on the number of queries).
- Generality leads to conservativeness.

This talk: knockoffs.
- Not originally proposed as a method for inference after adaptivity/selection.
- Thought about in a different way, it fits in well!

Generic Knockoffs Framework

(1) Construct knockoffs:
- Artificial versions ("knockoffs") X̃_j of each covariate X_j.
- They act as controls for assessing the importance of the original covariates.

(2) Compute knockoff statistics:
- Scalar statistics Z_j (Z̃_j) for each covariate (knockoff) measuring importance.
- A scalar W_j combines Z_j and Z̃_j to measure the relative importance of X_j vs. X̃_j; a positive W_j means the original is more important, with strength measured by its magnitude.

(3) Find the knockoff threshold:
- Order the variables by decreasing |W_j| and proceed down the list.
- Select only the variables with positive W_j, stopping at the last time #{negative W_j beyond the threshold} / #{positive W_j beyond the threshold} ≤ q.

Coin-flipping property: the key to knockoffs is that steps (1) and (2) are done specifically to ensure that, conditional on |W_1|, ..., |W_p|, the signs of the unimportant/null W_j are independently ±1 with probability 1/2.
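A minimal Python sketch of step (3), assuming the vector W of knockoff statistics has already been computed. The function names are mine; offset=0 implements the stopping rule described above (which, as discussed at the end of the talk, controls a modified FDR), while offset=1 is the "knockoff+" variant from Barber and Candès (2015) that controls the FDR exactly.

```python
import numpy as np

def knockoff_threshold(W, q, offset=0):
    """tau_hat = min{ t in {|W_j| : W_j != 0} :
                      (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q }."""
    for t in np.sort(np.abs(W[W != 0])):
        if (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return t
    return np.inf                       # no feasible threshold: select nothing

def knockoff_select(W, q, offset=0):
    """Step (3): select {j : W_j >= tau_hat}."""
    return np.flatnonzero(W >= knockoff_threshold(W, q, offset))
```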

Original Knockoffs (Barber and Candès, 2015)

Some notation:
- y is the n × 1 column vector of i.i.d. draws of the random variable Y.
- X_j is the n × 1 column vector of i.i.d. draws of the random variable X_j, and X̃_j is the knockoff variable (column vector) for X_j.
- Design matrices X := [X_1 ··· X_p], X̃ := [X̃_1 ··· X̃_p].

Original knockoffs procedure:
- Knockoffs are constructed geometrically/deterministically so that

      [X X̃]ᵀ[X X̃] = ( XᵀX              XᵀX − diag{s} )
                     ( XᵀX − diag{s}    XᵀX           )

- Finite-sample FDR control; leverages sparsity for power.
- Takes the canonical model-based approach: conditions on X, assumes and relies heavily on a low-dimensional (n ≥ p) Gaussian linear model.
- Sufficiency requirement: W_j is a function only of [X X̃]ᵀ[X X̃] and [X X̃]ᵀy.

Model-Free Knockoffs (Candès et al., 2016)

Shifts the burden of knowledge from Y onto X:
- The joint distribution of X_1, ..., X_p can be arbitrary, but it must be known, and the observations must be independently and identically distributed: the rows X_i = (X_{i,1}, ..., X_{i,p}) are i.i.d. from F.
- Uses F to generate random knockoffs that are swap-exchangeable: swapping any X_j with its knockoff X̃_j leaves the joint distribution unchanged,

      [X_1 ··· X̃_j ··· X_p   X̃_1 ··· X_j ··· X̃_p] =_D [X_1 ··· X_j ··· X_p   X̃_1 ··· X̃_j ··· X̃_p]

- Conditions on y, so no assumptions or model are needed for Y | X_1, ..., X_p.
- Works for any n and p; no sufficiency requirement on W_j; robust to overfitting X's distribution.
- Instead of modeling y and conditioning on X, it conditions on y and models X.
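To make the "use F to generate random knockoffs" step concrete, here is a minimal sketch for the Gaussian case, assuming the rows are N(mu, Sigma) with Sigma a known correlation matrix. It uses the simple equicorrelated choice of diag{s} (shrunk slightly for numerical stability); the function name and these choices are illustrative, not the SDP-based construction from the paper.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, seed=0):
    """Model-X knockoffs for rows X_i ~ N(mu, Sigma) with Sigma known.
    The pair (X_i, X_tilde_i) is jointly Gaussian with covariance
    [[Sigma, Sigma - D], [Sigma - D, Sigma]], D = diag{s}, so X_tilde | X
    is Gaussian and can be sampled directly."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lam_min = np.linalg.eigvalsh(Sigma).min()
    s = 0.99 * min(2.0 * lam_min, 1.0)            # equicorrelated s_j (Sigma a correlation matrix)
    D = s * np.eye(p)
    Sigma_inv = np.linalg.inv(Sigma)
    cond_mean = mu + (X - mu) @ (np.eye(p) - Sigma_inv @ D)   # E[X_tilde | X]
    cond_cov = 2.0 * D - D @ Sigma_inv @ D                    # Cov[X_tilde | X]
    L = np.linalg.cholesky(cond_cov)
    return cond_mean + rng.standard_normal((n, p)) @ L.T
```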

Knockoff Statistics

(a) Pick a vector-valued function g to compute variable importances Z_j, Z̃_j:

      (Z_1, ..., Z_p, Z̃_1, ..., Z̃_p) := g(y, [X X̃])

  g must satisfy the symmetry property: for any S ⊆ {1, ..., p},

      g(y, [X X̃]_swap(S)) = g(y, [X X̃])_swap(S)

(b) Pick an antisymmetric function f_j : R² → R, i.e., f_j(a, b) = −f_j(b, a), and set W_j = f_j(Z_j, Z̃_j).

Original Approach

Choose g (with the symmetry property) and f_j a priori; run the entire knockoffs pipeline.

Example:
- g runs the lasso of y on [X X̃] at penalty λ to get (β_1, ..., β_p, β̃_1, ..., β̃_p) and takes absolute values; f_j(a, b) = a − b.
- Lasso Coefficient Difference (LCD) statistic: W_j = |β_j| − |β̃_j|.

This allows the analyst to use a huge range of statistical tools:
- Know little about the data: nonparametric methods like cross-validation, random forests, neural networks.
- Know more about the data: parametric or Bayesian methods (the guarantees do not rely on the model/prior being well-specified, nor on the MCMC converging).
- But the analyst cannot use human intuition from looking at the data: no adaptivity.
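A hedged sketch of the LCD example above in Python. It substitutes scikit-learn's cross-validated lasso for the fixed-λ fit on the slide; since cross-validation over the augmented design treats the original and knockoff columns identically, the symmetry property of g is preserved. The function name is mine.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lcd_statistics(y, X, X_tilde):
    """Lasso Coefficient Difference: fit the lasso on [X, X_tilde] and set
    W_j = |beta_j| - |beta_tilde_j|."""
    p = X.shape[1]
    XX = np.hstack([X, X_tilde])
    fit = LassoCV(cv=5, random_state=0).fit(XX, y)   # g: lasso + absolute values
    Z, Z_tilde = np.abs(fit.coef_[:p]), np.abs(fit.coef_[p:])
    return Z - Z_tilde                               # f_j(a, b) = a - b (antisymmetric)
```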

Post-Selection Inference Approach

Generate knockoffs (step 1 of the knockoffs pipeline), then reveal the randomly swapped data

      y, [X X̃]_swap(R)

where R is a uniformly random subset of {1, ..., p} (the analyst does not know R).

Choose the antisymmetric f and the symmetric g however you want from the swapped data:
- There is no way to distinguish a null X_j from X̃_j, so the FDR is still controlled (exactly!).
- There is potentially a lot of information to distinguish a non-null X_j from X̃_j, and hence to adapt f and g to be more powerful.

Use these data-dependent f and g to compute W (the rest of step 2 of the knockoffs pipeline) and find the knockoff threshold (step 3).
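A sketch of the "reveal randomly swapped data" step, assuming knockoffs have already been generated. Swapping each column pair independently with probability 1/2 makes R a uniformly random subset of {1, ..., p}; R would be held by whoever runs the pipeline and never shown to the analyst. Names are illustrative.

```python
import numpy as np

def reveal_swapped(X, X_tilde, seed=None):
    """Return [X, X_tilde]_swap(R) for a uniformly random subset R of columns."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    R = rng.random(p) < 0.5                      # each j lands in R independently w.p. 1/2
    X_rev, Xt_rev = X.copy(), X_tilde.copy()
    X_rev[:, R], Xt_rev[:, R] = X_tilde[:, R], X[:, R]
    # The analyst sees only the swapped matrix; R stays hidden from them.
    return np.hstack([X_rev, Xt_rev]), R
```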

Exchangeability Endows Coin-Flipping

Model-free knockoffs are generated to be swap-exchangeable: for any j,

      [X X̃]_swap(j) =_D [X X̃]

Coin-flipping property for W_j: for any null covariate j,

      (Z_1, ..., Z_p, Z̃_1, ..., Z̃_p)_swap(j)
          = g(y, [X X̃])_swap(j)
          = g(y, [X X̃]_swap(j))        (symmetry of g)
          =_D g(y, [X X̃])              (swap-exchangeability, nullity of j, uniformity of R)
          = (Z_1, ..., Z_p, Z̃_1, ..., Z̃_p)

and therefore

      W_j = f_j(Z_j, Z̃_j) =_D f_j(Z̃_j, Z_j) = −f_j(Z_j, Z̃_j) = −W_j,

so conditional on its magnitude, the sign of a null W_j is ±1 with probability 1/2.
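A quick empirical sanity check of the coin-flipping property, reusing the hypothetical gaussian_knockoffs and lcd_statistics sketches from earlier; the simulation settings (AR(1) correlation 0.3, 5 non-null covariates) are invented for illustration. For the null covariates, the sign of a nonzero W_j should come out positive about half the time.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 500, 30, 5                                  # first k covariates are non-null
Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # AR(1) correlation
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.standard_normal(n)

X_tilde = gaussian_knockoffs(X, np.zeros(p), Sigma, seed=2)
W = lcd_statistics(y, X, X_tilde)
null_signs = np.sign(W[k:])
null_signs = null_signs[null_signs != 0]              # drop nulls the lasso zeroed out entirely
print("fraction of positive signs among nonzero null W_j:", (null_signs > 0).mean())
```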

Find the Knockoff Threshold

Example with p = 10 and q = 20% = 1/5. [Figure: an animated number line showing the ten W_j ordered by decreasing magnitude, with positive W_j plotted above zero and negative W_j below; the candidate threshold τ̂ moves down the ordered list and stops at the last point where #{negative W_j beyond τ̂} / #{positive W_j beyond τ̂} ≤ q = 20%.] In this example the selected set is Ŝ = {1, 4, 5, 6, 7}.
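A toy run of the knockoff_select sketch from the Generic Knockoffs Framework slide, with made-up W values (not the ones in the figure above) and q = 20%, just to exercise the thresholding rule.

```python
import numpy as np

# Hypothetical statistics for p = 10 variables (purely illustrative numbers)
W = np.array([2.1, -0.4, -1.1, 1.7, 1.5, 0.9, 1.3, -0.7, 3.0, 2.4])
S_hat = knockoff_select(W, q=0.2)       # from the earlier sketch
print("selected variables (0-indexed):", S_hat)
```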

Intuition for FDR Control

      FDR = E( #{null X_j selected} / #{total X_j selected} )
          = E( #{null j : W_j ≥ τ̂} / #{j : W_j ≥ τ̂} )
          ≈ E( #{null j : W_j ≤ −τ̂} / #{j : W_j ≥ τ̂} )     (null signs are fair coin flips)
          ≤ E( #{j : W_j ≤ −τ̂} / #{j : W_j ≥ τ̂} )
          ≤ q                                               (by definition of τ̂)

Power and Robustness

Power:
- Non-adaptive statistics (e.g., LCD) already see substantial power gains.
- More than doubled power in a real GWAS (see also Sesia et al. (2017)).
- The advantage of adaptivity is hard to measure in simulation.
- Despite its generality, post-selection inference with knockoffs is not conservative.

Robustness:
- Remarkably robust to overfitting X's distribution in-sample.
- W_j is more robust when it is less sensitive to higher moments of the X distribution.

Main Takeaway

The knockoffs framework is:
- Adaptive: allows unlimited human judgement applied to slightly-masked data.
- General: a wrapper for nearly any measure of variable importance.
- Powerful: accommodates the most powerful tools, yet is nonconservative.
- Exact: provably and non-asymptotically controls the FDR.

Appendix

References

Barber, R. F. and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Ann. Statist., 43(5).
Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Ann. Statist., 41(2).
Candès, E., Fan, Y., Janson, L., and Lv, J. (2016). Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection. arXiv preprint.
Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. (2015). The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248).
Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. Ann. Statist., 44(3).
Sesia, M., Sabatti, C., and Candès, E. (2017). Gene hunting with knockoffs for hidden Markov models. arXiv preprint.

Robustness

[Figure: Power and FDR plotted against the relative Frobenius norm error of the covariance estimate used to build the knockoffs, comparing the exact covariance ("Exact Cov") with a graphical lasso estimate and empirical covariance estimates computed from increasing fractions of the sample (50%, 62.5%, 75%, 87.5%, ...). Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500; target FDR is 10%; Y comes from a binomial linear model with logit link function with 50 nonzero entries.]

Simulations in a Low-Dimensional Linear Model

[Figure: Power and FDR (target is 10%) versus coefficient amplitude for MF knockoffs and alternative procedures (BHq with marginal tests, BHq with maximum likelihood, MF knockoffs, original knockoffs). The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 1000, and y comes from a Gaussian linear model with 60 nonzero regression coefficients having equal magnitudes and random signs. The noise variance is 1.]

Simulations in a Low-Dimensional Nonlinear Model

[Figure: Power and FDR (target is 10%) versus coefficient amplitude for MF knockoffs and alternative procedures (BHq with marginal tests, BHq with maximum likelihood, MF knockoffs). The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 1000, and y comes from a binomial linear model with logit link function and 60 nonzero regression coefficients having equal magnitudes and random signs.]

Simulations in High Dimensions

[Figure: Power and FDR (target is 10%) versus coefficient amplitude for MF knockoffs and BHq with marginal tests. The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 6000, and y comes from a binomial linear model with logit link function and 60 nonzero regression coefficients having equal magnitudes and random signs.]

Simulations in High Dimensions with Dependence

[Figure: Power and FDR (target is 10%) versus the autocorrelation coefficient for MF knockoffs and BHq with marginal tests. The design matrix has AR(1) columns, with each X_j marginally N(0, 1/n); n = 3000, p = 6000, and y follows a binomial linear model with logit link function and 60 nonzero coefficients with random signs and randomly selected locations.]

Genetic Analysis of Crohn's Disease

- 2007 case-control study by the WTCCC; n ≈ 5,000, p ≈ 375,000; preprocessing mirrored the original analysis.
- Strong spatial structure: second-order knockoffs generated using approximate SDP on a genetic covariance estimate.
- The entire analysis took 6 hours of serial computation time; 1 hour in parallel.
- Model-free knockoffs made twice as many discoveries as the original analysis.
- Some new discoveries were confirmed in a larger study.
- Some are corroborated by work on nearby genes: promising candidates.

Robustness on Real Data

[Figure: Power and FDR (target is 10%) versus coefficient amplitude for model-free knockoffs applied to subsamples of chromosome 1 of a real genetic design matrix; n ≈ 1,400.]

Careful with Symmetry Property

One has to be a bit careful; recall the symmetry property: for any S ⊆ {1, ..., p},

      g(y, [X X̃]_swap(S)) = g(y, [X X̃])_swap(S)

For example, if the covariates are split into K groups G_1, ..., G_K:
- Incorrect: g runs the group lasso with 2K groups G_1, ..., G_K, G̃_1, ..., G̃_K.
- Correct: g runs the group lasso with K groups {G_1, G̃_1}, ..., {G_K, G̃_K}.
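A small illustration of the grouping requirement in Python, assuming the group-lasso solver consumes a vector of integer group labels for the columns of the augmented design [X, X̃] (the variable names and label convention are mine).

```python
import numpy as np

p, K = 6, 3
orig_groups = np.array([0, 0, 1, 1, 2, 2])      # group label of each original covariate

# Incorrect: knockoff copies get K new groups of their own, so swapping a group's
# originals with its knockoffs changes the penalty structure and breaks the symmetry of g.
bad_groups = np.concatenate([orig_groups, orig_groups + K])

# Correct: X_j and its knockoff share a group, so g(y, [X, X_tilde]) is invariant
# to swapping any subset of columns with their knockoffs.
good_groups = np.concatenate([orig_groups, orig_groups])
```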

Sequential Independent Pairs Generates Valid Knockoffs

Algorithm 1: Sequential Conditional Independent Pairs
      for j = 1, ..., p do
          Sample X̃_j from L(X_j | X_{-j}, X̃_{1:j-1}), conditionally independently of X_j
      end

Proof sketch (discrete case):
- Denote the PMF of (X_{1:p}, X̃_{1:j-1}) by L(X_{-j}, X_j, X̃_{1:j-1}).
- The conditional PMF of X̃_j | X_{1:p}, X̃_{1:j-1} is L(X_{-j}, X̃_j, X̃_{1:j-1}) / Σ_u L(X_{-j}, u, X̃_{1:j-1}).
- The joint PMF of (X_{1:p}, X̃_{1:j}) is therefore

      L(X_{-j}, X_j, X̃_{1:j-1}) · L(X_{-j}, X̃_j, X̃_{1:j-1}) / Σ_u L(X_{-j}, u, X̃_{1:j-1}),

  which is symmetric in X_j and X̃_j, so swapping them leaves the joint distribution unchanged.
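A minimal sketch of Algorithm 1 specialized to jointly Gaussian covariates with known covariance (zero means assumed), where every conditional law L(X_j | X_{-j}, X̃_{1:j-1}) is Gaussian and can be computed in closed form by tracking the covariance of the growing vector (X, X̃_{1:j-1}). The function name is mine, and this is only meant to illustrate the algorithm; for Gaussian X one would typically use the direct construction sketched earlier.

```python
import numpy as np

def scip_gaussian_knockoffs(X, Sigma, seed=0):
    """Sequential Conditional Independent Pairs for rows X_i ~ N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = X.copy()              # columns generated so far: (X_1..X_p, X_tilde_1..X_tilde_{j-1})
    G = Sigma.copy()          # covariance matrix of those columns
    for j in range(p):
        m = V.shape[1]
        cond = [k for k in range(m) if k != j]                 # condition on everything but X_j
        coef = np.linalg.solve(G[np.ix_(cond, cond)], G[j, cond])
        cond_var = G[j, j] - G[j, cond] @ coef
        new_col = V[:, cond] @ coef + np.sqrt(max(cond_var, 0.0)) * rng.standard_normal(n)
        # Grow the covariance: X_tilde_j has X_j's variance, X_j's covariances with the
        # conditioning variables, and Cov(X_tilde_j, X_j) = Var(X_j) - Var(X_j | rest).
        G_new = np.empty((m + 1, m + 1))
        G_new[:m, :m] = G
        G_new[m, cond] = G_new[cond, m] = G[j, cond]
        G_new[m, j] = G_new[j, m] = G[j, j] - cond_var
        G_new[m, m] = G[j, j]
        G, V = G_new, np.hstack([V, new_col[:, None]])
    return V[:, p:]           # the last p columns are the knockoffs
```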

Proof of Control

      FDR = E( #{null X_j selected} / #{total X_j selected} )
          = E( #{null j : W_j ≥ τ̂} / #{j : W_j ≥ τ̂} )
          ≈ E( #{null j : W_j ≤ −τ̂} / #{j : W_j ≥ τ̂} )
          ≤ E( #{j : W_j ≤ −τ̂} / #{j : W_j ≥ τ̂} )
          ≤ q

More precisely, for the modified FDR:

      mFDR = E( #{null X_j selected} / (q⁻¹ + #{total X_j selected}) )
           = E( #{null j : W_j ≥ τ̂} / (q⁻¹ + #{j : W_j ≥ τ̂}) )
           = E( [#{null j : W_j ≥ τ̂} / (1 + #{null j : W_j ≤ −τ̂})] · [(1 + #{null j : W_j ≤ −τ̂}) / (q⁻¹ + #{j : W_j ≥ τ̂})] )
           ≤ q,

where the second factor is at most q by the definition of τ̂ (null negatives are at most all negatives), and the first factor is a supermartingale, bounded by 1 in expectation, evaluated at the stopping time τ̂ (using the coin-flipping property and optional stopping).


More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Be able to define the following terms and answer basic questions about them:

Be able to define the following terms and answer basic questions about them: CS440/ECE448 Section Q Fall 2017 Final Review Be able to define the following terms and answer basic questions about them: Probability o Random variables, axioms of probability o Joint, marginal, conditional

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06

More information

A General Framework for High-Dimensional Inference and Multiple Testing

A General Framework for High-Dimensional Inference and Multiple Testing A General Framework for High-Dimensional Inference and Multiple Testing Yang Ning Department of Statistical Science Joint work with Han Liu 1 Overview Goal: Control false scientific discoveries in high-dimensional

More information

Approximate Message Passing

Approximate Message Passing Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters

More information

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information

Risk and Noise Estimation in High Dimensional Statistics via State Evolution

Risk and Noise Estimation in High Dimensional Statistics via State Evolution Risk and Noise Estimation in High Dimensional Statistics via State Evolution Mohsen Bayati Stanford University Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari Statistical

More information

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017 CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Algorithms Reading Group Notes: Provable Bounds for Learning Deep Representations

Algorithms Reading Group Notes: Provable Bounds for Learning Deep Representations Algorithms Reading Group Notes: Provable Bounds for Learning Deep Representations Joshua R. Wang November 1, 2016 1 Model and Results Continuing from last week, we again examine provable algorithms for

More information

6. Regularized linear regression

6. Regularized linear regression Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018 LARGE NUMBERS OF EXPLANATORY VARIABLES HS Battey Department of Mathematics, Imperial College London WHAO-PSI, St Louis, 9 September 2018 Regression, broadly defined Response variable Y i, eg, blood pressure,

More information

Construction of PoSI Statistics 1

Construction of PoSI Statistics 1 Construction of PoSI Statistics 1 Andreas Buja and Arun Kumar Kuchibhotla Department of Statistics University of Pennsylvania September 8, 2018 WHOA-PSI 2018 1 Joint work with "Larry s Group" at Wharton,

More information

Some new ideas for post selection inference and model assessment

Some new ideas for post selection inference and model assessment Some new ideas for post selection inference and model assessment Robert Tibshirani, Stanford WHOA!! 2018 Thanks to Jon Taylor and Ryan Tibshirani for helpful feedback 1 / 23 Two topics 1. How to improve

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

arxiv: v2 [stat.me] 9 Aug 2018

arxiv: v2 [stat.me] 9 Aug 2018 Submitted to the Annals of Applied Statistics arxiv: arxiv:1706.09375 MULTILAYER KNOCKOFF FILTER: CONTROLLED VARIABLE SELECTION AT MULTIPLE RESOLUTIONS arxiv:1706.09375v2 stat.me 9 Aug 2018 By Eugene Katsevich,

More information

Accumulation test demo - simulated data

Accumulation test demo - simulated data Accumulation test demo - simulated data Ang Li and Rina Foygel Barber May 27, 2015 Introduction This demo reproduces the ordered hypothesis testing simulation in the paper: Ang Li and Rina Foygel Barber,

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Bindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Jan 9

Bindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Jan 9 Problem du jour Week 3: Wednesday, Jan 9 1. As a function of matrix dimension, what is the asymptotic complexity of computing a determinant using the Laplace expansion (cofactor expansion) that you probably

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Confounder Adjustment in Multiple Hypothesis Testing

Confounder Adjustment in Multiple Hypothesis Testing in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen

More information

Semi-Nonparametric Inferences for Massive Data

Semi-Nonparametric Inferences for Massive Data Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work

More information

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi)

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Our immediate goal is to formulate an LLN and a CLT which can be applied to establish sufficient conditions for the consistency

More information

arxiv: v3 [stat.me] 14 Oct 2015

arxiv: v3 [stat.me] 14 Oct 2015 The Annals of Statistics 2015, Vol. 43, No. 5, 2055 2085 DOI: 10.1214/15-AOS1337 c Institute of Mathematical Statistics, 2015 CONTROLLING THE FALSE DISCOVERY RATE VIA KNOCKOFFS arxiv:1404.5609v3 [stat.me]

More information

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome. Classification Classification is similar to regression in that the goal is to use covariates to predict on outcome. We still have a vector of covariates X. However, the response is binary (or a few classes),

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Generalized Cp (GCp) in a Model Lean Framework

Generalized Cp (GCp) in a Model Lean Framework Generalized Cp (GCp) in a Model Lean Framework Linda Zhao University of Pennsylvania Dedicated to Lawrence Brown (1940-2018) September 9th, 2018 WHOA 3 Joint work with Larry Brown, Juhui Cai, Arun Kumar

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out

More information

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Recent Advances in Post-Selection Statistical Inference

Recent Advances in Post-Selection Statistical Inference Recent Advances in Post-Selection Statistical Inference Robert Tibshirani, Stanford University June 26, 2016 Joint work with Jonathan Taylor, Richard Lockhart, Ryan Tibshirani, Will Fithian, Jason Lee,

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Bayesian Networks. instructor: Matteo Pozzi. x 1. x 2. x 3 x 4. x 5. x 6. x 7. x 8. x 9. Lec : Urban Systems Modeling

Bayesian Networks. instructor: Matteo Pozzi. x 1. x 2. x 3 x 4. x 5. x 6. x 7. x 8. x 9. Lec : Urban Systems Modeling 12735: Urban Systems Modeling Lec. 09 Bayesian Networks instructor: Matteo Pozzi x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 1 outline example of applications how to shape a problem as a BN complexity of the inference

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information