Knockoffs as Post-Selection Inference
Knockoffs as Post-Selection Inference
Lucas Janson
Harvard University, Department of Statistics
WHOA-PSI, August 12, 2017
Controlled Variable Selection

Conditional modeling setup:
- Y, an outcome of interest (AKA response or dependent variable)
- X_1, ..., X_p, a set of p potential explanatory variables (AKA covariates, features, or independent variables)
- Want to learn about the conditional distribution of Y given X_1, ..., X_p

How can we select important explanatory variables with few mistakes?

Applications to:
- Medicine/genetics/health care
- Economics/political science
- Industry/technology
Controlled Variable Selection (cont'd)

What is an important variable? We consider X_j to be unimportant if the conditional distribution of Y given X_1, ..., X_p does not depend on X_j. Formally, X_j is unimportant if it is conditionally independent of Y given X_-j:

    Y ⊥ X_j | X_-j

Markov blanket of Y: the smallest set S such that Y ⊥ X_-S | X_S.

For GLMs with no stochastically redundant covariates, the unimportant variables are exactly {j : β_j = 0}.

To make sure we do not make too many mistakes, we seek to select a set Ŝ that controls the false discovery rate (FDR):

    FDR(Ŝ) = E[ #{j in Ŝ : X_j unimportant} / #{j in Ŝ} ] ≤ q   (e.g. q = 10%)

Interpretation: "Here is a set of variables Ŝ, 90% of which I expect to be important."
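The FDR is the expectation, over repeated experiments, of the false discovery proportion (FDP) of the selected set. A minimal sketch of the FDP computation (the function name is mine, not from the talk):

```python
def fdp(selected, unimportant):
    """False discovery proportion of a selected set.

    selected: iterable of selected variable indices (the set S-hat).
    unimportant: iterable of indices j with Y conditionally independent of X_j.
    Returns #{j in S-hat : X_j unimportant} / #{j in S-hat}, with 0/0 := 0.
    """
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected & set(unimportant)) / len(selected)
```

For example, selecting {1, 2, 3, 4, 5} when variables 4, 5, and 9 are unimportant gives an FDP of 2/5 = 0.4; FDR control at q means this quantity is at most q on average.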
How to Incorporate Adaptivity/Selection?

Especially important for complex, high-dimensional problems.

One approach: post-selection inference. Idea: give the analyst unrestricted access to the data, adjust inference afterwards.

- Berk et al. (2013): the adjustment does not depend on what the analyst does; generality leads to conservativeness in practice.
- Lee et al. (2016) and many others, at/from Stanford and elsewhere: track what the analyst does/looks at and condition on it for inference. Inference is exact! Not general: the adjustment must be specifically Taylored to what the analyst looks at.
How to Incorporate Adaptivity/Selection? (cont'd)

Second approach: slightly restrict the analyst's access to the data so that inference is unaffected by adaptivity/selection.

Reusable Holdout, from the differential privacy literature (Dwork et al., 2015):
- Constrains access to the data by adding noise to queries in a specific way.
- Subject to that constraint, the analyst can arbitrarily use query outcomes for selection.
- Bounds the bias of naïve inference regardless of what the analyst does/queries (the bound depends only on the number of queries).
- Generality leads to conservativeness.

This talk: knockoffs. Not originally proposed as a method for inference after adaptivity/selection, but thought about in a different way, it fits in well!
Generic Knockoffs Framework

(1) Construct knockoffs: artificial versions ("knockoffs") X̃_j of each covariate X_j, which act as controls for assessing the importance of the original covariates.

(2) Compute knockoff statistics: scalar statistics Z_j (Z̃_j) for each covariate (knockoff) measuring importance; a scalar W_j combines Z_j and Z̃_j to measure the relative importance of X_j vs. X̃_j. Positive W_j means the original is more important, with strength measured by magnitude.

(3) Find the knockoff threshold: order the variables by decreasing |W_j| and proceed down the list, selecting only variables with positive W_j, until the last time #{negatives} / #{positives} ≤ q.

Coin-flipping property: the key to knockoffs is that steps (1) and (2) are done specifically to ensure that, conditional on |W_1|, ..., |W_p|, the signs of the unimportant/null W_j are independently ±1 with probability 1/2.
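Step (3) can be sketched as follows (a hedged sketch; the function names are mine). The threshold is the smallest t at which the ratio of negatives to positives beyond t drops to q or below; the +1 offset gives the knockoff+ variant of Barber and Candès (2015), which controls the FDR exactly:

```python
import numpy as np

def knockoff_threshold(W, q, offset=1.0):
    """Smallest t > 0 with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    offset=1.0 gives knockoff+ (exact FDR control);
    offset=0.0 gives the original knockoff filter (controls a modified FDR).
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        ratio = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= q:
            return t
    return np.inf  # nothing selectable at level q

def knockoff_select(W, q, offset=1.0):
    """Indices j with W_j >= threshold (i.e. positive and beyond the cut)."""
    t = knockoff_threshold(W, q, offset)
    return np.where(np.asarray(W) >= t)[0]
```

For instance, with W = [5, 4, 3, -2, 1, -0.5] and q = 0.35, the offset-0 filter stops at t = 1 (one negative beyond 1 in magnitude against four positives, 1/4 ≤ 0.35) and selects variables 0, 1, 2, and 4.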
Original Knockoffs (Barber and Candès, 2015)

Some notation:
- y is the n × 1 column vector of i.i.d. draws of the random variable Y
- X_j is the n × 1 column vector of i.i.d. draws of the random variable X_j
- X̃_j is the knockoff variable (column vector) for X_j
- Design matrices X := [X_1 ··· X_p], X̃ := [X̃_1 ··· X̃_p]

Original knockoffs procedure:
- Knockoffs are constructed geometrically/deterministically so that

      [X X̃]ᵀ[X X̃] = [ XᵀX             XᵀX − diag{s} ]
                     [ XᵀX − diag{s}   XᵀX           ]

- Finite-sample FDR control, and leverages sparsity for power.
- Takes the canonical model-based approach: condition on X, assume and rely heavily on a low-dimensional (n ≥ p) Gaussian linear model.
- Sufficiency requirement: W_j is a function only of [X X̃]ᵀ[X X̃] and [X X̃]ᵀy.
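A numerical sketch of the equicorrelated fixed-X construction (my own implementation outline under the assumption n ≥ 2p, not the authors' code): normalize the columns, take s_j = min(2 λ_min(Σ), 1), and set X̃ = X(I − Σ⁻¹ diag{s}) + Ũ C, where CᵀC = 2 diag{s} − diag{s} Σ⁻¹ diag{s} and Ũ has orthonormal columns orthogonal to the column span of X:

```python
import numpy as np

def fixed_x_knockoffs(X, rng=None):
    """Equicorrelated fixed-X knockoffs; assumes n >= 2p and full column rank.

    Returns (Xn, Xk, s): normalized design, knockoff matrix, and s vector,
    so that Xk.T @ Xk = Xn.T @ Xn and Xn.T @ Xk = Xn.T @ Xn - diag(s).
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Xn = X / np.linalg.norm(X, axis=0)            # unit-norm columns
    Sigma = Xn.T @ Xn
    s = np.full(p, min(2 * np.linalg.eigvalsh(Sigma).min(), 1.0))
    Sinv_S = np.linalg.solve(Sigma, np.diag(s))   # Sigma^{-1} diag(s)
    A = 2 * np.diag(s) - np.diag(s) @ Sinv_S      # = C^T C, positive semidefinite
    w, V = np.linalg.eigh((A + A.T) / 2)
    C = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T   # symmetric square root
    # Orthonormal basis orthogonal to col(X): last p columns of a QR of [X, noise]
    Q, _ = np.linalg.qr(np.hstack([Xn, rng.standard_normal((n, p))]))
    U = Q[:, p:2 * p]
    Xk = Xn @ (np.eye(p) - Sinv_S) + U @ C
    return Xn, Xk, s
```

The two Gram identities in the docstring are exactly the block structure displayed on the slide, and can be checked numerically.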
Model-Free Knockoffs (Candès et al., 2016)

Shifts the burden of knowledge from Y onto X: the joint distribution of X_1, ..., X_p can be arbitrary, but it must be known, and the observations must be independently and identically distributed: the rows (X_i,1, ..., X_i,p) of X are i.i.d. draws from F.

Uses F to generate random knockoffs that are swap-exchangeable (=_D denotes equality in distribution): for any j,

    [X_1 ··· X̃_j ··· X_p   X̃_1 ··· X_j ··· X̃_p]  =_D  [X_1 ··· X_j ··· X_p   X̃_1 ··· X̃_j ··· X̃_p]

- Conditions on y (so no assumptions/model needed for Y | X_1, ..., X_p)
- Works for any n and p
- No sufficiency requirement on W_j
- Robust to overfitting X's distribution

Instead of modeling y and conditioning on X, conditions on y and models X.
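When F is N(0, Σ), exact model-X knockoffs can be sampled row by row from a Gaussian conditional distribution: X̃_i | X_i ~ N(X_i − X_i Σ⁻¹ diag{s}, 2 diag{s} − diag{s} Σ⁻¹ diag{s}), for any s with 2Σ − diag{s} positive semidefinite. A sketch of my own (following the Gaussian case of Candès et al. (2016), not their code):

```python
import numpy as np

def gaussian_mx_knockoffs(X, Sigma, s, rng=None):
    """Sample model-X knockoffs for i.i.d. rows X_i ~ N(0, Sigma).

    Requires 2*Sigma - diag(s) positive semidefinite. Each knockoff row is
    drawn from N(X_i - X_i Sigma^{-1} diag(s),
                 2 diag(s) - diag(s) Sigma^{-1} diag(s)).
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Sinv_S = np.linalg.solve(Sigma, np.diag(s))     # Sigma^{-1} diag(s)
    mu = X - X @ Sinv_S                             # conditional means, row-wise
    V = 2 * np.diag(s) - np.diag(s) @ Sinv_S        # conditional covariance
    w, Q = np.linalg.eigh((V + V.T) / 2)
    L = Q @ np.diag(np.sqrt(np.clip(w, 0, None)))   # V = L @ L.T
    return mu + rng.standard_normal((n, p)) @ L.T
```

Jointly, each row (X_i, X̃_i) then has covariance [[Σ, Σ − diag{s}], [Σ − diag{s}, Σ]], which is what makes the pair swap-exchangeable.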
Knockoff Statistics

(a) Pick a vector-valued function g to compute the variable importances Z_j, Z̃_j:

    (Z_1, ..., Z_p, Z̃_1, ..., Z̃_p) := g(y, [X X̃])

g must satisfy the symmetry property: for any S ⊆ {1, ..., p},

    g(y, [X X̃]_swap(S)) = g(y, [X X̃])_swap(S)

(b) Pick an antisymmetric function f_j : R² → R, i.e., f_j(a, b) = −f_j(b, a), and set W_j = f_j(Z_j, Z̃_j).
Original Approach

Choose g (with the symmetry property) and f_j a priori; run the entire knockoffs pipeline.

Example: g runs the lasso of y on [X X̃] at penalty λ to get (β_1, ..., β_p, β̃_1, ..., β̃_p), then takes absolute values; f_j(a, b) = a − b. This gives the Lasso Coefficient Difference (LCD) statistic: W_j = |β_j| − |β̃_j|.

Allows the analyst to use a huge range of statistical tools:
- Know little about the data: nonparametric methods like cross-validation, random forests, neural networks.
- Know more about the data: parametric or Bayesian methods (guarantees do not rely on the model/prior being well-specified, nor on MCMC converging).
- But cannot use human intuition from looking at the data: no adaptivity.
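The LCD statistic can be sketched with a small hand-rolled coordinate-descent lasso (a sketch only; in practice one would use a tuned solver, and λ could itself be chosen by cross-validation on the augmented design):

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=300):
    """Plain cyclic coordinate descent for (1/2)||y - A b||^2 + lam * ||b||_1."""
    n, p = A.shape
    b = np.zeros(p)
    colsq = (A ** 2).sum(axis=0)
    r = y - A @ b                                # residual, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            r = r + A[:, j] * b[j]               # remove j's contribution
            rho = A[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / colsq[j]
            r = r - A[:, j] * b[j]               # add back updated contribution
    return b

def lcd_statistics(X, Xk, y, lam):
    """W_j = |beta_j| - |beta-tilde_j| from the lasso on the augmented design."""
    p = X.shape[1]
    b = lasso_cd(np.hstack([X, Xk]), y, lam)
    return np.abs(b[:p]) - np.abs(b[p:])
```

Because g applies the same lasso to the augmented design and f_j(a, b) = a − b, the symmetry and antisymmetry requirements from the previous slide hold automatically.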
Post-Selection Inference Approach

Generate knockoffs (step 1 of the knockoffs pipeline); reveal the randomly swapped data

    y, [X X̃]_swap(R)

where R is a uniformly random subset of {1, ..., p} (the analyst does not know R).

Choose antisymmetric f and symmetric g however you want from the swapped data:
- There is no way of distinguishing a null X_j from X̃_j, so FDR is still controlled (exactly!).
- There is potentially a lot of information to distinguish a non-null X_j from X̃_j, and hence to adapt f and g to be more powerful.

Use these data-dependent f and g to compute W (rest of step 2 of the knockoffs pipeline) and find the knockoff threshold (step 3).
Exchangeability Endows Coin-Flipping

Model-free knockoffs are generated to be swap-exchangeable: for any j,

    [X X̃]_swap(j)  =_D  [X X̃]

Coin-flipping property for W_j: for any null covariate j,

    (Z_1, ..., Z_p, Z̃_1, ..., Z̃_p)_swap(j)
        = g(y, [X X̃])_swap(j)
        = g(y, [X X̃]_swap(j))        (symmetry of g)
        =_D g(y, [X X̃])              (swap-exchangeability, nullity of j, uniformity of R)
        = (Z_1, ..., Z_p, Z̃_1, ..., Z̃_p)

and hence

    W_j = f_j(Z_j, Z̃_j) =_D f_j(Z̃_j, Z_j) = −f_j(Z_j, Z̃_j) = −W_j
Find the Knockoff Threshold

Example with p = 10 and q = 20% = 1/5.

[Figure: animated example placing W_1, ..., W_10 on the number line. A threshold τ̂ moves in from the largest magnitudes while tracking #{negative W_j} / #{positive W_j} among the W_j beyond it, and stops at the smallest value where this ratio is at most q = 20%. The selected set is Ŝ = {1, 4, 5, 6, 7}.]
Intuition for FDR Control

    FDR = E[ #{null X_j selected} / #{total X_j selected} ]
        = E[ #{null positive W_j > τ̂} / #{positive W_j > τ̂} ]
        ≈ E[ #{null negative W_j beyond τ̂} / #{positive W_j > τ̂} ]   (coin-flipping: null signs are ±1 with equal probability)
        ≤ E[ #{negative W_j beyond τ̂} / #{positive W_j > τ̂} ]
        ≤ q                                                           (by the definition of τ̂)
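This chain of (in)equalities can be checked in a toy Monte Carlo (a sketch under assumed W distributions: null W_j symmetric around zero, non-null W_j shifted positive; the inlined threshold is the +1-offset knockoff+ rule, which controls FDR exactly):

```python
import numpy as np

def knockoff_plus_select(W, q):
    # smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)

rng = np.random.default_rng(0)
p, n_nonnull, q = 50, 10, 0.2
fdps = []
for _ in range(500):
    W = rng.standard_normal(p)                       # null W_j: symmetric signs
    W[:n_nonnull] = np.abs(3 + rng.standard_normal(n_nonnull))  # non-nulls positive
    S = knockoff_plus_select(W, q)
    # indices >= n_nonnull are null, so this is the FDP of the selection
    fdps.append(np.mean(S >= n_nonnull) if len(S) else 0.0)
print("empirical FDR:", np.mean(fdps))  # should be at most about q = 0.2
```

Individual FDPs fluctuate, but their average (the empirical FDR) stays below the target q, as the bound above predicts.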
Power and Robustness

Power:
- Non-adaptive statistics (e.g., LCD) already see substantial power gains.
- More than doubled power in a real GWAS (see also Sesia et al. (2017)).
- The advantage of adaptivity is hard to measure in simulation.
- Despite its generality, post-selection inference via knockoffs is not conservative.

Robustness:
- Remarkably robust to overfitting X's distribution in-sample.
- W_j is more robust when it is less sensitive to higher moments of X's distribution.
Main Takeaway

The knockoffs framework is:
- Adaptive: allows unlimited human judgement applied to slightly-masked data.
- General: a wrapper for nearly any measure of variable importance.
- Powerful: accommodates the most powerful tools, yet is nonconservative.
- Exact: provably, non-asymptotically, controls FDR.
Appendix
References

- Barber, R. F. and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Ann. Statist., 43(5).
- Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Ann. Statist., 41(2).
- Candès, E., Fan, Y., Janson, L., and Lv, J. (2016). Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection. arXiv preprint.
- Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. (2015). The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248).
- Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. Ann. Statist., 44(3).
- Sesia, M., Sabatti, C., and Candès, E. (2017). Gene hunting with knockoffs for hidden Markov models. arXiv preprint.
Robustness

[Figure: power (left) and FDR (right) versus the relative Frobenius norm error of the covariance estimate used to build the knockoffs. Curves compare the exact covariance, the graphical lasso, and empirical covariance estimates fit on increasing fractions of the data (from 50% up to 100%). Covariates are AR(1) with autocorrelation coefficient 0.3. n = 800, p = 1500, and the target FDR is 10%. Y comes from a binomial linear model with logit link function with 50 nonzero entries.]
Simulations in Low-Dimensional Linear Model

Figure: Power and FDR (target is 10%) as a function of coefficient amplitude, for MF knockoffs and alternative procedures (BHq with marginal tests, BHq with maximum likelihood, and original knockoffs). The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 1000, and y comes from a Gaussian linear model with 60 nonzero regression coefficients having equal magnitudes and random signs. The noise variance is 1.
Simulations in Low-Dimensional Nonlinear Model

Figure: Power and FDR (target is 10%) as a function of coefficient amplitude, for MF knockoffs and alternative procedures (BHq with marginal tests and BHq with maximum likelihood). The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 1000, and y comes from a binomial linear model with logit link function, with 60 nonzero regression coefficients having equal magnitudes and random signs.
Simulations in High Dimensions

Figure: Power and FDR (target is 10%) as a function of coefficient amplitude, for MF knockoffs and BHq with marginal tests. The design matrix is i.i.d. N(0, 1/n), n = 3000, p = 6000, and y comes from a binomial linear model with logit link function, with 60 nonzero regression coefficients having equal magnitudes and random signs.
Simulations in High Dimensions with Dependence

Figure: Power and FDR (target is 10%) as a function of the autocorrelation coefficient, for MF knockoffs and BHq with marginal tests. The design matrix has AR(1) columns, and marginally each X_j ~ N(0, 1/n). n = 3000, p = 6000, and y follows a binomial linear model with logit link function, with 60 nonzero coefficients with random signs and randomly selected locations.
Genetic Analysis of Crohn's Disease

- 2007 case-control study by WTCCC
- n ≈ 5,000, p ≈ 375,000; preprocessing mirrored the original analysis
- Strong spatial structure: second-order knockoffs generated using approximate SDP on a genetic covariance estimate
- Entire analysis took 6 hours of serial computation time; 1 hour in parallel
- Model-free knockoffs made twice as many discoveries as the original analysis
- Some new discoveries confirmed in a larger study
- Some corroborated by work on nearby genes: promising candidates
Robustness on Real Data

Figure: Power and FDR (target is 10%) as a function of coefficient amplitude, for model-free knockoffs applied to subsamples of a real genetic design matrix (chromosome 1); n ≈ 1,400.
Careful with Symmetry Property

Have to be a bit careful; recall the symmetry property: for any S ⊆ {1, ..., p},

$$g\big(y, [\mathbf{X}\ \tilde{\mathbf{X}}]_{\mathrm{swap}(S)}\big) = g\big(y, [\mathbf{X}\ \tilde{\mathbf{X}}]\big)_{\mathrm{swap}(S)}$$

E.g., if the covariates split into K groups G_1, ..., G_K:

Incorrect: g runs the group lasso with 2K groups G_1, ..., G_K, G̃_1, ..., G̃_K

Correct: g runs the group lasso with K groups {G_1, G̃_1}, ..., {G_K, G̃_K}
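To make the symmetry requirement concrete, here is a minimal numpy sketch (an illustration, not code from the talk): marginal correlations stand in for the lasso-based statistics, and the knockoff matrix `Xk` is a random placeholder used only to exercise the flip-sign property that the symmetry of g guarantees for the W statistics.

```python
import numpy as np

def W_stats(y, X, Xk):
    # Marginal-correlation statistics: W_j = |X_j' y| - |Xk_j' y|.
    # Swapping column j of X with column j of Xk flips the sign of W_j
    # and leaves every other W_i unchanged -- the swap symmetry of g.
    return np.abs(X.T @ y) - np.abs(Xk.T @ y)

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
Xk = rng.standard_normal((n, p))   # placeholder "knockoffs" for illustration
y = rng.standard_normal(n)

W = W_stats(y, X, Xk)

# Swap the columns in S = {0, 2} and recompute.
S = [0, 2]
Xs, Xks = X.copy(), Xk.copy()
Xs[:, S], Xks[:, S] = Xk[:, S], X[:, S]
W_swapped = W_stats(y, Xs, Xks)
```

A statistic like the group lasso on the correctly grouped columns satisfies the same property; the incorrect 2K-group version does not, because it treats original and knockoff groups asymmetrically.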
Sequential Independent Pairs Generates Valid Knockoffs

Algorithm 1 Sequential Conditional Independent Pairs
for j = 1, ..., p do
    Sample X̃_j from L(X_j | X_{-j}, X̃_{1:j-1}), conditionally independently of X_j
end

Proof sketch (discrete case):

Denote the PMF of (X_{1:p}, X̃_{1:j-1}) by L(X_{-j}, X_j, X̃_{1:j-1}).

The conditional PMF of X̃_j | X_{1:p}, X̃_{1:j-1} is

$$\frac{L(X_{-j}, \tilde{X}_j, \tilde{X}_{1:j-1})}{\sum_u L(X_{-j}, u, \tilde{X}_{1:j-1})}.$$

Hence the joint PMF of (X_{1:p}, X̃_{1:j}) is

$$\frac{L(X_{-j}, X_j, \tilde{X}_{1:j-1})\, L(X_{-j}, \tilde{X}_j, \tilde{X}_{1:j-1})}{\sum_u L(X_{-j}, u, \tilde{X}_{1:j-1})},$$

which is symmetric in X_j and X̃_j, so swapping them leaves the joint distribution unchanged.
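The discrete-case proof sketch can be run directly. The sketch below (an illustration, not the talk's code) builds the joint PMF of (X_{1:p}, X̃_{1:p}) for a toy binary distribution by applying Algorithm 1's conditional at each step, so the exchangeability of originals and knockoffs can be checked by brute force.

```python
def scip_joint(pmf, p):
    """Joint PMF of (X, Xt) under Sequential Conditional Independent Pairs.

    pmf maps each x in {0,1}^p to P(X = x); the returned dict maps
    (x, xt) pairs to probabilities, built exactly as in the proof sketch.
    """
    # Start with the PMF of (X_{1:p}, X~_{1:0}), i.e. X alone.
    joint = {(x, ()): pr for x, pr in pmf.items()}
    for j in range(p):
        new = {}
        for (x, xt), pr in joint.items():
            # Conditional PMF of X_j given X_{-j} = x_{-j}, X~_{1:j} = xt,
            # obtained by marginalizing the current joint over coordinate j.
            denom = sum(joint[(x[:j] + (u,) + x[j + 1:], xt)] for u in (0, 1))
            for u in (0, 1):
                num = joint[(x[:j] + (u,) + x[j + 1:], xt)]
                # X~_j is drawn from this conditional, independently of X_j.
                new[(x, xt + (u,))] = pr * num / denom
        joint = new
    return joint

# Toy distribution on {0,1}^2 with positively correlated coordinates.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
J = scip_joint(pmf, p=2)

def swap(key, S):
    # Exchange original and knockoff coordinates in S.
    x, xt = list(key[0]), list(key[1])
    for j in S:
        x[j], xt[j] = xt[j], x[j]
    return (tuple(x), tuple(xt))
```

Checking that `J[swap(k, S)] == J[k]` for every key `k` and every subset `S` verifies, for this toy case, the exchangeability that makes the knockoffs valid.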
Proof of Control

$$\mathrm{FDR} = \mathbb{E}\left(\frac{\#\{\text{null } X_j \text{ selected}\}}{\#\{\text{total } X_j \text{ selected}\}}\right) = \mathbb{E}\left(\frac{\#\{\text{null positive } W_j > \hat{\tau}\}}{\#\{\text{positive } W_j > \hat{\tau}\}}\right)$$

$$\approx \mathbb{E}\left(\frac{\#\{\text{null negative } W_j > \hat{\tau}\}}{\#\{\text{positive } W_j > \hat{\tau}\}}\right) \le \mathbb{E}\left(\frac{\#\{\text{negative } W_j > \hat{\tau}\}}{\#\{\text{positive } W_j > \hat{\tau}\}}\right) \le q$$

More precisely:

$$\mathrm{mFDR} = \mathbb{E}\left(\frac{\#\{\text{null } X_j \text{ selected}\}}{q^{-1} + \#\{\text{total } X_j \text{ selected}\}}\right) = \mathbb{E}\left(\frac{\#\{\text{null positive } W_j > \hat{\tau}\}}{q^{-1} + \#\{\text{positive } W_j > \hat{\tau}\}}\right)$$

$$= \mathbb{E}\Bigg(\underbrace{\frac{\#\{\text{null positive } W_j > \hat{\tau}\}}{1 + \#\{\text{null negative } W_j > \hat{\tau}\}}}_{\text{supermartingale with } \mathbb{E}\,\le\,1,\ \hat{\tau}\text{ a stopping time}} \cdot \underbrace{\frac{1 + \#\{\text{null negative } W_j > \hat{\tau}\}}{q^{-1} + \#\{\text{positive } W_j > \hat{\tau}\}}}_{\le\, q \text{ by definition of } \hat{\tau}}\Bigg) \le q$$
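The "definition of τ̂" invoked in the last step is the knockoff+ threshold: the smallest t for which the estimated false discovery proportion (1 + #{W_j ≤ -t}) / max(1, #{W_j ≥ t}) drops to q or below. A small numpy sketch with illustrative (made-up) W values:

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    # Knockoff+ threshold: smallest t among the nonzero |W_j| such that
    # the estimated FDP (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) is <= q.
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

# Illustrative statistics: large positive W_j suggests a real signal.
W = np.array([5.0, 4.0, -1.0, 3.0, 2.5, -0.5, 2.0, 1.5, 1.0, 0.8,
              6.0, 3.5, -2.0, 4.5, 2.2, 1.2, 0.9, 0.7, 5.5, 2.8])
tau = knockoff_plus_threshold(W, q=0.2)
selected = np.where(W >= tau)[0]
```

Selecting {j : W_j ≥ τ̂} then controls the FDR at level q by the supermartingale argument above.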
More informationSupplementary Note on Bayesian analysis
Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan
More informationConfounder Adjustment in Multiple Hypothesis Testing
in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen
More informationSemi-Nonparametric Inferences for Massive Data
Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work
More informationLecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi)
Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Our immediate goal is to formulate an LLN and a CLT which can be applied to establish sufficient conditions for the consistency
More informationarxiv: v3 [stat.me] 14 Oct 2015
The Annals of Statistics 2015, Vol. 43, No. 5, 2055 2085 DOI: 10.1214/15-AOS1337 c Institute of Mathematical Statistics, 2015 CONTROLLING THE FALSE DISCOVERY RATE VIA KNOCKOFFS arxiv:1404.5609v3 [stat.me]
More informationClassification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.
Classification Classification is similar to regression in that the goal is to use covariates to predict on outcome. We still have a vector of covariates X. However, the response is binary (or a few classes),
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationGeneralized Cp (GCp) in a Model Lean Framework
Generalized Cp (GCp) in a Model Lean Framework Linda Zhao University of Pennsylvania Dedicated to Lawrence Brown (1940-2018) September 9th, 2018 WHOA 3 Joint work with Larry Brown, Juhui Cai, Arun Kumar
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out
More informationModel Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University
Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationRecent Advances in Post-Selection Statistical Inference
Recent Advances in Post-Selection Statistical Inference Robert Tibshirani, Stanford University June 26, 2016 Joint work with Jonathan Taylor, Richard Lockhart, Ryan Tibshirani, Will Fithian, Jason Lee,
More informationProbability Theory for Machine Learning. Chris Cremer September 2015
Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationBayesian Networks. instructor: Matteo Pozzi. x 1. x 2. x 3 x 4. x 5. x 6. x 7. x 8. x 9. Lec : Urban Systems Modeling
12735: Urban Systems Modeling Lec. 09 Bayesian Networks instructor: Matteo Pozzi x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 1 outline example of applications how to shape a problem as a BN complexity of the inference
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationNon-specific filtering and control of false positives
Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview
More information