Clustering by Important Features PCA (IF-PCA)
1 Clustering by Important Features PCA (IF-PCA): Rare/Weak Signals and Phase Diagrams
Jiashun Jin (CMU), David Donoho (Stanford), Zheng Tracy Ke (Univ. of Chicago), Wanjie Wang (Univ. of Pennsylvania)
August 12, 2015
Jiashun Jin, CMU — Clustering by Important Features PCA (IF-PCA) — 1 / 42

2 Clustering subjects using microarray data
Ten data sets (name — source; the table also lists K = # of classes, n = # of subjects, p = # of genes for each):
Brain (Pomeroy, 02); Breast Cancer (Wang et al., 05); Colon Cancer (Alon et al., 99); Leukemia (Golub et al., 99); Lung Cancer (Gordon et al., 02); Lung Cancer(2) (Bhattacharjee et al., 01); Lymphoma (Alizadeh et al., 00); Prostate Cancer (Singh et al., 02); SRBCT (Kahn, 01); Su-Cancer (Su et al., 01)
Goal: predict class labels.

3 Left/right singular vectors
Left singular vector: (n-dimensional) eigenvector of XX′
Right singular vector: (p-dimensional) eigenvector of X′X

4 Principal Component Analysis (PCA) — Karl Pearson ( )
Idea: transformation and dimension reduction while keeping the main information.
Data = signal + noise (signal matrix: low-rank)
Microarray data (after standardization): many columns of the signal matrix are 0

5 Important Features PCA (IF-PCA)
Idea: PCA applied to a small fraction of carefully selected features:
- Rank features by the Kolmogorov–Smirnov statistic
- Select those with the largest KS-scores
- Apply PCA to the post-selection data matrix
(Schematic: features × samples → selected features → PCs.)
Azizyan et al (2013), Chan and Hall (2010), Fan and Lv (2008)

6 IF-PCA (microarray data)
Feature-wise normalization: W_i(j) = [X_i(j) − X̄(j)]/σ̂(j); W = [W_1, ..., W_n]′ = [w_1, ..., w_p].
Empirical CDF of feature j: F_{n,j}(t) = (1/n) Σ_{i=1}^n 1{W_i(j) ≤ t}.
1. Rank features with Kolmogorov–Smirnov (KS) scores: ψ_{n,j} = √n · sup_{−∞<t<∞} |F_{n,j}(t) − Φ(t)| (Φ: CDF of N(0,1))
2. Renormalize the KS scores (Efron's empirical null): ψ*_{n,j} = [ψ_{n,j} − (mean of all p KS-scores)] / (SD of all p KS-scores)
3. Fix t > 0. Let Û^(t) ∈ R^{n,K−1} be the first (K − 1) left singular vectors of the post-selection data matrix [w_j : ψ*_{n,j} ≥ t]
4. Apply the classical k-means algorithm to Û^(t) ∈ R^{n,K−1}
IF-PCA-HCT: t = t_p^{HC}, the Higher Criticism threshold (to be announced)

7 The blessing of feature selection
x-axis: 1, 2, ..., n; y-axis: entries of Û^{HC} (i.e., Û^(t) for t = t_p^{HC})
Left: plot of Û^{HC} (Lung Cancer; K = 2, so Û^{HC} ∈ R^n is a vector). Right: counterpart of Û^{HC} without feature selection. (Panel labels: ADCA, MPM.)

8 Efron's null correction (Lung Cancer)
Efron's theoretic null: density of ψ_{n,j} if X_i(j) iid ∼ N(u_j, σ_j²) (does not depend on (j, u_j, σ_j); easy to simulate).
The theoretic null is a bad fit to ψ_{n,j} (top) but a nice fit to ψ*_{n,j} (bottom).

9 How to set the threshold t?
CV: not directly implementable (class labels are unknown)
FDR: needs tuning and targets Rare/Strong signals [Benjamini and Hochberg (1995), Efron (2010)]
(Plot: threshold t vs. #{selected features}, feature-FDR, and error rates.)

10 Tukey's Higher Criticism — John W. Tukey ( )

11 Higher Criticism (HC)
Review papers: Donoho and Jin (2015), Jin and Ke (2015)
- Proposed by Donoho and Jin (2004) for sparse signal detection
- Found useful in GWAS, DNA Copy Number Variants (CNV), cosmology and astronomy, disease surveillance
- Extended in many different directions: Innovated HC (for colored noise), signal detection in a regression model, HC when the noise distribution is unknown/non-Gaussian, estimating the proportion of non-null effects
- Threshold choice in classification [Donoho and Jin (2008, 2009), Fan et al (2013), Jin (2009)]

12 Threshold choice by HC for IF-PCA
Jin and Wang (2015), Jin, Ke and Wang (2015)
HC_p(t) = √p [Ĝ_p(t) − F̄₀(t)] / √( Ĝ_p(t) + max{√n [Ĝ_p(t) − F̄₀(t)], 0} ),   t_p^{HC} = argmax_t {HC_p(t)}   (new)
F̄₀: survival function of Efron's theoretical null
Ĝ_p: empirical survival function of the renormalized KS-scores ψ*_{n,j}

13 HCT: implementation
Compute P-values: π_j = F̄₀(ψ*_{n,j}), 1 ≤ j ≤ p
Sort the P-values: π_(1) < π_(2) < ... < π_(p)
Define the HC functional by
HC_{p,k} = √p (k/p − π_(k)) / √( k/p + max{√n (k/p − π_(k)), 0} )
Let k̂ = argmax_{1 ≤ k ≤ p/2, π_(k) > log(p)/p} {HC_{p,k}}. The HC threshold t_p^{HC} is the k̂-th largest KS-score.
Note: HC_p(t) at t = ψ_(k) equals HC_{p,k}, where ψ_(k) denote the sorted KS scores.

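A hedged sketch of this implementation (function name mine, with simulated p-values standing in for F̄₀(ψ*_{n,j})):

```python
# Sketch of the HC threshold index k-hat from sorted p-values, following
# the HC_{p,k} formula on this slide; the simulated p-values are my own.
import numpy as np

def hc_threshold_index(pvals, n):
    """Return k-hat maximizing HC_{p,k} over 1 <= k <= p/2 with pi_(k) > log(p)/p."""
    p = len(pvals)
    pi = np.sort(pvals)                        # pi_(1) <= ... <= pi_(p)
    k = np.arange(1, p + 1)
    num = np.sqrt(p) * (k / p - pi)
    den = np.sqrt(k / p + np.maximum(np.sqrt(n) * (k / p - pi), 0.0))
    hc = num / den
    ok = (k <= p // 2) & (pi > np.log(p) / p)  # search range from the slide
    return int(k[ok][np.argmax(hc[ok])])

rng = np.random.default_rng(1)
p, n = 2000, 100
# 40 non-null features with tiny p-values, the rest uniform on (0, 1)
pvals = np.concatenate([rng.uniform(0, 1e-4, 40), rng.uniform(0, 1, p - 40)])
k_hat = hc_threshold_index(pvals, n)
print("HCT keeps the", k_hat, "features with smallest p-values")
```

The HC objective peaks near the boundary between non-null and null p-values, so k̂ lands close to the number of non-null features.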
14 Illustration
(Panels: ordered KS scores; ordered P-values; HC objective function.)

15 Illustration, II (Lung Cancer)
x-axis: # of selected features; y-axis: error rates by IF-PCA. (HC threshold marked.)

16 Comparison
r = (error rate of IF-PCA-HCT) / (minimum error rate of all other methods)
Methods compared on each data set: k-means, k-means++, hierarchical clustering, SpecGem*, IF-PCA-HCT, together with the ratio r.
Data sets: Brain (.09), Breast Cancer (.05), Colon Cancer (.07), Leukemia (.09), Lung Cancer (.09), Lung Cancer(2) (.00), Lymphoma (.13), Prostate Cancer (.01), SRBCT (.06), SuCancer (.05)
*: SpecGem is classical PCA [Lee et al (2010)]; k-means++: Arthur and Vassilvitskii (2007)

17 Sparse PCA and variants of IF-PCA
(Schematic: features × samples → selected features → PCs.)
Methods compared on the same 10 data sets: Clu-sPCA†, IF-kmeans, IF-Hier, IF-PCA-HCT.
†: project to the estimated feature space (sparse PCA) and then cluster; unclear how to set λ (the ideal λ is used above); clustering ≠ feature estimation. Zou et al (2006), Witten and Tibshirani (2010)

18 Summary (so far)
IF-PCA-HCT consists of three simple steps:
- Marginal screening (KS)
- Threshold choice (empirical null + HCT)
- Post-selection PCA
It is tuning free, fast, and yet effective; easily extendable and adaptable.
Next: theoretical reasons for HCT in RW settings

19 RW viewpoint
In many types of Big Data, signals of interest are not only sparse (rare) but also individually weak, and we have no prior knowledge of where these RW signals are.
- Large p, small n (e.g., genetics and genomics); raising the signal strength or n costs $ or labor (clustering: α = 6; classification: α = 2)
- Technical limitation (e.g., astronomy)
- Early detection (e.g., disease surveillance)

20 RW model and Phase Diagram
Many methods/theories target Rare/Strong signals ("if conditions XX hold and all signals are sufficiently strong...").
Our proposal:
- RW model: a parsimonious model capturing the main factors (sparsity and signal strength)
- Phase Diagram: provides sharp results that characterize when the desired goal is impossible or possible to achieve; an approach to distinguish non-optimal and optimal procedures

21 Sparse signal detection (global testing)
Donoho and Jin (2004), Ingster (1997, 1999), Jin (2004)
X = μ + Z, X ∈ R^p, Z ∼ N(0, I_p)
H₀^(p): μ = 0,  vs.  H₁^(p): μ(j) iid ∼ (1 − ε_p) ν₀ + ε_p ν_{τ_p}
Calibration and subtlety of the problem: ε_p = p^{−β}, τ_p = √(2r log(p)), 0 < β < 1, r > 0
Rare: only a small fraction of nonzero means. Weak: signals only moderately strong.

22 Phase Diagram (signal detection/recovery)
Standard phase function r = ρ(β):
ρ(β) = 0 for 0 < β ≤ 1/2;  β − 1/2 for 1/2 < β ≤ 3/4;  (1 − √(1 − β))² for 3/4 < β < 1
(Diagram, in the (β, r) plane: Undetectable below the curve; Detectable, Estimable, Almost Full Recovery, and Exact Recovery regions above it.)

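The piecewise phase function translates directly into code; this small helper (name mine) evaluates ρ(β) so a (β, r) pair can be checked against the detection boundary:

```python
# Direct translation of the standard phase function rho(beta) above.
import numpy as np

def rho(beta):
    """Detection boundary for the Rare/Weak signal-detection problem."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie in (0, 1)")
    if beta <= 0.5:
        return 0.0                       # detectable for any fixed r > 0
    if beta <= 0.75:
        return beta - 0.5
    return (1 - np.sqrt(1 - beta)) ** 2  # very sparse regime

# Example: signals with r above rho(beta) are detectable
print(rho(0.6), rho(0.8))
```

For instance, at β = 0.6 the boundary is r = 0.1, while at β = 0.8 it rises to (1 − √0.2)².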
23 RW settings: why we care?
- Growing awareness of irreproducibility
- Many methods/theories focus on Rare/Strong signals and do not cope well with RW settings: FDR-controlling methods (shown before), L₀/L₁-penalization methods, the minimax framework
Ioannidis (2005), "Why most published research findings are false"; Donoho and Stark (1989); Chen et al (1995); Tibshirani (1996); Abramovich et al (2006); Jin et al (2015); Ke et al (2015)

24 A two-class model for clustering
Jin, Ke & Wang (2015)
X_i = l_i μ + Z_i,  Z_i iid ∼ N(0, I_p),  i = 1, ..., n  (p ≫ n)
l_i = ±1: unknown class labels (main interest); μ ∈ R^p: feature vector
RW: only a small fraction of the entries of μ are nonzero, and each contributes weakly to the clustering.
Goal: theoretical insight on HCT (and more)

25 IF-PCA simplified to the two-class model
(Schematic: features × samples → selected features → PCs.)
Step — microarray vs. two-class model:
- pre-normalization: yes / skipped
- feature-wise screening: Kolmogorov–Smirnov, ψ_j = √n sup_t |F_{n,j}(t) − Φ(t)| / chi-square, ψ_j = (‖x_j‖² − n)/√(2n)
- re-normalization: Efron's null correction / skipped
- threshold choice: HCT / same
- post-selection PCA: same / same

26 Asymptotic Rare/Weak (ARW) model
X = lμ′ + Z ∈ R^{n,p}, Z: iid N(0, 1) entries
l_i = ±1 with equal prob.; μ(j) iid ∼ (1 − ε_p) ν₀ + ε_p ν_{τ_p}
Large p, small n: n = p^θ, 0 < θ < 1
RW signal: ε_p = p^{−β}, τ_p = [(4/n) r log(p)]^{1/4}
Note: ψ_j = (‖x_j‖² − n)/√(2n) ≈ N(0, 1) or N(√(2r log(p)), 1)

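A quick simulation (an illustrative sketch, with parameter values chosen by me) checks this calibration: under the null ψ_j ≈ N(0, 1), while useful features center near √(2r log p):

```python
# Simulate the two-class ARW model and the chi-square screening statistic
# psi_j = (||x_j||^2 - n) / sqrt(2n); theta, beta, r values are my own.
import numpy as np

rng = np.random.default_rng(2)
theta, beta, r = 0.6, 0.55, 1.5
p = 5000
n = int(p ** theta)                           # n = p^theta
eps = p ** (-beta)                            # sparsity epsilon_p = p^(-beta)
tau = ((4 / n) * r * np.log(p)) ** 0.25       # tau_p = ((4/n) r log p)^(1/4)

labels = rng.choice([-1.0, 1.0], size=n)
mu = np.where(rng.random(p) < eps, tau, 0.0)  # rare nonzero entries of mu
X = np.outer(labels, mu) + rng.standard_normal((n, p))

psi = (np.sum(X ** 2, axis=0) - n) / np.sqrt(2 * n)
null_mean = psi[mu == 0].mean()               # approx 0
sig_mean = psi[mu > 0].mean()                 # approx sqrt(2 r log p)
print(f"null mean {null_mean:.2f}, signal mean {sig_mean:.2f}, "
      f"target {np.sqrt(2 * r * np.log(p)):.2f}")
```

The fourth-root calibration of τ_p is exactly what makes the screening statistic of a useful feature shift by √(2r log p), matching the detection problem on the previous slides.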
27 Three functions: HC, idealHC, SNR
Given X = lμ′ + Z, we retain feature j if ψ_j ≥ t.
F̄₀(t): survival function of the normalized chi-square (χ²_n(0) − n)/√(2n)
Ĝ_p(t): empirical survival function of the ψ_j
Û^(t): first left singular vector of [x_j : ψ_j ≥ t]
1. HC(t) = √p [Ĝ_p(t) − F̄₀(t)] / √( Ĝ_p(t) + max{√n [Ĝ_p(t) − F̄₀(t)], 0} )
2. idealHC(t): HC(t) with Ĝ_p(t) replaced by its non-stochastic counterpart Ḡ_p(t)
3. SNR(t): non-stochastic function with Û^(t) ≈ SNR(t)·l + z + rem,  z ∼ N(0, I_n)

28 Three thresholds: t_p^{HC} ≈ t_p^{idealHC} ≈ t_p^{ideal}
(Figure: the curves HCT, IdealHC, and SNR peak at nearly the same threshold.)
SNR(t) = E[‖μ^(t)‖²] / √( E[‖μ^(t)‖²] + E[|Ŝ(t)|]/n ), where μ^(t) is μ restricted to Ŝ(t) (the index set of all retained features)
idealHC(t) = √p (Ḡ_p(t) − F̄₀(t)) / √( Ḡ_p(t) + max{√n [Ḡ_p(t) − F̄₀(t)], 0} )
Approximately, E‖μ^(t)‖² ≈ τ_p² · p [Ḡ_p(t) − F̄₀(t)] and E‖μ^(t)‖² + E[|Ŝ(t)|]/n ≈ p ( τ_p² [Ḡ_p(t) − F̄₀(t)] + Ḡ_p(t)/n ), so SNR(t) and idealHC(t) are maximized at essentially the same t.

29 Impossibility
Let ρ(β) be the standard phase function (before). Define ρ_θ(β) = (1 − θ) ρ(β/(1 − θ)), recalling n = p^θ.
Consider IF-PCA with threshold t > 0, and let Û^(t) ∈ R^n be the first left singular vector of the post-selection data matrix [x_j : ψ_j ≥ t].
Consider ARW with n = p^θ, ε_p = p^{−β}, τ_p = [(4/n) r log(p)]^{1/4}.
If r < ρ_θ(β), then for any threshold t, |cos(Û^(t), l)| ≤ c₀ < 1 (IF-PCA partially fails).

30 Possibility
If r > ρ_θ(β), then:
- For some t, |cos(Û^(t), l)| → 1
- There is a non-stochastic function SNR(t) such that, for some t, SNR(t) ≫ 1 and Û^(t) ≈ SNR(t)·l + z + rem,  z ∼ N(0, I_n)
- HCT yields the right threshold choice: t_p^{HC}/t_p^{ideal} → 1 in prob., where t_p^{ideal} = argmax_t {SNR(t)}
- IF-PCA-HCT yields successful clustering: Hamm_p(l̂^{HCT}; β, r, θ) → 0, where Hamm_p(l̂; β, r, θ) = (n/2)^{−1} min_{b = ±sgn(l̂)} { Σ_{i=1}^n P(b_i ≠ sgn(l_i)) }

31 Phase Diagram (IF-PCA)
#(useful features) ≈ p^{1−β}, τ_p = [(4/n) r log(p)]^{1/4}; n = p^θ (θ = .6)
(Figure: in the (β, r) plane, a Success region above the boundary curve and a Failure region below it.)

32 Summary (so far)
- Big Data are here, but signals are RW
- RW model and Phase Diagrams: a theoretical framework specifically for RW settings, allowing for insights that would otherwise be overlooked
- HC provides the right threshold choice
- Limitation: IF-PCA only works at a specific signal strength and in a limited sparsity range
- Want a more complete story: statistical limits for clustering under ARW

33 ARW for statistical limits (a slight change)
X = lμ′ + Z; l_i = ±1 with equal prob.; μ(j) iid ∼ (1 − ε_p) ν₀ + ε_p ν_{τ_p}; n = p^θ, ε_p = p^{−β}
For statistical limits: τ_p = p^{−α}, α > 0; s := #{signals}, 1 ≤ s ≤ p.
For IF-PCA: τ_p = [(4/n) r log(p)]^{1/4}; √n ≤ s ≤ p.
(Diagram: signal strength (weaker) vs. sparsity (sparser); the most interesting range for IF-PCA sits around τ = n^{−1/4}, with s between n^{1/2} and p^{1/2}.)

34 Aggregation methods
Simple Aggregation: l̂^{sa} = sgn(x̄), where x̄ ∈ R^n is the sample-wise average of all p feature columns.
Sparse Aggregation: l̂^{sa}_N = sgn(x̄_Ŝ), where Ŝ = Ŝ(N) = argmax_{S: |S| = N} ‖x̄_S‖₁.
(Schematic: averaging all columns x_1, ..., x_p vs. averaging only the columns in S.)

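A minimal sketch of the two estimators (my own code and parameter choices, with a brute-force subset search that is feasible only for tiny p, since the exact search is NP-hard in general):

```python
# Simple vs. sparse aggregation on a tiny simulated two-class data set.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n, p, N = 40, 8, 3
labels = rng.choice([-1.0, 1.0], size=n)
mu = np.zeros(p)
mu[:N] = 1.2                                   # three useful features
X = np.outer(labels, mu) + rng.standard_normal((n, p))

# Simple aggregation: l-hat = sgn(x-bar), x-bar = per-sample mean over features
l_simple = np.sign(X.mean(axis=1))

# Sparse aggregation: exhaustive search over all subsets S with |S| = N,
# maximizing the L1 norm of the n-vector x-bar_S
best_S = max(combinations(range(p), N),
             key=lambda S: np.abs(X[:, list(S)].mean(axis=1)).sum())
l_sparse = np.sign(X[:, list(best_S)].mean(axis=1))

def err(l_hat):
    # Hamming error rate up to a global sign flip
    return min(np.mean(l_hat != labels), np.mean(-l_hat != labels))

print(f"subset {best_S}, simple err {err(l_simple):.2f}, "
      f"sparse err {err(l_sparse):.2f}")
```

Because the noise-only columns dilute x̄, restricting the average to a well-chosen subset S typically separates the two classes more cleanly.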
35 Comparison of methods
Method: Simple Agg. (l̂^{sa}) | PCA (l̂^{if}) | Sparse Agg. (l̂^{sa}_N, N ≤ p) | IF-PCA (l̂^{if}_t, t > 0)
Sparsity: dense | dense | sparse | sparse
Strength: weak | weak | strong* | strong
F. Selection: No | No | Yes | Yes
Complexity: Poly. | Poly. | NP-hard | Poly.
Tuning: No | No | Yes | Yes**
Notation: l̂^{if}_t: IF-PCA adapted to the two-class model; l̂^{if}: classical PCA (a special case)
*: signals are comparably stronger but still weak. **: a tuning-free version exists.

36 Statistical limits (clustering)
Hamm_p(l̂; β, α, θ) = (n/2)^{−1} min_{b = ±sgn(l̂)} { Σ_{i=1}^n P(b_i ≠ sgn(l_i)) }
η_θ^{clu}(β) = (1 − 2β)/2 for 0 < β < (1 − θ)/2;  θ/2 for (1 − θ)/2 < β < 1 − θ;  (1 − β)/2 for β > 1 − θ
Consider ARW with n = p^θ, ε_p = p^{−β}, τ_p = p^{−α}.
- When α > η_θ^{clu}(β): Hamm_p(l̂; β, α, θ) ≳ 1 for any l̂
- When α < η_θ^{clu}(β): Hamm_p(l̂^{sa}; β, α, θ) → 0 for β < (1 − θ)/2; Hamm_p(l̂^{sa}_N; β, α, θ) → 0 for β > (1 − θ)/2 (N = p^{1−β})

37 Phase Diagram (clustering; θ = 0.6)
For statistical limits: τ_p = p^{−α}, α > 0. For IF-PCA: τ_p = [(4/n) r log(p)]^{1/4}.
(Left panel, range 0 < β < 1: regions Impossible, Simple Aggregation, and Sparse Aggregation (NP-hard), with boundary labels τ = p^{1/2}/s, τ = n^{−1/2}, τ = s^{−1/2}; axes: sparsity β (sparser →) vs. signal strength (weaker ↓). Right panel, range 1/2 < β < 1 − θ/2: regions Impossible, Simple Aggreg., NP-hard, Classical PCA, and IF-PCA.)

38 Two closely related problems
X = lμ′ + Z, Z: iid N(0, 1) entries
(sig) Estimate the support of μ (signal recovery): Hamm_p(μ̂; β, r, θ) = (pε_p)^{−1} Σ_{j=1}^p P(sgn(μ̂_j) ≠ sgn(μ_j))
(hyp) Test H₀^(p): X = Z against the alternative H₁^(p): X = lμ′ + Z (global hypothesis testing): TestErr_p(T̂; β, r, θ) = P_{H₀^(p)}(Reject H₀^(p)) + P_{H₁^(p)}(Accept H₀^(p))
Arias-Castro & Verzelen (2015), Johnstone & Lu (2001), Rigollet & Berthet (2013)

39 Statistical limits (three problems)
μ(j) iid ∼ (1 − ε_p) ν₀ + ε_p ν_{τ_p}, n = p^θ, s ≈ #{useful features} ≈ p^{1−β}, signal strength τ = p^{−α}
(Figure: phase boundaries for Clustering, Signal Recovery, and Hypothesis Testing in the (β, α) plane, with boundary labels including τ = n^{−1/4}, τ = p^{1/2}/s, τ = n^{−1/2}, τ = (sn)^{−1/4}, τ = s^{−1/2}, s = n.)

40 Computable upper bounds (two problems)
Left: signal recovery. Right: (global) hypothesis testing.
μ(j) iid ∼ (1 − ε_p) ν₀ + ε_p ν_{τ_p}, n = p^θ, s ≈ #{useful features} ≈ p^{1−β}, signal strength τ = p^{−α}
(Figure: for each of signal recovery and hypothesis testing, the statistical boundary ("stat") alongside the boundary achievable by computationally feasible methods ("compu"), compared with the clustering boundary.)

41 Take-home message
- Big Data is here, but your signals are RW; we need to do many things very differently
- RW model and Phase Diagrams are a theoretical framework specifically designed for RW settings, and expose many insights we do not see with more traditional frameworks
- IF-PCA and Higher Criticism are simple and easy-to-adapt methods which are provably effective for analyzing real data, especially in Rare/Weak settings

42 Acknowledgements
Many thanks to David Donoho, Stephen Fienberg, Peter Hall, Jon Wellner, and many others for inspiration, collaboration, and encouragement!
Main references:
- Jin, J. and Wang, W. (2015). Important Features PCA for high dimensional clustering. arXiv.
- Jin, J., Ke, Z. and Wang, W. (2015). Phase transitions for high dimensional clustering and related problems. arXiv.
Two review papers on HC:
- Donoho, D. and Jin, J. (2015). Higher Criticism for large-scale inference, especially for rare and weak effects. Statistical Science, 30(1).
- Jin, J. and Ke, Z. (2015). Rare and weak effects in large-scale inference: methods and phase diagrams. arXiv.
More informationOptimal Rates of Convergence for Estimating the Null Density and Proportion of Non-Null Effects in Large-Scale Multiple Testing
Optimal Rates of Convergence for Estimating the Null Density and Proportion of Non-Null Effects in Large-Scale Multiple Testing T. Tony Cai 1 and Jiashun Jin 2 Abstract An important estimation problem
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview
Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationDistilled Sensing: Adaptive Sampling for. Sparse Detection and Estimation
Distilled Sensing: Adaptive Sampling for Sparse Detection and Estimation Jarvis Haupt, Rui Castro, and Robert Nowak arxiv:00.53v2 [math.st] 27 May 200 Abstract Adaptive sampling results in dramatic improvements
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationMultiple Change-Point Detection and Analysis of Chromosome Copy Number Variations
Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem
More informationThe Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies
The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationRATE ANALYSIS FOR DETECTION OF SPARSE MIXTURES
RATE ANALYSIS FOR DETECTION OF SPARSE MIXTURES Jonathan G. Ligo, George V. Moustakides and Venugopal V. Veeravalli ECE and CSL, University of Illinois at Urbana-Champaign, Urbana, IL 61801 University of
More informationA significance test for the lasso
1 Gold medal address, SSC 2013 Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and Ryan Tibshirani (Carnegie-Mellon Univ.) Reaping the benefits of LARS: A special thanks to Brad Efron,
More informationFalse Discovery Control in Spatial Multiple Testing
False Discovery Control in Spatial Multiple Testing WSun 1,BReich 2,TCai 3, M Guindani 4, and A. Schwartzman 2 WNAR, June, 2012 1 University of Southern California 2 North Carolina State University 3 University
More informationPost-selection inference with an application to internal inference
Post-selection inference with an application to internal inference Robert Tibshirani, Stanford University November 23, 2015 Seattle Symposium in Biostatistics, 2015 Joint work with Sam Gross, Will Fithian,
More informationLikelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University Coauthors Pragya Sur Stanford Statistics Emmanuel
More informationEstimating the Null and the Proportion of Non- Null Effects in Large-Scale Multiple Comparisons
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 7 Estimating the Null and the Proportion of Non- Null Effects in Large-Scale Multiple Comparisons Jiashun Jin Purdue
More informationIncorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests
Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Weidong Liu October 19, 2014 Abstract Large-scale multiple two-sample Student s t testing problems often arise from the
More informationA knockoff filter for high-dimensional selective inference
1 A knockoff filter for high-dimensional selective inference Rina Foygel Barber and Emmanuel J. Candès February 2016; Revised September, 2017 Abstract This paper develops a framework for testing for associations
More informationarxiv: v1 [math.st] 20 Nov 2009
FEATURE SELECTION WHEN THERE ARE MANY INFLUENTIAL FEATURES Peter Hall 12 Jiashun Jin 3 Hugh Miller 1 arxiv:0911.4076v1 [math.st] 20 Nov 2009 SUMMARY. Recent discussion of the success of feature selection
More informationTweedie s Formula and Selection Bias. Bradley Efron Stanford University
Tweedie s Formula and Selection Bias Bradley Efron Stanford University Selection Bias Observe z i N(µ i, 1) for i = 1, 2,..., N Select the m biggest ones: z (1) > z (2) > z (3) > > z (m) Question: µ values?
More informationLarge-Scale Multiple Testing of Correlations
Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity
More informationFDR and ROC: Similarities, Assumptions, and Decisions
EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers
More informationThe Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies
The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationRegularized Discriminant Analysis and Its Application in Microarrays
Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,
More informationHigh-throughput Testing
High-throughput Testing Noah Simon and Richard Simon July 2016 1 / 29 Testing vs Prediction On each of n patients measure y i - single binary outcome (eg. progression after a year, PCR) x i - p-vector
More informationApproximate Principal Components Analysis of Large Data Sets
Approximate Principal Components Analysis of Large Data Sets Daniel J. McDonald Department of Statistics Indiana University mypage.iu.edu/ dajmcdon April 27, 2016 Approximation-Regularization for Analysis
More informationStatistical testing. Samantha Kleinberg. October 20, 2009
October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find
More informationInference Conditional on Model Selection with a Focus on Procedures Characterized by Quadratic Inequalities
Inference Conditional on Model Selection with a Focus on Procedures Characterized by Quadratic Inequalities Joshua R. Loftus Outline 1 Intro and background 2 Framework: quadratic model selection events
More informationStat 206: Estimation and testing for a mean vector,
Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where
More informationInference with Transposable Data: Modeling the Effects of Row and Column Correlations
Inference with Transposable Data: Modeling the Effects of Row and Column Correlations Genevera I. Allen Department of Pediatrics-Neurology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research
More informationClass 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio
Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant
More informationVARIABLE SELECTION AND INDEPENDENT COMPONENT
VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS Richard Samworth University of Cambridge Joint work with Rajen Shah and Ming Yuan My core research interests A broad range of methodological
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More information3 Comparison with Other Dummy Variable Methods
Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction
More informationJournal of Multivariate Analysis. Consistency of sparse PCA in High Dimension, Low Sample Size contexts
Journal of Multivariate Analysis 5 (03) 37 333 Contents lists available at SciVerse ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Consistency of sparse PCA
More informationKnockoffs as Post-Selection Inference
Knockoffs as Post-Selection Inference Lucas Janson Harvard University Department of Statistics blank line blank line WHOA-PSI, August 12, 2017 Controlled Variable Selection Conditional modeling setup:
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationEstimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq
Estimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq Xing Ren 1, Jianmin Wang 1,2,, Song Liu 1,2, and Jeffrey C. Miecznikowski 1,2,
More informationSparse PCA with applications in finance
Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction
More informationMultiple testing: Intro & FWER 1
Multiple testing: Intro & FWER 1 Mark van de Wiel mark.vdwiel@vumc.nl Dep of Epidemiology & Biostatistics,VUmc, Amsterdam Dep of Mathematics, VU 1 Some slides courtesy of Jelle Goeman 1 Practical notes
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More information