Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University)
1 Factor-Adjusted Robust Multiple Test Jianqing Fan Princeton University with Koushiki Bose, Qiang Sun, Wenxin Zhou August 11, 2017
2 Outline: (1) Introduction; (2) A principle of robustification; (3) Adaptive Huber estimation; (4) FARM-test; (5) Numerical studies
3 Introduction
4 Heavy-tailed distributions are ubiquitous in modern statistics and machine learning: financial returns and macroeconomic time series; high-throughput data (microarrays, proteomics, fMRI). They arise easily in high-dimensional data and are at odds with sub-Gaussian or sub-exponential assumptions.
5 Example 1: Macroeconomic time series. 131 macroeconomic series (Stock & Watson, 10; Ludvigson & Ng, 10; McCracken & Ng, 16). [Figure: histogram of kurtosis; y-axis: frequency.] [count missing] series heavier-tailed than $t_5$! All financial return time series.
6 Example 2: RNA-seq data. Gene expressions for 104 autism patients and controls. [Figure: distribution of kurtosis; x-axis: kurtosis.] [count missing]/19K gene expressions heavier-tailed than $t_5$! By chance, some have heavier tails in high dimensions. Aim: significantly reduce the tail assumptions.
7 Example 3: Protein and gene expressions. NCI-60: 60 human cancer cell lines (Shankavaram et al., 2007). [Figure: histogram of kurtosis of protein expressions; histogram of kurtosis of gene expressions; y-axis: frequency.] Protein: 49/162 and gene: 6542/17924 heavier-tailed than $t_5$!
8 Large-Scale Hypothesis Testing. $X = (X_1,\ldots,X_p)^T$ has mean $\mu = (\mu_1,\ldots,\mu_p)^T$. Multiple test: $H_{0j}: \mu_j = 0$ vs $H_{1j}: \mu_j \neq 0$, for $j = 1,\ldots,p$. Challenge: strong dependence between $X_1,\ldots,X_p$ (Benjamini & Hochberg, 95; Storey, 02; Donoho & Jin, 04; Genovese & Wasserman, 04; Efron, 07, 10; Fan et al., 12; Desai & Storey, 12; Barber & Candès, 15; Fan & Han, 17+; ...). Global test: $H_0: \mu = 0$ vs $H_1: \mu \neq 0$.
9 A common model for dependence. Factor model: $X_{ij} = \mu_j + b_j^T f_i + u_{ij}$, $i = 1,\ldots,n$, $j = 1,\ldots,p$.
10 Importance of Factor Adjustments. A synthetic three-factor model: $X_i = \mu + B f_i + u_i$, $i = 1,\ldots,n$, with $f_i \sim N(0, I_3)$, $B = (b_{jl})$ IID $U(-1,1)$ and $u_i \sim t_3(0, I_p)$. Model setup: $(n,p) = (100, 500)$, $\mu_j = 0.6$ for $j \le p/4$ and $\mu_j = 0$ otherwise. [Figure: histogram of sample means; histogram of sample means with factor adjustment; y-axis: frequency.]
11 Importance of Robust Adjustments. [Figure: histogram of robust means; histogram of robust means with factor adjustment; y-axis: frequency.] Decreased noise!
12 Principles of Robustification
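13 Principle of Robustification I: Truncation. Data: $X_i \sim$ IID$(\mu, \sigma^2)$. Truncation: let $\tilde X_i = \mathrm{sgn}(X_i)\min(|X_i|, \tau)$. Exponential concentration: when $\tau \asymp \sigma\sqrt{n}$ (Fan, Wang, Zhu, 17),
$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \tilde X_i - \mu\Big| \ge t\,\frac{\sigma}{\sqrt{n}}\Big) \le 2\exp(-ct^2)$$
for a universal constant $c$, compared with a bound of order $1/t^2$ for the sample mean. Fundamental to high-dimensional estimation.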
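The truncation step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names (`truncate`, `truncated_mean`, `draw_t3`), the sample size, and the assumption that $\sigma$ is known are all hypothetical choices made for the example.

```python
import math
import random


def truncate(x, tau):
    """Return sgn(x) * min(|x|, tau)."""
    return math.copysign(min(abs(x), tau), x)


def truncated_mean(xs, tau):
    """Sample mean of the truncated observations."""
    return sum(truncate(x, tau) for x in xs) / len(xs)


def draw_t3(rng):
    """One Student-t draw with 3 degrees of freedom: Z / sqrt(chi2_3 / 3)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3.0)


rng = random.Random(0)                 # fixed seed, for illustration only
n = 500
xs = [draw_t3(rng) for _ in range(n)]  # heavy-tailed sample with true mean 0
sigma = math.sqrt(3.0)                 # SD of t_3, assumed known here
tau = sigma * math.sqrt(n)             # tau of order sigma * sqrt(n)

print("sample mean   :", sum(xs) / n)
print("truncated mean:", truncated_mean(xs, tau))
```

With `tau` set to infinity the truncated mean reduces to the ordinary sample mean, so the truncation level is the only tuning choice in this sketch.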
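14 Robust Covariance Inputs. Data: $X_i \sim$ IID$(0, \Sigma)$, $d$-dimensional. $\sigma_{ij} = E(X_i X_j) - \underbrace{(EX_i)(EX_j)}_{=0}$. Elementwise truncation: $\tilde x_{ij} = \mathrm{sgn}(x_{ij})\min(|x_{ij}|, \tau)$ and $\tilde\Sigma = (1/n)\sum_{i=1}^n \tilde x_i \tilde x_i^T$. If $E|X_j X_k|^2 \le M$ and $\tau \asymp (nM/\log d)^{1/4}$, we have
$$P\Big(\|\tilde\Sigma - \Sigma\|_{\max} \ge \sqrt{\frac{aM\log d}{n}}\Big) \le d^{\,2 - a/c}$$
for any $a > 2$ and a universal constant $c$.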
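A minimal sketch of the elementwise-truncated covariance input, assuming mean-zero data stored as a list of rows. The function names and the helper `tau_rate`, which only encodes the order $(nM/\log d)^{1/4}$ with the unspecified constant set to 1, are assumptions made for illustration.

```python
import math


def truncate(x, tau):
    """Return sgn(x) * min(|x|, tau)."""
    return math.copysign(min(abs(x), tau), x)


def tau_rate(n, M, d):
    """Truncation level of order (n * M / log d)^(1/4); requires d > 1."""
    return (n * M / math.log(d)) ** 0.25


def truncated_covariance(rows, tau):
    """(1/n) * sum_i x~_i x~_i^T, where x~ is the elementwise-truncated row."""
    n, d = len(rows), len(rows[0])
    trunc = [[truncate(v, tau) for v in row] for row in rows]
    return [[sum(r[j] * r[k] for r in trunc) / n for k in range(d)]
            for j in range(d)]


# Toy data with one wild coordinate (hypothetical numbers).
rows = [[1.0, 10.0], [-1.0, -10.0]]
print(truncated_covariance(rows, tau=2.0))   # entries are shrunk by truncation
```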
16 Applications of Covariance. Finance: portfolio risk and management. Stat & ML: classification, graphical models, PCA. Regression: let $\Sigma = \mathrm{cov}((X^T, Y)^T)$; then $E(Y - X^T\beta)^2 = (-\beta^T, 1)\,\Sigma\,(-\beta^T, 1)^T$. Inference: Hotelling $T^2$, FDR/FDP control, ...
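The regression identity on this slide can be checked numerically. The sketch below assumes mean-zero data, so that the covariance matrix coincides with the second-moment matrix; the toy rows and `beta` are hypothetical values chosen so that every column averages to zero.

```python
def second_moment(rows):
    """Empirical second-moment matrix (1/n) * sum_i z_i z_i^T."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)]
            for a in range(d)]


def quad_form(v, S):
    """Quadratic form v^T S v."""
    return sum(v[a] * S[a][b] * v[b]
               for a in range(len(v)) for b in range(len(v)))


# Rows are (x1, x2, y); each column has mean zero.
rows = [(1.0, 0.0, 2.0), (0.0, 1.0, -1.0), (-1.0, 0.0, -2.5), (0.0, -1.0, 1.5)]
beta = [2.0, -1.0]

S = second_moment(rows)
v = [-b for b in beta] + [1.0]          # the vector (-beta^T, 1)^T

mse_direct = sum((y - beta[0] * x1 - beta[1] * x2) ** 2
                 for x1, x2, y in rows) / len(rows)
mse_quad = quad_form(v, S)
print(mse_direct, mse_quad)             # the two quantities agree
```

The point of the identity is that a robust estimate of $\Sigma$ can be plugged into the quadratic form in place of the sample covariance.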
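17 Principle of High-dim Robustification II. Adaptive Huber loss (Catoni 12; Fan, Li, Wang 17):
$$\rho_\tau(x) = \begin{cases} x^2, & |x| \le \tau,\\ \tau(2|x| - \tau), & |x| > \tau,\end{cases}\qquad \hat\mu_\tau = \arg\min_{\mu} \sum_{i=1}^n \rho_\tau(Y_i - \mu).$$
Then, for $\tau = \sqrt{n}\,c/t$ with $c \ge \mathrm{SD}(Y)$ (Fan, Li, Wang 17),
$$P\Big(|\hat\mu_\tau - \mu| \ge t\,\frac{c}{\sqrt{n}}\Big) \le 2\exp(-t^2/16),\quad \forall\, t \le \sqrt{n/8},$$
compared with a bound of order $1/t^2$ for the sample mean. [Figure: loss function $l_\tau(u)$ against $u$ for $\tau = 0.5, 1, 2, 3, 4, 5$, together with least squares.]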
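A minimal sketch of the Huber mean estimator defined above. The minimizer is found by bisection on the score $\sum_i \psi_\tau(Y_i - \mu)$, where $\psi_\tau$ clips at $\pm\tau$; bisection is a choice made here for simplicity, not necessarily the authors' algorithm, and the function names are hypothetical.

```python
import math


def huber_loss(x, tau):
    """rho_tau(x): x^2 inside [-tau, tau], linear growth outside."""
    ax = abs(x)
    return x * x if ax <= tau else tau * (2.0 * ax - tau)


def huber_tau(n, c, t):
    """Robustification parameter tau = sqrt(n) * c / t."""
    return math.sqrt(n) * c / t


def huber_mean(ys, tau, iters=200):
    """Minimize sum_i rho_tau(y_i - mu) by bisection on the monotone score."""
    def score(mu):
        # Proportional to minus the derivative of the objective in mu.
        return sum(max(-tau, min(tau, y - mu)) for y in ys)

    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ys = [0.0, 0.0, 0.0, 100.0]            # one gross outlier
print(huber_mean(ys, tau=1.0))         # close to 1/3
print(huber_mean(ys, tau=1e9))         # close to the sample mean 25
```

A small `tau` caps the influence of the outlier, while a very large `tau` recovers least squares, which mirrors the bias and robustness trade-off discussed in the talk.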
18 Adaptive Huber estimation
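19 Adaptive Huber Estimation. Robust covariate adjustments:
$$(\hat\mu, \hat\beta^T)^T \in \arg\min_{\mu,\beta} \sum_{i=1}^n l_\tau(\underbrace{Y_i - \mu - X_i^T\beta}_{\varepsilon_i}).$$
Bahadur representation: if $\tau = \tau_0\sqrt{n}\,(p+t)^{-1/2}$ with $\tau_0 \ge \sigma$, where $\sigma^2 = \mathrm{var}(\varepsilon_i)$, and $X$ is sub-Gaussian, then
$$\sqrt{n}\,\Sigma^{1/2}\begin{bmatrix}\hat\mu - \mu\\ \hat\beta - \beta\end{bmatrix} = \frac{1}{\sqrt{n}}\sum_{i=1}^n l_\tau'(\varepsilon_i)\,\Sigma^{-1/2}\begin{bmatrix}1\\ X_i\end{bmatrix} + R_n(\tau),\qquad P\Big\{\|R_n(\tau)\| > c\,\frac{p+t}{\sqrt{n}}\Big\} \le 8e^{-t}.$$
Results extended to the case with $EX_i^4 \le C$ via truncation.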
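20 A Note on Huber Regression in Low Dimensions. Huber's M-estimator
$$\hat\beta_\tau = \arg\min_{\beta\in\mathbb{R}^d} \underbrace{\frac{1}{n}\sum_{i=1}^n l_\tau(y_i - x_i^T\beta)}_{L_\tau(\beta)}$$
is a sample version of $\beta_\tau^* = \arg\min_{\beta\in\mathbb{R}^d} E L_\tau(\beta)$. Heavy-tailed noise: $v_\delta = \frac{1}{n}\sum_{i=1}^n E|\varepsilon_i|^{1+\delta} < \infty$ for some $\delta \in (0,1]$. Bias: $\|\beta_\tau^* - \beta^*\|_2 \lesssim v_\delta\,\tau^{-\delta}$ for large $\tau$. A larger $\tau$ gives less bias at the cost of robustness.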
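21 Phase Transition and Optimality. Error bound: for any $t > 0$, $\tau_0 \ge v_\delta$, and $\tau = \tau_0\,(n/t)^{\max\{1/(1+\delta),\,1/2\}}$,
$$P\Big\{\big\|S_n^{1/2}(\hat\beta_\tau - \beta^*)\big\|_2 > 4\tau_0\sqrt{p}\,\Big(\frac{t}{n}\Big)^{\min\{\delta/(1+\delta),\,1/2\}}\Big\} \le (2p+1)e^{-t}.$$
[Figure: phase transition of the rate exponent $\min\{\delta/(1+\delta), 1/2\}$ as a function of $\delta$, with the optimal boundary and the impossible region marked.]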
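The phase transition can be made concrete by tabulating the two exponents that appear in the error bound. This is a small sketch under the reading of the slide given above; the function names are hypothetical, and constants such as $\tau_0$ are left as inputs.

```python
def rate_exponent(delta):
    """Exponent of t/n in the error bound: min{delta / (1 + delta), 1/2}."""
    return min(delta / (1.0 + delta), 0.5)


def adaptive_tau(tau0, n, t, delta):
    """tau = tau0 * (n / t)^max{1 / (1 + delta), 1/2}."""
    return tau0 * (n / t) ** max(1.0 / (1.0 + delta), 0.5)


# The exponent rises with delta and saturates at 1/2 once delta >= 1,
# that is, once the noise has a finite second moment.
for delta in (0.25, 0.5, 1.0, 2.0):
    print(delta, rate_exponent(delta), adaptive_tau(1.0, 100, 1.0, delta))
```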
22 Remarks. The same behavior as in the sub-Gaussian case when the variance is finite. The results extend to random designs with sub-Gaussian tails. The Bahadur representation holds with exponential concentration. The results extend to high-dimensional settings with an $L_1$ penalty.
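23 Dependence-adjusted Test. $T = \sqrt{n}\,(\hat\mu_\tau - \mu)/\sigma$, with $\tau = \tau_0\sqrt{n}\,(p + w_n)^{-1/2}$. Cramér-type deviation: if $E|\varepsilon|^3 < \infty$, $w_n \to \infty$ and $w_n = o(\sqrt{n})$, then
$$\sup_{0 \le z \le o\{\min(\sqrt{w_n},\ \sqrt{n}\,w_n^{-1})\}} \Big|\frac{P(|T| \ge z)}{2 - 2\Phi(z)} - 1\Big| \to 0.$$
Taking $\tau \asymp n^{1/3}$ gives the optimal result: $P(T_{\tau_n} \ge x)/(1 - \Phi(x)) \to 1$ uniformly over $x \in [0, o(n^{1/6}))$. The rate has also been derived. Extended to heavy-tailed designs. $\sigma$ can be estimated by robust residual variances.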
24 Covariates and Factors-adjusted Multiple Tests
25 Covariate-adjusted Robust Test. Covariate adjustments: $X_{ij} = \mu_j + b_j^T f_i + u_{ij}$ (known factor). For each null, run the robust two-sided test. Let $P_j = 2\Phi(-|T_j|)$ be the P-value for testing $H_{0j}: \mu_j = 0$. Uniform approximation of P-values: under regularity conditions, if $\log p = o\{\min(w_n,\ n w_n^{-2})\}$, then
$$\max_{1\le j\le p} \Big|\frac{P_j}{P_j^{\mathrm{true}}} - 1\Big|\, 1\{P_j^{\mathrm{true}} > \alpha/p\} = o(1)\quad\text{as } n \to \infty.$$
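26 FDP approximation. Total discoveries: $R(z) = \sum_{j=1}^p 1(|T_j| \ge z)$.
$$\mathrm{FDP}(z) = \frac{\sum_{j\in\mathrm{Null}} 1(|T_j| \ge z)}{R(z)},\qquad \mathrm{FDP}_N(z) = \frac{2p_0\Phi(-z)}{R(z)}.$$
Valid approximation of FDP: if $m_p \le p$ and $m_p \to \infty$,
$$\max_{0 \le z \le \Phi^{-1}(1 - m_p/(2p))} \Big|\frac{\mathrm{FDP}(z)}{\mathrm{FDP}_N(z)} - 1\Big| \to 0\quad\text{in probability.}$$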
27 Remarks. (1) The true FDP can be well approximated by the normal distribution after factor adjustments, as if the data were weakly dependent with reduced noise. (2) Verify that the chosen critical value $\hat z_{N,\alpha}$ truly yields FDP at level $\alpha$. (3) The proportion of true nulls $p_0/p$ can be estimated as in Storey (2002): $\hat\pi_0(\lambda) = \frac{1}{(1-\lambda)p}\sum_{j=1}^p 1(P_j > \lambda)$, where $P_j = 2\Phi(-|T_j|)$.
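The quantities on the last two slides (two-sided P-values, total discoveries $R(z)$, the normal approximation $\mathrm{FDP}_N(z)$, and Storey's estimate of the null proportion) can be sketched as follows. The function names and the toy statistics are hypothetical; `NormalDist` is from the Python standard library.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf   # standard normal CDF


def two_sided_pvalue(t):
    """P_j = 2 * Phi(-|T_j|)."""
    return 2.0 * _PHI(-abs(t))


def total_discoveries(stats, z):
    """R(z) = number of statistics with |T_j| >= z."""
    return sum(1 for t in stats if abs(t) >= z)


def fdp_normal(stats, z, p0):
    """FDP_N(z) = 2 * p0 * Phi(-z) / R(z); returns 0 when nothing is rejected."""
    r = total_discoveries(stats, z)
    return 2.0 * p0 * _PHI(-z) / r if r > 0 else 0.0


def storey_pi0(pvalues, lam):
    """Storey (2002) estimate of p0 / p; it is not capped at 1 here."""
    return sum(1 for pv in pvalues if pv > lam) / ((1.0 - lam) * len(pvalues))


stats = [0.0, 3.0, -3.0, 0.5]                 # toy test statistics
pvals = [two_sided_pvalue(t) for t in stats]
pi0_hat = storey_pi0(pvals, lam=0.5)
p0_hat = pi0_hat * len(stats)
print(total_discoveries(stats, 2.0), fdp_normal(stats, 2.0, p0_hat))
```

In practice one would scan `z` and pick the smallest threshold at which `fdp_normal` falls below the target level.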
28 Factor-adjusted Robust Test. Model: $X_i = \mu + B f_i + u_i$, $i \in [n]$, where $f_i$ is unobserved. Robust estimation of realized factors: note that $\bar X_j = \mu_j + b_j^T \bar f + \bar u_j$, $j = 1,\ldots,p$. Cross-sectional regression: if $B$ were known, we estimate $\bar f$ by
$$\hat f \in \arg\min_{f\in\mathbb{R}^d} \sum_{j=1}^p l_\gamma(\bar X_j - b_j^T f),$$
where $\gamma = \gamma(n,p)$ is a robustification parameter.
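A minimal sketch of the cross-sectional Huber regression for the realized factor mean, assuming the loadings `B` are known. It uses iteratively reweighted least squares with weights $\min(1, \gamma/|r_j|)$, which is one standard way to minimize a Huber objective and is an assumption here, not necessarily the authors' solver. All names and toy numbers are hypothetical.

```python
def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def huber_factor_fit(xbar, B, gamma, iters=50):
    """IRLS for argmin_f sum_j l_gamma(xbar_j - b_j^T f)."""
    d = len(B[0])
    f = [0.0] * d
    for _ in range(iters):
        resid = [x - sum(bj[k] * f[k] for k in range(d))
                 for x, bj in zip(xbar, B)]
        # Huber weights: 1 for small residuals, gamma / |r| for large ones.
        w = [1.0 if abs(r) <= gamma else gamma / abs(r) for r in resid]
        A = [[sum(wj * bj[a] * bj[c] for wj, bj in zip(w, B))
              for c in range(d)] for a in range(d)]
        rhs = [sum(wj * bj[a] * x for wj, bj, x in zip(w, B, xbar))
               for a in range(d)]
        f = solve(A, rhs)
    return f


# One factor with true value 3; the first coordinate carries a large nonzero mean.
B_demo = [[1.0], [2.0], [-1.0], [1.0], [2.0], [-1.0]]
xbar_demo = [43.0, 6.0, -3.0, 3.0, 6.0, -3.0]
f_huber = huber_factor_fit(xbar_demo, B_demo, gamma=1.0)
f_ols = huber_factor_fit(xbar_demo, B_demo, gamma=1e9)   # huge gamma gives least squares
print(f_huber, f_ols)
```

The Huber fit stays close to the true factor value because the coordinate with a nonzero mean acts like an outlier in the cross-sectional regression, which is why sparsity of $\mu$ matters for this step.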
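30 FARM Test Statistics.
$$T_j = \sqrt{\frac{n}{\hat\sigma_{jj} - \|\hat b_j\|_2^2}}\;\big(\hat\mu_j - \hat b_j^T \hat f\big),\qquad \hat\sigma_{jj} = \mathrm{RVar}(X_{ij}).$$
Validity of FDP approximation: assume $p \gtrsim n\log(n)$, $\log(p) \ll n^{1/2}$ and that $\mu = (\mu_1,\ldots,\mu_p)^T$ is sparse, among other regularity conditions. Then $|\mathrm{FDP}_N(z) - \mathrm{FDP}(z)| = o_P(1)$ as $n, p \to \infty$.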
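Given estimates of the mean, loadings, factor mean and variance, the test statistic is a one-line computation. The sketch below assumes the form $T_j = \sqrt{n/(\hat\sigma_{jj} - \|\hat b_j\|_2^2)}\,(\hat\mu_j - \hat b_j^T\hat f)$, in which the residual variance is the total variance minus the part explained by the factors; the function name and inputs are hypothetical.

```python
import math


def farm_statistic(n, mu_hat_j, b_j, f_hat, sigma_jj):
    """Factor-adjusted statistic for coordinate j."""
    b_norm_sq = sum(b * b for b in b_j)
    resid_var = sigma_jj - b_norm_sq          # variance left after factor adjustment
    if resid_var <= 0:
        raise ValueError("estimated residual variance must be positive")
    adjusted_mean = mu_hat_j - sum(b * f for b, f in zip(b_j, f_hat))
    return math.sqrt(n / resid_var) * adjusted_mean


# Toy inputs: n = 100, robust mean 0.5, loadings (1, 0), factor mean (0.1, 9), variance 5.
print(farm_statistic(100, 0.5, [1.0, 0.0], [0.1, 9.0], 5.0))   # about 2
```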
31 Estimation of Loading Matrix. $B$ is estimated by the top $d$ unnormalized eigenvectors of a robust covariance matrix estimate of $\mathrm{var}(X)$: (1) Elementwise robust estimator: $\sigma_{ij} = E(X_i X_j) - (EX_i)(EX_j)$. (2) Robust U-type estimator: with $\psi_\tau(u) = (|u| \wedge \tau)\,\mathrm{sign}(u)$,
$$\hat\Sigma_U(\tau) = \binom{n}{2}^{-1}\sum_{j<k} \psi_\tau\Big(\tfrac{1}{2}\|X_j - X_k\|_2^2\Big)\,\frac{(X_j - X_k)(X_j - X_k)^T}{\|X_j - X_k\|_2^2}.$$
FARM-test: choose $\hat z_\alpha$ such that $\mathrm{FDP}_N(\hat z_\alpha; \hat B) = \alpha$. Asymptotic results: validity of such a procedure in FDP control and estimation is proved.
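A minimal sketch of the U-type covariance estimator, written with plain lists and assuming the sum runs over unordered pairs $j < k$. The eigen-decomposition step that turns the matrix into loadings is omitted. Function names and toy data are hypothetical.

```python
import math


def psi(u, tau):
    """psi_tau(u) = min(|u|, tau) * sign(u)."""
    return math.copysign(min(abs(u), tau), u)


def u_type_covariance(X, tau):
    """Average over pairs of psi(||d||^2 / 2) * d d^T / ||d||^2, with d = X_j - X_k."""
    n, p = len(X), len(X[0])
    S = [[0.0] * p for _ in range(p)]
    for j in range(n):
        for k in range(j + 1, n):
            diff = [a - b for a, b in zip(X[j], X[k])]
            sq = sum(v * v for v in diff)
            if sq == 0.0:
                continue                       # identical rows contribute nothing
            scale = psi(sq / 2.0, tau) / sq
            for a in range(p):
                for b in range(p):
                    S[a][b] += scale * diff[a] * diff[b]
    pairs = n * (n - 1) / 2.0
    return [[v / pairs for v in row] for row in S]


X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
print(u_type_covariance(X, float("inf")))   # equals the usual unbiased sample covariance
print(u_type_covariance(X, 1.0))            # pairwise contributions are capped
```

With `tau` set to infinity each pair contributes $\tfrac12 (X_j - X_k)(X_j - X_k)^T$, which is the U-statistic form of the sample covariance, so truncation is the only departure from the classical estimator.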
32 Numerical Studies
33 Model and Methods. Factor model: $X_i = \mu + B f_i + u_i$, $n \in \{100, 150, 200\}$, $p = 500$. $B = (b_{jl})$ IID $U(-1,1)$, $f_i \sim N(0, I_3)$, $u_i \sim N(0, 4I_p)$ or $t_3(0, I_p)$, $\mu_j = 0.5$ for $1 \le j \le 25$ and $\mu_j = 0$ otherwise. Competing methods: (1) FARM-H: FARM-test with adaptive Huber covariance estimator; (2) FARM-U: FARM-test with U-type covariance estimator; (3) FAM: a non-robust counterpart of FARM (sample mean + sample covariance); (4) PFA: principal factor approximation (Fan and Han, 17+); (5) Naive: multiple t-tests ignoring factors.
34 FDP Control. Table: empirical mean absolute error between estimated and oracle FDP (t = 0.01), p = 500. Columns: noise $u_i$, $n$, FARM-H, FARM-U, FAM, PFA, Naive; rows: Normal and $t$ noise. [Table entries missing.] Non-robust methods break down!
35 Power Comparisons. Table: empirical power, p = 500. Columns: noise $u_i$, $n$, FARM-H, FARM-U, FAM, PFA, Naive; rows: Normal and $t$ noise. [Table entries missing.] Little price to pay for robustness!
36 Power Curve under Varying Signal Strength. [Figure: empirical power versus signal strength for $t_3$-distributed noise; curves for FARM-H, FARM-U, FAM, PFA and Naive; x-axis: signal strength; y-axis: empirical power.]
37 Applications to Neuroblastoma Data. German patients diagnosed between 1989 and 2004, aged from 0 to 296 months (median 15 months). Customized oligonucleotide microarray with p = 10, focus on 3-year event-free survival (49 positive and 190 negative). [count missing] and 420 genes, respectively, have kurtosis heavier than $t_5$.
38 Effect of Adjustments and Differentially Expressed Genes. [Figure: before/after panels titled "Negative group before adjustment" and "Negative group after adjustment"; legend: corr > 1/3, corr < -1/3.] At t = 0.05, the FARM-U, FAM and naive methods identify 2128, 1767 and 1131 differentially expressed genes, respectively.
39 Summary. Introduce a simple robust principle. Develop a non-asymptotic Bahadur representation for the adaptive Huber estimator. Demonstrate a phase transition phenomenon. Propose a new factor-adjusted robust multiple test: FARM-test. Verify benefits and conclusions by simulation studies.
40 The End. Thank You. References: A new perspective on robust M-estimation: finite sample theory and applications to dependence-adjusted multiple testing (with W.-X. Zhou, K. Bose & H. Liu), Preprint. FARM-Test: factor-adjusted robust multiple testing with false discovery control (with Y. Ke, Q. Sun & W.-X. Zhou), Preprint, 2017.
TO HOW MANY SIMULTANEOUS HYPOTHESIS TESTS CAN NORMAL, STUDENT S t OR BOOTSTRAP CALIBRATION BE APPLIED? Jianqing Fan Peter Hall Qiwei Yao ABSTRACT. In the analysis of microarray data, and in some other
More information13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process. Strict Exogeneity
Outline: Further Issues in Using OLS with Time Series Data 13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process I. Stationary and Weakly Dependent Time Series III. Highly Persistent
More informationIntroduction to Rare Event Simulation
Introduction to Rare Event Simulation Brown University: Summer School on Rare Event Simulation Jose Blanchet Columbia University. Department of Statistics, Department of IEOR. Blanchet (Columbia) 1 / 31
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationFDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES
FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationBootstrapping high dimensional vector: interplay between dependence and dimensionality
Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang
More informationSimple Linear Regression
Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationRobust Sparse Quadratic Discriminantion. Jianqing Fan
Robust Sparse Quadratic Discriminantion Jianqing Fan Princeton University with Tracy Ke, Han Liu and Lucy Xia May 2, 2014 Outline 1 Introduction 2 Rayleigh Quotient for sparse QDA 3 Optimization Algorithm
More informationAssessing the dependence of high-dimensional time series via sample autocovariances and correlations
Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Johannes Heiny University of Aarhus Joint work with Thomas Mikosch (Copenhagen), Richard Davis (Columbia),
More informationNonlinear time series
Based on the book by Fan/Yao: Nonlinear Time Series Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 27, 2009 Outline Characteristics of
More informationBayesian Models for Regularization in Optimization
Bayesian Models for Regularization in Optimization Aleksandr Aravkin, UBC Bradley Bell, UW Alessandro Chiuso, Padova Michael Friedlander, UBC Gianluigi Pilloneto, Padova Jim Burke, UW MOPTA, Lehigh University,
More informationAsymptotic Equivalence of Regularization Methods in Thresholded Parameter Space
Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space Jinchi Lv Data Sciences and Operations Department Marshall School of Business University of Southern California http://bcf.usc.edu/
More informationJournal Club: Higher Criticism
Journal Club: Higher Criticism David Donoho (2002): Higher Criticism for Heterogeneous Mixtures, Technical Report No. 2002-12, Dept. of Statistics, Stanford University. Introduction John Tukey (1976):
More informationMedian Cross-Validation
Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational
More information