Marginal Screening and Post-Selection Inference

1 Marginal Screening and Post-Selection Inference
Ian McKeague (Columbia University)
August 13, 2017

2 Outline
1. Background on Marginal Screening
2. 2×2 Tables in Case-Control Studies (joint work with Min Qian)
3. Binary Screening Test (BST)
4. Forward Stepwise BST
5. Censored Survival Data (Tzu-Jung Huang, M. and Min Qian)

3 Wald Lecture at JSM 2017 by Candès

4 Panning for gold
Are there any active predictors (in the sense defined by Candès)?
- This is the most relevant question when the signal-to-noise ratio is low (e.g., epidemiology) and the signals, if any, are sparse.
- The knockoff approach furnishes rigorous FDR control (not FWER control):
  - Barber and Candès (2016): knockoff filter for testing associations in a high-dimensional linear model.
  - Candès, Fan, Janson and Lv (2017): "Panning for gold: model-free knockoffs for high-dimensional controlled variable selection."
- Fighting words: "To constrain oneself to marginal testing is to completely ignore the vast modern literature on sparse regression that, while lacking finite-sample Type I error control, has had tremendous success establishing other useful inferential guarantees such as model selection consistency under high-dimensional asymptotics..."

5 The unreasonable effectiveness of marginal testing
Assumption-lean full linear model:
$$Y = \alpha_0 + \sum_{k=1}^{p} \beta_k X_k + \varepsilon,$$
where $\varepsilon$ has mean 0, finite variance, and is uncorrelated with each $X_k$.
The power of marginal testing derives from the following fact: if $(X_1, \ldots, X_p)$ has a non-singular covariance matrix, then
$$H_0\colon \beta_k = 0, \quad k = 1, \ldots, p$$
holds if and only if $Y$ is marginally uncorrelated with each $X_k$.
That is, marginal testing does address the question of whether there are any active predictors in the full model (not the wrong question after all).
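One way to see the equivalence is the following short derivation. The symbols $\Sigma$ (covariance matrix of the predictors), $\beta$ (vector of coefficients) and $c$ (vector of marginal covariances with $Y$) are notation assumed here for illustration; they do not appear on the slide.

```latex
\begin{aligned}
c_j := \mathrm{Cov}(X_j, Y)
  &= \sum_{k=1}^{p} \beta_k \,\mathrm{Cov}(X_j, X_k) + \mathrm{Cov}(X_j, \varepsilon)
   = \sum_{k=1}^{p} \Sigma_{jk}\,\beta_k , \qquad j = 1, \ldots, p, \\
c &= \Sigma \beta , \qquad\text{so}\qquad c = 0 \iff \beta = 0
   \quad\text{whenever } \Sigma \text{ is non-singular.}
\end{aligned}
```

The term $\mathrm{Cov}(X_j, \varepsilon)$ vanishes because $\varepsilon$ is uncorrelated with each predictor, which is the only property of the error that the argument uses.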

6 Marginal screening
Least squares fitting of
$$E(Y \mid X_k) = \alpha_k + \beta_k X_k$$
to each (standardized) predictor $X_k$.
This abuses the notation for $\beta_k$, but it doesn't matter! $Y$ is marginally uncorrelated with $X_k$ if and only if $\beta_k = 0$.
Parameter of interest: $\theta_0 = \beta_{k_0}$, where $k_0 \in \arg\max_{k=1,\ldots,p} |\mathrm{Corr}(X_k, Y)|$.
The Problem: Test whether $\theta_0 \neq 0$ and provide a CI for $\theta_0$.

7 Marginal screening (cont'd)
Test statistics for $H_0\colon \theta_0 = 0$ versus $H_a\colon \theta_0 \neq 0$.
Maximally-selected slope:
$$\hat\theta_n = \frac{\widehat{\mathrm{Cov}}(X_{\hat k_n}, Y)}{\widehat{\mathrm{Var}}(X_{\hat k_n})}, \qquad \hat k_n \in \arg\max_{k=1,\ldots,p} |\widehat{\mathrm{Corr}}(X_k, Y)|.$$
Reject $H_0$ for large $|\hat\theta_n|$.
Equivalently, maximally-selected correlation:
$$\hat\rho_n = \widehat{\mathrm{Corr}}(X_{\hat k_n}, Y).$$
Reject $H_0$ for large $|\hat\rho_n|$.
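The two statistics can be computed directly from data. The following is a minimal sketch in plain Python; the function names (`cov`, `max_selected`) and the toy data are hypothetical, and the sketch only evaluates the statistics, it does not calibrate the test.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance with divisor n (the divisor cancels in slope and correlation)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def max_selected(X, y):
    """X: list of p predictor columns, each of length n; y: response of length n.

    Returns (k_hat, theta_hat, rho_hat): the index maximizing the absolute
    sample correlation, the marginal slope at that index, and the
    correlation at that index.
    """
    corrs = [cov(x, y) / math.sqrt(cov(x, x) * cov(y, y)) for x in X]
    k_hat = max(range(len(X)), key=lambda k: abs(corrs[k]))
    theta_hat = cov(X[k_hat], y) / cov(X[k_hat], X[k_hat])
    return k_hat, theta_hat, corrs[k_hat]

# Toy data (hypothetical): y is an exact linear function of the second predictor.
X = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]]
y = [2 * v + 1 for v in X[1]]
k_hat, theta_hat, rho_hat = max_selected(X, y)
```

Because the same index is selected and then used for estimation, the naive normal approximation for `theta_hat` is not valid, which is the post-selection issue the rest of the talk addresses.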

8 Marginal screening (cont'd)
Has the hallmarks of post-selection inference: non-regular asymptotics at the null hypothesis, unstable behavior in small samples.
How to provide an accurate p-value and CI for $\theta_0$ while preserving the assumption-lean approach? That is, without adding assumptions such as independent normal errors, as needed for conditional testing approaches, say.
- McKeague and Qian (2015, JASA): adaptive resampling test (ART). A modified nonparametric bootstrap provides valid (post-selection) p-values. Forward stepwise ART is used to identify the presence of additional active predictors.
- Luedtke and van der Laan (2017, JASA): regularization approach. Asymptotically normal test statistic, CIs easily constructed, $p$ can grow exponentially with $n$. Restricted to independent errors.

9 2×2 Tables in Case-Control Studies (GWAS)
Motivating Example: Risk Assessment of Cerebrovascular Events (RACE Study, 2017)
- Ongoing case-control study of stroke: over 5,000 imaging-confirmed cases of stroke and 5,000 controls recruited from seven medical centers in Pakistan.
- Subset of 1,220 cases with early-onset stroke (stroke before age 60), and 1,273 controls.
- Genome-wide SNP data available for this subset of subjects.
- Goal: identify novel genetic factors associated with early-onset stroke.
- Few if any known genetic associations (in contrast to Crohn's disease, say, where 30 genes are known).

10 Case-Control Set-Up
Standard unmatched case-control study: $N = M_1$ (cases) $+ M_2$ (controls).
- Binary disease status: $D \in \{\text{case}, \text{control}\}$.
- Binary risk factors: $W_k \in \{\text{exposed}, \text{unexposed}\}$, $k = 1, \ldots, p$.
- Fixed margins.
- Log-odds ratio (instead of correlation) to quantify the association.
Is $D$ significantly associated with any of the risk factors $W_1, \ldots, W_p$?
The ideal approach would be model-free, avoiding the requirement of a high-dimensional logistic regression model.
Claim: the unreasonable effectiveness of marginal screening still holds.

11 Multiple 2×2 Tables
For the $k$-th risk factor, $k = 1, \ldots, p$:

             Cases          Controls              Total
Exposed      X_k            N_1k - X_k            N_1k
Unexposed    M_1 - X_k      X_k + N_2k - M_1      N_2k
Total        M_1            M_2                   N

$X_k \sim$ noncentral hypergeometric, parameterized by the odds ratio.
Odds ratio:
$$\theta_k = \frac{P(D = 1 \mid W_k = 1)/P(D = 0 \mid W_k = 1)}{P(D = 1 \mid W_k = 0)/P(D = 0 \mid W_k = 0)}.$$
Hypotheses: $H_0\colon \theta_1 = \ldots = \theta_p = 1$ versus $H_a\colon$ at least one $\theta_k \neq 1$.

12 Overview of Inference for 2×2 Tables
- Classical chi-squared and Fisher exact tests.
- Mantel-Haenszel test of $H_0\colon \theta_1 = \ldots = \theta_p = 1$ versus $H_a\colon$ at least one $\theta_k \neq 1$: requires all the odds ratios $\theta_k$ to be identical.
- Kou and Ying (1996): asymptotic theory of the empirical log-odds ratio for a single table.
- Kou and Ying (2006) studied the problem of estimating a common odds ratio from sequences of dependent 2×2 tables.
- There is an extensive literature on tests for homogeneity of multiple odds ratios, e.g. Reis, Hirji and Afifi (1999).

13 Existing screening methods
$H_0\colon \theta_1 = \ldots = \theta_p = 1$ versus $H_a\colon$ at least one $\theta_k \neq 1$.
- Marginal p-values: control of FWER using Bonferroni; highly conservative for large $p$.
- Permutation test ($D$ randomly permuted among subjects): heavy computational burden.
- FDR control for lasso (logistic regression) based on knockoffs.
Is there a more powerful, computationally efficient, and model-free approach?

14 Hypotheses
Question: Is $D$ significantly related to any of the risk factors $W_1, \ldots, W_p$?
Define
$$k_0 \in \arg\max_{k=1,\ldots,p} |\log\theta_k| / \sigma_k,$$
where $\sigma_k > 0$ is a prescribed sequence of normalizing constants.
Hypotheses: $H_0\colon \log\theta_0 = 0$ versus $H_a\colon \log\theta_0 \neq 0$, where $\theta_0 = \theta_{k_0}$.

15 Binary Screening Test (BST)
Empirical odds ratio:
$$\hat\theta_k = \frac{X_k (X_k + N_{2k} - M_1)}{(N_{1k} - X_k)(M_1 - X_k)}, \qquad k = 1, \ldots, p.$$
Estimate of $k_0$:
$$\hat k_N \in \arg\max_{k} |\log\hat\theta_k| / \hat\tau_k,$$
where $\hat\tau_k$ is the standard error of $\log\hat\theta_k$.
Test statistic: $T_N = \log\hat\theta_N = \log\hat\theta_{\hat k_N}$.
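A minimal sketch of how the BST statistic could be evaluated from the table counts follows. The slide does not say which standard error is used for the log odds ratio; the sketch assumes the usual large-sample (Woolf) formula, and the function name `bst_statistic` and the example counts are hypothetical. All four cells of every table are assumed to be positive.

```python
import math

def bst_statistic(X, N1, N2, M1):
    """X[k]: exposed cases; N1[k], N2[k]: exposed / unexposed row totals;
    M1: number of cases. Returns (k_hat, T_N)."""
    best = None
    for k in range(len(X)):
        a = X[k]                # exposed cases
        b = N1[k] - X[k]        # exposed controls
        c = M1 - X[k]           # unexposed cases
        d = X[k] + N2[k] - M1   # unexposed controls
        log_or = math.log(a * d / (b * c))
        # Assumption: Woolf standard error of the empirical log odds ratio.
        tau = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        score = abs(log_or) / tau
        if best is None or score > best[0]:
            best = (score, k, log_or)
    _, k_hat, t_n = best
    return k_hat, t_n

# Hypothetical counts: 100 cases, 100 controls, two risk factors.
k_hat, t_n = bst_statistic(X=[50, 60], N1=[100, 100], N2=[100, 100], M1=100)
```

In this example the first table has an empirical odds ratio of 1 and the second of 2.25, so the second factor is selected and `t_n` is `log(2.25)`.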

16 Asymptotic behavior of $\hat\theta_N$ under local alternatives
Local parameterization: $\log\theta^{(N)} = \log\theta^{(0)} + b/\sqrt{N}$, where $\theta = (\theta_1, \ldots, \theta_p)^T$.
Hypotheses: $H_0\colon \log\theta_N = 0$ versus $H_a\colon \log\theta_N \neq 0$, where $\theta_N = \theta_{k_N}$ and $k_N \in \arg\max_k |\log\theta_k| / \sigma_k$.
Theorem. Under regularity conditions,
$$\sqrt{N}\,(\log\hat\theta_N - \log\theta_N) \xrightarrow{d}
\begin{cases}
\sigma_{k_0} Z_{k_0} & \text{if } \theta^{(0)} \neq \mathbf{1}, \\
\sigma_K Z_K + b_K - b_{\bar k} & \text{if } \theta^{(0)} = \mathbf{1},
\end{cases}$$
where
- $(Z_1, \ldots, Z_p)^T \sim N(0, C_X)$, with $C_X$ being the limit of $\mathrm{Corr}(X_1, \ldots, X_p)$;
- $k_0 = \arg\max_k |\log\theta^{(0)}_k| / \sigma_k$, assumed to be unique when $\theta^{(0)} \neq \mathbf{1}$;
- $\bar k = \arg\max_k |b_k| / \sigma_k$, assumed to be unique when $\theta^{(0)} = \mathbf{1}$ and $b \neq 0$; and
- $K = \arg\max_{k=1,\ldots,p} (Z_k + b_k/\sigma_k)^2$.
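A heuristic reading of the second case ($\theta^{(0)} = \mathbf{1}$) may help; it is offered as a sketch, not as the proof. It assumes that the marginal estimates are jointly asymptotically normal, that $\sqrt{N}\,\hat\tau_k \to \sigma_k$, and it writes $\bar k$ for the maximizer of $|b_k|/\sigma_k$.

```latex
\begin{aligned}
\sqrt{N}\,\log\hat\theta_k
  &\approx \sigma_k Z_k + b_k
  && \text{(since } \log\theta^{(N)}_k = b_k/\sqrt{N}\text{)}, \\
\frac{\log\hat\theta_k}{\hat\tau_k}
  &\approx Z_k + \frac{b_k}{\sigma_k}
  && \Longrightarrow\ \hat k_N \approx K = \arg\max_k \,(Z_k + b_k/\sigma_k)^2, \\
\sqrt{N}\,\log\theta_N
  &= b_{\bar k}
  && \text{(the selected true parameter under the local model)}, \\
\sqrt{N}\,(\log\hat\theta_N - \log\theta_N)
  &\approx \sigma_K Z_K + b_K - b_{\bar k}.
\end{aligned}
```

The limit is non-normal because the index $K$ is itself random, which is the non-regularity at the null mentioned earlier in the talk.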

17 Key regularity condition
Kou and Ying (1996) established the (marginal) representation
$$X_k \overset{d}{=} \sum_{s=1}^{M_1} I\!\left(\eta_{sk} \le (1 + \theta_k^{-1}\lambda_{sk})^{-1},\ s \le N_{1k}\right),$$
where $\eta_{sk} \overset{iid}{\sim} \mathrm{Unif}(0, 1)$ and $-\lambda_{sk} \le 0$ are the roots of the Jacobi polynomial
$$\varphi_k(z) = \sum_{u=\max(0,\,M_1 - N_{2k})}^{\min(M_1,\,N_{1k})} \binom{N_{1k}}{u}\binom{N_{2k}}{M_1 - u} z^u.$$
We need to assume this representation holds jointly over all $k = 1, \ldots, p$.

18 Calibration of BST
Calibrate under the null $\theta_N = 1$, that is, $\theta^{(0)} = \mathbf{1}$ and $b = 0$.
- Only need to estimate the distribution of $\sigma_K Z_K$, where $K = \arg\max_{k=1,\ldots,p} Z_k^2$.
- $\sigma_k^2$ can be consistently estimated by $N\hat\tau_k^2$, where $\hat\tau_k$ is the standard error of $\log\hat\theta_k$.
- $(Z_1, \ldots, Z_p)^T \sim N(0, C_X)$, with $C_X$ consistently estimated by the sample correlation matrix of the vector of risk factors $(W_1, \ldots, W_p)$ restricted to the data on $D = 1$.
- Draw from the estimated null distribution of $\log\hat\theta_N$ using Monte Carlo simulations to obtain critical values.
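The calibration steps above can be sketched as a small Monte Carlo routine. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the rejection region is taken to be two-sided in $T_N$, the inputs `sigma` and `C` stand for the estimates of $\sigma_k$ and $C_X$, and `C` is assumed to be positive definite.

```python
import math
import random

def cholesky(C):
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    p = len(C)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def bst_critical_value(sigma, C, N, level=0.05, draws=2000, seed=0):
    """Approximate upper critical value for |T_N| under the null.

    Each draw simulates Z ~ N(0, C), picks K = argmax Z_k^2, and records
    |sigma_K Z_K| / sqrt(N), the null approximation to |log theta_hat_N|.
    """
    rng = random.Random(seed)
    L = cholesky(C)
    p = len(sigma)
    stats = []
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [sum(L[i][m] * e[m] for m in range(i + 1)) for i in range(p)]
        K = max(range(p), key=lambda k: z[k] ** 2)
        stats.append(abs(sigma[K] * z[K]) / math.sqrt(N))
    stats.sort()
    return stats[math.ceil((1 - level) * draws) - 1]

# Hypothetical inputs: one factor versus ten independent factors.
identity10 = [[1.0 if i == j else 0.0 for j in range(10)] for i in range(10)]
cv_one = bst_critical_value([1.0], [[1.0]], N=1)
cv_ten = bst_critical_value([1.0] * 10, identity10, N=1)
cv_one_n100 = bst_critical_value([1.0], [[1.0]], N=100)
```

With a single factor the critical value is close to the usual normal quantile, and it grows with the number of factors screened, which is the price of selection that the calibration accounts for.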

19 Confidence intervals for $\log\theta_0$
Three possibilities:
- $\mathrm{CI}_0$: use the same Monte Carlo calibration as BST.
- $\mathrm{CI}_{\max}$: use the most conservative critical values given by the limiting distributions in the Theorem as a function of $b$, with the $\hat k_N$-th component of $b$ allowed to vary freely and all its other components set to zero.
- $\mathrm{CI}_{\mathrm{boot}}$: select the value of $b_{\hat k_N}$ on the grid that provides the best agreement with the nominal 95% level in terms of the coverage of a bootstrapped version of $\log\hat\theta_N$.
Only $\mathrm{CI}_{\mathrm{boot}}$ works well.

20 Simulation studies
Three scenarios:
A) Null: $W_k \sim \mathrm{Ber}(0.5)$ for $k = 1, \ldots, p$;
B) Alternative (weak dense signal): $W_k \sim \mathrm{Ber}(0.6)$, $k = 1, \ldots, p/2$, and $W_k \sim \mathrm{Ber}(0.5)$, $k = p/2 + 1, \ldots, p$, for cases; $W_k \sim \mathrm{Ber}(0.5)$, $k = 1, \ldots, p/2$, and $W_k \sim \mathrm{Ber}(0.6)$, $k = p/2 + 1, \ldots, p$, for controls;
C) Alternative (strong sparse signal): $W_1 \sim \mathrm{Ber}(0.65)$, $W_2 \sim \mathrm{Ber}(0.6)$, and $W_3 \sim \mathrm{Ber}(0.55)$ for cases; $W_k \sim \mathrm{Ber}(0.4)$, $k = 1, 2, 3$, for controls; and $W_k \sim \mathrm{Ber}(0.5)$, $k = 4, \ldots, p$.
$N = 200$ with $M_1 = M_2 = 100$. Varying $p$ from 10 to 400.
Three correlation structures: independent, exchangeable, AR(1).
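As an illustration of the data-generating step, the following sketch draws one data set from scenario C with independent risk factors and tabulates the counts of the 2×2 tables. The function names are hypothetical, and the exchangeable and AR(1) structures are not covered.

```python
import random

def simulate_scenario_c(p, m1=100, m2=100, seed=0):
    """Independent Bernoulli risk factors under scenario C (requires p >= 3)."""
    rng = random.Random(seed)
    case_probs = [0.65, 0.60, 0.55] + [0.5] * (p - 3)
    control_probs = [0.4, 0.4, 0.4] + [0.5] * (p - 3)

    def draw(probs):
        return [int(rng.random() < q) for q in probs]

    cases = [draw(case_probs) for _ in range(m1)]
    controls = [draw(control_probs) for _ in range(m2)]
    return cases, controls

def table_counts(cases, controls):
    """Per-factor counts: X_k (exposed cases), N_1k (exposed), N_2k (unexposed)."""
    p = len(cases[0])
    n_total = len(cases) + len(controls)
    X = [sum(row[k] for row in cases) for k in range(p)]
    N1 = [X[k] + sum(row[k] for row in controls) for k in range(p)]
    N2 = [n_total - n1 for n1 in N1]
    return X, N1, N2

cases, controls = simulate_scenario_c(p=10)
X, N1, N2 = table_counts(cases, controls)
```

The resulting `X`, `N1` and `N2` are the per-factor counts that enter the empirical odds ratios of the BST.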

21 Independent risk factors

Model   p     BST          Bonferroni     Permutation
A       ...   ... (0.5)    3.1 (0.15)     4.7 (10.7)
        ...   ... (0.9)    1.9 (0.06)     4.5 (51)
        ...   ... (1.6)    2.7 (0.1)      4.7 (102)
        ...   ... (2.1)    3.6 (0.2)      5.6 (193)
        ...   ... (4.3)    2.3 (0.4)      4.6 (389)
B       ...   ... (0.5)    51.0 (0.02)    60.3 (11)
        ...   ... (1.0)    64.0 (0.06)    81.0 (55)
        ...   ... (1.6)    73.0 (0.1)     85.0 (102)
        ...   ... (2.4)    85.7 (0.2)     91.9 (200)
        ...   ... (4.3)    86.1 (0.4)     96.3 (387)
C       ...   ... (0.6)    90.9 (0.02)    93.6 (11)
        ...   ... (1.0)    68.5 (0.06)    77.5 (51)
        ...   ... (1.4)    64.0 (0.1)     71.3 (100)
        ...   ... (2.2)    54.8 (0.2)     60.4 (200)
        ...   ... (4.0)    42.7 (0.4)     54.3 (388)

Table: Empirical rejection rates (%) over 1,000 Monte Carlo iterations and, in parentheses, average runtime (seconds) per iteration when $W_k$, $k = 1, \ldots, p$, are independent.

22 AR(1) risk factors
Table (columns: Model, p, BST, Bonferroni, Permutation; models A, B, C): Empirical rejection rates (%) with AR(1) correlation structure $\mathrm{Corr}(W_j, W_k) = 0.5^{|j-k|}$.

23 Exchangeable risk factors
Table (columns: Model, p, BST, Bonferroni, Permutation; models A, B, C): Empirical rejection rates (%) based on 1,000 samples generated from models A, B and C with exchangeable correlation structure $\mathrm{Corr}(W_j, W_k) = 0.5$ for $j \neq k$.

24 Performance Comparisons
Simulation studies show:
- BST has good control of the type I error rate, while consistently maintaining the highest power compared with the Bonferroni and permutation test approaches.
- The advantage of BST is most evident when the 2×2 tables are highly correlated.
- BST is 10 times slower than Bonferroni (due to the computationally intensive simulation step).
- BST is 100 times faster than the permutation test (using 1,000 permutations and 1,000 Monte Carlo draws).
- BST is 1,000 times faster than ART (our marginal screening test based on linear regression, which needs the double bootstrap).

25 Forward Stepwise BST
Run BST. If a significant risk factor is found (say $\hat k_N$), then:
1. Split the data on the remaining risk factors into two collections of $p - 1$ tables: exposed or unexposed to $\hat k_N$.
2. For each of the remaining $p - 1$ risk factors, calculate the Mantel-Haenszel OR estimate $\hat\theta_k$ and the standard error of the log-OR, $\hat\tau_k$, from each pair of the 2×2 tables. This yields a new test statistic $T_N$.
3. Estimate the null distribution with the new correlation matrix $C_X$, estimated by the $(p - 1) \times (p - 1)$ submatrix of the original estimate of $C_X$ excluding the entries involving $\hat k_N$.
Repeat steps 1-3 until no more significant risk factors are found.
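Step 2 relies on the Mantel-Haenszel estimate of a common odds ratio across strata (here, the two strata defined by exposure to the selected factor). A minimal sketch of the point estimate follows; the function name and the example counts are hypothetical, and the standard error of the log-OR is not computed here.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio estimate.

    tables: iterable of (a, b, c, d) = (exposed cases, exposed controls,
    unexposed cases, unexposed controls), one tuple per stratum.
    """
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Hypothetical pair of tables for one remaining risk factor:
# stratum 1 = subjects exposed to the selected factor, stratum 2 = unexposed.
mh = mantel_haenszel_or([(60, 40, 40, 60), (30, 20, 20, 30)])
```

Both strata in this example have an odds ratio of 2.25, so the pooled estimate is also 2.25.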

26 Example: RACE Study
$N = 2{,}493$ with 1,220 cases and 1,273 controls.
$p = 2{,}000$: first 2,000 genetic variants on chromosome 5.

27 Example: RACE Study (cont'd)
[Figure: 95% confidence intervals based on $\mathrm{CI}_{\mathrm{boot}}$.]

28 Censored survival data (Huang, M, Q, 2017, Statistica Sinica, submitted)
Only observe $\tilde Y = \min(Y, C)$, $\delta = 1\{Y \le C\}$, and $X = (X_1, \ldots, X_p)$, where $C$ is independent of $X$ and $\varepsilon$.
ART extends using a synthetic response $Y_S$ in place of $Y$.
Koul, Susarla and Van Ryzin (1981): linear regression based on
$$Y_S = \frac{\delta \tilde Y}{S(\tilde Y)},$$
where $S(t) = P(C > t)$ is the survival function of $C$, with plug-in of the Kaplan-Meier (K-M) estimator of $S$.
Correlations preserved: $\mathrm{Corr}(Y_S, X_k) = \mathrm{Corr}(Y, X_k)$ for all $k$, so the unreasonable effectiveness of marginal screening still applies.
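The synthetic response can be sketched as follows. The function names are hypothetical, the Kaplan-Meier estimate of the censoring survival function is obtained by treating censoring ($\delta = 0$) as the event, and the sketch assumes there are no ties between censored and uncensored times.

```python
def km_censoring_survival(times, delta):
    """Kaplan-Meier estimate of S(t) = P(C > t); censoring (delta == 0) is the event.

    Returns a function t -> S_hat(t), a right-continuous step function.
    """
    event_times = sorted(set(t for t, d in zip(times, delta) if d == 0))
    steps = []
    s = 1.0
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)
        events = sum(1 for t, d in zip(times, delta) if t == u and d == 0)
        s *= 1.0 - events / at_risk
        steps.append((u, s))

    def S(t):
        val = 1.0
        for u, s_u in steps:
            if u <= t:
                val = s_u
            else:
                break
        return val

    return S

def synthetic_response(times, delta):
    """Y_S = delta * Y_tilde / S_hat(Y_tilde); censored observations get 0."""
    S = km_censoring_survival(times, delta)
    return [t / S(t) if d else 0.0 for t, d in zip(times, delta)]

# Hypothetical data: the second observation is censored at time 2.
y_s = synthetic_response([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
y_uncensored = synthetic_response([1.0, 2.0, 3.0], [1, 1, 1])
```

Uncensored observations after the censoring time are inflated by the inverse of the estimated censoring survival probability, which compensates for the censored observations being set to zero; without censoring the synthetic response equals the observed one.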

29 Selected References
- McKeague, I. W. and Qian, M. (2017). Marginal screening of 2×2 tables in large-scale case-control studies. In preparation for Biometrics.
- McKeague, I. W. and Qian, M. (2015). An adaptive resampling test for detecting the presence of significant predictors (with discussion). JASA 110.
- Wang, H. Judy, McKeague, I. W. and Qian, M. (2017). Testing for marginal linear effects in quantile regression. JRSS-B, to appear.
- Huang, T.-J., McKeague, I. W. and Qian, M. (2017). Marginal screening for high-dimensional predictors of survival outcomes. Submitted to Statistica Sinica.


More information
