Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Size: px
Start display at page:

Download "Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)"

Transcription

1 Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse

2 Meta-Analysis Today s outline P-value based methods. Fixed effect model. Meta vs. pooled analysis. Random effects model. Meta vs. pooled analysis New random effects analysis

3 Meta-Analysis Single study is under-powered because the effect of common variant is very modest (OR 1.4) Meta-analysis is an effective way to combine data from multiple independent studies

4 Meta-Analysis Methods P-value based Regression coefficient based Fixed effects model Random effects model

5 P-value Based Conduct meta-analysis using p-values Simple and widely used Fisher and Z -score based methods

6 P-value Based K studies T Fisher = Simple and works well K 2 log(p k ) χ 2 2K k=1 Direction of effect is not considered

7 P-value Based Stouffer s Z -score (one-sided right-tailed) Z k = Φ(1 p k ) Φ is the standard cumulative normal distribution function Z = K k=1 Z k K Z follows the standard normal distribution.

8 Fisher versus Z -score These two are not perfectly linear, but they follow a highly linear relationship over the range of Z values most observed (Z 1).

9 P-value Based Z -score can incorporate direction of effects and weighting scheme βk : effect for study k w k = n k Z k = Φ(1 p k /2) sign(β k ) K k=1 Z = w kz k N(0, 1) K k=1 w 2 k

10 P-value Based Easy to use Z -score is perhaps more popular as it works well to find a consistent signal from studies Cannot estimate the effect size

11 Regression Coefficient Based For each study k = 1,..., K g{e(y ik )} = X ik α k + G ik β k Estimate β k and its standard error, ( β k, σ k ) β k N(β k, σ 2 k )

12 Fixed Effect Model Assume β 1 = = β K Effect size estimation β = K k=1 w k β k K k=1 w k n( β β) N ( 0, n k w 2 k σ2 k ( k w k) 2 w k = 1 gives the minimal variance of β var( β k ) )

13 Fixed Effect Model Assumes all studies in the analysis have the same effect Each study can be considered as a random sample drawn from the population with true parameter value β. There is between-study heterogeneity Different definitions of phenotypes. Effect size may be higher (or lower) in certain subgroups (e.g., age, sex).

14 Assess Heterogeneity Cochran s Q test K Q = w k ( β k β) 2 k=1 Q should be large if there is heterogeneity. Under the null hypothesis of no heterogeneity Q χ 2 K 1 Under powered when there are fewer studies.

15 Assess Heterogeneity Measure the % of total variance explained by the between-study heterogeneity (Higgins and Thompson 2002) I 2 Q (K 1) = 100% Q > 50% indicates large heterogeneity Intuitive interpretation, simple to calculate, and can be accompanied by an uncertainty interval

16 An Example

17 Meta- vs. Pooled-Analysis Meta analysis: Combining summary statistics (e.g., β k ) of studies Pooled analysis: Combining original or individual-level of all studies

18 Relative Efficiency For k = 1,, K studies g{e(y ki )} = α k + β T X ki β is common to all K studies; α k is specific to the kth study The maximum likelihood estimator (MLE) β k for the kth study by maximizing n k L k = sup αk f (Y ki, X ki ; β, α k ) i=1

19 Relative Efficiency β is MLE of β by maximizing K n k L = sup {α1,,α K } f (Y ki, X ki ; β, α k ) k=1 i=1 K n k = sup αk f (Y ki, X ki ; β, α k ) = k=1 K k=1 L k i=1

20 Relative Efficiency Information Recall var( β) = 1 I(β) I(β) = 2 K β 2 log L = 2 β 2 log L k = var( β) = K k=1 k=1 1 1 var(β k ) = K I k (β) k=1 1 K k=1 I k(β) Using summary statistics has the same asymptotic efficiency as using original data, if β is the only common parameter across studies.

21 Relative Efficiency Common nuisance parameters, say γ. ( ) ( ) IK I k = ββ I K βγ Iββ I I = βγ I K β I K γγ { K var( β) = (I kββ I kβγ I 1 k=1 = (I ββ K k=1 kγγ I kγβ) I kβγ I 1 kγγ I kγβ) 1 I γβ I γγ } 1 (I ββ I βγ I 1 γγ I γβ ) 1 = var( β) Equality holds if and only of var(β k ) 1 cov( β k, γ k ) are the same among the K studies.

22 Random Effects Model Under the fixed effect model, β 1 = = β K = β The random effects model β k = β + ξ k (k = 1,, K ) ξ k N(0, τ 2 )

23 Random Effects Model Estimation of τ 2 DerSimonian & Laird (1986) method-of-moments estimator V k = var( β k β k ) var( β k ) = V k + τ 2 τ 2 = V 1 K Q (K 1) V 2 K / V 1 K Estimate β with inverse variance estimator ŵ k = var( β k ) SE( β) = β = K k=1 ŵk β k K k=1 ŵk 1 k ŵk

24 Random Effects Model Fixed effect model is more powerful, but ignores the heterogeneity between studies. Random effects model is probably more robust, but is under powered. The confidence intervals have poor coverage for small and moderate K.

25 Meta- vs Pooled-Analysis The maximum likelihood estimator β from pooled data can be obtained by maximizing K k=1 ) 1 f k (Y ki X ki, β + ξ k ) ( exp ξ2 k 2πτ 2 2τ 2 dξ k Challenging to establish the theoretical properties of β and β under the random effects model because n k K.

26 Asymptotic Distributions (Zeng and Lin 2015) Assumptions: For k = 1,, K, n k = π k n for some constant π k within a compact interval (0, ). τ 2 = 1 n σ2, σ 2 is a constant. The between-study variability is comparable to the within-study variability. Asymptotic distribution of MLE β n( β β0 ) d ( K n τ 2 d Ã; n K Vk v k k=1 π k v k + π k A ) K k=1 π 1/2 k v k + π k A Z k Z k are independently distributed N(0, v k + π k σ 2 0 ). A is a complicated form that involves {Z k }. It is a mixture of normal random variables with the mixing probabilities both being random and correlated with the normal random variables.

27 Asymptotic Distributions Asymptotic distribution of β n τ 2 d  n( β β0 ) d ( K k=1 π k v k + π k  ) 1 K k=1 π 1/2 k Z k v k + π k  As n, K, Kn 1/2 0, var( β) var( β), i.e., the weighted average estimator β is at least as efficient as the MLE When K = 100, 200, 400, the empirical relative efficiency is 1.037, 1.083, and Perform statistical inference for β and β based on asymptotic distributions using resampling techniques Or simpler, use the profile likelihood to construct 95% confidence intervals (Hardy and Thompson, 1996)

28 95% Coverage Probabilities Solid: Zeng & Lin; Dashed: DerSimonian-Laird; dotted Jackson-Bowden resampling method; dot-dash: Hardy-Thompson profile method. Left: K=10 studies; Right: K=20

29 Testing with random effects model Random effects (RE) model gives less significant p values than the fixed effects model when the variants show varying effect sizes between studies Ironic because RE is designed specifically for the case in which there is heterogeneity All associations identified RE are usually identified by the fixed effect model Causal variants showing high between-study heterogeneity might not be discovered by either method

30 Revisit the Traditional RE First step: estimate the effect size and CI by taking heterogeneity into account. Second step: normalize β/se( β) and translate it into p-value. Effectively, RE assumes heterogeneity under the null hypothesis, i.e., β k N(0, σ 2 + τ 2 ). There should not be heterogeneity under the null because β 1 = = β K = 0.

31 New RE New null hypothesis H 0 : β = 0, τ 2 = 0. Model β = ( β 1,, β K ). β k N(0, σ 2 ) β follows a multivariate normal distribution β MVN(β1, Σ) The likelihood ratio test statistic. Σ = V + τ 2 I, V = diag(v 1,, V K ). S new = 2 log L 1 L 0 = L 1 = K k=1 K k=1 L 0 1 2πVk exp ( β 2 k 2V k ) ( 1 2π(Vk + τ 2 ) exp ( β k β) 2 2(V k + τ 2 ) )

32 New RE Basically we test both fixed and random effects together S new = S β + S τ 2 S β is equal to the Z-statistic based on the fixed effect model, and S τ 2 is the test statistic for testing τ 2 = 0 β is unconstrained, but τ 2 0. Hence, S new 1 2 χ χ2 0. However, asymptotics cannot be applied because of small K. The asymptotic p-value is conservative because of the tail of asymptotic distribution is thicker than that of the true distribution at the tail Resampling approach should be used to calculate p-values

33 Power Comparison

34 Interpretation and Prioritization In the usual meta-analysis where one collects similar studies and expects the common effect, the results found by the fixed effects model should be the top priority. An association showing large heterogeneity requires careful investigation (e.g., different pathways, LD pattern, different study populations). Effect size estimate and CI is the same as those in the current RE, but may be inconsistent between wide CI and statistically significant results.

35 Comparison between meta- vs pooled-analysis Meta Pooled Logistic Easy Difficult Time-efficient Cheap Bias Time-consuming Costly Publication Possible Unlikely Exposure assessment May not be comparable Comparable Confounders May not be consistent Consistent Efficiency Relatively large data Similar Similar Sparse data May be unstable Better Complex analysis (e.g. machine learning) Difficult Easy Long-term benefit Always requires coordination Better

36 Meta- vs pooled-analysis for sparse data β is unstable if data are sparse in each study. However, if the interest is only on testing H 0 : β = 0, there are ways to combine the studies with only summary statistics. Recall the score test: [ [ 0)] β β=0 log L(β, α I(β = 0 α 0 ) 0)] 1 β β=0 log L(β, α χ 2 1 I(β = 0 α 0 ) = { I ββ I βα I 1 ααi αβ } β=0, α0 [ ] β log L(β, α 0) β=0 = [ K k=1 β log L β=0 k(β, α 0 )] I ββ β=0, α0 = K β=0, α0. Similar for I βα, I αα, and I αβ k=1 I kββ

37 Summary P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

38 Recommended Reading DerSimonian R & Laird N (1986). Meta-analysis in clinical trials. Contr Clin Trials 7: Han B and Eskin E (2011). Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet 88: Hardy RJ & Thomson SG (1996) A likelihood approach to meta-analysis with random effects. Stat in Med 30: Lin DY and Zeng D (2010). On the relative efficiency of using summary statistics vs. individual level data in meta-analysis. Biometrika 97: Zeng D and Lin DY (2015). On random-effects meta-analysis. Biometrika 102:

On the relative efficiency of using summary statistics versus individual level data in meta-analysis

On the relative efficiency of using summary statistics versus individual level data in meta-analysis On the relative efficiency of using summary statistics versus individual level data in meta-analysis By D. Y. LIN and D. ZENG Department of Biostatistics, CB# 7420, University of North Carolina, Chapel

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Review We have covered so far: Single variant association analysis and effect size estimation GxE interaction and higher order >2 interaction Measurement error in dietary variables (nutritional epidemiology)

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic

An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic Kulinskaya and Dollinger TECHNICAL ADVANCE An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic Elena Kulinskaya 1* and Michael B Dollinger 2 * Correspondence: e.kulinskaya@uea.ac.uk

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Maximum-Likelihood Estimation: Basic Ideas

Maximum-Likelihood Estimation: Basic Ideas Sociology 740 John Fox Lecture Notes Maximum-Likelihood Estimation: Basic Ideas Copyright 2014 by John Fox Maximum-Likelihood Estimation: Basic Ideas 1 I The method of maximum likelihood provides estimators

More information

Confidence Distribution

Confidence Distribution Confidence Distribution Xie and Singh (2013): Confidence distribution, the frequentist distribution estimator of a parameter: A Review Céline Cunen, 15/09/2014 Outline of Article Introduction The concept

More information

A re-appraisal of fixed effect(s) meta-analysis

A re-appraisal of fixed effect(s) meta-analysis A re-appraisal of fixed effect(s) meta-analysis Ken Rice, Julian Higgins & Thomas Lumley Universities of Washington, Bristol & Auckland tl;dr Fixed-effectS meta-analysis answers a sensible question regardless

More information

Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy

Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy Received: 20 October 2016 Revised: 15 August 2017 Accepted: 23 August 2017 DOI: 10.1002/sim.7492 RESEARCH ARTICLE Extending the MR-Egger method for multivariable Mendelian randomization to correct for

More information

Summary and discussion of The central role of the propensity score in observational studies for causal effects

Summary and discussion of The central role of the propensity score in observational studies for causal effects Summary and discussion of The central role of the propensity score in observational studies for causal effects Statistics Journal Club, 36-825 Jessica Chemali and Michael Vespe 1 Summary 1.1 Background

More information

EPSE 594: Meta-Analysis: Quantitative Research Synthesis

EPSE 594: Meta-Analysis: Quantitative Research Synthesis EPSE 594: Meta-Analysis: Quantitative Research Synthesis Ed Kroc University of British Columbia ed.kroc@ubc.ca January 24, 2019 Ed Kroc (UBC) EPSE 594 January 24, 2019 1 / 37 Last time Composite effect

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

Hypothesis testing:power, test statistic CMS:

Hypothesis testing:power, test statistic CMS: Hypothesis testing:power, test statistic The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis, quantitatively, maximal power In order to achieve this

More information

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health Meta-analysis 21 May 2014 www.biostat.ku.dk/~pka Per Kragh Andersen, Biostatistics, Dept. Public Health pka@biostat.ku.dk 1 Meta-analysis Background: each single study cannot stand alone. This leads to

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

8. Instrumental variables regression

8. Instrumental variables regression 8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption

More information

Logistic regression: Miscellaneous topics

Logistic regression: Miscellaneous topics Logistic regression: Miscellaneous topics April 11 Introduction We have covered two approaches to inference for GLMs: the Wald approach and the likelihood ratio approach I claimed that the likelihood ratio

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Power and sample size calculations for designing rare variant sequencing association studies.

Power and sample size calculations for designing rare variant sequencing association studies. Power and sample size calculations for designing rare variant sequencing association studies. Seunggeun Lee 1, Michael C. Wu 2, Tianxi Cai 1, Yun Li 2,3, Michael Boehnke 4 and Xihong Lin 1 1 Department

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Design of Multiregional Clinical Trials (MRCT) Theory and Practice

Design of Multiregional Clinical Trials (MRCT) Theory and Practice Design of Multiregional Clinical Trials (MRCT) Theory and Practice Gordon Lan Janssen R&D, Johnson & Johnson BASS 2015, Rockville, Maryland Collaborators: Fei Chen and Gang Li, Johnson & Johnson BASS 2015_LAN

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017 Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 46 Lecture Overview 1. Variance Component

More information

STAT 135 Lab 2 Confidence Intervals, MLE and the Delta Method

STAT 135 Lab 2 Confidence Intervals, MLE and the Delta Method STAT 135 Lab 2 Confidence Intervals, MLE and the Delta Method Rebecca Barter February 2, 2015 Confidence Intervals Confidence intervals What is a confidence interval? A confidence interval is calculated

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 7 Course Summary To this point, we have discussed group sequential testing focusing on Maintaining

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Biostat Methods STAT 5820/6910 Handout #9a: Intro. to Meta-Analysis Methods

Biostat Methods STAT 5820/6910 Handout #9a: Intro. to Meta-Analysis Methods Biostat Methods STAT 5820/6910 Handout #9a: Intro. to Meta-Analysis Methods Meta-analysis describes statistical approach to systematically combine results from multiple studies [identified follong an exhaustive

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 17. Bayesian inference; Bayesian regression Training == optimisation (?) Stages of learning & inference: Formulate model Regression

More information

Is the cholesterol concentration in blood related to the body mass index (bmi)?

Is the cholesterol concentration in blood related to the body mass index (bmi)? Regression problems The fundamental (statistical) problems of regression are to decide if an explanatory variable affects the response variable and estimate the magnitude of the effect Major question:

More information

Prediction intervals for random-effects meta-analysis: a confidence distribution approach

Prediction intervals for random-effects meta-analysis: a confidence distribution approach Statistical Methods in Medical Research 2018. In press. DOI: 10.1177/0962280218773520 Prediction intervals for random-effects meta-analysis: a confidence distribution approach Kengo Nagashima 1 *, Hisashi

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

More information

Probability and Statistics

Probability and Statistics CHAPTER 5: PARAMETER ESTIMATION 5-0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 5: PARAMETER

More information

Prediction intervals for random-effects meta-analysis: a confidence distribution approach

Prediction intervals for random-effects meta-analysis: a confidence distribution approach Preprint / Submitted Version Prediction intervals for random-effects meta-analysis: a confidence distribution approach Kengo Nagashima 1 *, Hisashi Noma 2, and Toshi A. Furukawa 3 Abstract For the inference

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary Bimal Sinha Department of Mathematics & Statistics University of Maryland, Baltimore County,

More information

Chapter 20: Logistic regression for binary response variables

Chapter 20: Logistic regression for binary response variables Chapter 20: Logistic regression for binary response variables In 1846, the Donner and Reed families left Illinois for California by covered wagon (87 people, 20 wagons). They attempted a new and untried

More information

Likelihood and Fairness in Multidimensional Item Response Theory

Likelihood and Fairness in Multidimensional Item Response Theory Likelihood and Fairness in Multidimensional Item Response Theory or What I Thought About On My Holidays Giles Hooker and Matthew Finkelman Cornell University, February 27, 2008 Item Response Theory Educational

More information

Score test for random changepoint in a mixed model

Score test for random changepoint in a mixed model Score test for random changepoint in a mixed model Corentin Segalas and Hélène Jacqmin-Gadda INSERM U1219, Biostatistics team, Bordeaux GDR Statistiques et Santé October 6, 2017 Biostatistics 1 / 27 Introduction

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS Ying Liu Department of Biostatistics, Columbia University Summer Intern at Research and CMC Biostats, Sanofi, Boston August 26, 2015 OUTLINE 1 Introduction

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Beta-binomial model for meta-analysis of odds ratios

Beta-binomial model for meta-analysis of odds ratios Research Article Received 9 June 2016, Accepted 3 January 2017 Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.7233 Beta-binomial model for meta-analysis of odds ratios

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

University of Bristol - Explore Bristol Research

University of Bristol - Explore Bristol Research Bowden, J., Del Greco M, F., Minelli, C., Davey Smith, G., Sheehan, N., & Thompson, J. (2017). A framework for the investigation of pleiotropy in twosample summary data Mendelian randomization. Statistics

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

A Robust Test for Two-Stage Design in Genome-Wide Association Studies

A Robust Test for Two-Stage Design in Genome-Wide Association Studies Biometrics Supplementary Materials A Robust Test for Two-Stage Design in Genome-Wide Association Studies Minjung Kwak, Jungnam Joo and Gang Zheng Appendix A: Calculations of the thresholds D 1 and D The

More information

A3. Statistical Inference Hypothesis Testing for General Population Parameters

A3. Statistical Inference Hypothesis Testing for General Population Parameters Appendix / A3. Statistical Inference / General Parameters- A3. Statistical Inference Hypothesis Testing for General Population Parameters POPULATION H 0 : θ = θ 0 θ is a generic parameter of interest (e.g.,

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

Causal Inference: Discussion

Causal Inference: Discussion Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement

More information

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

1 Preliminary Variance component test in GLM Mediation Analysis... 3

1 Preliminary Variance component test in GLM Mediation Analysis... 3 Honglang Wang Depart. of Stat. & Prob. wangho16@msu.edu Omics Data Integration Statistical Genetics/Genomics Journal Club Summary and discussion of Joint Analysis of SNP and Gene Expression Data in Genetic

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

Practical considerations for survival models

Practical considerations for survival models Including historical data in the analysis of clinical trials using the modified power prior Practical considerations for survival models David Dejardin 1 2, Joost van Rosmalen 3 and Emmanuel Lesaffre 1

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

More information

Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression

Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression LOGISTIC REGRESSION Regression techniques provide statistical analysis of relationships. Research designs may be classified as eperimental or observational; regression analyses are applicable to both types.

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett

More information

Empirical likelihood-based methods for the difference of two trimmed means

Empirical likelihood-based methods for the difference of two trimmed means Empirical likelihood-based methods for the difference of two trimmed means 24.09.2012. Latvijas Universitate Contents 1 Introduction 2 Trimmed mean 3 Empirical likelihood 4 Empirical likelihood for the

More information

Probabilistic Machine Learning. Industrial AI Lab.

Probabilistic Machine Learning. Industrial AI Lab. Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology

More information

Lecture 4 January 23

Lecture 4 January 23 STAT 263/363: Experimental Design Winter 2016/17 Lecture 4 January 23 Lecturer: Art B. Owen Scribe: Zachary del Rosario 4.1 Bandits Bandits are a form of online (adaptive) experiments; i.e. samples are

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic

More information

Understanding Ding s Apparent Paradox

Understanding Ding s Apparent Paradox Submitted to Statistical Science Understanding Ding s Apparent Paradox Peter M. Aronow and Molly R. Offer-Westort Yale University 1. INTRODUCTION We are grateful for the opportunity to comment on A Paradox

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information