IDR: Irreproducible discovery rate


1 IDR: Irreproducible discovery rate
Sündüz Keleş
Department of Statistics
Department of Biostatistics and Medical Informatics
University of Wisconsin, Madison
April 18, 2017
Stat 877 (Spring 17), 04/11-04/18

2 IDR
Li, Brown, Huang, Bickel (2011). Measuring reproducibility of high-throughput experiments. Annals of Applied Statistics, Vol. 5, No. 3, 1752-1779.

3 IDR
- Developed within the ENCODE project.
- Most ChIP-seq experiments in the ENCODE project have two replicates.
- Peaks are called on each replicate separately.
- Peaks from the two replicates are compared to identify reproducible peaks.
- Different peak callers can report very different numbers of peaks for the same dataset.
Next 5 slides are courtesy of Anshul Kundaje.

4 IDR: Expected rate of irreproducible discoveries
The goal is to limit the expected proportion of peaks that are not reproducible across replicates.

5 Peak calling evaluation
Evaluated multiple peak callers with the Irreproducible Discovery Rate (IDR) model:
- SPP (Anshul Kundaje)
- GEM* (David Gifford Lab)
- PeakSeq (Mark Gerstein Lab)
- MACSv2 (Tao Liu, Shirley Liu)
- MOSAICS*/dPeak (Sündüz Keleş lab)
- Hotspot (Bob Thurman)
- BELT (Victor Jin, Peggy Farnham)
- Scripture (Noam Shoresh)
* sequence-aware
Evaluation datasets:
- CTCF (narrow peaks, high SNR, high-specificity motif, > 50K sites)
- POL2 (narrow and diffuse peaks, high SNR, binds at/near TSSs, > 10K sites)
- GABP (narrow peaks, high SNR, homotypic closely spaced binding events, high-specificity motif, > 5K sites)
- P300 (narrow peaks, high SNR, no sequence motif, enhancer binding)
- ZNF274 (diffuse peaks, low SNR, at ZNF genes, < 1K peaks)


7 Instability of default peak calling thresholds
[Figure: # peaks called by MACS against # peaks called by SPP, shown under default thresholds and under an IDR threshold of 2%; an additional axis is labeled "Number of datasets".]

8 Peak calling evaluation
Evaluation criteria:
- Number of reproducible peak calls (IDR)
- Stability of ranking measures (peak rank scatter plots)
- Precision of binding events (distance from motifs)
- Resolution of closely spaced binding events
- Ability to detect sites for high and low SNR datasets
- Ease of use (speed, memory)
- Backward compatibility
[Figure: peak rank scatter plots of Rep1 peak ranks against Rep2 peak ranks, one panel for a stable ranking measure and one for an unstable ranking measure; points are marked as peaks that pass IDR or peaks that do not pass IDR.]

9 Peak calling evaluation
- Default peak caller thresholds are highly unstable and not comparable across callers.
- IDR greatly stabilizes peak calling thresholds.
[Figure: panels for CTCF, POL2, and ZNF274.]

10 How does it work? A graphical method
$(X_{1,1}, X_{1,2}), \ldots, (X_{n,1}, X_{n,2})$: signal from $n$ regions on 2 replicates.
\[
\Psi_n(t, v) = \frac{1}{n} \sum_{i=1}^{n} I\left( X_{i,1} > x_{(\lceil (1-t)n \rceil),1},\; X_{i,2} > x_{(\lceil (1-v)n \rceil),2} \right),
\]
where $x_{(k),j}$ is the $k$-th order statistic of replicate $j$.
$\Psi_n(t, v)$: proportion of pairs ranked in both the upper $100t\%$ of $X_1$ and the upper $100v\%$ of $X_2$, i.e., the empirical bivariate survival function.
Since consistency is usually considered a symmetric notion, use $\Psi_n(t, t) \equiv \Psi_n(t)$.
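As a minimal sketch of the empirical curve on this slide, the Python function below counts the pairs that fall in the top $\lfloor tn \rfloor$ ranks of both replicates, which matches the strict inequalities in $\Psi_n(t)$ when there are no ties. The function name `correspondence_curve` and the toy scores are illustrative assumptions, not part of the lecture.

```python
import math

def correspondence_curve(x1, x2, t):
    """Empirical Psi_n(t): share of pairs ranked in the top t fraction on both replicates.

    Assumes the scores have no ties, so exactly floor(t * n) values exceed
    the ceil((1 - t) * n)-th order statistic on each replicate.
    """
    n = len(x1)
    k = math.floor(t * n)  # number of top-ranked regions kept per replicate
    if k == 0:
        return 0.0
    top1 = set(sorted(range(n), key=lambda i: x1[i], reverse=True)[:k])
    top2 = set(sorted(range(n), key=lambda i: x2[i], reverse=True)[:k])
    return len(top1 & top2) / n

# Toy data: identical rankings versus completely reversed rankings.
scores = [float(v) for v in range(1, 11)]
same = correspondence_curve(scores, scores, 0.5)
reversed_ranks = correspondence_curve(scores, scores[::-1], 0.5)
```

With identical rankings the top halves coincide, so the value equals t; with reversed rankings the two top halves are disjoint.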

11 How does it work?
Let $R(X_{i,1})$ and $R(X_{i,2})$ be the ranks of $X_{i,1}$ and $X_{i,2}$.
- If $R(X_{i,1})$ and $R(X_{i,2})$ are perfectly correlated for $i = 1, \ldots, n$, then $\Psi(t) = t$ and $\Psi'(t) = 1$.
- If $R(X_{i,1})$ and $R(X_{i,2})$ are independent for $i = 1, \ldots, n$, then $\Psi(t) = t^2$ and $\Psi'(t) = 2t$.

12 How does it work?
If $R(X_{i,1})$ and $R(X_{i,2})$ are perfectly correlated for the top $t_0 n$ observations, $0 < t_0 < 1$, and independent for the remaining $(1 - t_0)n$ observations, then the top $t_0 n$ points fall on a straight line of slope 1 on the curve of $\Psi(\cdot)$ and the remaining $(1 - t_0)n$ points fall on the parabola
\[
\Psi(t) = \frac{t^2 - 2 t t_0 + t_0}{1 - t_0}.
\]
[Figure: correspondence curve.]
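The piecewise curve can be checked numerically. For $t > t_0$ the parabola equals $t_0 + (t - t_0)^2 / (1 - t_0)$: the consistent block contributes $t_0$, and the independent remainder contributes the squared share of its own top ranks. The sketch below, with the hypothetical names `psi_mixed` and `psi_mixed_slope`, evaluates the curve and its derivative under that idealized setting.

```python
def psi_mixed(t, t0):
    """Idealized Psi(t): slope-1 line up to t0, then the parabola from the slide.

    Assumes 0 <= t0 < 1 and 0 <= t <= 1.
    """
    if t <= t0:
        return t
    return (t * t - 2.0 * t * t0 + t0) / (1.0 - t0)

def psi_mixed_slope(t, t0):
    """Derivative Psi'(t): 1 on the consistent part, 2 (t - t0) / (1 - t0) after it."""
    if t <= t0:
        return 1.0
    return 2.0 * (t - t0) / (1.0 - t0)

# Example with an assumed transition point t0 = 0.3.
curve = [(t / 10, psi_mixed(t / 10, 0.3)) for t in range(11)]
```

The curve is continuous at $t_0$ and reaches 1 at $t = 1$, while the slope drops from 1 to 0 at $t_0$; that change in slope is the visual cue for locating the transition.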

13 How to identify $t_0$? Inferring the reproducibility of the signals
A bivariate copula model.
If $(X_1, X_2)$ is a pair of continuous random variables with distribution function $F(\cdot, \cdot)$ and marginals $F_1(\cdot)$ and $F_2(\cdot)$, respectively, then $U_1 = F_1(X_1) \sim U(0, 1)$ and $U_2 = F_2(X_2) \sim U(0, 1)$, and the distribution function of $(U_1, U_2)$ is a copula:
\[
C(u_1, u_2) = \Pr(U_1 \le u_1, U_2 \le u_2) = \Pr\left(X_1 \le F_1^{-1}(u_1),\; X_2 \le F_2^{-1}(u_2)\right) = F\left(F_1^{-1}(u_1), F_2^{-1}(u_2)\right).
\]
Equivalently, $F(x_1, x_2) = C(F_1(x_1), F_2(x_2))$.
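Because the copula depends on the data only through $F_1(X_1)$ and $F_2(X_2)$, rank-based pseudo-observations are unchanged by any strictly increasing rescaling of the scores. The sketch below illustrates this; the `rank / (n + 1)` scaling is a common convention assumed here, and the function names and toy values are hypothetical.

```python
import math

def pseudo_observations(x):
    """Rank-based estimate of F(x_i), scaled to lie strictly inside (0, 1).

    Assumes no ties among the values in x.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def empirical_copula(x1, x2, u1, u2):
    """Share of pairs whose pseudo-observations are <= (u1, u2) coordinate-wise."""
    p1, p2 = pseudo_observations(x1), pseudo_observations(x2)
    hits = sum(1 for a, b in zip(p1, p2) if a <= u1 and b <= u2)
    return hits / len(x1)

raw = [3.2, 0.5, 7.1, 1.4]
rescaled = [math.exp(v) for v in raw]  # strictly increasing transform of the scores
```

Here `pseudo_observations(raw)` and `pseudo_observations(rescaled)` agree, which is why the marginals can be left unspecified when modeling the dependence.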

14 Copulas
$(X_1, \ldots, X_K) \sim F_X$ with marginals $F_{X_1}, \ldots, F_{X_K}$ (here $K = 2$).
The marginals are linked to the joint distribution through a so-called copula function.
Theorem (Sklar '59). There exists at least one copula $C_X$ such that
\[
F_X(x_1, \ldots, x_K) = C_X\left(F_{X_1}(x_1), F_{X_2}(x_2), \ldots, F_{X_K}(x_K)\right).
\]
- The copula $C_X$ is the joint distribution function of the transformed variables $(F_{X_1}(X_1), \ldots, F_{X_K}(X_K))$; it captures the dependence among $X_1, \ldots, X_K$.
- Modeling of the marginals and the copula can be done separately.

15 A copula mixture model
$K_i \sim \mathrm{Bernoulli}(\pi_1)$ denotes whether the $i$-th peak is from the consistent set ($K_i = 1$) or the spurious set ($K_i = 0$).
Given the indicator $K_i$, the dependence between the replicates is induced by $z_1 = (z_{1,1}, z_{1,2})$ if $K_i = 1$ and $z_0 = (z_{0,1}, z_{0,2})$ if $K_i = 0$:
\[
\begin{pmatrix} z_{i,1} \\ z_{i,2} \end{pmatrix} \Bigm| K_i = k \;\sim\; N\left( \begin{pmatrix} \mu_k \\ \mu_k \end{pmatrix}, \begin{pmatrix} \sigma_k^2 & \rho_k \sigma_k^2 \\ \rho_k \sigma_k^2 & \sigma_k^2 \end{pmatrix} \right),
\]
where $\mu_0 = 0$, $\mu_1 > 0$, $\sigma_0^2 = 1$, $\rho_0 = 0$, $0 < \rho_1 < 1$.

16 A copula mixture model
Let
\[
u_{i,1} \equiv G(z_{i,1}) = \pi_1 \Phi\left(\frac{z_{i,1} - \mu_1}{\sigma_1}\right) + \pi_0 \Phi(z_{i,1}),
\]
\[
u_{i,2} \equiv G(z_{i,2}) = \pi_1 \Phi\left(\frac{z_{i,2} - \mu_1}{\sigma_1}\right) + \pi_0 \Phi(z_{i,2}),
\]
where $\Phi$ is the standard Normal cumulative distribution function.
The actual observations are
\[
x_{i,1} = F_1^{-1}(u_{i,1}), \qquad x_{i,2} = F_2^{-1}(u_{i,2}),
\]
where $F_1$ and $F_2$ are the marginal distributions of the two coordinates, which are assumed to be continuous but unknown.
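To make the generative story concrete, the sketch below draws latent pairs from the two-component model and maps them to the uniform scale with $G$. It is a simulation under assumed parameter values ($\pi_1 = 0.6$, $\mu_1 = 2.5$, $\sigma_1 = 1$, $\rho_1 = 0.8$ are hypothetical), and it stops at $u_{i,1}, u_{i,2}$ because $F_1$ and $F_2$ are unknown.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Normal CDF

def G(z, pi1, mu1, sigma1):
    """Marginal CDF of the latent score: mixture of N(mu1, sigma1^2) and N(0, 1)."""
    return pi1 * PHI((z - mu1) / sigma1) + (1.0 - pi1) * PHI(z)

def simulate_latent(n, pi1, mu1, sigma1, rho1, seed=0):
    """Draw (K_i, u_i1, u_i2) for n regions from the copula mixture model."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        k = 1 if rng.random() < pi1 else 0
        # Spurious component: mean 0, variance 1, correlation 0.
        mu, sigma, rho = (mu1, sigma1, rho1) if k == 1 else (0.0, 1.0, 0.0)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = mu + sigma * e1
        z2 = mu + sigma * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        draws.append((k, G(z1, pi1, mu1, sigma1), G(z2, pi1, mu1, sigma1)))
    return draws

sample = simulate_latent(500, pi1=0.6, mu1=2.5, sigma1=1.0, rho1=0.8)
```

Applying any continuous inverse marginals to the `u` values would give observable scores with the same rank structure.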

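17 A copula mixture model
Then, for signal $i$:
\[
\Pr(X_{i,1} \le x_1, X_{i,2} \le x_2) = \pi_0 h_0\left(G^{-1}(F_1(x_1)), G^{-1}(F_2(x_2))\right) + \pi_1 h_1\left(G^{-1}(F_1(x_1)), G^{-1}(F_2(x_2))\right),
\]
where
\[
h_0 \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right), \qquad
h_1 \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_1 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho_1 \sigma_1^2 \\ \rho_1 \sigma_1^2 & \sigma_1^2 \end{pmatrix} \right).
\]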
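On the latent scale, the two components are ordinary bivariate normals, so their densities can be written out directly. The sketch below treats $h_0$ and $h_1$ as densities evaluated at latent scores $(z_1, z_2)$, which is an assumption about how the slide's notation would be used in a likelihood; the function names are hypothetical.

```python
import math

def bvn_pdf(z1, z2, mu, sigma2, rho):
    """Bivariate normal density with common mean mu, common variance sigma2, correlation rho."""
    d1, d2 = z1 - mu, z2 - mu
    one_minus_r2 = 1.0 - rho * rho
    q = (d1 * d1 - 2.0 * rho * d1 * d2 + d2 * d2) / (sigma2 * one_minus_r2)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sigma2 * math.sqrt(one_minus_r2))

def mixture_density(z1, z2, pi1, mu1, sigma1, rho1):
    """pi0 * h0 + pi1 * h1 evaluated at the latent scores (z1, z2)."""
    h0 = bvn_pdf(z1, z2, 0.0, 1.0, 0.0)          # spurious component
    h1 = bvn_pdf(z1, z2, mu1, sigma1 ** 2, rho1)  # consistent component
    return (1.0 - pi1) * h0 + pi1 * h1

# Assumed parameter values, for illustration only.
density_at_origin = mixture_density(0.0, 0.0, pi1=0.6, mu1=2.5, sigma1=1.0, rho1=0.8)
```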

18 A copula mixture model
EM algorithm to estimate $\theta = (\mu_1, \rho_1, \sigma_1, \pi_0)$.
Inference is based on $\Pr(K_i = 1 \mid (x_{i,1}, x_{i,2}); \hat{\theta})$.
Local irreproducible discovery rate:
\[
\mathrm{idr}(x_{i,1}, x_{i,2}) = \Pr(K_i = 0 \mid (x_{i,1}, x_{i,2}); \hat{\theta}).
\]
Then, to control IDR at level $\alpha$:
- Rank the pairs $(x_{i,1}, x_{i,2})$ by their idr values, from smallest to largest.
- Select all $(x_{(i),1}, x_{(i),2})$, $i = 1, \ldots, l$, where
\[
l = \operatorname*{argmax}_i \left\{ \frac{1}{i} \sum_{j=1}^{i} \mathrm{idr}_j \le \alpha \right\}.
\]
Basically, IDR is False Discovery Rate (FDR) control in this specific copula mixture model.
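The selection rule above can be sketched in a few lines once local idr values are available. The code assumes the EM fit has already produced component densities or idr values; `local_idr` and `select_at_idr` are hypothetical names, and the idr values in the example are made up for illustration.

```python
def local_idr(h0, h1, pi0):
    """Posterior probability of the spurious component, given component densities."""
    spurious = pi0 * h0
    return spurious / (spurious + (1.0 - pi0) * h1)

def select_at_idr(idr_values, alpha):
    """Indices of the largest top-ranked set whose average local idr is <= alpha."""
    order = sorted(range(len(idr_values)), key=lambda i: idr_values[i])
    n_selected, running_sum = 0, 0.0
    for count, i in enumerate(order, start=1):
        running_sum += idr_values[i]
        if running_sum / count <= alpha:
            n_selected = count  # keep the largest count that satisfies the bound
    return order[:n_selected]

# Made-up local idr values for five peaks, in their original order.
example_idr = [0.9, 0.01, 0.2, 0.001, 0.02]
kept = select_at_idr(example_idr, alpha=0.05)
```

In this example the three smallest idr values average about 0.010, while adding the fourth raises the average to about 0.058, so three peaks are kept. The running average plays the same role as the estimated FDR of the selected set.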

19 Caveats
- If a peak does not appear on both lists, it is discarded.
- Requires starting with a large set of peaks (consisting of real and spurious peaks). How large should this set be?
- Continuity assumption. There are typically many ties in the scores of peaks.
- R package: Use the latest version from
- How well does the semi-parametric copula model fit the data? Model diagnostics.
- The main strength is that the scores for peaks can be anything (p-values, log-likelihoods, ChIP-to-input enrichment, etc.).

arxiv: v1 [stat.ap] 21 Oct 2011

arxiv: v1 [stat.ap] 21 Oct 2011 The Annals of Applied Statistics 2011, Vol. 5, No. 3, 1752 1779 DOI: 10.1214/11-AOAS466 c Institute of Mathematical Statistics, 2011 MEASURING REPRODUCIBILITY OF HIGH-THROUGHPUT EXPERIMENTS 1 arxiv:1110.4705v1

More information

MEASURING REPRODUCIBILITY OF HIGH-THROUGHPUT EXPERIMENTS 1

MEASURING REPRODUCIBILITY OF HIGH-THROUGHPUT EXPERIMENTS 1 The Annals of Applied Statistics 2011, Vol. 5, No. 3, 1752 1779 DOI: 10.1214/11-AOAS466 Institute of Mathematical Statistics, 2011 MEASURING REPRODUCIBILITY OF HIGH-THROUGHPUT EXPERIMENTS 1 BY QUNHUA LI,

More information

for the Analysis of ChIP-Seq Data

for the Analysis of ChIP-Seq Data Supplementary Materials: A Statistical Framework for the Analysis of ChIP-Seq Data Pei Fen Kuan Departments of Statistics and of Biostatistics and Medical Informatics Dongjun Chung Departments of Statistics

More information

CMARRT: A TOOL FOR THE ANALYSIS OF CHIP-CHIP DATA FROM TILING ARRAYS BY INCORPORATING THE CORRELATION STRUCTURE

CMARRT: A TOOL FOR THE ANALYSIS OF CHIP-CHIP DATA FROM TILING ARRAYS BY INCORPORATING THE CORRELATION STRUCTURE CMARRT: A TOOL FOR THE ANALYSIS OF CHIP-CHIP DATA FROM TILING ARRAYS BY INCORPORATING THE CORRELATION STRUCTURE PEI FEN KUAN 1, HYONHO CHUN 1, SÜNDÜZ KELEŞ1,2 1 Department of Statistics, 2 Department of

More information

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle European Meeting of Statisticians, Amsterdam July 6-10, 2015 EMS Meeting, Amsterdam Based on

More information

Copulas. MOU Lili. December, 2014

Copulas. MOU Lili. December, 2014 Copulas MOU Lili December, 2014 Outline Preliminary Introduction Formal Definition Copula Functions Estimating the Parameters Example Conclusion and Discussion Preliminary MOU Lili SEKE Team 3/30 Probability

More information

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS Ying Liu Department of Biostatistics, Columbia University Summer Intern at Research and CMC Biostats, Sanofi, Boston August 26, 2015 OUTLINE 1 Introduction

More information

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle 2015 IMS China, Kunming July 1-4, 2015 2015 IMS China Meeting, Kunming Based on joint work

More information

Probability Distributions and Estimation of Ali-Mikhail-Haq Copula

Probability Distributions and Estimation of Ali-Mikhail-Haq Copula Applied Mathematical Sciences, Vol. 4, 2010, no. 14, 657-666 Probability Distributions and Estimation of Ali-Mikhail-Haq Copula Pranesh Kumar Mathematics Department University of Northern British Columbia

More information

TECHNICAL REPORT NO. 1151

TECHNICAL REPORT NO. 1151 DEPARTMENT OF STATISTICS University of Wisconsin 1300 University Avenue Madison, WI 53706 TECHNICAL REPORT NO. 1151 January 12, 2009 A Hierarchical Semi-Markov Model for Detecting Enrichment with Application

More information

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

More information

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics Slides vs chalk talk: I m not sure why you chose a chalk talk over ppt. I prefer the latter no issues with readability

More information

MODEL-BASED APPROACHES FOR THE DETECTION OF BIOLOGICALLY ACTIVE GENOMIC REGIONS FROM NEXT GENERATION SEQUENCING DATA. Naim Rashid

MODEL-BASED APPROACHES FOR THE DETECTION OF BIOLOGICALLY ACTIVE GENOMIC REGIONS FROM NEXT GENERATION SEQUENCING DATA. Naim Rashid MODEL-BASED APPROACHES FOR THE DETECTION OF BIOLOGICALLY ACTIVE GENOMIC REGIONS FROM NEXT GENERATION SEQUENCING DATA Naim Rashid A dissertation submitted to the faculty of the University of North Carolina

More information

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics I believe it is helpful to number your slides for easy reference. It's been a while since I took

More information

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data Faming Liang, Chuanhai Liu, and Naisyin Wang Texas A&M University Multiple Hypothesis Testing Introduction

More information

Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests

Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests Noryanti Muhammad, Universiti Malaysia Pahang, Malaysia, noryanti@ump.edu.my Tahani Coolen-Maturi, Durham

More information

Gaussian Process Vine Copulas for Multivariate Dependence

Gaussian Process Vine Copulas for Multivariate Dependence Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,

More information

First steps of multivariate data analysis

First steps of multivariate data analysis First steps of multivariate data analysis November 28, 2016 Let s Have Some Coffee We reproduce the coffee example from Carmona, page 60 ff. This vignette is the first excursion away from univariate data.

More information

Joint Analysis of Multiple ChIP-seq Datasets with the. the jmosaics package.

Joint Analysis of Multiple ChIP-seq Datasets with the. the jmosaics package. Joint Analysis of Multiple ChIP-seq Datasets with the jmosaics Package Xin Zeng 1 and Sündüz Keleş 1,2 1 Department of Statistics, University of Wisconsin Madison, WI 2 Department of Statistics and of

More information

Semi-parametric predictive inference for bivariate data using copulas

Semi-parametric predictive inference for bivariate data using copulas Semi-parametric predictive inference for bivariate data using copulas Tahani Coolen-Maturi a, Frank P.A. Coolen b,, Noryanti Muhammad b a Durham University Business School, Durham University, Durham, DH1

More information

= Prob( gene i is selected in a typical lab) (1)

= Prob( gene i is selected in a typical lab) (1) Supplementary Document This supplementary document is for: Reproducibility Probability Score: Incorporating Measurement Variability across Laboratories for Gene Selection (006). Lin G, He X, Ji H, Shi

More information

Hybrid Copula Bayesian Networks

Hybrid Copula Bayesian Networks Kiran Karra kiran.karra@vt.edu Hume Center Electrical and Computer Engineering Virginia Polytechnic Institute and State University September 7, 2016 Outline Introduction Prior Work Introduction to Copulas

More information

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Cinzia Viroli 1 joint with E. Bonafede 1, S. Robin 2 & F. Picard 3 1 Department of Statistical Sciences, University

More information

The Instability of Correlations: Measurement and the Implications for Market Risk

The Instability of Correlations: Measurement and the Implications for Market Risk The Instability of Correlations: Measurement and the Implications for Market Risk Prof. Massimo Guidolin 20254 Advanced Quantitative Methods for Asset Pricing and Structuring Winter/Spring 2018 Threshold

More information

Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Trivariate copulas for characterisation of droughts

Trivariate copulas for characterisation of droughts ANZIAM J. 49 (EMAC2007) pp.c306 C323, 2008 C306 Trivariate copulas for characterisation of droughts G. Wong 1 M. F. Lambert 2 A. V. Metcalfe 3 (Received 3 August 2007; revised 4 January 2008) Abstract

More information

Lecture 12 April 25, 2018

Lecture 12 April 25, 2018 Stats 300C: Theory of Statistics Spring 2018 Lecture 12 April 25, 2018 Prof. Emmanuel Candes Scribe: Emmanuel Candes, Chenyang Zhong 1 Outline Agenda: The Knockoffs Framework 1. The Knockoffs Framework

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC

More information

Statistics for Differential Expression in Sequencing Studies. Naomi Altman

Statistics for Differential Expression in Sequencing Studies. Naomi Altman Statistics for Differential Expression in Sequencing Studies Naomi Altman naomi@stat.psu.edu Outline Preliminaries what you need to do before the DE analysis Stat Background what you need to know to understand

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

Partial Correlation with Copula Modeling

Partial Correlation with Copula Modeling Partial Correlation with Copula Modeling Jong-Min Kim 1 Statistics Discipline, Division of Science and Mathematics, University of Minnesota at Morris, Morris, MN, 56267, USA Yoon-Sung Jung Office of Research,

More information

3 Comparison with Other Dummy Variable Methods

3 Comparison with Other Dummy Variable Methods Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction

More information

Estimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq

Estimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq Estimating empirical null distributions for Chi-squared and Gamma statistics with application to multiple testing in RNA-seq Xing Ren 1, Jianmin Wang 1,2,, Song Liu 1,2, and Jeffrey C. Miecznikowski 1,2,

More information

Variational Inference with Copula Augmentation

Variational Inference with Copula Augmentation Variational Inference with Copula Augmentation Dustin Tran 1 David M. Blei 2 Edoardo M. Airoldi 1 1 Department of Statistics, Harvard University 2 Department of Statistics & Computer Science, Columbia

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

The Use of Copulas to Model Conditional Expectation for Multivariate Data

The Use of Copulas to Model Conditional Expectation for Multivariate Data The Use of Copulas to Model Conditional Expectation for Multivariate Data Kääri, Meelis; Selart, Anne; Kääri, Ene University of Tartu, Institute of Mathematical Statistics J. Liivi 2 50409 Tartu, Estonia

More information

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,

More information

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models

More information

A measure of radial asymmetry for bivariate copulas based on Sobolev norm

A measure of radial asymmetry for bivariate copulas based on Sobolev norm A measure of radial asymmetry for bivariate copulas based on Sobolev norm Ahmad Alikhani-Vafa Ali Dolati Abstract The modified Sobolev norm is used to construct an index for measuring the degree of radial

More information

Simulating Realistic Ecological Count Data

Simulating Realistic Ecological Count Data 1 / 76 Simulating Realistic Ecological Count Data Lisa Madsen Dave Birkes Oregon State University Statistics Department Seminar May 2, 2011 2 / 76 Outline 1 Motivation Example: Weed Counts 2 Pearson Correlation

More information

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018 LARGE NUMBERS OF EXPLANATORY VARIABLES HS Battey Department of Mathematics, Imperial College London WHAO-PSI, St Louis, 9 September 2018 Regression, broadly defined Response variable Y i, eg, blood pressure,

More information

Mixtures and Hidden Markov Models for analyzing genomic data

Mixtures and Hidden Markov Models for analyzing genomic data Mixtures and Hidden Markov Models for analyzing genomic data Marie-Laure Martin-Magniette UMR AgroParisTech/INRA Mathématique et Informatique Appliquées, Paris UMR INRA/UEVE ERL CNRS Unité de Recherche

More information

By Bhattacharjee, Das. Published: 26 April 2018

By Bhattacharjee, Das. Published: 26 April 2018 Electronic Journal of Applied Statistical Analysis EJASA, Electron. J. App. Stat. Anal. http://siba-ese.unisalento.it/index.php/ejasa/index e-issn: 2070-5948 DOI: 10.1285/i20705948v11n1p155 Estimation

More information

Rater agreement - ordinal ratings. Karl Bang Christensen Dept. of Biostatistics, Univ. of Copenhagen NORDSTAT,

Rater agreement - ordinal ratings. Karl Bang Christensen Dept. of Biostatistics, Univ. of Copenhagen NORDSTAT, Rater agreement - ordinal ratings Karl Bang Christensen Dept. of Biostatistics, Univ. of Copenhagen NORDSTAT, 2012 http://biostat.ku.dk/~kach/ 1 Rater agreement - ordinal ratings Methods for analyzing

More information

Parametric Empirical Bayes Methods for Microarrays

Parametric Empirical Bayes Methods for Microarrays Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions

More information

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros Web-based Supplementary Material for A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with excess Zeros Dimitris Rizopoulos, 1 Geert Verbeke, 1 Emmanuel Lesaffre 1 and Yves

More information

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Bios 6649: Clinical Trials - Statistical Design and Monitoring Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION. Alireza Bayestehtashk and Izhak Shafran

PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION. Alireza Bayestehtashk and Izhak Shafran PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION Alireza Bayestehtashk and Izhak Shafran Center for Spoken Language Understanding, Oregon Health & Science University, Portland, Oregon, USA

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Package GMCM. R topics documented: July 2, Type Package. Title Fast estimation of Gaussian Mixture Copula Models. Version

Package GMCM. R topics documented: July 2, Type Package. Title Fast estimation of Gaussian Mixture Copula Models. Version Package GMCM July 2, 2014 Type Package Title Fast estimation of Gaussian Mixture Copula Models Version 1.0.1 Date 2014-01-30 Author Anders Ellern Bilgrau, Martin Boegsted, Poul Svante Eriksen Maintainer

More information

An Introduction to the spls Package, Version 1.0

An Introduction to the spls Package, Version 1.0 An Introduction to the spls Package, Version 1.0 Dongjun Chung 1, Hyonho Chun 1 and Sündüz Keleş 1,2 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics

More information

A New Generalized Gumbel Copula for Multivariate Distributions

A New Generalized Gumbel Copula for Multivariate Distributions A New Generalized Gumbel Copula for Multivariate Distributions Chandra R. Bhat* The University of Texas at Austin Department of Civil, Architectural & Environmental Engineering University Station, C76,

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

Research Projects. Hanxiang Peng. March 4, Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis

Research Projects. Hanxiang Peng. March 4, Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis Hanxiang Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis March 4, 2009 Outline Project I: Free Knot Spline Cox Model Project I: Free Knot Spline Cox Model Consider

More information

VARIABLE SELECTION AND INDEPENDENT COMPONENT

VARIABLE SELECTION AND INDEPENDENT COMPONENT VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS Richard Samworth University of Cambridge Joint work with Rajen Shah and Ming Yuan My core research interests A broad range of methodological

More information

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Presented August 8-10, 2012 Daniel L. Gillen Department of Statistics University of California, Irvine

More information

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018 Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals BayesMP Zhiguang Huo 1, Chi Song 2, George Tseng

More information

Financial Econometrics and Volatility Models Copulas

Financial Econometrics and Volatility Models Copulas Financial Econometrics and Volatility Models Copulas Eric Zivot Updated: May 10, 2010 Reading MFTS, chapter 19 FMUND, chapters 6 and 7 Introduction Capturing co-movement between financial asset returns

More information

Econometrics Spring School 2016 Econometric Modelling. Lecture 6: Model selection theory and evidence Introduction to Monte Carlo Simulation

Econometrics Spring School 2016 Econometric Modelling. Lecture 6: Model selection theory and evidence Introduction to Monte Carlo Simulation Econometrics Spring School 2016 Econometric Modelling Jurgen A Doornik, David F. Hendry, and Felix Pretis George-Washington University March 2016 Lecture 6: Model selection theory and evidence Introduction

More information
