Using the estimated penetrances to determine the range of the underlying genetic model in case-control design

Georgetown University
From the SelectedWorks of Mark J Meyer
2008
Mark J Meyer, Neal Jeffries, Gang Zheng
Available at: https://works.bepress.com/markjmeyer/4/

Using the estimated penetrances to determine the range of the underlying genetic model in case-control design

Mark Meyer, Neal Jeffries, Gang Zheng*

Office of Biostatistics Research, National Heart, Lung and Blood Institute, 6701 Rockledge Drive, MSC 7913, Bethesda, MD 20892-7913, USA

*Corresponding author

Email addresses:
MM: meyerm@nhlbi.nih.gov
NJ: nealjeff@nhlbi.nih.gov
GZ: zhengg@nhlbi.nih.gov

Abstract

It is well known that penetrances cannot be estimated from retrospective case-control samples without additional assumptions. In the literature, estimation of the penetrance rests on the assumption that either the disease is rare or the disease prevalence is known. We propose an alternative approach that estimates the penetrances by assuming an underlying genetic model, even though that model is unknown. With this assumption, we obtain point estimates of the penetrances as functions of the genetic model, from which the range of underlying genetic models can be determined. We examine the performance of our results under various genetic models using simulation studies and the case-control dataset of GAW16.

Background

Penetrance is a useful parameter in genetic association studies. It is defined as the probability of having a disease given one of the three genotypes of a diallelic marker. Three penetrances are used in the literature, each corresponding to one genotype. Various underlying genetic models are also defined using the penetrances.

In case-control association studies, cases and controls are retrospectively sampled from the study populations. It is known that the penetrances cannot be estimated from retrospective case-control samples unless some additional assumptions are made. In the literature, one often assumes that the disease is rare or that the disease prevalence is known. Under these assumptions, the penetrances can be estimated from the retrospective case-control samples.

We propose an alternative approach that assumes an underlying genetic model for a SNP associated with the disease, even though the true genetic model is unknown. Under this assumption we can write the penetrances as functions of the odds ratios and the underlying genetic model. Thus, the penetrances can be estimated without assuming a rare disease or a known disease prevalence. Because the penetrances are functions of the specified underlying genetic model, the estimates of the penetrances can be used to determine the range of underlying genetic models. We examine the performance of our results under various possible genetic models using simulation studies and case-control data from GAW16.

Methods

Notation

Denote the alleles of the diallelic marker (SNP) of interest by A and B and the three genotypes by $G_0 = AA$, $G_1 = AB$, and $G_2 = BB$. The penetrances are given by $f_i = \Pr(\text{disease} \mid G_i)$, $i = 0, 1, 2$. The genotype counts for $(G_0, G_1, G_2)$ are denoted by $(r_0, r_1, r_2)$ in cases and by $(s_0, s_1, s_2)$ in controls. The odds ratio $OR_i$ is the odds ratio of $G_i$ relative to $G_0$ given the disease status, which can be written as

$$OR_i = \frac{f_i (1 - f_0)}{f_0 (1 - f_i)}, \quad i = 1, 2.$$

Assume that the genetic model is specified by $f_1 = (1 - x) f_0 + x f_2$, $x \in (0, 1)$. The limits $x = 0$ and $x = 1$ correspond to the recessive and dominant models, and $x = 0.5$ to the additive model. Then the following equations can be obtained:

$$f_0 = \frac{1 - x + x\,OR_2 - OR_1}{(1 - x)(OR_1 - 1)(OR_2 - 1)} \quad \text{and} \quad f_i = \frac{OR_i f_0}{1 - f_0 + f_0\,OR_i}, \quad i = 1, 2.$$

Estimating penetrances under the specified genetic model

From the previous section, the penetrances are written as functions of the two odds ratios. Thus, denote $f_i(x) = h_i(OR_1, OR_2 \mid x)$ for $i = 0, 1, 2$. The point estimate of a penetrance is then given by $\hat{f}_i(x) = h_i(\widehat{OR}_1, \widehat{OR}_2 \mid x)$, where $\widehat{OR}_1 = r_1 s_0 / (r_0 s_1)$ and $\widehat{OR}_2 = r_2 s_0 / (r_0 s_2)$ are based on the observed case-control data for that SNP.

Using the estimates to determine the underlying genetic model

Note that the estimates of the penetrances, $\hat{f}_i(x)$, as functions of $x$ should be between 0 and 1. However, $\hat{f}_i(x)$ is not necessarily between 0 and 1 for every $x$ in $(0, 1)$. Therefore, we can determine the range of $x$ in $(0, 1)$ by restricting $\hat{f}_i(x)$ to be between 0 and 1. The reason this approach can work is that, when one of the alleles has higher risk, i.e., $f_0 < f_1 < f_2$ and $OR_2 > OR_1 > 1$, we have

$$\frac{\partial f_0}{\partial x} = \frac{OR_2 - OR_1}{(1 - x)^2 (OR_1 - 1)(OR_2 - 1)} > 0$$

and

$$\frac{\partial f_i}{\partial x} = \frac{\partial f_i}{\partial f_0} \frac{\partial f_0}{\partial x} = \frac{OR_i}{(1 - f_0 + f_0\,OR_i)^2} \frac{\partial f_0}{\partial x} > 0, \quad i = 1, 2.$$

All three penetrances are therefore increasing in $x$. Thus, a single interval, $x \in (a, b)$, can be obtained on which the three penetrances are between 0 and 1. We then expect the interval $(a, b)$ to cover the true underlying genetic model $x$.

Numerical Results

Simulation studies

We conduct simulation studies to examine the performance of the above results under the null and alternative hypotheses. In the simulations, we specify the minor allele frequency (MAF) p = ., .3, and .5, the disease prevalence k = ., and genotype relative risks (GRRs) $\lambda_1 = f_1 / f_0$ and $\lambda_2 = f_2 / f_0$ satisfying $\lambda_1 = 1 - x + x \lambda_2$ for some x = .5, .5, and .75. We assume 5 cases and 5 controls are used and $\lambda_2$ = .. For each replicate, we determine whether or not the interval $(a, b)$ covers the true value of $x$. The coverage of the interval is defined as the percentage of replicates in which the interval covers the true value of $x$. The results are reported in Table 1 based on , replicates.

The results in Table 1 show that under the null hypothesis ($\lambda_2 = 1$), the coverage is about % regardless of the MAF and the true value of $x$. Under the alternative hypothesis ($\lambda_2 = $ ), the coverage ranges from above 50% when the MAF is small (p = .) and the true value of $x$ is small or large, to above 95% when the true value of $x$ is around the additive model ($x = 0.5$) for the moderate MAF. The simulation results indicate that the proposed approach provides some insight into the possible genetic model underlying the data.

Applications

The above method is also applied to the six associated SNPs reported in Problem 1 of GAW16, so that the ranges of possible genetic models for these SNPs can be obtained. The six SNPs are given in Table 2 along with the genes that they belong to and the chromosome numbers. These SNPs were reported in Plenge et al. (2005) as linked to RA or selected from the candidate-gene studies. Our approach cannot be applied to the last SNP in Table 2 because its $x$ may be outside of the range $(0, 1)$. This happens when the true genetic model does not belong to the genetic models between the recessive and dominant models. One such example is the overdominant model, whose $x$ is either less than 0 or greater than 1. The intervals for the underlying genetic models for the other five SNPs are reported in Table 2 (last column). The plots of the three estimates of the penetrances are given in Figure 1 for the first four SNPs.

Conclusion and Discussion

We studied how to infer the possible underlying genetic model from penetrances estimated with case-control data. No other approaches seem to be available to provide a range of possible genetic models. The usefulness of our results was demonstrated by the simulation studies and illustrated by applying the results to the GAW16 dataset. We would like to continue this research to study how the identified range of genetic models can be used to improve power for initial genome-wide association studies compared with using MAX of Freidlin et al. (2002), Zheng et al. (2007), and Li et al. (2008).

References

Freidlin et al. (2002). Trend tests for case-control studies of genetic markers: Power, sample size and robustness. Hum. Hered. 53, 146-152.

Li et al. (2008). MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum. Genet. 123, 617-623.

Plenge et al. (2008). TRAF1-C5 as a risk locus for rheumatoid arthritis - a genomewide study. NEJM 357, 1199-1209.

Zheng et al. (2007). Robust ranks of true associations in genome-wide case-control association studies. BMC Proc. 1 (Suppl 1), S165.

Table 1: Simulation of the coverage of the true underlying genetic model based on , replicates by restricting the estimated penetrances between 0 and 1 under the null ($\lambda_2 = 1$) and alternative ($\lambda_2 = $ ) hypotheses.

                   Coverage
MAF   True x   Null     Alternative
.     .5       .%       54.8%
      .5       .8%      77.6%
      .5       .6%      73.%
      .75      .4%      6.7%
      .95      .3%      53.%
.3    .5       .6%      66.%
      .5       .%       95.8%
      .5       .%       93.9%
      .75      9.6%     76.%
      .95      9.9%     55.%
.5    .5       9.5%     63.4%
      .5       9.3%     9.6%
      .5       9.7%     98.3%
      .75      .%       83.9%
      .95      9.6%     56.4%
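The coverage simulation summarized in Table 1 can be sketched roughly as follows. This is a minimal illustration rather than the authors' code. It assumes Hardy-Weinberg equilibrium with B as the risk allele of frequency p, and it uses closed-form interval endpoints obtained by solving $\hat{f}_0(x) = 0$ and $\hat{f}_0(x) = 1$, a step the text does not spell out. The parameter values in the usage lines (500 cases, 500 controls, k = 0.1, $\lambda_2 = 2$) are placeholders chosen for illustration, not the settings of the study, and all function names are hypothetical.

```python
import random


def model_interval(r, s):
    """Range of x in (0, 1) on which the estimated penetrances lie in (0, 1).

    r, s: genotype counts (G0, G1, G2) in cases and in controls.
    Returns (a, b), or None when no such x exists.
    """
    if min(r) == 0 or min(s) == 0:
        return None  # an odds ratio would be undefined
    or1 = r[1] * s[0] / (r[0] * s[1])
    or2 = r[2] * s[0] / (r[0] * s[2])
    a, b = or1 - 1.0, or2 - 1.0
    if a == 0.0 or b == 0.0 or a == b:
        return None  # f0_hat(x) is undefined or constant in x
    # f0_hat(x) is monotone in x, so the admissible set is one interval
    # bounded by the roots of f0_hat(x) = 0 and f0_hat(x) = 1.
    # f1_hat and f2_hat lie in (0, 1) whenever f0_hat does.
    x_zero = a / b
    x_one = a * or2 / (b * or1)
    lo = max(0.0, min(x_zero, x_one))
    hi = min(1.0, max(x_zero, x_one))
    return (lo, hi) if lo < hi else None


def genotype_probs(p, k, x, lam2):
    """Genotype distributions in cases and controls (assumes HWE)."""
    g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    lam1 = 1 - x + x * lam2                # genetic model for the GRRs
    f0 = k / (g[0] + lam1 * g[1] + lam2 * g[2])
    f = [f0, lam1 * f0, lam2 * f0]         # penetrances f0, f1, f2
    cases = [g[i] * f[i] / k for i in range(3)]
    controls = [g[i] * (1 - f[i]) / (1 - k) for i in range(3)]
    return cases, controls


def draw_counts(probs, n, rng):
    """Multinomial genotype counts for n sampled individuals."""
    counts = [0, 0, 0]
    for genotype in rng.choices(range(3), weights=probs, k=n):
        counts[genotype] += 1
    return counts


def coverage(p, k, x, lam2, n_cases, n_controls, reps, seed=0):
    """Fraction of replicates whose interval (a, b) contains the true x."""
    rng = random.Random(seed)
    case_probs, control_probs = genotype_probs(p, k, x, lam2)
    hits = 0
    for _ in range(reps):
        r = draw_counts(case_probs, n_cases, rng)
        s = draw_counts(control_probs, n_controls, rng)
        interval = model_interval(r, s)
        if interval is not None and interval[0] < x < interval[1]:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Placeholder settings, not those of the paper.
    print(coverage(p=0.3, k=0.1, x=0.5, lam2=2.0,
                   n_cases=500, n_controls=500, reps=200))
```

Because $\hat{f}_0(x)$ equals the true $f_0$ at the true $x$ when the odds ratios are known exactly, the true model sits close to the endpoint where $\hat{f}_0(x) = 0$ whenever the baseline penetrance is small, which is one way to read the sensitivity of the coverage to sampling noise.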

Table 2: The six SNPs selected from Problem 1 of GAW16.

SNP ID       Genes      Chromosome   Model interval
rs6457617    MHC        6            (.3,.94)
rs2476601    PTPN22     1            (.57,.78)
rs7574865    STAT4      2            (.8,.)
rs1061622    TNFRSF1B   1            (.57,.6)
rs2073838    SLC22A4    5            (.46,.53)
rs1248696    DLG5       10           NA

Figure 1: Plots of the estimates of the three penetrances over all possible genetic models with x in (0,1).
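The penetrance curves of Figure 1 and the model intervals of Table 2 can be sketched along the following lines. This is an illustrative reading of the formulas in the Methods section, not the authors' implementation: the function names are hypothetical, the genotype counts in the usage lines are made up rather than taken from GAW16, and the admissible range is found by a simple grid search over x.

```python
def odds_ratios(r, s):
    """Estimated odds ratios of G1 and G2 relative to G0.

    r, s: genotype counts (G0, G1, G2) in cases and in controls.
    """
    or1 = (r[1] * s[0]) / (r[0] * s[1])
    or2 = (r[2] * s[0]) / (r[0] * s[2])
    return or1, or2


def penetrances(or1, or2, x):
    """Penetrances (f0, f1, f2) implied by the odds ratios and model x."""
    f0 = (1 - x + x * or2 - or1) / ((1 - x) * (or1 - 1) * (or2 - 1))
    f1 = or1 * f0 / (1 - f0 + f0 * or1)
    f2 = or2 * f0 / (1 - f0 + f0 * or2)
    return f0, f1, f2


def admissible_range(or1, or2, steps=1000):
    """Smallest and largest grid value of x in (0, 1) at which all three
    estimated penetrances lie strictly between 0 and 1, or None."""
    admissible = []
    for j in range(1, steps):
        x = j / steps
        try:
            f = penetrances(or1, or2, x)
        except ZeroDivisionError:
            continue  # an estimate is undefined at this x
        if all(0.0 < value < 1.0 for value in f):
            admissible.append(x)
    return (admissible[0], admissible[-1]) if admissible else None


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    cases, controls = (120, 240, 140), (200, 220, 80)
    or1, or2 = odds_ratios(cases, controls)
    print("admissible x range:", admissible_range(or1, or2))
    for x in (0.45, 0.55, 0.65):
        print(x, penetrances(or1, or2, x))
```

Plotting the three values returned by `penetrances` against x over the admissible range gives curves of the kind the figure describes. The grid resolution `steps` bounds the accuracy of the reported endpoints.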
