Non-iterative, regression-based estimation of haplotype associations

Similar documents
New Educators Campaign Weekly Report

A. Geography Students know the location of places, geographic features, and patterns of the environment.

Standard Indicator That s the Latitude! Students will use latitude and longitude to locate places in Indiana and other parts of the world.

Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black

Challenge 1: Learning About the Physical Geography of Canada and the United States

Multivariate Statistics

Multivariate Statistics

Additional VEX Worlds 2019 Spot Allocations

Preview: Making a Mental Map of the Region

Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee May 2018

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2017

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2018

, District of Columbia

Multivariate Analysis

Multivariate Statistics

Hourly Precipitation Data Documentation (text and csv version) February 2016

QF (Build 1010) Widget Publishing, Inc Page: 1 Batch: 98 Test Mode VAC Publisher's Statement 03/15/16, 10:20:02 Circulation by Issue

Summary of Natural Hazard Statistics for 2008 in the United States

Abortion Facilities Target College Students

Online Appendix: Can Easing Concealed Carry Deter Crime?

Club Convergence and Clustering of U.S. State-Level CO 2 Emissions

North American Geography. Lesson 2: My Country tis of Thee

What Lies Beneath: A Sub- National Look at Okun s Law for the United States.

Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984.

Printable Activity book

JAN/FEB MAR/APR MAY/JUN

Meteorology 110. Lab 1. Geography and Map Skills

2005 Mortgage Broker Regulation Matrix

Office of Special Education Projects State Contacts List - Part B and Part C

RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON

Osteopathic Medical Colleges

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

DOWNLOAD OR READ : USA PLANNING MAP PDF EBOOK EPUB MOBI

Crop Progress. Corn Mature Selected States [These 18 States planted 92% of the 2017 corn acreage]

Intercity Bus Stop Analysis

Chapter. Organizing and Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

OUT-OF-STATE 965 SUBTOTAL OUT-OF-STATE U.S. TERRITORIES FOREIGN COUNTRIES UNKNOWN GRAND TOTAL

SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008

Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574)

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574)

High School World History Cycle 2 Week 2 Lifework

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/

Alpine Funds 2017 Tax Guide

GIS use in Public Health 1

Alpine Funds 2016 Tax Guide

JAN/FEB MAR/APR MAY/JUN

extreme weather, climate & preparedness in the american mind

KS PUBL 4YR Kansas State University Pittsburg State University SUBTOTAL-KS

Last time: PCA. Statistical Data Mining and Machine Learning Hilary Term Singular Value Decomposition (SVD) Eigendecomposition and PCA

Package ZIM. R topics documented: August 29, Type Package. Title Statistical Models for Count Time Series with Excess Zeros. Version 1.

National Organization of Life and Health Insurance Guaranty Associations

MINERALS THROUGH GEOGRAPHY

A GUIDE TO THE CARTOGRAPHIC PRODUCTS OF

BlackRock Core Bond Trust (BHK) BlackRock Enhanced International Dividend Trust (BGY) 2 BlackRock Defined Opportunity Credit Trust (BHL) 3

Introducing North America

MO PUBL 4YR 2090 Missouri State University SUBTOTAL-MO

Insurance Department Resources Report Volume 1

Infant Mortality: Cross Section study of the United State, with Emphasis on Education

An Analysis of Regional Income Variation in the United States:

FLOOD/FLASH FLOOD. Lightning. Tornado

Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania (T) Michigan State University

Pima Community College Students who Enrolled at Top 200 Ranked Universities

United States Geography Unit 1

112th U.S. Senate ACEC Scorecard

A Summary of State DOT GIS Activities

MINERALS THROUGH GEOGRAPHY. General Standard. Grade level K , resources, and environmen t

Outline. Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining

All-Time Conference Standings

CCC-A Survey Summary Report: Number and Type of Responses

Office of Budget & Planning 311 Thomas Boyd Hall Baton Rouge, LA Telephone 225/ Fax 225/

October 2016 v1 12/10/2015 Page 1 of 10

This project is supported by a National Crime Victims' Right Week Community Awareness Project subgrant awarded by the National Association of VOCA

2011 NATIONAL SURVEY ON DRUG USE AND HEALTH

A Standardized Digital Well-Record Database for the Glaciated U.S.

Physical Features of Canada and the United States

St. Luke s College Institutional Snapshot

Physical Features of Canada and the United States

Direct Selling Association 1

Earl E. ~rabhl/ This report is preliminary and has not been reviewed for conformity with U.S. Geological Survey editoral standards GEOLOGICAL SURVEY

Chapter 11 : State SAT scores for 1982 Data Listing

Stem-and-Leaf Displays

Chapter 5 Linear least squares regression

Evidence for increasingly extreme and variable drought conditions in the contiguous United States between 1895 and 2012

Division I Sears Directors' Cup Final Standings As of 6/20/2001

State Section S Items Effective October 1, 2017

The Heterogeneous Effects of the Minimum Wage on Employment Across States

SNP Association Studies with Case-Parent Trios

Fungal conservation in the USA

Conduent EDI Solutions, Inc. EDI Direct Pass Thru Fees

(Specification B) (JUN H01) (JaN11GEOG101) General Certificate of Education Secondary Education Advanced Higher TierSubsidiary Examination

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Crop / Weather Update

HP-35s Calculator Program Lambert 1

Green Building Criteria in Low-Income Housing Tax Credit Programs Analysis

Physical Therapist Centralized Application Service. COURSE PREREQUISITES SUMMARY Admissions Cycle

Transcription:

Non-iterative, regression-based estimation of haplotype associations Benjamin French, PhD Department of Biostatistics and Epidemiology University of Pennsylvania bcfrench@upenn.edu National Cancer Center Republic of Korea 12 September 2012

nationalatlas.gov TM Where We Are STATES WASHINGTON CA NADA ALASKA ATLANTIC OCEAN HAWAII MONTANA NORTH DAKOTA L a k e S u p e r i o r PACIFIC OCEAN MAINE P A C I F I C O C E A N OREGON CALIFORNIA NEVADA IDAHO ARIZONA UTAH WYOMING COLORADO NEW MEXICO SOUTH DAKOTA NEBRASKA KANSAS OKLAHOMA MINNESOTA IOWA MISSOURI ARKANSAS WISCONSIN ILLINOIS MISSISSIPPI MICHIGAN a n L a k e M i c h i g INDIANA L a k e H u r o n KENTUCKY TENNESSEE ALABAMA OHIO L a k e E r i e GEORGIA L a k e O n t a r i o NEW YORK PENNSYLVANIA WEST VIRGINIA VIRGINIA NORTH CAROLINA SOUTH CAROLINA DC DELAWARE MARYLAND A T L A VERMONT NEW HAMPSHIRE CONNECTICUT NEW JERSEY MASSACHUSETTS RHODE ISLAND E A N N T I C O C TEXAS HAWAII A R C T I C O C E A N LOUISIANA R U S S I A P A C I F I C O C E A N 0 100 mi 0 100 km ALASKA CANADA ME G U L F O F M E X I C O FLORIDA BERING SEA XIC THE BAHAMAS GULF OF ALASKA O 0 100 200 0 100 200 300 km 300 mi P A C I F I C O C E A N 0 0 200 mi 200 km CU BA U.S. Department of the Interior U.S. Geological Survey O The National Atlas of the United States of America R states2.pdf INTERIOR-GEOLOGICAL SURVEY, RESTON, VIRGINIA-2003

Collaborators Nandita Mitra, PhD Department of Biostatistics and Epidemiology University of Pennsylvania Thomas P Cappola, MD Penn Cardiovascular Institute University of Pennsylvania Thomas Lumley, PhD Department of Statistics University of Auckland B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 3 / 31

Goals To integrate haplotypes into large association studies such that haplotype imputation is done once as a data-processing step Case-control studies (binary outcome) [French et al., 2006] Prospective studies (censored survival outcome) [French et al., 2012] To allow haplotype associations to be estimated in general-purpose statistical software (eg R) by researchers expert in the subject matter In world historical terms there is a lot to be said for keeping data analysis out of the hands of statisticians Thomas Lumley B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 4 / 31

Goals To integrate haplotypes into large association studies such that haplotype imputation is done once as a data-processing step Case-control studies (binary outcome) [French et al., 2006] Prospective studies (censored survival outcome) [French et al., 2012] To allow haplotype associations to be estimated in general-purpose statistical software (eg R) by researchers expert in the subject matter In world historical terms there is a lot to be said for keeping data analysis out of the hands of statisticians Thomas Lumley B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 4 / 31

Phase ambiguity Observed data is composed of a set of unphased genotypes Diplotype (pair of haplotypes) may be ambiguous; may not know which allele was transmitted from maternal or paternal chromosome Missing data problem; impute the unobserved diplotype Father A T Mother C G Child A C T G AT/CG or AG/CT B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 5 / 31

Haplotype imputation Expectation-maximization (EM) algorithm E: calculate expected phase given haplotype frequencies M: calculate MLEs for haplotype frequencies given phase Software: haplo.stats [Sinnwell and Schaid, 2012] Bayesian algorithm Observed genotype data combined with expected haplotype patterns Haplotypes estimated from posterior distribution Software: PHASE [Stephens and Donnelly, 2003] B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 6 / 31

Diplotype uncertainty Angiotensin II receptor type 1 (AGTR1) Haplotype Diplotype Label Haplotype frequency probability D TCCACGCATCTT 0.139 0.81 F TCTGTGCATCTC 0.290 C TCCACGCATCTC 0.034 0.19 G TCTGTGCATCTT 0.272 Rare TCCGCGCATCTC < 0.001 < 0.01 Rare TCTATGCATCTT < 0.001 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 7 / 31

Target of inference If diplotypes were known, could fit a standard regression model including diplotypes D i and environmental exposures Z i (possibly with interaction) Logistic regression: Y i = {1 = case, 0 = control} logit P[Y i = 1] = α + D i β D + Z i β Z + Cox regression: Y i = min(t i, C i ); event indicator δ i = 1[Y i = T i ] log λ i (t) = log λ 0 (t) + D i β D + Z i β Z + in which β = {β D, β Z } represent association parameters of interest We integrate the set of imputed haplotypes into these regression models B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 8 / 31

Estimating associations Non-iterative weighted estimation [French et al., 2006] 1. Impute haplotypes and estimate population haplotype frequencies 2. Create multi-record data for each individual Design matrix: set of diplotypes consistent with observed genotype, possibly including environmental exposures Weights equal to conditional probability of each diplotype Weight A B C D E F G H I Rare 0.81 0 0 0 1 0 1 0 0 0 0 0.19 0 0 1 0 0 0 1 0 0 0 0.01 0 0 0 0 0 0 0 0 0 2 3. Estimate associations using a weighted regression model Logistic regression for binary outcomes Cox regression for censored survival outcomes Robust or sandwich standard error estimator Account for uncertainty in phase B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 9 / 31

Estimating associations Non-iterative weighted estimation [French et al., 2006] 1. Impute haplotypes and estimate population haplotype frequencies 2. Create multi-record data for each individual Design matrix: set of diplotypes consistent with observed genotype, possibly including environmental exposures Weights equal to conditional probability of each diplotype Weight A B C D E F G H I Rare 0.81 0 0 0 1 0 1 0 0 0 0 0.19 0 0 1 0 0 0 1 0 0 0 0.01 0 0 0 0 0 0 0 0 0 2 3. Estimate associations using a weighted regression model Logistic regression for binary outcomes Cox regression for censored survival outcomes Robust or sandwich standard error estimator Account for uncertainty in phase B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 9 / 31

Estimating associations Non-iterative weighted estimation Estimates are obtained from estimating functions Logistic regression for binary outcomes n U(β) = π id (p id, β)x id [Y id expit X id β] i=1 d d(g i ) Cox regression for censored survival outcomes n U(β) = i=1 d d(g i ) l id (β) β l id (β) = δ id π id (p id, β) X id β log exp X i dβ i R(t i ) Inference based on univariable or multivariable Wald tests Estimator is valid and efficient under the null hypothesis B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 10 / 31

Estimating associations Weighted haplotype combination [Rebbeck et al., 2009] 1. Impute haplotypes and estimate population haplotype frequencies 2. Create single-record data for each individual Design matrix: weighted combination of diplotypes consistent with observed genotype, with weights equal to conditional probability of each diplotype, possibly including environmental exposures A B C D E F G H I Rare 0 0 0.19 0.81 0 0.81 0.19 0 0 0.01 3. Estimate associations using an unweighted regression model Logistic regression for binary outcomes Cox regression for censored survival outcomes Account for uncertainty in phase B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 11 / 31

Estimating associations Weighted haplotype combination [Rebbeck et al., 2009] 1. Impute haplotypes and estimate population haplotype frequencies 2. Create single-record data for each individual Design matrix: weighted combination of diplotypes consistent with observed genotype, with weights equal to conditional probability of each diplotype, possibly including environmental exposures A B C D E F G H I Rare 0 0 0.19 0.81 0 0.81 0.19 0 0 0.01 3. Estimate associations using an unweighted regression model Logistic regression for binary outcomes Cox regression for censored survival outcomes Account for uncertainty in phase B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 11 / 31

Estimating associations Weighted haplotype combination Estimates are obtained from estimating functions Logistic regression for binary outcomes U(β) = n X i [Y i expit X i β] i=1 Cox regression for censored survival outcomes U(β) = n i=1 l i (β) β l i (β) = δ i X i β log i R(t i ) exp X i β Inference based on univariable or multivariable Wald tests Estimator is exactly valid under the null hypothesis B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 12 / 31

Simulation study In which situations do non-iterative methods provide reliable results? Angiotensin II receptor type 1 (AGTR1) Label Haplotype Frequency A ATTATGCATCTC 0.029 B ATTATGTGATCC 0.051 C TCCACGCATCTC 0.027 D TCCACGCATCTT 0.090 E TCTGTGCAACTT 0.029 F* TCTGTGCATCTC 0.223 G TCTGTGCATCTT 0.188 H TTTACACATCTC 0.038 I TTTACACATCTT 0.032 * Referent n = 500, 25% censoring (independent) B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 13 / 31

Simulation results: Moderate effects Approximate bias 2 1 0 1 2 Weighted estimation Weighted combination Known phase A 1/2.0 B 1/1.75 C 1.25 D 1.5 E 1/1.25 G 1/1.5 H 1.75 I 2.0 Hazard ratio Haplotype B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 14 / 31

Simulation results: Moderate effects % coverage of 95% confidence intervals Hazard Weighted Weighted Known Haplotype ratio estimation combination phase A 1/2.0 94 95 95 B 1/1.75 94 95 95 C 1.25 80 95 95 D 1.5 93 94 95 E 1/1.25 93 95 94 G 1/1.5 89 94 95 H 1.75 86 96 95 I 2.0 92 95 96 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 15 / 31

Simulation results: Moderate effects Rejection rate (%) of two-sided hypothesis tests Hazard Weighted Weighted Known Haplotype ratio estimation combination phase A 1/2.0 90 91 92 B 1/1.75 91 93 94 C 1.25 62 19 24 D 1.5 92 90 93 E 1/1.25 15 15 18 G 1/1.5 95 96 98 H 1.75 99 82 96 I 2.0 96 93 98 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 16 / 31

Simulation results: Strong SNP effects Approximate bias 2 1 0 1 2 Weighted estimation Weighted combination Known phase A 1.0 B 1.0 C 1.0 D 4.0 E 4.0 G 4.0 H 1.0 I 4.0 Hazard ratio Haplotype B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 17 / 31

Simulation results: Strong SNP effects % coverage of 95% confidence intervals Hazard Weighted Weighted Known Haplotype ratio estimation combination phase A 1.0 93 94 94 B 1.0 94 94 94 C 1.0 92 93 95 D 4.0 94 95 95 E 4.0 92 94 95 G 4.0 94 94 95 H 1.0 94 94 95 I 4.0 92 93 92 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 18 / 31

Simulation results: Strong SNP effects Rejection rate (%) of two-sided hypothesis tests Hazard Weighted Weighted Known Haplotype ratio estimation combination phase A 1.0 7 6 6 B 1.0 6 6 6 C 1.0 8 7 5 D 4.0 100 100 100 E 4.0 100 100 100 G 4.0 100 100 100 H 1.0 6 6 5 I 4.0 100 100 100 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 19 / 31

Simulation results: Strong non-snp effects Approximate bias 2 1 0 1 2 Weighted estimation Weighted combination Known phase A 1.0 B 1.0 C 4.0 D 1.0 E 4.0 G 4.0 H 4.0 I 1.0 Hazard ratio Haplotype B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 20 / 31

Simulation results: Strong non-snp effects % coverage of 95% confidence intervals Hazard Weighted Weighted Known Haplotype ratio estimation combination phase A 1.0 93 95 95 B 1.0 90 95 95 C 4.0 0 96 94 D 1.0 94 95 94 E 4.0 65 96 95 G 4.0 0 91 95 H 4.0 0 88 94 I 1.0 81 89 95 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 21 / 31

Simulation results: Strong non-snp effects Rejection rate (%) of two-sided hypothesis tests Hazard Weighted Weighted Known Haplotype ratio estimation combination phase A 1.0 7 5 5 B 1.0 10 5 5 C 4.0 37 100 100 D 1.0 6 5 6 E 4.0 100 100 100 G 4.0 100 100 100 H 4.0 83 100 100 I 1.0 19 11 5 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 22 / 31

Application HSPB7-CLCNKA haplotypes and adverse events in chronic heart failure Regulate renal potassium channels to control blood pressure SNP associated with heart failure in a large case-control study [Cappola et al., 2011] Genotypes available for 1150 genetically inferred Caucasians with heart failure enrolled in a prospective study 70% male; median age at study entry, 58 years 14 pre-selected SNPs Inferred 10 common haplotypes (frequency > 0.02) 65% had an unambiguous diplotype 90% had a highest posterior probability > 0.765 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 23 / 31

Application HSPB7-CLCNKA haplotypes and adverse events in chronic heart failure Outcome: time to all-cause mortality or cardiac transplantation Median follow-up, 3 years; maximum, 5 years 22% experienced an adverse event Cox regression framework Stratified by 4-level classification for disease severity Adjusted for gender, age, heart failure etiology, clinical site Time-varying covariate for age (exhibited non-proportional hazards) B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 24 / 31

Application results: Weighted estimation Label Haplotype Frequency HR (95% CI) P Q AGAGCGAGACGAGG 0.036 1.19 (0.80, 1.77) 0.39 R AGAGCGAGGGAAGG 0.160 1.04 (0.80, 1.35) 0.79 S AGAGCGGAGCAAGA 0.036 1.20 (0.80, 1.77) 0.38 T AGCGAGAGGCAAGA 0.066 0.55 (0.34, 0.88) 0.01 U GACGCGGAGCGCGG 0.063 0.80 (0.53, 1.20) 0.28 V GGAACAAGGGAAGG 0.037 0.49 (0.26, 0.92) 0.03 W GGAACAGAGCAAGA 0.299 Referent X GGAACAGAGCAAGG 0.048 1.36 (0.95, 1.95) 0.09 Y GGAGCAAGGCAAGG 0.050 1.15 (0.78, 1.69) 0.49 Z GGCGCGGAGCAAGG 0.031 1.09 (0.62, 1.92) 0.76 Overall 0.02 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 25 / 31

Application results: Weighted estimation Label Haplotype Frequency HR (95% CI) P Q AGAGCGAGACGAGG 0.036 1.19 (0.80, 1.77) 0.39 R AGAGCGAGGGAAGG 0.160 1.04 (0.80, 1.35) 0.79 S AGAGCGGAGCAAGA 0.036 1.20 (0.80, 1.77) 0.38 T AGCGAGAGGCAAGA 0.066 0.55 (0.34, 0.88) 0.01 U GACGCGGAGCGCGG 0.063 0.80 (0.53, 1.20) 0.28 V GGAACAAGGGAAGG 0.037 0.49 (0.26, 0.92) 0.03 W GGAACAGAGCAAGA 0.299 Referent X GGAACAGAGCAAGG 0.048 1.36 (0.95, 1.95) 0.09 Y GGAGCAAGGCAAGG 0.050 1.15 (0.78, 1.69) 0.49 Z GGCGCGGAGCAAGG 0.031 1.09 (0.62, 1.92) 0.76 Overall 0.02 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 25 / 31

Application results: Weighted combination Label Haplotype Frequency HR (95% CI) P Q AGAGCGAGACGAGG 0.036 1.19 (0.76, 1.87) 0.44 R AGAGCGAGGGAAGG 0.160 1.05 (0.81, 1.37) 0.73 S AGAGCGGAGCAAGA 0.036 1.20 (0.74, 1.94) 0.47 T AGCGAGAGGCAAGA 0.066 0.53 (0.32, 0.90) 0.02 U GACGCGGAGCGCGG 0.063 0.80 (0.52, 1.21) 0.29 V GGAACAAGGGAAGG 0.037 0.43 (0.21, 0.88) 0.02 W GGAACAGAGCAAGA 0.299 Referent X GGAACAGAGCAAGG 0.048 1.44 (0.95, 2.19) 0.09 Y GGAGCAAGGCAAGG 0.050 1.15 (0.77, 1.72) 0.49 Z GGCGCGGAGCAAGG 0.031 1.10 (0.64, 1.87) 0.73 Overall 0.03 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 26 / 31

Application results: Weighted combination Label Haplotype Frequency HR (95% CI) P Q AGAGCGAGACGAGG 0.036 1.19 (0.76, 1.87) 0.44 R AGAGCGAGGGAAGG 0.160 1.05 (0.81, 1.37) 0.73 S AGAGCGGAGCAAGA 0.036 1.20 (0.74, 1.94) 0.47 T AGCGAGAGGCAAGA 0.066 0.53 (0.32, 0.90) 0.02 U GACGCGGAGCGCGG 0.063 0.80 (0.52, 1.21) 0.29 V GGAACAAGGGAAGG 0.037 0.43 (0.21, 0.88) 0.02 W GGAACAGAGCAAGA 0.299 Referent X GGAACAGAGCAAGG 0.048 1.44 (0.95, 2.19) 0.09 Y GGAGCAAGGCAAGG 0.050 1.15 (0.77, 1.72) 0.49 Z GGCGCGGAGCAAGG 0.031 1.10 (0.64, 1.87) 0.73 Overall 0.03 B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 26 / 31

R packages: Non-iterative weighted estimation haplo.ccs [French and Lumley, 2011] Weighted logistic regression for binary outcomes Depends on haplo.stats package to impute haplotypes Calls glm(..., family=quasibinomial(link=logit)) Includes GEE-type sandwich standard error estimator haplo.cph (in process) Weighted Cox regression for censored survival outcomes Will depend on haplo.stats package to impute haplotypes Will call cph(..., robust=true) from Design package Allow stratification and time-varying exposures B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 27 / 31

Conclusions Non-iterative estimation methods Incorporate all diplotypes consistent with the observed genotype, either weighted (weighted estimation) or averaged (weighted combination) by the conditional probability of each diplotype Provide valid tests for genetic associations and reliable estimates of modest genetic effects of common haplotypes Convenient regression framework Adjustment for or interaction with environmental exposures Stratification and time-varying exposures in Cox regression Straightforward to implement in R haplo.ccs for binary outcomes haplo.cph for censored survival outcomes See [French et al., 2006; 2012] for implementation in Stata B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 28 / 31

Target High Mostly known May not exist Relative risk Low Probably can t be found Our focus Rare Common Haplotype frequency B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 29 / 31

Limitations Our methods and/or software may not be applicable to Related individuals Rare haplotypes Small studies Longitudinal outcomes B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 30 / 31

References dbe.med.upenn.edu/biostat-research/bcfrench 1. Cappola TP, Matkovich SJ, et al. 2011. Loss-of-function DNA sequence variant in the CLCNKA chloride channel implicates the cardio-renal axis in interindividual heart failure risk variation. Proc Natl Acad Sci U S A 108:2456 61. 2. French B, Lumley T. 2011. haplo.ccs: Estimate haplotype relative risks in case-control data. R package 1.3.1. 3. French B, Lumley T, et al. 2006. Simple estimates of haplotype relative risks in case-control data. Genet Epidemiol 30:485 94. 4. French B, Lumley T, et al. 2012. Non-iterative, regression-based estimation of haplotype associations with censored survival outcomes. Stat App Genet Mol Biol 11:4. 5. Rebbeck TR, Mitra N, et al. 2009. Modification of ovarian cancer risk by BRAC1/2-interacting genes in a multi center cohort of BRAC1/2 mutation carriers. Cancer Res 71:5792 805. 6. Sinnwell JP, Schaid DJ. 2012. haplo.stats: Statistical analysis of haplotypes with traits and covariates when linkage phase is ambiguous. R package version 1.5.5. 7. Stephens M, Donnelly P. 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162 69. B French (bcfrench@upenn.edu) Estimating haplotype associations September 2012 31 / 31