Checking Pairwise Relationships. Lecture 19 Biostatistics 666

Similar documents
Kinship Coefficients. Biostatistics 666

Multipoint Analysis for Sibling Pairs. Biostatistics 666 Lecture 18

Using the estimated penetrances to determine the range of the underlying genetic model in casecontrol

/ n ) are compared. The logic is: if the two

Markov Chain Monte Carlo Lecture 6

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Kernel Methods and SVMs Extension

x = , so that calculated

A Robust Method for Calculating the Correlation Coefficient

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

LECTURE 9 CANONICAL CORRELATION ANALYSIS

STAT 511 FINAL EXAM NAME Spring 2001

Linear Approximation with Regularization and Moving Least Squares

This column is a continuation of our previous column

AS-Level Maths: Statistics 1 for Edexcel

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Statistics for Economics & Business

Hidden Markov Models

Comparison of Regression Lines

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

EM and Structure Learning

Statistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics )

Sampling Theory MODULE VII LECTURE - 23 VARYING PROBABILITY SAMPLING

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

Primer on High-Order Moment Estimators

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Statistics and Quantitative Analysis U4320. Segment 3: Probability Prof. Sharyn O Halloran

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

Multiple Contrasts (Simulation)

A LINEAR PROGRAM TO COMPARE MULTIPLE GROSS CREDIT LOSS FORECASTS. Dr. Derald E. Wentzien, Wesley College, (302) ,

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Chapter 3 Describing Data Using Numerical Measures

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

LINEAR REGRESSION ANALYSIS. MODULE VIII Lecture Indicator Variables

Statistics Chapter 4

Negative Binomial Regression

Composite Hypotheses testing

Maximum likelihood. Fredrik Ronquist. September 28, 2005

International Journal of Mathematical Archive-3(3), 2012, Page: Available online through ISSN

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Supplementary Notes for Chapter 9 Mixture Thermodynamics

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

Data Abstraction Form for population PK, PD publications

Statistics for Business and Economics

BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov,

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε

Recall that quantitative genetics is based on the extension of Mendelian principles to polygenic traits.

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes

Bayesian predictive Configural Frequency Analysis

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

Computational Biology Lecture 8: Substitution matrices Saad Mneimneh

Linear Regression Analysis: Terminology and Notation

Stat 543 Exam 2 Spring 2016

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

PROBABILITY PRIMER. Exercise Solutions

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 15 - Multiple Regression

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9

Linear Correlation. Many research issues are pursued with nonexperimental studies that seek to establish relationships among 2 or more variables

Population Design in Nonlinear Mixed Effects Multiple Response Models: extension of PFIM and evaluation by simulation with NONMEM and MONOLIX

Lecture 4: Universal Hash Functions/Streaming Cont d

Lecture 10 Support Vector Machines II

Supporting Information

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING

Laboratory 3: Method of Least Squares

STATISTICS QUESTIONS. Step by Step Solutions.

Hidden Markov Models

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

Economics 130. Lecture 4 Simple Linear Regression Continued

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition

Title: Bounds and normalization of the composite linkage disequilibrium coefficient.

Outline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline

Stat 543 Exam 2 Spring 2016

Convergence of random processes

Chapter 5 Multilevel Models

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Laboratory 1c: Method of Least Squares

Report on Image warping

xp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ

Chapter 8 Indicator Variables

Uncertainty as the Overlap of Alternate Conditional Distributions

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

U-Pb Geochronology Practical: Background

Number of cases Number of factors Number of covariates Number of levels of factor i. Value of the dependent variable for case k

18.1 Introduction and Recap

Uncertainty in measurements of power and energy on power networks

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

Bayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County

4.1 basic idea of interval mapping

Notes on Frequency Estimation in Data Streams

Lecture 3: Probability Distributions

The Geometry of Logit and Probit

Basically, if you have a dummy dependent variable you will be estimating a probability.

Transcription:

Checkng Parwse Relatonshps Lecture 19 Bostatstcs 666

Last Lecture: Markov Model for Multpont Analyss X X X 1 3 X M P X 1 I P X I P X 3 I P X M I 1 3 M I 1 I I 3 I M P I I P I 3 I P... 1 IBD states along the chromosome are modeled usng a Markov Chan

The Lkelhood of Marker Data L M M P I 1 P I I 1 P X I I1 I IM 1... General, but slow unless there are only a few markers. Combned wth Bayes Theorem allows us to estmate probablty of IBD states at any marker.

Worked Example Consder two loc separated by θ.1 Each loc has two alleles, each wth frequency.5 If two sblngs have the followng genotypes: Sb1 Sb Marker A: 1/1 / Marker B: 1/1 1/1 What s the probablty of IBD at marker B when You consder marker B alone? You consder both markers smultaneously?

Soluton I 1 I PI 1 PI I 1 PX 1 I 1 PX I Prob 1 1 1 1 1 1

Soluton I 1 I PI 1 PI I 1 PX 1 I 1 PX I Prob.5.67.65.65.66 1.5.3.65.15.58.5.3.65.5.13 1.5.15.65. 1 1.5.7.15. 1.5.15.5..5.3.65. 1.5.3.15..5.67.5.

Soluton Takng nto account all avalable genotype data PI 1.9 PI 1 1.4 PI 1.49 Consderng only one marker, the correspondng probabltes would be.44,.44 and.11. Qute a dfference!, but whch value do you expect to be more accurate?

The Lkelhood of Marker Data L M M P I 1 P I I 1 P X I I1 I IM 1... General, but slow unless there are only a few markers. How do we speed thngs up?

Extendng the MLS Method We ust change the defnton for the weghts gven to each confguraton!...... 1 1 1 M I R I L I X P I X X P I X X P I X P w +

Today Checkng accuracy of reported relatonshps Why s ths an mportant problem? Markov Chan for Dfferent Relatve Pars Lkelhood approaches to relatonshp nference

Verfyng relatonshps s crucal Genetc analyses requre relatonshps to be specfed Msspecfyng relatonshps can lead to tests of napproprate sze Inflate Type I error Decrease power

IBS Based Approach Relatve pars wll dffer n terms of ther genetc smlarty One way to contrast dfferent types of relatves s to compare ther overall smlarty, for example, by: Calculatng the mean IBS sharng Calculatng the varance of IBS sharng

Example ~8 marker genome scan Calculated IBS for each set of putatve relatonshps Unrelated pars Sblng pars Parent-offsprng pars

Putatve Unrelated Pars Mean.87 St. Dev..7

Parent-Offsprng Pars Mean 1.7 St. Dev..5

Putatve Sblng Pars Mean 1.3 St. Dev..9

Problem Indvduals Are Outlers Crcled pars are lkely msclassfed

Addtonal Informaton n Standard Devaton of IBS Sharng Standard Devaton of IBS Sharng Mean IBS Sharng

Addtonal Informaton n Standard Devaton of IBS Sharng Standard Devaton of IBS Sharng Mean IBS Sharng

Problems wth IBS Scores Ineffcent Ignore nformaton on allele frequences Ignore correlatons between neghborng markers not too bad f large amounts of data avalable Cannot dstngush some types of relatves

Strategy: Informaton we have: X observed genotypes at each marker p allele frequences at each marker θ - recombnaton fracton between consecutve markers PXR for each possble relatonshp R unrelated, half-sb, sb-pars, MZ twns

Lkelhood Sum over IBD states at each locaton Set of possble I changes wth R 1 1 1 1... I m I m I X P I I P I P L m

Notaton R Hypotheszed Relatonshp Ik Ikm, Ikf Allele sharng at locus k X Genotype par at locus k k α R P X1, X,..., X, I R k k1 k Jont probablty of data at frst k-1 markers and IBD vector I k at marker k

Detals on I Possble nhertance patterns, no sharng 1, share maternal allele,1 share paternal allele 1,1 share both alleles For convenence, separate IBD1 nto maternal and paternal sharng states

Algorthm for Lkelhood Calculaton + M M M k k k k k I X P R L t I X P R R R I P R, 1 1 1 α α α α

Relatonshp between I and R Probablty of I 1,, 1,,,1 and 1,1: MZ Twns,,, 1 Unrelated? Parent-Offsprng? Full sbs ¼, ¼, ¼, ¼ Maternal half sbs ½, ½,, Paternal half sbs?

PXI for pars of ndvduals GENOTYPE PX 1,X I for I X 1 X,,1 or 1, 1,1 4 p 3 p p p 3 p p p p p k p p p k 4p p p p p +p p p k 4p p p k p p p k kl 4p p p k p l

Transton Matrx Full Sbs 1 1 1 1 1 1 1 1 1 1 1 1 1 1,1,1 1,, 1,1,1 1,, θ θ,, 1 1 1,, r r t r +

Transton Matrx Maternal Half Sbs 1 1 1 1,1,1 1,, 1,1,1 1,, θ θ, 1, 1 1 1,, r r t r

Transton Matrx Paternal Half Sbs 1 1 1 1,1,1 1,, 1,1,1 1,, θ θ, 1, 1,, r r t r

Transton Matrx Unrelated, 1,,1 1,1, 1 1,,1 1,1

Transton Matrx MZ twns, 1,,1 1,1, 1,,1 1,1 1

Example I Consder genotypes for one marker X 1 1/1, 1/1 Assume p 1.,.5 or.8 Calculate PXR for each relatonshp MZ twn, Full Sbs, Half-Sbs, Unrelated

Example II Consder genotypes for markers X 1 1/1, / X 1/1, / Assume p 1 p ½ Assume θ.58,.1 θ.5,.5 Calculate PXR for each relatonshp

Smulatons θ.1, M5 Inferred R True R Full Sbs Half Sbs Unrelated Full Sbs.914.85.1 Half Sbs.44.87.81 Unrelated <.1.59.941

Smulatons θ., M5 Inferred R True R Full Sbs Half Sbs Unrelated Full Sbs.948.5 <.1 Half Sbs.38.899.64 Unrelated <.1.6.938

Smulatons θ.1, M4 Inferred R True R Full Sbs Half Sbs Unrelated Full Sbs 1. <.1 <.1 Half Sbs <.1 1. <.1 Unrelated <.1 <.1 1.

Bayesan Approach Alternatve to smply maxmzng PXRr Incorporates pror nformaton on the expected frequency of each relatve par R R X P R Pror r R X P R Pror X r R P

More dstant relatonshps

Problem Consder some genome scan data 38 mcrosatellte markers Consder some par of ndvduals Putatve sblngs Observed Sharng Identcal for 379/38 genotype pars LGRMZ Twns LGRAny other >

Soluton: Allow for Genotypng Errors A small proporton of errors could lead to msclassfcaton Allow for possbly erroneous genotypes ε error rate parameter [ ] ² 1 1 ² 1, 1 1 G X G P X G P I X G P I G P G X P I X P + ε ε ε

Conclusons Lkelhood approach provdes relable manner to nfer relatonshps Can ncorporate multple lnked markers Some dstant relatonshps can only be dscerned by lkelhood approach

Today Checkng of Relatonshps for Pars of Indvduals Multpont algorthm for calculatng lkelhoods for genotype data

Recommended Readng Boehnke and Cox 1997, Am J Hum Genet 61:43-49 Optonal Broman and Weber 1998, AJHG 63:1563-4 McPeek and Sun, AJHG 66:176-94 Epsten et al., AJHG 67:119-31