One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

Similar documents
Single Marker Analysis and Interval Mapping

Multiple QTL mapping

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Mapping multiple QTL in experimental crosses

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6

Introduction to QTL mapping in model organisms

Statistical issues in QTL mapping in mice

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Introduction to QTL mapping in model organisms

Users Manual of QTL IciMapping v3.1

QTL Mapping I: Overview and using Inbred Lines

Mapping multiple QTL in experimental crosses

Mapping QTL for Seedling Root Traits in Common Wheat

Introduction to QTL mapping in model organisms

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Evolution of phenotypic traits

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

(Genome-wide) association analysis

Overview. Background

Looking at the Other Side of Bonferroni

Case-Control Association Testing. Case-Control Association Testing

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

Non-specific filtering and control of false positives

Mapping QTL to a phylogenetic tree

Sample Size Estimation for Studies of High-Dimensional Data

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL)

Lecture 9. QTL Mapping 2: Outbred Populations

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Multiple Testing. Gary W. Oehlert. January 28, School of Statistics University of Minnesota

Introductory Econometrics

Average weight of Eisenhower dollar: 23 grams

Gene mapping in model organisms

Applied Statistics for the Behavioral Sciences

Statistics Handbook. All statistical tables were computed by the author.

Lecture 11: Multiple trait models for QTL analysis

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

False Discovery Rate

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

Principles of QTL Mapping. M.Imtiaz

2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018

Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits. Chen-Hung Kao 1

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

F79SM STATISTICAL METHODS

Mathematical Notation Math Introduction to Applied Statistics

False discovery rate control for non-positively regression dependent test statistics

I Have the Power in QTL linkage: single and multilocus analysis

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Chapter 5: HYPOTHESIS TESTING

Ch. 11 Inference for Distributions of Categorical Data

Statistical testing. Samantha Kleinberg. October 20, 2009

Model Selection for Multiple QTL

Methods for QTL analysis

The Chi-Square Distributions

CENTRAL LIMIT THEOREM (CLT)

Introductory Econometrics. Review of statistics (Part II: Inference)

Sociology 6Z03 Review II

Difference in two or more average scores in different groups

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

Statistics: CI, Tolerance Intervals, Exceedance, and Hypothesis Testing. Confidence intervals on mean. CL = x ± t * CL1- = exp

Causal Model Selection Hypothesis Tests in Systems Genetics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Lecture 21: October 19

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

Bayesian Regression (1/31/13)

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

Partitioning the Parameter Space. Topic 18 Composite Hypotheses

Lab I: Three-Point Mapping in Drosophila melanogaster

Lecture WS Evolutionary Genetics Part I 1

Econ 325: Introduction to Empirical Economics

Unit 12: Analysis of Single Factor Experiments

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

Hypothesis Testing. ) the hypothesis that suggests no change from previous experience

Exam 2 (KEY) July 20, 2009

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

On the mapping of quantitative trait loci at marker and non-marker locations

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Chapter 7: Hypothesis Testing

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Introduction to Statistical Inference

BTRY 4830/6830: Quantitative Genomics and Genetics

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

2.3 Analysis of Categorical Data

Quantitative Genetics & Evolutionary Genetics

Single Sample Means. SOCY601 Alan Neustadtl

One-Way Analysis of Variance (ANOVA) Paul K. Strode, Ph.D.

Lecture 6. QTL Mapping

1 Hypothesis testing for a single mean

Multiple testing: Intro & FWER 1

Transcription:

One-week Course on Genetic Analysis and Plant Breeding 21-2 January 213, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation Jiankang Wang, CIMMYT China and CAAS E-mail: jkwang@cgiar.org; wangjiankang@caas.cn Web: http://www.isbreeding.net 1

Outlines Hypothesis testing and two types of associated error LOD threshold in QTL mapping QTL detection power simulation Avoid the over fitting problem in 2

Hypothesis testing and two types of associated error 3

Hypothesis testing A hypothesis is a statement that something is true. Null hypothesis: A hypothesis to be tested. We use the symbol H to represent the null hypothesis Alternative hypothesis: A hypothesis to be considered as an alternative to the null hypothesis. We use the symbol H a to represent the alternative hypothesis. The alternative hypothesis is the one believe to be true, or what you are trying to prove is true.

Two types of error We may make mistakes in the test. Type I error: reject the null hypothesis when it is true. Probability of type I error is denoted by α Type II error: accept the null hypothesis when it is wrong. Probability of type II error is denoted by β

Power of a statistical test P(reject the null hypothesis when it is false)=1-β (1-α) is the probability we accept the null when it was in fact true (1-β) is the probability we reject when the null is in fact false - this is the power of the test. The power changes depending on what the actual population parameter is.

Factors affecting power For example: H : µ=µ, H a : µ>µ Test statistic Z X µ σ n If we want to reject H, we need X σ µ n = Z So the power depends on δ =, σ, n, and α α x µ

δ is small δ is large The larger the difference δ is, the higher the power is X ~ N(µ,σ 2 /n) If H is true, X ~N(µ,σ 2 /n) If H a is true, X ~N(µ +δ,σ 2 /n) H H α α μ μ Ha Ha β 1-β β 1-β μ+δ μ+δ

Large standard error or small population size Small standard error or large population size The smaller the standard error or the larger the population size is, the higher the power is H H α α μ μ Ha Ha β 1-β β 1-β μ+δ μ+δ

Small α Large α The larger α is, the higher the power is H H α α μ Ha Ha β 1-β β 1-β μ+δ

Let s find some Type I and II errors Consider the Binomial distribution H: p=.; Ha: p.; n=6, X~B(n=6, p) Reject H when X=, or 6 Accept H when X=1,..., α =P(X= p=.) + P(X=6 p=.) =.16+.16=.312 Given p=.7, find the Type II error β, and the statistical power β= P(1<=X<= p=.7)=.8218; Power=.1782 Given p=.9, find the Type II error β, and the statistical power β= P(1<=X<= p=.9)=.4686; Power=.314

X~Binomial (n, p), n=6 {, } is the rejection region, i.e. EstP=. or 1. {1,2,3,4} is the acceptance region X H: p=. p=.7 p=.9.162.244.1 1.937.439.4 2.23437.3299.121 3.312.131836.148 4.23437.296631.9841.937.397.34294 6.162.177979.31441 Type I error Power Power.312.178223.31442 Type II error Type II error.821777.4688

Let s find some Type I and II errors Binomial distribution H: p=.; Ha: p.; n=3, X~B(3, p) Reject H when X<=9, or >=21 Accept H when X=1,..., 2 Find α Given p=.7, find the Type II error β, and the statistical power Given p=.9, find the Type II error β, and the statistical power

X H: p=. p=.7 p=.9... 1... 2... 3.4.. 4.26...133.. 6.3.. 7.1896.. 8.41.. 9.1332.. 1.27982.2. 11.876.8. 12.83.4. 13.1113.166. 14.1343.63. 1.144464.1931. 16.1343.43. 17.1113.13414.2 18.83.296.13 19.876.7.74 2.27982.986.36 21.1332.12987.16 22.41.1939.764 23.1896.166236.1843 24.3.1446.47363 2.133.14728.123 26.26.642.17766 27.4.2683.23688 28..8631.22766 29..1786.14134 3..179.42391 Type I error Power Power.42774.8347.99946 Type II error Type II error.19693.44

LOD threshold in QTL mapping Sun, Z., H. Li, L. Zhang, J. Wang*. 213. Properties of the test statistic under null hypothesis and the calculation of LOD threshold in quantitative trait loci (QTL) mapping. Acta Agronomica Sinica (accepted) 1

Threshold is used to control Type I error, say no greater than..1 Probability density.8.6.4.2 Pr{T>T. )=α=. α=. T α=. =18.3 Say we know a test under H hypothesis has the χ 2 (df=1) distribution, the use of threshold 18.3 can make sure the Type I error <. 16

The reason to control Type I error High LOD score can be simply caused by chance! We simulated five DH populations and five F2 population. But we did not assume any QTL on the six chromosomes 17

LOD LOD LOD LOD LOD 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 No QTL in DH populations 1111111111111222222222222333333333333444444444444666666666666 18 No QTL on the six chromosomes, DH population

LOD LOD LOD LOD LOD 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 No QTL in F2 populations 1111111111111222222222222333333333333444444444444666666666666 19 No QTL on the six chromosomes, F2 population

Distribution of LRT under H at each scanning position Probability..4.3.2.1 DH population 1 2 3 4 6 7 8 9 1 LRT Probability In DH populations, LRT ~ χ 2 (df=1) In F2 populations, LRT ~ χ 2 (df=2) D.F. is equal to the number of independent genetic effects to be estimated 2..4.3.2.1 F2 population 1 2 3 4 6 7 8 9 1 LRT

Number of independent tests DH population, genome-wide Type I error =. DH population, genome-wide Type I error =.1 Indepedent tests 4 3 2 1 MD=1 cm MD=2 cm MD= cm MD=1 cm MD=2 cm 8 11 14 17 2 Length of chromosome (cm) y =.13x y =.126x y =.99x y =.72x y =.4x Indepedent tests 4 3 2 1 MD=1 cm MD=2 cm MD= cm MD=1 cm MD=2 cm 8 11 14 17 2 Length of chromosome (cm) y =.213x y =.16x y =.113x y =.84x y =.82x 21

LOD threshold, assuming marker density is 1 cm Genome Genome-wide α=. Genome-wide α=.1 size DH RIL F2 DH RIL F2 1.61 1.84 2.4 2.37 2.6 3.18 7 1.77 2.1 2.7 2.3 2.73 3.36 1 1.88 2.12 2.7 2.6 2.84 3.49 1 2.4 2.28 2.87 2.81 3.1 3.66 2 2.16 2.4 3. 2.93 3.13 3.79 2 2.24 2.49 3.1 3.2 3.22 3.88 3 2.32 2.6 3.17 3.1 3.29 3.96 2.2 2.77 3.4 3.31 3. 4.18 1 2.8 3. 3.7 3.9 3.79 4.49 1 2.97 3.22 3.87 3.76 3.9 4.66 2 3.9 3.33 4. 3.88 4.7 4.79 3 3.2 3. 4.17 4.4 4.24 4.96 4 3.37 3.62 4.3 4.16 4.36.9 22

LOD threshold from permutation test Highest LOD per test 6 4 3 2 1 Barly DH population LOD threshold = 2.72 at level. Highest LOD per test 12 1 8 6 4 2 Soybean F2 population LOD threshold = 2.44 at level. 1 2 3 4 6 7 8 9 1 Permutation tests 1 2 3 4 6 7 8 9 1 Permutation test 23

QTL detection power simulation Zhang, L., H. Li, J. Wang*. 212. Statistical power of inclusive composite interval mapping in detecting digenic epistasis showing common F2 segregation ratios. Journal of Integrative Plant Biology 4: 27-279 Li, H., S. Hearne, M. Bänziger, Z. Li, and J. Wang*. 21. Statistical properties of QTL linkage mapping in biparental genetic populations. Heredity 1: 27-267. 24

Two independent QTL models Chromosome Position (cm) Additive PVE (%) Independent model I Q1 1 3.316. Q2 2 3.447 1. Q3 3 3.48 1. Q4 4 3.633 2. Genetic variance 1. Error variance 1. Heritability. Independent model II Q1 1 3.316. Q2 2 3 -.447 1. Q3 3 3.48 1. Q4 4 3 -.633 2. Genetic variance 1. Error variance 1. Heritability. 2

Two linked QTL models Chromosome Position (cm) Additive PVE (%) Linkage model I Q1 1 3.316 3.9 Q2 1 6.447 7.9 Q3 2 3.48 11.8 Q4 2 6.633 1.8 Genetic variance 1.3 Error variance 1. Heritability.66 Linkage model II Q1 1 3.316 6.8 Q2 1 6 -.447 13.7 Q3 2 3.48 2. Q4 2 6 -.633 27.3 Genetic variance.46 Error variance 1. Heritability.317 26

LOD LOD LOD LOD LOD 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 QTL mapping in simulation runs of the two independent models Independent model I Run 1 IM 11111111122222222333333334444444466666666 Run 2 IM 11111111122222222333333334444444466666666 Run 3 11111111122222222333333334444444466666666 Run 4 IM 11111111122222222333333334444444466666666 Run IM IM 11111111122222222333333334444444466666666 LOD score LOD score LOD score LOD score LOD score 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 Independent model II Run 1 IM 11111111122222222333333334444444466666666 Run 2 IM 11111111122222222333333334444444466666666 Run 3 IM 11111111122222222333333334444444466666666 Run 4 IM 11111111122222222333333334444444466666666 Run IM 11111111122222222333333334444444466666666 27

LOD score LOD score LOD score LOD score LOD score 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 QTL mapping in simulation runs of the two linkage models Linkage model I Run 1 IM 11111111122222222333333334444444466666666 Run 2 IM 11111111122222222333333334444444466666666 Run 3 11111111122222222333333334444444466666666 Run 4 IM 11111111122222222333333334444444466666666 Run IM IM 11111111122222222333333334444444466666666 LOD score LOD score LOD score LOD score LOD score 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 Linkage model II Run 1 IM 11111111122222222333333334444444466666666 Run 2 IM 11111111122222222333333334444444466666666 Run 3 IM 11111111122222222333333334444444466666666 Run 4 IM 11111111122222222333333334444444466666666 Run IM 11111111122222222333333334444444466666666 28

Count of power and false QTL for IM Run QTL identified Support interval Chrom. Position LOD PVE (%) Additive 1 cm 1 cm 1 2 2 4.97 11.44.3 False Q2 3 3.61 13.3.41 Q3 Q3 4 4 13.21 26.22.761 Q4 Q4 2 2 34.36 13.1.9 Q2 Q2 3 34.82 13.72.21 Q3 Q3 4 3 11.9 23.43.682 Q4 Q4 3 1 39. 11.22.8 Q1 Q1 2 32 4.3 1.9.482 Q2 Q2 3 4 8.3 18.42.61 False False 4 36 8.6 18..63 Q4 Q4 4 1 4 3.97 1.21.42 False Q1 2 36 2.69 6.81.343 Q2 Q2 3 34 8.92 19.66.83 Q3 Q3 4 36 8.79 2.1.91 Q4 Q4 3 33 3.8 8.16.389 Q3 Q3 4 3 11.71 26.6.71 Q4 Q4 29

Count of power and false QTL for Run QTL identified Support interval Chrom. Position LOD PVE (%) Additive 1 cm 1 cm 1 1 47 3.8.6.33 False False 2 38 6.79 9.11.448 Q2 Q2 3 33 9.7 13.81.1 Q3 Q3 4 38 16.72 2..73 Q4 Q4 2 1 3 4.6 6.26.32 Q1 Q1 2 36 9.7 12.6. Q2 Q2 3 31 7.93 1.41.44 Q3 Q3 4 27 16.77 24.93.73 False Q4 3 1 36 7.2 1.23.486 Q1 Q1 2 32 6. 8.1.432 Q2 Q2 3 38 9.2 13.63.6 Q3 Q3 4 38 13. 19.18.664 Q4 Q4 4 1 3 3.99.13.298 Q1 Q1 2 37 4.4.89.319 Q2 Q2 3 33 14.21 21.68.613 Q3 Q3 4 36 13.73 21.23.67 Q4 Q4 1 3 4.91 8.4.384 Q1 Q1 2 1 4.3 6.87.36 False False 3 34.3 9.4.419 Q3 Q3 4 3 17.46 31.6.764 Q4 Q4 3

Power and false QTL from the runs Method QTL Times to be detected Detection power (%) 1cM 2cM 1cM 2cM IM Q1 1 2 2 4 Q2 3 4 6 8 Q3 4 4 8 8 Q4 1 1 False QTL 3 1 19 6 Q1 4 4 8 8 Q2 4 4 8 8 Q3 1 1 Q4 4 8 1 False QTL 3 2 1 1 31

The best method Has the highest power Has the lowest false discovery rate 32

Independent model I Method QTL Power Pos. (%) (cm) SE LOD SE Additive SE IM Q1 2.8 3.182 3.461 3.849 1.3.421.6 Q2 62.6 34.861 3.14.84 1.692.48.82 Q3 77.7 3.6 2.669 7.13 2.178.6.92 Q4 8.4 3.67 2.464 9.2 2.84.63.9 FDR (%) 32.4 Q1 49. 34.867 3.184 4.667 1.66.34.62 Q2 73.9 34.874 2.769 7.16 2.29.4.77 Q3 82.8 34.98 2.21 1.161 2.71.48.78 Q4 89. 3.16 2.278 13.87 3.229.632.83 FDR (%) 22.6 33

Independent model II Method QTL Power Pos. (%) (cm) SE LOD SE Additive SE IM Q1 27.3 34.83 3.414 3.86 1.11.424.62 Q2 64.2 3.62 3.3.6 1.67 -.478.78 Q3 78.6 34.96 2.76 6.963 2.143.8.9 Q4 84.6 34.86 2.481 9.374 2.441 -.64.89 FDR (%) 31.1 Q1 49.2 34.831 3.24 4.89 1.64.32.63 Q2 76.1 3.3 2.861 7.142 2.328 -.448.76 Q3 8.6 3.1 2.484 1.193 2.7.48.81 Q4 9. 34.939 2.32 13.23 3.221 -.634.82 FDR (%) 21.3 34

Linkage model I Method QTL Power Pos. (%) (cm) SE LOD SE Additive SE IM Q1 24.1 3.448 2.77 6.944 2.92.62.99 Q2 49. 64.79 2.49 7.89 2.278.66.14 Q3 4.2 3.79 1.648 17.17 3.1.937.9 Q4 6. 64.16 1.624 18.682 3.39.971.93 FDR (%) 3.1 Q1 26.9 3.33 3.1 7.33 3.466.449.118 Q2. 64.872 2.71 1.19 4.184.8.133 Q3 77. 34.92 2.618 1.6 3.89.9.113 Q4 84.2 64.828 2.33 13.668 4.761.649.13 FDR (%) 26.4 3

Linkage model II Method QTL Power Pos. (%) (cm) SE LOD SE Additive SE IM Q1.3 33.333 4.714 2.96.431.313.13 Q2 2.3 66.34 3.22 3.667 1.29 -.34. Q3 6.8 31.691 2.68 3.176.3.334.31 Q4 4.6 67.746 2.68 4.169 1.32 -.373.61 FDR (%) 38.9 Q1 11.6 34.216 3.61.1 1.91.37.66 Q2 33. 66.179 3.3.872 3.188 -.42.18 Q3 6.2 34.383 2.894 8.332 3.381.492.14 Q4 6.9 6.984 2.429 11.413 4.131 -.91.114 FDR (%) 23.8 36

Size of the mapping population PVE (%) Marker density cm Marker density 1 cm Power.8 Power.9 Power.8 Power.9 1 3 6 4 >6 2 16 3 28 32 3 11 2 18 2 4 1 16 14 18 8 14 12 14 1 8 7 8 2 4 6 6 3 4 4 4 4 37

Avoid the over fitting problem in 38

Over-fitting can cause fake QTL LOD 6 4 QTL by overfitting qkwt2h-2 qkwt2h-1 qkwt2h-3 qkwt3h QTL by overfitting qkwth QTL by overfitting qkwt7h-1 qkwt7h-2 2 1. 1H 1H 1H 1H 1H 1H 1H 2H 2H 2H 2H 2H 2H 2H 2H 3H 3H 3H 3H 3H 3H 3H 4H 4H 4H 4H 4H 4H H H H H H H H H H H 6H 6H 6H 6H 6H 6H 7H 7H 7H 7H 7H 7H 7H 7H Additive effect -. -1-1. 1H 1H 1H 1H 1H 1H 1H 2H 2H 2H 2H 2H 2H 2H 2H 3H 3H 3H 3H 3H 3H 3H 4H 4H 4H 4H 4H 4H H H H H H H H H H H 6H 6H 6H 6H 6H 6H 7H 7H 7H 7H 7H 7H 7H 7H One-dimensional scaning on the barley genome, step = 1 cm 39

How can I know there is an overfitting problem? R 2 in step-wise regression exceeds the broadsense heritability There are closely linked QTL identified, especially the QTL are linked in repulsion PIN in step-wise.1.1..1 regression R 2.7289.7963.8131.8886 Use smaller PIN to avoid over-fitting problem 4