Lecture 9 GxE Mixed Models. Lucia Gutierrez Tucson Winter Institute


Genotypic Means

GENOTYPIC MEANS: $y_{ijk} = G_i + E_j + GE_{ij} + \varepsilon_{ijk}$

The environment includes non-genetic factors that affect the phenotype, and usually has a large influence on quantitative traits.
o Micro-environment: the environment of a single plant. It needs to be controlled with experimental design.
o Macro-environment: the environment associated with a location and time.

GxE is the norm and not the exception in plants. Therefore, defining the target environments is a crucial part of plant breeding, both for variance component estimation and for identifying superior genotypes. (Bernardo, 2010)

Outline

1. How to control for micro-environmental variation?
   1. Advanced Experimental Designs
   2. Spatial Variation
   3. Mixed Models for assumption flexibility
2. How to model macro-environmental variation to account for correlations and heterogeneity?
   1. GxE
   2. GxE Mixed Models
3. How to include GxE into QTL analysis?
   1. QTLxE
   2. QTLxE Example

Experimental Design and Analysis

1. Experimental units
   1. Homogeneous: Completely Randomized Design (CRD)
   2. Heterogeneous in one way: Randomized Complete Block Design (RCBD)
   3. Heterogeneous in more than one way: Latin squares or latinized designs
2. Large number of treatments
   1. Incomplete Block Designs (IBD or Alpha)
   2. Unreplicated experiments (or Federer)
3. Modeling (post-blocking, spatial analysis)
4. Assumptions
   1. Independence: MM to model correlations
   2. Homogeneous variances: MM to model heterogeneity

[Figure: example field layouts for CRD, RCBD and IBD]

CRD: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
RCBD: $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
IBD: $y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{k(j)} + \varepsilon_{ijk}$

Assumptions

Classical models are based on some limiting assumptions: errors are independent random variables with a normal distribution and homogeneous variances.

DEPENDENCIES
o Design factors impose restrictions on randomization that induce correlations (i.e. plots within a block are more similar to each other than to plots in a different block). If correlations exist, they should be included in the model to make valid inferences.
o Genotypes may be related, which imposes a correlation.
o Field heterogeneity also induces correlations.
o There might be a correlation between environments.

NON-HOMOGENEOUS VARIANCES
Both genetic and environmental variances are affected by the environment (i.e. they are properties of the population). Therefore, heterogeneous variances are common in field experiments.

Mixed Models in Field Experiments

Mixed models are more flexible: correlations and heterogeneous variances can be modeled.

FIXED EFFECTS
o Inference is about specific treatments.
o All levels of a fixed factor are included in the experiment.
o Interest is in testing differences in means between treatments.
o Identification constraints are needed (sum to zero, cornerstone).

RANDOM EFFECTS
o Inference is about a population of treatments.
o Testing concerns the population variance of a treatment.
o Effects are assumed to have a distribution, $t_i \sim N(0, \sigma_t^2)$.
o The variance-covariance can be structured, imposing correlations.
o Prediction from random effects provides the best estimate of treatment rankings (BLUPs).

Fixed Effects vs. Random Effects

FIXED EFFECTS
o Estimation by generalized least squares (conditional on VCOV parameters).
o $H_0: t_i = 0$ for all levels.
o The Wald statistic is $\chi^2_r$ distributed, with $r$ = number of levels - 1.
o F-approximations to the Wald test can also be used: Wald / $r$ is approximately F-distributed.

RANDOM EFFECTS
o Estimation of VCOV by (RE)ML.
o $H_0: \sigma_t^2 = 0$.
o Compare the likelihood (deviance = $-2L$, with $L$ the log-likelihood) of nested models, i.e. models with and without the variance component under test.
o Approximate deviance differences by a chi-square on 1 df.
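The deviance comparison for a single variance component can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the two deviance values are hypothetical, and the chi-square on 1 df is the approximation named on the slide.

```python
import math

def lrt_one_df(deviance_without, deviance_with):
    """Likelihood ratio test for one variance component.

    deviance_without: -2 log-likelihood of the model without the component
    deviance_with:    -2 log-likelihood of the model with the component
    Both values are hypothetical inputs, e.g. taken from two REML fits
    that share the same fixed effects.
    """
    stat = deviance_without - deviance_with
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical deviances from two nested models
stat, p_value = lrt_one_df(1012.4, 1005.1)
print(f"deviance difference = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the variance component is needed in the model.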

MM to relax assumptions

Blocks may be considered as random effects to model the correlation of plots within a block. This makes sense if the number of blocks is sufficient to estimate a variance.

$y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}$

o $i$ = genotype index, $j$ = block index
o $b_j \sim N(0, \sigma_b^2)$ = random block effect
o $\varepsilon_{ij} \sim N(0, \sigma^2)$
o $b_j$ and $\varepsilon_{ij}$ are assumed to be independent, i.e. $cov(b_j; \varepsilon_{ij}) = 0$
o for observations in the same block, covariance $cov(y_{ij}; y_{i'j}) = \sigma_b^2$
o for observations in the same block, correlation $corr(y_{ij}; y_{i'j}) = \sigma_b^2 / (\sigma_b^2 + \sigma^2)$
o for observations in different blocks, covariance $cov(y_{ij}; y_{i'j'}) = 0$
o for observations in general, $var(y_{ij}) = \sigma_b^2 + \sigma^2$

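MM to relax assumptions

Independent observations:
$\Sigma = \sigma^2 I$

Observations in the same block correlated (compound symmetry), shown for the plots of one block:
$\Sigma = \begin{pmatrix} \sigma_b^2 + \sigma^2 & \sigma_b^2 & \cdots & \sigma_b^2 \\ \sigma_b^2 & \sigma_b^2 + \sigma^2 & \cdots & \sigma_b^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_b^2 & \sigma_b^2 & \cdots & \sigma_b^2 + \sigma^2 \end{pmatrix}$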
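The covariance structure implied by random blocks can be built explicitly. The sketch below is illustrative only: the function name, the block labels and the two variance values are assumptions, not values from the lecture.

```python
def block_covariance(blocks, var_block, var_error):
    """Covariance matrix of plot observations under a random block model.

    blocks[a] is the block label of plot a. Plots in the same block share
    covariance var_block; plots in different blocks are uncorrelated.
    """
    n = len(blocks)
    sigma = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                sigma[a][b] = var_block + var_error
            elif blocks[a] == blocks[b]:
                sigma[a][b] = var_block
    return sigma

# Hypothetical example: four plots, two per block
blocks = [1, 1, 2, 2]
sigma = block_covariance(blocks, var_block=2.0, var_error=6.0)
intra_block_corr = sigma[0][1] / sigma[0][0]
for row in sigma:
    print(row)
print("correlation within a block:", intra_block_corr)
```

Within each block the matrix has the compound symmetry pattern, and the correlation equals the block variance divided by the total variance.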

Number of treatments

HOW TO DEAL WITH A HIGH NUMBER OF TREATMENTS?
1. STRATIFICATION: Group genotypes with similar characteristics (maturity, color, family) and compare within groups. NO BETWEEN-GROUP COMPARISONS.
2. PRODUCE HOMOGENEOUS EXPERIMENTAL UNITS: Make every effort to homogenize the experimental area (look for soil similarity and field conditions that reduce variation, choose seeds of similar vigor).
3. USE REPEATED CHECKS: Checks can be used in a systematic way to control or model soil heterogeneity.
4. EXPERIMENTAL DESIGN WITH SPATIAL CONSIDERATIONS: Use experimental designs that include a large number of treatments while controlling variability (i.e. alpha designs, unreplicated designs, etc.).

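Repeated checks in a RCBD

Randomized Complete Block Design: Mixed Models

Modeling variance components (test lines as random effects):
$y_{ijk} = \mu + B_i + C_j + T_{k(j)} + \varepsilon_{ijk}$, in matrix form $y = X\beta + Zu + e$

Estimating genotypic means (test lines as fixed effects):
$y_{ijk} = \mu + B_i + C_j + T_{k(j)} + \varepsilon_{ijk}$, in matrix form $y = X\beta + e$

Here $B_i$ is the block, $C_j$ the repeated check and $T_{k(j)}$ the test line within the check.

[Figure: field layout with check plots (C) repeated across the blocks]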

Advanced Designs: Alpha Designs

What are ALPHA-DESIGNS (Williams et al.)? Designs that allow the construction of incomplete blocks with a large number of treatments (t) and blocks (k), so that t is a multiple of k. They include α(0,1)-lattice designs, IBD, row-column designs, etc.

BI 1   BI 2   BI 3   BI 4   BI 5   BI 6
  1      4      7      1      2      3
  2      5      8      4      5      6
  3      6      9      7      8      9

Pairs of treatments that share incomplete blocks:
o 1 time (18 pairs): 1-2, 1-3, 2-3, 4-5, 4-6, 5-6, 7-8, 7-9, 8-9, 1-4, 1-7, 4-7, 2-5, 2-8, 5-8, 3-6, 3-9, 6-9
o 0 times (18 pairs): 1-5, 1-6, 1-8, 1-9, 2-4, 2-6, 2-7, 2-9, 3-4, 3-5, 3-7, 3-8, 4-8, 4-9, 5-7, 5-9, 6-7, 6-8
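The pair counts above can be checked by counting how often each pair of treatments occurs in the same incomplete block. The following Python sketch assumes the six blocks are {1,2,3}, {4,5,6}, {7,8,9}, {1,4,7}, {2,5,8} and {3,6,9}; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Incomplete blocks of the 9-treatment example (assumed reading of the layout)
blocks = [
    {1, 2, 3}, {4, 5, 6}, {7, 8, 9},
    {1, 4, 7}, {2, 5, 8}, {3, 6, 9},
]

# Count how many blocks contain each pair of treatments
concurrence = Counter()
for block in blocks:
    for pair in combinations(sorted(block), 2):
        concurrence[pair] += 1

all_pairs = list(combinations(range(1, 10), 2))
once = [p for p in all_pairs if concurrence[p] == 1]
never = [p for p in all_pairs if concurrence[p] == 0]

print(len(once), "pairs share a block once")
print(len(never), "pairs never share a block")
```

Every pair occurs together either zero times or once, which is the property behind the α(0,1) label.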

Advanced Designs

Two possible arrangements for an incomplete block design with r = 2, v = 9 and k = 3.

Arrangement 1:
         Replicate 1      Replicate 2
Block    1   2   3        1   2   3
         1   4   7        1   2   3
         2   5   8        4   5   6
         3   6   9        7   8   9

Arrangement 2:
         Replicate 1      Replicate 2
Block    1   2   3        1   2   3
         1   4   7        1   5   4
         2   5   8        8   6     (treatment 2 occupies the remaining plot of this row)
         3   6   9        3   9   7

Which is the best design?

Advanced Designs

DESIGN
Block 1: ABC; Block 2: ABD; Block 3: ACD; Block 4: BCD
Coincidence of treatments inside incomplete blocks = 2

COMPARISONS
o Direct, for A-B: Blocks 1 and 2 contain both A and B.
o Indirect, for A-B: (Block 1, A-C) - (Block 4, B-C) = A-B
o Block totals: Sum Block 3 - Sum Block 4 = (A+C+D) - (B+C+D) = A-B

Mixed Models in Advanced Designs

o Direct and indirect comparisons of treatment effects are combined in standard least squares (fixed effects) estimates = intra-block estimates.
o Information on treatment differences from block totals becomes available only when blocks are taken as random = inter-block estimates.
o The combination of intra-block and inter-block estimates of treatment differences, weighting the pieces of information by their (inverse) variances, is done automatically in a REML analysis.

Incomplete Block Designs (IBD)

EXAMPLE. OIL CONTENT OF ADVANCED INBRED LINES.
Treatment: sunflower inbred lines (IL)
Experimental design: resolvable IBD with r = 3 and s = 5
Dependent variable: Y = L ha^-1

Incomplete block    1    2    3    4    5
R1                  1    2    3    4    5
                    6    7    8    9   10
                   11   12   13   14   15
                   16   17   18   19   20
R2                  1    2    3    4    5
                    7    8    9   10    6
                   13   14   15   11   12
                   19   20   16   17   18
R3                  1    2    3    4    5
                    8    9   10    6    7
                   15   11   12   13   14
                   17   18   19   20   16

TREATMENT ASSIGNMENT: Each treatment is assigned randomly to the experimental units in the first rep. In the following reps, the randomization is restricted so that each pair of treatments is compared the same number of times within an incomplete block.

Incomplete Block Designs (IBD)

$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{k(j)} + \varepsilon_{ijk}$

$Y_{ijk}$ = response of the i-th treatment in the j-th rep and the k-th incomplete block
$\mu$ = population mean
$\alpha_i$ = effect of the i-th treatment
$\beta_j$ = effect of the j-th rep
$\gamma_{k(j)}$ = effect of the k-th incomplete block within the j-th rep
$\varepsilon_{ijk}$ = experimental error (residual)

IBD with augmented checks:
Variance component estimation (random test lines): $y_{ijkl} = \mu + B_i + S_{j(i)} + C_k + T_{l(k)} + \varepsilon_{ijkl}$
Genotypic means estimation (fixed test lines): $y_{ijkl} = \mu + B_i + S_{j(i)} + C_k + T_{l(k)} + \varepsilon_{ijkl}$

Row-Column Design (RC)

Similar to incomplete blocks. Two sources of variation are controlled: rows and columns. This gives better control of field heterogeneity.

[Figure: field layout of a row-column design, with treatments arranged in rows and columns within each rep]

$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_{k(j)} + \lambda_{l(j)} + \varepsilon_{ijkl}$

$Y_{ijkl}$ = response of the i-th treatment in the j-th rep, the k-th row and the l-th column
$\mu$ = population mean
$\alpha_i$ = effect of the i-th treatment
$\beta_j$ = effect of the j-th rep
$\gamma_{k(j)}$ = effect of the k-th row within the j-th rep
$\lambda_{l(j)}$ = effect of the l-th column within the j-th rep
$\varepsilon_{ijkl}$ = experimental error (residual)

Federer's Unreplicated Design (UR)

MAIN MOTIVATION: There is not enough seed for each genotype in early generation testing to replicate the genotypes. But then:
1. How do we control the sources of variation?
2. How do we estimate experimental error?

[Figure: field layout with 3 repeated checks (T1, T2, T3) in a RCBD and 36 genotypes]

Federer's Unreplicated Designs (UR)

EXAMPLE. BIOMASS YIELD OF 5 BARLEY F5
Treatments: 5 Barley F5
Experimental design: Federer's unreplicated design (checks augmented in a RCBD)
Dependent variable: Y = kg ha^-1

TREATMENT ASSIGNMENT: A RCBD was used for the checks. A number of genotypes is added to each block, and different genotypes are included in the different blocks.

$Y_{ijk} = \mu + B_i + C_j + T_{k(j)} + \varepsilon_{ijk}$

$Y_{ijk}$ = response
$\mu$ = population mean
$B_i$ = effect of the i-th rep (block)
$C_j$ = effect of the j-th repeated check
$T_{k(j)}$ = effect of the k-th test line within the j-th check
$\varepsilon_{ijk}$ = experimental error (residual)

Spatial Modeling

o In mixed models with random blocks, all plots within a block are equally correlated, but plots in different blocks are uncorrelated.
o It is more realistic that the correlation between plots decays with the distance between them.
o The VCOV for individual trials can be modeled as the product of a decaying correlation (for example AR1) in the row direction and another decaying correlation (for example AR1) in the column direction.
o Spatial modeling of the VCOV can be in addition to blocks, or a substitute for blocks (but then be careful).
o Experimental design vs. post-blocking.

AR1 correlation matrix for four plots:
$\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}$
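The product of a row AR1 correlation and a column AR1 correlation can be written as a Kronecker product. The sketch below is a minimal pure-Python illustration; the grid size, the two correlation values and the function names are hypothetical.

```python
def ar1(n, rho):
    """AR1 correlation matrix: corr between positions i and j is rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_val * b_val for a_val in row_a for b_val in row_b]
        for row_a in a
        for row_b in b
    ]

# Hypothetical field of 2 rows by 3 columns, plots ordered row by row
n_rows, n_cols = 2, 3
corr = kronecker(ar1(n_rows, 0.5), ar1(n_cols, 0.8))

def plot_index(row, col):
    return row * n_cols + col

# Correlation between plot (row 0, col 0) and plot (row 1, col 2):
# 0.5^1 * 0.8^2
print(corr[plot_index(0, 0)][plot_index(1, 2)])
```

The correlation between two plots decays with their distance in both directions, unlike the constant within-block correlation of the random block model.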

Model Comparison

INFORMATION CRITERIA
Especially for non-nested models, information criteria (AIC, BIC) may provide an alternative to likelihood ratio tests.

AIC = $-2L + 2t$
BIC = $-2L + t \log n$

$t$ = number of variance parameters
$n$ = number of residual degrees of freedom = (number of observations - number of fixed parameters)

The best model has the smallest AIC/BIC. If using REML estimates, make sure the fixed effects are the same in all models so that comparisons across models are valid.
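As a small worked example of the two criteria, the sketch below compares two covariance models. All inputs are hypothetical (the deviances, the parameter counts and the residual degrees of freedom are made up for illustration), and the function names are not from the lecture.

```python
import math

def aic(deviance, n_var_params):
    """AIC = -2L + 2t, with deviance = -2L and t variance parameters."""
    return deviance + 2 * n_var_params

def bic(deviance, n_var_params, n_resid_df):
    """BIC = -2L + t log(n), with n the residual degrees of freedom."""
    return deviance + n_var_params * math.log(n_resid_df)

# Hypothetical fits with identical fixed effects (required under REML)
models = {
    "CS": {"deviance": 500.0, "t": 2},
    "UNSTRUCTURED": {"deviance": 490.0, "t": 10},
}
n_resid_df = 200

scores = {
    name: (aic(m["deviance"], m["t"]), bic(m["deviance"], m["t"], n_resid_df))
    for name, m in models.items()
}
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores)
print("smallest BIC:", best_by_bic)
```

In this made-up case the richer model lowers the deviance by 10 but pays for eight extra parameters, so both criteria prefer the simpler structure.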

Why use Mixed Models?

o Greater flexibility in modeling the variance-covariance structure and dependencies between observations.
o Accounting for heterogeneity of variance and correlation.
o For many situations linear mixed models provide a more natural way of modeling than standard linear models.
o Recovery of (inter-block) information.
o Shrinkage prediction of effects (BLUPs), which is of importance in genetics.
o Allows modeling of dependencies for spatial and temporal (blocking) and genetic reasons.
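Shrinkage can be illustrated with the simplest balanced case. The sketch below assumes a one-way layout with the same number of replicates for every genotype and known variance components, in which case the BLUP of a genotype effect is the deviation of its mean from the grand mean multiplied by a weight between 0 and 1. The function name and all numbers are hypothetical.

```python
def blup_balanced(genotype_means, var_genotype, var_error, n_reps):
    """BLUPs of genotype effects in a balanced one-way random model.

    Assumes known variance components and n_reps replicates per genotype.
    Each deviation from the grand mean is shrunk by
    var_genotype / (var_genotype + var_error / n_reps).
    """
    grand_mean = sum(genotype_means.values()) / len(genotype_means)
    weight = var_genotype / (var_genotype + var_error / n_reps)
    return {g: weight * (m - grand_mean) for g, m in genotype_means.items()}

# Hypothetical genotype means
means = {"G1": 10.0, "G2": 12.0, "G3": 14.0}
blups = blup_balanced(means, var_genotype=1.0, var_error=4.0, n_reps=4)
print(blups)
```

With these values the weight is 0.5, so each predicted effect is half of the observed deviation. More replicates or a larger genetic variance move the weight toward 1.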

Genotype by Environment Interaction

[Figure: response (R) of Genotype 1 (G1) and Genotype 2 (G2) in Environment 1 (E1) and Environment 2 (E2)]

Genotype by Environment Interaction

[Figure: four panels showing the response (R) of genotypes G1 and G2 in environments E1 and E2: No GxE; GxE: divergence; GxE: convergence; GxE: cross-over]

Genotype by Environment Interaction

[Figure: two panels showing the response (R) of genotypes G1 and G2 in environments E1 and E2: GxE: convergence; GxE: divergence]

With both divergence and convergence, it is easy to make predictions because there is no cross-over interaction: G2 is the best genotype in all environments. However, there is heterogeneity of variance that needs to be taken into account in the models for proper estimation.

Genotype by Environment Interaction

[Figure: two panels showing the response (R) of genotypes G1 and G2 in environments E1 and E2, both with GxE: cross-over]

With cross-over interaction, predictions should be made by environment: no genotype is the best in all environments. Be careful also with heterogeneity of variance.

Genotype by Environment Interaction

MULTI-ENVIRONMENT TRIALS
Used to characterize a set of genotypes over varying conditions:
o Trials in different locations
o Trials with different practices (agronomy)
o Trials over multiple years

Information
o Does a genotype perform well over all environments?
o If not, in which specific environments?
o Can a genotype profit from improvements of the environment?

Multiple environments

ONE-STAGE ANALYSIS
$P_{ijk} = \mu + G_i + E_j + D_{k(j)} + GE_{ij} + \varepsilon_{ijk}$
o Analyze field-plot data and model GxE simultaneously.
o Needs information from the experimental design and replications.

TWO-STAGE ANALYSIS
o First stage: analysis per trial (Trial 1, Trial 2, ..., Trial n)
   - Quality control (assumptions, outliers, etc.)
   - Obtain predictions per genotype
o Second stage: use the genotype by environment table of means (predictions)
   - GxE analysis
   - QTLxE analysis
$P_{ij} = \mu + G_i + E_j + GE_{ij}$
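For the second stage, the genotype by environment table of means can be split into the terms of the model $P_{ij} = \mu + G_i + E_j + GE_{ij}$. The sketch below uses a plain sum-to-zero decomposition of a complete table. It is an arithmetic illustration rather than a mixed model fit, and the function name and the 2 x 2 table are hypothetical.

```python
def decompose_means(table):
    """Split a complete GxE table of means into mu, G, E and GE terms.

    table[i][j] is the mean of genotype i in environment j.
    Main effects are deviations of marginal means from the grand mean;
    GE is what remains in each cell.
    """
    n_g, n_e = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (n_g * n_e)
    g = [sum(row) / n_e - mu for row in table]
    e = [sum(table[i][j] for i in range(n_g)) / n_g - mu for j in range(n_e)]
    ge = [
        [table[i][j] - mu - g[i] - e[j] for j in range(n_e)]
        for i in range(n_g)
    ]
    return mu, g, e, ge

# Hypothetical table: 2 genotypes (rows) by 2 environments (columns)
table = [[4.0, 6.0],
         [5.0, 9.0]]
mu, g, e, ge = decompose_means(table)
print("mu =", mu)
print("G  =", g)
print("E  =", e)
print("GE =", ge)
```

Nonzero GE entries show that the difference between the two genotypes is not the same in both environments.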

Multiple environments

o The analysis of MET data aims at finding an adequate model for the phenotypic responses as a function of genetic and environmental factors (modelling the mean).
o Reliable conclusions depend on an appropriate structure for the residual $\varepsilon_{ij}$.
o The assumption of independence of residuals between environments is highly unrealistic (in which case would this assumption be valid?).
o A more realistic model assumes that the residuals come from some multivariate normal distribution.
o Finding an appropriate structure for $\varepsilon_{ij}$ that reflects the heterogeneity of genetic variances and correlations is a necessary first step towards reliable conclusions on $\mu_{ij}$.

Diagonal

[Figure: plot of environments env1, env2, env3, env4 and env5]

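Diagonal

$P_{ij} = \mu + E_j + \varepsilon_{ij}$

$VCOV(\varepsilon_{ij}) = \begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{pmatrix}$

$Corr(Env_j; Env_{j^*}) = 0$

Each environment has its own (residual) genetic variance (which is confounded with the GxE variance). There is no genetic correlation between environments.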

Compound Symmetry

[Figure]

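Compound Symmetry

$P_{ij} = \mu + E_j + \varepsilon_{ij}$

$VCOV(\varepsilon_{ij}) = \begin{pmatrix} \sigma_G^2 + \sigma_{GE}^2 & \sigma_G^2 & \sigma_G^2 & \sigma_G^2 \\ \sigma_G^2 & \sigma_G^2 + \sigma_{GE}^2 & \sigma_G^2 & \sigma_G^2 \\ \sigma_G^2 & \sigma_G^2 & \sigma_G^2 + \sigma_{GE}^2 & \sigma_G^2 \\ \sigma_G^2 & \sigma_G^2 & \sigma_G^2 & \sigma_G^2 + \sigma_{GE}^2 \end{pmatrix}$

$Corr(Env_j; Env_{j^*}) = \dfrac{\sigma_G^2}{\sigma_G^2 + \sigma_{GE}^2}$

Each environment has the same (residual) genetic variance (which is confounded with the GxE variance), and the genetic correlation is also the same between all pairs of environments.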

Unstructured

[Figure: plot of environments env1, env2, env3, env4 and env5]

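Unstructured

$P_{ij} = \mu + E_j + \varepsilon_{ij}$

$VCOV(\varepsilon_{ij}) = \begin{pmatrix} \sigma_1^2 & & & \\ \sigma_{21} & \sigma_2^2 & & \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{pmatrix}$ (symmetric)

$Corr(Env_j; Env_{j^*}) = \dfrac{\sigma_{jj^*}}{\sigma_j \sigma_{j^*}}$

Each environment has its own (residual) genetic variance (which is confounded with the GxE variance), and the genetic correlation can change between any pair of environments.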

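Factor Analytic

VCOV of the residual genetic effects $\varepsilon_{ij}$ across environments:

$VCOV(\varepsilon_{ij}) = \begin{pmatrix} \lambda_1^2 + \delta_1 & \lambda_1 \lambda_2 & \lambda_1 \lambda_3 & \lambda_1 \lambda_4 \\ \lambda_2 \lambda_1 & \lambda_2^2 + \delta_2 & \lambda_2 \lambda_3 & \lambda_2 \lambda_4 \\ \lambda_3 \lambda_1 & \lambda_3 \lambda_2 & \lambda_3^2 + \delta_3 & \lambda_3 \lambda_4 \\ \lambda_4 \lambda_1 & \lambda_4 \lambda_2 & \lambda_4 \lambda_3 & \lambda_4^2 + \delta_4 \end{pmatrix}$

$Corr(Env_j; Env_{j^*}) = \dfrac{\lambda_j \lambda_{j^*}}{\sqrt{(\lambda_j^2 + \delta_j)(\lambda_{j^*}^2 + \delta_{j^*})}}$

Heterogeneity of variances and correlations is possible at the price of relatively few parameters.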
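A one-factor version of this structure can be built from one loading and one specific variance per environment. The sketch below is illustrative: the function names and the numeric loadings are assumptions, and it covers only the single-factor case.

```python
import math

def fa1_covariance(loadings, specific_vars):
    """One-factor analytic covariance: lambda_j * lambda_k off the diagonal,
    lambda_j^2 + delta_j on the diagonal."""
    n = len(loadings)
    return [
        [
            loadings[j] * loadings[k] + (specific_vars[j] if j == k else 0.0)
            for k in range(n)
        ]
        for j in range(n)
    ]

def to_correlation(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[j][j]) for j in range(len(cov))]
    return [
        [cov[j][k] / (sd[j] * sd[k]) for k in range(len(cov))]
        for j in range(len(cov))
    ]

# Hypothetical loadings and specific variances for three environments
loadings = [2.0, 1.0, 0.5]
specific_vars = [1.0, 1.0, 0.75]
cov = fa1_covariance(loadings, specific_vars)
corr = to_correlation(cov)

n_env = 8
n_params_fa1 = 2 * n_env                          # loadings + specific variances
n_params_unstructured = n_env * (n_env + 1) // 2  # all variances and covariances
print(cov)
print(corr)
print(n_params_fa1, n_params_unstructured)
```

Variances and correlations both differ between environments, yet with eight environments this structure needs 16 parameters against 36 for the unstructured model.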

Finding a suitable model for the VCOV

Use different summary statistics and diagnostic plots:
o Summary statistics per environment
o Correlations between environments
o Boxplots
o Scatter plots
o Biplots

Fit different mixed models assuming different VCOV structures and compare their goodness of fit by some criterion (e.g. AIC or BIC).

QTL x E

WHY DO I NEED TO INCLUDE QTLxE?
o GxE is common in plants, and multi-environment evaluation is required for plant breeding.
o Is modeling GxE to estimate means (BLUE) or predict BLUPs not enough?
o It is possible that a QTLxE interaction exists, so that some markers are favorable in one environment but not in another.
o Identifying general QTL and environment-specific QTL is helpful in selecting the best genotypes. Additionally, correct error terms should be used when QTL are being evaluated.

QTL x E

[Figure: data configuration for QTLxE analysis. Phenotypic data (GxE table of means, genotypes by environments/time), environmental covariables, marker scores, marker map, grid of genomic positions, and genetic predictors (genotypic covariables).]

QTL x E

[Figure: genome-wide scan with QTLxE]

QTL x E

STEPS TO COMPLETE A QTLxE ANALYSIS
1. Identify an appropriate model for GxE.
2. Use the appropriate model to test each genomic position for the presence of a QTL (MR/SIM).
3. Use candidate QTL as cofactors to re-scan the genome (CIM).
4. Fit a final multi-QTL model with backward elimination of candidate QTL. Estimate QTL effects.

Information needed

1. Molecular marker scores: high-throughput panels, controlled conditions, repeatable, cheap, automatic scoring.
2. Genetic map: more standard methods, small population sizes, consensus maps? Needs some more development.
3. Phenotypes: the crucial part. Poor phenotypes mean poor QTL mapping.

Phenotyping

1. Field-plot technique
   - Good techniques
   - Control experimental error
2. Experimental design
   - Randomized complete block design
   - Alpha-designs (RIBD, R-CD, etc.)
   - Augmented designs
3. Analysis
   - Post-hoc spatial corrections
   - Other modeling

[Figure labels: Diseases, Plant Height, Flag Leaf Length]

Maize example (CIMMYT, MX)

STRESS TRIALS
o 1992 (Tlaltizapán, México): Well watered (WW), Intermediate stress (IS), Severe stress (SS)
o 1994 (Tlaltizapán, México): Intermediate stress (IS), Severe stress (SS)
o 1996 (Poza Rica, México): Low Nitrogen (2 seasons), High Nitrogen

[Figure: map showing Tlaltizapán and Poza Rica]

(Malosetti, 2011)

Genotypic performance

o Low correlation: GxE
o Mean performances are different
   - Good environments (NS92a = no stress)
   - Bad environments (LN96a, LN96b = low N)
o Variability is also different
   - Higher: NS92a
   - Lower: LN96b

GxE? Possibly yes, but we can't really see it here...

Genotypic performance

3 groups of environments. Best model for this data: factor analytic. (Malosetti, 2011)

Model          AIC     SIC     Deviance   NParameters
FA             17471   1754    17439      16
FA             17455   1753    1749       3
OUTSIDE        1753    17554   1755       9
UNSTRUCTURED   17456   17577   17384      36
HCS            1769    177     17674      9
CS             17918   1794    17914
DIAGONAL       1796    17933   1789       8
IDENTITY       1887    189     1885       1

Best model: FA (on the basis of the SIC criterion)

Marker information

$P_{ij} = \mu_j + x_i \alpha_j + \varepsilon_{ij}$, with $\mu_j$ the mean of environment $j$

o We enrich the original model by including markers.
o We include genetic predictors ($x_i$).
o Additive effects ($\alpha_j$) are environment-specific!
o G and GxE are partitioned into:
   - a part explained by markers (= QTLs)
   - a part NOT explained (residual G* and GxE* = $\varepsilon$)
o An appropriate model is needed for the residual $\varepsilon$ (variance-covariance model).

(Malosetti, 2011)

QTLxE (SIM; VCOV=FA)

[Figure: genome-wide profile of -log10(p), with a profile for environment-specific QTLs. Color code for the p-values of QTL effects: positive effect of the P1 allele (dark/light blue), positive effect of the P2 allele (red/yellow).]

(Malosetti, 2011)

[Figure: genome-wide profiles for SIM, CIM1 and CIM2]

o After SIM, some extra QTLs are picked up by CIM.
o No major change after the second round of CIM, so stop.
o Six candidate QTLs.

(Malosetti, 2011)

Final Model

Summary
Trait: yld
Population type: F2
Number of genotypes: 11
Number of environments: 8
Number of linkage groups: 1
Number of markers: 1
Variance-covariance model: FA

List of QTLs
Locus no.   Locus name   Linkage group   Position   -log10(p)   QTLxE
19          L85          1               141.       13.76       yes
4           CP36                         35.9       4.665       yes
73          L35          3               55.7       4.661       yes
11          L71          4               136.6      3.57        no
159         L43          6               15.        3.71        yes
37          C1P6         1               6.15       8.313       yes

All 6 candidate QTLs are retained in the final model. But note that the QTL on linkage group 4 is a main effect QTL (QTLxE term dropped).

(Malosetti, 2011)

QTL

Location: linkage group 1, position 141
Columns: Environment, Effect, S.e., P, %Expl. var., CI_LL, CI_UL
HN96b   -37.466 13.35.4 3.1. 66.
IS92a   55.33 13.44. 7.. 66.
IS94a   56.19 14.6. 7.. 66.
LN96a   .117 6.66.986. * *
LN96b   1.577 5.915.79. * *
NS92a   63.76 18.95.1 5.. 66.
SS92a   7.19 1.193. 15.1 14.55 157.45
SS94a   7.543 14.678.61 1.7 * *

o Effects change from environment to environment.
o The sign of the effect also changes: cross-over interaction.
o Which allele to select?

Location: linkage group 4, position 136.6
Columns: Environment, Effect, S.e., P, %Expl. var., CI_LL, CI_UL
HN96b   -16.369 4.55..6. 167.5
IS92a   -16.369 4.55..6. 167.5
IS94a   -16.369 4.55..6. 167.5
LN96a   -16.369 4.55. 3.1. 167.5
LN96b   -16.369 4.55. 3.4. 167.5
NS92a   -16.369 4.55..3. 167.5
SS92a   -16.369 4.55..8. 167.5
SS94a   -16.369 4.55..6. 167.5

o What is the difference with the previous one?
o Which allele to select?

(Malosetti, 2011)