Computations with Markers

Paulino Pérez, José Crossa
ColPos-México; CIMMYT-México
June 2015. CIMMYT, México-SAGPDB

Contents
1. Genomic relationship matrix
2. Distance matrix
3. Big Data!

Genomic relationship matrix

The genomic relationship matrix (G) appears naturally in several models used routinely in genomic selection. VanRaden (2008) studied efficient methods to compute genomic predictions using this matrix. There are several ways of computing the G matrix:

1. $G = XX'$, where $X$ is the $n \times p$ matrix of marker genotypes. For SNPs, $x_{ij} \in \{0, 1, 2\}$.
2. $G = \frac{(X - E)(X - E)'}{2 \sum_{j=1}^{p} p_j (1 - p_j)}$, where $p_j$ is the minor allele frequency of SNP $j = 1, \ldots, p$, and $E$ is the matrix of expected values of $x_{ij}$ under Hardy-Weinberg equilibrium, computed from estimates of the allelic frequencies (column $j$ of $E$ equals $2p_j$).
3. $G = ZZ'/p$, where $Z$ is the matrix of centered and standardized SNP codes and $p$ is the number of SNPs, that is, $z_{ij} = (x_{ij} - 2p_j) / \sqrt{2 p_j (1 - p_j)}$.

$G = XX'$ appears naturally when we assume that the phenotypes can be predicted with the linear model

$y = 1\mu + X\beta + e$,

where $e \sim N(0, \sigma^2_e I)$ and $\beta \sim N(0, \sigma^2_\beta I)$. Let $u = X\beta$. Using the properties of the multivariate normal distribution, it can be shown that $u \sim N(0, \sigma^2_\beta XX')$, and the model is equivalent to

$y = 1\mu + u + e$,

which is usually known as G-BLUP. We will talk about this model later on.
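The covariance claim for $u = X\beta$ can be checked numerically. The following is a minimal simulation sketch, not part of the original slides; the sample sizes, the allele frequency and the value of `sigma2_beta` are arbitrary assumptions chosen for illustration.

```r
# Monte Carlo check that u = X %*% beta has covariance sigma2_beta * XX'.
# All sizes and values below are arbitrary illustration choices.
set.seed(1)
n <- 5; p <- 50; sigma2_beta <- 0.5; n_rep <- 50000

# Toy genotype matrix with entries in {0, 1, 2}
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)

# Each column of B is one draw of beta ~ N(0, sigma2_beta * I)
B <- matrix(rnorm(p * n_rep, sd = sqrt(sigma2_beta)), nrow = p, ncol = n_rep)
U <- X %*% B                      # n x n_rep, each column is one draw of u

emp_cov <- tcrossprod(U) / n_rep  # empirical E[uu'] (the mean of u is 0)
theo_cov <- sigma2_beta * tcrossprod(X)

# Largest deviation, relative to the largest theoretical covariance entry
max_rel_err <- max(abs(emp_cov - theo_cov)) / max(abs(theo_cov))
print(max_rel_err)
```

The relative error shrinks as `n_rep` grows, which is what one would expect if the empirical covariance converges to $\sigma^2_\beta XX'$.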

Figure 1: Toy example for markers.

SNP coding

1. Additive effects:
   x = -1 if the SNP is homozygous for the major allele,
   x = 0 if the SNP is heterozygous,
   x = 1 if the SNP is homozygous for the other allele.
2. Dominant effects:
   x = 1 if the SNP is heterozygous,
   x = 0 if the SNP is homozygous.
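The two codings can be derived from allele counts in a couple of lines. This is a small sketch, not taken from the slides: the toy `counts` matrix is made up, and it assumes that the counts give the number of copies of the minor allele and that the additive code is -1/0/1.

```r
# Toy allele counts: number of copies of the minor allele (0, 1 or 2).
# Rows are individuals, columns are SNPs; the values are made up.
counts <- matrix(c(0, 1, 2,
                   2, 1, 0,
                   1, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("ind", 1:3), paste0("snp", 1:3)))

# Additive code (assumed -1/0/1): -1 major homozygote, 0 heterozygote,
# 1 homozygote for the other allele
x_add <- counts - 1

# Dominance code: 1 for heterozygotes, 0 for either homozygote
x_dom <- (counts == 1) * 1

print(x_add)
print(x_dom)
```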

    # Clear workspace
    rm(list=ls())

    # Set working directory
    setwd("c:/users/p.p.rodriguez/desktop/slides Paulino/2. Gmatrix/examples/")

    # Helper functions used below to recode the genotypes and impute missing values
    source("recode.r")
    source("impute.r")

    Genotype_info=read.csv(file="TC-10-Genotypes-ACGT.csv", header=TRUE,
                           na.strings="?_?", stringsAsFactors=FALSE)
    entry_genotype_info=Genotype_info$entry
    Genotype_info=Genotype_info[,-c(1,2)]
    X=recode(Genotype_info)$X

    # Impute missing genotypes
    set.seed(123)
    out=impute(X)

    # Note that markers 167 and 179 are
    # monomorphic and should be excluded from the analysis
    out$monomorphic

    # Remove monomorphic markers;
    # at this point no more missing values are present
    X=out$X[,-out$monomorphic]

    # Compute p
    phat=colMeans(X)/2
    MAF=ifelse(phat<0.5,phat,1-phat)
    phat=MAF
    hist(MAF,main="")
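The slides do not show the contents of recode.r or impute.r. As a rough idea of what an imputation step of this kind might do, here is a hypothetical stand-in (`impute_sketch` is an invented name, not the authors' function). It assumes that missing genotypes are drawn from a Binomial(2, p_j) distribution with the observed allele frequency, which would be consistent with the call to set.seed() before imputing, and it flags monomorphic markers.

```r
# Hypothetical stand-in for an imputation helper; NOT the authors' impute.r.
# X: n x p matrix of allele counts in {0, 1, 2} with NA for missing calls.
impute_sketch <- function(X) {
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    if (all(miss)) next                       # nothing to learn from; leave as NA
    p_j <- mean(X[!miss, j]) / 2              # observed allele frequency
    if (any(miss)) {
      # Draw missing genotypes from Binomial(2, p_j)
      X[miss, j] <- rbinom(sum(miss), size = 2, prob = p_j)
    }
  }
  # A marker is monomorphic when all its (non-missing) values are identical
  mono <- which(apply(X, 2, function(x) length(unique(x[!is.na(x)])) <= 1))
  list(X = X, monomorphic = unname(mono))
}

set.seed(123)
X_toy <- matrix(c(0, 2, NA, 1,
                  2, 2, 1, 0,
                  1, 2, 1, NA,
                  0, 2, 0, 2), nrow = 4, byrow = TRUE)
out_toy <- impute_sketch(X_toy)
print(out_toy$monomorphic)
```

In the toy matrix only the second marker is constant, so it is the one flagged. One practical caveat when removing flagged markers with negative indexing: if no marker is monomorphic, `X[, -integer(0)]` selects zero columns, so it may be safer to guard that step with a length check.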

Figure 2: Distribution of allele frequencies (histogram; x-axis: MAF, y-axis: frequency).

Computations: three ways

    # Computing the genomic relationship matrix
    G1=tcrossprod(X)

    X2=scale(X,center=TRUE,scale=FALSE)
    k=2*sum(phat*(1-phat))
    G2=tcrossprod(X2)/k

    X3=scale(X,center=TRUE,scale=TRUE)
    G3=tcrossprod(X3)/ncol(X3)

    heatmap(G3)
    hist(diag(G3),main="")
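It can be instructive to compare the scale()-based computations with the formulas written out explicitly. The sketch below uses simulated genotypes (an assumption, since the TC-10 data are not available here) and illustrates two points: centering with scale() matches subtracting $2p_j$ when $p_j$ is estimated as the column mean divided by 2, whereas scale(..., scale=TRUE) divides by the sample standard deviation rather than by $\sqrt{2p_j(1-p_j)}$, so the third matrix only approximates the third formula.

```r
set.seed(42)
n <- 30; p <- 200
# Simulated allele counts under Hardy-Weinberg proportions (illustration only)
freq <- runif(p, 0.1, 0.5)
X_sim <- sapply(freq, function(f) rbinom(n, size = 2, prob = f))
X_sim <- X_sim[, apply(X_sim, 2, var) > 0, drop = FALSE]  # drop monomorphic

p_hat <- colMeans(X_sim) / 2
k <- 2 * sum(p_hat * (1 - p_hat))

# Method 2 written out explicitly: E holds 2 * p_j in column j
E <- matrix(2 * p_hat, nrow = n, ncol = ncol(X_sim), byrow = TRUE)
G2_formula <- tcrossprod(X_sim - E) / k
G2_scale <- tcrossprod(scale(X_sim, center = TRUE, scale = FALSE)) / k

# Method 3 with the Hardy-Weinberg standard deviation sqrt(2 p (1 - p))
Z <- sweep(X_sim - E, 2, sqrt(2 * p_hat * (1 - p_hat)), "/")
G3_formula <- tcrossprod(Z) / ncol(Z)
# Method 3 as computed by scale(), which divides by the sample SD instead
G3_scale <- tcrossprod(scale(X_sim)) / ncol(X_sim)

print(all.equal(G2_formula, G2_scale, check.attributes = FALSE))
print(mean(diag(G3_scale)))   # equals (n - 1) / n by construction
print(cor(as.vector(G3_formula), as.vector(G3_scale)))
```

Because every column returned by scale() has sum of squares n - 1, the average diagonal element of the scale()-based matrix is exactly (n - 1)/n, regardless of the data.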

Exercise

1. Load the wheat dataset that we were using yesterday.
2. Compute the genomic relationship matrix using equation 1.

Figure 3: Heatmap of G matrix.

Figure 4: Histogram of the diagonal elements of the G matrix (x-axis: diag(G3), y-axis: frequency).

Distance matrix

The distance matrix also appears naturally in RKHS models, which we will review in the coming days:

$d_{ij} = \|x_i - x_j\|^2 = \sum_k (x_{ik} - x_{jk})^2$

Example:

    D=as.matrix(dist(X))

Note that dist() returns the Euclidean distance $\|x_i - x_j\|$, so the entries of D must be squared to obtain $d_{ij}$ as defined above.
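The squared distance is closely tied to the cross-product matrix, since $\|x_i - x_j\|^2 = x_i'x_i + x_j'x_j - 2x_i'x_j$. A short sketch on simulated genotypes (the toy matrix is an assumption made for illustration) checks this identity against dist():

```r
set.seed(7)
X_toy <- matrix(rbinom(6 * 10, size = 2, prob = 0.4), nrow = 6)

# Squared Euclidean distances: dist() returns the unsquared distance
D2 <- as.matrix(dist(X_toy))^2

# The same quantity from the cross-product matrix G = XX'
G <- tcrossprod(X_toy)
D2_from_G <- outer(diag(G), diag(G), "+") - 2 * G

print(all.equal(D2, D2_from_G, check.attributes = FALSE))
```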

Big Data!

Computing the genomic relationship matrix is straightforward if the matrix X is small. There are applications, however, where the number of markers can be very large.

Ober's prediction problem

Ober et al. (2012) predicted starvation stress resistance and startle response in Drosophila using p = 2.5 million SNPs and n = 192 D. melanogaster inbred lines, derived by 20 generations of full-sib mating from wild-caught females from the Raleigh, North Carolina population.
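A back-of-the-envelope calculation gives a sense of scale for these dimensions. The sketch below assumes dense in-memory storage, with 8 bytes per value for R's default numeric type and 4 bytes for integers; the actual footprint depends on how the genotypes are stored.

```r
n <- 192
p <- 2.5e6
bytes_double <- 8   # R stores numeric values as 8-byte doubles
bytes_int <- 4      # integer storage halves the footprint

x_gb_double <- n * p * bytes_double / 1e9   # about 3.84 GB
x_gb_int <- n * p * bytes_int / 1e9         # about 1.92 GB
g_mb <- n * n * bytes_double / 1e6          # about 0.29 MB

cat(x_gb_double, x_gb_int, g_mb, "\n")
```

The marker matrix X is the expensive object; the resulting n x n matrix G is tiny by comparison.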

Genomic relationship matrix for Ober's data (from "Prediction in D. melanogaster Using Sequence Data").

Figure 2. Heatmap of the genomic relationship matrix G. The genomic relationship matrix G was calculated according to [8] using 157 lines and 2.5 million SNPs. The S after the line-id indicates that the line belongs to the set of lines for which phenotypic records for startle response were also available (in addition to the phenotypic records of starvation resistance). doi:10.1371/journal.pgen.1002685.g002

Solution

Fortunately, the computation of the G matrix can be fully parallelized on modern CPUs:

$G_{ij} = \sum_k (x_{ik} - 2p_k)(x_{jk} - 2p_k) / c$

Here $c$ is the scaling constant, for example $c = 2\sum_k p_k(1 - p_k)$ as in the second way of computing G. When computing $G_{ij}$, only the genotypes of individuals $i$ and $j$ are needed.
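One way to exploit this independence is to split the individuals into row blocks and compute each block of rows of G as a separate task. The sketch below is an illustration under stated assumptions, not the authors' implementation: `g_by_row_blocks`, `block_size` and `cores` are invented names, the data are simulated, and the tasks are dispatched with mclapply() from the parallel package, which relies on forking and therefore only runs with more than one core on Unix-like systems.

```r
library(parallel)

# Sketch: compute G row-block by row-block.
# Xc is the centered genotype matrix (x_ik - 2 p_k); c_scale is the constant c.
g_by_row_blocks <- function(Xc, c_scale, block_size = 50, cores = 1) {
  n <- nrow(Xc)
  blocks <- split(seq_len(n), ceiling(seq_len(n) / block_size))
  # Each task computes the rows of G for the individuals in its block
  parts <- mclapply(blocks, function(idx) {
    tcrossprod(Xc[idx, , drop = FALSE], Xc) / c_scale
  }, mc.cores = cores)
  do.call(rbind, parts)
}

set.seed(3)
n <- 120; p <- 500
X_sim <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
p_hat <- colMeans(X_sim) / 2
Xc <- sweep(X_sim, 2, 2 * p_hat, "-")
c_scale <- 2 * sum(p_hat * (1 - p_hat))

G_blocks <- g_by_row_blocks(Xc, c_scale, block_size = 50, cores = 1)
G_direct <- tcrossprod(Xc) / c_scale
print(all.equal(G_blocks, G_direct, check.attributes = FALSE))
```

With `cores = 1` the function runs serially, which makes it easy to verify against the direct computation before increasing the number of cores. When p is so large that X does not fit in memory, a similar idea could be applied over blocks of markers, accumulating the partial cross-products.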
