1. Understand the methods for analyzing population structure in genomes
MSCBIO 2070/02-710: Computational Genomics, Spring 2016
HW3: Population Genetics
Due: 24:00 EST, April 4, 2016 by autolab

Your goals in this assignment are to
1. Understand the methods for analyzing population structure in genomes
2. Understand the methods for identifying disease loci in genomes
3. Explore the approach for identifying structural variants in genomes

What to hand in.
One report (in pdf format) addressing each of the following questions, including the figures generated by R when appropriate.
All source code for the R exercises. We should be able to run the source code and produce the figures requested.

Submit a zip file containing the completed code (if any) and the pdf file (if any) to autolab. The zip file should have the following structure:
./s2016hw3.pdf
./q3/ (put all code related to Q3 here, if any)
1. [15 points] Hardy-Weinberg Equilibrium

(a) (5 points) Show that the Hardy-Weinberg equilibrium holds for three alleles. [Hint: Assume allele frequencies $p$, $q$, and $r$ ($p + q + r = 1$) for the three alleles $A_1$, $A_2$, and $A_3$.]

Under random mating, the genotype frequencies in the offspring follow from the allele frequencies. For genotype $A_1A_2$,
\[ P(A_1A_2) = pq + qp = 2pq. \]
The frequencies of all possible genotypes are given in the following table.

Genotype    Frequency
A_1 A_1     p^2
A_1 A_2     2pq
A_1 A_3     2pr
A_2 A_2     q^2
A_2 A_3     2qr
A_3 A_3     r^2

From these genotype frequencies we can calculate the allele frequencies $p'$, $q'$, and $r'$ in the offspring:
\[ p' = \frac{2p^2 + 2pq + 2pr}{2} = p(p + q + r) = p, \]
\[ q' = \frac{2q^2 + 2pq + 2qr}{2} = q(p + q + r) = q, \]
\[ r' = \frac{2r^2 + 2pr + 2qr}{2} = r(p + q + r) = r. \]
The allele frequencies are unchanged, so the Hardy-Weinberg equilibrium holds for three alleles.

(b) (5 points) The numbers of individuals with genotypes AA, Aa, and aa at a locus are given as 232, 36, and 6, respectively. Perform a chi-square test to see if the Hardy-Weinberg Equilibrium holds for this locus at significance level $\alpha = 0.05$. Use 1 degree of freedom.

We use $p$ to represent the allele frequency of A, and $q$ that of a. The total number of observations is $232 + 36 + 6 = 274$.
\[ p = \frac{2 \times 232 + 36}{2 \times 274} \approx 0.912, \qquad q = \frac{2 \times 6 + 36}{2 \times 274} \approx 0.088 \]
Then we calculate the expected genotype counts:
\[ E(AA) = 274 \times 0.912^2 \approx 227.9, \qquad E(aa) = 274 \times 0.088^2 \approx 2.1, \qquad E(Aa) = 274 \times 2 \times 0.912 \times 0.088 \approx 44.0 \]
Afterwards we calculate the $\chi^2$ statistic:
\[ \chi^2 = \frac{(232 - 227.9)^2}{227.9} + \frac{(6 - 2.1)^2}{2.1} + \frac{(36 - 44)^2}{44} \approx 8.77 \]
From the $\chi^2$ distribution table, $\chi^2_{0.05,\,df=1} = 3.841$. Since $8.77 > 3.841$, we reject the null hypothesis: the Hardy-Weinberg Equilibrium does not hold at this locus.
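The test in (b) can be sketched in R as follows. This is a minimal illustration rather than part of the original solution: `hwe_chisq` is a hypothetical helper name, and because it keeps full precision for the allele frequencies, its statistic comes out slightly below the hand-rounded 8.77 while leading to the same decision.

```r
# Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a
# biallelic locus. `hwe_chisq` is a hypothetical helper, not from the assignment.
hwe_chisq <- function(n_AA, n_Aa, n_aa, alpha = 0.05) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
  q <- 1 - p                         # frequency of allele a
  observed <- c(AA = n_AA, Aa = n_Aa, aa = n_aa)
  expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)
  stat <- sum((observed - expected)^2 / expected)
  # 1 degree of freedom, as instructed in the problem statement
  critical <- qchisq(1 - alpha, df = 1)
  list(p = p, q = q, expected = expected, statistic = stat,
       critical = critical, reject = stat > critical)
}

res <- hwe_chisq(232, 36, 6)
print(res)
```

The `reject` element is `TRUE` whenever the statistic exceeds the critical value returned by `qchisq`, which mirrors the table lookup done by hand above.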
(c) (5 points) The Wright-Fisher model as illustrated in the lecture note can be considered as a Markov chain. If we denote the number of copies of allele A in the population in generation $n$ by $X_n$, then the sequence $X_0, X_1, \ldots$ is a Markov chain with the set of possible outcomes $\{0, 1, 2, \ldots, 2N\}$. The transition matrix of the chain is given by a binomial distribution $B(2N, i/2N)$:
\[ p_{ij} = P(X_{n+1} = j \mid X_n = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N - j} \]
Show that $E(X_{n+1} \mid X_n = i) = i$. How does it relate to Hardy-Weinberg equilibrium?

Since $X_{n+1} \mid X_n = i \sim B\!\left(2N, \frac{i}{2N}\right)$, the mean of the binomial distribution gives
\[ E(X_{n+1} \mid X_n = i) = 2N \cdot \frac{i}{2N} = i. \]
Since $E(X_{n+1} \mid X_n = i) = X_n$, the Hardy-Weinberg equilibrium holds for the expected frequencies.

2. [5 points] HMM in PHASE and STRUCTURE

Assuming $K$ ancestral chromosomes, the transition probabilities in the hidden Markov models in PHASE, as well as those embedded in the linkage model extension of STRUCTURE, model the presence/absence of recombination events between locus $l$ and locus $l+1$ separated by distance $d_l$. The transition probabilities from ancestral chromosome state label $z_l = k$ to $z_{l+1} = k'$ for $k, k' \in \{1, \ldots, K\}$ are given as
\[ P(z_{l+1} = k' \mid z_l = k) = \begin{cases} \exp(-d_l r) + (1 - \exp(-d_l r))\, q_{k'} & \text{if } k' = k \\ (1 - \exp(-d_l r))\, q_{k'} & \text{otherwise} \end{cases} \]
where $r$ is the per-basepair recombination rate and the $q_i$ for $i = 1, \ldots, K$ are prior probabilities for each of the $K$ states that sum to 1. Consider the case where the underlying genome block structure has $z_l = z_{l+1} = k$ but has a small segment from the $m$-th ($m \neq k$) ancestral chromosome inserted between loci $l$ and $l+1$. How is this scenario modeled by the transition probabilities above?

Let $R$ denote the number of recombination events between loci $l$ and $l+1$, which follows a Poisson distribution. Since $r$ is the per-basepair recombination rate and the distance between locus $l$ and locus $l+1$ is $d_l$, the mean of the Poisson distribution is $d_l r$. Thus, the probability mass function of $R$ is
\[ p(R) = \frac{(d_l r)^{R}\, e^{-d_l r}}{R!}. \]
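Suppose $z_l = z_{l+1} = k$. Then $P(z_{l+1} = k \mid z_l = k)$ can be calculated as follows:
\[
\begin{aligned}
P(z_{l+1} = k \mid z_l = k) &= P(R = 0) + P(R > 0) && (1) \\
&= p(R = 0) + P(R = 1) + P(R = 2) + P(R = 3) + \cdots && (2) \\
&= p(R = 0) + p(R = 1)\, q_k + p(R = 2) \sum_{i=1}^{K} q_i q_k + p(R = 3) \sum_{i=1}^{K} \sum_{j=1}^{K} q_i q_j q_k + \cdots && (3) \\
&= p(R = 0) + p(R = 1)\, q_k + p(R = 2)\, q_k \sum_{i=1}^{K} q_i + p(R = 3)\, q_k \sum_{i=1}^{K} \sum_{j=1}^{K} q_i q_j + \cdots && (4) \\
&= p(R = 0) + p(R = 1)\, q_k + p(R = 2)\, q_k + p(R = 3)\, q_k + \cdots \quad \Big(\text{because } \sum_{i=1}^{K} q_i = 1\Big) && (5) \\
&= p(R = 0) + q_k \big(1 - p(R = 0)\big) && (6) \\
&= e^{-d_l r} + \big(1 - e^{-d_l r}\big)\, q_k && (7)
\end{aligned}
\]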
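The derivation can be checked numerically. The sketch below builds the transition matrix for one pair of adjacent loci and compares the closed form (7) with the Poisson series in (5). The helper name `hmm_transition` and the values of `d`, `r`, and `q` are illustrative assumptions, not taken from PHASE or STRUCTURE.

```r
# Transition matrix of the recombination HMM for one pair of adjacent loci.
# `hmm_transition` is a hypothetical helper; d, r and q are example values.
hmm_transition <- function(d, r, q) {
  stopifnot(abs(sum(q) - 1) < 1e-12)
  K <- length(q)
  no_rec <- exp(-d * r)                       # P(R = 0)
  # After at least one recombination the new state is drawn from q,
  # so every row is (1 - P(R = 0)) * q before the diagonal correction.
  trans <- (1 - no_rec) * matrix(q, nrow = K, ncol = K, byrow = TRUE)
  diag(trans) <- diag(trans) + no_rec         # same state, no recombination
  trans
}

d <- 5000; r <- 1e-5; q <- c(0.5, 0.3, 0.2)
trans <- hmm_transition(d, r, q)

# Series form of equation (5) for k = 1, truncated at R = 200
k <- 1
series <- dpois(0, d * r) + q[k] * sum(dpois(1:200, d * r))
print(c(closed_form = trans[k, k], series = series))
print(rowSums(trans))

# Probability of exactly two recombinations passing through state m != k
m <- 2
p_insertion <- dpois(2, d * r) * q[m] * q[k]
print(p_insertion)
```

Each row of the matrix sums to 1, and the two-recombination term is only a small part of the diagonal entry, which is the sense in which an inserted segment is absorbed into the "stay in state k" probability.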
If $z_l = z_{l+1} = k$ and there is one small segment from the $m$-th ($m \neq k$) ancestral chromosome inserted between loci $l$ and $l+1$, the probability of this scenario can be calculated more explicitly:
\[ P(\text{one small insertion}) = p(R = 2)\, q_m q_k, \qquad m \neq k. \]
This probability is a fraction of the term $P(R = 2)$ in equation (2).

3. [10 points] PCA and Population Structure

Consider the SNP genotype data for 5912 loci on chromosome 2 from 423 individuals provided with this homework in file snp.txt. Each individual is from one of the following six populations:
CEU: Utah residents with Northern and Western European ancestry from the CEPH collection
CHB: Han Chinese in Beijing, China
JPT: Japanese in Tokyo, Japan
LWK: Luhya in Webuye, Kenya
MEX: Mexican ancestry in Los Angeles, California
YRI: Yoruba in Ibadan, Nigeria
The ancestry labels for each individual are provided in file sample_names_with_population_labels.txt.

(a) (5 points) Perform PCA and plot the ancestry of the individuals in 2 dimensions using the first two principal components, as was discussed in class. Use different colors for different true ancestries. Include your plot and code.

If you perform PCA on the SNP matrix without scaling, or perform PCA on the covariance matrix constructed as $\frac{1}{n} X^\top X$ without scaling:
[Figure: PC1 vs. PC2 scatter plot, points colored by population (CEU, CHB, JPT, LWK, MEX, YRI)]
If you perform PCA on the covariance matrix constructed by the cov() function:
[Figure: PC1 vs. PC2 scatter plot, points colored by population (CEU, CHB, JPT, LWK, MEX, YRI)]

If you scale the original SNP matrix and perform PCA on it:
[Figure: PC1 vs. PC2 scatter plot, points colored by population (CEU, CHB, JPT, LWK, MEX, YRI)]
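The variants above rest on the link between running PCA on the data matrix and taking the eigendecomposition of a covariance matrix. A minimal sketch on simulated genotypes, not the homework data, illustrates that link for the unscaled case. The two populations, their allele frequencies, and all variable names here are assumptions made for illustration.

```r
set.seed(1)
# Simulated 0/1/2 genotypes for two hypothetical populations; rows are SNPs
# and columns are individuals, the layout the solution code assumes for snp.txt.
n_snp <- 200; n_ind <- 40
freq_a <- runif(n_snp, 0.1, 0.9)
freq_b <- runif(n_snp, 0.1, 0.9)
geno <- cbind(
  matrix(rbinom(n_snp * n_ind / 2, 2, freq_a), nrow = n_snp),
  matrix(rbinom(n_snp * n_ind / 2, 2, freq_b), nrow = n_snp)
)

X <- t(geno)                                   # individuals x SNPs
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Same projection via the eigendecomposition of the sample covariance matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
eig <- eigen(crossprod(Xc) / (nrow(Xc) - 1), symmetric = TRUE)
scores <- Xc %*% eig$vectors[, 1:2]

# Principal components are defined up to sign, so compare absolute values
max_diff <- max(abs(abs(scores) - abs(unname(pca$x[, 1:2]))))
print(max_diff)
```

Dividing by n instead of n - 1 rescales the eigenvalues but leaves the eigenvectors unchanged. Scaling each SNP to unit variance before the decomposition does change them, which is one reason the scaled and unscaled plots can look different.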
    library(ggplot2)

    # Population label for each sample
    popdata <- read.table("sample_names_with_population_labels.txt", header = FALSE)
    colnames(popdata) <- c("sample.id", "pop_code")

    # SNP matrix: rows are loci, columns are individuals
    snpdata <- read.table("snp.txt", header = FALSE)

    # PCA on the centered and scaled individuals-by-SNPs matrix
    pcadata <- prcomp(t(snpdata), center = TRUE, scale. = TRUE)

    # First two principal components together with the population labels
    tmppcadata <- cbind(as.data.frame(pcadata$x[, 1:2]), popdata$pop_code)
    colnames(tmppcadata) <- c("PC1", "PC2", "Populations")
    tmppcadata$Populations <- factor(tmppcadata$Populations)

    p <- ggplot(tmppcadata, aes(x = PC1, y = PC2, colour = Populations))
    p + geom_point(size = 2)

If you scale the original SNP matrix and perform PCA on the covariance matrix constructed as $\frac{1}{n} X^\top X$:
[Figure: PC1 vs. PC2 scatter plot, points colored by population (CEU, CHB, JPT, LWK, MEX, YRI)]

(b) (5 points) Which ethnic groups are similar in terms of their genomes? Which ethnic groups are different in terms of their genomes?

The plot shows three clusters, each formed by two ethnic groups. Ethnic groups that fall in different clusters are quite different.
(1) The CHB (China) and JPT (Japan) groups overlap almost completely.
(2) The majority of the LWK (Kenya) and YRI (Nigeria) groups overlap.
(3) The CEU (Utah, European ancestry) and MEX (California, Mexican ancestry) groups share only a small intersection.
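All other pairs of ethnic groups are very different in terms of their genomes.

4. [10 points] Linkage Analysis

Compute the probabilities of the following pedigrees, assuming the penetrance model $p(\text{affected} \mid dd) = 0.1$, $p(\text{affected} \mid Dd) = 0.2$, $p(\text{affected} \mid DD) = 0.7$. The allele frequency of D is 0.02. Shaded means affected, blank means unaffected.

(a) (5 points)
[Figure: pedigree for part (a)]

Since the allele frequency of D is 0.02, the allele frequency of d is $1 - 0.02 = 0.98$. From these we calculate the genotype frequencies:
\[ P(DD) = 0.02^2 = 0.0004, \qquad P(dd) = 0.98^2 = 0.9604, \qquad P(Dd) = 2 \times 0.02 \times 0.98 = 0.0392 \]
From the genotypes of the offspring, we can infer that the genotype of M1 is either dd or Dd.
\[
\begin{aligned}
P(\text{pedigree},\ \text{M1 is } Dd) &= P(Dd)\,P(Dd)\,P(dd \mid Dd, Dd)\,P(Dd \mid Dd, Dd) \\
&\quad \times P(\text{unaffected} \mid Dd)\,P(\text{affected} \mid Dd)\,P(\text{unaffected} \mid dd)\,P(\text{affected} \mid Dd) \\
&= 0.0392 \times 0.0392 \times \tfrac{1}{4} \times \tfrac{1}{2} \times (1 - 0.2) \times 0.2 \times (1 - 0.1) \times 0.2 \\
&\approx 5.53 \times 10^{-6}
\end{aligned}
\]
\[
\begin{aligned}
P(\text{pedigree},\ \text{M1 is } dd) &= P(dd)\,P(Dd)\,P(dd \mid dd, Dd)\,P(Dd \mid dd, Dd) \\
&\quad \times P(\text{unaffected} \mid dd)\,P(\text{affected} \mid Dd)\,P(\text{unaffected} \mid dd)\,P(\text{affected} \mid Dd) \\
&= 0.9604 \times 0.0392 \times \tfrac{1}{2} \times \tfrac{1}{2} \times (1 - 0.1) \times 0.2 \times (1 - 0.1) \times 0.2 \\
&\approx 3.05 \times 10^{-4}
\end{aligned}
\]
Summing these two probabilities gives the probability of the pedigree:
\[ P_{\text{pedigree}} = P(\text{pedigree},\ \text{M1 is } Dd) + P(\text{pedigree},\ \text{M1 is } dd) \approx 3.10 \times 10^{-4} \]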
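The sum over the unknown genotype of M1 in (a) can be sketched in R. The pedigree layout used here (M1 unaffected, a Dd affected spouse, one dd unaffected child and one Dd affected child) is an assumption read off the factors in the formulas above rather than from the pedigree figure, and `transmit` and `lik_given_m1` are hypothetical helper names.

```r
# Pedigree likelihood for 4(a), summing over the unknown genotype of M1.
freq_D <- 0.02
geno_prior <- c(dd = (1 - freq_D)^2, Dd = 2 * freq_D * (1 - freq_D), DD = freq_D^2)
pen <- c(dd = 0.1, Dd = 0.2, DD = 0.7)     # P(affected | genotype)

# Mendelian transmission: P(child genotype | genotypes of the two parents)
transmit <- function(child, p1, p2) {
  n_D <- c(dd = 0, Dd = 1, DD = 2)
  t1 <- n_D[[p1]] / 2                      # P(parent 1 passes on D)
  t2 <- n_D[[p2]] / 2                      # P(parent 2 passes on D)
  switch(child,
         DD = t1 * t2,
         Dd = t1 * (1 - t2) + (1 - t1) * t2,
         dd = (1 - t1) * (1 - t2))
}

# Joint probability of the pedigree and a given genotype for M1
lik_given_m1 <- function(m1) {
  geno_prior[[m1]] * (1 - pen[[m1]]) *               # M1, unaffected
    geno_prior[["Dd"]] * pen[["Dd"]] *               # spouse Dd, affected
    transmit("dd", m1, "Dd") * (1 - pen[["dd"]]) *   # child dd, unaffected
    transmit("Dd", m1, "Dd") * pen[["Dd"]]           # child Dd, affected
}

parts <- sapply(c("Dd", "dd"), lik_given_m1)
print(parts)
print(sum(parts))
```

Writing the transmission probabilities as a function keeps the founder, transmission, and penetrance factors separate, so a different pedigree only needs a different product inside `lik_given_m1`.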
(b) (5 points)
[Figure: pedigree for part (b)]

From the genotypes of the offspring, we can infer that the only possible genotype of M1 is Dd.
\[
\begin{aligned}
P_{\text{pedigree}} &= P(\text{pedigree},\ \text{M1 is } Dd) \\
&= P(Dd)\,P(Dd)\,P(dd \mid Dd, Dd)\,P(Dd \mid dd, Dd)\,P(DD \mid Dd, Dd) \\
&\quad \times P(\text{unaffected} \mid Dd)\,P(\text{affected} \mid Dd)\,P(\text{unaffected} \mid dd)\,P(\text{unaffected} \mid Dd)\,P(\text{affected} \mid DD) \\
&= 0.0392 \times 0.0392 \times \tfrac{1}{4} \times \tfrac{1}{2} \times \tfrac{1}{4} \times (1 - 0.2) \times 0.2 \times (1 - 0.1) \times (1 - 0.2) \times 0.7 \\
&\approx 3.87 \times 10^{-6}
\end{aligned}
\]

5. [23 points] Genome-wide Association Studies

(a) (5 points) Given the following data, perform chi-square tests to test the association between a given locus and case/control status.

                           Control   Case
Major allele homozygous       65      20
Heterozygous                  41      35
Minor allele homozygous       20      70

The null hypothesis $H_0$ is that there is no association between the given locus and case/control status. Suppose the two alleles are A (major) and a (minor). The total number of control samples is 126 and the total number of case samples is 125. We first calculate the allele frequencies under the null hypothesis. The total numbers of major allele homozygous, heterozygous, and minor allele homozygous individuals are 85, 76, and 90, respectively.
\[ P(A) = \frac{2 \times 85 + 76}{2 \times (126 + 125)} = 0.49, \qquad P(a) = \frac{2 \times 90 + 76}{2 \times (126 + 125)} = 0.51 \]

Allele based

The observed allele count table is as follows.

                    Control                Case
Major allele (A)    2 × 65 + 41 = 171      2 × 20 + 35 = 75
Minor allele (a)    2 × 20 + 41 = 81       2 × 70 + 35 = 175

The expected allele count table is as follows.
                    Control                      Case
Major allele (A)    2 × 126 × 0.49 = 123.48      2 × 125 × 0.49 = 122.50
Minor allele (a)    2 × 126 × 0.51 = 128.52      2 × 125 × 0.51 = 127.50

Then we calculate the $\chi^2$ test statistic:
\[ \chi^2 = \frac{(171 - 123.48)^2}{123.48} + \cdots + \frac{(175 - 127.50)^2}{127.50} = 71.97 \]
From the $\chi^2$ distribution table, $\chi^2_{0.05,\,df=1} = 3.84$. Since $71.97 > 3.84$, we reject the null hypothesis: there is an association between the given locus and case/control status.

Genotype based

The expected genotype count table is as follows.

                           Control                    Case
Major allele homozygous    126 × 85 / 251 = 42.67     125 × 85 / 251 = 42.33
Heterozygous               126 × 76 / 251 = 38.15     125 × 76 / 251 = 37.85
Minor allele homozygous    126 × 90 / 251 = 45.18     125 × 90 / 251 = 44.82

Then we calculate the $\chi^2$ test statistic over the six cells, with $O$ the observed and $E$ the expected count:
\[ \chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E} = 52.07 \]
From the $\chi^2$ distribution table, $\chi^2_{0.05,\,df=2} = 5.99$. Since $52.07 > 5.99$, we reject the null hypothesis: there is an association between the given locus and case/control status.

Allele+Genotype based

Although you could get the same answer, this is not the right way to do it, because we don't know whether the Hardy-Weinberg Equilibrium holds for the current generation.

(b) (3 points) Assume the chi-square test in (a) above is one of 100,000 loci that were tested for association. What is the adjusted p-value after Bonferroni correction?

Allele based
The p-value for the $\chi^2$ test statistic is $p(71.97,\ df = 1) = p \approx 0$. The adjusted p-value after Bonferroni correction is $10^5 \times p \approx 0$.

Genotype based
The p-value for the $\chi^2$ test statistic is $p(52.07,\ df = 2) = e^{-52.07/2} \approx 4.9 \times 10^{-12}$. The adjusted p-value after Bonferroni correction is $10^5 \times 4.9 \times 10^{-12} \approx 4.9 \times 10^{-7}$.

(c) (5 points) Bonferroni correction is effective when all the statistical tests are independent of each other. Consider performing a case/control genome-wide association study for type II diabetes based on African individuals, and the same type of study on a European population. In general, the African population is more ancient and African genomes have weaker linkage disequilibrium than European genomes. Would Bonferroni correction be more effective in the African or in the European population? Why?

Since African genomes have weaker linkage disequilibrium than European genomes, the loci of African genomes are more likely to be independent of each other.
Thus the Bonferroni correction would be more effective in the African population.
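The allele-based test in (a) and the Bonferroni adjustment in (b) can be reproduced with base R. This is a minimal sketch: the case allele counts (75 and 175) are those stated in (a), while the control counts (171 and 81) are derived here from the stated genotype totals, which give 2 × 85 + 76 = 246 major and 2 × 90 + 76 = 256 minor alleles overall.

```r
# Allele-based 2x2 chi-square test for 5(a) and Bonferroni adjustment for 5(b).
alleles <- matrix(c(171, 81, 75, 175), nrow = 2,
                  dimnames = list(allele = c("A", "a"),
                                  status = c("control", "case")))

# correct = FALSE switches off the Yates continuity correction so that the
# statistic matches the hand calculation.
test <- chisq.test(alleles, correct = FALSE)
stat <- unname(test$statistic)

# The upper tail is computed directly so that a tiny p-value is not rounded to 0
p_raw <- pchisq(stat, df = 1, lower.tail = FALSE)

n_tests <- 1e5
p_adj <- p.adjust(p_raw, method = "bonferroni", n = n_tests)
print(c(statistic = stat, p_raw = p_raw, p_adjusted = p_adj))
```

Bonferroni multiplies the raw p-value by the number of tests and caps the result at 1, so a raw p-value this small stays far below any usual threshold even after adjusting for 100,000 loci.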
(d) (5 points) Given the following data, perform chi-square tests to test the association between a given locus and case/control status.

                           Control   Case
Major allele homozygous      105      100
Heterozygous                   1        2
Minor allele homozygous        1        2

The null hypothesis $H_0$ is that there is no association between the given locus and case/control status. Suppose the two alleles are A (major) and a (minor). The total number of control samples is 107 and the total number of case samples is 104. We first calculate the allele frequencies under the null hypothesis. The total numbers of major allele homozygous, heterozygous, and minor allele homozygous individuals are 205, 3, and 3, respectively.
\[ P(A) = \frac{2 \times 205 + 3}{2 \times (107 + 104)} = 0.98, \qquad P(a) = \frac{2 \times 3 + 3}{2 \times (107 + 104)} = 0.02 \]

Allele based

The observed allele count table is as follows.

                    Control               Case
Major allele (A)    2 × 105 + 1 = 211     2 × 100 + 2 = 202
Minor allele (a)    2 × 1 + 1 = 3         2 × 2 + 2 = 6

The expected allele count table is as follows.

                    Control                      Case
Major allele (A)    2 × 107 × 0.98 = 209.72      2 × 104 × 0.98 = 203.84
Minor allele (a)    2 × 107 × 0.02 = 4.28        2 × 104 × 0.02 = 4.16

Then we calculate the $\chi^2$ test statistic:
\[ \chi^2 = \frac{(211 - 209.72)^2}{209.72} + \cdots + \frac{(6 - 4.16)^2}{4.16} = 1.22 \]
From the $\chi^2$ distribution table, $\chi^2_{0.05,\,df=1} = 3.84$. Since $1.22 < 3.84$, we fail to reject the null hypothesis: there is no evidence of an association between the given locus and case/control status.

Genotype based

The expected genotype count table is as follows.

                           Control                      Case
Major allele homozygous    107 × 205 / 211 = 103.96     104 × 205 / 211 = 101.04
Heterozygous               107 × 3 / 211 = 1.52         104 × 3 / 211 = 1.48
Minor allele homozygous    107 × 3 / 211 = 1.52         104 × 3 / 211 = 1.48

Then we calculate the $\chi^2$ test statistic:
\[ \chi^2 = \frac{(105 - 103.96)^2}{103.96} + \cdots + \frac{(2 - 1.48)^2}{1.48} = 0.74 \]
From the $\chi^2$ distribution table, $\chi^2_{0.05,\,df=2} = 5.99$. Since $0.74 < 5.99$, we fail to reject the null hypothesis: there is no evidence of an association between the given locus and case/control status.

Allele+Genotype based

As before, we don't know whether the Hardy-Weinberg Equilibrium holds for the current generation. If you do the calculation this way, you will draw a wrong conclusion.

(e) (5 points) In (b), what is the minor allele frequency in the whole population including all samples? Can you reliably conclude on the significance of the association? Why?

The minor allele frequency in (b) is
\[ \frac{2 \times 90 + 76}{2 \times (126 + 125)} = 0.51. \]
This allele frequency is fairly large, which makes the significance of the association reliable. In (d), however, the minor allele frequency is
\[ \frac{2 \times 3 + 3}{2 \times (107 + 104)} \approx 0.02. \]
The number of samples carrying the minor allele is too small for the association study in (d), so the significance of the association is not reliable.

6. [7 points] Haplotypes and Genome-wide Association Studies

Consider the genome data below collected from case (patient) and control (normal healthy) individuals. Our goal is to see if the haplotypes formed by the three SNPs influence the disease susceptibility.

Case:
Individual 1    ...C...T..G.    ...C...T..G.
Individual 2    ...T...G..A.    ...C...T..G.
Individual 3    ...C...T..A.    ...C...T..G.
Control:
Individual 4    ...T...G..A.    ...T...G..A.
Individual 5    ...C...T..A.    ...C...T..A.

(a) (2 points) List haplotype alleles.

The haplotype alleles are CTG, TGA, and CTA.

(b) (5 points) Create a contingency table that you can use for a chi-square test.

The contingency table is as follows.

         Case   Control   Total
CTG        4       0        4
TGA        1       2        3
CTA        1       2        3
Total      6       4       10

7. [10 points] Structural Variants

Assume you are performing paired-end sequencing of a region of your own genome to see if it contains an insertion or deletion compared to the reference genome. Assume the distribution of bp distances between the two sequenced fragments (or insert sizes) in each mate pair (collected genome-wide) is given as in the lecture note.
(a) (5 points) If there was a homozygous insertion of length 100bp in your genome, what would be the distribution of the distances between the two sequenced fragments in each mate pair from your own genome?

Suppose the mean of the real distribution is 400:
[Figure: density against distance for the measured distribution and the real distribution]

(b) (5 points) If there was a heterozygous insertion of length 100bp in your genome, what would be the distribution of the distances between the two sequenced fragments in each mate pair from your own genome?

Suppose the mean of the real distribution is 400:
[Figure: density against distance for the measured distribution and the real distribution]
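The two scenarios can be explored with a small simulation. The sketch below rests on assumptions that are not stated in the solution: true insert sizes are drawn from a normal distribution with mean 400 (from the text) and standard deviation 20 (illustrative), the measured distance is the separation of the two mates after mapping to the reference, and only mate pairs that span the insertion site are considered.

```r
set.seed(42)
# Assumed library: true insert sizes ~ Normal(mean = 400, sd = 20).
# The mean is taken from the text; the sd and the normal shape are assumptions.
n_pairs <- 10000
true_dist <- rnorm(n_pairs, mean = 400, sd = 20)
ins_len <- 100

# The reference lacks the inserted 100 bp, so mates flanking the insertion
# map closer together on the reference by the insertion length.
homozygous <- true_dist - ins_len

# Heterozygous: roughly half of the pairs come from the chromosome copy
# that carries the insertion, the rest from the copy that does not.
carries_insertion <- rbinom(n_pairs, 1, 0.5) == 1
heterozygous <- ifelse(carries_insertion, true_dist - ins_len, true_dist)

print(c(real = mean(true_dist),
        homozygous = mean(homozygous),
        heterozygous = mean(heterozygous)))
```

Under these assumptions the homozygous case is a single shifted peak and the heterozygous case is a two-peak mixture. Passing either vector to `density()` and plotting the result gives a curve that can be compared with the real distribution.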
More informationLecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017
Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping
More informationThe Wright-Fisher Model and Genetic Drift
The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population
More informationWeierstraß-Institut. für Angewandte Analysis und Stochastik. Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN
Weierstraß-Institut für Angewandte Analysis und Stochastik Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN 2198-5855 On an extended interpretation of linkage disequilibrium in genetic
More informationAssociation Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5
Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative
More informationIntroduction to Natural Selection. Ryan Hernandez Tim O Connor
Introduction to Natural Selection Ryan Hernandez Tim O Connor 1 Goals Learn about the population genetics of natural selection How to write a simple simulation with natural selection 2 Basic Biology genome
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More informationFigure S1: The model underlying our inference of the age of ancient genomes
A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years Priya Moorjani, Sriram Sankararaman, Qiaomei Fu, Molly Przeworski, Nick Patterson,
More informationClassical Selection, Balancing Selection, and Neutral Mutations
Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected
More informationAffected Sibling Pairs. Biostatistics 666
Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationProblems for 3505 (2011)
Problems for 505 (2011) 1. In the simplex of genotype distributions x + y + z = 1, for two alleles, the Hardy- Weinberg distributions x = p 2, y = 2pq, z = q 2 (p + q = 1) are characterized by y 2 = 4xz.
More informationMajor Genes, Polygenes, and
Major Genes, Polygenes, and QTLs Major genes --- genes that have a significant effect on the phenotype Polygenes --- a general term of the genes of small effect that influence a trait QTL, quantitative
More informationSolutions to Problem Set 4
Question 1 Solutions to 7.014 Problem Set 4 Because you have not read much scientific literature, you decide to study the genetics of garden peas. You have two pure breeding pea strains. One that is tall
More informationAn Efficient and Accurate Graph-Based Approach to Detect Population Substructure
An Efficient and Accurate Graph-Based Approach to Detect Population Substructure Srinath Sridhar, Satish Rao and Eran Halperin Abstract. Currently, large-scale projects are underway to perform whole genome
More informationIntroduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies
Introduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies Confounding in gene+c associa+on studies q What is it? q What is the effect? q How to detect it?
More informationParts 2. Modeling chromosome segregation
Genome 371, Autumn 2017 Quiz Section 2 Meiosis Goals: To increase your familiarity with the molecular control of meiosis, outcomes of meiosis, and the important role of crossing over in generating genetic
More information(Write your name on every page. One point will be deducted for every page without your name!)
POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average
More informationLecture 13: Population Structure. October 8, 2012
Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The
More informationA consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation
Ann. Hum. Genet., Lond. (1975), 39, 141 Printed in Great Britain 141 A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation BY CHARLES F. SING AND EDWARD D.
More information8. Genetic Diversity
8. Genetic Diversity Many ways to measure the diversity of a population: For any measure of diversity, we expect an estimate to be: when only one kind of object is present; low when >1 kind of objects
More informationLab 12. Linkage Disequilibrium. November 28, 2012
Lab 12. Linkage Disequilibrium November 28, 2012 Goals 1. Es
More informationBackward Genotype-Trait Association. in Case-Control Designs
Backward Genotype-Trait Association (BGTA)-Based Dissection of Complex Traits in Case-Control Designs Tian Zheng, Hui Wang and Shaw-Hwa Lo Department of Statistics, Columbia University, New York, New York,
More informationIntroduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II)
1/45 Introduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II) Dr. Yen-Yi Ho (hoyen@stat.sc.edu) Feb 9, 2018 2/45 Objectives of Lecture 6 Association between Variables Goodness
More informationAsymptotic distribution of the largest eigenvalue with application to genetic data
Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene
More informationEM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works
EM algorithm The example in the book for doing the EM algorithm is rather difficult, and was not available in software at the time that the authors wrote the book, but they implemented a SAS macro to implement
More informationMathematical models in population genetics II
Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population
More informationSTT 843 Key to Homework 1 Spring 2018
STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ
More informationPopulation genetics snippets for genepop
Population genetics snippets for genepop Peter Beerli August 0, 205 Contents 0.Basics 0.2Exact test 2 0.Fixation indices 4 0.4Isolation by Distance 5 0.5Further Reading 8 0.6References 8 0.7Disclaimer
More informationLecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency
Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Outline Genetics of complex
More informationOutline of lectures 3-6
GENOME 453 J. Felsenstein Evolutionary Genetics Autumn, 007 Population genetics Outline of lectures 3-6 1. We want to know what theory says about the reproduction of genotypes in a population. This results
More informationF SR = (H R H S)/H R. Frequency of A Frequency of a Population Population
Hierarchical structure, F-statistics, Wahlund effect, Inbreeding, Inbreeding coefficient Genetic difference: the difference of allele frequencies among the subpopulations Hierarchical population structure
More informationEXERCISES FOR CHAPTER 7. Exercise 7.1. Derive the two scales of relation for each of the two following recurrent series:
Statistical Genetics Agronomy 65 W. E. Nyquist March 004 EXERCISES FOR CHAPTER 7 Exercise 7.. Derive the two scales of relation for each of the two following recurrent series: u: 0, 8, 6, 48, 46,L 36 7
More informationOutline of lectures 3-6
GENOME 453 J. Felsenstein Evolutionary Genetics Autumn, 009 Population genetics Outline of lectures 3-6 1. We want to know what theory says about the reproduction of genotypes in a population. This results
More informationSupporting Information
Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider
More informationParts 2. Modeling chromosome segregation
Genome 371, Autumn 2018 Quiz Section 2 Meiosis Goals: To increase your familiarity with the molecular control of meiosis, outcomes of meiosis, and the important role of crossing over in generating genetic
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary
More informationProbability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National
More informationCase Studies in Ecology and Evolution
3 Non-random mating, Inbreeding and Population Structure. Jewelweed, Impatiens capensis, is a common woodland flower in the Eastern US. You may have seen the swollen seed pods that explosively pop when
More information