Learning gene regulatory networks Statistical methods for haplotype inference Part I
|
|
- Jared Hodge
- 5 years ago
- Views:
Transcription
1 Learning gene regulatory networks Statistical methods for haplotype inference Part I Input: Measurement of mrn levels of all genes from microarray or rna sequencing Samples (e.g. 200 patients with lung cancer) Lecture 3 May 21 th, 2013 ENOME 541, Spring 2013 Su In Lee S & SE, UW suinlee@uw.edu 1 oal: Reconstruct the gene regulatory network underlying genome wide gene expression Method: Probabilistic models to represent the regulatory network e 1 e 6 e p 2 Unknown structure, complete data E B P( E,B) H H?? H L?? L H?? L L?? E, B, <H,L,L> <H,L,H> <L,L,H> <L,H,H>.. <L,H,H> E B Learner E B E B P( E,B) H H.9.1 H L.7.3 L H.8.2 L L Score based learning Define scoring function that measures how well a certain structure fits the observed data. E, B, <Y,N,N> <Y,Y,Y> <N,N,Y> <N,Y,Y>.. <N,Y,Y> Score( 1 ) = 10 Score ( 2 ) = 1.5 Score ( 3 ) = 0.01 E 1 B E 2 3 B B E Network structure is not specified Learner needs to estimate both structure and parameters Data does not contain missing values 3 Search for a structure that maximizes the score. 4 1
2 hallenges oo large search space What is the number of possible structures of n genes? omputationally costly Heuristic approaches may be trapped to local maxima. Biologically motivated constraints can alleviate the problems Module based approach ~ 3 n Only the genes in the candidate regulators list can be parents of other variables 5 2 / 2-3 x x he Module networks concept X3 X5 = Module3 X2 Repressor X4 ctivator X3 Linear PDs X 1 Module 1 Module 2 X4 Module 3 X6 activator expression repressor expression target gene expression induced repressed ctivator X3 false true Repressor X4 false ree PDs true regulation program Module 3 genes 6 Segal et al. Nat enet 03, JMLR 05 Linear module network andidate regulators (X 1,,X N ) Expression levels of genes that have regulatory roles ranscription factors, signal transduction proteins, mrn processing factors, M1 M22 Feature selection via regularization ssume linear aussian PD MLE: solve maximize w (Σw i x i Y) 2 KEM1 MK1 HP1 MSK1 DHH1 SS1 SE59 EM18 S7 ME3 UH1 P1 MF1 E1 Regulatory Program? argets PHO3 PHO5 PHO84 RIM15 PHM6 PHO2 PHO4 SS5 SPL2 I1 V3 x 1 x 2 w 1 w 2 w N parameters Y x N Regulatory network andidate regulators (features) Yeast: 350 genes Mouse: 700 genes P(Y x:w) = N(Σw i x i, ε 2 ) Problem: his objective learns too many regulators Lee et al., PLoS enet
3 L 1 regularization Select a subset of regulators ombinatorial search? Effective feature selection algorithm: L 1 regularization (LSSO) [ibshirani, J. Royal. Statist. Soc B. 1996] minimize w (Σw i x i Y) 2 + w i : convex optimization! Induces sparsity in the solution w (Many w i s set to zero) Linear module network Iterative procedure -3 x P1 + Learn a regulatory program for each module 0.5 x MF1 luster genes into modules = Module M1 MSK1 EM18 KEM1 UH1 P1 DHH1 MK1 E1 M22 S7 MF1 ME3 x 1 x 2 w 1 w 2 w N Y x N andidate regulators (features) Yeast: 350 genes Mouse: 700 genes P(Y x:w) = N(Σw i x i, ε 2 ) HP1 SS1 SE59 Better? PHO3 PHO5 PHO84 RIM15 L 1 regularized optimization PHM6 minimize PHO2 w (Σw i x i -E argets PHO4 )2+ w i SS5 SPL2 I1 V3 9 Lee et al., PLoS enet Summary Haplotype inference (5/21, 5/23) Basic concepts on Bayesian networks Probabilistic models of gene regulatory networks Learning algorithms Parameter learning Structure learning Structure discovery Evaluation Recent probabilistic approaches to reconstructing the regulatory networks 11 Background & motivation Problem statement Statistical methods for haplotype inference lark s algorithm Expectation Maximization (EM) algorithm oday oalescent based methods and HMM Haplotype inference on sequence data Example applications 12 3
4 enetic variation Single nucleotide polymorphism (SNP) Each variant is called an allele; each allele has a frequency chromosome SNP 1 SNP 2 SNP 3 P(SNP 1 = ) = 4/8 P(SNP 1 = ) = 4/8 : How about the relationship between alleles of neighboring SNPs? We need to know about haplotype 13 Haplotype combination of alleles present in a chromosome Each haplotype has a frequency, which is the proportion of chromosomes of that type in the population here are 2 N possible haplotypes But in fact, far fewer are seen in human population chromosome haplotype 123 SNP 1 SNP 2 SNP 3 Haplotype frequencies P(haplotype 123 = ) = 3/8 P(haplotype 123 = ) = 2/8 P(haplotype 123 = ) = 2/8 P(haplotype 123 = ) = 1/8 14 History of two neighboring alleles lleles that exist today arose through ancient mutation events Before mutation History of two neighboring alleles One allele arose first, and then the other Before mutation fter mutation Mutation fter mutation Mutation 15 Haplotype: combination of alleles present in a chromosome 16 4
5 Recombination can create more haplotypes Without recombination No recombination (or 2n recombination events) With recombination Recombination (2n+1 recombination events) 17 Recombinant haplotype 18 Haplotype What determines haplotype frequencies? Recombination rate (r) between neighboring alleles in the population r is different for different regions in genome Linkage disequilibrium (LD) Non random association of alleles at two or more loci, not necessarily on the same chromosome. haplotype 123 SNP 1 SNP 2 SNP 3 Haplotype frequencies P(haplotype 123 = ) = 3/8 P(haplotype 123 = ) = 2/8 P(haplotype 123 = ) = 2/8 P(haplotype 123 = ) = 1/8 How can we measure haplotypes? Haplotypes can be generated through laboratorybased experimental methods X chromosome in males Sperm typing Hybrid cell lines Other molecular techniques omputational approaches Input: enotype data from individuals in a population Output: Haplotypes of each individual in the population chromosome
6 Haplotype inference problem Sequence and SNP array data generally take the form of unphased genotypes Markers 21 Motivation Enormous amounts of genotype data are being generated Inexpensive genome wide SNP microarrays Whole genome and whole exome sequencing tools Determination of haplotype phase is increasingly important haracterizing the relationship between genetic variation and disease susceptibility Imputing low frequency variants 22 Why useful in WS? In a typical short chromosome segment, there are only a few distinct haplotypes arefully selected markers can determine status of others We can test for association of untyped SNPs Example Holm et al. Nat enet (2011) Used the inferred haplotypes for accurate imputation of a putative rare causal variant in other individuals, to obtain a stronger association signal Haplotype 1 Haplotype 2 Haplotype 3 Haplotype 4 Haplotype 5 S 1 S 2 S 3 S 4 S 5 S N Different alleles of each SNP 30% 20% 20% 20% 10% Marker (ag SNP)
7 Example Sick sinus syndrome (SSS) haracterized by slow heart rate, sinus arrest and/or failure to increase heart rate with exercise enome wide association scan of 7.2M SNPs with 792 SSS cases and 37,592 controls he association analysis yielded association with several correlated SNPs in and near MYH6 MYH7 (never before associated with SSS) 25 Outline Background & motivation Problem statement Statistical methods for haplotype inference lark s algorithm Expectation Maximization (EM) algorithm oday oalescent based methods and HMM Haplotype inference on sequence data Example applications 26 ypical genotype data wo alleles for each individual for each marker hromosome origin for each allele is unknown Observation marker 1 {} {} {} marker 2 marker 3 Multiple haplotype pairs can fit observed genotype Possible states / / / / 27 Use information on relatives? Family information can help determine phase at many markers an you propose examples? enotype: {} {} {} Maternal genotype: {} {} {} Paternal genotype: {} {} {} hen the haplotype is / / / 28 7
8 Example inferring haplotypes Still, many ambiguities might not be resolved Problem more serious with larger numbers of markers enotype: {} {} {} Maternal genotype: {} {} {} Paternal genotype: {} {} {} annot determine unique haplotype What if there are no relatives? Rely on linkage disequilibrium (LD) LD: non random association of variants at different sites in the genome ssume that population consists of small number of distinct haplotypes Problem Determine haplotypes without parental genotypes Haplotype reconstruction lso called, phasing, haplotype inference or haplotyping Data enotype on N markers from M individuals oals Frequency estimation of all possible haplotypes Haplotype reconstruction for individuals How many out of all possible haplotypes are plausible in a population? Individual i marker 1 marker 2 marker 3 31 Statistical methods for haplotypes inference Let s focus on the methods that are most widely used or historically important Browning and Browning, Nat Rev enet
9 Outline Background & motivation Problem statement Statistical methods for haplotype inference lark s algorithm Expectation Maximization (EM) algorithm oalescent based methods and HMM Haplotype inference on sequence data Example applications lark s haplotyping algorithm lark (1990) Mol Biol Evol 7: One of the first published haplotyping algorithms omputationally efficient Very fast and widely used in 1990 s More accurate methods are now available lark s haplotyping algorithm Find unambiguous individuals Initialize a list of known haplotypes Unambiguous vs. ambiguous Haplotypes for 2 SNPs (alleles: /a, B/b) What kinds of genotypes will these have? Unambiguous individuals Homozygous at every locus (e.g. {} {} {}) Haplotypes: Heterozygous at just one locus (e.g. {} {} {}) Haplotypes: or
10 lark s haplotyping algorithm Find unambiguous individuals Initialize a list of known haplotypes Parsimonious phasing example Notation (more compact representation) 0/1: homozygous at each locus (00,11) h: heterozygous at each locus (01) Resolve ambiguous individuals If possible, use two haplotypes from the list Otherwise, use one known haplotype and augment list If unphased individuals remain ssign phase randomly to one individual ugment haplotype list and continue from previous step h h 0 1 h h h 1 h Parsimony algorithm Outline Pros Very fast an deal with very long sequences ons No homozygotes or single SNP heterozygotes in the data Some haplotypes may remain unresolved Outcome depends on order in which lists are transversed Naïve, not very accurate (no modeling) Background & motivation Problem statement Statistical methods for haplotype inference lark s algorithm Expectation Maximization (EM) algorithm oalescent based methods and HMM Haplotype inference on sequence data Example applications
11 he EM haplotyping algorithm ssume that we know haplotype frequencies Excoffier and Slatkin Mol Biol Evol (1995); Qin et al. m J Hum enet (2002); Excoffier and Lischer Molecular ecology resources (2010) Why EM for haplotyping? EM is a method for MLE with hidden variables. What are the hidden variables, parameters? Hidden variables: haplotype state of each individual Parameters: haplotype frequencies Individual n Haplotype status (hidden variable) z =0 z =1 Haplotype frequencies (parameters) p b, p ab, p B, p ab 41 Probability of first outcome: 2P b P ab = 0.06 Probability of second outcome: 2P B P ab = 0.18 For example, if P B = 0.3 P ab = 0.3 P b = 0.3 P ab = onditional probabilities are onditional probability of first outcome: 2P b P ab / (2P b P ab + 2P B P ab ) = 0.25 For example, if P B = 0.3 P ab = 0.3 P b = 0.3 P ab = 0.1 ssume that we know the haplotype state of each individual omputing haplotype frequencies is straightforward Individual 1 Individual 2 Individual 3 p B =? P ab =? p b =? p ab =? onditional probability of second outcome: 2P B P ab / (2P b P ab + 2P B P ab ) = Individual
12 EM as hicken vs Egg If we know haplotype frequencies p s (parameters), we can estimate the haplotype status of individuals z s (hidden variables) If we know the haplotype state of each individual z s (hidden variables), we can estimate the haplotype frequencies p s (parameters) EM as hicken vs Egg If we know haplotype frequencies p s (parameters), we can estimate the haplotype states of individuals z s (hidden variables) If we know the haplotype state of individuals z s (hidden variables), we can estimate the haplotype frequencies p s (parameters) Individual n Haplotype status (hidden variable) z =0 z =1 Haplotype frequencies (parameters) p b, p ab, p B, p ab 45 BU we know neither; iterate Expectation step: Estimate z s, given haplotype frequencies p s Maximization step: Estimate p s, given the haplotype states of individuals z s Overall, a clever hill climbing strategy 46 Phasing By EM EM: Method for maximum likelihood parameter inference with hidden variables Parameters (haplotype frequencies) p s) Maximize uess Likelihood Inferring haplotype state of each individual E M Estimating haplotype frequencies Hidden variables (haplotype states of individuals z s) Find expected values 47 EM algorithm for haplotyping 1. uesstimate haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with fractional counts of phased genotypes 3. Estimate frequency of each haplotype by counting 4. Repeat steps 2 and 3 until frequencies are stable 48 12
13 Phasing by EM Phasing by EM Data: 1 0 h h 1 h h 1 h h Data: 1 0 h h 1 h h 1 h h Frequencies / / / /12 3/ /12 2/12 1 / /12 50 Phasing by EM Phasing by EM Maximization Data: 1 0 h h 1 h h 1 h h Haplotypes Frequencies / / / /12 3/ /12 2/12 1 / /12 Expectation 51 Data: 1 0 h h 1 h h 1 h h Haplotypes Frequencies / / / / / /12.1 2/ / /12.1 Expectation 52 13
14 Phasing by EM Data: 1 0 h h 1 h h 1 h h Haplotypes Frequencies / / / /6 53 omputational cost (for SNPs) onsider sets of m unphased genotypes Markers 1..m If markers are bi allelic 2 m possible haplotypes 2 m 1 (2 m + 1) possible haplotype pairs 3 m distinct observed genotypes 2 n 1 reconstructions for n heterozygous loci For example, if m=10 = 1024 = 524,800 = 59,049 = EM algorithm Outline Pros More accurate than lark s method Fully or partially phased individuals contribute most of the information ons Estimate depends on starting point: need to run multiple times on different starting points Implementation may become computationally expensive: cost grows rapidly with number of markers For each individual, the number of possible haplotypes is 2 m, where m is the number of makers ypically run for short sequences with < 25 SNPs Background & motivation Problem statement Statistical methods for haplotype inference lark s algorithm Expectation Maximization (EM) algorithm oalescent based methods and HMM Haplotype inference on sequence data Example applications No modeling on haplotypes
The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8
The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as
More informationInferring Transcriptional Regulatory Networks from High-throughput Data
Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20
More informationHaplotyping. Biostatistics 666
Haplotyping Biostatistics 666 Previously Introduction to te E-M algoritm Approac for likeliood optimization Examples related to gene counting Allele frequency estimation recessive disorder Allele frequency
More informationInferring Transcriptional Regulatory Networks from Gene Expression Data II
Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday
More information1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:
.5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the
More informationThe Lander-Green Algorithm. Biostatistics 666 Lecture 22
The Lander-Green Algorithm Biostatistics 666 Lecture Last Lecture Relationship Inferrence Likelihood of genotype data Adapt calculation to different relationships Siblings Half-Siblings Unrelated individuals
More informationWhat is the expectation maximization algorithm?
primer 2008 Nature Publishing Group http://www.nature.com/naturebiotechnology What is the expectation maximization algorithm? Chuong B Do & Serafim Batzoglou The expectation maximization algorithm arises
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationFor 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.
STAT 550 Howework 6 Anton Amirov 1. This question relates to the same study you saw in Homework-4, by Dr. Arno Motulsky and coworkers, and published in Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124).
More informationGenotype Imputation. Class Discussion for January 19, 2016
Genotype Imputation Class Discussion for January 19, 2016 Intuition Patterns of genetic variation in one individual guide our interpretation of the genomes of other individuals Imputation uses previously
More informationChapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)
12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²
More informationSolutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin
Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele
More information2. Map genetic distance between markers
Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,
More informationHomework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:
Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships
More informationp(d g A,g B )p(g B ), g B
Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationGenotype Imputation. Biostatistics 666
Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives
More informationPhasing via the Expectation Maximization (EM) Algorithm
Computing Haplotype Frequencies and Haplotype Phasing via the Expectation Maximization (EM) Algorithm Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 14, 2010 Outline
More informationLecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012
Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationComputational Approaches to Statistical Genetics
Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen
More informationAffected Sibling Pairs. Biostatistics 666
Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD
More informationCausal Graphical Models in Systems Genetics
1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation
More information(Genome-wide) association analysis
(Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by
More informationExpression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia
Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationAssociation Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5
Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative
More informationSolutions to Problem Set 4
Question 1 Solutions to 7.014 Problem Set 4 Because you have not read much scientific literature, you decide to study the genetics of garden peas. You have two pure breeding pea strains. One that is tall
More informationInferring Protein-Signaling Networks II
Inferring Protein-Signaling Networks II Lectures 15 Nov 16, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022
More informationLearning ancestral genetic processes using nonparametric Bayesian models
Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew
More information1. Understand the methods for analyzing population structure in genomes
MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population
More information1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics
1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
More informationChapter 13 Meiosis and Sexual Reproduction
Biology 110 Sec. 11 J. Greg Doheny Chapter 13 Meiosis and Sexual Reproduction Quiz Questions: 1. What word do you use to describe a chromosome or gene allele that we inherit from our Mother? From our Father?
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationComplexity and Approximation of the Minimum Recombination Haplotype Configuration Problem
Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem Lan Liu 1, Xi Chen 3, Jing Xiao 3, and Tao Jiang 1,2 1 Department of Computer Science and Engineering, University
More informationLinkage and Linkage Disequilibrium
Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies
More informationAUTHORIZATION TO LEND AND REPRODUCE THE THESIS. Date Jong Wha Joanne Joo, Author
AUTHORIZATION TO LEND AND REPRODUCE THE THESIS As the sole author of this thesis, I authorize Brown University to lend it to other institutions or individuals for the purpose of scholarly research. Date
More informationFull file at CHAPTER 2 Genetics
CHAPTER 2 Genetics MULTIPLE CHOICE 1. Chromosomes are a. small linear bodies. b. contained in cells. c. replicated during cell division. 2. A cross between true-breeding plants bearing yellow seeds produces
More informationNew imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype)
New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype) Kelly Swarts PAG Allele Mining 1/11/2014 Imputation is the projection
More informationTheoretical and computational aspects of association tests: application in case-control genome-wide association studies.
Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus
More informationUse of hidden Markov models for QTL mapping
Use of hidden Markov models for QTL mapping Karl W Broman Department of Biostatistics, Johns Hopkins University December 5, 2006 An important aspect of the QTL mapping problem is the treatment of missing
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationOutline of lectures 3-6
GENOME 453 J. Felsenstein Evolutionary Genetics Autumn, 007 Population genetics Outline of lectures 3-6 1. We want to know what theory says about the reproduction of genotypes in a population. This results
More informationUNIT 8 BIOLOGY: Meiosis and Heredity Page 148
UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 CP: CHAPTER 6, Sections 1-6; CHAPTER 7, Sections 1-4; HN: CHAPTER 11, Section 1-5 Standard B-4: The student will demonstrate an understanding of the molecular
More informationStatistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014
Overview - 1 Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Elizabeth Thompson University of Washington Seattle, WA, USA MWF 8:30-9:20; THO 211 Web page: www.stat.washington.edu/ thompson/stat550/
More informationBackward Genotype-Trait Association. in Case-Control Designs
Backward Genotype-Trait Association (BGTA)-Based Dissection of Complex Traits in Case-Control Designs Tian Zheng, Hui Wang and Shaw-Hwa Lo Department of Statistics, Columbia University, New York, New York,
More informationThe phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.
Series 2: Cross Diagrams - Complementation There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome:
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large
More informationBTRY 7210: Topics in Quantitative Genomics and Genetics
BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:
More information27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1
10-708: Probabilistic Graphical Models, Spring 2015 27: Case study with popular GM III Lecturer: Eric P. Xing Scribes: Hyun Ah Song & Elizabeth Silver 1 Introduction: Gene association mapping for complex
More informationGene mapping in model organisms
Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2
More informationPopulation Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda
1 Population Genetics with implications for Linkage Disequilibrium Chiara Sabatti, Human Genetics 6357a Gonda csabatti@mednet.ucla.edu 2 Hardy-Weinberg Hypotheses: infinite populations; no inbreeding;
More informationLatent Variable models for GWAs
Latent Variable models for GWAs Oliver Stegle Machine Learning and Computational Biology Research Group Max-Planck-Institutes Tübingen, Germany September 2011 O. Stegle Latent variable models for GWAs
More informationEM algorithm and applications Lecture #9
EM algorithm and applications Lecture #9 Bacground Readings: Chapters 11.2, 11.6 in the text boo, Biological Sequence Analysis, Durbin et al., 2001.. The EM algorithm This lecture plan: 1. Presentation
More informationPopulation Genetics II (Selection + Haplotype analyses)
26 th Oct 2015 Poulation Genetics II (Selection + Halotye analyses) Gurinder Singh Mickey twal Center for Quantitative iology Natural Selection Model (Molecular Evolution) llele frequency Embryos Selection
More informationLecture WS Evolutionary Genetics Part I 1
Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in
More informationIntroduction to Linkage Disequilibrium
Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have
More informationMajor questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.
Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary
More informationLearning in Bayesian Networks
Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm
More informationCHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection
CHAPTER 23 THE EVOLUTIONS OF POPULATIONS Section C: Genetic Variation, the Substrate for Natural Selection 1. Genetic variation occurs within and between populations 2. Mutation and sexual recombination
More informationOutline of lectures 3-6
GENOME 453 J. Felsenstein Evolutionary Genetics Autumn, 009 Population genetics Outline of lectures 3-6 1. We want to know what theory says about the reproduction of genotypes in a population. This results
More informationQuiz Section 4 Molecular analysis of inheritance: An amphibian puzzle
Genome 371, Autumn 2018 Quiz Section 4 Molecular analysis of inheritance: An amphibian puzzle Goals: To illustrate how molecular tools can be used to track inheritance. In this particular example, we will
More informationFriday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo
Friday Harbor 2017 From Genetics to GWAS (Genome-wide Association Study) Sept 7 2017 David Fardo Purpose: prepare for tomorrow s tutorial Genetic Variants Quality Control Imputation Association Visualization
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More informationTutorial Session 2. MCMC for the analysis of genetic data on pedigrees:
MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation
More informationInferring Protein-Signaling Networks
Inferring Protein-Signaling Networks Lectures 14 Nov 14, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022 1
More informationQuestion: If mating occurs at random in the population, what will the frequencies of A 1 and A 2 be in the next generation?
October 12, 2009 Bioe 109 Fall 2009 Lecture 8 Microevolution 1 - selection The Hardy-Weinberg-Castle Equilibrium - consider a single locus with two alleles A 1 and A 2. - three genotypes are thus possible:
More informationSNP Association Studies with Case-Parent Trios
SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health September 3, 2009 Population-based Association Studies Balding (2006). Nature
More informationParts 2. Modeling chromosome segregation
Genome 371, Autumn 2018 Quiz Section 2 Meiosis Goals: To increase your familiarity with the molecular control of meiosis, outcomes of meiosis, and the important role of crossing over in generating genetic
More informationSNP-SNP Interactions in Case-Parent Trios
Detection of SNP-SNP Interactions in Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 2, 2009 Karyotypes http://ghr.nlm.nih.gov/ Single Nucleotide Polymphisms
More informationHumans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase
Humans have two copies of each chromosome Inherited from mother and father. Genotyping technologies do not maintain the phase Genotyping technologies do not maintain the phase Recall that proximal SNPs
More informationCausal Discovery Methods Using Causal Probabilistic Networks
ausal iscovery Methods Using ausal Probabilistic Networks MINFO 2004, T02: Machine Learning Methods for ecision Support and iscovery onstantin F. liferis & Ioannis Tsamardinos iscovery Systems Laboratory
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationHaplotype-based variant detection from short-read sequencing
Haplotype-based variant detection from short-read sequencing Erik Garrison and Gabor Marth July 16, 2012 1 Motivation While statistical phasing approaches are necessary for the determination of large-scale
More informationParts 2. Modeling chromosome segregation
Genome 371, Autumn 2017 Quiz Section 2 Meiosis Goals: To increase your familiarity with the molecular control of meiosis, outcomes of meiosis, and the important role of crossing over in generating genetic
More informationOn the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease
On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease Yuehua Cui 1 and Dong-Yun Kim 2 1 Department of Statistics and Probability, Michigan State University,
More informationTime allowed: 2 hours Answer ALL questions in Section A, ALL PARTS of the question in Section B and ONE question from Section C.
UNIVERSITY OF EAST ANGLIA School of Biological Sciences Main Series UG Examination 2017-2018 GENETICS BIO-5009A Time allowed: 2 hours Answer ALL questions in Section A, ALL PARTS of the question in Section
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationNotes for MCTP Week 2, 2014
Notes for MCTP Week 2, 2014 Lecture 1: Biological background Evolutionary biology and population genetics are highly interdisciplinary areas of research, with many contributions being made from mathematics,
More informationIntroduction to QTL mapping in model organisms
Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com
More informationProblems for 3505 (2011)
Problems for 505 (2011) 1. In the simplex of genotype distributions x + y + z = 1, for two alleles, the Hardy- Weinberg distributions x = p 2, y = 2pq, z = q 2 (p + q = 1) are characterized by y 2 = 4xz.
More informationCSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation
CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism
More informationEXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important?
Statistical Genetics Agronomy 65 W. E. Nyquist March 004 EXERCISES FOR CHAPTER 3 Exercise 3.. a. Define random mating. b. Discuss what random mating as defined in (a) above means in a single infinite population
More informationIntroduction to Genetics
Introduction to Genetics We ve all heard of it, but What is genetics? Genetics: the study of gene structure and action and the patterns of inheritance of traits from parent to offspring. Ancient ideas
More informationInferring Genetic Architecture of Complex Biological Processes
Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More informationOutline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white
Outline - segregation of alleles in single trait crosses - independent assortment of alleles - using probability to predict outcomes - statistical analysis of hypotheses - conditional probability in multi-generation
More informationSupporting Information
Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider
More informationEfficient Haplotype Inference with Boolean Satisfiability
Efficient Haplotype Inference with Boolean Satisfiability Joao Marques-Silva 1 and Ines Lynce 2 1 School of Electronics and Computer Science University of Southampton 2 INESC-ID/IST Technical University
More informationOutline for today s lecture (Ch. 14, Part I)
Outline for today s lecture (Ch. 14, Part I) Ploidy vs. DNA content The basis of heredity ca. 1850s Mendel s Experiments and Theory Law of Segregation Law of Independent Assortment Introduction to Probability
More informationClassical Selection, Balancing Selection, and Neutral Mutations
Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected
More informationCOMBI - Combining high-dimensional classification and multiple hypotheses testing for the analysis of big data in genetics
COMBI - Combining high-dimensional classification and multiple hypotheses testing for the analysis of big data in genetics Thorsten Dickhaus University of Bremen Institute for Statistics AG DANK Herbsttagung
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationProbability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National
More informationQ Expected Coverage Achievement Merit Excellence. Punnett square completed with correct gametes and F2.
NCEA Level 2 Biology (91157) 2018 page 1 of 6 Assessment Schedule 2018 Biology: Demonstrate understanding of genetic variation and change (91157) Evidence Q Expected Coverage Achievement Merit Excellence
More informationSupplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control
Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model
More information(Write your name on every page. One point will be deducted for every page without your name!)
POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average
More informationNatural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate
Natural Selection Population Dynamics Humans, Sickle-cell Disease, and Malaria How does a population of humans become resistant to malaria? Overproduction Environmental pressure/competition Pre-existing
More information