Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays

Size: px
Start display at page:

Download "Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays"

Transcription

1 Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays Department of Biostatistics Johns Hopkins Bloomberg School of Public Health November 18, 2008

2 Acknowledgments Rob Scharpf Giovanni Parmigiani Rafael Irizarry + Benilton Carvalho Jonathan Pevsner + Nate Miller, Eli Roberson, Jason Ting

3 Trisomy

4 Karyotypes General Cytogenetics Information

5 FISH Courtesy of the Pevsner Laboratory

6 SNP chip data

7 Deletion copy number AA AB BB Mb Chr Chr 12

8 Amplification 10 8 AA AB BB copy number 2 1 Mb Chr 7

9 Uniparental Isodisomy 5 4 AA AB BB 3 copy number Mb Chr 1 Chr 2

10 Cancer samples

11 Mosaicism copy number 1 AA AB BB Mb Chr 2

12 Estimation 1 By SNP: Estimate genotype and copy number for each SNP. 2 Within a sample: Borrow strength between SNPs to infer regions of LOH and copy number changes. 3 Between samples: Comparison between normal and disease populations to find chromosomal alterations associated with disease.

13 SNPchip S4 classes and methods

14 The structure of the data we observe At each SNP, we observe a noisy measure of the true copy number and genotype (and possibly also measures of confidence in those estimates).

15 Another HMM...? Novel (and we believe, important) HMM features: 1 Model the observation sequence of genotype calls and copy number jointly (Vanilla) 2 Integrate confidence estimates of the genotype calls and copy number estimates (ICE) QuantiSNP and PennCNV also model genotype and copy number jointly! Colella et. al. (2007), QuantiSNP: an objective Bayes hidden-markov model..., Nucleic Acids Res 35(6): Wang et. al. (2008), PennCNV: An integrated hidden-markov model designed for..., Genome Research 17:

16 The Vanilla HMM components Observations ĈN and ĜT Hidden states Initial state probability distribution Transition probabilities Emission probabilities

17 Hidden states

18 Transition probabilities Following suggestions in the literature, we model the transition probabilities as a function of the distance d between SNPs. Specifically, let θ(d) 1 e 2d denote the probability that SNP i is not informative (I c ) for SNP at i + 1. For example: τ =P { (d) = P { = P { P i+1 i, d } i+1, I i, d } + P i+1, I c i, d i+1 I, i, d } P { I i, d } + i+1 I c, i, d P I c i, d = P { } θ(d).

19 Emission probabilities We assume conditional independence between copy number estimates and the genotype calls. For example: f( c CN, c GT ) = f( c CN ) f( c GT ) n o n o = f ccn ց f cgt = β ց n ccn o β n cgt o.

20 More information The confidence in genotype calls can differ substantially between SNPs! 4 Concordant 2 AA AB BB Discordant called BB (AB) Sense Antisense

21 Integrating confidence estimates for genotype calls Let S cgt be the confidence score for the genotype estimate. We can estimate from Hapmap the following densities: n o n o n f SĤOM ĤOM, HOM, f SĤOM ĤOM, HET, f S HET d HET, d o n HOM, f S HET d HET, d o HET. Note: n o f SĤOM ĤOM, n f S HET d HET, d o n o f SĤOM ĤOM, HOM n f S HET d HET, d o HOM.

22 Emission Probabilities - Loss Recall that f( c CN, c GT ) = f( c CN ) f( c GT ) n o n o = f ccn ց f cgt = β ց n ccn o β n cgt o. If the state for a particular SNP is Loss, we have β n cgt, S cgt o = f n o n cgt f S cgt GT, c o.

23 Emission Probabilities - Retention For retention, the true genotype can be HET or HOM: β n cgt, Sd GT o n o n = f cgt f S GT d GT, c o n o n = f cgt f S GT d, HOM GT, c o n + f S d GT, HET GT, c o n o n = f cgt f S GT d HOM, GT, c o n f HOM GT, c o n + f S GT d HET, GT, c o n f HET GT, c o n o n = f cgt f S GT d HOM, GT c o n f HOM GT, c o n + f S GT d HET, GT c o n f HET GT, c o

24 Copy number and genotype calls for chromosome A D B C E

25 Vanilla and ICE HMMs for genotype and copy number Deletion Normal LOH Amplification A D B C E 2 1 Van ICE A D B E Van ICE Mb

26 A HapMap sample Deletion Normal LOH Amplification 1 Van ICE Van ICE Mb

27 Many HapMap samples

28 SNP Trio

29 HMM for SNP Trio chromosome 10 chromosome 22 BPI BPI UPI F UPI F UPI M UPI M MI D MI D MI S non BPI MI S non BPI BPI BPI position (Mb) position (Mb)

30 De novo deletion

31 Homozygous and hemizygous deletions GF GM M A B S

32 Homozygous and hemizygous deletions GF GM M A B S Grandfather Grandmother Mother Aunt Brother Sister AB 0.01 BB 0.05 AB AB AB 0.06 BB AA 0.27 NC AA AA AB 0.12 BB AB 0.15 NC AA AA AB BB AB NC AA AA AA 0.30 AA BB 0.20 NC NC BB BB 0.22 BB AB 0.03 NC BB BB AB 0.09 AA AB NC BB BB AB AA BB 0.01 NC BB 0.04 BB BB 0.14 NC AB 0.17 NC AA AA AA AA AB 0.01 NC AA AA AA 0.16 AA AB NC BB BB AB 0.13 AA AB 0.01 NC BB BB AB AA BB 0.16 NC BB BB BB 0.10 BB AB 0.06 NC AA AA AB 0.07 BB AA 0.17 NC AA AA AB BB BB 0.02 NC BB BB AB 0.13 AA AA 0.21 NC AA AA AB 0.20 BB BB 0.05 BB 0.11 BB 0.15 BB 0.29 AB 0.13 AB -0.01

33 References Carvalho B, Bengtsson H, Speed TP, Irizarry RA (2007) Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics, 8(2): Scharpf RB, Ting JC, Pevsner J, Ruczinski I (2007). SNPchip: R classes and methods for SNP array data. Bioinformatics, 23(5): Scharpf RB, Parmigiani G, Pevsner J, Ruczinski I (2008). Hidden Markov models for the assessment of chromosomal alterations using high-throughput SNP arrays. Annals of Applied Statistics, 2(2): Ting J, Ye Y, Thomas G, Ruczinski I, Pevsner J (2006). Analysis and visualization of chromosomal abnormalities in SNP data with SNPscan. BMC Bioinformatics, 7(1):25. Ting J, Roberson E, Miller N, Lysholm A, Stephan D, Capone G, Ruczinski I, Thomas G, Pevsner J (2007). Visualization of uniparental inheritance, Mendelian inconsistencies, deletions and parent of origin effects in single nucleotide polymorphism trio data with SNPtrio. Human Mutation, 28(12):

34 iruczins/

An Integrated Approach for the Assessment of Chromosomal Abnormalities

An Integrated Approach for the Assessment of Chromosomal Abnormalities An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 26, 2007 Karyotypes Karyotypes General Cytogenetics

More information

An Integrated Approach for the Assessment of Chromosomal Abnormalities

An Integrated Approach for the Assessment of Chromosomal Abnormalities An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 6, 2007 Karyotypes Mitosis and Meiosis Meiosis Meiosis

More information

SNP Association Studies with Case-Parent Trios

SNP Association Studies with Case-Parent Trios SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health July, 00 Acknowledgments Collaborators: Qing Li, Rob Scharpf, Holger Schwender,

More information

SNP Association Studies with Case-Parent Trios

SNP Association Studies with Case-Parent Trios SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health September 3, 2009 Population-based Association Studies Balding (2006). Nature

More information

Some New Methods for Family-Based Association Studies

Some New Methods for Family-Based Association Studies Some New Methods for Family-Based Association Studies Ingo Ruczinski Department of Biostatistics Johns Hopkins Bloomberg School of Public Health April 8, 20 http: //biostat.jhsph.edu/ iruczins/ Topics

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism

More information

Genotype Imputation. Class Discussion for January 19, 2016

Genotype Imputation. Class Discussion for January 19, 2016 Genotype Imputation Class Discussion for January 19, 2016 Intuition Patterns of genetic variation in one individual guide our interpretation of the genomes of other individuals Imputation uses previously

More information

Learning ancestral genetic processes using nonparametric Bayesian models

Learning ancestral genetic processes using nonparametric Bayesian models Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

RECONSTRUCTING DNA COPY NUMBER BY JOINT SEGMENTATION OF MULTIPLE SEQUENCES. Zhongyang Zhang Kenneth Lange Chiara Sabatti

RECONSTRUCTING DNA COPY NUMBER BY JOINT SEGMENTATION OF MULTIPLE SEQUENCES. Zhongyang Zhang Kenneth Lange Chiara Sabatti RECONSTRUCTING DNA COPY NUMBER BY JOINT SEGMENTATION OF MULTIPLE SEQUENCES By Zhongyang Zhang Kenneth Lange Chiara Sabatti Technical Report 261 March 2012 Division of Biostatistics STANFORD UNIVERSITY

More information

SNP-SNP Interactions in Case-Parent Trios

SNP-SNP Interactions in Case-Parent Trios Detection of SNP-SNP Interactions in Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 2, 2009 Karyotypes http://ghr.nlm.nih.gov/ Single Nucleotide Polymphisms

More information

Curriculum Links. AQA GCE Biology. AS level

Curriculum Links. AQA GCE Biology. AS level Curriculum Links AQA GCE Biology Unit 2 BIOL2 The variety of living organisms 3.2.1 Living organisms vary and this variation is influenced by genetic and environmental factors Causes of variation 3.2.2

More information

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 CP: CHAPTER 6, Sections 1-6; CHAPTER 7, Sections 1-4; HN: CHAPTER 11, Section 1-5 Standard B-4: The student will demonstrate an understanding of the molecular

More information

Expected complete data log-likelihood and EM

Expected complete data log-likelihood and EM Expected complete data log-likelihood and EM In our EM algorithm, the expected complete data log-likelihood Q is a function of a set of model parameters τ, ie M Qτ = log fb m, r m, g m z m, l m, τ p mz

More information

Big Idea 3B Basic Review. 1. Which disease is the result of uncontrolled cell division? a. Sickle-cell anemia b. Alzheimer s c. Chicken Pox d.

Big Idea 3B Basic Review. 1. Which disease is the result of uncontrolled cell division? a. Sickle-cell anemia b. Alzheimer s c. Chicken Pox d. Big Idea 3B Basic Review 1. Which disease is the result of uncontrolled cell division? a. Sickle-cell anemia b. Alzheimer s c. Chicken Pox d. Cancer 2. Cancer cells do not exhibit, which can lead to the

More information

2. Map genetic distance between markers

2. Map genetic distance between markers Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,

More information

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS.

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS. !! www.clutchprep.com CONCEPT: HISTORY OF GENETICS The earliest use of genetics was through of plants and animals (8000-1000 B.C.) Selective breeding (artificial selection) is the process of breeding organisms

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis

Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Hongzhe Li hongzhe@upenn.edu, http://statgene.med.upenn.edu University of Pennsylvania Perelman School of

More information

Gene mapping in model organisms

Gene mapping in model organisms Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2

More information

F1 Parent Cell R R. Name Period. Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes

F1 Parent Cell R R. Name Period. Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes Name Period Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes 1. What is the chromosome theory of inheritance? 2. Explain the law of segregation. Use two different

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase Humans have two copies of each chromosome Inherited from mother and father. Genotyping technologies do not maintain the phase Genotyping technologies do not maintain the phase Recall that proximal SNPs

More information

Supplementary Materials for Integrated study of copy number states and genotype calls using high density SNP arrays

Supplementary Materials for Integrated study of copy number states and genotype calls using high density SNP arrays Supplementary Materials for Integrated study of copy number states and genotype calls using high density SNP arrays A HapMap samples Originally, Illumina performed 73 CEU samples, 77 YRI samples, and 75

More information

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 1 Odds ratios for retrospective studies 2 Odds ratios approximating the

More information

BIOINFORMATICS. Discrepancies in dbsnp confirmation rates and allele frequency distributions from varying genotyping error rates and patterns

BIOINFORMATICS. Discrepancies in dbsnp confirmation rates and allele frequency distributions from varying genotyping error rates and patterns BIOINFORMATICS Vol. 2 no. 7 24, pages 122 132 DOI: 1.193/bioinformatics/bth34 Discrepancies in dbsnp confirmation rates and allele frequency distributions from varying genotyping error rates and patterns

More information

Bioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics.

Bioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics. Bioinformatics Jason H. Moore, Ph.D. Frank Lane Research Scholar in Computational Genetics Associate Professor of Genetics Adjunct Associate Professor of Biological Sciences Adjunct Associate Professor

More information

ESTIMATION OF PARENT SPECIFIC DNA COPY NUMBER IN TUMORS USING HIGH-DENSITY GENOTYPING ARRAYS. Hao Chen Haipeng Xing Nancy Zhang

ESTIMATION OF PARENT SPECIFIC DNA COPY NUMBER IN TUMORS USING HIGH-DENSITY GENOTYPING ARRAYS. Hao Chen Haipeng Xing Nancy Zhang ESTIMATION OF PARENT SPECIFIC DNA COPY NUMBER IN TUMORS USING HIGH-DENSITY GENOTYPING ARRAYS By Hao Chen Haipeng Xing Nancy Zhang Technical Report 251 March 2010 Division of Biostatistics STANFORD UNIVERSITY

More information

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS BIRS 2016 1 NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS Malka Gorfine Tel Aviv University, Israel Joint work with Danielle Braun and Giovanni

More information

Parts 2. Modeling chromosome segregation

Parts 2. Modeling chromosome segregation Genome 371, Autumn 2018 Quiz Section 2 Meiosis Goals: To increase your familiarity with the molecular control of meiosis, outcomes of meiosis, and the important role of crossing over in generating genetic

More information

Supplementary Information for Discovery and characterization of indel and point mutations

Supplementary Information for Discovery and characterization of indel and point mutations Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Unit 3 - Molecular Biology & Genetics - Review Packet

Unit 3 - Molecular Biology & Genetics - Review Packet Name Date Hour Unit 3 - Molecular Biology & Genetics - Review Packet True / False Questions - Indicate True or False for the following statements. 1. Eye color, hair color and the shape of your ears can

More information

NETWORK BIOLOGY AND COMPLEX DISEASES. Ahto Salumets

NETWORK BIOLOGY AND COMPLEX DISEASES. Ahto Salumets NETWORK BIOLOGY AND COMPLEX DISEASES Ahto Salumets CENTRAL DOGMA OF BIOLOGY https://en.wikipedia.org/wiki/central_dogma_of_molecular_biology http://www.qaraqalpaq.com/genetics.html CHROMOSOMES SINGLE-NUCLEOTIDE

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white Outline - segregation of alleles in single trait crosses - independent assortment of alleles - using probability to predict outcomes - statistical analysis of hypotheses - conditional probability in multi-generation

More information

Alleles Notes. 3. In the above table, circle each symbol that represents part of a DNA molecule. Underline each word that is the name of a protein.

Alleles Notes. 3. In the above table, circle each symbol that represents part of a DNA molecule. Underline each word that is the name of a protein. Alleles Notes Different versions of the same gene are called alleles. Different alleles give the instructions for making different versions of a protein. This table shows examples for two human genes.

More information

Stochastic processes and

Stochastic processes and Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University

More information

Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem

Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem Lan Liu 1, Xi Chen 3, Jing Xiao 3, and Tao Jiang 1,2 1 Department of Computer Science and Engineering, University

More information

SAT in Bioinformatics: Making the Case with Haplotype Inference

SAT in Bioinformatics: Making the Case with Haplotype Inference SAT in Bioinformatics: Making the Case with Haplotype Inference Inês Lynce 1 and João Marques-Silva 2 1 IST/INESC-ID, Technical University of Lisbon, Portugal ines@sat.inesc-id.pt 2 School of Electronics

More information

Meiosis and Fertilization Understanding How Genes Are Inherited 1

Meiosis and Fertilization Understanding How Genes Are Inherited 1 Meiosis and Fertilization Understanding How Genes Are Inherited 1 Introduction In this activity, you will learn how you inherited two copies of each gene, one from your mother and one from your father.

More information

4/19/10 More complications to Mendel

4/19/10 More complications to Mendel 4/19/10 More complications to Mendel Complications to the relationship between genotype to phenotype Commentary written in response to the release of the first draft of the human genome sequence From Science

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

The Lander-Green Algorithm. Biostatistics 666 Lecture 22 The Lander-Green Algorithm Biostatistics 666 Lecture Last Lecture Relationship Inferrence Likelihood of genotype data Adapt calculation to different relationships Siblings Half-Siblings Unrelated individuals

More information

Lesson 4: Understanding Genetics

Lesson 4: Understanding Genetics Lesson 4: Understanding Genetics 1 Terms Alleles Chromosome Co dominance Crossover Deoxyribonucleic acid DNA Dominant Genetic code Genome Genotype Heredity Heritability Heritability estimate Heterozygous

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

- mutations can occur at different levels from single nucleotide positions in DNA to entire genomes.

- mutations can occur at different levels from single nucleotide positions in DNA to entire genomes. February 8, 2005 Bio 107/207 Winter 2005 Lecture 11 Mutation and transposable elements - the term mutation has an interesting history. - as far back as the 17th century, it was used to describe any drastic

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 2 2012 Article 6 COMPUTATIONAL STATISTICAL METHODS FOR GENOMICS AND SYSTEMS BIOLOGY A Family-Based Probabilistic Method for Capturing

More information

Mississippi Academic Assessment Program (MAAP) Biology SAMPLE ITEMS

Mississippi Academic Assessment Program (MAAP) Biology SAMPLE ITEMS Mississippi Academic Assessment Program (MAAP) Biology SAMPLE ITEMS 1 1. The diagram below shows a piece of DNA and the proteins made from this DNA. Which characteristic of the DNA helps it function as

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans University of Bristol This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

EM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works

EM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works EM algorithm The example in the book for doing the EM algorithm is rather difficult, and was not available in software at the time that the authors wrote the book, but they implemented a SAS macro to implement

More information

Lecture 12 April 25, 2018

Lecture 12 April 25, 2018 Stats 300C: Theory of Statistics Spring 2018 Lecture 12 April 25, 2018 Prof. Emmanuel Candes Scribe: Emmanuel Candes, Chenyang Zhong 1 Outline Agenda: The Knockoffs Framework 1. The Knockoffs Framework

More information

Statistical Analysis of Haplotypes, Untyped SNPs, and CNVs in Genome-Wide Association Studies

Statistical Analysis of Haplotypes, Untyped SNPs, and CNVs in Genome-Wide Association Studies Statistical Analysis of Haplotypes, Untyped SNPs, and CNVs in Genome-Wide Association Studies by Yijuan Hu A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in

More information

On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits

On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits Wang BMC Genetics 011, 1:8 http://www.biomedcentral.com/171-156/1/8 METHODOLOGY ARTICLE Open Access On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative

More information

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo Friday Harbor 2017 From Genetics to GWAS (Genome-wide Association Study) Sept 7 2017 David Fardo Purpose: prepare for tomorrow s tutorial Genetic Variants Quality Control Imputation Association Visualization

More information

Cloud-scale RNA-sequencing differential expression analysis with Myrna

Cloud-scale RNA-sequencing differential expression analysis with Myrna Cloud-scale RNA-sequencing differential expression analysis with Myrna Jeff Leek Johns Hopkins Bloomberg School of Public Health e: jleek@jhsph.edu t: http://www.twitter.com/leekgroup myrna: http://bowtie-bio.sourceforge.net/myrna/

More information

BIOLOGY LTF DIAGNOSTIC TEST MEIOSIS & MENDELIAN GENETICS

BIOLOGY LTF DIAGNOSTIC TEST MEIOSIS & MENDELIAN GENETICS 016064 BIOLOGY LTF DIAGNOSTIC TEST MEIOSIS & MENDELIAN GENETICS TEST CODE: 016064 Directions: Each of the questions or incomplete statements below is followed by five suggested answers or completions.

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

7.014 Problem Set 6 Solutions

7.014 Problem Set 6 Solutions 7.014 Problem Set 6 Solutions Question 1 a) Define the following terms: Dominant In genetics, the ability of one allelic form of a gene to determine the phenotype of a heterozygous individual, in which

More information

Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model

Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model arxiv:1203.4394v2 [stat.ap] 22 Jan 2013 Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model The Minh Luong, Yves Rozenholc, and Gregory Nuel MAP5,

More information

Introduction to population genetics & evolution

Introduction to population genetics & evolution Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics

More information

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees: MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation

More information

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Rui Wang, Yong Li, XiaoFeng Wang, Haixu Tang and Xiaoyong Zhou Indiana University at Bloomington

More information

Classification of SNP genotypes by a Gaussian mixture model in competitive enzymatic assays

Classification of SNP genotypes by a Gaussian mixture model in competitive enzymatic assays Mathematical Statistics Stockholm University Classification of SNP genotypes by a Gaussian mixture model in competitive enzymatic assays Hedvig Norlén, Erik Pettersson, Afshin Ahmadian, Joakim Lundeberg

More information

Biol. 303 EXAM I 9/22/08 Name

Biol. 303 EXAM I 9/22/08 Name Biol. 303 EXAM I 9/22/08 Name -------------------------------------------------------------------------------------------------------------- This exam consists of 40 multiple choice questions worth 2.5

More information

Genetics Notes. Chromosomes and DNA 11/15/2012. Structures that contain DNA, look like worms, can be seen during mitosis = chromosomes.

Genetics Notes. Chromosomes and DNA 11/15/2012. Structures that contain DNA, look like worms, can be seen during mitosis = chromosomes. chromosomes Genetics Notes Chromosomes and Structures that contain, look like worms, can be seen during mitosis = chromosomes. Chromosomes: made of coiled around protiens. Accurate copying of chromosomes

More information

Statistical issues in QTL mapping in mice

Statistical issues in QTL mapping in mice Statistical issues in QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Outline Overview of QTL mapping The X chromosome Mapping

More information

Solutions to Problem Set 4

Solutions to Problem Set 4 Question 1 Solutions to 7.014 Problem Set 4 Because you have not read much scientific literature, you decide to study the genetics of garden peas. You have two pure breeding pea strains. One that is tall

More information

BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section. A/a ; B/B ; d/d X A/a ; b/b ; D/d

BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section. A/a ; B/B ; d/d X A/a ; b/b ; D/d BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section 1. In the following cross, all genes are on separate chromosomes. A is dominant to a, B is dominant to b and D is dominant

More information

Unit 3 Test 2 Study Guide

Unit 3 Test 2 Study Guide Unit 3 Test 2 Study Guide How many chromosomes are in the human body cells? 46 How many chromosomes are in the sex cells? 23 What are sex cells also known as? gametes What is fertilization? Union of the

More information

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes

More information

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype. Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype. please read pages 38-47; 49-55;57-63. Slide 1 of Chapter 2 1 Extension sot Mendelian Behavior of Genes Single gene inheritance

More information

Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes

Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes Christopher Holmes (joint work with Chris Yau) Department of Statistics, & Wellcome Trust Centre for Human Genetics, University

More information

Learning gene regulatory networks Statistical methods for haplotype inference Part I

Learning gene regulatory networks Statistical methods for haplotype inference Part I Learning gene regulatory networks Statistical methods for haplotype inference Part I Input: Measurement of mrn levels of all genes from microarray or rna sequencing Samples (e.g. 200 patients with lung

More information

Machine Learning for Data Science (CS4786) Lecture 19

Machine Learning for Data Science (CS4786) Lecture 19 Machine Learning for Data Science (CS4786) Lecture 19 Hidden Markov Models Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2017fa/ Quiz Quiz Two variables can be marginally independent but not

More information

Nature Genetics: doi:0.1038/ng.2768

Nature Genetics: doi:0.1038/ng.2768 Supplementary Figure 1: Graphic representation of the duplicated region at Xq28 in each one of the 31 samples as revealed by acgh. Duplications are represented in red and triplications in blue. Top: Genomic

More information

X Chromosome Association Testing in Genome-Wide Association Studies

X Chromosome Association Testing in Genome-Wide Association Studies X Chromosome Association Testing in Genome-Wide Association Studies Honours Thesis November 6, 009 Peter Hickey Department of Mathematics and Statistics, The University of Melbourne Bioinformatics Division,

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

Discovery of Genomic Structural Variations with Next-Generation Sequencing Data

Discovery of Genomic Structural Variations with Next-Generation Sequencing Data Discovery of Genomic Structural Variations with Next-Generation Sequencing Data Advanced Topics in Computational Genomics Slides from Marcel H. Schulz, Tobias Rausch (EMBL), and Kai Ye (Leiden University)

More information

Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets

Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets Computational Methods for Learning Population History from Large Scale Genetic Variation Datasets Ming-Chi Tsai CMU-CB-13-102 July 2, 2013 School of Computer Science Carnegie Mellon University Pittsburgh,

More information

BIOLOGY 321. Answers to text questions th edition: Chapter 2

BIOLOGY 321. Answers to text questions th edition: Chapter 2 BIOLOGY 321 SPRING 2013 10 TH EDITION OF GRIFFITHS ANSWERS TO ASSIGNMENT SET #1 I have made every effort to prevent errors from creeping into these answer sheets. But, if you spot a mistake, please send

More information

The history of Life on Earth reflects an unbroken chain of genetic continuity and transmission of genetic information:

The history of Life on Earth reflects an unbroken chain of genetic continuity and transmission of genetic information: 9/26/05 Biology 321 Answers to optional challenge probability questions posed in 9/23/05lecture notes are included at the end of these lecture notes The history of Life on Earth reflects an unbroken chain

More information

7.014 Problem Set 6. Question 1. MIT Department of Biology Introductory Biology, Spring 2004

7.014 Problem Set 6. Question 1. MIT Department of Biology Introductory Biology, Spring 2004 MIT Department of Biology 7.014 Introductory Biology, Spring 2004 Name: 7.014 Problem Set 6 Please print out this problem set and record your answers on the printed copy. Problem sets will not be accepted

More information

Parts 2. Modeling chromosome segregation

Parts 2. Modeling chromosome segregation Genome 371, Autumn 2017 Quiz Section 2 Meiosis Goals: To increase your familiarity with the molecular control of meiosis, outcomes of meiosis, and the important role of crossing over in generating genetic

More information

Supporting Information

Supporting Information Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data

A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data Iliadis et al. EURASIP Journal on Bioinformatics and Systems Biology 2014, 2014:7 RESEARCH Open Access A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data Alexandros Iliadis,

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

A segmentation-clustering problem for the analysis of array CGH data

A segmentation-clustering problem for the analysis of array CGH data A segmentation-clustering problem for the analysis of array CGH data F. Picard, S. Robin, E. Lebarbier, J-J. Daudin UMR INA P-G / ENGREF / INRA MIA 518 APPLIED STOCHASTIC MODELS AND DATA ANALYSIS Brest

More information

Causal Graphical Models in Systems Genetics

Causal Graphical Models in Systems Genetics 1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation

More information

Bioinformatics 2 - Lecture 4

Bioinformatics 2 - Lecture 4 Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

BENCHMARK 1 STUDY GUIDE SPRING 2017

BENCHMARK 1 STUDY GUIDE SPRING 2017 BENCHMARK 1 STUDY GUIDE SPRING 2017 Name: There will be semester one content on this benchmark as well. Study your final exam review guide from last semester. New Semester Material: (Chapter 10 Cell Growth

More information

Heredity Composite. Multiple Choice Identify the choice that best completes the statement or answers the question.

Heredity Composite. Multiple Choice Identify the choice that best completes the statement or answers the question. Heredity Composite Multiple Choice Identify the choice that best completes the statement or answers the question. 1. When a plant breeder crossed two red roses, 78% of the offspring had red flowers and

More information

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next genetics the study of heredity heredity sequence of DNA that codes for a protein and thus determines a trait genome a specific characteristic that varies from one individual to another gene trait the passing

More information

Web-based Supplementary Materials for BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

Web-based Supplementary Materials for BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data Web-based Supplementary Materials for BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data Yuan Ji 1,, Yanxun Xu 2, Qiong Zhang 3, Kam-Wah Tsui 3, Yuan Yuan 4, Clift Norris 1, Shoudan

More information

Heterozygous BMN lines

Heterozygous BMN lines Optical density at 80 hours 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 a YPD b YPD + 1µM nystatin c YPD + 2µM nystatin d YPD + 4µM nystatin 1 3 5 6 9 13 16 20 21 22 23 25 28 29 30

More information