An Integrated Approach for the Assessment of Chromosomal Abnormalities

Similar documents
An Integrated Approach for the Assessment of Chromosomal Abnormalities

Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays

SNP Association Studies with Case-Parent Trios

Curriculum Links. AQA GCE Biology. AS level

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

Genotype Imputation. Biostatistics 666

Learning ancestral genetic processes using nonparametric Bayesian models

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Genotype Imputation. Class Discussion for January 19, 2016

2. Map genetic distance between markers

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS.

Dropping Your Genes. A Simulation of Meiosis and Fertilization and An Introduction to Probability

Solutions to Problem Set 4

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

1. The diagram below shows two processes (A and B) involved in sexual reproduction in plants and animals.

Expected complete data log-likelihood and EM

Big Idea 3B Basic Review. 1. Which disease is the result of uncontrolled cell division? a. Sickle-cell anemia b. Alzheimer s c. Chicken Pox d.

Heredity Variation Genetics Meiosis

Ch 11.4, 11.5, and 14.1 Review. Game

BIOLOGY STANDARDS BASED RUBRIC

Cancer: DNA Synthesis, Mitosis, and Meiosis

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Calculation of IBD probabilities

BENCHMARK 1 STUDY GUIDE SPRING 2017

Heredity Variation Genetics Meiosis

Formative/Summative Assessments (Tests, Quizzes, reflective writing, Journals, Presentations)

Some New Methods for Family-Based Association Studies

GSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H*" ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION

A segmentation-clustering problem for the analysis of array CGH data

The genomes of recombinant inbred lines

Variation of Traits. genetic variation: the measure of the differences among individuals within a population

RECONSTRUCTING DNA COPY NUMBER BY JOINT SEGMENTATION OF MULTIPLE SEQUENCES. Zhongyang Zhang Kenneth Lange Chiara Sabatti

Big Idea 3: Living systems store, retrieve, transmit, and respond to information essential to life processes.

Alleles Notes. 3. In the above table, circle each symbol that represents part of a DNA molecule. Underline each word that is the name of a protein.

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

O 3 O 4 O 5. q 3. q 4. Transition

CELL CYCLE UNIT GUIDE- Due January 19, 2016

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study

1. Draw, label and describe the structure of DNA and RNA including bonding mechanisms.

BIOLOGY 321. Answers to text questions th edition: Chapter 2

Meiosis and Fertilization Understanding How Genes Are Inherited 1

Objective 3.01 (DNA, RNA and Protein Synthesis)

3/25/2013 LIVING ORGANISMS MEIOSIS CHROMOSOME NUMBER. 2 types of cells: Autosomal (body) cells have 2 copies of every gene

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity

7.014 Problem Set 6. Question 1. MIT Department of Biology Introductory Biology, Spring 2004

Bioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics.

F1 Parent Cell R R. Name Period. Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes

QQ 10/5/18 Copy the following into notebook:

This is DUE: Come prepared to share your findings with your group.

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.

Gene mapping, linkage analysis and computational challenges. Konstantin Strauch

Cell Growth and Genetics

Sexual Reproduction and Genetics

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS

The history of Life on Earth reflects an unbroken chain of genetic continuity and transmission of genetic information:

Parts 2. Modeling chromosome segregation

Lecture 9: Readings: Chapter 20, pp ;

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

- mutations can occur at different levels from single nucleotide positions in DNA to entire genomes.

Biol. 303 EXAM I 9/22/08 Name

EM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works

Life Cycles, Meiosis and Genetic Variability24/02/2015 2:26 PM

Calculation of IBD probabilities

SNP Association Studies with Case-Parent Trios

Chromosomes and Inheritance

9 Genetic diversity and adaptation Support. AQA Biology. Genetic diversity and adaptation. Specification reference. Learning objectives.

Constructing a Pedigree

What happens to the replicated chromosomes? depends on the goal of the division

Quiz Section 4 Molecular analysis of inheritance: An amphibian puzzle

Meiosis and Fertilization Understanding How Genes Are Inherited 1

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

11/18/2016. Meiosis. Dr. Bertolotti. How is meiosis different from mitosis?

Introduction to population genetics & evolution

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next

Ch 11.Introduction to Genetics.Biology.Landis

Mitosis. making identical copies of diploid cells

Bioinformatics 2 - Lecture 4

Use of hidden Markov models for QTL mapping

Sexual reproduction & Meiosis

CHROMOSOMAL BASIS OF INHERITANCE

What is mitosis? -Process in which a cell divides, creating TWO complete Sets of the original cell with the same EXACT genetic Material (DNA)

Name: Date: Chromosome Webquest (35 points)

Updated: 10/11/2018 Page 1 of 5

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Biology 211 (1) Exam 4! Chapter 12!

Lecture 24. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Mendelian Genetics And Meiosis Study Guide

GS Analysis of Microarray Data

Name Class Date. KEY CONCEPT Gametes have half the number of chromosomes that body cells have.

Meiosis and Fertilization Understanding How Genes Are Inherited 1

GS Analysis of Microarray Data

New imputation strategies optimized for crop plants: FILLIN (Fast, Inbred Line Library ImputatioN) FSFHap (Full Sib Family Haplotype)

Part 2- Biology Paper 2 Inheritance and Variation Knowledge Questions

Supporting Information

Name: Period: EOC Review Part F Outline

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

SNP-SNP Interactions in Case-Parent Trios

BIOLOGY - CLUTCH CH.13 - MEIOSIS.

Transcription:

An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 6, 2007

Karyotypes

Mitosis and Meiosis

Meiosis

Meiosis Errors

Trisomy

Trisomy

Disomies

DNA changes

DNA changes

The data

Deletion

FISH

Amplification

Uniparental Isodisomy

Cancer samples

Mosaicism

SNPchip S4 classes and methods

Estimation 1 By SNP: Estimate genotype and copy number for each SNP. 2 Within a sample: Borrow strength between SNPs to infer regions of LOH and copy number changes. 3 Between samples: Comparison between normal and disease populations to find chromosomal alterations associated with disease.

More information The confidence in genotype calls can differ substantially between SNPs! 4 2 sense 0 2 4 4 2 0 2 4 4 2 0 2 4 antisense

The structure of the data we observe At each SNP, we observe a noisy measure of the true copy number and genotype (and possibly also measures of confidence in those estimates).

Another HMM...? Novel (and we believe, important) HMM features: 1 Model the observation sequence of genotype calls and copy number jointly (Vanilla) 2 Integrate confidence estimates of the genotype calls and copy number estimates (ICE)

The Vanilla HMM components Observations ĈN and ĜT Hidden states Initial state probability distribution Transition probabilities Emission probabilities

Hidden states

Transition probabilities Following suggestions in the literature, we model the transition probabilities as a function of the distance d between SNPs. Specifically, let θ(d) 1 e 2d denote the probability that SNP i is not informative (I c ) for SNP at i + 1. For example: τ =P { (d) = P { = P { P i+1 i, d } i+1, I i, d } + P i+1, I c i, d i+1 I, i, d } P { I i, d } + i+1 I c, i, d P I c i, d = P { } θ(d).

Emission probabilities We assume conditional independence between copy number estimates and the genotype calls. For example: f( c CN, c GT ) = f( c CN ) f( c GT ) n o n o = f ccn f cgt = β n ccn o β n cgt o.

Integrating confidence estimates for genotype calls Let S cgt be the confidence score for the genotype estimate. We can estimate from Hapmap the following densities: n o n o n f SĤOM ĤOM, HOM, f SĤOM ĤOM, HET, f S HET d HET, d o n HOM, f S HET d HET, d o HET. Note: n o f SĤOM ĤOM, n f S HET d HET, d o n o f SĤOM ĤOM, HOM n f S HET d HET, d o HOM.

Emission Probabilities - Loss Recall that f( c CN, c GT ) = f( c CN ) f( c GT ) n o n o = f ccn f cgt = β n ccn o β n cgt o. If the state for a particular SNP is Loss, we have β n cgt, S cgt o = f n o n cgt f S cgt GT, c o.

Emission Probabilities - Retention For retention, the true genotype can be HET or HOM: β n cgt, Sd GT o n o n = f cgt f S GT d GT, c o n o n = f cgt f S GT d, HOM GT, c o n + f S d GT, HET GT, c o n o n = f cgt f S GT d HOM, GT, c o n f HOM GT, c o n + f S GT d HET, GT, c o n f HET GT, c o n o n = f cgt f S GT d HOM, GT c o n f HOM GT, c o n + f S GT d HET, GT c o n f HET GT, c o

Vanilla ICE comparison 5.0 A D B C E 2.0 1.0 0.5 Amp LOH vanilla Norm Del Amp LOH ICE Norm Del 0 50 100 150 200 250 Bioconductor package: ICE Amp LOH A D vanilla B vanilla E Norm Del Amp LOH A vanilla D vanilla ICE B ICE E Norm Del ICE 48 50 52 54 ICE 69 71 73 172 174 176 236 239 242

A HapMap sample 8 6 4 LOH 2 normal AA/BB AB Mb 50 100 150 200 2

Many HapMap samples

SNP Trio

SNP Trio

SNP Trio

SNP Trio

Probability Given the number of SNPs for a particular event region type (M), among all uploaded informative autosomal SNPs (X) of that trio, what is the probability to have the observed number (N) or more informative non-bpi SNPs of that type clustered together solely by chance? In other words, if the M SNPs of a particular event region type were randomly dispersed among the X informative autosomal SNPs, how probable is it to observe an event of the same or larger magnitude (defined by the number of consecutive SNPs of that event region)?

Probability Let p be the probability of a positive outcome in a Bernoulli trial (e.g. a 1 for 0/1 outcomes). Assume that we have Z trials. Let LS be the length of the largest block of consecutive ones in those Z trials. Let P Z (LS < N) be the probability that LS is smaller than some number N.

Probability If Z < N, then P Z (LS < N) = 1. If Z = N, then P Z (LS < N) = 1 p N. If Z > N, then P Z (LS < N) = N 1 k=0 p k (1 p)p Z k 1 (LS < N)

Probability Tabulate the probabilities P 1 (LS < N),..., P N (LS < N). Calculate P N+1 (LS < N), P N+2 (LS < N),..., P X (LS < N), iteratively. Using p = M/X, the probability of having N or more informative SNPs of a particular non-bpi type clustered together solely by chance is equal to 1 P X (LS < N).

HMM for SNP Trio chromosome 10 chromosome 22 BPI BPI UPI F UPI F UPI M UPI M MI D MI D MI S non BPI MI S non BPI BPI BPI 0 20 40 60 80 100 120 140 position (Mb) 15 20 25 30 35 40 45 50 position (Mb)

Acknowledgments Rob Scharpf Giovanni Parmigiani Rafael Irizarry & Gang Benilton Carvalho, Wenyi Wang Jonathan Pevsner & Lab Nate Miller, Eli Roberson, Jason Ting

References Carvalho B, Bengtsson H, Speed TP, Irizarry RA (2007) Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics, 8(2):485-99. Scharpf RB, Ting JC, Pevsner J, Ruczinski I (2007). SNPchip: R classes and methods for SNP array data. Bioinformatics, 23(5): 627-8. Scharpf RB, Parmigiani G, Ruczinski I (2007). A hidden markov model for joint estimation of genotype and copy number in high-throughput SNP chips. JHU Biostatistics Working papers, #136. Ting JC, Ye Y, Thomas GH, Ruczinski I, Pevsner J (2006). Analysis and visualization of chromosomal abnormalities in SNP data with SNPscan. BMC Bioinformatics, 7(1):25. Ting JC, Roberson ED, Miller N, et al, Ruczinski I, Thomas GH, Pevsner J (2007). Visualization of uniparental inheritance, Mendelian inconsistencies, deletions and parent of origin effects in single nucleotide polymorphism trio data with SNPtrio. Human Mutation, (in press). Wang W, Caravalho B, Miller N, Pevsner J, Chakravarti A, Irizarry RA (2006) Estimating genome-wide copy number using allele specific mixture models. JHU Biostatistics Working papers, #122.