A paradigm shift in DNA interpretation John Buckleton


A paradigm shift in DNA interpretation John Buckleton Specialist Science Solutions Manaaki Tangata Taiao Hoki protecting people and their environment through science

I sincerely acknowledge conversations with Jo-Anne Bright XX Duncan Taylor XY Steven Myers XY Michael Coble XY Ian Evett XY

Variability in interpretation. DNA science has been criticised for producing different interpretations of the same profile. Part of the diversity is subjectivity, but part is systemic: different laboratories use different methods for interpretation, yet we nearly all use strongly similar typing technology. Why?

We need to avoid falling in love with our own method: "The method I invented is exactly the correct mix of complexity and information usage."


I claim a special right to critique:
CPI (cumulative probability of inclusion)
RMP (random match probability): Todd Bille, Jo-Anne Bright
LR with binary selection of genotypes: Peter Gill, Jonathan Whittaker, Tim Clayton
Drop model: Peter Gill, Jonathan Whittaker, David Balding
Continuous models: Duncan Taylor, Jo-Anne Bright

[Figure labels: CPI, RMP, drop model, continuous model; drop model; continuous model; continuous model]

If we were starting anew, how would we choose between CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes, the drop model, and continuous models? Simple answer: the one that gets it right?

Getting it right. Use ground-truth known samples, e.g. mix person A and person B. What is the right answer if we test the hypotheses H1: A + B against H2: A + unknown? The LR should be between 1 and 1/Pr(B), but that is a pretty wide range. If the PCR is unusual, even an LR < 1 can be the right answer.
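To see how wide that range is, here is a minimal sketch. The value used for Pr(B) is a hypothetical profile probability chosen for illustration, and the function name `lr_bounds` is not from the slides.

```python
def lr_bounds(pr_b: float) -> tuple[float, float]:
    """Range quoted on the slide for H1: A + B versus H2: A + unknown.

    pr_b is the probability of person B's profile (hypothetical input).
    Returns (lower, upper) = (1, 1 / Pr(B)).
    """
    if not 0.0 < pr_b <= 1.0:
        raise ValueError("Pr(B) must lie in (0, 1]")
    return 1.0, 1.0 / pr_b


# Hypothetical Pr(B); real values depend on the loci and the population.
lower, upper = lr_bounds(1e-6)
print(f"LR expected between {lower:g} and {upper:g}")
```

A single ground-truth sample therefore pins the "right" LR down only to within many orders of magnitude.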

Getting it right? What is the probability that this person will live to 75+? 22%. How old are they now? How old were their parents when they died? Do they have any health risks? What does their doctor say? The more relevant information, used properly, the better. The person dies at 76. Was I right?

We cannot decide from this one event. Was 22% right? Was it wrong? How can an answer be neither right nor wrong?

Is the answer right? It is the best answer that I can produce? John Buckleton ESR

But is it right? I cannot tell if it is right or wrong but it makes the best use of the available information John Buckleton ESR

Q. Can you answer the question? A. Yes. Can we make that the last time you yell at me.

Q. Well if you'd answered the question then I wouldn't need to repeat it. A. OK. (ESR, 2013)

We cannot decide from this one event. But we might be able to score methods from a lot of events with known outcomes (known ground truth). There are scoring methods.
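As an illustration of scoring against known outcomes, the sketch below uses the Brier score and the logarithmic score. The slides do not name a scoring rule, so this choice is an assumption, and the forecasts and outcomes are invented for the example.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between forecast and outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def log_score(probs, outcomes):
    """Mean negative log probability assigned to what happened (lower is better)."""
    eps = 1e-12  # guards against log(0) for forecasts of exactly 0 or 1
    total = 0.0
    for p, o in zip(probs, outcomes):
        assigned = p if o else 1.0 - p
        total += -math.log(max(assigned, eps))
    return total / len(probs)


# Invented ground truth: 1 = the person lived to 75+, 0 = did not.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# Forecaster who uses case-specific information (invented numbers).
informed = [0.7, 0.1, 0.1, 0.2, 0.6, 0.1, 0.2, 0.1, 0.1, 0.1]
# Forecaster who quotes the same base rate for everyone.
uninformed = [0.22] * len(outcomes)

for name, probs in (("informed", informed), ("uninformed", uninformed)):
    print(name, brier_score(probs, outcomes), log_score(probs, outcomes))
```

No single forecast is shown to be right or wrong; only the average score over many known outcomes separates the two forecasters.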

If we were starting anew, how would we choose between CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes, the drop model, and continuous models? Which one makes best use of the available information?

CPI (cumulative probability of inclusion): the probability that a person would be included (not excluded), usually on straight allele presence.

CPI. Person could be 10,10; 10,11; 10,12; 10,13; 11,11; 11,12; 11,13; 12,12; 12,13; 13,13.

$$\mathrm{CPI} = f_{10}^2 + f_{11}^2 + f_{12}^2 + f_{13}^2 + 2f_{10}f_{11} + 2f_{10}f_{12} + 2f_{10}f_{13} + 2f_{11}f_{12} + 2f_{11}f_{13} + 2f_{12}f_{13} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$

Does not use the reasonable inference of a number of contributors; this is seen as a good thing. Is it good for Mr 10,10?

RMP. If we assume two people then one of them could be 10,11; 10,12; 10,13; 11,12; 11,13; 12,13. With four alleles from two people, the homozygous genotypes 10,10; 11,11; 12,12; 13,13 are no longer possible.

$$\mathrm{RMP} = 2f_{10}f_{11} + 2f_{10}f_{12} + 2f_{10}f_{13} + 2f_{11}f_{12} + 2f_{11}f_{13} + 2f_{12}f_{13}$$

Does use the reasonable inference of a number of contributors.

LR: we now need two hypotheses (some people think this is bad). H1: POI (10,11) + U. H2: 2U.

If we assume two people they must be 10,11 and 12,13, or 10,12 and 11,13, or 10,13 and 11,12, or 11,12 and 10,13, or 11,13 and 10,12, or 12,13 and 10,11.

$$LR = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)} = \frac{2f_{12}f_{13}}{24f_{10}f_{11}f_{12}f_{13}} = \frac{1}{12f_{10}f_{11}}$$

Does use the reasonable inference of a number of contributors.

If we assume two people they must be 10,11 and 12,13, or 10,12 and 11,13, or 10,13 and 11,12, or 11,12 and 10,13, or 11,13 and 10,12, or 12,13 and 10,11.

$$LR = \frac{2f_{12}f_{13}}{24f_{10}f_{11}f_{12}f_{13}} = \frac{1}{12f_{10}f_{11}}$$

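$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$
$$\mathrm{RMP} = 2f_{10}f_{11} + 2f_{10}f_{12} + 2f_{10}f_{13} + 2f_{11}f_{12} + 2f_{11}f_{13} + 2f_{12}f_{13}$$
$$LR = \frac{1}{12f_{10}f_{11}}$$

Add information: V = 12,13 (high vaginal swab, no consensual partners). H1: POI (10,11) + V. H2: U + V.

$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$
$$\mathrm{RMP} = 2f_{10}f_{11}$$
$$LR = \frac{1}{2f_{10}f_{11}}$$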
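The three statistics for this four-allele locus can be worked through numerically. This is a minimal sketch: the allele frequencies are hypothetical, the function names are illustrative, and the calculation assumes no drop-out and independence between contributors.

```python
from itertools import combinations

# Hypothetical allele frequencies for the four observed alleles.
freqs = {10: 0.10, 11: 0.20, 12: 0.15, 13: 0.05}


def het(f, a, b):
    """Probability of the heterozygous genotype a,b."""
    return 2 * f[a] * f[b]


def cpi(f):
    """Probability that a random person carries only observed alleles."""
    return sum(f.values()) ** 2


def rmp_two_person(f):
    """Two contributors, four alleles: any of the six heterozygotes fits."""
    return sum(het(f, a, b) for a, b in combinations(sorted(f), 2))


def lr_two_unknowns(f, poi):
    """H1: POI + U versus H2: U + U, summing over all genotype pairs."""
    alleles = sorted(f)
    rest = [a for a in alleles if a not in poi]
    numerator = het(f, rest[0], rest[1])  # the unknown must supply the rest
    denominator = 0.0
    for g1 in combinations(alleles, 2):  # genotype of contributor 1
        g2 = [a for a in alleles if a not in g1]  # forced for contributor 2
        denominator += het(f, *g1) * het(f, *g2)
    return numerator / denominator


def lr_with_known_victim(f, poi):
    """H1: POI + V versus H2: U + V, where V explains the other two alleles."""
    return 1.0 / het(f, *poi)


poi = (10, 11)
print("CPI:", cpi(freqs))
print("RMP:", rmp_two_person(freqs))
print("LR, two unknowns:", lr_two_unknowns(freqs, poi))
print("LR, victim known:", lr_with_known_victim(freqs, poi))
```

The enumeration in `lr_two_unknowns` reproduces the factor of 24 in the denominator, and conditioning on the victim's profile raises the LR sixfold, from 1/(12 f10 f11) to 1/(2 f10 f11).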

Principle: adding relevant information improves the power of our statistics. On average, the LR is higher when H1 is true and lower when H2 is true. This benefits the innocent and is bad for the guilty.

Let's ask the automobile association: is the mountain pass open?

We could ring the gas station on the other side and see if people are coming over. Nah, I don't like information; it might bias me?

Nah, might bias. Best if we just drive blind?

$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$
$$\mathrm{RMP} = 2f_{10}f_{11} + 2f_{12}f_{13}$$
$$LR = \frac{2f_{12}f_{13}}{2f_{10}f_{11} \cdot 2f_{12}f_{13}} = \frac{1}{2f_{10}f_{11}}$$

What about Mr 11,12?

[Figure: CPI (cumulative probability of inclusion), RMP (random match probability) and LR with binary selection of genotypes placed along an "Information" axis.] Drop model: where does this one go?

The drop model: take the profile, throw away much of the information, then start the interpretation. [Figure: profile with alleles 5 to 15 on the x-axis, each allele reduced to present (Yes) or absent (No).]
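The information loss can be made concrete with a small sketch. The peak heights and the 50 RFU detection threshold below are hypothetical, and `to_presence` is an illustrative name rather than part of any drop-model software.

```python
def to_presence(peak_heights, threshold_rfu):
    """Reduce peak heights to allele present (True) or absent (False)."""
    return {allele: height >= threshold_rfu
            for allele, height in peak_heights.items()}


THRESHOLD_RFU = 50  # hypothetical detection threshold

# Two hypothetical profiles with very different peak-height patterns.
clear_major_minor = {10: 2400, 11: 2300, 12: 310, 13: 290}
ambiguous_pairing = {10: 900, 11: 300, 12: 850, 13: 320}

print(to_presence(clear_major_minor, THRESHOLD_RFU))
print(to_presence(ambiguous_pairing, THRESHOLD_RFU))
```

Both profiles reduce to the same four "present" calls, so any interpretation that starts from the reduced data cannot tell them apart.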

So why did we even develop it? [Figure: CPI (cumulative probability of inclusion), RMP (random match probability) and LR with binary selection of genotypes placed along an "Information" axis. Drop model: where does this one go?] This graphic is only true for a good profile with no drop-out possible.

Non-concordance: all non-concordances are problematic, but some more so than others. POI = 13,15. [Figure: profiles with alleles 10 to 20 on the x-axis and peak heights from 0 to 500, with cases labelled "Exclusion" and "Strong evidence".]

[Figure: the 2p rule, the drop model, and ignoring the locus compared against D (0 to 0.6), with the vertical axis on a log scale from 0.1 to 10.]

John Buckleton, Christopher Triggs. Is the 2p rule always conservative? Forensic Science International, Volume 159, Issues 2-3, 2 June 2006, Pages 206-209.

P. Gill, L. Gusmão, H. Haned, W.R. Mayr, N. Morling, W. Parson, L. Prieto, M. Prinz, H. Schneider, P.M. Schneider, B.S. Weir. DNA commission of the International Society of Forensic Genetics: Recommendations on the evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods. Forensic Science International: Genetics, Volume 6, Issue 6, December 2012, Pages 679-688.

So why did we even develop it? [Figure: CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes, and the drop model placed along an "Information" axis.]

Drop model: we can probably extend the drop model a lot further by incorporating aspects of height information. [Figure: drop model on the "Information" axis.]

[Figure: log(Hb) against APH (0 to 8000), Identifiler, 28 cycles; log(Hb) axis from -0.8 to 0.6.]

[Figure: log(Hb) against APH (0 to 8000), NGM SElect, 29 cycles; log(Hb) axis from -1 to 1.]

[Figure: log(Hb) against APH (0 to 8000), SGMPlus, 34 cycles; log(Hb) axis from -2 to 2.]
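The slides do not define the plotted quantities. The sketch below assumes the common convention that Hb is the ratio of the higher-molecular-weight peak to the lower-molecular-weight peak of a heterozygote, that APH is the average of the two peak heights, and that the logarithm is base 10. The peak heights are hypothetical.

```python
import math


def log_hb_and_aph(height_lmw, height_hmw):
    """Return (log10 Hb, APH) for one heterozygous locus.

    Assumed definitions (not stated on the slides):
      Hb  = height of high-molecular-weight peak / height of low one
      APH = average of the two peak heights
    """
    hb = height_hmw / height_lmw
    aph = (height_hmw + height_lmw) / 2
    return math.log10(hb), aph


# Hypothetical peak-height pairs in RFU: (low MW allele, high MW allele).
pairs = [(4000, 3800), (1500, 1200), (120, 60), (80, 200)]
for lmw, hmw in pairs:
    log_hb, aph = log_hb_and_aph(lmw, hmw)
    print(f"APH={aph:7.1f}  log10(Hb)={log_hb:+.3f}")
```

Under these assumptions a perfectly balanced heterozygote sits at log(Hb) = 0, and the scatter around zero is what the three plots show for each kit and cycle number.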

[Figure: peak height scale from 0 to 4500.] At high peak heights, PHr works well and the drop model wastes information. At low peak heights, PHr is unreliable and the drop model allows interpretation. Experiments with a composite approach might catch a lot of the information content (Luigi Armogida, USACIL).

Degradation slopes differ, hence drop-out probabilities are profile and locus specific.

Locus-specific amplification example. Locus effects are not steady over time; they may be batch or even profile specific. Modelling one drop-out probability per profile misses these effects. Modelling a degradation slope gets some but not all.

D8S1179: V = 13,17; POI = 14,15. [Figure: profile with alleles 10 to 20 on the x-axis and peak heights from 0 to 3000.]

$$\mathrm{CPI} = (f_{12} + f_{13} + f_{14} + f_{15} + f_{16} + f_{17})^2$$
$$\mathrm{RMP} = 2f_{14}f_{15}$$
$$LR_B = \frac{1}{2f_{14}f_{15}}$$
$$LR_C = \frac{1}{2f_{14}f_{15}}$$

Genotype probability distribution, D8S1179:
[14,15] [13,17] 1.000
LR = 181.8

D7S820: V = 9,9; POI = 11,11. [Figure: profile with alleles 7 to 15 on the x-axis and peak heights from 0 to 7000; one peak is labelled 252 RFU.]

$$\mathrm{CPI} = (f_8 + f_9 + f_{11})^2 = 0.09$$
$$\mathrm{RMP} = f_{11}^2 + 2f_{11}f_Q = 0.19$$
$$LR_B = \frac{1}{f_{11}^2 + 2f_{11}f_Q} = 5.26$$

Genotype probability distribution, D7S820:
[8,11] [9,9] 0.218
[9,11] [9,9] 0.191
[11,11] [9,9] 0.427
[11,Q] [9,9] 0.165

$$LR_C = \frac{0.427 \times 1}{0.218 \cdot 2f_8f_{11} + 0.191 \cdot 2f_9f_{11} + 0.427 \cdot f_{11}^2 + 0.165 \cdot 2f_{11}f_Q} = 8.20$$
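The structure of the two likelihood ratios can be sketched as follows. The genotype weights are the ones listed for D7S820, but the allele frequencies are hypothetical (the slides do not give them), so the printed values are not expected to reproduce 5.26 and 8.20. Function names are illustrative.

```python
def genotype_prob(genotype, f):
    """Hardy-Weinberg probability: f_a^2 if homozygous, else 2 f_a f_b."""
    a, b = genotype
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]


def lr_continuous(weights, f, poi):
    """Weight of the POI genotype over the weighted sum of genotype probabilities."""
    numerator = weights.get(poi, 0.0)
    denominator = sum(w * genotype_prob(g, f) for g, w in weights.items())
    return numerator / denominator


def lr_binary(allowed, f, poi):
    """Binary model: every allowed genotype counts fully, every other not at all."""
    numerator = 1.0 if poi in allowed else 0.0
    return numerator / sum(genotype_prob(g, f) for g in allowed)


# Weights for the unknown contributor's genotype, as listed for D7S820.
weights = {(8, 11): 0.218, (9, 11): 0.191, (11, 11): 0.427, (11, "Q"): 0.165}
# Hypothetical allele frequencies; "Q" stands for any other allele.
freqs = {8: 0.15, 9: 0.12, 11: 0.20, "Q": 0.53}
poi = (11, 11)

print("LR_B:", lr_binary([(11, 11), (11, "Q")], freqs, poi))
print("LR_C:", lr_continuous(weights, freqs, poi))
```

When one genotype carries all the weight, as at D8S1179, the two calculations coincide. When the weight is spread over several genotypes, the continuous LR reflects how well the POI's genotype explains the peak heights relative to the alternatives.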

Have we gone too far? Will anyone follow?

[Figure: profile with alleles 10 to 20 on the x-axis and peak heights from 0 to 3000.]

G1      G2      w_i
13,14   16,17   0.58
13,16   14,17   0.12
13,17   14,16   0.11
14,16   13,17   0.11
14,17   13,16   0.09
16,17   13,14   0.00

A paradigm shift in DNA interpretation. [Figure: CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes, the drop model, and continuous models placed along an "Information" axis.]

End