A paradigm shift in DNA interpretation John Buckleton

Specialist Science Solutions
Manaaki Tangata Taiao Hoki: protecting people and their environment through science

I sincerely acknowledge conversations with Jo-Anne Bright (XX), Duncan Taylor (XY), Steven Myers (XY), Michael Coble (XY) and Ian Evett (XY).

Variability in interpretation
DNA science has been criticised for producing different interpretations of the same profile.
Part of the diversity is subjectivity, but part is systemic: different laboratories use different methods for interpretation.
Yet we nearly all use strongly similar typing technology. Why?

We need to avoid falling in love with our own method.
"The method I invented is the exactly correct mix of complexity and information usage."


I claim a special right to critique
CPI (cumulative probability of inclusion)
RMP (random match probability): Todd Bille, Jo-Anne Bright
LR, binary selection of genotypes: Peter Gill, Jonathan Whittaker, Tim Clayton
Drop model: Peter Gill, Jonathan Whittaker, David Balding
Continuous models: Duncan Taylor, Jo-Anne Bright

[Figure labelled with the methods CPI, RMP, drop model and continuous model]

If we were starting new, how would we choose?
CPI (cumulative probability of inclusion)
RMP (random match probability)
LR, binary selection of genotypes
Drop model
Continuous models
Simple answer: the one that gets it right?

Getting it right
Use ground-truth known samples, e.g. mix person A and person B.
What is the right answer if we test the hypotheses
H1: A + B
H2: A + unknown
The LR should be between 1 and 1/Pr(B), but that is a pretty wide range.
If the PCR is unusual, even an LR below 1 can be the right answer.

Getting it right?
What is the probability that this person will live to 75+? 22%.
How old are they now? How old were their parents when they died? Do they have any health risks? What does their doctor say?
The more relevant information, used properly, the better.
The person dies at 76. Was I right?

We cannot decide from this one event.
Was 22% right? Was it wrong?
How can an answer be neither right nor wrong?

Is the answer right?
"It is the best answer that I can produce." John Buckleton, ESR

But is it right?
"I cannot tell if it is right or wrong, but it makes the best use of the available information." John Buckleton, ESR

Q. Can you answer the question?
A. Yes. Can we make that the last time you yell at me?

Q. Well, if you'd answered the question then I wouldn't need to repeat it.
A. OK
ESR, 2013

We cannot decide from this one event.
But we might be able to score methods from a lot of events with known outcomes (known ground truth).
There are scoring methods.
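The slide does not name a particular scoring method. As one possible illustration, the sketch below applies two standard scoring rules for probability forecasts, the Brier score and the logarithmic score, to made-up forecasts with known outcomes. The function names, the forecasts and the outcomes are all hypothetical.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probability and outcome (0 or 1). Lower is better."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def log_score(forecasts, outcomes):
    """Mean negative log of the probability given to what actually happened. Lower is better."""
    total = 0.0
    for p, o in zip(forecasts, outcomes):
        total += math.log(p if o == 1 else 1.0 - p)
    return -total / len(forecasts)

# Hypothetical ground truth: 1 = the event happened, 0 = it did not.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Hypothetical forecaster who uses relevant information about each case.
informed = [0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.2, 0.6, 0.1, 0.2]

# Hypothetical forecaster who quotes the same base rate every time.
uninformed = [0.3] * len(outcomes)

for name, forecasts in (("informed", informed), ("uninformed", uninformed)):
    print(name, brier_score(forecasts, outcomes), log_score(forecasts, outcomes))
```

No single forecast is right or wrong here, but across many events with known outcomes the forecaster who uses the relevant information scores better on both rules.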

If we were starting new, how would we choose?
CPI (cumulative probability of inclusion)
RMP (random match probability)
LR, binary selection of genotypes
Drop model
Continuous models
Which one makes best use of the available information?

CPI (cumulative probability of inclusion)
The probability that a person would be included (not excluded), usually on straight allele presence.

CPI
Person could be 10,10; 10,11; 10,12; 10,13; 11,11; 11,12; 11,13; 12,12; 12,13; 13,13.
$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2 = f_{10}^2 + f_{11}^2 + f_{12}^2 + f_{13}^2 + 2 f_{10} f_{11} + 2 f_{10} f_{12} + 2 f_{10} f_{13} + 2 f_{11} f_{12} + 2 f_{11} f_{13} + 2 f_{12} f_{13}$$
Does not assume a number of contributors (so it does not use the reasonable inference of a number of contributors); this is seen as a good thing.
Is it good for Mr 10,10?
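The CPI expansion on this slide can be checked numerically. The sketch below is a minimal illustration, assuming Hardy-Weinberg genotype probabilities (as the slide's f² and 2ff terms imply); the allele frequencies and function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Hypothetical allele frequencies for the four observed alleles.
freqs = {10: 0.10, 11: 0.20, 12: 0.15, 13: 0.05}

def genotype_probability(a, b, f):
    """f_a^2 for a homozygote, 2 f_a f_b for a heterozygote."""
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]

def cpi(f):
    """Square of the summed frequencies of all observed alleles."""
    return sum(f.values()) ** 2

# Every genotype built only from observed alleles is "included".
included = list(combinations_with_replacement(sorted(freqs), 2))
sum_of_genotypes = sum(genotype_probability(a, b, freqs) for a, b in included)

print(len(included), "included genotypes")
print("CPI:", cpi(freqs), "sum over included genotypes:", sum_of_genotypes)
```

The ten included genotypes sum to the CPI, which is why Mr 10,10 is treated exactly like everyone else who carries only observed alleles.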

RMP
If we assume two people, then one of them could be 10,11; 10,12; 10,13; 11,12; 11,13; 12,13. With four alleles and two contributors, the homozygotes 10,10; 11,11; 12,12; 13,13 are no longer possible.
$$\mathrm{RMP} = 2 f_{10} f_{11} + 2 f_{10} f_{12} + 2 f_{10} f_{13} + 2 f_{11} f_{12} + 2 f_{11} f_{13} + 2 f_{12} f_{13}$$
Does use the reasonable inference of a number of contributors.

LR
If we assume two people they must be
10,11 and 12,13, or
10,12 and 11,13, or
10,13 and 11,12, or
11,12 and 10,13, or
11,13 and 10,12, or
12,13 and 10,11.
Does use the reasonable inference of a number of contributors.
We now need two hypotheses. Some people think this is bad.
H1: POI (10,11) + U
H2: 2U
$$\mathrm{LR} = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)} = \frac{2 f_{12} f_{13}}{24 f_{10} f_{11} f_{12} f_{13}} = \frac{1}{12 f_{10} f_{11}}$$

If we assume two people they must be one of six pairings: 10,11 and 12,13; 10,12 and 11,13; 10,13 and 11,12; 11,12 and 10,13; 11,13 and 10,12; 12,13 and 10,11.
$$\mathrm{LR} = \frac{2 f_{12} f_{13}}{24 f_{10} f_{11} f_{12} f_{13}} = \frac{1}{12 f_{10} f_{11}}$$
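The six pairings and the resulting LR can be reproduced by brute-force enumeration. The sketch below is a minimal illustration of binary genotype selection, assuming exactly two contributors, no drop-out, no drop-in, no use of peak heights, and Hardy-Weinberg genotype probabilities; the allele frequencies and names are hypothetical.

```python
from itertools import combinations_with_replacement, product

# Hypothetical allele frequencies for the four observed alleles.
freqs = {10: 0.10, 11: 0.20, 12: 0.15, 13: 0.05}
observed = set(freqs)
poi = (10, 11)

def genotype_probability(g, f):
    """f_a^2 for a homozygote, 2 f_a f_b for a heterozygote."""
    a, b = g
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]

def explains(g1, g2):
    """Binary selection: the two genotypes must account for exactly the observed alleles."""
    return set(g1) | set(g2) == observed

genotypes = list(combinations_with_replacement(sorted(observed), 2))

# H2: two unknowns. Sum over every ordered pair of genotypes that explains the profile.
pairs_h2 = [(g1, g2) for g1, g2 in product(genotypes, repeat=2) if explains(g1, g2)]
pr_e_h2 = sum(genotype_probability(g1, freqs) * genotype_probability(g2, freqs)
              for g1, g2 in pairs_h2)

# H1: POI plus one unknown. Sum over the unknown genotypes that complete the profile.
pr_e_h1 = sum(genotype_probability(g2, freqs) for g2 in genotypes if explains(poi, g2))

lr = pr_e_h1 / pr_e_h2
print(len(pairs_h2), "pairings under H2")
print("LR by enumeration:", lr)
print("1 / (12 f10 f11): ", 1 / (12 * freqs[10] * freqs[11]))
```

Enumerating rather than hard-coding the pairings makes it easy to see that each of the six ordered pairs contributes 4 f10 f11 f12 f13, which is where the factor of 24 comes from.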

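$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$
$$\mathrm{RMP} = 2 f_{10} f_{11} + 2 f_{10} f_{12} + 2 f_{10} f_{13} + 2 f_{11} f_{12} + 2 f_{11} f_{13} + 2 f_{12} f_{13}$$
$$\mathrm{LR} = \frac{1}{12 f_{10} f_{11}}$$
Add information: V = 12,13 (high vaginal swab, no consensual partners)
H1: POI (10,11) + V
H2: U + V
$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$
$$\mathrm{RMP} = 2 f_{10} f_{11}$$
$$\mathrm{LR} = \frac{1}{2 f_{10} f_{11}}$$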
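To see how much each statistic responds to the added information, the slide's formulas can be evaluated side by side. This is a sketch with hypothetical allele frequencies; only the formulas come from the slide.

```python
# Hypothetical allele frequencies for the four observed alleles.
f = {10: 0.10, 11: 0.20, 12: 0.15, 13: 0.05}

# Before conditioning on V.
cpi_before = (f[10] + f[11] + f[12] + f[13]) ** 2
rmp_before = 2 * (f[10] * f[11] + f[10] * f[12] + f[10] * f[13]
                  + f[11] * f[12] + f[11] * f[13] + f[12] * f[13])
lr_before = 1 / (12 * f[10] * f[11])

# After conditioning on V = 12,13: the other contributor must be 10,11.
cpi_after = cpi_before               # CPI cannot use the extra information
rmp_after = 2 * f[10] * f[11]
lr_after = 1 / (2 * f[10] * f[11])

print("CPI:", cpi_before, "->", cpi_after)
print("RMP:", rmp_before, "->", rmp_after)
print("LR: ", lr_before, "->", lr_after)
```

Whatever frequencies are chosen, the CPI is unchanged, while the LR for a true 10,11 contributor grows by a factor of six.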

Principle
Adding relevant information improves the power of our statistics.
On average: higher LR when H1 is true, lower when H2 is true.
Benefits the innocent, bad for the guilty.

Let's ask the automobile association: is the mountain pass open?

We could ring the gas station on the other side and see if people are coming over.
Nah, I don't like information. It might bias me?

Nah, might bias. Best if we just drive blind?

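What about Mr 11,12?
$$\mathrm{CPI} = (f_{10} + f_{11} + f_{12} + f_{13})^2$$
$$\mathrm{RMP} = 2 f_{10} f_{11} + 2 f_{12} f_{13}$$
$$\mathrm{LR} = \frac{2 f_{12} f_{13}}{2 f_{10} f_{11} \cdot 2 f_{12} f_{13}} = \frac{1}{2 f_{10} f_{11}}$$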

[Figure: methods arranged along an Information axis: CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes]
Drop model: where does this one go?

The drop model
Take the profile.
Throw away much of the information.
Then start the interpretation.
[Figure: a profile over alleles 5 to 15 reduced to a Yes/No call at each allele position]

So why did we even develop it?
[Figure: methods arranged along an Information axis: CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes]
This graphic is only true for a good profile with no drop-out possible.
Drop model: where does this one go?

Non-concordance
All non-concordances are problematic, but some more so than others.
[Figure: profile over alleles 10 to 20, vertical scale 0 to 500, POI = 13,15, annotated "Exclusion" and "Strong evidence"]

[Figure: curves labelled "2p rule", "Drop model" and "Ignoring the locus", plotted against D from 0 to 0.6 on a log vertical scale from 0.1 to 10]
John Buckleton, Christopher Triggs. Is the 2p rule always conservative? Forensic Science International, Volume 159, Issues 2-3, 2 June 2006, Pages 206-209.
P. Gill, L. Gusmão, H. Haned, W.R. Mayr, N. Morling, W. Parson, L. Prieto, M. Prinz, H. Schneider, P.M. Schneider, B.S. Weir. DNA commission of the International Society of Forensic Genetics: Recommendations on the evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods. Forensic Science International: Genetics, Volume 6, Issue 6, December 2012, Pages 679-688.
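The comparison between the 2p rule and a drop model can be sketched for the simplest non-concordant case. This is not the calculation behind the slide's figure, whose inputs are not given. It assumes a single-source profile showing only allele a, a POI of genotype a,b, no drop-in, a per-allele drop-out probability D, and a homozygote drop-out probability of D squared (a simplifying assumption). The allele frequency and function names are hypothetical.

```python
def lr_2p_rule(p_a):
    """2p rule: assign 2 * p_a to the evidence under H2, so LR = 1 / (2 p_a)."""
    return 1.0 / (2.0 * p_a)

def lr_drop_model(p_a, d):
    """Simple drop model for a profile showing only allele a, POI = a,b, no drop-in."""
    d2 = d ** 2  # ASSUMPTION: homozygote drop-out probability taken as D squared
    # H1: the POI's allele a survives and allele b drops out.
    pr_e_h1 = (1.0 - d) * d
    # H2: the unknown donor is a,a (allele seen) or a,x with x dropped out.
    pr_e_h2 = p_a ** 2 * (1.0 - d2) + 2.0 * p_a * (1.0 - p_a) * (1.0 - d) * d
    return pr_e_h1 / pr_e_h2

p_a = 0.10  # hypothetical frequency of the observed allele
for d in (0.01, 0.05, 0.10, 0.30, 0.50):
    print(f"D={d:.2f}  2p rule LR={lr_2p_rule(p_a):.2f}  drop model LR={lr_drop_model(p_a, d):.2f}")
```

Under these assumptions the drop model LR never exceeds the 2p value, and when drop-out is unlikely it falls below 1, so the 2p rule is not conservative for a non-concordant POI.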

So why did we even develop it?
[Figure: methods arranged along an Information axis: CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes, with the drop model now placed on the axis]

Drop model
We can probably extend the drop model a lot further by incorporating aspects of height information.

[Figure: log(Hb) against APH (0 to 8000), Identifiler, 28 cycles]

[Figure: log(Hb) against APH (0 to 8000), NGM SElect, 29 cycles]

[Figure: log(Hb) against APH (0 to 8000), SGMPlus, 34 cycles]

[Figure: profile with vertical scale 0 to 4500]
Upper part of the scale: PHr works well; the drop model wastes info.
Lower part of the scale: PHr unreliable; drop allows interpretation.
Experiments with a composite approach might catch a lot of the information content.
Luigi Armogida, USACIL

Degradation slopes differ; hence drop-out probabilities are profile and locus specific.

Locus specific amplification example
Locus effects are not steady over time; they may be batch or even profile specific.
Modelling one drop-out probability per profile misses these effects.
Modelling a degradation slope gets some but not all.

D8S1179: V = 13,17, POI = 14,15
[Figure: profile over alleles 10 to 20, vertical scale 0 to 3000]
$$\mathrm{CPI} = (f_{12} + f_{13} + f_{14} + f_{15} + f_{16} + f_{17})^2$$
$$\mathrm{RMP} = 2 f_{14} f_{15}$$
$$\mathrm{LR}_B = \frac{1}{2 f_{14} f_{15}}$$
GENOTYPE PROBABILITY DISTRIBUTION, D8S1179
[14,15] [13,17]   1.000
LR = 181.8
$$\mathrm{LR}_C = \frac{1}{2 f_{14} f_{15}}$$

D7S820: V = 9,9, POI = 11,11
[Figure: profile over alleles 7 to 15, vertical scale 0 to 7000, one peak labelled 252 RFU]
$$\mathrm{CPI} = (f_8 + f_9 + f_{11})^2 = 0.09$$
$$\mathrm{RMP} = f_{11}^2 + 2 f_{11} f_Q = 0.19$$
$$\mathrm{LR}_B = \frac{1}{f_{11}^2 + 2 f_{11} f_Q} = 5.26$$
GENOTYPE PROBABILITY DISTRIBUTION, D7S820
[8,11]  [9,9]   0.218
[9,11]  [9,9]   0.191
[11,11] [9,9]   0.427
[11,Q]  [9,9]   0.165
$$\mathrm{LR}_C = \frac{0.427}{0.218 \cdot 2 f_8 f_{11} + 0.191 \cdot 2 f_9 f_{11} + 0.427 \cdot f_{11}^2 + 0.165 \cdot 2 f_{11} f_Q} = 8.20$$
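The continuous LR on this slide is a weighted version of the binary one: each candidate genotype for the second contributor carries a weight instead of a yes/no. The sketch below uses the weights from the slide's genotype probability distribution but hypothetical allele frequencies, so it will not reproduce the slide's 5.26 and 8.20. Treating Q as a single allele class with frequency f_Q is an assumption about the slide's notation.

```python
# Weights from the slide's D7S820 genotype probability distribution
# (second contributor's genotype, with V fixed at 9,9).
weights = {(8, 11): 0.218, (9, 11): 0.191, (11, 11): 0.427, (11, "Q"): 0.165}

# HYPOTHETICAL allele frequencies; the slide does not list them.
freqs = {8: 0.05, 9: 0.10, 11: 0.15, "Q": 0.70}

poi = (11, 11)

def genotype_probability(g, f):
    """f_a^2 for a homozygote, 2 f_a f_b for a heterozygote."""
    a, b = g
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]

# Binary LR: only 11,11 and 11,Q are allowed, each counted fully.
lr_binary = 1.0 / (genotype_probability((11, 11), freqs)
                   + genotype_probability((11, "Q"), freqs))

# Continuous LR: the POI's weight over the weighted sum of genotype probabilities.
weighted_sum = sum(w * genotype_probability(g, freqs) for g, w in weights.items())
lr_continuous = weights[poi] / weighted_sum

print("binary LR:    ", lr_binary)
print("continuous LR:", lr_continuous)
```

Setting one weight to 1 and the rest to 0 recovers the single-genotype case, which is why the binary and continuous LRs coincide when the genotype probability distribution puts all its mass on one pairing.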

Have we gone too far? Will anyone follow?

[Figure: profile over alleles 10 to 20, vertical scale 0 to 3000]
G1      G2      w_i
13,14   16,17   0.58
13,16   14,17   0.12
13,17   14,16   0.11
14,16   13,17   0.11
14,17   13,16   0.09
16,17   13,14   0.00

A paradigm shift in DNA interpretation
[Figure: methods arranged along an Information axis: CPI (cumulative probability of inclusion), RMP (random match probability), LR with binary selection of genotypes, drop model, continuous]

End