The statistical evaluation of DNA crime stains in R

Size: px

Start display at page:

Download "The statistical evaluation of DNA crime stains in R"

Wesley Wiggins
6 years ago
Views:

1 The statistical evaluation of DNA crime stains in R Miriam Marušiaková Department of Statistics, Charles University, Prague Institute of Computer Science, CBI, Academy of Sciences of CR UseR! 2008, Dortmund Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

2 Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

3 Introduction Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

4 Introduction Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

5 Introduction Single crime scene stain Blood stain at the crime scene Believed it was left by offender Suspect arrested for reasons unconnected with his DNA profile Crime sample, suspect sample Hypothesis H p (prosecution): The suspect left the crime stain. H d (defense): Some other person left the crime stain. Notation DNA typing results E = {G C, G S } non-dna evidence I Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

6 Introduction Evidence interpretation Prior odds Posterior odds Bayes theorem Pr(H p I) Pr(H d I) Pr(H p E, I) Pr(H d E, I) Pr(H p E, I) Pr(H d E, I) = Pr(E H p, I) Pr(H p I) Pr(E H d, I) Pr(H }{{} d I) LR Balding and Donelly (1995), Robertson and Vignaux (1995) Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

7 Introduction Evidence interpretation LR = Pr(E H p, I) Pr(E H d, I) = Pr(G S, G C H p, I) Pr(G S, G C H d, I) = Pr(G C G S, H p, I) Pr(G C G S, H d, I) Pr(G S H p, I) Pr(G S H d, I) = Pr(G C G S, H p, I) Pr(G C G S, H d, I) 1 = 1 Pr(G C G S, H d, I) = 1 Pr(G C H d, I) if independence assumed Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

8 Introduction Errors and fallacies Statement The probability of observing this type if the blood came from someone other than suspect is 1 in 100. Pr(G C H d, I) = 1/100 Common error The probability that the blood came from someone else is 1 in 100. Pr(H d G C, I) = 1/100 There is a 99% probability that it came from the suspect. Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

9 Introduction Errors and fallacies (cont. d) Statement The evidence is 100 times more probable if the suspect left the crime stain than if some unknown person left it. 1 Pr(G C H d, I) = 100 Common error It is 100 times more probable that the suspect left the crime stain than some unknown person. Pr(H p G C, I) Pr(H d G C, I) = 100 Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

10 Single Crime Scene Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

11 Single Crime Scene Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

12 Single Crime Scene Product rule Model of an ideal population Infinite size Random mating Model reliable for most real-world problems Hardy-Weinberg equilibrium Likelihood ratio LR = Pr(G = A i A i ) = p 2 i Pr(G = A i A j ) = 2p i p j, i j 1 Pr(G C G S, H d, I) = 1 Pr(G C H d, I) = 1 p 2 i ( 1 2p i p j ) Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

13 Single Crime Scene Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

14 Single Crime Scene F - measure of uncertainty about allele proportions in the population of suspects Genetic interpretation of F (Wright, 1951) How to estimate F (Weir and Cockerham, 1984) Recommendations (National Research Council, 1996) F = 0.01 large subpopulations (USA) F = 0.03 small isolated subpopulations Match probabilities (Balding and Nichols, 1994) P(G C = A i A i G S = A i A i ) = [2F + (1 F)p i] [3F + (1 F)p i ] (1 + F)(1 + 2F) P(G C = A i A j G S = A i A j ) = 2 [F + (1 F)p i] [F + (1 F)p j ] (1 + F)(1 + 2F) Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

15 Single Crime Scene Effects of F corrections Likelihood ratio - some numerical values Heterozygotes A i A j, p i = p j = p F = 0 F = F = 0.01 F = 0.03 p = p = p = Homozygotes A i A i, p i = p F = 0 F = F = 0.01 F = 0.03 p = p = p = Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

16 Single Crime Scene Relatedness Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

17 Single Crime Scene Relatedness Inbreeding Individuals with common ancestors - related Their children - inbred Alleles are ibd (identical by descent) - copies of the same allele Example Alleles h 1, h 2 transmitted from parent H to X, Y who transmit a, b to I h 1 X a H h 2 Y b I Pr(h 1 is ibd to h 2 ) = Pr(h 1 h 2 ) = 0.5 Pr(a b) = Pr(a h 1, b h 2 h 1 h 2 )Pr(h 1 h 2 ) = = Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

18 Single Crime Scene Relatedness Match probabilities for close relatives Balding and Nichols (1994) { ( k0 p 4 i + k 1 p 3 i + k 2 p 2 ) i /p 2 i Pr(G C G S, H d, I) = ) (4k 0 p 2 i p2 j + k 1 p i p j (p i + p j ) + 2k 2 p i p j /2p i p j Kinship coefficients Relationship k 0 k 1 k 2 Parent - child Siblings 1/4 1/2 1/4 Grandparent - grandchild 1/2 1/2 0 Uncle - nephew 1/2 1/2 0 Cousins 3/4 1/4 0 Unrelated k i - probability that two persons will share i alleles ibd i = 0, 1, 2 Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

19 Single Crime Scene R package forensic Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

20 Single Crime Scene R package forensic R function Pmatch Usage Pmatch(prob, k = c(1, 0, 0), theta = 0) Example > p <- c(0.057, 0.160, 0.182, 0, 0.024, 0.122) > Pmatch(p) $prob locus 1 locus 2 locus 3 allele allele $match locus 1 locus 2 locus 3 [1,] $total_match [1] e-06 Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

21 Mixed stain Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

22 Mixed stain Mixtures Prosecution and defense hypothesis H p : Contributors were the victim and the suspect. H d : Contributors were the victim and an unknown person. Likelihood ratio for the mixture LR = Pr(E C, G V, G S H p, I) Pr(E C, G V, G S H d, I) = Pr(E C G V, G S, H p, I) Pr(E C G V, G S, H d, I) Pr(G V, G S H p, I) Pr(G V, G S H d, I) = Pr(E C G V, G S, H p, I) Pr(E C G V, G S, H d, I) = 1 Pr(E C G V, G S, H d, I) 1 = (if independence assumed) Pr(E C G V, H d, I) Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

23 Mixed stain Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

24 Mixed stain Match probability (independence assumption) Weir et al (1997) P x (U, C) = (T 0 ) 2x i U (T 1i ) 2x + T 0 = l C p l T 1i = l C\{i} p l, i U T 2ij = l C\{i,j} p l, i, j U, i < j i,j U:i<j (T 2ij ) 2x... U - set of alleles from the crime sample C not carried by known contributors x - number of unknown contributors R Pevid.ind(alleles, prob, x, u = NULL) LR.ind(alleles, prob, x1, x2, u1 = NULL, u2 = NULL) Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

25 Mixed stain Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

26 Mixed stain Match probability for structured population Assumption All the involved people come from the same subpopulation with parameter F Fung and Hu (2000), Zoubková and Zvárová (2004) MP = r r r 1 r 1 =0 r 2 =0 r r 1 r c 2 r c 1 =0 ti +u i +v i 1 i=1 j=t i +v i (2n U )! c [(1 F)p i + jf] c i=1 u i! 2n T +2n U +2n V 1 j=2n T +2n V [(1 F) + jf] R Pevid.gen(alleles, prob, x, T = NULL, V = NULL, theta = 0) T, V - genotypes of known contributors, known non-contributors Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

27 Mixed stain People versus Simpson Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

28 Mixed stain People versus Simpson People versus Simpson (Los Angeles County Case) DNA evidence Material at the crime scene - alleles a, b, c (locus D2S44) Suspect s genotype ab Victim s genotype ac Hypotheses (Weir et al, 1997, Fung and Hu, 2000) H p : Contributors were the victim, suspect and m unknowns. H d : Contributors were n unknowns. R a = c( a, b, c ) p = c(0.0316, , ) suspect <- a/b, victim <- a/c Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

29 Mixed stain People versus Simpson Likelihood ratios for the Simpson case Defense n = 2 n = 3 Prosecution F = F = m = m = Prosecution proposition (F = 0) Pevid.ind(alleles = a, prob = p, x = m) Pevid.gen(alleles = a, prob = p, x = m, T = c(victim, suspect)) Defense proposition (F = 0) Pevid.ind(alleles = a, prob = p, x = n, u = c( a, b, c )) Pevid.gen(alleles = a, prob = p, x = n) Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

30 Mixed stain More general situation Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

31 Mixed stain More general situation More general situation Contributors from different ethnic groups Fung and Hu (2001) Pevid.ind, LR.ind (independence within and between ethnic groups) Presence of related people Hu and Fung (2003) Example Suspect not typed, his relative is tested Two related people among unknown contributors Pevid.rel Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

32 Conclusion Outline Introduction Single Crime Scene Relatedness R package forensic Mixed stain People versus Simpson More general situation Conclusion Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

33 Conclusion Conclusion Single crime stain Suspect and offender are unrelated are members of the same subpopulation are close relatives Pmatch Mixed crime stain Contributors unrelated members of the same subpopulation may be related from different ethnic groups Pevid.ind, LR.ind Pevid.gen Pevid.rel Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

34 Conclusion Thank you for your attention! Acknowledgement The work was supported by GAČR 201/05/H007 and the project 1M06014 of the Ministry of Education, Youth and Sports of the Czech Republic. Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

35 Important publications BALDING AND NICHOLS(1994) DNA profile match probability calculation Forensic Science International 64, p WEIR ET AL(1997) Interpreting DNA mixtures Journal of Forensic Sciences 42, p FUNG AND HU(2000) Interpreting forensic DNA mixtures: allowing for uncertainty in population substructure and dependence Journal of Royal Statistical Society A 163, p ZOUBKOVÁ, SUPERVISOR ZVÁROVÁ (2004) Master thesis (in Czech) Charles University, Prague Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

36 Software R DEVELOPMENT CORE TEAM (2006) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria ISBN , URL MARUŠIAKOVÁ, M (2007) forensic: Statistical methods in forensic genetics R package version 0.2 Miriam Marušiaková (Prague) The statistical evaluation of DNA crime stains UseR! 2008, August / 36

THE RARITY OF DNA PROFILES 1. BY BRUCE S. WEIR University of Washington

THE RARITY OF DNA PROFILES 1. BY BRUCE S. WEIR University of Washington The Annals of Applied Statistics 2007, Vol. 1, No. 2, 358 370 DOI: 10.1214/07-AOAS128 Institute of Mathematical Statistics, 2007 THE RARITY OF DNA PROFILES 1 BY BRUCE S. WEIR University of Washington It