Y-STR: Haplotype Frequency Estimation and Evidence Calculation


1 Y-STR: Haplotype Frequency Estimation and Evidence Calculation. Mikkel, MSc Student. Supervised by associate professor Poul Svante Eriksen, Department of Mathematical Sciences, Aalborg University, Denmark. June 16th 2010, presentation of Master of Science Thesis.

2 Outline: 1. Short biological recap; 2. Motivation and aims of using Y-STR; 3. Haplotype frequency estimation; 3.1 Methods and comments to the methods; 3.2 Model control; 4. Evidence calculation; 5. Further work.

3 Biological framework. Example of (autosomal) STR DNA-type: ({14, 13}, {22}, {15, 16}, ..., {22}). Example of Y-STR DNA-type: (17, 14, 22, ..., 15). The slide also shows two illustrative images.

4 Motivation. In some situations Y-STR is more sensible to use than A-STR (autosomal STR), e.g. to avoid noise in the trace from a rape victim. Y-STR and A-STR differ in several areas, e.g. the number of alleles at each locus and the statistical dependence between loci. The statistical methods developed to handle A-STR cannot be applied to Y-STR directly, so reformulation is required (e.g. for calculating evidence) or new methods must be developed (to estimate Y-STR haplotype frequencies).

5 Aims. Be able to calculate statistical evidence in trials. Estimating frequencies for Y-STR haplotypes (also unobserved ones) is required to do this. First, methods for estimating frequencies for Y-STR haplotypes will be discussed, and afterwards calculation of evidence will be introduced.


7 Neither principal component analysis nor factor analysis yields good results, so that path has not been followed any further.

8 Simple count estimates: not precise enough. One published method, introduced in "A new method for the evaluation of matches in non-recombining genomes: application to Y-chromosomal short tandem repeat (STR) haplotypes in European males" by L. Roewer et al. (2000). Several problems exist; some will be presented in this presentation (some were also presented in a talk at the 7th International Y Chromosome User Workshop in Berlin, Germany, April 22-24, 2010).

9 New methods. Graphical models would be an obvious choice: structure based learning (e.g. the PC algorithm) or score based learning (e.g. AIC and BIC). Standard tests for conditional independence (e.g. G² = 2n·CE(A, B | {S_i}_{i∈I}), where n is the sample size and CE is the cross entropy, which is χ²_ν distributed when A and B are independent given {S_i}_{i∈I}) neither exploit the ordering in the data nor incorporate prior knowledge such as the single step mutation model. Better independence tests (e.g. classification trees, ordered logistic regression, and support vector machines) and model-based clustering are required.


11 The idea: Bayesian approach. Notation: N: number of observations; M: number of haplotypes (i.e. unique observations); N_i: the number of times the i-th haplotype has been observed; f_i = (N_i − 1)/(N − M): the frequency for the i-th haplotype. Model using Bayesian inference: 1. A priori: assume f_i is Beta distributed with non-stochastic parameters u_i and v_i. 2. Likelihood: given f_i, N_i is Binomial distributed. 3. Posterior: given N_i, f_i is (still) Beta distributed (the Beta distribution is a conjugate prior for the Binomial distribution). Model expressed in densities using generic notation: p(f_i | N_i) ∝ p(N_i | f_i) p(f_i).
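As a minimal illustration of steps 1-3 (a sketch, not the thesis's implementation), the Beta-Binomial update can be written directly in Python; the prior parameters and counts below are invented:

```python
from scipy import stats

# Invented prior parameters for haplotype i (in the method above they would
# come from the regression on W_i described on the next slide) and invented counts.
u_i, v_i = 0.8, 1500.0   # prior: f_i ~ Beta(u_i, v_i)
N, N_i = 1000, 3         # database size and count of haplotype i

# Posterior: f_i | N_i ~ Beta(u_i + N_i, v_i + N - N_i)
posterior = stats.beta(u_i + N_i, v_i + N - N_i)

print("posterior mean of f_i:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```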

12 u_i and v_i. 1. Calculate W_i = (1/(N − N_i)) ∑_{j≠i} N_j d_{ij} for i = 1, 2, ..., M, where d_{ij} denotes the Manhattan distance/L1 norm between haplotypes i and j. 2. Order the W_i's by size, divide them into 15 (?) groups, and calculate the mean and variance of the f_i's in each group. 3. Fit the regressions µ(W) = β₁ + exp(β₂W + β₃) and σ²(W) = β₄ + exp(β₅W + β₆) based on the 15 estimates. 4. Calculate µ_i = µ(W_i) and σ²_i = σ²(W_i) and use these to calculate the prior parameters u_i and v_i. 5. Apply the Bayesian approach to obtain the posterior distribution, e.g. to estimate f_i using the posterior mean. Only dane could fit the full regression, and the fit is not too comforting; the others resulted in µ_i < 0 or σ²_i < 0 for some i's.
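A rough Python sketch of steps 1-4 (assumptions: haplotypes are stored as rows of integer repeat numbers, groups are equally sized, SciPy's curve_fit is used for the exponential regressions, and the u_i expression quoted later on slide 33 is used for the moment matching; this is not the reference implementation):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_prior_parameters(haplotypes, counts, n_groups=15):
    """haplotypes: (M, r) array of unique haplotypes; counts: (M,) observation counts."""
    N, M = counts.sum(), len(counts)
    f = (counts - 1) / (N - M)                       # frequencies as defined on slide 11

    # Step 1: W_i = (1 / (N - N_i)) * sum_{j != i} N_j * d_ij (Manhattan distance)
    d = np.abs(haplotypes[:, None, :] - haplotypes[None, :, :]).sum(axis=2)
    W = (d * counts[None, :]).sum(axis=1) / (N - counts)

    # Step 2: order by W, split into groups, take group means and variances of f
    groups = np.array_split(np.argsort(W), n_groups)
    W_grp = np.array([W[g].mean() for g in groups])
    mu_grp = np.array([f[g].mean() for g in groups])
    var_grp = np.array([f[g].var(ddof=1) for g in groups])

    # Step 3: fit mu(W) = b1 + exp(b2*W + b3) and sigma^2(W) = b4 + exp(b5*W + b6)
    expreg = lambda w, b1, b2, b3: b1 + np.exp(b2 * w + b3)
    mu_par, _ = curve_fit(expreg, W_grp, mu_grp, p0=(0.0, 1.0, np.log(mu_grp.mean())))
    var_par, _ = curve_fit(expreg, W_grp, var_grp, p0=(0.0, 1.0, np.log(var_grp.mean())))

    # Step 4: Beta prior parameters by moment matching (u_i as quoted on slide 33)
    mu_i, var_i = expreg(W, *mu_par), expreg(W, *var_par)
    u = mu_i**2 * (1 - mu_i) / var_i
    v = u * (1 - mu_i) / mu_i
    return W, u, v
```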

13 Plot of the regression: µ(W) for dane divided into 15 groups, with the fitted exponential curve.

14 Plot of the regression: σ²(W) for dane divided into 15 groups, with the fitted exponential curve.

15 Modified regression. Set β₁ = β₄ = 0 in the regression, yielding µ(W) = exp(β₂W + β₃) and σ²(W) = exp(β₅W + β₆). Now berlin makes the best fits, which seems quite reasonable for µ(W), but more doubtful for σ²(W); dane fits almost as before.

16 Plot of the regression: µ(W) for berlin divided into 16 groups, with the fitted reduced regression µ(W) = 0 + exp(β₂W + β₃).

17 Plot of the regression: σ²(W) for berlin divided into 16 groups, with the fitted reduced regression σ²(W) = 0 + exp(β₅W + β₆).

18 Plot of the regression: µ(W) for somali divided into 17 groups, with the fitted reduced regression µ(W) = 0 + exp(β₂W − 9.286).

19 Plot of the regression: σ²(W) for somali divided into 17 groups, with the fitted reduced regression σ²(W) = 0 + exp(10.29·W − 9.328).

20 Changes made in the online implementation. At the 7th International Y Chromosome User Workshop in Berlin, 2010, Sascha Willuweit (one of the persons behind the online implementation) mentioned a couple of changes between their implementation and the original article: the reduced regression is used, i.e. without the intercepts β₁ and β₄; and the number of groups is determined by fitting several regressions and choosing the best one (details for selecting the minimum number of groups were not mentioned).

21 Comments and proposals for changes. Not a statistical model, more an ad-hoc method. µ_i = µ(W_i) = exp(aW_i + b) is not bounded above, so µ(W) ≥ 1 for W ≥ −b/a; for berlin the fitted a and b give µ_i ≥ 1 for some of the W_i (even though 0 ≤ W_i ≤ 1 and 0 < µ_i < 1 by definition). Fitting u_i and v_i: only W_i is available to fit two exponential regressions. The model is not consistent: generalisation to a Dirichlet prior and a multinomial likelihood might solve this.

22 Fitting two parameters (u_i and v_i) using only one (W_i): plot of fitted u vs. fitted v for berlin based on 16 groups, using the regression without β₁ and β₄.

23 Fitting two parameters (u_i and v_i) using only one (W_i): plot of fitted u vs. fitted v for dane based on 15 groups, using the regression with β₁ and β₄.

24 Fitting two parameters (u_i and v_i) using only one (W_i): plot of fitted u vs. fitted v for dane based on 15 groups, using the regression without β₁ and β₄.

25 Fitting two parameters (u_i and v_i) using only one (W_i): plot of fitted u vs. fitted v for somali based on 17 groups, using the regression without β₁ and β₄.

26 Idea: use the second moment. W_i = (1/(N − N_i)) ∑_{j≠i} N_j d_{ij} = (1/(N − N_i)) ∑_{j≠i} N_j ∑_{k=1}^{r} d_{ijk} uses the first moment of the allele differences on each locus of the r loci. Maybe also use the second moment by introducing Z_i = (1/(N − N_i)) ∑_{j≠i} N_j (1/r) ∑_{k=1}^{r} (d_{ijk} − d̄_{ij})², where d̄_{ij} = d_{ij}/r. Fit the µ_i's and σ²_i's by multiple regression using a grid of W_i and Z_i values. 15 groups only correspond to a 4×4 grid, which is way too coarse; requiring 15 groups of both W_i's and Z_i's, the grid would have size 15 × 15 = 225, which requires a lot of observations. Too few observations in berlin, dane, and somali, but it would be interesting to see how it would perform compared to just using the W_i's.

27 Generalised frequency estimation. Assume a priori that f ~ Dirichlet(α), where f = (f₁, f₂, ..., f_K) is the vector of frequencies for all possible haplotypes. Use the likelihood N | f ~ Multinomial(N₊, f). The posterior becomes f | N ~ Dirichlet(α₁ + N₁, ..., α_K + N_K) = Dirichlet(α₁ + N₁, ..., α_n + N_n, α_{n+1}, ..., α_K), where f₁, f₂, ..., f_n are the frequencies for the observed haplotypes and f_{n+1}, f_{n+2}, ..., f_K are the frequencies for the unobserved haplotypes.

28 Marginal distribution. Let α₊ = ∑_{i=1}^{K} α_i and N₊ = ∑_{i=1}^{K} N_i. The marginal posterior distribution for the i-th haplotype is f_i | N_i ~ Beta(α_i + N_i, ∑_{j=1}^{K}(α_j + N_j) − (α_i + N_i)) = Beta(α_i + N_i, α₊ − α_i + N₊ − N_i), so E[f_i | N_i] = (α_i + N_i) / (∑_{j=1}^{K}(α_j + N_j) − (α_i + N_i) + (α_i + N_i)) = (α_i + N_i)/(α₊ + N₊), and ∑_{i=1}^{K} E[f_i | N_i] = (α₊ + N₊)⁻¹ ∑_{i=1}^{K}(α_i + N_i) = 1. Incorporating prior knowledge can be done by specifying the prior parameter α_i for all possible haplotypes, but with this approach α₊ might be problematic to calculate for large K.
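A small numeric sketch of the posterior mean under this Dirichlet-multinomial formulation (the α values and counts below are invented; in practice α_i would come from prior knowledge, e.g. α_i = h(W_i) as on the following slides):

```python
import numpy as np

# Invented prior parameters and counts: K possible haplotypes, only the first
# three of them observed (the remaining counts are 0).
alpha = np.array([0.5, 0.3, 0.2, 0.1, 0.1])   # alpha_1, ..., alpha_K
counts = np.array([7, 2, 1, 0, 0])            # N_1, ..., N_K

alpha_plus, N_plus = alpha.sum(), counts.sum()

# E[f_i | N] = (alpha_i + N_i) / (alpha_+ + N_+), also for unobserved haplotypes
post_mean = (alpha + counts) / (alpha_plus + N_plus)

print(post_mean)         # posterior mean frequencies
print(post_mean.sum())   # sums to 1, as stated on the slide
```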

29 Approximating α₊: histogram of the W_i's for berlin with fitted Gamma and Beta densities and the mean of the W_i's indicated.

30 Approximating α₊: histogram of the W_i's for dane with fitted Gamma and Beta densities and the mean of the W_i's indicated.

31 Approximating α₊: histogram of the W_i's for somali with fitted Gamma (s = 3.112) and Beta (s1 = 2.44) densities and the mean of the W_i's indicated.

32 Approximating α₊. As earlier stated, 0 ≤ W_i ≤ 1, so the Beta distribution is theoretically the right choice. Assume that α_i = h(W_i) for some function h, such that α₊ = ∑_{i=1}^{K} h(W_i). Denote by f_β the density of a fitted Beta distribution; then α₊ ≈ K ∫₀¹ f_β(W) h(W) dW.
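Numerically, the approximation can be evaluated with one-dimensional quadrature; a sketch with an invented choice of h, invented Beta parameters and an invented K:

```python
import numpy as np
from scipy import stats, integrate

K = 10**7                              # assumed number of possible haplotypes (invented)
s1, s2 = 2.4, 12.0                     # Beta parameters fitted to the W_i's (invented)
h = lambda w: np.exp(4.0 * w - 9.0)    # invented choice of alpha_i = h(W_i)

# alpha_+ approximately K * integral of f_beta(W) * h(W) over [0, 1]
f_beta = stats.beta(s1, s2).pdf
integral, _ = integrate.quad(lambda w: f_beta(w) * h(w), 0.0, 1.0)
alpha_plus = K * integral
print(alpha_plus)
```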

33 Approximating α₊. To get equality between the first prior parameter in the original method and in the generalised one, let α_i = u_i = µ²_i(1 − µ_i)/σ²_i. Because µ_i = µ(W_i) = exp(aW_i + b) can result in µ_i ≥ 1, then 1 − µ_i ≤ 0, such that α_i ≤ 0, which is not allowed. For berlin, µ_i ≥ 1 for W_i above −b/a, so that all contributions to the integral in the α₊ approximation are negative for such W_i; the slide reports −b/a, α₊ and the estimated uncovered probability mass for berlin, dane, and somali. With α_i = u_i and the exponential regression the approach is unusable, but the distribution of the W_i's might be helpful for other choices of h.

34 Problem. The method is maybe getting too much attention because it is the only published method for estimating haplotype frequencies. At the 7th International Y Chromosome User Workshop in Berlin, 2010, Michael Krawczak (one of the authors of the original articles) gave a talk where the associated slides included the statement that [the frequency estimation method has] never [been] thoroughly studied and validated.


36 The idea: approximate once a common (partial) ancestor has been identified. The basic idea is to find I = {i₁, i₂, ..., i_q} ⊆ {1, 2, ..., r} such that P(⋂_{j∉I} L_j = a_j | ⋂_{i∈I} L_i = a_i) ≈ ∏_{j∉I} P(L_j = a_j | ⋂_{i∈I} L_i = a_i) is a good approximation.

37 Example. Assume that we have loci L₁, L₂, L₃, L₄ and I = {1, 2}. Then P(L₁, L₂, L₃, L₄) = P(L₁) P(L₂ | L₁) P(L₃, L₄ | L₁, L₂) (1) ≈ P(L₁) P(L₂ | L₁) ∏_{j=3}^{4} P(L_j | L₁, L₂) (2).

38 How to choose I. I is called an ancestral set, because it can be interpreted as a set of alleles that is shared with one's ancestors. I can be found using a greedy approach, adding the j to I that maximises P(L_j = a_j | ⋂_{i∈I} L_i = a_i). Stop adding elements to I, e.g. when only a small percentage of the observations is left to use for calculating the marginal probabilities conditional on I.
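A minimal sketch of this greedy selection on an observed database (assuming haplotypes are rows of a NumPy integer array; the stopping fraction of 10% is an invented choice):

```python
import numpy as np

def greedy_ancestral_set(db, a, min_fraction=0.10):
    """db: (N, r) array of observed haplotypes; a: target haplotype of length r.
    Greedily adds the locus j whose allele a_j has the highest empirical
    conditional probability given the loci already in I, and stops when too
    few observations would remain for the conditional estimates."""
    N, r = db.shape
    I, mask = [], np.ones(N, dtype=bool)     # mask: observations matching a on I
    while len(I) < r:
        remaining = [j for j in range(r) if j not in I]
        # Empirical P(L_j = a_j | L_i = a_i for i in I)
        probs = {j: (db[mask, j] == a[j]).mean() for j in remaining}
        j_best = max(probs, key=probs.get)
        new_mask = mask & (db[:, j_best] == a[j_best])
        if new_mask.sum() < min_fraction * N:   # stopping rule from the slide
            break
        I.append(j_best)
        mask = new_mask
    return I, mask
```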

39 Drawbacks. The approach is simple, but like the previous method it is not a statistical model.


41 The idea: alternately perceive the different loci as a response variable. Let L₁, L₂, ..., L_r be the r different loci available in the haplotype. Then fit L_{i₁} given {L_k : k ∉ {i₁}} (3), L_{i₂} given {L_k : k ∉ {i₁, i₂}} (4), ..., (5) L_{i_{r−2}} given {L_k : k ∉ {i₁, i₂, ..., i_{r−2}}} (6), L_{i_{r−1}} given {L_k : k ∉ {i₁, i₂, ..., i_{r−1}}} = L_{i_r} (7), and use the empirical distribution for L_{i_r}.

42 Properties. A class of statistical models. The classification can be done with any of the several available classification methods, such as classification trees, ordered logistic regression, or support vector machines. Selection of the i_j's should be done using standard model selection criteria, depending on the classification model used. Does not incorporate prior knowledge.
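A rough sketch of the chain idea using classification trees (scikit-learn's DecisionTreeClassifier standing in for one of the classifiers mentioned above); the ordering i₁, ..., i_r is assumed to be given rather than chosen by a model selection criterion, and all tuning choices are invented:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_chain(db, order):
    """db: (N, r) integer array of haplotypes; order: the loci i_1, ..., i_r.
    Fits one classifier per locus in `order`, each predicting that locus from
    the loci not yet used as a response, plus the empirical distribution of
    the last locus."""
    models = []
    for k, target in enumerate(order[:-1]):
        predictors = list(order[k + 1:])              # loci not in {i_1, ..., i_k}
        clf = DecisionTreeClassifier(min_samples_leaf=5)
        clf.fit(db[:, predictors], db[:, target])
        models.append((target, predictors, clf))
    values, cnt = np.unique(db[:, order[-1]], return_counts=True)
    empirical = dict(zip(values, cnt / cnt.sum()))
    return models, order[-1], empirical

def chain_probability(z, models, last, empirical):
    """Estimated probability of haplotype z under the fitted chain."""
    p = empirical.get(z[last], 0.0)
    for target, predictors, clf in models:
        proba = clf.predict_proba(np.asarray(z)[predictors].reshape(1, -1))[0]
        classes = list(clf.classes_)
        p *= proba[classes.index(z[target])] if z[target] in classes else 0.0
    return p
```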

43 Kernels and model based clustering.

44 The idea: create a density around each haplotype. Put a scaled density/mass (called a kernel) around each haplotype with mass equal to its relative frequency N_i/N₊. In this way unobserved haplotypes get probability mass from the (near) neighbours.

45 Choice of kernel. A straightforward approach is the Gaussian kernel K(z | x_i, λ) = (2πλ²)^(−r/2) det(Σ)^(−1/2) exp(−(1/(2λ²)) (x_i − z)ᵀ Σ⁻¹ (x_i − z)), where λ is a bandwidth parameter that has to be chosen. A frequency estimate for any given haplotype z is g(z) = (1/N₊) ∑_{i=1}^{n} N_i K(z | x_i, λ). To incorporate prior knowledge, K(z | x_i, N_i, λ) = (2πλ²/N_i)^(−r/2) det(Σ)^(−1/2) exp(−(N_i/(2λ²)) (x_i − z)ᵀ Σ⁻¹ (x_i − z)) could be used.
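A direct NumPy transcription of g(z) above (a sketch: Σ is taken to be the identity here purely for illustration, and λ must be supplied):

```python
import numpy as np

def kernel_frequency(z, haplotypes, counts, lam, Sigma=None):
    """g(z) = (1/N_+) * sum_i N_i * K(z | x_i, lambda) with the Gaussian kernel.
    haplotypes: (M, r) array of unique haplotypes; counts: (M,) counts N_i."""
    M, r = haplotypes.shape
    Sigma = np.eye(r) if Sigma is None else Sigma
    Sigma_inv, det = np.linalg.inv(Sigma), np.linalg.det(Sigma)
    diff = haplotypes - np.asarray(z)                       # x_i - z for every i
    quad = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)  # (x_i - z)' Sigma^-1 (x_i - z)
    K = (2 * np.pi * lam**2) ** (-r / 2) * det ** (-0.5) * np.exp(-quad / (2 * lam**2))
    return (counts * K).sum() / counts.sum()
```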

46 Problem. The model can be inaccurate if the kernel has small variance, because then the actual mass, when evaluated over the discrete grid, can differ greatly from the intended relative frequency. Discrete kernels could be tried instead, e.g. the multinomial.

47 Model based clustering. Estimating a frequency for a haplotype using kernels requires evaluating as many densities as there are haplotypes in the database. Model based clustering can be used to perform clustering first, to minimise the required number of density evaluations. Same problem as with the kernels if the variances are too small.

48 Model control: unobserved probability mass and marginal deviances.

49 Model verification is as always crucial. One important feature of a model is to be able to efficiently obtain further samples of haplotypes according to their probability under the model.

50 Different ways of comparing models: the estimated unobserved probability mass; marginal deviances (a model's single and pairwise marginals compared to the observed ones); several more should definitely be considered.

51 Unobserved probability mass: point estimate. K₀: the number of singletons; N: the number of observations. In 1968, Robbins showed that V = K₀/(N + 1) is an unbiased estimate of the unobserved probability mass.

52 Unobserved probability mass: limiting consistent variance estimate. K₁: the number of doubletons. In 1986, Bickel and Yahav showed that, under some regularity conditions, σ̂² = K₀/N² − (K₀ − 2K₁)²/N³ is a limiting consistent estimate of the variance of the unobserved probability mass. Both V and the variance estimate can be verified by simulation.
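Both estimators are one-liners; a small sketch (the formula for σ̂² is as reconstructed above, and the counts in the example are invented):

```python
import math

def unobserved_mass(k0, k1, n, z=1.96):
    """Robbins' point estimate V = K0 / (N + 1), the variance estimate
    sigma^2 = K0 / N^2 - (K0 - 2*K1)^2 / N^3 quoted above, and a symmetric
    approximate confidence interval for the unobserved probability mass."""
    V = k0 / (n + 1)
    var = k0 / n**2 - (k0 - 2 * k1) ** 2 / n**3
    se = math.sqrt(max(var, 0.0))
    return V, se, (V - z * se, V + z * se)

# Invented example: 120 singletons, 35 doubletons, 500 observations
print(unobserved_mass(120, 35, 500))
```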

53 Unobserved probability mass: simulation study. Plot of confidence interval coverage against N for the symmetric standard confidence interval and the logit transformed confidence interval, for several true population sizes Q, based on simulations.

54 Unobserved probability mass. The estimate seems like a good and simple way of performing model verification, but it cannot stand alone, as we shall soon see. It can also be used to fit model parameters, e.g. the parameter λ in the kernel model.

55 Unobserved probability mass: approximate confidence intervals. Table with one column per dataset (berlin, dane, somali): V, σ̂_V and the 95% confidence interval ([0.321; 0.408] for berlin, [0.510; 0.694] for dane, [0.212; 0.342] for somali), together with the corresponding estimates for S with β₁/β₄ ≠ 0, S with β₁/β₄ = 0, the classification models (rpart, svm, polr; polr NA for one dataset) and the ancestor method at 10%, 15% and 20% (remaining numeric entries not captured).

56 Marginal deviances. Depending on the model, exact marginals can be difficult to obtain. If haplotypes can be sampled according to their probability under a model, then the marginals can be approximated by simulating a huge number of haplotypes under that model. Using only the observed marginals should correspond to this, but at least for small databases this is not the case according to simulation studies performed with the classification models.

57 Deviance for pairwise marginals. Let {u_ij} be the two-way table with the observation counts, {p̃_ij} the table of predicted probabilities under a model M₀, and {p̂_ij} the relative frequencies such that p̂_ij = u_ij/u₊₊. For the pairwise marginal tables, the deviance is d = −2 log(L({p̃_ij})/L({p̂_ij})), where L({p_ij}) = ∏_{i,j} p_ij^{u_ij} is proportional to the likelihood (the multinomial constant u₊₊!/∏_{i,j} u_ij! cancels out in the ratio). Then d = 2 ∑_{i,j} u_ij log(p̂_ij/p̃_ij), which is χ²_ν distributed.

58 A deviance is calculated for each pair of loci. To compare models, these deviances can be summed and used for relative comparisons (the sum is not χ² distributed). The deviance is calculated similarly for the single marginals.
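A direct implementation of the per-pair deviance above (the two-way table and the model probabilities are invented); the per-pair values can then be summed as described:

```python
import numpy as np

def pairwise_deviance(u, p_model):
    """d = 2 * sum_ij u_ij * log(p_hat_ij / p_model_ij) for one pair of loci.
    u: observed two-way count table; p_model: predicted probabilities under M0.
    Cells with u_ij = 0 contribute nothing to the sum."""
    u = np.asarray(u, dtype=float)
    p_hat = u / u.sum()
    mask = u > 0
    return 2.0 * np.sum(u[mask] * np.log(p_hat[mask] / np.asarray(p_model)[mask]))

# Invented 2x3 table and model probabilities for illustration
u = np.array([[30, 10, 5], [20, 25, 10]])
p_model = np.array([[0.28, 0.12, 0.06], [0.22, 0.22, 0.10]])
print(pairwise_deviance(u, p_model))
```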

59 Deviance sums for single marginals. Table with columns berlin, dane, somali and rows rpart, svm, polr (polr NA for one dataset): sum of deviances for observed single marginals vs. simulated single marginals for the classification method specified in each row (numeric entries not captured).

60 Deviance sums for pairwise marginals. Table with columns berlin, dane, somali and rows rpart, svm, polr: sum of deviances for observed pairwise marginals vs. simulated pairwise marginals for the classification method specified in each row; several entries are Inf and polr is NA for one dataset (remaining numeric entries not captured).

61 Calculating evidence: two contributors; n contributors.

62 Evidence: motivation of usage. The purpose is to get an unbiased opinion in a trial. Often formulated as the hypotheses H_p: the suspect left the crime stain together with n additional contributors; H_d: some other person left the crime stain together with n additional contributors. Then the likelihood ratio given by LR = P(E | H_p)/P(E | H_d) is calculated. In a courtroom it can then be stated that the evidence E is LR times more likely to have arisen under H_p than under H_d (the formulation is from Interpreting DNA Mixtures by Weir et al., 1997).

63 Computational challenge. The actual evaluation of the LR can be a computationally heavy task: the number of combinations giving rise to the same trace grows with the number of contributors.

64 Two contributors. Assume H_p: the suspect left the crime stain together with one additional contributor; H_d: some other person left the crime stain together with one additional contributor.

65 Two contributors: notation. Let a = (a₁, a₂, ..., a_n) and b = (b₁, b₂, ..., b_n) and define T = a ⊕ b = ({a₁, b₁}, {a₂, b₂}, ..., {a_n, b_n}) (8) and T ⊖ a = ({b₁}, {b₂}, ..., {b_n}) (9), where the sets are multisets.

66 Two contributors. Let T be the trace, h_s the suspect's haplotype, and h₁ the additional contributor's haplotype. Then T = h_s ⊕ h₁ and h₁ = T ⊖ h_s. Say that (h₁, h₂) is consistent with the trace T if h₁ ⊕ h₂ = T, which is denoted (h₁, h₂) ∼ T. This makes
LR = P(E | H_p)/P(E | H_d) (10)
= P(h_s, T ⊖ h_s) / ∑_{(h₁,h₂) ∼ T} P(h_s, h₁, h₂) (11)
= P(h_s) P(T ⊖ h_s) / (P(h_s) ∑_{(h₁,h₂) ∼ T} P(h₁) P(h₂)) (12)
= P(T ⊖ h_s) / ∑_{(h₁,h₂) ∼ T} P(h₁) P(h₂) (13)
by assuming that haplotypes are independent.

67 Two contributors: number of terms in the denominator. Let T = (T₁, T₂, ..., T_r), where T_i is a set of alleles such that |T_i| ∈ {1, 2}. Let H_T = T₁ × T₂ × ⋯ × T_r. In the non-trivial case a j ∈ {1, 2, ..., r} exists such that T_j = {a₁, a₂} with a₁ ≠ a₂. Let T′_j = {a₁} (such that one of the alleles is removed) and H′_T = T₁ × ⋯ × T_{j−1} × T′_j × T_{j+1} × ⋯ × T_r.

68 Two contributors: number of terms in the denominator. Now the denominator of the LR can be written as 2 ∑_{h₁ ∈ H′_T} P(h₁) P(T ⊖ h₁). If k denotes the number of loci in the trace with only one allele, and we assume that we have the non-trivial case with 0 ≤ k < r, we have that |H_T| = ∏_{i=1}^{r} |T_i| = 2^{r−k}, such that |H′_T| = |H_T|/2 = 2^{r−k−1} ≤ 2^{r−1}. This means that for r loci, a maximum of 2·2^{r−1} = 2^r haplotype frequencies have to be calculated, e.g. 2^{10} = 1024 for r = 10. If a trace has two contributors and no known suspects, the two most likely contributors can be chosen as argmax_{h₁ ∈ H′_T} P(h₁) P(T ⊖ h₁).
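A small enumeration sketch of this denominator (the trace and the frequency function `freq` below are placeholders; any of the haplotype frequency models above could be plugged in):

```python
import itertools

def two_contributor_denominator(trace, freq):
    """trace: list of allele sets T_1, ..., T_r (each of size 1 or 2);
    freq(h): estimated frequency of the haplotype h (a tuple of alleles).
    Computes 2 * sum_{h1 in H'_T} P(h1) * P(T minus h1) from the slide."""
    T = [sorted(t) for t in trace]
    j = next(i for i, t in enumerate(T) if len(t) == 2)   # non-trivial case assumed
    # H'_T: at locus j only the first allele is allowed for h1
    choices = [t if i != j else t[:1] for i, t in enumerate(T)]
    total = 0.0
    for h1 in itertools.product(*choices):
        # h2 = T "minus" h1: the other allele where T_i has two, the same one otherwise
        h2 = tuple(t[0] if len(t) == 1 else (t[1] if h1[i] == t[0] else t[0])
                   for i, t in enumerate(T))
        total += freq(h1) * freq(h2)
    return 2.0 * total

# Tiny illustration with an invented trace and a constant placeholder frequency
trace = [{15, 16}, {22}, {14, 15}]
freq = lambda h: 0.01            # placeholder: plug in any haplotype frequency model
print(two_contributor_denominator(trace, freq))   # 2 * |H'_T| * 0.01**2 here
```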

69 n contributors: formulation of the LR for one locus. The LR is defined generally in Forensic interpretation of Y-chromosomal DNA mixtures by Wolf et al., 2005. At a given locus, let E_t, E_s, and E_k be the sets of alleles from the trace, the suspect, and the known contributors, respectively. Assume n unknown contributors and let A_n denote the set of alleles carried by these n unknown contributors. Let P_n(V; W) = P(W ⊆ A_n ⊆ V). Then LR = P_n(E_t; E_t \ (E_s ∪ E_k)) / P_{n+1}(E_t; E_t \ E_k) (14) = P(E_t \ (E_s ∪ E_k) ⊆ A_n ⊆ E_t) / P(E_t \ E_k ⊆ A_{n+1} ⊆ E_t) (15).

70 n contributors: formulation of the LR for m loci. For m loci we have LR = P_n({E_{t,i}; E_{t,i} \ (E_{s,i} ∪ E_{k,i})}_{i=1}^{m}) / P_{n+1}({E_{t,i}; E_{t,i} \ E_{k,i}}_{i=1}^{m}).

71 n contributors: example from thesis, page 18. Let E_{t,1} = {1, 2}, E_{t,2} = {2}, E_{s,1} = {1}, E_{s,2} = {2}, E_{k,1} = E_{k,2} = ∅. Then
LR = P_1({E_{t,i}; E_{t,i} \ (E_{s,i} ∪ E_{k,i})}_{i=1}^{2}) / P_2({E_{t,i}; E_{t,i} \ E_{k,i}}_{i=1}^{2})
= P(⋂_{i=1}^{2} {E_{t,i} \ (E_{s,i} ∪ E_{k,i}) ⊆ A_1^i ⊆ E_{t,i}}) / P(⋂_{i=1}^{2} {E_{t,i} \ E_{k,i} ⊆ A_2^i ⊆ E_{t,i}})
= P({{2}^1 ⊆ A_1^1 ⊆ {1, 2}^1} ∩ {A_1^2 ⊆ {2}^2}) / P({{1, 2}^1 ⊆ A_2^1 ⊆ {1, 2}^1} ∩ {{2}^2 ⊆ A_2^2 ⊆ {2}^2})
= P({2}^1, {2}^2) / P({1, 2}^1, {2, 2}^2)
= P(h₁ = (2, 2)) / (P(h₁ = (1, 2), h₂ = (2, 2)) + P(h₁ = (2, 2), h₂ = (1, 2))).

72 n contributors: challenges. It is complicated. One key ingredient in calculating the LR is to be able to estimate frequencies for unobserved haplotypes. The next step is to be able to calculate the LR efficiently, even for a large number of contributors.

73 Further work.

74 Further work. Y-STR haplotype frequencies: better incorporation of prior knowledge in a statistical model, e.g. graphical models with other test statistics (ordinal data, and incorporating prior knowledge such as the single step mutation model); more and better ways to verify models; larger datasets (a lot of data has been gathered, both publicly available in journals and directly from laboratories, but none of it is available to others yet). Y-STR mixtures: efficient calculation of the LR; use quantitative information (the amount of DNA material, which can be seen in the EPG) instead of only the qualitative information; model the signal in the EPG (electropherogram).

75 Questions?
