A Bayesian Partitioning Approach to Duplicate Detection and Record Linkage

1 A Bayesian Partitioning Approach to Duplicate Detection and Record Linkage
Mauricio Sadinle, Duke University
Supported by NSF grants SES-11-30706 to Carnegie Mellon University and SES-11-31897 to Duke University
1 / 49

2 Record Linkage 2 / 49

3 Examples of (Non-Trivial) Record Linkage
Statistical agencies (US Census Bureau, Statistics Canada, ...):
- Creation of unified administrative databases
- Merging of census and post-enumeration surveys for census adjustments
Human rights:
- Creation of unified lists of casualties
- Combination of data sources to estimate true mortality levels
Epidemiological studies, educational studies, ...
3 / 49

6 Record Linkage vs Duplicate Detection
Record linkage: the task of finding records that refer to the same entity across different data sources.
Duplicate detection: the task of identifying groups of records referring to the same entities within one datafile.
4 / 49

7 The Intuition
Very similar records are likely to be coreferent!

Record  Gender  First Name  Last Name      ...  Age
i       Male    Benedict    Cumberbatch    ...
j       Male    Benedict    Cucumberbatch  ...

Very dissimilar records should be non-coreferent!

Record  Gender  First Name  Last Name    ...  Age
i       Male    Benedict    Cumberbatch  ...
j       Male    Martin      Freeman      ...

5 / 49

8 Comparison Data
Construct comparison vectors: γ_ij = (..., γ^f_ij, ...)
Examples of similarity measures:

Field                 Similarity Measure
Numeric (e.g. age)    Absolute difference
Strings (e.g. name)   Levenshtein, Jaro-Winkler
Nominal (e.g. race)   Binary comparison

6 / 49
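
To make the comparison-vector construction concrete, here is a minimal Python sketch (not from the slides); the field names, example records, and the normalized Levenshtein dissimilarity are illustrative assumptions.

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def compare_records(rec_i: dict, rec_j: dict) -> np.ndarray:
    """Build one comparison vector gamma_ij from two records (hypothetical fields)."""
    gamma = []
    # Numeric field: absolute difference
    gamma.append(abs(rec_i["age"] - rec_j["age"]))
    # String field: normalized Levenshtein dissimilarity in [0, 1]
    d = levenshtein(rec_i["last_name"], rec_j["last_name"])
    gamma.append(d / max(len(rec_i["last_name"]), len(rec_j["last_name"]), 1))
    # Nominal field: binary agree/disagree
    gamma.append(float(rec_i["gender"] != rec_j["gender"]))
    return np.array(gamma)

print(compare_records({"age": 40, "last_name": "Cumberbatch", "gender": "Male"},
                      {"age": 41, "last_name": "Cucumberbatch", "gender": "Male"}))
```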

9 Common Solutions: Training Classifiers
- Take a sample of record pairs and determine whether each pair refers to the same entity (e.g. via clerical review)
- Train your favorite classifier on the comparison vectors γ_ij (logistic regression, SVM, random forest, ...)
- Predict the coreference status of the remaining record pairs
7 / 49
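
A minimal sketch of this classifier route, assuming scikit-learn is available; the comparison vectors and labels below are synthetic stand-ins for a clerically reviewed sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training data: comparison vectors for a reviewed sample of
# record pairs, with labels 1 = coreferent, 0 = non-coreferent.
gamma_train = rng.random((200, 3))
labels = (gamma_train.sum(axis=1) < 1.0).astype(int)  # stand-in labels

clf = LogisticRegression().fit(gamma_train, labels)

# Predict coreference status for the remaining (unlabeled) record pairs.
gamma_rest = rng.random((5, 3))
print(clf.predict(gamma_rest))        # hard 0/1 decisions
print(clf.predict_proba(gamma_rest))  # class probabilities
```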

10 Common Solutions: Mixture Model
The comparison vector γ_ij is a realization of a random vector Γ_ij.
Mixture model implementation:
Γ_ij | Δ_ij = 1 ~iid G_M
Γ_ij | Δ_ij = 0 ~iid G_U
Δ_ij ~iid Bernoulli(p)
Used by Winkler (1980), Jaro (1989), Larsen and Rubin (2001), and many others. Often called the Fellegi-Sunter approach, after these authors' seminal paper (JASA, 1969).
Caveat: the mixture components may not be associated with matches and non-matches!
8 / 49
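
A minimal sketch of fitting such a two-component mixture by EM, assuming binary agree/disagree comparisons that are conditionally independent across fields; the data are simulated stand-ins, and a real implementation would monitor convergence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary comparison vectors gamma_ij (1 = fields agree),
# one row per record pair, one column per field.
gamma = rng.integers(0, 2, size=(1000, 4)).astype(float)

# p = P(match), m[f] = P(agree | match), u[f] = P(agree | non-match)
p, m, u = 0.1, np.full(4, 0.8), np.full(4, 0.3)

for _ in range(50):  # EM iterations
    # E-step: posterior match probability for each pair (independence model)
    log_m = gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m)
    log_u = gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u)
    w = p * np.exp(log_m)
    w = w / (w + (1 - p) * np.exp(log_u))
    # M-step: update parameters from the expected assignments
    p = w.mean()
    m = (w[:, None] * gamma).sum(axis=0) / w.sum()
    u = ((1 - w)[:, None] * gamma).sum(axis=0) / (1 - w).sum()

print(p, m.round(2), u.round(2))
```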

13 Drawbacks of the Traditional Methodologies
- They output independent decisions on the coreference status of pairs of records
- Independent pairwise decisions are often non-transitive: a and b declared coreferent, b and c declared coreferent, but a and c declared non-coreferent
- No proper account of uncertainty in linkage decisions
9 / 49
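
The non-transitivity problem is easy to demonstrate; the toy sketch below (illustrative only) shows three pairwise decisions that violate transitivity, and how a union-find transitive closure would force them into one cluster.

```python
# Independent pairwise decisions need not be transitive; a transitive
# closure can repair them, at the cost of merging clusters.
pairwise_links = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 0}  # non-transitive!

parent = {r: r for r in "abc"}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

for (i, j), linked in pairwise_links.items():
    if linked:
        parent[find(i)] = find(j)  # union

# After closure, all three records land in the same cluster.
print({r: find(r) for r in "abc"})
```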

14 Record Linkage à la Fellegi-Sunter
Fellegi-Sunter decision rule: compute the weight
w_ij = log [ P(γ_ij | Δ_ij = 1) / P(γ_ij | Δ_ij = 0) ]
and set
Δ̂_ij = 1 (link), if w_ij > t_M; 0 (non-link), if w_ij < t_U; R ("reject"), otherwise.
Caveat: the comparison vector γ_ij alone determines the linkage decision
10 / 49
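
A minimal sketch of this decision rule under the conditional-independence model, with illustrative m, u, and thresholds t_M, t_U:

```python
import numpy as np

def fs_decision(gamma, m, u, t_M=2.0, t_U=-2.0):
    """Fellegi-Sunter rule: weight each pair, then threshold (illustrative thresholds)."""
    log_m = gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m)
    log_u = gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u)
    w = log_m - log_u
    return np.where(w > t_M, "link", np.where(w < t_U, "non-link", "reject"))

m, u = np.array([0.9, 0.85, 0.8]), np.array([0.3, 0.2, 0.1])
gamma = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]], dtype=float)
print(fs_decision(gamma, m, u))  # -> link, non-link, reject
```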

15 Traditional Approaches: What's Wrong? The Δ_ij's are not independent! 11 / 49

16 Overview of New Methodology
- General Bayesian approach to joint duplicate detection and record linkage (DDRL)
- Guarantees transitive coreference decisions
- The Bayesian approach allows us to incorporate prior information on the quality of the datafiles, and to provide a proper account of uncertainty in DDRL decisions
12 / 49

17 Contents Bayesian Partitioning Approach to DDRL Bipartite Record Linkage Population Size Estimation with Linked Files Conclusions 13 / 49

18 Contents Bayesian Partitioning Approach to DDRL Bipartite Record Linkage Population Size Estimation with Linked Files Conclusions 14 / 49

19 Linking Multiple Files that Contain Duplicates
[Figure: three example files with fields Sex, ..., Age. File 1: Male ... 24; Male ... 25; Male ... 24; Male ... 50; Male ... 52; Female ... 20. File 2: Male ... 24; Male ... 10; Female ... 77; Male ... . File 3: Female ... 78; Male ... 25; Female ... . Coreferent records appear both across files and within a file.]
15 / 49

20 Our Parameter of Interest
- The goal is to partition the datafile into groups of coreferent records
- This partition is our parameter of interest, called the coreference partition
- A coreference partition can be represented by a matrix with entries Δ_ij = 1 if records i and j are coreferent, and Δ_ij = 0 otherwise
- Or by labelings of the records (Z_1, ..., Z_r) such that Z_i = Z_j iff records i and j are coreferent
16 / 49
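
The two representations are equivalent; a small sketch of converting between them (labels are illustrative):

```python
import numpy as np

Z = np.array([0, 0, 1, 2, 1])  # cluster labels for r = 5 records

# Matrix representation: Delta[i, j] = 1 iff records i and j are coreferent
Delta = (Z[:, None] == Z[None, :]).astype(int)
print(Delta)

# And back: label each record by the first record in its cluster
Z_back = np.array([np.flatnonzero(row)[0] for row in Delta])
print(Z_back)  # relabeled, but induces the same partition
```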

21 Inference on the Partition of the Concatenated File
Comparison data: Γ_ij = (..., Γ^f_ij, ...)
We propose the model
Γ_ij | Δ_ij = 1, D_i = k, D_j = l ~iid G^kl_1
Γ_ij | Δ_ij = 0, D_i = k, D_j = l ~iid G^kl_0
Δ ~ distribution on valid partitions
where D_i denotes the datafile to which record i belongs. Linkage decisions are based on p(Δ | Γ).
17 / 49

23 Particular Cases and Further Connections
The approach can be constrained to handle important particular cases:
- Duplicate detection within one file (AOAS 2014)
- Traditional record linkage scenario: two files, no duplicates (JASA 2016+)
- Accounting for linkage uncertainty in some models for population size estimation
18 / 49

24 Contents Bayesian Partitioning Approach to DDRL Bipartite Record Linkage Population Size Estimation with Linked Files Conclusions 19 / 49

25 Bipartite Record Linkage: no duplicates within each file 20 / 49

26 Traditional Methodologies
Chains of links can happen!
[Figure: records in files X_1 and X_2 joined by a chain of pairwise links.]
21 / 49

27 Terminology: Bipartite Matching
[Figure: records in files X_1 and X_2 joined by a bipartite matching.]
Representation of bipartite matchings: a labeling Z = (..., Z_j, ...) for the records in file X_2, with
Z_j = i, if records i ∈ X_1 and j ∈ X_2 are coreferent;
Z_j = n_1 + j, if record j has no coreferent record in X_1.
Example: Z_1 = 1, Z_3 = 8
22 / 49
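
A small sketch of this Z-labeling (0-indexed, with illustrative sizes):

```python
n1, n2 = 5, 3  # records in files X1 and X2 (illustrative)

# Z[j] = i (0-indexed here) links record j in X2 to record i in X1;
# Z[j] = n1 + j means record j has no coreferent record in X1.
Z = [0, 6, 2]  # record 1 of X2 is unmatched (6 = n1 + 1)

matched = [(Z[j], j) for j in range(n2) if Z[j] < n1]
unmatched = [j for j in range(n2) if Z[j] >= n1]
print("matched pairs (i in X1, j in X2):", matched)
print("unmatched in X2:", unmatched)
```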

32 Enforcing One-to-One Matching in the Fellegi-Sunter Approach
Jaro (1989) proposes to plug the weights w_ij into a linear sum assignment problem:
maximize Σ_ij w_ij Δ_ij
subject to Δ_ij ∈ {0, 1}; Σ_i Δ_ij ≤ 1 for all j ∈ X_2; Σ_j Δ_ij ≤ 1 for all i ∈ X_1.
23 / 49
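
This assignment problem can be solved with the Hungarian algorithm; a minimal sketch using SciPy, with hypothetical weights (the positive-weight filter stands in for the option of leaving records unmatched):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows index records in X1, columns records in X2 (hypothetical weights).
w = np.array([[ 4.1, -2.0,  0.5],
              [-1.0,  3.7, -0.8],
              [ 0.2, -0.5, -3.0],
              [-2.2,  1.0,  2.9]])

rows, cols = linear_sum_assignment(w, maximize=True)
# Keep only assignments with positive weight: a negative-weight "match"
# is worse than leaving both records unmatched.
links = [(i, j) for i, j in zip(rows, cols) if w[i, j] > 0]
print(links)  # -> [(0, 0), (1, 1), (3, 2)]
```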

33 Model for Bipartite Matchings
Γ_ij | Z_j = i ~iid G_M
Γ_ij | Z_j ≠ i ~iid G_U
Z ~ prior on bipartite matchings
24 / 49

34 Beta Prior on Bipartite Matchings
Larsen (2005, 2010):
- The files' overlap size n_12 is Beta-Binomial(n_2, α, β)
- All bipartite matchings with the same n_12 are equally likely
25 / 49
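
A minimal sketch of simulating from this prior, assuming SciPy's beta-binomial distribution and illustrative file sizes and hyperparameters:

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(2)
n1, n2, alpha, beta = 8, 5, 1.0, 1.0  # illustrative sizes and hyperparameters

# Draw the overlap size, then a uniformly random matching of that size.
n12 = betabinom.rvs(n2, alpha, beta, random_state=rng)
matched_j = rng.choice(n2, size=n12, replace=False)  # which X2 records match
matched_i = rng.choice(n1, size=n12, replace=False)  # their partners in X1

Z = np.arange(n1, n1 + n2)  # start with every X2 record unmatched (label n1 + j)
Z[matched_j] = matched_i
print(n12, Z)
```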

35 Models G_M and G_U for Comparison Vectors
Assume comparisons are independent, for both coreferent and non-coreferent records:
Γ^f_ij | Z_j = i ~ Multinomial(1, m_f)
Γ^f_ij | Z_j ≠ i ~ Multinomial(1, u_f)
Flat priors on m_f, u_f, for all f
26 / 49
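
With flat Dirichlet priors, the full conditionals of m_f and u_f given the matching Z are Dirichlet; a sketch of that conjugate update for one field, with illustrative counts:

```python
import numpy as np

rng = np.random.default_rng(3)

# One field f with 3 disagreement levels; counts of gamma^f levels among the
# currently matched and non-matched pairs (hypothetical values).
match_counts = np.array([40, 7, 3])
nonmatch_counts = np.array([50, 200, 750])

# Flat Dirichlet(1, 1, 1) priors, so the posteriors are Dirichlet(counts + 1).
m_f = rng.dirichlet(match_counts + 1)
u_f = rng.dirichlet(nonmatch_counts + 1)
print(m_f.round(3), u_f.round(3))
```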

36 Estimation of the Bipartite Matching Z via MCMC 27 / 49

37 Point Estimation
From the posterior p(Z | γ) we obtain Bayes estimates under the additive loss function
L(Z, Ẑ) = Σ_{j ∈ X_2} L(Z_j, Ẑ_j), with
L(Z_j, Ẑ_j) =
0, if Z_j = Ẑ_j (gets it right);
λ_R, if Ẑ_j = R (does not output a decision: "reject");
λ_FNM, if Z_j ≤ n_1, Ẑ_j = n_1 + j (false non-match);
λ_FM1, if Z_j = n_1 + j, Ẑ_j ≤ n_1 (false match, type 1);
λ_FM2, if Z_j, Ẑ_j ≤ n_1, Z_j ≠ Ẑ_j (false match, type 2).
For generic applications, taking λ_FNM = λ_FM1 = 1 and λ_FM2 = 2 works well.
28-32 / 49

43 Point Estimation
THEOREM. If λ_FM2 ≥ λ_FM1 ≥ 2λ_R > 0 and λ_FNM ≥ 2λ_R in the additive loss function, the Bayes estimate of the bipartite matching is Ẑ = (Ẑ_1, ..., Ẑ_{n_2}), where
Ẑ_j = i, if P(Z_j = i | γ_obs) > 1 − λ_R/λ_FM1 + [(λ_FM2 − λ_FM1)/λ_FM1] P(Z_j ∉ {i, n_1 + j} | γ_obs);
Ẑ_j = n_1 + j, if P(Z_j = n_1 + j | γ_obs) > 1 − λ_R/λ_FNM;
Ẑ_j = R, otherwise.
33 / 49
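
A sketch of applying this thresholding rule to hypothetical posterior probabilities for a single record j, with the generic losses from the previous slide and λ_R = 1/2:

```python
import numpy as np

lam_R, lam_FNM, lam_FM1, lam_FM2 = 0.5, 1.0, 1.0, 2.0
n1, j = 5, 3  # illustrative file size and record index (1-based)

# Hypothetical posterior over candidates: entries 0..4 are i = 1..n1 in X1;
# entries 5..7 are the "unmatched" labels n1 + j' for X2 records j' = 1..3.
p = np.array([0.70, 0.05, 0.02, 0.01, 0.02, 0.0, 0.0, 0.20])
p_nomatch = p[n1 + j - 1]  # P(Z_j = n1 + j | gamma_obs)

best_i = int(np.argmax(p[:n1]))
p_other = 1.0 - p[best_i] - p_nomatch  # P(Z_j not in {best_i, n1 + j})

if p[best_i] > 1 - lam_R/lam_FM1 + (lam_FM2 - lam_FM1)/lam_FM1 * p_other:
    print("link record j to record", best_i + 1)
elif p_nomatch > 1 - lam_R/lam_FNM:
    print("declare record j unmatched")
else:
    print("reject (no decision)")
```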

44 Simulation Setup
- Different scenarios of file overlap and measurement error
- 100 pairs of synthetic datafiles for each scenario
- Each datafile has 500 records
- Four fields: given and family names, age, and occupation
34 / 49

45 Results with Full Assignments
[Figure: precision and recall vs. number of erroneous fields, for the Fellegi-Sunter approach and for Beta Record Linkage, under 100%, 50%, and 10% file overlap. Black lines show medians; gray lines show the 1st and 99th percentiles.]
35 / 49

46 Results with Partial Assignments
[Figure: positive predictive value (PPV), negative predictive value (NPV), and rejection rate (RR) vs. number of erroneous fields, comparing full and partial estimates under the Fellegi-Sunter mixture model and Beta Record Linkage. Datafiles with 10% overlap.]
36 / 49

47 Contents Bayesian Partitioning Approach to DDRL Bipartite Record Linkage Population Size Estimation with Linked Files Conclusions 37 / 49

48 A Case Study from El Salvador
El Salvador underwent a civil war from 1980 until 1991. Three datafiles on casualties were obtained from the Human Rights Data Analysis Group (HRDAG):
- ER-TL: El Rescate - Tutela Legal (4335 records)
- CDHES: Salvadoran Human Rights Commission (1038 records)
Both collected throughout the period of the civil war; reports were investigated before being accepted into the files.
- UNTC: United Nations Truth Commission for El Salvador (5161 records)
Reports collected in 1992 from relatives and friends of the victims. It is common to find different reports on the same event with slightly different information.
DDRL for these datafiles allows us to estimate:
- The number of different reported killings in these datafiles
- The number of civilian killings that occurred during the war
38 / 49

52 Levels of Disagreement for the Salvadoran Datafiles

Field          Similarity Measure     Levels of Disagreement
Year           Absolute difference
Month          Absolute difference
Day            Absolute difference
Municipality   Binary comparison      Agree; Disagree
Given name     Modified Levenshtein   0; (0, 0.25]; (0.25, 0.5]; (0.5, 1]
Family name    Modified Levenshtein   0; (0, 0.25]; (0.25, 0.5]; (0.5, 1]

39 / 49
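
A small sketch of mapping a normalized dissimilarity in [0, 1] into the four levels used for the name fields:

```python
import numpy as np

def disagreement_level(d: float) -> int:
    """0: exact agreement; 1-3: levels (0, 0.25], (0.25, 0.5], (0.5, 1]."""
    return int(np.searchsorted([0.0, 0.25, 0.5], d, side="left"))

for d in [0.0, 0.1, 0.3, 0.9]:
    print(d, "->", disagreement_level(d))  # -> 0, 1, 2, 3
```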

53 Posterior Distribution of the Number of Reported Killings
[Figure: posterior distribution of ñ, the number of reported killings; in red, the value corresponding to the estimated posterior mode of the coreference partition.]
40 / 49

54 Population Size Estimation
Target: the size N of a population. Several models have the capture histories of individuals in multiple samples as sufficient statistics; with three samples, n is the table of counts:

                Third Sample: Yes       Third Sample: No
                Second Sample           Second Sample
First Sample    Yes      No             Yes      No
Yes             n_111    n_101          n_110    n_100
No              n_011    n_001          n_010    (n_000 unobserved)

From these models we can obtain p_A(N | n). We focus on the methodology of Madigan and York (1997): Bayesian model averaging over the class of decomposable graphical models for n.
41 / 49
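
Given a coreference partition of the concatenated files, the capture histories n follow deterministically; a toy sketch with hypothetical entity labels:

```python
from collections import Counter
from itertools import product

# Each set holds the (hypothetical) entity labels appearing in one list,
# as determined by a coreference partition of the concatenated files.
lists = [{1, 2, 3, 5}, {2, 3, 4}, {1, 3, 4, 6}]

entities = set().union(*lists)
n = Counter()
for e in entities:
    history = tuple(int(e in lst) for lst in lists)  # e.g. (1, 0, 1)
    n[history] += 1

for h in product([1, 0], repeat=3):
    if h != (0, 0, 0):  # n_000 is unobservable
        print("n_" + "".join(map(str, h)), "=", n[h])
```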

56 Population Size Estimation with Linked Files
- It is easy to see that n is a deterministic function of Δ
- On the other hand, the uncertainty about Δ is captured by the posterior p_L(Δ | X)
- Is p_C(N | X) = Σ_Δ p_A(N | n(Δ)) p_L(Δ | X) a valid posterior? Yes:
p_C(N | X) ∝ Σ_Δ p(X | Δ) p(N, Δ), where p(X | Δ) is the linkage model likelihood and the joint prior factors as p(N, Δ) = p_A(N | n(Δ)) p(Δ)
42 / 49
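
In practice this posterior can be approximated by Monte Carlo: for each posterior draw of Δ, compute n(Δ) and draw N from p_A(N | n(Δ)). The sketch below uses illustrative stand-in samplers for both distributions.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_partition():          # stand-in for a draw from p_L(Delta | X)
    return rng.integers(0, 3)    # pretend 3 distinct partitions are plausible

def sample_N_given_n(n_key):     # stand-in for a draw from p_A(N | n(Delta))
    means = {0: 9000, 1: 9500, 2: 10200}
    return rng.poisson(means[n_key])

# Pooling the draws approximates p_C(N | X).
draws = [sample_N_given_n(sample_partition()) for _ in range(5000)]
print(np.mean(draws), np.std(draws))  # posterior mean and sd of N
```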

59 Back to the Salvadoran Datafiles

                UNTC: Present           UNTC: Absent
                CDHES                   CDHES
ER-TL           Present   Absent        Present   Absent
Present         n_111     n_101         n_110     n_100
Absent          n_011     n_001         n_010     (n_000 unobserved)

43 / 49

60 [Figure: posterior distributions of the number of killings, given the posterior mode of the coreference partition versus given a random coreference partition from the posterior; panels show the model average and the decomposable models [ER-TL][CDHES, UNTC], [ER-TL, CDHES][ER-TL, UNTC], [ER-TL, CDHES][CDHES, UNTC], and [ER-TL, UNTC][CDHES, UNTC]. Axes: probability vs. number of killings.]
44 / 49

61 Posterior of the Number of Killings Accounting for Coreference Partition Uncertainty
[Figure: posterior probability vs. number of killings.]
45 / 49

62 Total Variance Decomposition
Var(N | X) = Var_{Δ|X}[E(N | Δ)]                 (DDRL)
           + E_{Δ|X}{Var_{m|Δ}[E(N | Δ, m)]}     (pop. size model)
           + E_{Δ|X}{E_{m|Δ}[Var(N | Δ, m)]}     (intrinsic)
In our application the decomposition is:

DDRL    Pop. Size Model    Intrinsic
2.6%    65.6%              31.8%

46 / 49
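
This is the law of total variance applied twice; a sketch of estimating the three terms from nested Monte Carlo draws (all conditional distributions below are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(5)

# Nested draws: Delta ~ p(Delta | X), m ~ p(m | Delta), N ~ p(N | Delta, m).
n_delta, n_m = 200, 50
cond_means = np.empty((n_delta, n_m))
cond_vars = np.empty((n_delta, n_m))
for a in range(n_delta):
    mu_delta = rng.normal(10000, 50)                  # effect of the partition draw
    for b in range(n_m):
        cond_means[a, b] = rng.normal(mu_delta, 250)  # effect of the model m
        cond_vars[a, b] = rng.gamma(2.0, 5000)        # intrinsic Var(N | Delta, m)

ddrl = np.var(cond_means.mean(axis=1))      # Var_{Delta|X}[E(N | Delta)]
model = np.mean(cond_means.var(axis=1))     # E_{Delta|X}{Var_m[E(N | Delta, m)]}
intrinsic = np.mean(cond_vars)              # E_{Delta|X}{E_m[Var(N | Delta, m)]}
total = ddrl + model + intrinsic
print((np.array([ddrl, model, intrinsic]) / total).round(3))
```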

64 Contents Bayesian Partitioning Approach to DDRL Bipartite Record Linkage Population Size Estimation with Linked Files Conclusions 47 / 49

65 Conclusions
- Unsupervised approach to duplicate detection and record linkage problems
- Guarantees transitive coreference decisions
- Provides a natural account of uncertainty in the coreference decisions, in the form of a posterior distribution
- The new methodology improves substantially on existing methods
48 / 49

66 Questions? 49 / 49
