Bayesian Estimation of Bipartite Matchings for Record Linkage
1 Bayesian Estimation of Bipartite Matchings for Record Linkage. Mauricio Sadinle, Duke University. Supported by NSF grants SES to Carnegie Mellon University and SES to Duke University.
2 Record Linkage
[figure: linking records across two datafiles; no duplicates within each file]
4 Outline
- The Fellegi-Sunter Framework for Record Linkage
- Record Linkage's Actual Target
- A Bayesian Approach to Bipartite Record Linkage
- Performance Comparison
- Take-Home Message
5 Record Linkage à la Fellegi-Sunter
After Fellegi and Sunter (1969). Datafiles: $X_1$ and $X_2$. Given $i \in X_1$ and $j \in X_2$, define
$$\Delta_{ij} = \begin{cases} 1, & \text{if records } i \text{ and } j \text{ are a match (refer to the same unit);} \\ 0, & \text{otherwise.} \end{cases}$$
Goal: to estimate $\Delta_{ij}$ for each $i \in X_1$ and $j \in X_2$.
8 Record Linkage à la Fellegi-Sunter
The intuition: very similar records are likely to be matches!
[table: records i and j both Male, Gatineau, similar values on the remaining fields]
Very dissimilar records should be non-matches!
[table: record i Male, Gatineau vs. record j Female, Durham, age 60]
9 Record Linkage à la Fellegi-Sunter
For each record pair, compute a comparison vector $\gamma_{ij} = (\dots, \gamma^f_{ij}, \dots)$.
The probabilities $P(\gamma_{ij} \mid \Delta_{ij} = 1)$ and $P(\gamma_{ij} \mid \Delta_{ij} = 0)$ should (generally) be very different.
11 Record Linkage à la Fellegi-Sunter
Fellegi-Sunter decision rule:
$$w_{ij} = \log \frac{P(\gamma_{ij} \mid \Delta_{ij} = 1)}{P(\gamma_{ij} \mid \Delta_{ij} = 0)}, \qquad \hat\Delta_{ij} = \begin{cases} 1, & \text{if } w_{ij} > t_M \text{ (link);} \\ 0, & \text{if } w_{ij} < t_U \text{ (non-link);} \\ R, & \text{otherwise (reject).} \end{cases}$$
Fellegi and Sunter (1969): this rule minimizes $P(\hat\Delta_{ij} = R)$ subject to some nominal error levels.
Caveat: the comparison vector $\gamma_{ij}$ alone determines the linkage decision.
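The decision rule above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the per-field agreement probabilities `m` and `u` and the thresholds `t_M`, `t_U` are hypothetical values, and binary field comparisons with conditional independence are assumed.

```python
import math

def fs_weight(gamma, m, u):
    """Log-likelihood ratio w_ij for a binary comparison vector gamma,
    assuming conditionally independent fields with agreement probabilities
    m (given a match) and u (given a non-match)."""
    w = 0.0
    for g, mf, uf in zip(gamma, m, u):
        if g:  # field agrees
            w += math.log(mf / uf)
        else:  # field disagrees
            w += math.log((1 - mf) / (1 - uf))
    return w

def fs_decide(w, t_M, t_U):
    """Fellegi-Sunter three-way rule: link / non-link / reject."""
    if w > t_M:
        return 1       # link
    if w < t_U:
        return 0       # non-link
    return "R"         # send to clerical review

m = [0.95, 0.9, 0.85]  # P(field agrees | match), illustrative
u = [0.05, 0.1, 0.2]   # P(field agrees | non-match), illustrative

print(fs_decide(fs_weight([1, 1, 1], m, u), t_M=3.0, t_U=-3.0))  # 1
print(fs_decide(fs_weight([0, 0, 0], m, u), t_M=3.0, t_U=-3.0))  # 0
```

Pairs whose weight falls between the two thresholds land in the reject region, which is exactly what the nominal error levels trade off against.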
15 Record Linkage à la Fellegi-Sunter
The comparison vector $\gamma_{ij}$ is a realization of a random vector $\Gamma_{ij}$. Mixture model implementation:
$$\Gamma_{ij} \mid \Delta_{ij} = 1 \overset{iid}{\sim} G_M, \qquad \Gamma_{ij} \mid \Delta_{ij} = 0 \overset{iid}{\sim} G_U, \qquad \Delta_{ij} \overset{iid}{\sim} \text{Bernoulli}(p)$$
Used by Winkler (1980), Jaro (1989), Larsen and Rubin (2001), and many others.
Caveat: the mixture components may not be associated with links and non-links!
18 Record Linkage à la Fellegi-Sunter
Even if the mixture model is successful, chains of links can happen under the Fellegi-Sunter decision rule. [figure: a chain of links across the records of $X_1$ and $X_2$]
19 Record Linkage à la Fellegi-Sunter
What went wrong? The $\Delta_{ij}$'s are not independent!
20 Outline
- The Fellegi-Sunter Framework for Record Linkage
- Record Linkage's Actual Target
- A Bayesian Approach to Bipartite Record Linkage
- Performance Comparison
- Take-Home Message
21 Terminology: Bipartite Matching
[figure: a bipartite matching between the records of $X_1$ and $X_2$]
Representation of bipartite matchings: $Z = (\dots, Z_j, \dots)$ for the records in file $X_2$, with
$$Z_j = \begin{cases} i, & \text{if records } i \text{ and } j \text{ are coreferent;} \\ n_1 + j, & \text{if record } j \text{ has no coreferent record in } X_1. \end{cases}$$
Example (with $n_1 = 5$): $Z_1 = 1$, $Z_2 = 4$, $Z_3 = 8$, $Z_4 = 2$; here $Z_3 = n_1 + 3$ indicates that record 3 of $X_2$ is unmatched.
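The encoding above is easy to manipulate in code. The sketch below (an illustration, not the talk's software) validates a vector Z against the definition and decodes the implied links, using the slide's example with $n_1 = 5$:

```python
def is_bipartite_matching(Z, n1):
    """Check that Z = (Z_1,...,Z_{n2}) encodes a valid bipartite matching:
    Z_j in {1..n1} means record j in X2 is coreferent with record Z_j in X1
    (each X1 record used at most once); Z_j = n1 + j means no match in X1."""
    matched = [z for j, z in enumerate(Z, start=1) if z <= n1]
    unmatched_ok = all(z == n1 + j for j, z in enumerate(Z, start=1) if z > n1)
    return len(matched) == len(set(matched)) and unmatched_ok

def links(Z, n1):
    """Return the matched pairs (i, j) implied by Z."""
    return [(z, j) for j, z in enumerate(Z, start=1) if z <= n1]

# The slide's example with n1 = 5: records 1, 2, 4 in X2 match records
# 1, 4, 2 in X1, and Z_3 = 5 + 3 = 8 marks record 3 as unmatched.
Z = [1, 4, 8, 2]
print(is_bipartite_matching(Z, n1=5))  # True
print(links(Z, n1=5))                  # [(1, 1), (4, 2), (2, 4)]
```

Note that the uniqueness check on the matched labels is exactly the one-to-one constraint that the Fellegi-Sunter rule fails to enforce.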
26 Enforcing One-to-One Matching in the Fellegi-Sunter Approach
Jaro (1989) proposes to plug the weights $w_{ij}$ into a linear sum assignment problem:
$$\max \sum_{i}\sum_{j} w_{ij}\,\Delta_{ij} \quad \text{subject to} \quad \Delta_{ij} \in \{0,1\}; \;\; \sum_i \Delta_{ij} \le 1, \; j \in X_2; \;\; \sum_j \Delta_{ij} \le 1, \; i \in X_1.$$
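A tiny brute-force version of this assignment problem makes the idea concrete. The weight matrix and the positive-weight filter below are illustrative assumptions; a real implementation would use the Hungarian algorithm (e.g., SciPy's `linear_sum_assignment`) rather than enumerating permutations.

```python
from itertools import permutations

def best_assignment(W):
    """Brute-force linear sum assignment for a small n1 x n2 weight matrix
    (rows = X1, columns = X2, n1 <= n2): assign each row to a distinct
    column, maximizing the total weight. Exponential-time; for real data,
    use the Hungarian algorithm."""
    n1, n2 = len(W), len(W[0])
    best, best_pairs = float("-inf"), []
    for cols in permutations(range(n2), n1):
        total = sum(W[i][j] for i, j in enumerate(cols))
        if total > best:
            best, best_pairs = total, list(enumerate(cols))
    return best_pairs

# Hypothetical Fellegi-Sunter weights for 3 records in X1 vs 4 in X2.
W = [[ 6.1, -4.0, -5.2, -3.9],
     [-4.5,  5.0, -4.8, -5.0],
     [-5.1, -4.2, -0.5, -4.7]]

pairs = best_assignment(W)
# Keep only pairs with positive weight, so that forced low-weight
# assignments do not become links (illustrative post-processing).
selected = [(i, j) for i, j in pairs if W[i][j] > 0]
print(selected)  # [(0, 0), (1, 1)]
```

The assignment itself must place every row, so filtering the resulting pairs by weight is what turns a full assignment into a partial one-to-one matching.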
27 Outline
- The Fellegi-Sunter Framework for Record Linkage
- Record Linkage's Actual Target
- A Bayesian Approach to Bipartite Record Linkage
- Performance Comparison
- Take-Home Message
28 Just Modify the Mixture Model!
$$Z \sim \text{Prior on Bipartite Matchings}, \qquad \Gamma_{ij} \mid Z_j = i \overset{iid}{\sim} G_M, \qquad \Gamma_{ij} \mid Z_j \ne i \overset{iid}{\sim} G_U$$
29 Beta Prior on Bipartite Matchings
Larsen (2005, 2010): the files' overlap size $n_{12}$ is Beta-Binomial$(n_2, \alpha, \beta)$, and all bipartite matchings with the same $n_{12}$ are equally likely.
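A draw from this prior can be sketched in two steps: draw the overlap size, then pick a matching of that size uniformly at random. This is an illustrative sketch of the construction, not the paper's code; the truncation of $n_{12}$ at $n_1$ is a simplification added here.

```python
import random

def sample_bipartite_matching(n1, n2, alpha=1.0, beta=1.0, rng=random):
    """Draw Z from the beta prior on bipartite matchings: the overlap size
    n12 is Beta-Binomial(n2, alpha, beta), and given n12 all matchings of
    that size are equally likely. n12 is crudely truncated at n1 here."""
    p = rng.betavariate(alpha, beta)                  # Beta(alpha, beta)
    n12 = sum(rng.random() < p for _ in range(n2))    # Binomial(n2, p) given p
    n12 = min(n12, n1)                                # simplification: truncate
    matched_j = rng.sample(range(1, n2 + 1), n12)     # which X2 records match
    matched_i = rng.sample(range(1, n1 + 1), n12)     # their partners in X1
    Z = {j: n1 + j for j in range(1, n2 + 1)}         # default: unmatched
    Z.update(dict(zip(matched_j, matched_i)))
    return [Z[j] for j in range(1, n2 + 1)]

random.seed(7)
Z = sample_bipartite_matching(n1=5, n2=4)
print(Z)  # a length-4 vector encoding a valid bipartite matching
```

Marginalizing the Beta draw over $p$ is what gives $n_{12}$ its Beta-Binomial distribution, and the uniform choice of partners implements "all matchings with the same $n_{12}$ are equally likely."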
30 Models $G_M$ and $G_U$ for Comparison Vectors
Assume comparisons are independent across fields, for both coreferent and non-coreferent records:
$$\Gamma^f_{ij} \mid Z_j = i \sim \text{Multinomial}(1, \mathbf{m}_f), \qquad \Gamma^f_{ij} \mid Z_j \ne i \sim \text{Multinomial}(1, \mathbf{u}_f)$$
Flat priors on $\mathbf{m}_f$ and $\mathbf{u}_f$, for all $f$.
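Under field independence, the likelihood of a comparison vector is just a product over fields of the relevant multinomial cell probabilities. The sketch below evaluates it in log space; the parameter values `m` and `u` are hypothetical illustrations.

```python
import math

def log_likelihood(gamma, theta):
    """Log-likelihood of a comparison vector gamma = (gamma_1,...,gamma_F),
    where gamma_f is the observed disagreement level of field f and
    theta[f][level] are the Multinomial(1, .) cell probabilities (m_f for
    coreferent pairs, u_f for non-coreferent pairs). Field independence
    is assumed, as on the slide."""
    return sum(math.log(theta[f][g]) for f, g in enumerate(gamma))

# Hypothetical parameters for two fields with 3 disagreement levels each:
m = [[0.90, 0.08, 0.02], [0.85, 0.10, 0.05]]  # coreferent pairs favor level 0
u = [[0.10, 0.30, 0.60], [0.20, 0.30, 0.50]]  # non-coreferent pairs do not

gamma = (0, 0)  # both fields at the lowest disagreement level
w = log_likelihood(gamma, m) - log_likelihood(gamma, u)
print(round(w, 2))  # 3.64
```

The difference `w` is the same log-likelihood ratio that drives the Fellegi-Sunter weight, now with the multinomial levels-of-disagreement parameterization.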
31 Estimation of the Bipartite Matching Z
The posterior distribution of $Z$ is explored via MCMC.
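One building block of such a sampler is the conditional update of a single $Z_j$ given the rest of the matching. The sketch below is a simplified Gibbs-style step, not the paper's exact sampler: it assumes a flat conditional prior over the valid candidates, and `log_lik_ratio` stands in for $\log[P(\gamma_{ij} \mid \text{link}) / P(\gamma_{ij} \mid \text{non-link})]$, which would come from the $G_M$/$G_U$ models.

```python
import math
import random

def resample_Zj(j, Z, n1, log_lik_ratio, rng=random):
    """One Gibbs-style update of Z_j given the rest of Z (a simplified
    sketch): candidates are the X1 records not matched elsewhere, plus
    the 'unmatched' state n1 + j. A flat conditional prior over the
    candidates is assumed for brevity."""
    taken = {z for k, z in enumerate(Z, start=1) if k != j and z <= n1}
    candidates = [i for i in range(1, n1 + 1) if i not in taken] + [n1 + j]
    # Unnormalized log-probabilities: the unmatched state contributes 0.
    logw = [log_lik_ratio(i, j) if i <= n1 else 0.0 for i in candidates]
    mx = max(logw)
    w = [math.exp(x - mx) for x in logw]  # stabilize before exponentiating
    return rng.choices(candidates, weights=w, k=1)[0]

# Toy example: record 2 in X2 strongly resembles record 3 in X1.
ratio = lambda i, j: 8.0 if (i, j) == (3, 2) else -8.0
random.seed(0)
Z = [6, 7, 8, 9]  # n1 = 5; all of X2 initially unmatched
Z[1] = resample_Zj(2, Z, n1=5, log_lik_ratio=ratio)
print(Z)  # [6, 3, 8, 9] with this seed
```

Excluding already-taken X1 records from the candidate set is what keeps every state of the chain a valid bipartite matching.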
32 Point Estimation
From the posterior $P(Z \mid \gamma)$ we obtain Bayes estimates under the additive loss function $L(Z, \hat Z) = \sum_{j \in X_2} L(Z_j, \hat Z_j)$, with
$$L(Z_j, \hat Z_j) = \begin{cases} 0, & \text{if } \hat Z_j = Z_j \text{ (gets it right);} \\ \lambda_R, & \text{if } \hat Z_j = R \text{ (does not output a decision, ``reject'');} \\ \lambda_{FNM}, & \text{if } Z_j \le n_1, \; \hat Z_j = n_1 + j \text{ (false non-match);} \\ \lambda_{FM1}, & \text{if } Z_j = n_1 + j, \; \hat Z_j \le n_1 \text{ (false match, type 1);} \\ \lambda_{FM2}, & \text{if } Z_j, \hat Z_j \le n_1, \; Z_j \ne \hat Z_j \text{ (false match, type 2).} \end{cases}$$
For generic applications, $\lambda_{FNM} = \lambda_{FM1} = 1$ and $\lambda_{FM2} = 2$ works well.
38 Point Estimation
THEOREM. If $\lambda_{FM2} \ge \lambda_{FM1} \ge 2\lambda_R > 0$ and $\lambda_{FNM} \ge 2\lambda_R$ in the additive loss function, the Bayes estimate of the bipartite matching can be obtained from $\hat Z = (\hat Z_1, \dots, \hat Z_{n_2})$, where
$$\hat Z_j = \begin{cases} i, & \text{if } P(Z_j = i \mid \gamma_{obs}) > 1 - \dfrac{\lambda_R}{\lambda_{FM1}} + \dfrac{\lambda_{FM2} - \lambda_{FM1}}{\lambda_{FM1}}\, P(Z_j \notin \{i, n_1 + j\} \mid \gamma_{obs}); \\ n_1 + j, & \text{if } P(Z_j = n_1 + j \mid \gamma_{obs}) > 1 - \dfrac{\lambda_R}{\lambda_{FNM}}; \\ R, & \text{otherwise.} \end{cases}$$
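The theorem's thresholding rule is straightforward to apply once the posterior marginals of each $Z_j$ are available (e.g., from the MCMC output). The sketch below assumes the generic loss values from the previous slide plus an illustrative $\lambda_R = 0.25$; the posterior probabilities are hypothetical numbers.

```python
def bayes_estimate(post_j, j, n1, lam_R=0.25, lam_FNM=1.0, lam_FM1=1.0, lam_FM2=2.0):
    """Bayes estimate of Z_j from its posterior marginal, following the
    theorem on the slide. post_j maps candidates (1..n1 and n1 + j) to
    P(Z_j = . | gamma_obs); the default lambdas satisfy the theorem's
    conditions, with lam_R = 0.25 an illustrative choice."""
    p_unmatched = post_j.get(n1 + j, 0.0)
    for i in range(1, n1 + 1):
        p_i = post_j.get(i, 0.0)
        p_other = 1.0 - p_i - p_unmatched  # P(Z_j not in {i, n1 + j})
        if p_i > 1 - lam_R / lam_FM1 + (lam_FM2 - lam_FM1) / lam_FM1 * p_other:
            return i                       # link j to i
    if p_unmatched > 1 - lam_R / lam_FNM:
        return n1 + j                      # declare j unmatched
    return "R"                             # reject: leave undecided

# Posterior marginals for record j = 2 with n1 = 5 (hypothetical numbers):
print(bayes_estimate({3: 0.90, 1: 0.04, 7: 0.06}, j=2, n1=5))  # 3
print(bayes_estimate({3: 0.50, 1: 0.20, 7: 0.30}, j=2, n1=5))  # R
```

With these lambdas, a link requires high posterior mass on one specific partner, and an ambiguous posterior falls through to the reject decision, mirroring the three-way output of the Fellegi-Sunter rule.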
39 Outline
- The Fellegi-Sunter Framework for Record Linkage
- Record Linkage's Actual Target
- A Bayesian Approach to Bipartite Record Linkage
- Performance Comparison
- Take-Home Message
40 Simulation Setup
Different scenarios of file overlap and measurement error; 100 pairs of synthetic datafiles for each scenario; each datafile has 500 records; four fields: given and family names, age, and occupation.
41 Results with Full Assignments
[figure: precision and recall vs. number of erroneous fields, for Fellegi-Sunter and Beta Record Linkage, under 100%, 50%, and 10% overlap] Solid lines refer to precision, dashed lines to recall, black lines show medians, and gray lines show first and 99th percentiles.
42 Results with Partial Assignments
[figure: full and partial estimates vs. number of erroneous fields, for the Fellegi-Sunter mixture model and Beta Record Linkage, on datafiles with 10% overlap] Solid lines refer to precision or positive predictive value (PPV), dashed lines to negative predictive value (NPV), dot-dashed lines to rejection rate (RR), black lines show medians, and gray lines show first and 99th percentiles.
43 Outline
- The Fellegi-Sunter Framework for Record Linkage
- Record Linkage's Actual Target
- A Bayesian Approach to Bipartite Record Linkage
- Performance Comparison
- Take-Home Message
44 Take-Home Message
The optimality of the Fellegi-Sunter decision rule depends on conditions that are not met in practice. The mixture-model implementation of Fellegi-Sunter has a number of weaknesses. The new methodology improves on existing methods.
45 Questions? For further details see: Sadinle (2016+). Bayesian Estimation of Bipartite Matchings for Record Linkage. Journal of the American Statistical Association (in press).
46 Levels of Disagreement
Compute a measure of similarity and divide its range into levels of disagreement.
[table: field / similarity measure / levels of disagreement — Age: absolute difference, cut into levels; Race: binary comparison (agree, disagree)]
Example: comparing the ages of records $i$ and $j$ yields $\gamma^{Age}_{ij} = 3$, and stacking the fields gives $\gamma_{ij} = (\dots, \gamma^f_{ij}, \dots)$.
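The construction can be sketched concretely using the cut points from the simulation-setup table (normalized Levenshtein for names, binary comparison for age and occupation). The record values below are hypothetical illustrations.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def name_level(a, b):
    """Disagreement level for a name field, using the cut points from the
    simulation setup: normalized Levenshtein in 0, (0, .25], (.25, .5], (.5, 1]."""
    d = levenshtein(a, b) / max(len(a), len(b), 1)
    if d == 0:
        return 0
    if d <= 0.25:
        return 1
    if d <= 0.5:
        return 2
    return 3

def binary_level(a, b):
    """Agree/disagree comparison for age and occupation."""
    return 0 if a == b else 1

# Comparison vector for a hypothetical record pair:
rec_i = {"given": "Mauricio", "family": "Sadinle", "age": 30, "occ": "statistician"}
rec_j = {"given": "Maurício", "family": "Sadinle", "age": 31, "occ": "statistician"}
gamma = (name_level(rec_i["given"], rec_j["given"]),
         name_level(rec_i["family"], rec_j["family"]),
         binary_level(rec_i["age"], rec_j["age"]),
         binary_level(rec_i["occ"], rec_j["occ"]))
print(gamma)  # (1, 0, 1, 0)
```

Discretizing the similarity into a small number of levels is what allows the multinomial $G_M$/$G_U$ models to be fit per level.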
49 Simulation Setup
Table: types of errors per field in the simulation study. Fields: given and family names, age and occupation. Types of error: missing values, edits, OCR, keyboard, phonetic.
Table: construction of disagreement levels in the simulation study.
- Given and family names (Levenshtein similarity): levels 0, (0, .25], (.25, .5], (.5, 1].
- Age and occupation (binary comparison): agree, disagree.
51 Training Classifiers for Record Linkage
Take a sample of record pairs and determine whether each refers to the same entity or not (e.g., by clerical review). Train your favorite classification model (logistic regression, SVM, random forest, ...). Predict the coreference status of the remaining record pairs.
Training assumes the pairs are i.i.d., and the method takes independent decisions: chains of links are possible.
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationStatistical Practice
Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationTopic 2: Logistic Regression
CS 4850/6850: Introduction to Machine Learning Fall 208 Topic 2: Logistic Regression Instructor: Daniel L. Pimentel-Alarcón c Copyright 208 2. Introduction Arguably the simplest task that we can teach
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationAlgorithm Independent Topics Lecture 6
Algorithm Independent Topics Lecture 6 Jason Corso SUNY at Buffalo Feb. 23 2009 J. Corso (SUNY at Buffalo) Algorithm Independent Topics Lecture 6 Feb. 23 2009 1 / 45 Introduction Now that we ve built an
More informationEnsemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan
Ensemble Methods NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan How do you make a decision? What do you want for lunch today?! What did you have last night?! What are your favorite
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationMachine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationApplications of Information Theory in Plant Disease Management 1
Applications of Information Theory in Plant Disease Management Gareth Hughes School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JG, UK We seek the truth, but, in this life, we usually
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationRandomized Decision Trees
Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,
More informationSTA102 Class Notes Chapter Logistic Regression
STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationLinear Classification
Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationLecture 11. Linear Soft Margin Support Vector Machines
CS142: Machine Learning Spring 2017 Lecture 11 Instructor: Pedro Felzenszwalb Scribes: Dan Xiang, Tyler Dae Devlin Linear Soft Margin Support Vector Machines We continue our discussion of linear soft margin
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationA union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling
A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics
More informationGenerative Model (Naïve Bayes, LDA)
Generative Model (Naïve Bayes, LDA) IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University Materials from Prof. Jia Li, sta3s3cal learning book (Has3e et al.), and machine learning
More information