Bayesian Concept Learning

Chen Yu, Indiana University

Learning from positive and negative examples
With both positive and negative examples, it is easy to define a boundary that separates the two classes. With only positive examples, it is hard to define an upper bound on the concept.

Learning from just positive examples
Most pattern recognition methods are based on both positive and negative examples, but much of human learning is based on positive examples alone. For instance, a child sees positive instances of "dog" but is rarely told "this is not a dog." The challenge is how to generalize from limited data, with only a few positive examples.

The number example
A few numbers between 1 and 100 are given as training data. Given a new number, decide whether it belongs to the same group or not.

Examples
Given 16, what other numbers do you think are positive? 17? 6? 8? 32? 99? 17 seems similar because it is close to 16. 6 seems similar because it has a digit in common. 32 seems similar because it is also even and a power of 2. 99 does not seem similar.

Now suppose that we also see 8, 2, and 64. You may guess that the hidden concept is "powers of two."

[Figures: empirical predictive distributions of human subjects; another example set.]

Hypothesis space
The classic approach to rule-based induction is to suppose we have a hypothesis space of concepts H, such as: odd numbers, even numbers, all numbers between 1 and 100, powers of two, and so on. The space is initially large, and we gradually prune it as more examples arrive. However, after seeing {16, 8, 2, 64} we still have several equally consistent hypotheses, for example:
- powers of two except 32
- powers of two
- all even numbers

Generalization
In the Bayesian approach, we maintain a probability distribution over hypotheses, p(h | X), called the posterior belief state. Given such a distribution, we can make predictions even when we are uncertain about the exact concept. Specifically, we compute the posterior predictive distribution by marginalizing out h:

    p(x̃ ∈ C | X) = Σ_{h ∈ H} p(x̃ ∈ C | h) p(h | X)

This is called model averaging.

Inference
In the noise-free case, p(x̃ ∈ C | h) is simply an indicator, equal to 1 if x̃ is in the extension of h and 0 otherwise. The predictive distribution then reduces to summing the posterior mass of the hypotheses that contain x̃:

    p(x̃ ∈ C | X) = Σ_{h ∈ H : x̃ ∈ h} p(h | X)

Bayesian inference
We compute the posterior distribution with Bayes' rule:

    p(h | X) = p(X | h) p(h) / p(X)

We therefore need to specify the prior p(h) and the likelihood function p(X | h).
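The hypothesis-space idea is easy to make concrete. Below is a minimal sketch, assuming each hypothesis is represented by its extension (the set of numbers in 1..100 it covers); the particular hypotheses and their names are illustrative, not the space used in the actual experiments.

```python
# Toy hypothesis space for the number game; each hypothesis is a set of numbers.
hypotheses = {
    "odd numbers":              {n for n in range(1, 101) if n % 2 == 1},
    "even numbers":             {n for n in range(1, 101) if n % 2 == 0},
    "all numbers 1-100":        set(range(1, 101)),
    "powers of two":            {2 ** k for k in range(1, 7)},           # {2, 4, ..., 64}
    "powers of two except 32":  {2 ** k for k in range(1, 7)} - {32},
}

X = {16, 8, 2, 64}

# A hypothesis is consistent with the data if its extension contains every example.
consistent = [name for name, ext in hypotheses.items() if X <= ext]
print(consistent)
# ['even numbers', 'all numbers 1-100', 'powers of two', 'powers of two except 32']
```

Enumerating consistent hypotheses alone does not tell us which one to prefer; that is what the likelihood and prior below provide.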

Likelihood
When we see X = {16, 8, 2, 64}, consider the hypothesis h = even numbers. If the data were randomly sampled from that concept, why did we not see any even numbers that are not powers of two?

Random sampling
The probability of independently sampling n items from h is given by:

    p(X | h) = [1 / size(h)]^n = [1 / |h|]^n

Let X = {16}. Then p(X | h = powers of two) = 1/6, since there are only 6 powers of two less than 100. For the more general concept, p(X | h = even numbers) = 1/50, and p(X | h = odd numbers) = 0. After 4 examples, the likelihood of "powers of two" is 1/6^4 while the likelihood of "even numbers" is 1/50^4. However, the unnatural hypothesis "powers of two except 32" has an even higher likelihood, 1/5^4, because it is more specific.

Priors
It is the combination of the likelihood and the prior that determines the posterior: the unnatural hypothesis is assigned a much lower prior than "powers of two", even though its likelihood is higher. As another example, suppose you are given 1200, 1500, 900, and 1400. If you are told the numbers are generated by an arithmetic rule, you may think 400 is likely to belong but 1183 is unlikely. If you are told that the numbers are examples of healthy cholesterol levels, you would probably judge 400 unlikely and 1183 likely. The prior is the mechanism by which background knowledge is brought to bear.
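The size-based likelihood is a one-liner in code. The sketch below assumes the random (strong) sampling model stated above, i.e. examples are drawn independently and uniformly from the concept's extension; the function name and extensions are illustrative.

```python
def likelihood(X, extension):
    """Return p(X | h) = (1/|h|)^n if every example lies in h, else 0."""
    if not all(x in extension for x in X):
        return 0.0
    return (1.0 / len(extension)) ** len(X)

powers_of_two = {2, 4, 8, 16, 32, 64}
even_numbers = set(range(2, 101, 2))

X = [16, 8, 2, 64]
print(likelihood(X, powers_of_two))          # (1/6)^4  ~ 7.7e-4
print(likelihood(X, even_numbers))           # (1/50)^4 = 1.6e-7
print(likelihood(X, powers_of_two - {32}))   # (1/5)^4  = 1.6e-3, higher still
```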

Posterior
The posterior is simply the likelihood times the prior, normalized:

    p(h | X) = p(X | h) p(h) / Σ_{h′ ∈ H} p(X | h′) p(h′)

A more accurate model
The hypothesis set contains 100 mathematical concepts and 5050 magnitude hypotheses, with priors based on experimental data on how people judge similarity between numbers.
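As a sketch of the normalization step, the snippet below combines the size-based likelihood with a hand-picked prior that down-weights the unnatural concept. The prior values are invented for illustration; they are not the empirically estimated priors mentioned above.

```python
# Minimal sketch of p(h | X) = p(X | h) p(h) / p(X), with made-up priors.
hypotheses = {
    # name: (extension, prior) -- prior values are illustrative only
    "even numbers":            (set(range(2, 101, 2)),  0.40),
    "powers of two":           ({2, 4, 8, 16, 32, 64},  0.55),
    "powers of two except 32": ({2, 4, 8, 16, 64},      0.05),
}

X = [16, 8, 2, 64]

def likelihood(X, ext):
    return (1.0 / len(ext)) ** len(X) if all(x in ext for x in X) else 0.0

unnormalized = {h: likelihood(X, ext) * prior for h, (ext, prior) in hypotheses.items()}
evidence = sum(unnormalized.values())              # p(X)
posterior = {h: v / evidence for h, v in unnormalized.items()}
print(posterior)
# "powers of two" dominates: its prior advantage outweighs the slightly
# higher likelihood of "powers of two except 32".
```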

Bayesian Method
1. A constrained hypothesis space. Without this, it is impossible to generalize from a finite data set, because any hypothesis consistent with the evidence remains possible.
2. An informative prior. (The alternative is a uniform prior.)
3. A likelihood function, e.g. p(X | h) = 1 or 0 in the noise-free case.
4. Hypothesis averaging: integrating over h when making predictions,

    p(x̃ ∈ C | X) = Σ_{h ∈ H} p(x̃ ∈ C | h) p(h | X)

Two other approaches
ML (maximum likelihood) uses ingredients 1 + 3. MAP (maximum a posteriori) uses ingredients 1 + 2 + 3: instead of averaging over all hypotheses, it selects the single most probable one and predicts with it:

    ĥ = argmax_h p(h | X),    p(x̃ ∈ C | X) ≈ p(x̃ ∈ C | ĥ)
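To make the contrast between hypothesis averaging and MAP concrete, the sketch below reuses a posterior like the one computed earlier (the numbers are illustrative, roughly matching that example) and asks whether 32 belongs to the concept under each strategy.

```python
# Full Bayesian model averaging vs. the MAP plug-in prediction (illustrative posterior).
posterior = {
    "even numbers":            0.0001,
    "powers of two":           0.8409,
    "powers of two except 32": 0.1590,
}
extensions = {
    "even numbers":            set(range(2, 101, 2)),
    "powers of two":           {2, 4, 8, 16, 32, 64},
    "powers of two except 32": {2, 4, 8, 16, 64},
}

def predict_by_averaging(x):
    """p(x in C | X) = sum of p(h | X) over hypotheses whose extension contains x."""
    return sum(p for h, p in posterior.items() if x in extensions[h])

h_map = max(posterior, key=posterior.get)          # single most probable hypothesis

print(predict_by_averaging(32))      # ~0.841: averaging keeps some doubt about 32
print(32 in extensions[h_map])       # True: MAP commits entirely to "powers of two"
```

With only four examples the two strategies already disagree about 32: averaging gives a graded degree of belief, while the MAP plug-in commits fully to its single chosen hypothesis.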