MACHINE LEARNING. Department of Computer Science, Artificial Intelligence Research Laboratory, Iowa State University.


MACHINE LEARNING
Vasant Honavar
Bioinformatics and Computational Biology Program
Center for Computational Intelligence, Learning, & Discovery
Iowa State University
honavar@cs.iastate.edu www.cs.iastate.edu/~honavar/ www.cild.iastate.edu/
Copyright Vasant Honavar, 2006.

Notation
Let $\mathbf{Y}, \mathbf{Z}$ denote sets of random variables, $\mathbf{Y} = \{Y_1, Y_2, Y_3\}$; $\mathbf{Z} = \{Z_1, Z_2\}$.
$P(\mathbf{Y}) = P(Y_1, Y_2, Y_3)$; $P(\mathbf{Z}) = P(Z_1, Z_2)$
$\mathbf{Y} \cup \mathbf{Z} = \{Y_1, Y_2, Y_3, Z_1, Z_2\}$; $P(\mathbf{Y} \cup \mathbf{Z}) = P(\mathbf{Y}, \mathbf{Z}) = P(Y_1, Y_2, Y_3, Z_1, Z_2)$
Note the overloading of $P(\mathbf{Y}, \mathbf{Z})$, an unfortunate consequence of the set notation.

Marginalization
Let $\mathbf{Y}, \mathbf{Z}$ denote sets of random variables.
$P(\mathbf{Y}) = \sum_{\mathbf{z}} P(\mathbf{Y}, \mathbf{z})$, where the summation is over all assignments of values to the random variables in $\mathbf{Z}$. Similarly, $P(\mathbf{Z}) = \sum_{\mathbf{y}} P(\mathbf{y}, \mathbf{Z})$.
Example: $\mathbf{Y} = \{Y_1, Y_2, Y_3\}$, $\mathbf{Z} = \{Z_1, Z_2\}$. Suppose all random variables are binary. The joint distribution over the variables in $\mathbf{Z} \cup \mathbf{Y}$ has $2^5 = 32$ entries. Marginalization over the 3 random variables in $\mathbf{Y}$ yields a joint distribution over the variables in $\mathbf{Z}$, a table of $2^2 = 4$ entries.
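As a concrete illustration, here is a minimal Python sketch of this marginalization; the joint probabilities are randomly generated placeholders, not values from the slides:

import itertools
import random

random.seed(0)

# Hypothetical joint distribution P(Y1, Y2, Y3, Z1, Z2) over five binary
# variables: 2**5 = 32 entries, normalized so the table sums to 1.
assignments = list(itertools.product([0, 1], repeat=5))
weights = [random.random() for _ in assignments]
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}

# Marginalize over Y = (Y1, Y2, Y3): P(Z1, Z2) = sum over y of P(y, Z1, Z2).
# The result is a table over Z with 2**2 = 4 entries.
marginal = {}
for (y1, y2, y3, z1, z2), p in joint.items():
    marginal[(z1, z2)] = marginal.get((z1, z2), 0.0) + p

print(len(joint), "joint entries ->", len(marginal), "marginal entries")
print(marginal)  # still sums to 1, as a distribution over (Z1, Z2) should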

Independence and Conditional Independence
Let $(E, P)$ be a probability space. Let $A_1, A_2 \subseteq E$. We say that the events $A_1$ and $A_2$ are independent if $P(A_1 \cap A_2) = P(A_1)P(A_2)$.
If $P(A_2) \neq 0$ and $A_1, A_2$ are independent, then $P(A_1 \mid A_2) = P(A_1)$.
If $P(A_1) = 0$ or $P(A_2) = 0$ or both, then $P(A_1 \cap A_2) = 0 = P(A_1)P(A_2)$, so $A_1$ and $A_2$ are independent.

Independence and Conditional Independence
If for every subset $B = \{B_1, \ldots, B_k\} \subseteq \{A_1, \ldots, A_n\}$ obtained by selecting $k \leq n$ elements of $\{A_1, \ldots, A_n\}$ we have
$P(B_1 \cap \ldots \cap B_k \mid C) = \prod_j P(B_j \mid C)$,
we say that $A_1, \ldots, A_n$ are mutually independent given $C$.

Conditional Independence
$X$ is conditionally independent of $Y$ given $Z$ if the probability distribution governing $X$ is independent of the value of $Y$ given the value of $Z$: $P(X \mid Y, Z) = P(X \mid Z)$;
that is, $\forall (x_i, y_j, z_k)$: $P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$.

Conditional Independence
Thunder is conditionally independent of Rain given Lightning: $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$, i.e., for each combination of values:
$P(T=1 \mid R=1, L=1) = P(T=1 \mid L=1)$
$P(T=1 \mid R=0, L=1) = P(T=1 \mid L=1)$
$P(T=1 \mid R=1, L=0) = P(T=1 \mid L=0)$
$P(T=1 \mid R=0, L=0) = P(T=1 \mid L=0)$
$P(T=0 \mid R=1, L=1) = P(T=0 \mid L=1)$
$P(T=0 \mid R=0, L=1) = P(T=0 \mid L=1)$
$P(T=0 \mid R=1, L=0) = P(T=0 \mid L=0)$
$P(T=0 \mid R=0, L=0) = P(T=0 \mid L=0)$
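A quick numeric check of this property (a sketch; the joint distribution below is made up so that the conditional independence holds by construction):

# Hypothetical distributions: Rain and Lightning may be dependent, but
# Thunder depends on Rain only through Lightning.
p_rl = {(1, 1): 0.20, (1, 0): 0.10, (0, 1): 0.05, (0, 0): 0.65}  # P(Rain, Lightning)
p_t_given_l = {1: 0.9, 0: 0.05}                                   # P(Thunder=1 | Lightning)

# Joint: P(T, R, L) = P(T | L) * P(R, L)
joint = {}
for (r, l), p in p_rl.items():
    joint[(1, r, l)] = p_t_given_l[l] * p
    joint[(0, r, l)] = (1 - p_t_given_l[l]) * p

def cond_prob(t, r, l):
    # P(Thunder = t | Rain = r, Lightning = l) computed from the joint table.
    return joint[(t, r, l)] / (joint[(0, r, l)] + joint[(1, r, l)])

# All eight equations P(T | R, L) = P(T | L) hold:
for t in (0, 1):
    for r in (0, 1):
        for l in (0, 1):
            rhs = p_t_given_l[l] if t == 1 else 1 - p_t_given_l[l]
            assert abs(cond_prob(t, r, l) - rhs) < 1e-12
print("P(Thunder | Rain, Lightning) == P(Thunder | Lightning) for all values")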

Independence and Conditional Independence
Let $\mathbf{Z}_1, \ldots, \mathbf{Z}_n, \mathbf{W}$ be pairwise disjoint sets of random variables on a given event space. $\mathbf{Z}_1, \ldots, \mathbf{Z}_n$ are mutually independent given $\mathbf{W}$ if
$P(\mathbf{Z}_1, \ldots, \mathbf{Z}_n \mid \mathbf{W}) = \prod_{i=1}^{n} P(\mathbf{Z}_i \mid \mathbf{W})$.
$\mathbf{Z}_i$ and $\mathbf{Z}_j$ are independent given $\mathbf{W}$ if $P(\mathbf{Z}_i \mid \mathbf{Z}_j \cup \mathbf{W}) = P(\mathbf{Z}_i \mid \mathbf{W})$.
Note that these represent sets of equations, one for each assignment of possible values to the random variables.

Independence Properties of Random Variables
Let $\mathbf{W}, \mathbf{X}, \mathbf{Y}, \mathbf{Z}$ be pairwise disjoint sets of random variables on a given event space. Let $I(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ denote that $\mathbf{X}$ and $\mathbf{Z}$ are independent given $\mathbf{Y}$, that is, $P(\mathbf{X} \mid \mathbf{Y} \cup \mathbf{Z}) = P(\mathbf{X} \mid \mathbf{Y})$. Then:
a. $I(\mathbf{X}, \mathbf{Y}, \mathbf{Z}) \Rightarrow I(\mathbf{Z}, \mathbf{Y}, \mathbf{X})$
b. $I(\mathbf{X}, \mathbf{Z}, \mathbf{Y} \cup \mathbf{W}) \Rightarrow I(\mathbf{X}, \mathbf{Z}, \mathbf{Y})$
c. $I(\mathbf{X}, \mathbf{Z}, \mathbf{Y} \cup \mathbf{W}) \Rightarrow I(\mathbf{X}, \mathbf{Z} \cup \mathbf{W}, \mathbf{Y})$
d. $I(\mathbf{X}, \mathbf{Z}, \mathbf{Y})$ and $I(\mathbf{X}, \mathbf{Z} \cup \mathbf{Y}, \mathbf{W}) \Rightarrow I(\mathbf{X}, \mathbf{Z}, \mathbf{Y} \cup \mathbf{W})$
Proof: follows from the definition of independence.

Expectation and Variance
Let $X : E \rightarrow \mathbb{R}$ be a random variable on a finite probability space $(E, P)$, and $B \subseteq E$. The conditional expectation or expected value of $X$ given $B$ is
$E(X \mid B) = \sum_{e} X(e)\, P(e \mid B)$.
The variance of $X$ given $B$ is given by
$\mathrm{Var}(X \mid B) = E\big((X - E(X \mid B))^2 \mid B\big) = E(X^2 \mid B) - (E(X \mid B))^2$.
The unconditional expectation and variance correspond to the case $B = E$, in which case we simply drop the "$\mid B$".

Conditional Expectation of Random Variables
Expectation of a random variable $X$ conditioned on a random variable $Y$:
$E(X \mid Y) = \sum_{e} e\, P(X = e \mid Y)$.
Note that this denotes a set of equations, one for each of the possible values of $Y$. The definitions can be extended to the case where $X$ and $Y$ are replaced by sets of random variables.
Example: $P(X=0 \mid Y=0) = 0.6$; $P(X=0 \mid Y=1) = 0.3$; $P(X=1 \mid Y=0) = 0.4$; $P(X=1 \mid Y=1) = 0.7$.
$E(X \mid Y=1) = 0 \cdot P(X=0 \mid Y=1) + 1 \cdot P(X=1 \mid Y=1) = 0.7$
$E(X \mid Y=0) = 0 \cdot 0.6 + 1 \cdot 0.4 = 0.4$
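The example can be reproduced directly (a minimal sketch using the conditional probability table above):

# Conditional table from the example: keys are (x, y), values are P(X=x | Y=y).
p_x_given_y = {
    (0, 0): 0.6, (1, 0): 0.4,   # P(X=0|Y=0), P(X=1|Y=0)
    (0, 1): 0.3, (1, 1): 0.7,   # P(X=0|Y=1), P(X=1|Y=1)
}

def cond_expectation(y):
    # E(X | Y = y) = sum over x of x * P(X = x | Y = y)
    return sum(x * p for (x, yy), p in p_x_given_y.items() if yy == y)

print(cond_expectation(1))  # 0*0.3 + 1*0.7 = 0.7
print(cond_expectation(0))  # 0*0.6 + 1*0.4 = 0.4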

Properties of Expectation and Variance
Let $X, X_1, \ldots, X_n$ be random variables and $a, b, c, c_1, \ldots, c_n$ be real numbers.
If $X$ has mean $\mu$ and variance $\sigma^2$, then $aX + b$ has mean $a\mu + b$ and variance $a^2\sigma^2$.
For any constant $c$, $E(cX \mid B) = c\,E(X \mid B)$, and $E\big(\sum_j c_j X_j \mid B\big) = \sum_j c_j\, E(X_j \mid B)$.
If $X_1, \ldots, X_n$ are independent given $B$, then $\mathrm{Var}\big(\sum_i c_i X_i \mid B\big) = \sum_i c_i^2\, \mathrm{Var}(X_i \mid B)$.
Proof of these results is left as an exercise.
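A numeric sanity check of the first identity (a sketch; the distribution below is an arbitrary example, not from the slides):

# An arbitrary discrete random variable: values with probabilities (sum to 1).
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

def mean(vals, ps):
    return sum(v * p for v, p in zip(vals, ps))

def var(vals, ps):
    m = mean(vals, ps)
    return sum((v - m) ** 2 * p for v, p in zip(vals, ps))

a, b = 3.0, -2.0
mu, sigma2 = mean(values, probs), var(values, probs)

# aX + b takes value a*v + b with the same probabilities as X.
scaled = [a * v + b for v in values]
assert abs(mean(scaled, probs) - (a * mu + b)) < 1e-12
assert abs(var(scaled, probs) - a * a * sigma2) < 1e-12
print("E(aX+b) = a E(X) + b and Var(aX+b) = a^2 Var(X): verified")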

Learning as Bayesian Inference
"Probability is the logic of science." (Jaynes)
Bayesian subjective probability provides a basis for updating beliefs based on evidence. By updating beliefs about hypotheses based on data, we can learn about the world.
The Bayesian framework provides a sound probabilistic basis for understanding many learning algorithms and for designing new algorithms.
The Bayesian framework provides several practical reasoning and learning algorithms.

Classification Using Bayesian Decision Theory
Consider the problem of classifying an instance $X$ into one of two mutually exclusive classes $\omega_1$ or $\omega_2$.
$P(\omega_1 \mid X)$ = probability of class $\omega_1$ given the evidence $X$
$P(\omega_2 \mid X)$ = probability of class $\omega_2$ given the evidence $X$
What is the probability of error?
$P(\text{error} \mid X) = P(\omega_1 \mid X)$ if we choose $\omega_2$; $P(\text{error} \mid X) = P(\omega_2 \mid X)$ if we choose $\omega_1$.

Minimum Error Classification
To minimize classification error:
Choose $\omega_1$ if $P(\omega_1 \mid X) > P(\omega_2 \mid X)$; choose $\omega_2$ if $P(\omega_2 \mid X) > P(\omega_1 \mid X)$,
which yields $P(\text{error} \mid X) = \min\,[\,P(\omega_1 \mid X),\, P(\omega_2 \mid X)\,]$.
We have: $P(\omega_i \mid X) = \dfrac{p(X \mid \omega_i)\, P(\omega_i)}{p(X)}$.

Classification Using Bayesian Decision Theory
Choose $\omega_1$ if $P(\omega_1 \mid X) > P(\omega_2 \mid X)$, i.e., $p(X \mid \omega_1)\,P(\omega_1) > p(X \mid \omega_2)\,P(\omega_2)$.
Choose $\omega_2$ if $P(\omega_2 \mid X) > P(\omega_1 \mid X)$, i.e., $p(X \mid \omega_2)\,P(\omega_2) > p(X \mid \omega_1)\,P(\omega_1)$.
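As an illustration, a sketch of this two-class rule with hypothetical one-dimensional Gaussian class-conditional densities (the means, variances, and priors are invented for the example):

import math

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

prior = {1: 0.6, 2: 0.4}                                # P(w1), P(w2)
likelihood = {1: lambda x: gaussian_pdf(x, 0.0, 1.0),   # p(x | w1)
              2: lambda x: gaussian_pdf(x, 2.0, 1.0)}   # p(x | w2)

def bayes_decide(x):
    # Choose w1 if p(x|w1) P(w1) > p(x|w2) P(w2), else w2.
    s1 = likelihood[1](x) * prior[1]
    s2 = likelihood[2](x) * prior[2]
    return 1 if s1 > s2 else 2

for x in (-1.0, 1.0, 1.2, 3.0):
    print(x, "->", bayes_decide(x))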

Optimality of the Bayesian Decision Rule
We can show that the Bayesian classifier is optimal in that it is guaranteed to minimize the probability of misclassification.
Proof: consider the probability of error given $X$ when $X$ is assigned to each class, and integrate over the input space (next slides).

Optimality of the Bayes Decision Rule
[Figure: probability of error for the decision rule]

Optimality of the Bayes Decision Rule
$P(\text{error}) = \int P(\text{error}, X)\, dX = \int P(\text{error} \mid X)\, p(X)\, dX$
With decision regions $R_1$ (choose $\omega_1$) and $R_2$ (choose $\omega_2$):
$P(\text{error}) = \int_{R_1} P(\omega_2 \mid X)\, p(X)\, dX + \int_{R_2} P(\omega_1 \mid X)\, p(X)\, dX$
Applying Bayes rule:
$P(\text{error}) = \int_{R_1} p(X \mid \omega_2)\, P(\omega_2)\, dX + \int_{R_2} p(X \mid \omega_1)\, P(\omega_1)\, dX$

Optimality of the Bayes Decision Rule
$P(\text{error}) = \int_{R_1} p(X \mid \omega_2)\, P(\omega_2)\, dX + \int_{R_2} p(X \mid \omega_1)\, P(\omega_1)\, dX$
Because $R_1 \cup R_2$ covers the entire input space, $P(\text{error})$ is minimized by choosing $R_1$ to be the region such that $p(X \mid \omega_1)P(\omega_1) > p(X \mid \omega_2)P(\omega_2)$, and $R_2$ such that $p(X \mid \omega_2)P(\omega_2) > p(X \mid \omega_1)P(\omega_1)$: each point then contributes the smaller of the two integrands to the error.
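The argument can be checked numerically by approximating the integrals on a grid (a sketch reusing the hypothetical Gaussian example above; any other choice of regions integrates the larger term somewhere and can only increase the error):

import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p1, p2 = 0.6, 0.4                            # priors P(w1), P(w2) (hypothetical)
f1 = lambda x: gaussian_pdf(x, 0.0, 1.0)     # p(x | w1)
f2 = lambda x: gaussian_pdf(x, 2.0, 1.0)     # p(x | w2)

# Riemann sum of P(error): under the Bayes rule, each x contributes the
# smaller integrand min(p(x|w1)P(w1), p(x|w2)P(w2)).
dx = 0.001
xs = [i * dx for i in range(-10000, 10000)]
bayes_error = sum(min(f1(x) * p1, f2(x) * p2) for x in xs) * dx

# Compare against a suboptimal partition, e.g. R1 = whole space ("always w1"):
error_always_w1 = sum(f2(x) * p2 for x in xs) * dx

print("Bayes error ~", round(bayes_error, 4))
print("Error of 'always w1' ~", round(error_always_w1, 4))  # strictly larger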

Optimality of the Bayes Decision Rule
The proof generalizes to multivariate input spaces. A similar result can be proved in the case of discrete (as opposed to continuous) input spaces: replace the integral over the input space by a sum.

Bayes Decision Rule Yields Minimum Error Classification
To minimize classification error:
Choose $\omega_1$ if $P(\omega_1 \mid X) > P(\omega_2 \mid X)$; choose $\omega_2$ if $P(\omega_2 \mid X) > P(\omega_1 \mid X)$,
which yields $P(\text{error} \mid X) = \min\,[\,P(\omega_1 \mid X),\, P(\omega_2 \mid X)\,]$.

Bayes Decision Rule
[Figure: behavior of the Bayes decision rule as a function of the prior probability of the classes]

Bayes Optimal Classifier
Classification rule that guarantees minimum error:
Choose $\omega_1$ if $p(X \mid \omega_1)\,P(\omega_1) > p(X \mid \omega_2)\,P(\omega_2)$; choose $\omega_2$ if $p(X \mid \omega_2)\,P(\omega_2) > p(X \mid \omega_1)\,P(\omega_1)$.
If $P(\omega_1) = P(\omega_2)$, classification depends entirely on the likelihoods $p(X \mid \omega_1)$ and $p(X \mid \omega_2)$. If $p(X \mid \omega_1) = p(X \mid \omega_2)$, classification depends entirely on the priors $P(\omega_1)$ and $P(\omega_2)$. The Bayes classification rule combines the effect of the two terms optimally, so as to yield minimum error classification.
Generalization to multiple classes: assign $X$ to class $\omega_c$ where $c = \arg\max_j\, p(X \mid \omega_j)\, P(\omega_j)$.
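The multi-class rule is a one-line argmax (a minimal sketch with invented discrete likelihood tables):

# Hypothetical priors P(w_j) and discrete likelihoods P(x | w_j) for 3 classes.
priors = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
likelihoods = {
    "w1": {"a": 0.7, "b": 0.2, "c": 0.1},
    "w2": {"a": 0.1, "b": 0.8, "c": 0.1},
    "w3": {"a": 0.2, "b": 0.2, "c": 0.6},
}

def bayes_classify(x):
    # Return arg max over classes w of p(x | w) P(w).
    return max(priors, key=lambda w: likelihoods[w][x] * priors[w])

for x in ("a", "b", "c"):
    print(x, "->", bayes_classify(x))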

Minimum Risk Classification
Let $\lambda_{ij}$ = risk or cost associated with assigning an instance to class $\omega_i$ when the correct classification is class $\omega_j$.
$R(\omega_i \mid X)$ = expected loss incurred in assigning $X$ to class $\omega_i$:
$R(\omega_1 \mid X) = \lambda_{11}\, P(\omega_1 \mid X) + \lambda_{12}\, P(\omega_2 \mid X)$
$R(\omega_2 \mid X) = \lambda_{21}\, P(\omega_1 \mid X) + \lambda_{22}\, P(\omega_2 \mid X)$
Classification rule that guarantees minimum risk: choose $\omega_1$ if $R(\omega_1 \mid X) < R(\omega_2 \mid X)$; choose $\omega_2$ if $R(\omega_2 \mid X) < R(\omega_1 \mid X)$; flip a fair coin otherwise.

Minimum Risk Classification
$\lambda_{ij}$ = risk or cost associated with assigning an instance to class $\omega_i$ when the correct classification is $\omega_j$.
Ordinarily $\lambda_{21} - \lambda_{11}$ and $\lambda_{12} - \lambda_{22}$ are positive: the cost of being correct is less than the cost of error.
So we choose $\omega_1$ if $(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid X) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid X)$; otherwise choose $\omega_2$.
The minimum error classification rule is a special case: $\lambda_{ij} = 0$ if $i = j$ and $\lambda_{ij} = 1$ if $i \neq j$.
This classification rule can be shown to be optimal in that it is guaranteed to minimize the risk of misclassification.
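The risk computation and decision follow directly from the definitions (a sketch; the loss matrix and posterior values are illustrative):

import random

# Illustrative loss matrix: lam[i][j] = cost of deciding w_i when truth is w_j.
lam = {1: {1: 0.0, 2: 5.0},   # deciding w1 is free if correct, costly if truth is w2
       2: {1: 1.0, 2: 0.0}}   # deciding w2 has a mild cost if truth is w1

def conditional_risk(i, posterior):
    # R(w_i | x) = sum over j of lam[i][j] * P(w_j | x)
    return sum(lam[i][j] * posterior[j] for j in posterior)

def min_risk_decide(posterior):
    r1 = conditional_risk(1, posterior)
    r2 = conditional_risk(2, posterior)
    if r1 < r2:
        return 1
    if r2 < r1:
        return 2
    return random.choice([1, 2])   # flip a coin otherwise

# With asymmetric losses the rule can pick w2 even when P(w1 | x) > 0.5:
print(min_risk_decide({1: 0.7, 2: 0.3}))  # R(w1)=1.5, R(w2)=0.7 -> chooses 2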

Summary of the Bayesian Recipe for Classification
$\lambda_{ij}$ = risk or cost associated with assigning an instance to class $\omega_i$ when the correct classification is $\omega_j$.
Choose $\omega_1$ if $R(\omega_1 \mid X) < R(\omega_2 \mid X)$, i.e., if $(\lambda_{21} - \lambda_{11})\, p(X \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(X \mid \omega_2)\, P(\omega_2)$.
Choose $\omega_2$ if $R(\omega_2 \mid X) < R(\omega_1 \mid X)$. Otherwise choose at random.
The minimum error classification rule is the special case $\lambda_{ij} = 0$ if $i = j$, $\lambda_{ij} = 1$ if $i \neq j$.

Summary of the Bayesian Recipe for Classification
The Bayesian recipe is simple, optimal, and, in principle, straightforward to apply.
To use this recipe in practice, we need to know $p(X \mid \omega_j)$ and $P(\omega_j)$. Because these probabilities are unknown, we need to estimate them from data, or learn them!
$X$ is typically high-dimensional, so we need to estimate $p(X \mid \omega_j)$ from limited data.