Bayesian Classification. CS690L Data Mining: Classification (2). Bayesian Theorem: Basics. Bayesian Theorem. Training dataset. Naïve Bayes Classifier


Bayesian Classification. CS690L Data Mining: Classification (2)
Reference: J. Han and M. Kamber, Data Mining: Concepts and Techniques

Probabilistic learning: calculate explicit probabilities for a hypothesis; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

Bayesian Theorem: Basics
Let X be a data sample whose class label is unknown, and let H be a hypothesis that X belongs to class C. For classification problems, we determine P(H|X): the probability that the hypothesis holds given the observed data sample X.
P(H): prior probability of hypothesis H (i.e., the initial probability before we observe any data; it reflects the background knowledge).
P(X): probability that the sample data is observed.
P(X|H): probability of observing the sample X, given that the hypothesis holds.

Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

    P(H|X) = P(X|H) P(H) / P(X)

Informally, this can be written as posterior = likelihood x prior / evidence. The MAP (maximum a posteriori) hypothesis is the one maximizing P(X|H) P(H); the evidence P(X) is the same for all hypotheses and can be ignored in the comparison.
Practical difficulty: this requires initial knowledge of many probabilities, and can carry significant computational cost.

Naïve Bayes Classifier
A simplifying assumption: attributes are conditionally independent given the class:

    P(X|C_i) = prod_k P(x_k|C_i)

That is, the probability of observing the conjunction of elements y1 and y2, given the current class C, is the product of the probabilities of each element taken separately, given the same class: P([y1, y2]|C) = P(y1|C) x P(y2|C). No dependence relation between attributes is modeled. This greatly reduces the computation cost: only the class distributions need to be counted.
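As a toy illustration of the MAP rule above, the sketch below picks the hypothesis maximizing P(X|h) P(h). The hypothesis names and all numbers are made up for illustration; they are not from the lecture.

```python
# Hypothetical likelihoods P(X|h) and priors P(h) for two competing hypotheses.
hypotheses = {
    "H1": {"likelihood": 0.9, "prior": 0.1},
    "H2": {"likelihood": 0.3, "prior": 0.9},
}

# MAP rule: argmax_h P(X|h) * P(h). The evidence P(X) is common to all
# hypotheses, so it cancels and can be dropped from the comparison.
h_map = max(hypotheses, key=lambda h: hypotheses[h]["likelihood"] * hypotheses[h]["prior"])
print(h_map)  # H2, since 0.3 * 0.9 = 0.27 > 0.9 * 0.1 = 0.09
```

Note how a strong prior (0.9) outweighs a stronger likelihood (0.9) here; this is exactly the "prior knowledge can be combined with observed data" point above.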
Once the probability P(X|C_i) is known, assign X to the class with maximum P(X|C_i) P(C_i).

Classes: C1: buys_computer = yes; C2: buys_computer = no.
Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair).

Training dataset:

    age      income   student   credit_rating   buys_computer
    <=30     high     no        fair            no
    <=30     high     no        excellent       no
    31..40   high     no        fair            yes
    >40      medium   no        fair            yes
    >40      low      yes       fair            yes
    >40      low      yes       excellent       no
    31..40   low      yes       excellent       yes
    <=30     medium   no        fair            no
    <=30     low      yes       fair            yes
    >40      medium   yes       fair            yes
    <=30     medium   yes       excellent       yes
    31..40   medium   no        excellent       yes
    31..40   high     yes       fair            yes
    >40      medium   no        excellent       no
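The decision rule above can be sketched in a few lines of plain Python. The dataset is copied from the table; the function name `classify` is our own.

```python
# (age, income, student, credit_rating, buys_computer) tuples from the table.
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]

def classify(x):
    """Return (argmax_c P(X|c) P(c), all scores) under the naive independence assumption."""
    n = len(data)
    scores = {}
    for c in ("yes", "no"):
        rows = [r for r in data if r[-1] == c]
        score = len(rows) / n                                   # prior P(c)
        for k, v in enumerate(x):                               # product of P(x_k | c)
            score *= sum(1 for r in rows if r[k] == v) / len(rows)
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = classify(("<=30", "medium", "yes", "fair"))
print(label)                                            # yes
print(round(scores["yes"], 3), round(scores["no"], 3))  # 0.028 0.007
```

This reproduces the worked example: the "yes" score 0.028 beats the "no" score 0.007, so X is assigned to buys_computer = yes.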

Naïve Bayes Classifier: Example
Compute P(x_k|C_i) for each class:
P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(age <= 30 | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|C_i):
P(X | buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|C_i) P(C_i):
P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 x 9/14 = 0.028
P(X | buys_computer = no) P(buys_computer = no) = 0.019 x 5/14 = 0.007

X belongs to class buys_computer = yes.

Naïve Bayes: Continuous Values (1)
Weather data with numeric temperature and humidity:

    outlook    temp (F)   humidity   windy   class
    sunny      85         85         false   Don't
    sunny      80         90         true    Don't
    overcast   83         86         false   Play
    rainy      70         96         false   Play
    rainy      68         80         false   Play
    rainy      65         70         true    Don't
    overcast   64         65         true    Play
    sunny      72         95         false   Don't
    sunny      69         70         false   Play
    rainy      75         80         false   Play
    sunny      75         70         true    Play
    overcast   72         90         true    Play
    overcast   81         75         false   Play
    rainy      71         91         true    Don't

Naïve Bayes: Continuous Values (2), (3)
Per-class counts and statistics (9 Play and 5 Don't examples):

    outlook:     P(sunny | Play) = 2/9,     P(sunny | Don't) = 3/5
                 P(overcast | Play) = 4/9,  P(overcast | Don't) = 0/5
                 P(rainy | Play) = 3/9,     P(rainy | Don't) = 2/5
    temperature: μ = 73,   σ = 6.2  (Play);   μ = 74.6, σ = 7.9 (Don't)
    humidity:    μ = 79.1, σ = 10.2 (Play);   μ = 86.2, σ = 9.7 (Don't)
    windy:       P(false | Play) = 6/9,     P(false | Don't) = 2/5
                 P(true | Play) = 3/9,      P(true | Don't) = 3/5
    priors:      P(Play) = 9/14,            P(Don't) = 5/14

(μ: mean; σ: standard deviation)

Naïve Bayes: Continuous Values (4), (5)
New instance: outlook = sunny, Temp = 66, Humidity = 90, Windy = true, Class = ?
For numeric attributes, use the Gaussian (normal) density function:

    f(x) = (1 / (sqrt(2π) σ)) e^(-(x - μ)² / (2σ²))

f(Temp = 66 | Play) = (1 / (sqrt(2π) x 6.2)) e^(-(66 - 73)² / (2 x 6.2²)) = 0.0340
f(Humidity = 90 | Play) = 0.0221

Likelihood of Play = P(sunny | Play) x f(Temp = 66 | Play) x f(Humidity = 90 | Play) x P(windy = true | Play) x P(Play)
= 2/9 x 0.0340 x 0.0221 x 3/9 x 9/14 = 0.000036

This is the numeric analogue of the nominal case P(outlook = sunny, Temp = cool, Humidity = high, Windy = true | Play).
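The Gaussian densities above can be checked numerically. This is a minimal sketch; `gaussian` is our own helper name, and the means and standard deviations are the Play-class statistics from the tables.

```python
import math

def gaussian(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Class-conditional densities for the new instance, using the Play statistics:
f_temp = gaussian(66, 73.0, 6.2)    # f(Temp = 66 | Play)
f_hum = gaussian(90, 79.1, 10.2)    # f(Humidity = 90 | Play)

# Likelihood of Play for (sunny, Temp = 66, Humidity = 90, windy = true):
like_play = (2 / 9) * f_temp * f_hum * (3 / 9) * (9 / 14)
print(round(f_temp, 4), round(f_hum, 4), round(like_play, 6))  # 0.034 0.0221 3.6e-05
```

Note that the likelihood (0.000036) is tiny only because no normalization has been done yet; only its ratio to the Don't likelihood matters for classification.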

Naïve Bayes: Continuous Values (6)
Likelihood of Don't = P(sunny | Don't) x f(Temp = 66 | Don't) x f(Humidity = 90 | Don't) x P(windy = true | Don't) x P(Don't)
= 3/5 x 0.0279 x 0.0381 x 3/5 x 5/14 = 0.000137

Normalizing, P(Play | X) is about 21% and P(Don't | X) is about 79%, so the instance is classified as Don't.

Naïve Bayes Classifier: Evaluation
Advantages:
- Easy to implement.
- Good results obtained in most of the cases.
Disadvantages:
- The class conditional independence assumption causes a loss of accuracy, because in practice dependencies do exist among variables. E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by a Naïve Bayes classifier.
How to deal with these dependencies? Bayesian belief networks.

Bayesian Belief Networks
A Bayesian belief network allows a subset of the variables to be conditionally dependent. It is a graphical model of causal relationships: it represents dependencies among the variables and gives a specification of the joint probability distribution.
- Nodes: random variables.
- Links: dependencies. For example, if X and Y are the parents of Z, and Y is the parent of P, then there is no dependency between Z and P.
- The graph has no loops or cycles.

Bayesian Belief Network: An Example
Network over FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea.
The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents:

            (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
    LC      0.8       0.5        0.7        0.1
    ~LC     0.2       0.5        0.3        0.9

The joint probability distribution factorizes as

    P(z_1, ..., z_n) = prod_i P(z_i | Parents(Z_i))

Learning Bayesian Networks: several cases
- Given both the network structure and all variables observable: learn only the CPTs.
- Network structure known, some hidden variables: method of gradient descent, analogous to neural network learning.
- Network structure unknown, all variables observable: search through the model space to reconstruct the graph topology.
- Unknown structure, all hidden variables: no good algorithms are known for this purpose.

Prediction
Prediction is similar to classification: first construct a model, then use the model to predict unknown values. The major method for prediction is regression (linear and multiple regression, non-linear regression). Prediction differs from classification: classification predicts a categorical class label, while prediction models continuous-valued functions.
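The factored joint distribution above can be sketched for a fragment of the example network. The LungCancer CPT values are from the slide, but the priors P(FH) and P(S) are not given in the notes; the values below are assumptions for illustration.

```python
# Assumed priors (NOT from the lecture notes):
p_fh = 0.2          # P(FamilyHistory = true)
p_s = 0.4           # P(Smoker = true)

# P(LungCancer = true | FH, S), from the slide's CPT:
p_lc = {
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def joint(fh, s, lc):
    """Factored joint P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S)."""
    p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
    cond = p_lc[(fh, s)]
    return p * (cond if lc else 1 - cond)

# Marginal P(LC = true), summing the joint over both parents:
p_cancer = sum(joint(fh, s, True) for fh in (True, False) for s in (True, False))
print(round(p_cancer, 3))  # 0.396 under the assumed priors
```

The design point is that the full joint over n binary variables needs 2^n - 1 numbers, while the factored form only needs one CPT per node.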

Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data. One can only predict value ranges or category distributions.
Method outline:
- Minimal generalization
- Attribute relevance analysis
- Generalized linear model construction
- Prediction
Determine the major factors which influence the prediction. Data relevance analysis: uncertainty measurement, entropy analysis, expert judgment, etc.
Multi-level prediction: drill-down and roll-up analysis.

Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X. The two parameters, α and β, specify the line and are estimated from the data at hand, by applying the least squares criterion to the known values Y1, Y2, ..., X1, X2, ....
Multiple regression: Y = b0 + b1 X1 + b2 X2. Many nonlinear functions can be transformed into this form.
Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g. P(a, b, c, d) ≈ αab βac χad δbcd.

Regression: Method of Least Squares
Principle: minimize Σ_i [Ŷ(X_i) - Y_i]².
Assume Ŷ(X) = a + b X (linear relationship), so we minimize Σ_i [Y_i - (a + b X_i)]².

    Minimizing w.r.t. a:   Σ Y_i = n a + b Σ X_i                (1)
    Minimizing w.r.t. b:   Σ X_i Y_i = a Σ X_i + b Σ X_i²       (2)

Solving (1) and (2) we get:

    b = (n Σ X_i Y_i - Σ X_i Σ Y_i) / (n Σ X_i² - (Σ X_i)²)
      = Σ (X_i - X̄)(Y_i - Ȳ) / Σ (X_i - X̄)²
    a = (Σ Y_i - b Σ X_i) / n = Ȳ - b X̄

Linear Regression: Example

    X (years)    3    8    9    13   3    6    11   21   1    16
    Y (salary)   30   57   64   72   36   43   59   90   20   83

Here X̄ = 9.1 and Ȳ = 55.4.

    b = [(3 - 9.1)(30 - 55.4) + (8 - 9.1)(57 - 55.4) + ... + (16 - 9.1)(83 - 55.4)]
        / [(3 - 9.1)² + (8 - 9.1)² + ... + (16 - 9.1)²] ≈ 3.5
    a = 55.4 - (3.5)(9.1) ≈ 23.6

So the fitted line is Y = 23.6 + 3.5 X.
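The salary example above can be reproduced in a few lines. One caveat: the notes round b to 3.5 before computing a (giving a = 23.6); at full precision b is about 3.54, so a comes out as about 23.2.

```python
# Salary data from the example: x = years of experience, y = salary (in $1000s).
x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

n = len(x)
x_bar = sum(x) / n   # 9.1
y_bar = sum(y) / n   # 55.4

# Least-squares slope and intercept, using the centered form of the solution:
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

print(round(b, 1), round(a, 1))  # 3.5 23.2 (the notes round b first, hence their a = 23.6)
```

Either fitted line predicts, e.g., a salary near 23.6 + 3.5 * 10 = 58.6 thousand for 10 years of experience.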

Linear regression: determine the linear regression line, with Y linearly dependent on X.