Bayesian Learning. Smart Home Health Analytics, Spring 2018. Nirmalya Roy, Department of Information Systems, University of Maryland, Baltimore County.

Smart Home Health Analytics, Spring 2018
Bayesian Learning
Nirmalya Roy
Department of Information Systems
University of Maryland, Baltimore County
www.umbc.edu

Bayesian Learning
- Combines prior knowledge with evidence to make predictions
- Optimal, albeit impractical, classifier
- Naïve Bayes classifier: practical; assumes independence among features
- Association rule mining

Bayes Rule
P(C_i | x) = p(x | C_i) P(C_i) / p(x)
C_i is the class, 1 <= i <= K; x is the feature vector of an instance.
- P(C_i | x): probability that instance x belongs to class C_i (posterior)
- p(x | C_i): probability that an instance drawn from class C_i would be x (likelihood)
- P(C_i): probability of class C_i (prior)
- p(x): probability of instance x (evidence)
In short: posterior = likelihood * prior / evidence. (Rev. Thomas Bayes)

Intuition behind the different Probabilities
- Prior probability: knowledge we have about the value of C before looking at the observables x
- Likelihood: conditional probability that an event belonging to class C has the associated observation value x; what the data x tells us about the class C
- Evidence: marginal probability that an observation x is seen at all

Bayes Classifier
Classify instance x as the class C* such that
C* = argmax_{1 <= k <= K} P(C_k | x) = argmax_{1 <= k <= K} p(x | C_k) P(C_k)
Since we are only interested in the maximum, we can ignore the denominator p(x).
If the prior probability distribution over the classes is uniform, we can also ignore P(C_k):
C* = argmax_{1 <= k <= K} p(x | C_k)
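
Below is a minimal sketch of this decision rule in Python; the class labels, likelihoods p(x | C_k), and priors P(C_k) are hypothetical numbers for a single observed instance x, not values from the slides.

```python
# Minimal sketch of the Bayes decision rule (hypothetical numbers).
likelihoods = {"C1": 0.05, "C2": 0.20, "C3": 0.10}  # p(x | C_k) at the observed x
priors      = {"C1": 0.50, "C2": 0.25, "C3": 0.25}  # P(C_k)

# The evidence p(x) is identical for every class, so the argmax only needs
# likelihood * prior.
scores = {c: likelihoods[c] * priors[c] for c in priors}
print(max(scores, key=scores.get), scores)  # "C2" wins: 0.20 * 0.25 = 0.05
```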

Example
Bayes rule: P(X | Y) = P(Y | X) P(X) / P(Y)
Suppose you want to propose to a girl, and you know the probability of her saying yes given that she is above 24 years of age. Let the probability of the girl being above 24 years of age be P(X), so X is the event of the girl being older than 24. What is this probability P(X)?
(A) Likelihood (B) Posterior (C) Evidence (D) Prior
Since this information is known to you beforehand, this is the prior.

Example
Let Y be the event of a girl saying yes to you. So P(Y) is the probability of a girl (no age constraint here) saying yes to you. What is this probability?
(A) Likelihood (B) Posterior (C) Evidence
This is called the evidence, simply because you get to see the result of this, or you witness this event happening. Hence, this is evidence to you.

Example
P(Y | X) is the probability of a girl saying yes to you, given that she is older than 24 years; in other words, how likely it is for a girl older than 24 to say yes to you. What form of probability is this?
(A) Likelihood (B) Posterior (C) Evidence
It is the likelihood.

Example
P(X | Y) is the probability of observing a girl being older than 24 years, given that she has already said yes to you. What form of probability is this?
(A) Likelihood (B) Posterior (C) Evidence
Since you cannot know this information without proposing to the girl first, this is called the posterior.
This reflects the different components of Bayes' theorem: posterior = likelihood * prior / evidence.

Bayes Classifier
Practical issue: p(x | C) is a joint probability distribution. We would need to know the probability of every possible instance x given every possible class. Even for D boolean features and K classes, that is K * 2^D probabilities.
Solution: assume the features are independent of each other:
p(x_1, x_2, ..., x_D | C) = prod_{j=1}^{D} p(x_j | C)

Naïve Bayes Classifier
Given a training set X = {(x^r, C^r)} of N instances, estimate the probabilities from X and classify a new instance x as the class C* such that
C* = argmax_{1 <= k <= K} P(C_k) prod_{j=1}^{D} p(x_j | C_k)
where the estimates are relative frequencies:
P(C) = #{r : C^r = C} / N
p(x_j = v | C) = #{r : x_j^r = v and C^r = C} / #{r : C^r = C}
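
A minimal sketch of these estimates on a toy categorical dataset; the weather-style feature values and class labels are hypothetical, not from the slides.

```python
from collections import Counter

# Toy training set: (feature vector x, class C).
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
]

N = len(data)
class_counts = Counter(c for _, c in data)  # #{r : C^r = C}
# feature_counts[(j, v, c)] = #{r : x_j^r = v and C^r = C}
feature_counts = Counter((j, v, c) for x, c in data for j, v in enumerate(x))

def score(x, c):
    """P(C) * prod_j p(x_j | C), with raw relative-frequency estimates."""
    s = class_counts[c] / N
    for j, v in enumerate(x):
        s *= feature_counts[(j, v, c)] / class_counts[c]
    return s

x_new = ("sunny", "hot")
print(max(class_counts, key=lambda c: score(x_new, c)))  # -> "no"
```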

Naïve Bayes Classifier
Another practical issue: what if x_j is a continuous feature?
- Solution #1: assume some parameterized distribution for x_j (e.g., normal) and learn the parameters of the distribution from the data (e.g., the mean and variance of the x_j values)
- Solution #2: discretize the feature, e.g., price in R to price in {low, medium, high}
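
A short sketch of Solution #1, assuming a normal distribution for one feature within one class; the sample values are hypothetical.

```python
import math

# x_j values observed for one class (hypothetical data).
values_in_class = [4.1, 3.8, 5.0, 4.4]

# Learn the distribution's parameters: mean and (population) variance.
mu = sum(values_in_class) / len(values_in_class)
var = sum((v - mu) ** 2 for v in values_in_class) / len(values_in_class)

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x; used in place of p(x_j = v | C)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

print(mu, var, normal_pdf(4.5, mu, var))
```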

Naïve Bayes Classifier
Yet another practical issue: what if no examples in class C have x_j = v? Then p(x_j = v | C) = 0, and the whole product P(C) prod_{j=1}^{D} p(x_j | C) = 0.
Solution (add-one smoothing):
p(x_j = v | C) = (#{r : x_j^r = v and C^r = C} + 1) / (#{r : C^r = C} + |domain(x_j)|)
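
A one-function sketch of the smoothed estimate; the counts in the usage line are hypothetical.

```python
def smoothed_prob(count_v_and_c, count_c, domain_size):
    """p(x_j = v | C) with add-one smoothing: never exactly zero."""
    return (count_v_and_c + 1) / (count_c + domain_size)

# Even if class C has no examples with x_j = v (count 0), the estimate stays
# positive, so one unseen value no longer zeroes out the whole product:
print(smoothed_prob(0, 2, 2))  # 0.25 instead of 0.0
```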

Naïve Bayes Classifier
- The independence assumption is rarely true; e.g., is price independent of engine power?
- The Naïve Bayes classifier still does surprisingly well
- Simple, effective baseline for other learners

Sidebar: The Learning Curve
- Divide the data into training and testing sets
- Learn a hypothesis on increasing percentages of the training data
- Compute the accuracy of each hypothesis on the testing data
- Plot accuracy versus the percentage of training data used
- We are interested in the convergence rate and the plateau
[Figure: accuracy (0 to 1) versus percent of training data used (0 to 100)]
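
A sketch of this procedure, with `train_model` and `accuracy` as placeholders for any learner and evaluation metric; only the loop over increasing training percentages is spelled out.

```python
import random

def learning_curve(train, test, train_model, accuracy, steps=10):
    """Return (percent of training data used, test accuracy) pairs."""
    random.shuffle(train)
    points = []
    for i in range(1, steps + 1):
        pct = 100 * i // steps
        subset = train[: max(1, len(train) * pct // 100)]
        model = train_model(subset)                  # learn on pct% of the data
        points.append((pct, accuracy(model, test)))  # evaluate on fixed test set
    return points  # plot pct on the x-axis, accuracy on the y-axis
```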

Learning Curve
[Figure: learning curves of NaiveBayes vs. OneR on the Labor data, 2/3-1/3 split; x-axis: percent of training data used (10 to 100), y-axis: accuracy (0 to 1)]

Association Rules
Association rule: X -> Y
People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.

Association measures
- Support of X -> Y: P(X, Y) = #{customers who bought X and Y} / #{customers}
- Confidence of X -> Y: P(Y | X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}
- Lift of X -> Y: P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)

Example
Given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
Market Basket Transactions:
TID | Items
1   | Bread, Milk
2   | Bread, Diaper, Beer, Eggs
3   | Milk, Diaper, Beer, Coke
4   | Bread, Milk, Diaper, Beer
5   | Bread, Milk, Diaper, Coke
Example rules: {Diaper} -> {Beer}, {Milk, Bread} -> {Eggs, Coke}, {Beer, Bread} -> {Milk}

Example Association Rule
An implication expression of the form X -> Y, where X and Y are itemsets. Example: {Milk, Diaper} -> {Beer}.
Rule Evaluation Metrics:
- Support (s): fraction of transactions that contain both X and Y
- Confidence (c): measures how often items in Y appear in transactions that contain X
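
A short sketch computing support, confidence, and lift for the rule {Milk, Diaper} -> {Beer} over the five transactions above:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
X, Y = {"Milk", "Diaper"}, {"Beer"}
N = len(transactions)

def freq(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

support = freq(X | Y) / N                          # P(X, Y) = 2/5 = 0.4
confidence = freq(X | Y) / freq(X)                 # P(Y | X) = 2/3
lift = support / ((freq(X) / N) * (freq(Y) / N))   # 0.4 / 0.36, roughly 1.11
print(support, confidence, lift)
```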

Significance of Association measures
- Confidence: a conditional probability; its value should be close to 1. It measures the strength of the rule, i.e., whether the rule holds with enough confidence.
- Support: the statistical significance of the rule. A rule may have a strong confidence value, but if the number of such customers is small, the rule is worthless.
- Minimum support and confidence thresholds are set by the user/entity; rules with higher support and confidence are searched for in the database.

Association Rules
- In general, X and Y can be sets of items
- Basket analysis: e.g., customers buying hot dogs and buns are more likely to buy mustard and ketchup
- Association rule mining: given a database of customer purchases, find all association rules with high support and confidence
- Apriori algorithm [Agrawal et al., 1996]

Apriori Algorithm (Agrawal et al., 1996)
- For {X, Y, Z}, a 3-item set, to be frequent (have enough support), {X, Y}, {X, Z}, and {Y, Z} must all be frequent
- If {X, Y} is not frequent, none of its supersets can be frequent
- Once we find the frequent k-item sets, we convert them to rules: X -> Y, Z, ... and X, Y -> Z, ...

Apriori Algorithm
- Find all itemsets with enough support: if an itemset of size k does not have enough support, then no superset of this itemset will have enough support
- For each itemset, find all association rules X -> Y with enough confidence: rules of the form {A, B} -> {C, D} can only be confident if both {A, B, C} -> {D} and {A, B, D} -> {C} are confident
- WEKA: Associate
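
A compact sketch of the level-wise search on the five example transactions; for brevity it omits the rule-generation step, and the candidate step skips the extra subset-pruning check that full Apriori performs.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def apriori(transactions, min_support):
    """Return all itemsets whose support (as a fraction) meets min_support."""
    N = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / N

    level = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent = []
    while level:
        level = [s for s in level if support(s) >= min_support]  # prune by support
        frequent.extend(level)
        # Candidates for the next level: unions of two frequent k-itemsets
        # that differ in exactly one item. (Full Apriori would also discard
        # candidates having any infrequent k-subset.)
        level = list({a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1})
    return frequent

print(apriori(transactions, min_support=0.6))  # frequent 1- and 2-itemsets
```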

Summary: Bayesian Learning
- Optimal learning framework
- Incorporates background knowledge
- Practical algorithms: Naïve Bayes, association rule mining

One-Minute Learning
What is Bayes' Theorem? It provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
What is the difference between the posterior and the prior probability, i.e., P(h | D) vs. P(h)? The posterior probability reflects our confidence that the hypothesis holds after we have seen the training data; it reflects the influence of the training data, in contrast to the prior probability, which is independent of the data.

Practice Problem
Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is actually not present. Furthermore, 0.008 of the entire population have this cancer.
P(cancer) =        P(¬cancer) =
P(+ | cancer) =    P(- | cancer) =
P(+ | ¬cancer) =   P(- | ¬cancer) =
Hint: find the MAP hypothesis by comparing
P(+ | cancer) P(cancer) =    and    P(+ | ¬cancer) P(¬cancer) =

Practice Problem (solution)
Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is actually not present. Furthermore, 0.008 of the entire population have this cancer.
P(cancer) = 0.008        P(¬cancer) = 0.992
P(+ | cancer) = 0.98     P(- | cancer) = 0.02
P(+ | ¬cancer) = 0.03    P(- | ¬cancer) = 0.97
Find the MAP hypothesis:
P(+ | cancer) P(cancer) = 0.98 * 0.008 = 0.0078
P(+ | ¬cancer) P(¬cancer) = 0.03 * 0.992 = 0.0298
Thus, h_MAP = ¬cancer.
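
A quick check of the arithmetic:

```python
# Verifying the MAP computation for the cancer test example above.
p_cancer = 0.008
p_not_cancer = 1 - p_cancer           # 0.992
p_pos_given_cancer = 0.98             # correct positive rate
p_pos_given_not_cancer = 1 - 0.97     # 0.03: false-positive rate

score_cancer = p_pos_given_cancer * p_cancer               # ~0.0078
score_not = p_pos_given_not_cancer * p_not_cancer          # ~0.0298
print("h_MAP =", "cancer" if score_cancer > score_not else "not cancer")

# Normalizing gives the posterior: even with a positive test,
# P(cancer | +) is only about 0.21.
print(score_cancer / (score_cancer + score_not))
```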