Credal Classification


1 Credal Classification A. Antonucci, G. Corani, D. Mauá Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Lugano (Switzerland) IJCAI-13

2 About the speaker. PhD in Information Engineering (Politecnico di Milano, 2005). Since 2006: researcher at IDSIA, Switzerland. Member of the IDSIA Imprecise Probability Group. Research interests: probabilistic graphical models, data mining, imprecise probability.

3 Agenda: 1. Estimating a multinomial; 2. Naive Bayes and Naive Credal Classifier; 3. Comparing credal and traditional classifiers; 4. Credal Model Averaging.

4 Estimation for a categorical variable. Class $C$, with sample space $\Omega_C = \{c_1, \ldots, c_m\}$. $P(c_j) = \theta_j$, $\theta = \{\theta_1, \ldots, \theta_m\}$. $n$ i.i.d. observations; $\mathbf{n} = \{n_1, \ldots, n_m\}$. Multinomial likelihood: $L(\theta \mid \mathbf{n}) \propto \prod_{j=1}^{m} \theta_j^{n_j}$. Maximum likelihood estimator: $\hat{\theta}_j = n_j / n$.

5 Bayesian estimation: Dirichlet prior. The prior expresses the beliefs about $\theta$ before analyzing the data: $\pi(\theta) \propto \prod_{j=1}^{m} \theta_j^{s t_j - 1}$. Here $s > 0$ is the equivalent sample size, which can be regarded as a number of hidden instances; $t_j$ is the proportion of hidden instances in category $j$.

6 Posterior distribution. Obtained by multiplying likelihood and prior: $\pi(\theta \mid \mathbf{n}) \propto \prod_j \theta_j^{n_j + s t_j - 1}$. Dirichlet posteriors are obtained from Dirichlet priors (conjugacy). Taking expectations: $P(c_j \mid \mathbf{n}, t) = \frac{n_j + s t_j}{n + s}$.

7 Example. $n=10$, $n_1=4$, $n_2=6$, $s=1$. The posterior estimate is sensitive to the choice of the prior:
Prior 1 ($t_1 = 0.5$, $t_2 = 0.5$): $\hat{\theta}_1 = \frac{4 + 1 \cdot 0.5}{10 + 1} = 0.41$
Prior 2 ($t_1 = 0.8$, $t_2 = 0.2$): $\hat{\theta}_1 = \frac{4 + 1 \cdot 0.8}{10 + 1} = 0.44$
The uniform prior is the most common choice.
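The arithmetic above follows directly from the posterior-mean formula of slide 6. A minimal Python sketch (function and variable names are my own) that reproduces the two estimates:

```python
# Posterior mean of a Dirichlet-multinomial model: E[theta_j | n, t] = (n_j + s*t_j)/(n + s).
# Reproduces the prior-sensitivity example above (n=10, n1=4, n2=6, s=1).

def posterior_mean(counts, s, t):
    n = sum(counts)
    return [(n_j + s * t_j) / (n + s) for n_j, t_j in zip(counts, t)]

counts = [4, 6]
print(posterior_mean(counts, s=1, t=[0.5, 0.5])[0])  # 0.409... (prior 1)
print(posterior_mean(counts, s=1, t=[0.8, 0.2])[0])  # 0.436... (prior 2)
```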

8 The uniform prior looks non-informative, but its estimates depend on the sample space.
Sample space $\{\theta_1, \theta_2\}$: data $n_1 = 4$, $n_2 = 6$; estimate $\hat{\theta}_1 = \frac{4 + 1/2}{10 + 1} = 0.41$.
Sample space $\{\theta_1, \theta_2, \theta_3\}$: data $n_1 = 4$, $n_2 = 6$, $n_3 = 0$; estimate $\hat{\theta}_1 = \frac{4 + 1/3}{10 + 1} = 0.39$.
"Non-informative priors are a lost cause." (L. Wasserman, normaldeviate.wordpress.com)

9 Modelling prior ignorance: the Imprecise Dirichlet Model (IDM) (Walley, 1996). The IDM is a convex set of Dirichlet priors. The set of admissible values for $t_j$ is: $0 < t_j < 1 \;\forall j$, $\sum_j t_j = 1$. This is a model of prior ignorance: a priori, nothing is known.

10 Learning from data with the IDM. An upper and a lower posterior expectation are derived for each chance $\theta_j$:
$\underline{E}(\theta_j \mid \mathbf{n}) = \inf_{0 < t_j < 1} \frac{n_j + s t_j}{n + s} = \frac{n_j}{n + s}$
$\overline{E}(\theta_j \mid \mathbf{n}) = \sup_{0 < t_j < 1} \frac{n_j + s t_j}{n + s} = \frac{n_j + s}{n + s}$
Upper and lower expectations do not depend on the sample space.

11 Example. Binary variable, with $n=10$, $n_1=4$, $n_2=6$, $s=1$. The estimates of $\theta_1$ are:
Bayes ($t_1 = 0.5$, $t_2 = 0.5$): $\hat{\theta}_1 = \frac{4 + 0.5}{11} = 0.409$
Bayes ($t_1 = 0.8$, $t_2 = 0.2$): $\hat{\theta}_1 = \frac{4 + 0.8}{11} = 0.436$
IDM: $\left[\frac{4}{11}, \frac{5}{11}\right] = [0.363, 0.454]$
The interval estimate of the IDM comprises the estimates obtained by letting each $t_i$ vary within $(0, 1)$.
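A small sketch of the IDM interval (illustrative code, not from the tutorial), which also checks the claim of the next slide that the bounds do not depend on the sample space:

```python
# IDM lower/upper posterior expectations: [n_j/(n+s), (n_j+s)/(n+s)].
# Reproduces the interval [0.363, 0.454] above.

def idm_interval(counts, j, s=1.0):
    n = sum(counts)
    return counts[j] / (n + s), (counts[j] + s) / (n + s)

print(idm_interval([4, 6], j=0))      # (0.3636..., 0.4545...)
print(idm_interval([4, 6, 0], j=0))   # same bounds after adding an empty third category
```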

12 On the IDM. The estimates are imprecise, being characterized by an upper and a lower probability. They do not depend on the sample space. The gap between upper and lower probability narrows as more data become available. References: Walley (1996), J. of the Royal Statistical Society, Series B; Bernard (2009), Int. J. of Approximate Reasoning.

13 Agenda: 1. Estimating a multinomial; 2. Naive Bayes and Naive Credal Classifier; 3. Comparing credal and traditional classifiers; 4. Credal Model Averaging.

14 Classification. Determine the type of Iris (class) on the basis of the length and width (features) of sepal and petal. (a) setosa, (b) virginica, (c) versicolor. Classification: predict the class $C$ of a given object on the basis of the features $A = \{A_1, \ldots, A_k\}$. Prediction: the most probable class a posteriori.

15 Naive Bayes (NBC). [Naive Bayes topology: $C \to A_1, A_2, A_3$.] Naively assumes the features to be independent given the class. NBC is highly biased, but achieves good accuracy, especially on small data sets, thanks to low variance (Friedman, 1997).

16 Naive Bayes (NBC). [Naive Bayes topology: $C \to A_1, A_2, A_3$.] Learns from data the joint probability of class and features, decomposed as the marginal probability of the class and the conditional probability of each feature given the class.

17 Joint prior. $\theta_{c,a}$: the unknown joint probability of class and features, which we want to estimate. Under the naive assumption and a Dirichlet prior, the joint prior is: $P(\theta_{c,a}) \propto \prod_{c \in \Omega_C} \left[ \theta_c^{\,s t(c)} \prod_{i=1}^{k} \prod_{a \in \Omega_{A_i}} \theta_{a \mid c}^{\,s t(a,c)} \right]$, where $t(a,c)$ is the proportion of hidden instances with $C = c$ and $A_i = a$. Let the vector $t$ collect all the parameters $t(c)$ and $t(a,c)$. Thus, a joint prior is specified by $t$.

18 Likelihood and posterior. The likelihood is like the prior, with the coefficients $st(\cdot)$ replaced by the counts $n(\cdot)$: $L(\theta \mid \mathbf{n}) \propto \prod_{c \in \Omega_C} \left[ \theta_c^{\,n(c)} \prod_{i=1}^{k} \prod_{a \in \Omega_{A_i}} \theta_{a \mid c}^{\,n(a,c)} \right]$. The joint posterior $P(\theta_{c,a} \mid \mathbf{n}, t)$ is like the likelihood, with the coefficients $n(\cdot)$ replaced by $n(\cdot) + st(\cdot)$. Once $P(\theta_{c,a} \mid \mathbf{n}, t)$ is available, the classifier is trained.

19 Issuing a classification. The value of the features is specified as $a = (a_1, \ldots, a_k)$. Then $P(c \mid a) \propto P(c) \prod_{i=1}^{k} P(a_i \mid c)$, where $P(c) = \frac{n(c) + s t(c)}{n + s}$ and $P(a_i \mid c) = \frac{n(a_i, c) + s t(a_i, c)}{n(c) + s t(c)}$. Prior-dependence: the most probable class varies with $t$.
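The formulas above are enough to train and query the classifier. Below is a hedged Python sketch (all names are illustrative; a uniform choice of $t$ is assumed, i.e. $t(c) = 1/|\Omega_C|$ and $t(a,c) = t(c)/|\Omega_{A_i}|$):

```python
# Naive Bayes with Dirichlet smoothing, following the estimates of slide 19.
from collections import Counter

def train_nbc(data, classes, feature_values, s=1.0):
    """data: list of (feature_tuple, class_label) pairs.
    feature_values: list giving, for each feature, its possible values."""
    n = len(data)
    n_c = Counter(c for _, c in data)
    t_c = {c: 1.0 / len(classes) for c in classes}          # uniform t(c)
    p_c = {c: (n_c[c] + s * t_c[c]) / (n + s) for c in classes}
    p_ac = []
    for i, values in enumerate(feature_values):
        n_ac = Counter((x[i], c) for x, c in data)
        t_ac = {(a, c): t_c[c] / len(values) for a in values for c in classes}
        p_ac.append({(a, c): (n_ac[(a, c)] + s * t_ac[(a, c)]) / (n_c[c] + s * t_c[c])
                     for a in values for c in classes})
    return p_c, p_ac

def predict(x, p_c, p_ac):
    """Return the most probable class a posteriori for the feature vector x."""
    scores = {c: p_c[c] for c in p_c}
    for i, a in enumerate(x):
        for c in scores:
            scores[c] *= p_ac[i][(a, c)]
    return max(scores, key=scores.get)
```

For a fixed $t$ this returns a single class; the prior-dependence noted on the slide is precisely that this argmax may change when $t$ is varied.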

20 Credal classifiers. Induced using a set of priors (credal set). They separate safe instances from prior-dependent ones. On prior-dependent instances they return a set of classes ("indeterminate classifications"), remaining robust though less informative.

21 IDM over the naive topology. We consider a set of joint priors, factorized as follows.
Class: $0 < t(c) < 1 \;\forall c \in \Omega_C$, $\sum_c t(c) = 1$. A priori, $0 < P(c_j) < 1 \;\forall j$.
Features given the class: $\sum_{a \in \Omega_{A_i}} t(a, c) = t(c) \;\forall c$, $0 < t(a, c) < t(c) \;\forall a, c$. A priori, $0 < P(a \mid c) < 1 \;\forall a, c$.

22 Naive Credal Classifier (NCC). Uses the IDM to specify a credal set of joint distributions and updates it into a posterior credal set. The posterior probability of class $c$ ranges within an interval. Given the feature observation $a$, class $c'$ credal-dominates $c''$ if, for each posterior of the credal set, $P(c' \mid a) > P(c'' \mid a)$.

23 NCC and prior-dependent instances. Credal-dominance is checked by solving an optimization problem. NCC eventually returns the non-dominated classes: a singleton on the safe instances, a set on the prior-dependent ones. The next application shows the gain in reliability due to returning indeterminate classifications on the prior-dependent instances.
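As a rough illustration of the dominance test (not the exact polynomial-time procedure of NCC), one can evaluate the posterior score of slide 19 on a finite grid of prior parameters that approximates the credal set of slide 21 and keep the classes that are never beaten:

```python
# Brute-force credal-dominance check over a user-supplied grid of priors.
# Each element of prior_grid is a pair (t_c, t_ac) respecting the constraints
# of slide 21; the exact NCC test solves the underlying optimisation instead.

def nbc_score(c, x, n_c, n_ac, n, s, t_c, t_ac):
    """Unnormalised P(c | x, t), with the estimates of slide 19."""
    score = (n_c[c] + s * t_c[c]) / (n + s)
    for i, a in enumerate(x):
        score *= (n_ac[i][(a, c)] + s * t_ac[i][(a, c)]) / (n_c[c] + s * t_c[c])
    return score

def dominates(c1, c2, x, n_c, n_ac, n, s, prior_grid):
    """c1 credal-dominates c2 iff its posterior is larger under every prior."""
    return all(nbc_score(c1, x, n_c, n_ac, n, s, t_c, t_ac)
               > nbc_score(c2, x, n_c, n_ac, n, s, t_c, t_ac)
               for t_c, t_ac in prior_grid)

def non_dominated(classes, x, n_c, n_ac, n, s, prior_grid):
    """The set of classes returned by the (approximate) credal classifier."""
    return [c for c in classes
            if not any(dominates(c2, c, x, n_c, n_ac, n, s, prior_grid)
                       for c2 in classes if c2 != c)]
```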

24 Next From Naive Bayes to Naive Credal Classifier: Texture recognition

25 Texture recognition. The OUTEX data sets (Ojala, 2002): 4500 images, 24 classes (textiles, carpets, woods, ...).

26 Features: Local Binary Patterns (Ojala, 2002) The gray level of each pixel is compared with that of its neighbors, resulting in a binary judgment (more intense/ less intense). Such judgments are collected in a string for each pixel.

27 Local Binary Patterns (2). Each string is then assigned to a single category. The categories group similar strings: for instance, strings that are rotations of one another fall in the same category, ensuring rotational invariance. There are 18 categories. For each image there are 18 features: the % of pixels assigned to each category.
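To make the feature concrete, here is a rough sketch of a basic 8-neighbour LBP histogram in Python with NumPy. The tutorial uses a 16-neighbour rotation-invariant variant with 18 categories; this simplified version keeps all 256 raw patterns and is only meant to illustrate the idea.

```python
import numpy as np

def lbp_histogram(img):
    """img: 2-D array of gray levels. Returns the fraction of pixels per 8-bit LBP code."""
    h, w = img.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]     # 8 neighbours, clockwise
    centre = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```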

28 Results (Corani et al., BMVC 2010). Accuracy of NBC: 92% (SVMs: 92.5%). NBC is highly accurate on the safe instances, but almost random on the prior-dependent ones.
Safe instances (95% of the test set): NBC accuracy 94%; NCC accuracy 94%; NCC non-dominated classes 1.
Prior-dependent instances (5%): NBC accuracy 56%; NCC accuracy 85%; NCC non-dominated classes 2.4.

29 Different training set sizes (II). As n grows, both the % of indeterminate classifications and the number of classes returned when indeterminate decrease. [Two plots vs. training set size: % of indeterminate classifications; classes returned when indeterminate.]

30 Naive Bayes + rejection option vs Naive Credal. Rejection option: reject an instance (no classification) if the probability of the most probable class is below a threshold p. Texture recognition: half of the prior-dependent instances are classified by naive Bayes with probability > 90%. The rejection rule is not designed to detect prior-dependent instances!

31 Tree-augmented credal classifier (TAN). [Graphs: NBC topology ($C \to A_1, A_2, A_3$) vs. TAN topology ($C \to A_1, A_2, A_3$, with additional arcs among the features).] TAN improves over naive Bayes by weakening the independence assumption.

32 Tree-augmented credal classifier (TAN). [Graphs: NBC topology vs. TAN topology, as on the previous slide.] Credal TAN detects instances unreliably classified by Bayesian TAN because of prior-dependence. References: Zaffalon, Reliable Computing, 2003; Corani and De Campos, Int. J. Appr. Reasoning, 2010.

33 Next From Naive Bayes to Naive Credal Classifier: Conservative treatment of missing data

34 Ignorance from missing data. Usually, classifiers ignore missing data, assuming them to be MAR (missing at random). MAR: the probability of an observation being missing does not depend on its value or on the value of other missing data. The MAR assumption cannot be tested on the incomplete data.

35 A non-MAR example: a political poll. Right-wing (R) supporters sometimes refuse to answer; left-wing (L) supporters always answer. Votes: L, L, L, R, R, R; recorded answers: L, L, L, R, -, -. By ignoring missing data, P(R) = 1/4: underestimated!

36 Conservative treatment of missing data. Consider each possible completion of the data. The two missing answers can be completed as (L, L), (L, R), (R, L) or (R, R), yielding the completions D1-D4 with P(R) = 1/6, 1/3, 1/3, 1/2 respectively. Hence P(R) ∈ [1/6, 1/2]; this interval includes the real value. Application to BNs: Ramoni and Sebastiani (2000).
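The completion argument lends itself to a tiny enumeration script; this sketch reproduces the interval [1/6, 1/2] for the poll above.

```python
from itertools import product

answers = ['L', 'L', 'L', 'R', None, None]          # None = refused to answer
missing = [i for i, a in enumerate(answers) if a is None]

estimates = []
for completion in product('LR', repeat=len(missing)):
    filled = list(answers)
    for i, value in zip(missing, completion):
        filled[i] = value
    estimates.append(filled.count('R') / len(filled))

print(sorted(estimates))               # [1/6, 1/3, 1/3, 1/2]
print(min(estimates), max(estimates))  # 0.1666..., 0.5
```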

37 Conservative Inference Rule (Zaffalon, 2005). MAR missing data are ignored. Non-MAR missing data are filled in all possible ways, both in the training and in the test data. The number of completions grows exponentially with the missing data; yet polynomial-time algorithms are available for the naive credal classifier. Reference: Corani and Zaffalon, 2008, JMLR.

38 The conservative treatment of missing data increases indeterminacy. Multiple classes are returned if the most probable class depends on the prior specification or on the completion of the non-MAR missing data. Declare each feature as MAR or non-MAR depending on domain knowledge, for a good trade-off between robustness and informativeness.

39 Agenda: 1. Estimating a multinomial; 2. Naive Bayes and Naive Credal Classifier; 3. Comparing credal and traditional classifiers; 4. Credal Model Averaging.

40 Discounted accuracy: $\text{d-acc} = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{accurate}_i}{Z_i}$, where $\text{accurate}_i$ indicates whether the set of classes returned on instance $i$ contains the actual class, and $Z_i$ is the number of classes returned on the $i$-th instance. For a traditional classification, d-acc equals the 0-1 accuracy.
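A one-function sketch of discounted accuracy (set-valued predictions, each correct prediction discounted by the size of the returned set):

```python
def discounted_accuracy(predicted_sets, true_classes):
    """d-acc = (1/N) * sum_i accurate_i / Z_i, with set-valued predictions."""
    return sum((1.0 / len(pred)) if truth in pred else 0.0
               for pred, truth in zip(predicted_sets, true_classes)) / len(true_classes)

# With singleton predictions this reduces to 0-1 accuracy:
print(discounted_accuracy([{'A'}, {'B'}, {'A', 'B'}], ['A', 'A', 'B']))  # (1 + 0 + 0.5)/3 = 0.5
```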

41 Some properties of $\text{d-acc} = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{accurate}_i}{Z_i}$. Between two accurate classifications, d-acc is higher for the more informative one. Between an accurate and an inaccurate classification, d-acc is higher for the accurate one, even if it has returned multiple classes (linear decrease). Discounted accuracy satisfies some fundamental properties (Zaffalon et al., IJAR 2012).

42 Doctor random and doctor vacuous. Possible diseases: {A, B}. Doctor random: uniformly random diagnosis. Doctor vacuous: returns both categories.
Disease A: random predicts A (d-acc 1); vacuous predicts {A, B} (d-acc 0.5)
Disease A: random predicts B (d-acc 0); vacuous predicts {A, B} (d-acc 0.5)
Disease B: random predicts A (d-acc 0); vacuous predicts {A, B} (d-acc 0.5)
Disease B: random predicts B (d-acc 1); vacuous predicts {A, B} (d-acc 0.5)
On average, both doctors achieve d-acc 0.5.

43 Doctor random vs doctor vacuous: the manager's viewpoint. Assumption: the hospital profits a quantity of money proportional to the discounted accuracy. After n visits, the profits are: doctor vacuous, n/2 with no variance; doctor random, expected n/2 with variance n/4.

44 Introducing utility. Expected utility increases with the expected value of the rewards but decreases with their variance. Any risk-averse manager prefers doctor vacuous over doctor random. And you prefer a vacuous over a random diagnosis! Idea: quantify such preference through a utility function (Zaffalon et al., IJAR 2012).

45 How to design the utility function. Actual A, predicted A: utility 1. Actual A, predicted B: utility 0. Utility coincides with accuracy for a traditional classification consisting of a single class.

46 How to design the utility function. Actual A, predicted {B, C}: utility 0. Actual A, predicted {A, B}: utility to be specified. A wrong indeterminate classification has utility 0. The utility of an accurate but indeterminate classification consisting of two classes has to be larger than 0.5, otherwise doctor random and doctor vacuous yield the same utility.

47 How to design the utility function. Let this utility be 0.65 (function u65) or 0.8 (function u80), then perform sensitivity analysis. The utility of credal classifiers and the accuracy of determinate classifiers can now be compared. Such utility functions are numerically close to the F1 and F2 metrics (Del Coz et al., JMLR, 2009).
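A hedged sketch covering only the two-class case discussed on these slides (the general utility of Zaffalon et al. is defined as a function of the discounted accuracy): a correct singleton scores 1, a wrong prediction 0, and an accurate indeterminate prediction of two classes scores 0.65 (u65) or 0.8 (u80).

```python
def utility(predicted_set, truth, u_pair=0.65):
    """u65 with u_pair=0.65, u80 with u_pair=0.8 (two-class indeterminate case only)."""
    if truth not in predicted_set:
        return 0.0
    return 1.0 if len(predicted_set) == 1 else u_pair

def mean_utility(predicted_sets, truths, u_pair=0.65):
    return sum(utility(p, t, u_pair) for p, t in zip(predicted_sets, truths)) / len(truths)
```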

48 Comparing naive Bayes and naive Credal. Wins on 55 UCI data sets, counted under the u65 and u80 utilities. How to compare credal and traditional classifiers was an open problem at AAAI-10!

49 Agenda: 1. Estimating a multinomial; 2. Naive Bayes and Naive Credal Classifier; 3. Comparing credal and traditional classifiers; 4. Credal Model Averaging.

50 Model uncertainty: an example with logistic regression. Given $k$ features, we can design $2^k$ logistic regressors, each with a different feature set. Model uncertainty: different logistic regressors (with different feature sets) can fit the data similarly well. Considering only a single model leads to overconfident conclusions. Bayesian Model Averaging (BMA) combines the inferences of multiple models, using as weights the posterior probability of the models.

51 Bayesian Model Averaging (BMA). The posterior of class $c$ under BMA is: $P(c \mid D) = \sum_{m_i \in M} P(c \mid m_i, D)\, P(m_i \mid D) \propto \sum_{m_i \in M} P(c \mid m_i, D)\, P(D \mid m_i)\, P(m_i)$, where $D$ is the available training data, $m_i$ is the $i$-th model within the model space $M$, and $P(m_i)$ is the prior over the models. Often the BMA analysis is repeated under different priors over the models.

52 Independent Bernoulli (IB) prior over the models. Prior probability of model $m_i$, containing $k_i$ features: $P(m_i) = \theta^{k_i} (1 - \theta)^{k - k_i}$. Assumptions: every covariate has the same prior probability of inclusion $\theta$, which is the only parameter; the probability of inclusion of each covariate is independent of the others. Uniform prior over the models: $\theta = 0.5$.

53 Credal model averaging (CMA). Represents the prior over the models by a credal set. The prior probability of inclusion of each covariate varies in $[\underline{\theta}, \overline{\theta}]$. Prior ignorance can be represented by setting $\underline{\theta} = \epsilon$, $\overline{\theta} = 1 - \epsilon$. CMA automates the sensitivity analysis on the value of $\theta$.

54 Upper and lower probability of a class. Lower posterior probability of class $c_1$:
$\underline{P}(c_1 \mid D, x) = \min_{\theta \in [\underline{\theta}, \overline{\theta}]} \sum_{m_i \in M} P(c_1 \mid D, x, m_i)\, P(m_i \mid D) = \min_{\theta \in [\underline{\theta}, \overline{\theta}]} \sum_{m_i \in M} P(c_1 \mid D, x, m_i)\, \frac{P(D \mid m_i)\, \theta^{k_i} (1-\theta)^{k-k_i}}{\sum_{m_j \in M} P(D \mid m_j)\, \theta^{k_j} (1-\theta)^{k-k_j}}$
The upper probability is found by maximizing the same expression. Such optimizations are solved analytically.
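A numerical sketch of these bounds (all names are mine; the paper solves the optimisation analytically, while here the inclusion probability θ is simply scanned on a grid):

```python
import numpy as np

def bma_posterior_c1(theta, marg_lik, n_feats, k, p_c1_given_model):
    """P(c1 | D, x) under BMA with the Independent Bernoulli prior of parameter theta.
    marg_lik[i] = P(D | m_i), n_feats[i] = k_i, p_c1_given_model[i] = P(c1 | D, x, m_i);
    all three are NumPy arrays indexed by model."""
    prior = theta ** n_feats * (1.0 - theta) ** (k - n_feats)    # P(m_i)
    weights = marg_lik * prior
    weights = weights / weights.sum()                            # P(m_i | D)
    return float(np.dot(p_c1_given_model, weights))

def cma_interval(marg_lik, n_feats, k, p_c1_given_model, theta_lo=0.05, theta_up=0.95):
    grid = np.linspace(theta_lo, theta_up, 500)
    values = [bma_posterior_c1(t, marg_lik, n_feats, k, p_c1_given_model) for t in grid]
    return min(values), max(values)                              # lower/upper P(c1 | D, x)
```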

55 Credal model averaging (CMA) The prior over the models is modelled by a credal set. The single models (logistic regressors with different feature sets) are learned in a traditional Bayesian way. CMA is thus a credal ensemble of Bayesian regressors.

56 Indeterminate classifications. Safe instances: the most probable class remains the same regardless of the prior over the models; CMA and BMA return the same class for every $\theta \in [\underline{\theta}, \overline{\theta}]$. Prior-dependent instances: the most probable class varies with $\theta$; CMA suspends the judgment, returning a set of classes.

57 Predicting presence or absence of marmot burrows. Study area: 9500 cells (100 m² each) within the Stelvio National Park (Italy). Presence of burrows in about 4.5% of the cells. Features available for each cell: altitude, slope, aspect (the direction in which the slope faces), soil temperature, soil cover, etc.

58 Experiments: comparing BMA and CMA. BMA prior over the models: uniform ($\theta = 0.5$) or hierarchical ($\theta$ as a random variable with a uniform prior). Training sets of size $n \in \{30, 60, 90, \ldots, 300\}$; 30 experiments for each sample size. For CMA we set $\overline{\theta} = 0.95$ and $\underline{\theta} = 0.05$ (prior ignorance).

59 BMA accuracy. Regardless of the adopted prior over the models, BMA is highly accurate on the safe instances but almost a random guesser on the prior-dependent ones! [Figure: BMA accuracy on safe vs. prior-dependent instances as a function of the training set size. Left: uniform prior. Right: hierarchical prior.]

60 Credal model averaging (CMA). Allows sensitivity analysis of the coefficients of the covariates (upper/lower value) to be computed easily by solving an optimization problem; the optimization is solved analytically. CMA automates sensitivity analysis.

61 Other ensembles using a credal set for the prior probability of the models. Credal ensemble of naive Bayes (Corani and Zaffalon, ECML-PKDD 2008). Credal ensemble with compression coefficients (Corani and Antonucci, Comp. Statistics & Data Analysis, in press).

62 Other imprecise probability classifiers. Some useful pointers: Credal KNN: Destercke, Soft Computing, 2012. Credal regression trees: Abellán and Moral, Int. J. Appr. Reasoning; Abellán et al., Comp. Statistics & Data Analysis, 2013. Classification of temporal sequences: Antonucci et al., Proc. ISIPTA '13 (Symp. on Imprecise Probability).

63 Conclusions. Credal classifiers suspend judgment on prior-dependent instances. Prior-dependent instances are not effectively identified by the rejection option. Credal classifiers deal robustly with the specification of the prior and with non-MAR missing data. Typically, the accuracy of traditional classifiers drops on the instances indeterminately classified by credal classifiers.

64 Open problems. Multi-label classification. Dealing, via credal sets, with both the prior on the model parameters and the prior over the models.
