Credal Classification
1 Credal Classification A. Antonucci, G. Corani, D. Mauá. Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Lugano (Switzerland). IJCAI-13.
2 About the speaker PhD in Information Engineering (Politecnico di Milano, 2005). Since 2006: researcher at IDSIA (Switzerland). Member of the IDSIA Imprecise Probability Group. Research interests: probabilistic graphical models, data mining, imprecise probability.
3 Agenda 1 Estimating a multinomial 2 Naive Bayes and Naive Credal Classifier 3 Comparing credal and traditional classifiers 4 Credal Model Averaging
4 Estimation for a categorical variable Class $C$, with sample space $\Omega_C = \{c_1, \dots, c_m\}$. $P(c_j) = \theta_j$, $\theta = \{\theta_1, \dots, \theta_m\}$. $n$ i.i.d. observations; $\mathbf{n} = \{n_1, \dots, n_m\}$. Multinomial likelihood: $L(\theta \mid \mathbf{n}) \propto \prod_{j=1}^{m} \theta_j^{n_j}$. Maximum likelihood estimator: $\hat{\theta}_j = n_j / n$.
5 Bayesian estimation: Dirichlet prior The prior expresses the beliefs about $\theta$ before analyzing the data: $\pi(\theta) \propto \prod_{j=1}^{m} \theta_j^{s t_j - 1}$. $s > 0$ is the equivalent sample size, which can be regarded as a number of hidden instances; $t_j$ is the proportion of hidden instances in category $j$.
6 Posterior distribution Obtained by multiplying likelihood and prior: $\pi(\theta \mid \mathbf{n}) \propto \prod_j \theta_j^{n_j + s t_j - 1}$. Dirichlet posteriors are obtained from Dirichlet priors (conjugacy). Taking expectations: $P(c_j \mid \mathbf{n}, t) = \frac{n_j + s t_j}{n + s}$.
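A minimal numeric sketch of the two estimators above (hypothetical helper names, not from the slides), run on the counts of the next slide:

```python
# Sketch: ML and Dirichlet-posterior estimates of a multinomial.
def ml_estimate(counts):
    n = sum(counts)
    return [n_j / n for n_j in counts]

def dirichlet_posterior_mean(counts, t, s=1.0):
    # E[theta_j | n, t] = (n_j + s * t_j) / (n + s)
    n = sum(counts)
    return [(n_j + s * t_j) / (n + s) for n_j, t_j in zip(counts, t)]

counts = [4, 6]                                       # n1 = 4, n2 = 6
print(ml_estimate(counts))                            # [0.4, 0.6]
print(dirichlet_posterior_mean(counts, [0.5, 0.5]))   # [0.409..., 0.590...]
```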
7 Example $n=10$, $n_1=4$, $n_2=6$, $s=1$. The posterior estimate is sensitive to the choice of the prior. Prior 1: $t_1 = 0.5$, $t_2 = 0.5$, giving $\hat{\theta}_1 = (4 + 0.5)/(10 + 1) = 0.41$. Prior 2: $t_1 = 0.8$, $t_2 = 0.2$, giving $\hat{\theta}_1 = (4 + 0.8)/(10 + 1) = 0.44$. The uniform prior is the most common choice.
8 Uniform prior looks non-informative but its estimates depend on the sample space. With data $n_1 = 4$, $n_2 = 6$: on the sample space $\{\theta_1, \theta_2\}$, $\hat{\theta}_1 = (4 + 1/2)/(10 + 1) = 0.41$; on the sample space $\{\theta_1, \theta_2, \theta_3\}$ (with $n_3 = 0$), $\hat{\theta}_1 = (4 + 1/3)/(10 + 1) = 0.39$. "Non-informative priors are a lost cause" (L. Wasserman, normaldeviate.wordpress.com).
9 Modelling prior-ignorance: the Imprecise Dirichlet Model (IDM) (Walley, 1996) The IDM is a convex set of Dirichlet priors. The set of admissible values for the $t_j$ is: $0 < t_j < 1$ for all $j$, with $\sum_j t_j = 1$. This is a model of prior ignorance: a priori nothing is known.
10 Learning from data with the IDM An upper and a lower posterior expectation are derived for each chance $\theta_j$: $\underline{E}(\theta_j \mid \mathbf{n}) = \inf_{0 < t_j < 1} \frac{n_j + s t_j}{n + s} = \frac{n_j}{n + s}$ and $\overline{E}(\theta_j \mid \mathbf{n}) = \sup_{0 < t_j < 1} \frac{n_j + s t_j}{n + s} = \frac{n_j + s}{n + s}$. Upper and lower expectations do not depend on the sample space.
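Since the bounds have this closed form, a sketch of the IDM interval is only a couple of lines (hypothetical helper name); on the data of the next slide it reproduces the interval $[0.363, 0.454]$:

```python
# Sketch: lower/upper posterior expectation of theta_j under the IDM
# (t_j -> 0 gives the lower bound, t_j -> 1 the upper bound).
def idm_interval(counts, j, s=1.0):
    n = sum(counts)
    return counts[j] / (n + s), (counts[j] + s) / (n + s)

print(idm_interval([4, 6], j=0))   # (0.3636..., 0.4545...)
```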
11 Example Binary variable, with $n=10$, $n_1=4$, $n_2=6$, $s=1$. The estimates of $\theta_1$ are: Bayes with $(t_1 = 0.5, t_2 = 0.5)$: $\hat{\theta}_1 = 0.409$; Bayes with $(t_1 = 0.8, t_2 = 0.2)$: $\hat{\theta}_1 = 0.436$; IDM: $[0.363, 0.454]$. The interval estimate of the IDM comprises the estimates obtained by letting each $t_j$ vary within $(0, 1)$.
12 On the IDM The estimates are imprecise, being characterized by an upper and a lower probability. They do not depend on the sample space. The gap between upper and lower probability narrows as more data become available. References: Walley, 1996, J. of the Royal Statistical Society. Series B. Bernard, 2009, Int. J. Approximate Reasoning.
13 Agenda 1 Estimating a multinomial 2 Naive Bayes and Naive Credal Classifier 3 Comparing credal and traditional classifiers 4 Credal Model Averaging
14 Classification Determine the type of Iris (class) on the basis of length and width (features) of sepal and petal. (Images: (a) setosa, (b) virginica, (c) versicolor.) Classification: to predict the class $C$ of a given object, on the basis of the features $A = \{A_1, \dots, A_k\}$. Prediction: the most probable class a posteriori.
15 Naive Bayes (NBC) (Graph: class $C$ with children $A_1$, $A_2$, $A_3$.) Naively assumes the features to be independent given the class. NBC is highly biased, but achieves good accuracy, especially on small data sets, thanks to low variance (Friedman, 1997).
16 Naive Bayes (NBC) (Graph: class $C$ with children $A_1$, $A_2$, $A_3$.) Learns from data the joint probability of class and features, decomposed as the marginal probability of the classes and the conditional probability of each feature given the class.
17 Joint prior $\theta_{c,\mathbf{a}}$: the unknown joint probability of class and features, which we want to estimate. Under the naive assumption and a Dirichlet prior, the joint prior is: $P(\theta_{c,\mathbf{a}}) \propto \prod_{c \in \Omega_C} \left[ \theta_c^{s t(c) - 1} \prod_{i=1}^{k} \prod_{a \in \Omega_{A_i}} \theta_{(a|c)}^{s t(a,c) - 1} \right]$, where $t(a, c)$ is the proportion of hidden instances with $C = c$ and $A_i = a$. Let the vector $t$ collect all the parameters $t(c)$ and $t(a, c)$. Thus, a joint prior is specified by $t$.
18 Likelihood and posterior The likelihood is like the prior, with the coefficients $st(\cdot)$ replaced by the counts $n(\cdot)$: $L(\theta \mid \mathbf{n}) \propto \prod_{c \in \Omega_C} \left[ \theta_c^{n(c)} \prod_{i=1}^{k} \prod_{a \in \Omega_{A_i}} \theta_{(a|c)}^{n(a,c)} \right]$. The joint posterior $P(\theta_{c,\mathbf{a}} \mid \mathbf{n}, t)$ is like the likelihood, with the coefficients $n(\cdot)$ replaced by $st(\cdot) + n(\cdot)$. Once $P(\theta_{c,\mathbf{a}} \mid \mathbf{n}, t)$ is available, the classifier is trained.
19 Issuing a classification The value of the features is specified as $\mathbf{a} = (a_1, \dots, a_k)$. Then $P(c \mid \mathbf{a}) \propto P(c) \prod_{i=1}^{k} P(a_i \mid c)$, where $P(c) = \frac{n(c) + s t(c)}{n + s}$ and $P(a_i \mid c) = \frac{n(a_i, c) + s t(a_i, c)}{n(c) + s t(c)}$. Prior-dependence: the most probable class varies with $t$.
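A minimal sketch (hypothetical helper names, not the authors' code) of naive Bayes prediction with the smoothed estimates above; `t` is a dictionary of class proportions $t(c)$, and the feature parameters are taken as $t(a,c) = t(c)/|\Omega_{A_i}|$:

```python
from collections import defaultdict

# Sketch: counts-based training and smoothed prediction for discrete features.
def train_nbc(data):
    n_c, n_ac, values = defaultdict(int), defaultdict(int), defaultdict(set)
    for feats, c in data:
        n_c[c] += 1                                # n(c)
        for i, a in enumerate(feats):
            n_ac[(i, a, c)] += 1                   # n(a_i, c)
            values[i].add(a)                       # observed values of feature i
    return n_c, n_ac, values

def predict_nbc(feats, n_c, n_ac, values, s=1.0, t=None):
    classes, n = list(n_c), sum(n_c.values())
    t = t or {c: 1.0 / len(classes) for c in classes}     # uniform t(c) by default
    scores = {}
    for c in classes:
        p = (n_c[c] + s * t[c]) / (n + s)                            # P(c)
        for i, a in enumerate(feats):
            t_ac = t[c] / len(values[i])
            p *= (n_ac[(i, a, c)] + s * t_ac) / (n_c[c] + s * t[c])  # P(a_i | c)
        scores[c] = p
    z = sum(scores.values())
    return {c: v / z for c, v in scores.items()}
```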
20 Credal classifiers Induced using a set of priors (credal set). They separate safe instances from prior-dependent ones. On prior-dependent instances: they return a set of classes ( indeterminate classifications ), remaining robust though less informative.
21 IDM over the naive topology We consider a set of joint priors, factorized as follows. Class: $0 < t(c) < 1$ for all $c \in \Omega_C$, with $\sum_{c \in \Omega_C} t(c) = 1$; a priori, $0 < P(c_j) < 1$ for all $j$. Features given the class: $\sum_{a \in \Omega_{A_i}} t(a, c) = t(c)$ for all $i, c$, with $0 < t(a, c) < t(c)$ for all $a, c$; a priori, $0 < P(a \mid c) < 1$ for all $a, c$.
22 Naive Credal Classifier (NCC) Uses the IDM to specify a credal set of joint distributions and updates it into a posterior credal set. The posterior probability of class $c$ ranges within an interval. Given the feature observation $\mathbf{a}$, class $c'$ credal-dominates class $c''$ if, for each posterior of the credal set, $P(c' \mid \mathbf{a}) > P(c'' \mid \mathbf{a})$.
23 NCC and prior-dependent instances Credal-dominance is checked by solving an optimization problem. NCC eventually returns the non-dominated classes: a singleton on the safe instances, a set on the prior-dependent ones. The next application shows the gain in reliability due to returning indeterminate classifications on the prior-dependent instances.
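A toy illustration of prior-dependence, building on the previous sketch: re-run the Bayesian prediction under many class priors $t$ and collect every class that comes out on top for at least one of them. The actual NCC computes the exact set of non-dominated classes by optimization; this random scan only conveys the idea.

```python
import random

# Sketch: approximate the set of non-dominated classes by scanning random
# class priors t(c) (feature parameters kept proportional to t(c)).
def undominated_classes_approx(feats, n_c, n_ac, values, s=1.0, n_priors=200, seed=0):
    rng = random.Random(seed)
    classes, winners = list(n_c), set()
    for _ in range(n_priors):
        w = [rng.random() for _ in classes]
        t = {c: wi / sum(w) for c, wi in zip(classes, w)}
        post = predict_nbc(feats, n_c, n_ac, values, s=s, t=t)
        winners.add(max(post, key=post.get))
    return winners   # a singleton on safe instances, several classes otherwise
```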
24 Next From Naive Bayes to Naive Credal Classifier: Texture recognition
25 Texture recognition The OUTEX data sets (Ojala, 2002): 4500 images, 24 classes (textiles, carpets, woods, ...).
26 Features: Local Binary Patterns (Ojala, 2002) The gray level of each pixel is compared with that of its neighbors, resulting in a binary judgment (more intense/ less intense). Such judgments are collected in a string for each pixel.
27 Local Binary Patterns (2) Each string is then assigned to a single category. The categories group similar strings: e.g., a string and its rotations are assigned to the same category, for rotational invariance. There are 18 categories. For each image there are 18 features: the % of pixels assigned to each category.
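A rough sketch of how such features could be computed, assuming NumPy is available and using a simplified 8-neighbour LBP in which rotations of the same string are merged into one category; the OUTEX experiments use a finer variant with exactly 18 categories:

```python
import numpy as np

# Sketch: 8-neighbour LBP codes, merged by rotation, summarized as a
# normalized histogram (fraction of pixels per category).
def lbp_features(img):
    img = np.asarray(img, dtype=float)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    bits = [(img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx] >= center)
            for dy, dx in offsets]
    codes = sum(b.astype(int) << k for k, b in enumerate(bits))
    # rotational invariance: map each 8-bit code to the minimum over its rotations
    def rot_min(c):
        return min(((c >> r) | (c << (8 - r))) & 255 for r in range(8))
    categories = np.vectorize(rot_min)(codes)
    cats, counts = np.unique(categories, return_counts=True)
    return dict(zip(cats.tolist(), (counts / counts.sum()).tolist()))
```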
28 Results (Corani et al., BMVC 2010) Accuracy of NBC: 92% (SVMs: 92.5%). NBC is highly accurate on the safe instances, but almost random on the prior-dependent ones. Safe instances (95% of the total): NBC accuracy 94%, NCC accuracy 94%, NCC returns 1 class. Prior-dependent instances (5% of the total): NBC accuracy 56%, NCC accuracy 85%, NCC returns 2.4 classes on average.
29 Different training set sizes (II) As $n$ grows there is a decrease of both the % of indeterminate classifications and the number of classes returned when indeterminate. (Plots: % of indeterminate classifications vs. training set size; number of classes returned when indeterminate vs. training set size.)
30 Naive Bayes + rejection option vs Naive Credal Rejection option: reject an instance (no classification) if the probability of the most probable class is below a threshold p. Texture recognition: half of the prior-dependent instances classified by naive Bayes with probability > 90%. Rejection rule is not designed to detect prior-dependent instances!
31 Tree-augmented credal classifier (TAN) (Graphs: NBC topology, class $C$ with children $A_1$, $A_2$, $A_3$; TAN topology, where features may additionally depend on other features.) TAN improves over naive Bayes by weakening the independence assumption.
32 Tree-augmented credal classifier (TAN) (Graphs: NBC and TAN topologies as above.) Credal TAN detects instances unreliably classified by Bayesian TAN because of prior-dependence. References: Zaffalon, Reliable Computing, 2003; Corani and De Campos, Int. J. Appr. Reasoning, 2010.
33 Next From Naive Bayes to Naive Credal Classifier: Conservative treatment of missing data
34 Ignorance from missing data Usually, classifiers ignore missing data, assuming them to be MAR (missing at random). MAR: the probability of an observation being missing does not depend on its value or on the value of other missing data. The MAR assumption cannot be tested on the incomplete data.
35 A non-MAR example: a political poll. The right-wing supporters (R) sometimes refuse to answer; left-wing supporters (L) always answer. Votes: L, L, L, R, R, R; answers: L, L, L, R, -, -. By ignoring the missing data, $P(R) = 1/4$: underestimated!
36 Conservative treatment of missing data. Consider each possible completion of the data. The two missing answers can be completed as (L, L), (L, R), (R, L) or (R, R), giving $P(R) = 1/6$, $1/3$, $1/3$ and $1/2$ respectively. Thus $P(R) \in [1/6, 1/2]$; this interval includes the real value. Application to BNs: Ramoni and Sebastiani (2000).
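The bounds of this example can be reproduced by brute-force enumeration of the completions (a sketch, not part of the slides):

```python
from itertools import product

# Sketch: enumerate all completions of the non-MAR missing answers in the poll
# example and collect the resulting range of P(R).
answers = ['L', 'L', 'L', 'R', None, None]        # None = refused to answer
missing = [i for i, a in enumerate(answers) if a is None]
estimates = []
for completion in product(['L', 'R'], repeat=len(missing)):
    filled = list(answers)
    for i, v in zip(missing, completion):
        filled[i] = v
    estimates.append(filled.count('R') / len(filled))
print(min(estimates), max(estimates))             # 1/6 and 1/2
```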
37 Conservative Inference Rule (Zaffalon, 2005) MAR missing data are ignored. Non-MAR missing data are filled in all possible ways, both in the training and in the test data. The number of replacements grows exponentially with the missing data; yet polynomial-time algorithms are available for the naive credal classifier. Reference: Corani and Zaffalon, 2008, JMLR.
38 The conservative treatment of missing data increases indeterminacy. Multiple classes are returned if the most probable class depends on the prior specification or on the completion of the non-MAR missing data. Declare each feature as MAR or non-MAR depending on domain knowledge, for a good trade-off between robustness and informativeness.
39 Agenda 1 Estimating a multinomial 2 Naive Bayes and Naive Credal Classifier 3 Comparing credal and traditional classifiers 4 Credal Model Averaging
40 Discounted-accuracy $\text{d-acc} = \frac{1}{N} \sum_{i=1}^{N} \frac{(\text{accurate})_i}{Z_i}$, where $(\text{accurate})_i$ indicates whether the set of classes returned on instance $i$ contains the actual class, and $Z_i$ is the number of classes returned on the $i$-th instance. For a traditional classifier, d-acc equals the 0-1 accuracy.
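A direct implementation of discounted accuracy for set-valued predictions might look as follows (a sketch; predictions are sets of classes):

```python
# Sketch: discounted accuracy of a (possibly credal) classifier.
def discounted_accuracy(predictions, truths):
    total = 0.0
    for pred, true in zip(predictions, truths):
        accurate = 1.0 if true in pred else 0.0
        total += accurate / len(pred)             # Z_i = |pred|
    return total / len(truths)

# A determinate classifier returns singletons, so d-acc reduces to 0-1 accuracy.
print(discounted_accuracy([{'A'}, {'A', 'B'}, {'B'}], ['A', 'A', 'A']))  # (1 + 0.5 + 0)/3
```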
41 Some properties Given $\text{d-acc} = \frac{1}{N} \sum_{i=1}^{N} \frac{(\text{accurate})_i}{Z_i}$: between two accurate classifications, d-acc is higher for the more informative classifier; between an accurate and an inaccurate classification, d-acc is higher for the accurate classifier, even if it has returned multiple classes (linear decrease). Discounted-accuracy satisfies some fundamental properties (Zaffalon et al., IJAR 2012).
42 Doctor random and doctor vacuous. Possible diseases: {A, B}. Doctor random: uniformly random diagnosis. Doctor vacuous: return both categories. In the four equally likely cases (disease A or B, random's diagnosis A or B), doctor random obtains d-acc 1, 0, 0, 1 while doctor vacuous always obtains d-acc 0.5: the expected d-acc is 0.5 for both.
43 Doctor random vs doctor vacuous: the manager viewpoint. Assumption: the hospital earns an amount of money proportional to the discounted-accuracy. After $n$ visits, the profits are: doctor vacuous, $n/2$ with no variance; doctor random, expected $n/2$ with variance $n/4$.
44 Introducing utility Expected utility increases with the expected value of the rewards but decreases with their variance. Any risk-averse manager prefers doctor vacuous over doctor random. And you prefer a vacuous over a random diagnosis! Idea: quantify such a preference through a utility function (Zaffalon et al., IJAR 2012).
45 How to design the utility function Actual A, predicted A: utility 1. Actual A, predicted B: utility 0. Utility coincides with accuracy for traditional classifications consisting of a single class.
46 How to design the utility function Actual A, predicted {B, C}: utility 0. Actual A, predicted {A, B}: utility larger than 0.5. A wrong indeterminate classification has utility 0. The utility of an accurate but indeterminate classification consisting of two classes has to be larger than 0.5, otherwise doctor random and doctor vacuous yield the same utility.
47 How to design the utility function Let this utility be 0.65 (function $u_{65}$) or 0.8 (function $u_{80}$), then perform a sensitivity analysis. The utility of credal classifiers and the accuracy of determinate classifiers can now be compared. Such utility functions are numerically close to the F1 and F2 metrics (Del Coz et al., JMLR, 2009).
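A minimal sketch of such a utility, restricted to the singleton/two-class cases discussed here; the general definition for larger class sets is the one of Zaffalon et al. (IJAR 2012):

```python
# Sketch of the u_w utility (w = 0.65 for u65, w = 0.8 for u80):
# 1 for an accurate determinate answer, w for an accurate two-class answer,
# 0 for a wrong answer.
def utility(pred, true, w=0.65):
    if true not in pred:
        return 0.0
    return 1.0 if len(pred) == 1 else w

print(utility({'A', 'B'}, 'A', w=0.8))   # 0.8
```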
48 Comparing naive Bayes and naive credal Wins on 55 UCI data sets, under both $u_{65}$ and $u_{80}$ (table of win counts for naive Bayes vs. naive credal in the slide). How to compare credal and traditional classifiers was an open problem at AAAI-10!
49 Agenda 1 Estimating a multinomial 2 Naive Bayes and Naive Credal Classifier 3 Comparing credal and traditional classifiers 4 Credal Model Averaging
50 Model uncertainty: an example with logistic regression. Given $k$ features, we can design $2^k$ logistic regressors, each with a different feature set. Model uncertainty: different logistic regressors (with different feature sets) can fit the data similarly well. Considering only a single model leads to overconfident conclusions. Bayesian Model Averaging (BMA) combines the inferences of multiple models, using as weights the posterior probabilities of the models.
51 Bayesian Model Averaging (BMA) The posterior of class $c$ under BMA is: $P(c \mid D) = \sum_{m_i \in M} P(c \mid m_i, D) P(m_i \mid D) \propto \sum_{m_i \in M} P(c \mid m_i, D) P(D \mid m_i) P(m_i)$, where $D$ is the available training data, $m_i$ is the $i$-th model within the model space $M$, and $P(m_i)$ is the prior over the models. Often the BMA analysis is repeated under different priors over the models.
52 Independent Bernoulli (IB) prior over the models Prior probability of model $m_i$, containing $k_i$ features: $P(m_i) = \theta^{k_i} (1 - \theta)^{k - k_i}$. Assumptions: every covariate has the same prior probability of inclusion $\theta$, which is the only parameter; the probability of inclusion of each covariate is independent of the others. Uniform prior over the models: $\theta = 0.5$.
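The IB prior is a one-liner; a sketch for illustration:

```python
# Sketch: independent-Bernoulli prior over feature sets (models).
def ib_prior(k_included, k_total, theta=0.5):
    return theta ** k_included * (1 - theta) ** (k_total - k_included)

print(ib_prior(2, 5, theta=0.5))   # with theta = 0.5 every model has prior 0.5**5
```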
53 Credal model averaging (CMA) Represents the prior over the models by a credal set. The prior probability of inclusion of each covariate varies in $[\underline{\theta}, \overline{\theta}]$. Ignorance a priori can be represented by setting $\underline{\theta} = \epsilon$ and $\overline{\theta} = 1 - \epsilon$. CMA automates the sensitivity analysis on the value of $\theta$.
54 Upper and lower probability of a class Lower posterior probability of class $c_1$: $\underline{P}(c_1 \mid D, x) = \min_{\theta \in [\underline{\theta}, \overline{\theta}]} \sum_{m_i \in M} P(c_1 \mid D, x, m_i) P(m_i \mid D) = \min_{\theta \in [\underline{\theta}, \overline{\theta}]} \frac{\sum_{m_i \in M} P(c_1 \mid D, x, m_i) P(D \mid m_i)\, \theta^{k_i} (1 - \theta)^{k - k_i}}{\sum_{m_j \in M} P(D \mid m_j)\, \theta^{k_j} (1 - \theta)^{k - k_j}}$. The upper probability is found by maximizing the same expression. Such optimizations are solved analytically.
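Although the slide notes that the optimization is solved analytically, a grid scan over $\theta$ makes the objective explicit (a sketch with hypothetical argument names; the per-model quantities are assumed to be precomputed):

```python
import numpy as np

# Sketch: lower/upper posterior probability of class c1 under CMA via a grid
# scan over theta in [theta_low, theta_up].
# p_c1: P(c1 | D, x, m_i); marg_lik: P(D | m_i); k_i: number of features of m_i.
def cma_bounds(p_c1, marg_lik, k_i, k, theta_low=0.05, theta_up=0.95, grid=1000):
    p_c1, marg_lik, k_i = map(np.asarray, (p_c1, marg_lik, k_i))
    vals = []
    for th in np.linspace(theta_low, theta_up, grid):
        w = marg_lik * th ** k_i * (1 - th) ** (k - k_i)   # unnormalized P(m_i | D)
        vals.append(float(np.sum(p_c1 * w) / np.sum(w)))
    return min(vals), max(vals)
```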
55 Credal model averaging (CMA) The prior over the models is modelled by a credal set. The single models (logistic regressors with different feature sets) are learned in a traditional Bayesian way. CMA is thus a credal ensemble of Bayesian regressors.
56 Indeterminate classifications Safe instances: the most probable class remains the same, regardless of the prior over the models; CMA returns the same class as BMA for any $\theta \in [\underline{\theta}, \overline{\theta}]$. Prior-dependent instances: the most probable class varies with $\theta$; CMA suspends the judgment, returning a set of classes.
57 Predicting presence or absence of marmot burrows Study area: 9500 cells (100 m² each) within the Stelvio National Park (Italy). Presence of burrows in about 4.5% of the cells. Features available for each cell: altitude, slope, aspect (the direction in which the slope faces), soil temperature, soil cover, etc.
58 Experiments: comparing BMA and CMA BMA priors over the models: uniform ($\theta = 0.5$) and hierarchical ($\theta$ as a random variable, with uniform prior). Training sets of size $n \in \{30, 60, 90, \dots, 300\}$; 30 experiments for each sample size. For CMA we set $\overline{\theta} = 0.95$ and $\underline{\theta} = 0.05$ (ignorance a priori).
59 BMA accuracy Regardless of the adopted prior over the models: BMA is highly accurate on the safe instances but... almost a random guesser on the prior-dependent ones! (Figure: BMA accuracy on safe vs. prior-dependent instances, as a function of the training set size. Left: uniform prior. Right: hierarchical prior.)
60 Credal model averaging (CMA) Allows sensitivity analysis of the coefficients of the covariates (upper / lower value) to be computed easily by solving an optimization problem; the solution is analytical. CMA automates sensitivity analysis.
61 Other ensembles using a credal set for the prior probability of the models. Credal ensemble of naive Bayes (Corani and Zaffalon, ECML-PKDD 2008). Credal ensemble with compression coefficients (Corani and Antonucci, Comp. Statistics & Data Analysis, in press).
62 Other imprecise probability classifiers. Some useful pointers: Credal KNN: Destercke, Soft Computing, 2012. Credal regression trees: Abellan and Moral, Int. J. Appr. Reasoning; Abellan et al., Comp. Statistics & Data Analysis, 2013. Classification of temporal sequences: Antonucci et al., Proc. ISIPTA '13 (Symp. on Imprecise Probability).
63 Conclusions Credal classifiers suspend the judgment on prior-dependent instances. Prior-dependent instances are not effectively identified by the rejection option. Credal classifiers robustly deal with the specification of the prior and with non-mar missing data. Typically, the accuracy of traditional classifiers drops on the instances indeterminately classified by credal classifiers.
64 Open problems Multi-label classification. Dealing, through credal sets, with both the prior on the model parameters and the prior over the models.