Adaptive Multi-Modal Sensing of General Concealed Targets


1 Adaptive Multi-Modal Sensing of General Concealed Targets
Lawrence Carin, Balaji Krishnapuram, David Williams, Xuejun Liao and Ya Xue
Department of Electrical & Computer Engineering, Duke University, Durham, NC

2 Outline
- Review of semi-supervised statistical classifiers and the graph-based prior
- Extension of semi-supervised classifiers to a multi-sensor setting: Bayesian co-training
- Active multi-sensor sensing:
  - Selection of those members of the unlabeled set for which acquisition of labels would be most informative
  - Selection of those members of the unlabeled set for which new sensors should be deployed and new data acquired
- Concept drift: adaptive sensing when the statistics of the unlabeled data change or drift from those of the original labeled data
- Future work

3 Nomenclature
Labeled data: set of $N_L$ feature vectors $x_n$ for which the associated label is known, denoted $l_n \in \{0,1\}$ for the binary case, thereby yielding the set $D_L = \{x_n, l_n\}_{n=1}^{N_L}$.
Unlabeled data: the set of $N_U$ feature vectors for which the associated labels are unknown, yielding the set $D_U = \{x_n\}_{n=N_L+1}^{N_L+N_U}$. This is the data to be classified.
Supervised algorithm: classifier that is designed using $D_L$ and tested on $D_U$.
Semi-supervised algorithm: classifier that is designed using $D_L$ and $D_U$; used to estimate the labels of $D_U$.

4 Motivation: FOPEN
[Figure: FOPEN imagery; legend: target, clutter]
We typically have far more unlabeled data than labeled examples ($N_U \gg N_L$). We seek labels for the most-informative feature vectors in $D_U$. Typically classification is performed on isolated unlabeled examples, one at a time; we wish to classify members of $D_U$ using all information from $D_U$ and $D_L$ simultaneously.

5 Motivation: SAR-Based Mine Detection
Tremendous amount of unlabeled data, very limited labeled data. Classification is far easier when placed in the context of the entire image, vis-à-vis isolated image chips.

6 Motivation: UXO Detection
Tremendous amount of unlabeled data, very limited labeled data. Classification is far easier when placed in the context of the entire image, vis-à-vis isolated image chips.

7 Logistic Link Function Design and a Data-Dependent Prior
Assume labels $l_n \in \{0,1\}$, and define the kernel-based function $y(x; w) = \sum_n w_n K(x, b_n) + w_0 = \Phi^T(x)\, w$.
The probability that $x$ is associated with $l_n = 1$ is expressed as $p(l_n = 1 \mid x, w) = \sigma[y(x; w)] = \exp[\Phi^T(x) w] / \{1 + \exp[\Phi^T(x) w]\}$.
For the $N_L$ labeled examples, we wish to maximize the log-likelihood $\ell(w) = \sum_{n=1}^{N_L} \log p(l_n \mid x_n, w)$, subject to a prior $p(w \mid D_L, D_U)$ on the weights, with the prior dependent on all data (labeled and unlabeled) via a graph-based construct.
The classifier weights are therefore set at $w_{MAP} = \arg\max_w\, [\ell(w; D_L) + \log p(w \mid D_L, D_U)]$.
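As a concrete illustration of the link function, here is a minimal Python sketch (not the authors' code), assuming an RBF kernel and taking the basis vectors $b_n$ to be the training points; all names are illustrative.

```python
# Minimal sketch: kernel-based logistic link p(l = 1 | x, w) = sigma(Phi(x)^T w).
# Assumes an RBF kernel and that the basis vectors b_n are the training points.
import numpy as np

def rbf_kernel(x, B, sigma=1.0):
    """K(x, b_n) for each basis vector b_n (rows of B)."""
    d2 = np.sum((B - x) ** 2, axis=1)
    return np.exp(-d2 / sigma ** 2)

def phi(x, B, sigma=1.0):
    """Feature map Phi(x) = [1, K(x, b_1), ..., K(x, b_N)] (bias plus kernel terms)."""
    return np.concatenate(([1.0], rbf_kernel(x, B, sigma)))

def p_label_one(x, w, B, sigma=1.0):
    """Logistic link: p(l = 1 | x, w) = sigma(Phi(x)^T w)."""
    y = phi(x, B, sigma) @ w
    return 1.0 / (1.0 + np.exp(-y))
```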

8 Graph-Based Design of Data-Dependent Prior
Use a kernel $k(x_i, x_j)$ to define the similarity of $x_i$ and $x_j$. Note: this kernel is used to define the graph, and need not be the same as that employed in the classifier.
$W_{ij} = k(x_i, x_j)$ is large when the two vectors are similar, e.g., the radial basis function $W_{ij} = k(x_i, x_j) = \exp[-\|x_i - x_j\|^2 / \sigma^2]$.
Let the vectors $x_i$ constitute the nodes of the graph, and let $f(x_i) = \Phi^T(x_i)\, w$ be a function on the graph.
We seek to minimize the energy function $En(f) = \tfrac{1}{2} \sum_{i,j} W_{ij}\,[f(x_i) - f(x_j)]^2$; a large $W_{ij}$ forces $f(x_i) \approx f(x_j)$.
Defining $f = \{f(x_1), f(x_2), \ldots, f(x_{N_L + N_U})\}$, it is easy to show that $En(f) = f^T \Delta f$, where $\Delta = D - W$ is the combinatorial Laplacian, the matrix $W$ is defined by $W_{ij}$, and $D$ is a diagonal matrix whose $i$th element is $d_i = \sum_j W_{ij}$.
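A short sketch of the graph construction, assuming the same RBF similarity for $W$; variable names are illustrative only.

```python
# Minimal sketch: build the affinity W, degree matrix D, and combinatorial
# Laplacian Delta = D - W, then evaluate the energy En(f) = f^T Delta f.
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """X: (N, d) array of all feature vectors (labeled and unlabeled)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / sigma ** 2)      # W_ij = k(x_i, x_j)
    np.fill_diagonal(W, 0.0)          # self-edges cancel in D - W; zeroed for clarity
    D = np.diag(W.sum(axis=1))        # d_i = sum_j W_ij
    return D - W                      # combinatorial Laplacian

def energy(f, Delta):
    """En(f) = f^T Delta f; small when f varies smoothly over the graph."""
    return f @ Delta @ f
```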

9 Graph-Based Design of Data-Dependent Prior - 2
Using $En(f) = f^T \Delta f$, finding the $f$ that minimizes $En(f)$ corresponds to a MAP estimate of $f$ from the Gaussian random field density function $p(f) = Z_\beta^{-1} \exp[-\beta\, En(f)] = Z_\beta^{-1} \exp[-\beta f^T \Delta f]$.
This gives us a prior on $f$, which is defined through $f(x_i) = \Phi^T(x_i)\, w$, and therefore $f = A w$ with $A = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_{N_U + N_L})]^T$.
We therefore have a prior on our model weights $w$, also represented by a Gaussian random field: $p(w \mid x_1, x_2, \ldots, x_{N_U + N_L}) = \mathcal{N}(0, (\beta \tilde{A})^{-1})$ with $\tilde{A} = A^T \Delta A$.
We now have a prior on the weights $w$ applied to the labeled data $D_L$, accounting for the inter-relationships between all data $D_L$ and $D_U$.
The MAP solution for $w$, $w_{MAP} = \arg\max_w\, [\ell(w; D_L) + \log p(w \mid D_L, D_U)]$, is solved for via an EM algorithm.
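The slides solve the MAP problem with an EM algorithm; the sketch below instead hands the same objective (logistic log-likelihood on the labeled data plus the graph-induced Gaussian prior) to a generic optimizer, purely to make the objective concrete. Function and variable names are assumptions.

```python
# Minimal sketch: w_MAP = argmax_w [ l(w; D_L) + log p(w | D_L, D_U) ], with the
# data-dependent prior log p(w | .) = -beta * w^T (A^T Delta A) w + const.
import numpy as np
from scipy.optimize import minimize

def fit_map(Phi_all, labels, labeled_idx, Delta, beta=1.0):
    """Phi_all: (N, M) design matrix Phi(x_n) over labeled AND unlabeled data.
    labels: (N_L,) binary labels aligned with labeled_idx."""
    A_tilde = Phi_all.T @ Delta @ Phi_all      # graph-induced precision on w
    Phi_L = Phi_all[labeled_idx]

    def neg_log_posterior(w):
        y = Phi_L @ w
        # Bernoulli log-likelihood of the labeled data under the logistic link
        loglik = np.sum(labels * y - np.logaddexp(0.0, y))
        return -(loglik - beta * w @ A_tilde @ w)

    w0 = np.zeros(Phi_all.shape[1])
    return minimize(neg_log_posterior, w0, method="L-BFGS-B").x
```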

10 Graph-Based Design of Data-Dependent Prior: Intuition
The Gaussian field prior on $f$ essentially prefers functions which vary smoothly across the graph, as opposed to functions which vary rapidly. In our case, we prefer to have the posterior probability of belonging to class +1 vary smoothly across the neighboring vertices of the graph.

11 [Figure: decision surface based on labeled data (supervised) vs. decision surface based on labeled & unlabeled data (semi-supervised)]

12 UXO Sensing: JPG-V [figure]

13 Outline
- Review of semi-supervised statistical classifiers and the graph-based prior
- Extension of semi-supervised classifiers to a multi-sensor setting: Bayesian co-training
- Active multi-sensor sensing:
  - Selection of those members of the unlabeled set for which acquisition of labels would be most informative
  - Selection of those members of the unlabeled set for which new sensors should be deployed and new data acquired
- Concept drift: adaptive sensing when the statistics of the unlabeled data change or drift from those of the original labeled data
- Future work

14 Extension of Graph to Multiple Sensors
[Figure: graph for feature vectors from Sensor One, graph for feature vectors from Sensor Two, and items for which features are available from both sensors]
Assume $M$ sensors are available, and $S_n$ represents the subset of sensors deployed for item $n$. For item $n$ we have features $\{x_n^{(m)},\, m \in S_n\}$.
Build a graph-based prior for the feature vectors from each of the individual sensor types. How do we connect features from multiple sensors when available for a given item $n$?
To simplify the subsequent discussion, we assume that we have only two sensors.

15 Bayesian Co-Training
[Figure: graph for feature vectors from Sensor One, graph for feature vectors from Sensor Two, and items for which features are available from both sensors]
To connect multiple feature vectors (graph nodes) for a given item, we impose a statistical prior favoring that the multiple feature vectors yield decision statistics that agree.
Let $f_1(x_n^{(1)}) = \Phi_1^T(x_n^{(1)})\, w_1$ represent the decision function for the $n$th item, with data from Sensor One; $f_2(x_n^{(2)}) = \Phi_2^T(x_n^{(2)})\, w_2$ is defined similarly for Sensor Two.
Let $D_B$ represent those elements for which data are available from both sensors. We seek parameters $w_1$ and $w_2$ that satisfy
$\min_{w_1, w_2} \sum_{n \in D_B} \{\sigma[f_1(x_n^{(1)})] - \sigma[f_2(x_n^{(2)})]\}^2 \;\approx\; \min_{w_1, w_2} \sum_{n \in D_B} [f_1(x_n^{(1)}) - f_2(x_n^{(2)})]^2 \;=\; \min_{w_1, w_2} w^T C w$, where $w = [w_1; w_2]$.
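A small sketch of how the co-training coupling matrix $C$ can be assembled from the per-sensor feature maps on the items in $D_B$ (illustrative names, not the authors' implementation).

```python
# Minimal sketch: coupling matrix C such that
# sum_{n in D_B} [f1(x_n) - f2(x_n)]^2 = [w1; w2]^T C [w1; w2].
import numpy as np

def cotraining_matrix(Phi1_B, Phi2_B):
    """Phi1_B: (N_B, M1) Sensor-One features; Phi2_B: (N_B, M2) Sensor-Two features,
    for the N_B items observed by BOTH sensors (rows aligned item by item)."""
    D = np.hstack([Phi1_B, -Phi2_B])   # d_n = [Phi_1(x_n); -Phi_2(x_n)]
    return D.T @ D                     # C = sum_n d_n d_n^T
```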

16 Multi-Sensor Graph-Based Prior
The condition $\min_{w_1, w_2} w^T C w$ may be expressed in terms of a Gaussian random field prior, the likelihood of which we wish to maximize.
The cumulative graph-based multi-sensor prior on the model weights is
$\log p(w_1, w_2 \mid \lambda_B, \lambda_1, \lambda_2) = -\lambda_B w^T C w - \lambda_1 w_1^T \tilde{A}_1 w_1 - \lambda_2 w_2^T \tilde{A}_2 w_2 + K$
where the hyper-parameters $(\lambda_B, \lambda_1, \lambda_2)$ control the relative importance of the terms: the co-training prior based on multiple views of the same item, the smoothness prior within the Sensor One weights, and the smoothness prior within the Sensor Two weights.
A Gamma hyper-prior $p(\lambda_B, \lambda_1, \lambda_2)$ is used for these hyper-parameters.
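Under the assumption that this prior is Gaussian in the stacked weights $w = [w_1; w_2]$ with precision $\lambda_B C + \mathrm{blockdiag}(\lambda_1 \tilde{A}_1, \lambda_2 \tilde{A}_2)$, assembling the combined precision is a one-liner; the sketch below is illustrative only.

```python
# Minimal sketch: combined multi-sensor precision on w = [w1; w2], so that
# log p(w1, w2) = -w^T Lambda w + const.
import numpy as np
from scipy.linalg import block_diag

def multisensor_precision(C, A1_tilde, A2_tilde, lam_B, lam_1, lam_2):
    return lam_B * C + block_diag(lam_1 * A1_tilde, lam_2 * A2_tilde)
```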

17 Total Likelihood to be Maximized
$p(w \mid D_L, D_U) \propto \prod_{n=1}^{N_L} \{\sigma[\Phi^T(x_n) w]\}^{l_n} \{1 - \sigma[\Phi^T(x_n) w]\}^{1 - l_n} \int p(w_1, w_2 \mid \lambda_B, \lambda_1, \lambda_2)\, p(\lambda_B, \lambda_1, \lambda_2)\, d^3\lambda$
The first factor is driven by labeled data from Sensor One, Sensor Two, or both; the second is the graph-based prior based on the labeled and unlabeled data from Sensors One and Two.
We solve for the weights in a maximum-likelihood sense, via an efficient EM algorithm with $\lambda_B, \lambda_1, \lambda_2$ serving as the hidden variables.
Once the weights $w_{ML}$ are so determined, the probability that example $x$ is associated with label $l_n$ is expressed as $p(l_n \mid x, w_{ML}) = \{\sigma[\Phi^T(x) w_{ML}]\}^{l_n} \{1 - \sigma[\Phi^T(x) w_{ML}]\}^{1 - l_n}$.

18 Features of Bayesian Semi-Supervised Co-Training
Almost all previous fusion algorithms have assumed that all sensors are applied to each item of interest. Using Bayesian co-training, a subset of sensors may be deployed on any given item. The method is placed within a semi-supervised setting, whereby context and changing statistics are accounted for by utilizing the unlabeled data.
[Figure legend: Sensor 1, Sensor 2, labeled data, unlabeled data]

19 Semi-Supervised Multi-Sensor Processing Example Results: WAAMD Hyperspectral & SAR Data
[Figure: hyperspectral and X-band SAR imagery; legend: Sensor 1, Sensor 2, labeled data, unlabeled data]
NVESD-collected data from Yuma Proving Ground, several different environments. Simple feature extraction performed on the hyperspectral and SAR data. Labeled examples selected randomly; classification performance presented for the remaining unlabeled examples.

20 [ROC results: $N_L = 386$, $N_U = 469$]

21 [ROC results: $N_L = 66$, $N_U = 477$]

22 Discussion
We have demonstrated integration of AHI (hyperspectral) and Veridian (SAR) data, and improved performance with the semi-supervised classifier, when $N_U \gg N_L$.
Question: in this example, is the SAR boosting the hyperspectral performance, vice versa, or both?
We have two coupled graphs, one each for the SAR and hyperspectral data, linked via the co-training prior. We can use the sub-classifier associated with each of these individual graphs to examine performance with or without the other sensor (e.g., SAR alone vis-à-vis the SAR classifier performance when also using information from the hyperspectral sensor).

23 Hyper-Spectral Alone vis-à-vis Hyper-Spectral Using SAR Information
[ROC plot: AHI #B62R rFlw & Veridian #B3R48rFvv; test on unlabeled AHI data; curves: Supervised (AHI only), Semi-supervised (AHI only), Semi-supervised (AHI and SAR); axes: Probability of Detection vs. Probability of False Alarm]

24 SAR Alone vis-à-vis SAR Using Hyper-Spectral Information
[ROC plot: AHI #B62R rFlw & Veridian #B3R48rFvv; test on unlabeled SAR data; curves: Supervised (SAR only), Semi-supervised (SAR only), Semi-supervised (AHI and SAR); axes: Probability of Detection vs. Probability of False Alarm]

25 Outline
- Review of semi-supervised statistical classifiers and the graph-based prior
- Extension of semi-supervised classifiers to a multi-sensor setting: Bayesian co-training
- Active multi-sensor sensing:
  - Selection of those members of the unlabeled set for which acquisition of labels would be most informative
  - Selection of those members of the unlabeled set for which new sensors should be deployed and new data acquired
- Concept drift: adaptive sensing when the statistics of the unlabeled data change or drift from those of the original labeled data
- Future work

26 Active Learning: Adaptive Multi-Modality Sensing
[Figure legend: Sensor 1, Sensor 2, labeled data, unlabeled data]
Q1: Which of the unlabeled data (from Sensor 1, Sensor 2, or both) would be most informative if the associated label could be determined (via personnel or an auxiliary sensor)?
Q2: For those examples for which only one sensor was deployed, which would be most informative if the other sensor was deployed to fill in the missing data?
A: Theory of optimal experiments.

27 Type 1 Active Learning: Labeling Unlabeled Data
The graph-based prior does not change with the addition of new labeled examples. We assume that the hyper-parameters $\lambda_B$, $\lambda_1$, and $\lambda_2$ do not change with the addition of one new labeled example.
The statistics of the model weights are approximated (Laplace approximation) as $p(w \mid D_L, D_U) \approx \mathcal{N}(w \mid \hat{w}, H^{-1})$, where the precision matrix (Hessian) is expressed as $H = -\nabla^2 \log p(w \mid D_L, D_U)$.
To within an additive constant, the entropy of a Gaussian process is $\tfrac{1}{2} \log |H^{-1}|$.

28 Type 1 Active Learning: Labeling Unlabeled Data
Expected decrease in entropy on $w$ when the label is acquired for $x_*$:
$I(w; l_*) = H(w) - E\{H(w \mid l_*)\} = \tfrac{1}{2} \log[1 + p_*(1 - p_*)\, x_*^T H^{-1} x_*]$, with $p_* = p(l_* = 1 \mid x_*, \hat{w})$.
The term $x_*^T H^{-1} x_*$ reflects the error bars in our model with regard to sample $x_*$; $p_*(1 - p_*)$ reflects the uncertainty of the logistic regression at $x_*$.
Where do we acquire labels?
- Those $x_*$ for which the classifier is least certain, i.e., $p(l_* = 1 \mid x_*, \hat{w}) \approx 0.5$
- Those $x_*$ for which the logistic-regression model has the largest error bars
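A sketch of this labeling criterion, assuming the Laplace-approximation Hessian of the logistic likelihood plus a Gaussian graph prior of the slide's form $\log p(w) = -w^T \Lambda w + \mathrm{const}$; phi_star is the (kernel) feature vector of a candidate, and all names are illustrative.

```python
# Minimal sketch: expected information gain for labeling candidate x*,
# using p(w | D) ~ N(w_hat, H^{-1}) from the Laplace approximation.
import numpy as np

def hessian(Phi_L, w_hat, Lambda):
    """H = sum_n p_n (1 - p_n) phi_n phi_n^T + 2*Lambda
    (2*Lambda because the prior is written as exp(-w^T Lambda w))."""
    p = 1.0 / (1.0 + np.exp(-Phi_L @ w_hat))
    return (Phi_L * (p * (1 - p))[:, None]).T @ Phi_L + 2.0 * Lambda

def info_gain(phi_star, w_hat, H):
    """I(w; l*) = 0.5 * log[1 + p*(1 - p*) * phi*^T H^{-1} phi*]."""
    p_star = 1.0 / (1.0 + np.exp(-phi_star @ w_hat))
    v = phi_star @ np.linalg.solve(H, phi_star)
    return 0.5 * np.log1p(p_star * (1.0 - p_star) * v)

# Query the unlabeled example with the largest expected entropy reduction, e.g.:
# best = max(unlabeled_phis, key=lambda phi: info_gain(phi, w_hat, H))
```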

29 Type 2 Active Learning: Deploying Sensors to Fill In Missing Data
To simplify the discussion, assume we have two sensors, $S_1$ and $S_2$. Let $x_i^{(1)}$ be the feature vector measured by $S_1$ for the $i$th item (target/non-target), with $x_i^{(2)}$ defined similarly.
Using a Laplace approximation, we have $p(w \mid D_L, D_U) \approx \mathcal{N}(w \mid \hat{w}, [\Lambda_L + \Lambda_U]^{-1})$, where $\Lambda_U$ collects the graphs from the individual sensors and the co-training graph, and $\Lambda_L$ collects the contributions of the labeled data from sensors $S_1$ and $S_2$.

30 Type 2 Active Learning
$\Lambda_L + \Lambda_U = \lambda_1 \tilde{A}_1 + \lambda_2 \tilde{A}_2 + \lambda_B C + \sum_i \sigma(w_1^T x_i^{(1)})[1 - \sigma(w_1^T x_i^{(1)})]\, x_i^{(1)} x_i^{(1)T} + \sum_i \sigma(w_2^T x_i^{(2)})[1 - \sigma(w_2^T x_i^{(2)})]\, x_i^{(2)} x_i^{(2)T}$, where the sums run over the labeled items from each sensor.
We only deploy sensors to add data to the unlabeled set, and therefore only $\Lambda_U = \lambda_1 \tilde{A}_1 + \lambda_2 \tilde{A}_2 + \lambda_B C$ changes with the addition of the new data (i.e., we improve the quality of the graph-based prior).
We desire the expected change in the determinant of $\Lambda_L + \Lambda_U$, but to make this computationally tractable we actually compute $E\{\Lambda_L + \Lambda_U\}$.
Use Gaussian-mixture models, based on all data, to estimate the needed density functions $p(x^{(1)} \mid x^{(2)})$ and $p(x^{(2)} \mid x^{(1)})$.
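One way to realize the Gaussian-mixture step is to fit a joint GMM on the items seen by both sensors and form the conditional mixture for the missing modality analytically. The sketch below does this with scikit-learn and the standard Gaussian conditioning formulas; it is an assumed implementation, not the authors' code.

```python
# Minimal sketch: estimate p(x^(2) | x^(1)) from a joint GMM fit on items
# observed by both sensors, used to take expectations over the missing sensor.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

def fit_joint_gmm(X1_B, X2_B, n_components=3):
    """Joint GMM over concatenated [x^(1), x^(2)] features (items with both sensors)."""
    return GaussianMixture(n_components=n_components, covariance_type="full").fit(
        np.hstack([X1_B, X2_B]))

def conditional_mixture(gmm, x1):
    """Per-component conditional means/covariances of x^(2) given x^(1),
    with mixture weights re-weighted by the marginal likelihood of x^(1)."""
    d1 = len(x1)
    weights, means, covs = [], [], []
    for pi_k, mu, S in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu1, mu2 = mu[:d1], mu[d1:]
        S11, S12 = S[:d1, :d1], S[:d1, d1:]
        S21, S22 = S[d1:, :d1], S[d1:, d1:]
        gain = S21 @ np.linalg.inv(S11)
        means.append(mu2 + gain @ (x1 - mu1))   # conditional mean
        covs.append(S22 - gain @ S12)           # conditional covariance
        weights.append(pi_k * multivariate_normal.pdf(x1, mu1, S11))
    weights = np.array(weights) / np.sum(weights)
    return weights, means, covs
```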

31 Active Selection of Labeled Examples from Unlabeled Data

32 Deployment of Sensor A to Fill In Missing Data from Sensor B
Consider WAAMD data, with 2 potential fill-ins.
[Figure: Sensor A and Sensor B images, with regions where data could potentially be filled in]

33 [ROC plot: AHI #B62R rFlw & Veridian #B3R48rFvv; data missing from each sensor, labeled data; curves: Active querying, Random querying; annotation: 65; axes: Pd vs. Pfa]

34 [ROC plot: AHI #B62R rFlw & Veridian #B3R48rFvv; data missing from each sensor, labeled data; curves: Active querying, Random querying; annotation: 97; axes: Pd vs. Pfa]

35 [ROC plot: AHI #B62R rFlw & Veridian #B3R48rFvv; data missing from each sensor, labeled data; curves: Active querying, Random querying; annotation: 29; axes: Pd vs. Pfa]

36 Outline
- Review of semi-supervised statistical classifiers and the graph-based prior
- Extension of semi-supervised classifiers to a multi-sensor setting: Bayesian co-training
- Active multi-sensor sensing:
  - Selection of those members of the unlabeled set for which acquisition of labels would be most informative
  - Selection of those members of the unlabeled set for which new sensors should be deployed and new data acquired
- Concept drift: adaptive sensing when the statistics of the unlabeled data change or drift from those of the original labeled data
- Future work

37 What is Concept Drift?
[Figure legend: Sensor 1, Sensor 2, labeled data, unlabeled data]
A fundamental assumption in statistical learning algorithms is that all data are characterized by the same underlying statistics; for $M$ sensors and label $l_n$, $p(x^{(1)}, x^{(2)}, \ldots, x^{(M)} \mid l_n)$.
In sensing problems, background and/or sensor conditions may change, and changes may therefore manifest in the underlying statistics of the labeled and unlabeled data: concept drift.
Can we design algorithms that adapt as the underlying concepts change, such that we can still utilize all available data?

38 Concept Drift
Assume we have unlabeled data from the environment of interest, $E$: $D_U = \{x_n^{(m)},\, m \in S_n\}_{n=1,\ldots,N_U}$, for which we seek to estimate the unknown associated labels $l_n$.
In addition, assume we have labeled data from a related but different environment $\hat{E}$: $\hat{D}_L = \{(\hat{x}_n^{(m)}, \hat{l}_n),\, m \in S_n\}_{n=N_U+1,\ldots,N_U+N_L}$, where $(\hat{x}_n^{(m)}, \hat{l}_n)$ are the data and associated label from the previous environment (sensor $m$).
[Figure: Environment $\hat{E}$ vs. Environment $E$, Feature 1 vs. Feature 2 scatter plots]

39 Concept Drift - 1
[Figure: Environment $\hat{E}$ vs. Environment $E$, Feature 1 vs. Feature 2 scatter plots]
Problem: we have labeled landmine/FOPEN/underground-structure data from one environment, which we would like to apply to a new but related environment.
Define the probabilities $p(l_n \mid \hat{l}_n, D_U, \hat{D}_L)$, for which we impose a Dirichlet conjugate prior, which allows us to incorporate prior knowledge for $p(l \mid \hat{l})$.
In the subsequent discussion we consider a single sensor ($M = 1$) and binary labels ($l = 0, 1$) to simplify the discussion.

40 Concept Drift - 2
[Figure: Environment $\hat{E}$ vs. Environment $E$, Feature 1 vs. Feature 2 scatter plots]
$\log p(w \mid D_U, \hat{D}_L) = \sum_{n=1}^{N_L} \{\hat{l}_n \log[\sigma(w^T \hat{x}_n)\,\mu_n + (1 - \sigma(w^T \hat{x}_n))(1 - \mu_n)] + (1 - \hat{l}_n) \log[\sigma(w^T \hat{x}_n)\,\nu_n + (1 - \sigma(w^T \hat{x}_n))(1 - \nu_n)]\} - \lambda w^T \tilde{A} w + K$
where $\mu_n$ and $\nu_n$ are the per-example label-transfer probabilities relating the environment-$\hat{E}$ labels $\hat{l}_n$ to the environment-$E$ labels $l_n$ (the conditional probabilities $p(\hat{l}_n \mid l_n, D_U, \hat{D}_L)$ boxed on the slide), and the last term is the graph-based smoothness prior on the unlabeled data.
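A minimal sketch of this concept-drift log-posterior as reconstructed above, with $\mu_n$ and $\nu_n$ passed in as fixed vectors (in the full algorithm they are hidden variables with Dirichlet priors, handled by EM); names are illustrative.

```python
# Minimal sketch: concept-drift log-posterior over w, following the slide's form.
# mu, nu: per-example drift probabilities (held fixed here; EM hidden variables in full).
import numpy as np

def drift_log_posterior(w, Xhat_L, lhat, mu, nu, A_tilde, lam=1.0):
    """Xhat_L: labeled data from the previous environment; lhat: its binary labels."""
    s = 1.0 / (1.0 + np.exp(-Xhat_L @ w))           # sigma(w^T x_hat_n)
    pos = np.log(s * mu + (1.0 - s) * (1.0 - mu))   # term for l_hat_n = 1
    neg = np.log(s * nu + (1.0 - s) * (1.0 - nu))   # term for l_hat_n = 0
    loglik = np.sum(lhat * pos + (1.0 - lhat) * neg)
    return loglik - lam * w @ A_tilde @ w           # graph smoothness prior on D_U
```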

41 Concept Drift - 3
[Figure: Environment $\hat{E}$ vs. Environment $E$, Feature 1 vs. Feature 2 scatter plots]
The weights are determined as before using an EM algorithm, with the parameters $\mu_n$ and $\nu_n$ playing the role of hidden variables (Dirichlet prior employed).
We can again do active learning: of the unlabeled examples, which would be most informative if their label could be acquired? Solved as before within a Laplace approximation (Hessian, etc.).

42 Illustrative Toy Example: Concept Drift
[Plot: logit-select active; original labeled data; only the initial data are labeled]

43 Illustrative Toy Example - 1
[Plot: logit-select active; first active labeling & classifier refinement]

44 Illustrative Toy Example - 2
[Plot: logit-select active; second active labeling & classifier refinement]

45 Illustrative Toy Example - 3
[Plot: logit-select active; fourth active labeling & classifier refinement]

46 Example on Real Data: UXO Detection
Tremendous amount of unlabeled data, very limited labeled data. Classification is far easier when placed in the context of the entire image, vis-à-vis isolated image chips.

47 Results on JPG-V and Badlands Data
EMI and magnetometer sensors.
Labeled data: JPG-V (6 UXO + 88 clutter). Unlabeled data: Badlands (57 UXO + clutter).
Kernel: direct kernel (weights applied to feature components). UXO features = [log(m_p), log(m_z), depth]; each feature is normalized to zero mean and unit variance.
Five items actively selected for labeling from the Badlands data.

48 [ROC: test data includes the actively labeled data; annotation: "account for drift in statistics"; curves: train classifier on labeled JPG-V plus five actively selected items from Badlands (logit-push-active (C=e-5), 5 primary data; logit-active, 5 primary data) vs. train classifier on labeled JPG-V data only (logit); axes: probability of UXO detection vs. number of excavations]

49 Future Work
The active learning for labeling and acquisition of new data ("fill in") is thus far myopic; we believe this can be extended to non-myopic active learning.
We now have multiple actions one may take: (i) perform labeling of unlabeled data, or (ii) deploy a given sensor to fill in missing data on a given target. These will now be integrated into a general sensor and HUMINT management structure, accounting for deployment costs.
We need to extend the multi-sensor fill-in active learning to the concept-drift framework.
We have performed ML (EM algorithm) estimation of model parameters; we are now employing ensemble/variational techniques to extract the full posterior on model parameters (initial work discussed during the Workshop; slides available).
Deploy algorithms on actual hardware (robots), in collaboration with Quantum Magnetics, Inc.
