Scalable robust hypothesis tests using graphical models
Umamahesh Srinivas, ipal Group Meeting, October 22, 2010
Binary hypothesis testing problem

Random vector $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ generated from either of two hypotheses:
$H_0: x \sim g(x \mid H_0)$
$H_1: x \sim g(x \mid H_1)$

Given: training sets $T_0$ and $T_1$, with $K$ samples each.
Goal: classify a new sample as coming from $H_0$ or $H_1$.

Assumptions:
- Conditional densities $g(x \mid H_0)$ and $g(x \mid H_1)$ are known exactly.
- Samples in $T_0$ and $T_1$ are generated i.i.d. from $g(x \mid H_0)$ and $g(x \mid H_1)$, respectively.

Likelihood ratio test (LRT):
$$L(x) := \frac{g(x \mid H_1)}{g(x \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \tau \qquad (1)$$
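To make the LRT in (1) concrete, here is a minimal sketch for two known multivariate Gaussian densities; the dimension, means, covariance, and threshold are hypothetical choices for illustration, not values from the talk.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical nominal densities g(x|H0) and g(x|H1): Gaussians with a
# mean shift and a shared covariance (illustrative choices only).
n = 4
g0 = multivariate_normal(mean=np.zeros(n), cov=np.eye(n))
g1 = multivariate_normal(mean=0.5 * np.ones(n), cov=np.eye(n))

def lrt(x, tau=1.0):
    """Equation (1): decide H1 (return 1) if L(x) >= tau, else H0 (return 0)."""
    # Compare log-likelihoods for numerical stability.
    return int(g1.logpdf(x) - g0.logpdf(x) >= np.log(tau))

rng = np.random.default_rng(0)
x = g1.rvs(random_state=rng)   # draw one sample under H1
print(lrt(x))
```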
Need for robustness

The assumption of knowledge of the true densities is unrealistic:
- Limited training data
- Training data acquired in the presence of noise
- Dynamically evolving conditional densities
- Secondary physical effects on the signal not modeled

Robust hypothesis test (RHT) [Huber, 1965]:
- Uncertainty in the knowledge of the true densities is modeled as a class of distributions in the proximity of some nominal density.
- A minimum level of performance is guaranteed for all models in the vicinity of the nominal density.
Measures of model proximity

Contamination model:
$$\mathcal{F}^c_k = \{ f(x) : f(x) = (1-\epsilon_k) f_k(x) + \epsilon_k h(x) \}, \quad k = 0, 1,$$
where $f_k(x)$ are the nominal densities, $0 \le \epsilon_0, \epsilon_1 \le 1$, and $h(x)$ is an unknown probability density.

Total variation:
$$\mathcal{F}^{TV}_k = \{ f(x) : d_{TV}(f_k, f) = \int |f_k(x) - f(x)| \, dx < \epsilon \}, \quad k = 0, 1.$$

Kullback-Leibler divergence:
$$\mathcal{F}^{KL}_k = \{ f(x) : D(f_k \| f) = \int f_k(x) \ln\!\left( \frac{f_k(x)}{f(x)} \right) dx < \epsilon \}, \quad k = 0, 1.$$
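The sketch below evaluates these proximity measures numerically for one-dimensional densities on a grid; the particular nominal and contaminating densities are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Grid-based evaluation of the proximity measures for 1-D densities.
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
f_nom = norm.pdf(x, loc=0, scale=1)   # nominal density f_k (assumed)
h = norm.pdf(x, loc=3, scale=2)       # unknown contaminating density (assumed)

eps = 0.1
f = (1 - eps) * f_nom + eps * h       # a member of the contamination class F^c_k

d_tv = np.sum(np.abs(f_nom - f)) * dx           # d_TV(f_k, f)
d_kl = np.sum(f_nom * np.log(f_nom / f)) * dx   # D(f_k || f)
print(f"TV distance = {d_tv:.4f}, KL divergence = {d_kl:.4f}")
```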
Problem set-up

$\mathcal{D}$: convex set of pointwise randomized decision rules $\delta(\cdot)$. For observation $x$, we select $H_1$ with probability $\delta(x)$ and $H_0$ with probability $1 - \delta(x)$.

False alarm:
$$P_F(\delta, f_0) = \int \delta(x) f_0(x) \, dx \qquad (2)$$
Miss:
$$P_M(\delta, f_1) = \int (1 - \delta(x)) f_1(x) \, dx \qquad (3)$$
For equally likely hypotheses, the probability of error is
$$P_E(\delta, f_0, f_1) = \tfrac{1}{2} \left[ P_F(\delta, f_0) + P_M(\delta, f_1) \right]. \qquad (4)$$
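Since the integrals in (2)-(4) are rarely available in closed form in high dimension, they are naturally estimated by Monte Carlo; here is a sketch using a simple deterministic decision rule (a hypothetical scalar threshold test) and assumed 1-D Gaussian densities.

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo estimates of (2)-(4) for delta(x) = 1{x > 0.5} (hypothetical rule).
rng = np.random.default_rng(1)
f0, f1 = norm(0, 1), norm(1, 1)          # assumed densities under H0 and H1
delta = lambda x: (x > 0.5).astype(float)

x0 = f0.rvs(size=100_000, random_state=rng)   # samples from f0
x1 = f1.rvs(size=100_000, random_state=rng)   # samples from f1

P_F = delta(x0).mean()          # (2): E_{f0}[ delta(x) ]
P_M = (1 - delta(x1)).mean()    # (3): E_{f1}[ 1 - delta(x) ]
P_E = 0.5 * (P_F + P_M)         # (4): equally likely hypotheses
print(P_F, P_M, P_E)
```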
Minimax RHT

$$(\delta_R, f_0^L(x), f_1^L(x)) = \arg\min_{\delta \in \mathcal{D}} \; \max_{f_0, f_1 \in \mathcal{F}^c} P_E(\delta, f_0, f_1), \qquad (5)$$
where
- $\delta_R$ is the robust test,
- $(f_0^L, f_1^L)$ are the least favorable densities in $\mathcal{F}^c = \mathcal{F}^c_0 \times \mathcal{F}^c_1$.
Solution to minimax RHT

$$f_0^L(x) = \begin{cases} (1-\epsilon_0) f_0(x), & f_1(x)/f_0(x) < c'' \\ \frac{1}{c''} (1-\epsilon_0) f_1(x), & f_1(x)/f_0(x) \ge c'' \end{cases} \qquad (6)$$

$$f_1^L(x) = \begin{cases} (1-\epsilon_1) f_1(x), & f_1(x)/f_0(x) > c' \\ c' (1-\epsilon_1) f_0(x), & f_1(x)/f_0(x) \le c' \end{cases} \qquad (7)$$

$$\delta_R(x) = \begin{cases} 1, & L(x) = \frac{f_1^L(x)}{f_0^L(x)} \ge 1 \\ 0, & L(x) = \frac{f_1^L(x)}{f_0^L(x)} < 1 \end{cases} \qquad (8)$$

where $c'$ and $c''$ are defined such that $f_0^L$ and $f_1^L$ are valid probability densities, leading to:
$$P_0\!\left( \frac{f_1(x)}{f_0(x)} < c'' \right) + \frac{1}{c''} P_1\!\left( \frac{f_1(x)}{f_0(x)} \ge c'' \right) = \frac{1}{1-\epsilon_0} \qquad (9)$$
$$P_1\!\left( \frac{f_1(x)}{f_0(x)} > c' \right) + c' \, P_0\!\left( \frac{f_1(x)}{f_0(x)} \le c' \right) = \frac{1}{1-\epsilon_1}. \qquad (10)$$
$P_k$ is the probability measure w.r.t. $f_k(x)$.
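Note that (6)-(8) make the robust statistic a clipped likelihood ratio: $f_1^L(x)/f_0^L(x) = \frac{1-\epsilon_1}{1-\epsilon_0} \min(c'', \max(c', f_1(x)/f_0(x)))$. A sketch of the resulting decision rule follows; the clipping levels here are placeholders rather than solutions of (9)-(10).

```python
import numpy as np
from scipy.stats import norm

def robust_lrt(x, f0_pdf, f1_pdf, c_lo, c_hi, eps0, eps1):
    """Equation (8): robust test via the clipped likelihood ratio.

    c_lo and c_hi play the roles of c' and c''; they must satisfy (9)-(10)
    for the minimax guarantee to hold.
    """
    ratio = f1_pdf(x) / f0_pdf(x)
    L_robust = (1 - eps1) / (1 - eps0) * np.clip(ratio, c_lo, c_hi)
    return int(L_robust >= 1.0)   # 1 -> decide H1, 0 -> decide H0

# Placeholder usage with 1-D Gaussian nominals (illustrative values).
d = robust_lrt(0.3, norm(0, 1).pdf, norm(1, 1).pdf,
               c_lo=0.5, c_hi=2.0, eps0=0.05, eps1=0.05)
print(d)
```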
Underlying intuition

Choice of $c'$ and $c''$: for component-wise independent observations,
$$L(x) = \frac{g(x \mid H_1)}{g(x \mid H_0)} = \prod_{i=1}^n \frac{g(x_i \mid H_1)}{g(x_i \mid H_0)}.$$
If any factor in the product approaches $0$ or $\infty$, it dominates $L(x)$. Robustness is introduced by clipping the likelihood ratio to the range $[c', c'']$.

Least favorable densities: choose $f_0^L(x)$ as close as possible to $f_1(x)$, and $f_1^L(x)$ as close as possible to $f_0(x)$.
Scalability challenge

RHT reduces to finding $c'$ and $c''$ such that:
$$P_0\!\left( \frac{f_1(x)}{f_0(x)} < c'' \right) + \frac{1}{c''} P_1\!\left( \frac{f_1(x)}{f_0(x)} \ge c'' \right) = \frac{1}{1-\epsilon_0}$$
$$P_1\!\left( \frac{f_1(x)}{f_0(x)} > c' \right) + c' \, P_0\!\left( \frac{f_1(x)}{f_0(x)} \le c' \right) = \frac{1}{1-\epsilon_1}.$$
These are highly nonlinear equations that require Monte Carlo methods (sample generation). This scales very poorly with dimension and becomes computationally intractable.
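One way to carry out this computation, sketched below under assumed 1-D Gaussian nominals: estimate the probabilities in (9)-(10) from samples and find the clipping levels by one-dimensional root finding. The bracketing intervals are assumptions that work for these particular densities.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(2)
f0, f1 = norm(0, 1), norm(1, 1)       # assumed nominal densities
eps0 = eps1 = 0.05

x0 = f0.rvs(size=200_000, random_state=rng)
x1 = f1.rvs(size=200_000, random_state=rng)
r0 = f1.pdf(x0) / f0.pdf(x0)          # likelihood ratio under P_0
r1 = f1.pdf(x1) / f0.pdf(x1)          # likelihood ratio under P_1

def eq9(c):   # Monte Carlo version of (9), as a root-finding residual
    return (r0 < c).mean() + (r1 >= c).mean() / c - 1.0 / (1.0 - eps0)

def eq10(c):  # Monte Carlo version of (10)
    return (r1 > c).mean() + c * (r0 <= c).mean() - 1.0 / (1.0 - eps1)

c_hi = brentq(eq9, 1.0, 100.0)    # upper clipping level c''
c_lo = brentq(eq10, 1e-3, 1.0)    # lower clipping level c'
print(c_lo, c_hi)
```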
Probabilistic graphical models

A graph $G = (V, E)$ is defined by a set of nodes $V = \{1, \ldots, n\}$ and a set of edges $E \subseteq V \times V$ which connect pairs of nodes.

Graphical model: a random vector defined on a graph such that each node represents one (or more) random variables, and edges reveal conditional dependencies.

- The underlying graph structure leads to a factorization of the joint probability distribution.
- Leverage efficient graph-based algorithms for statistical inference and learning.
- Trade-off between graph complexity and approximation accuracy.
Some graph structures

Tree: undirected acyclic graph with exactly $(n-1)$ edges. Example on nodes $x_1, \ldots, x_7$:
$$f(x) = f(x_1) f(x_2 \mid x_1) f(x_3 \mid x_1) f(x_4 \mid x_2) f(x_5 \mid x_2) f(x_6 \mid x_3) f(x_7 \mid x_3).$$
Chow-Liu (1968): the optimal tree approximation reduces to a maximum weight spanning tree (MWST) problem.

Forest: graph with $k < (n-1)$ edges.

Junction tree: tree-structured graph with edges between clusters of nodes. Clusters connected by an edge have at least one common node.
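A sketch of the Chow-Liu step for zero-mean Gaussian data, where the pairwise mutual information has the closed form $I(x_i; x_j) = -\frac{1}{2}\ln(1 - \rho_{ij}^2)$; the use of networkx for the MWST is an implementation choice for illustration, not part of the original work.

```python
import numpy as np
import networkx as nx

def chow_liu_tree(samples):
    """MWST over pairwise mutual informations (Gaussian closed form)."""
    corr = np.corrcoef(samples, rowvar=False)   # samples: (num_samples, n)
    n = corr.shape[0]
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            mi = -0.5 * np.log(1.0 - corr[i, j] ** 2)
            G.add_edge(i, j, weight=mi)
    return nx.maximum_spanning_tree(G)

rng = np.random.default_rng(3)
data = rng.multivariate_normal(np.zeros(4), np.eye(4) + 0.5, size=1000)
print(sorted(chow_liu_tree(data).edges()))
```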
Block-tree graphs

Disjoint clusters of nodes, with only one path connecting any two clusters.

Figure: Example of a block-tree graph with clusters {1,2,3}, {4,5,6}, {7,8,9}, {10,11,12}, {13,14,15}, {16,17,18}.

Benefits:
- Favorable complexity-performance trade-off
- Low cost of sample generation
- Efficient greedy algorithms to compute block-trees
Realizing RHT on block-tree graphs

Suppose $f(x)$ is Gaussian with mean zero. The state-space model on the block-tree graph [Vats and Moura, 2010] is given by:
$$x_{C_i} = A_i x_{C_{\Upsilon(i)}} + u_{C_i}, \qquad (11)$$
$$A_i = E(x_{C_i} x_{C_{\Upsilon(i)}}^T) \left[ E(x_{C_{\Upsilon(i)}} x_{C_{\Upsilon(i)}}^T) \right]^{-1}, \qquad (12)$$
$$E(u_{C_i} u_{C_i}^T) = E(x_{C_i} x_{C_i}^T) - A_i E(x_{C_{\Upsilon(i)}} x_{C_i}^T), \qquad (13)$$
where $u_{C_i}$ is white noise and $C_{\Upsilon(i)}$ is the parent cluster of $C_i$.

Computing $c'$ and $c''$:
1. For each $f_k(x)$, compute a block-tree graph $G_k$ using a specified value of $m$ (number of nodes in a cluster). Using recursive sampling, generate sample sets $S_k$, $k = 0, 1$.
2. Using $S_0$ and $S_1$, compute $c'$ and $c''$ by Monte Carlo methods.
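A sketch of the recursive sampling step implied by (11)-(13), assuming the block-tree structure (clusters and parents) is already given; the function and variable names are illustrative, not from the paper's implementation.

```python
import numpy as np

def sample_block_tree(Sigma, clusters, parent, rng):
    """Draw one sample of zero-mean Gaussian x via the model (11)-(13).

    Sigma: full covariance of x; clusters: list of index lists in
    topological order (root first); parent[i]: index of the parent
    cluster of cluster i, or None for the root.
    """
    x = np.zeros(Sigma.shape[0])
    for i, C in enumerate(clusters):
        if parent[i] is None:   # root cluster: x_C ~ N(0, Sigma_CC)
            x[C] = rng.multivariate_normal(np.zeros(len(C)), Sigma[np.ix_(C, C)])
        else:
            P = clusters[parent[i]]
            S_cp = Sigma[np.ix_(C, P)]                            # E[x_C x_P^T]
            A = S_cp @ np.linalg.inv(Sigma[np.ix_(P, P)])         # eq. (12)
            Q = Sigma[np.ix_(C, C)] - A @ S_cp.T                  # eq. (13)
            u = rng.multivariate_normal(np.zeros(len(C)), Q)      # white noise
            x[C] = A @ x[P] + u                                   # eq. (11)
    return x

rng = np.random.default_rng(4)
Sigma = np.eye(6) + 0.3
x = sample_block_tree(Sigma, [[0, 1], [2, 3], [4, 5]], [None, 0, 0], rng)
```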
Complexity benefits

Assuming Gaussianity, generating a sample from $f(x)$ is $O(n^3)$ (inversion of an $n \times n$ matrix). For $L$ generated samples, the total complexity is $O(L n^3)$.

Using a block-tree graph with cluster size $m$: computing the block-tree graph has complexity $O(\log n) + O(m n^2) \approx O(m n^2)$, while generating samples has complexity $O(r m^3) = O(m^2 n)$, where $r = n/m$ is the number of clusters. For $\tilde{L}$ generated samples, the total complexity is $O(\tilde{L}(m n^2 + m^2 n)) \approx O(\tilde{L} m n^2)$.

This is a reduction in complexity for sparse graphical models, since $m \ll n$ and $\tilde{L} \ll L$.
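A quick back-of-the-envelope check of these operation counts, with hypothetical values of $n$, $m$, and the sample sizes:

```python
# Illustrative operation counts for the complexity comparison above.
n, m = 1000, 10
L = L_tilde = 10_000                      # equal sample budgets, for a like-for-like view
full = L * n**3                           # O(L n^3): direct Gaussian sampling
block = L_tilde * (m * n**2 + m**2 * n)   # O(L~ (m n^2 + m^2 n)): block-tree
print(full / block)                       # ~ n/m, here roughly a 100x reduction
```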
Results

Figure: Error probability as a function of $\epsilon$ ($= \epsilon_0 = \epsilon_1$) for classical hypothesis testing and RHT.
Figure: Error probability as a function of training size, for RHT and graph-based RHT with $m = 20$ and $m = 10$ (dense inverse covariance matrix).
Figure: Error probability as a function of training size, for RHT and graph-based RHT (sparse inverse covariance matrix).
Figure: Automatic target recognition: misclassification probability as a function of the number of training samples for graph-based RHT and RHT (curves R1, R2, R3). Classification is performed on real-world SAR images.
Summary

- Real-world classification problems involve high-dimensional data, limited training, and noisy acquisition, hence the need for robust hypothesis tests.
- The minimax test minimizes the worst-case error probability by pursuing the least favorable densities.
- RHT is computationally intractable for high-dimensional data.
- Approximating the densities by block-tree graphs and instantiating RHT on them yields significant computational benefits with a tolerable loss in classification performance.