Machine Learning Overview


Sargur N. Srihari
University at Buffalo, The State University of New York, USA

Outline
1. What is Machine Learning (ML)?
2. Types of Information Processing Problems Solved
   1. Regression
   2. Classification
   3. Clustering
   4. Modeling Uncertainty/Inference
3. New Developments
   1. Fully Bayesian Approach
4. Summary

What is Machine Learning?
Programming computers to perform tasks that humans perform well but that are difficult to specify algorithmically. It is a principled way of building high-performance information processing systems:
- search engines, information retrieval
- adaptive user interfaces, personalized assistants (information systems)
- scientific applications (computational science)
- engineering

Example Problem: Handwritten Digit Recognition
There is wide variability in how the same numeral is written. Handcrafted rules would result in a large number of rules and exceptions. It is better to have a machine that learns from a large training set.

ML History
ML has its origins in computer science; pattern recognition (PR) has its origins in engineering. They are different facets of the same field, with methods around for over 50 years. The revival of Bayesian methods is due to the availability of computational methods.

Some Successful Applications of Machine Learning
Learning to recognize spoken words:
- speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
- neural networks and methods for learning HMMs, used to customize to individual speakers, vocabularies, and microphone characteristics (Table 1.1)

Some Successful Applications of Machine Learning
Learning to drive an autonomous vehicle:
- train computer-controlled vehicles to steer correctly
- drive at 70 mph for 90 miles on public highways
- associate steering commands with image sequences

Handwriting Recognition
- Task T: recognizing and classifying handwritten words within images
- Performance measure P: percent of words correctly classified
- Training experience E: a database of handwritten words with given classifications

The ML Approach
1. Data Collection: samples
2. Model Selection: a probability distribution to model the process
3. Parameter Estimation: values/distributions
4. Generalization (Training)
5. Search: find an optimal solution to the problem
6. Decision (Inference or Testing)

Types of Problems Machine Learning Is Used For
1. Classification
2. Regression
3. Clustering (Data Mining)
4. Modeling/Inference

Example Classification Problem
Off-shore oil transfer pipelines require non-invasive measurement of the proportions of oil, water, and gas, known as three-phase oil/water/gas flow.

Dual-Energy Gamma Densitometry
A beam of gamma rays is passed through the pipe; the attenuation in intensity provides information on the density of the material. A single beam is insufficient because there are two degrees of freedom: the fraction of oil and the fraction of water. One beam of gamma rays at two energies (frequencies or wavelengths) is passed through the pipe to a detector.

Complication Due to Flow Velocity
1. Low velocity: stratified configuration (oil floats on top of water, gas above oil)
2. Medium velocity: annular configuration (concentric cylinders of water, oil, gas)
3. High turbulence: homogeneous (intimately mixed)
A single beam is insufficient: a horizontal beam through a stratified flow indicates only oil.

Multiple Dual-Energy Gamma Densitometers
Six beams, each measured at two energies, give 12 attenuation measurements.

Prediction Problems
1. Predict the volume fractions of oil/water/gas
2. Predict the geometric configuration of the three phases
Twelve features: the fractions of oil and water along the beam paths. Learn to classify from data.

Feature Space
Three classes (stratified, annular, homogeneous), with two of the twelve variables shown for 100 points. Which class should a new point x belong to?

Cell-Based Classification
The naïve approach of cell-based voting will fail because of the exponential growth in the number of cells with dimensionality: 12 dimensions each discretized into 6 intervals gives 6^12 ≈ 2.2 billion cells, leaving hardly any points in each cell.
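
To make the exponential growth concrete, here is a quick back-of-the-envelope computation (a minimal sketch; the 100-point dataset size is taken from the feature-space slide above):

```python
# Number of cells when each of 12 feature dimensions is split into 6 intervals
dims, bins = 12, 6
cells = bins ** dims
print(f"{cells:,} cells")  # 2,176,782,336 cells
print(f"{100 / cells:.2e} points per cell on average for a 100-point dataset")
```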

Popular Statistical Models
Generative:
- Naïve Bayes
- Mixtures of multinomials
- Mixtures of Gaussians
- Hidden Markov Models (HMM)
- Bayesian networks
- Markov random fields
Discriminative:
- Logistic regression
- SVMs
- Traditional neural networks
- Nearest neighbor
- Conditional Random Fields (CRF)
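
As a hedged illustration of the generative/discriminative split (not from the slides; the synthetic dataset and hyperparameters below are arbitrary choices), a generative classifier such as Gaussian naïve Bayes models p(x|y) and p(y), while a discriminative one such as logistic regression models p(y|x) directly:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy 2-class problem with 12 features, echoing the oil-flow example's dimensionality
X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

generative = GaussianNB().fit(X_tr, y_tr)                            # models p(x|y), p(y)
discriminative = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # models p(y|x)

print("naive Bayes accuracy:", generative.score(X_te, y_te))
print("logistic regression accuracy:", discriminative.score(X_te, y_te))
```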

Regression Problems
A forward-problem data set, and the corresponding inverse problem obtained by reversing x and t. The red curve is the result of fitting a two-layer neural network by minimizing the sum-of-squared error. On the inverse problem this gives a very poor fit to the data; GMMs are used here instead.
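
A minimal sketch of this effect (assuming Bishop's synthetic curve-fitting data, t = x + 0.3 sin(2πx) plus noise; the network size and noise level are my own choices): a least-squares network fits the single-valued forward mapping well, but on the multi-valued inverse mapping it averages across the branches.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
t = x + 0.3 * np.sin(2 * np.pi * x) + rng.uniform(-0.1, 0.1, 300)

# Forward problem: predict t from x (single-valued -- fits well)
fwd = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
fwd.fit(x.reshape(-1, 1), t)

# Inverse problem: predict x from t (multi-valued -- least squares averages the branches)
inv = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
inv.fit(t.reshape(-1, 1), x)
print(inv.predict([[0.5]]))  # a compromise between the multiple valid x values
```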

Forward and Inverse Problems
Kinematics of a robot arm. Forward problem: find the end-effector position given the joint angles; this has a unique solution. Inverse kinematics has two solutions: elbow-up and elbow-down. Forward problems correspond to causality in a physical system and have a unique solution, e.g., symptoms caused by a disease. If the forward problem is a many-to-one mapping, the inverse has multiple solutions.

Clustering
Old Faithful (hydrothermal geyser in Yellowstone): 272 observations of duration (minutes, horizontal axis) vs. time to next eruption (vertical axis). A single Gaussian is unable to capture the structure, while a linear superposition of two Gaussians is better. The Gaussian has limitations in modeling real data sets; Gaussian mixture models give very complex densities:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

where the $\pi_k$ are mixing coefficients that sum to one. In one dimension: three Gaussians shown in blue, their sum in red.
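
A minimal sketch of fitting such a mixture with scikit-learn (the two-column synthetic stand-in for the Old Faithful data is hypothetical; substitute the real 272 observations):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical stand-in for Old Faithful: two clusters of (duration, waiting time)
short = rng.normal([2.0, 55.0], [0.3, 6.0], size=(130, 2))
long_ = rng.normal([4.3, 80.0], [0.4, 6.0], size=(142, 2))
X = np.vstack([short, long_])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("mixing coefficients:", gmm.weights_)   # the pi_k, summing to one
print("component means:", gmm.means_)
print("density at a point:", np.exp(gmm.score_samples([[3.0, 70.0]])))
```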

Estimation for Gaussian Mixtures
The log-likelihood function is

$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}$$

There is no closed-form solution; use either iterative numerical optimization techniques or Expectation Maximization.
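
A sketch of evaluating this log-likelihood directly in NumPy (a one-dimensional case with assumed parameter values, using log-sum-exp for numerical stability):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_log_likelihood(x, pis, mus, sigmas):
    """ln p(X | pi, mu, sigma) for a 1-D Gaussian mixture."""
    # log of pi_k * N(x_n | mu_k, sigma_k) for every (n, k) pair
    log_terms = np.log(pis) + norm.logpdf(x[:, None], loc=mus, scale=sigmas)
    return logsumexp(log_terms, axis=1).sum()  # sum over n of log-sum over k

x = np.array([1.1, 1.9, 7.8, 8.2, 2.2])
print(gmm_log_likelihood(x, pis=np.array([0.5, 0.5]),
                         mus=np.array([2.0, 8.0]),
                         sigmas=np.array([1.0, 1.0])))
```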

Bayesian Representation of Uncertainty
Probability is not just the frequency of a random, repeatable event; it is a quantification of uncertainty. Example: whether the Arctic ice cap will disappear by the end of the century. We have some idea of how quickly the polar ice is melting, and we revise it on the basis of fresh evidence (satellite observations); the assessment will affect the actions we take (to reduce greenhouse gases). This is handled by the general Bayesian interpretation. The use of probability to represent uncertainty is not an ad-hoc choice: if numerical values represent degrees of belief, then a simple set of axioms for manipulating degrees of belief leads to the sum and product rules of probability.

Modeling Uncertainty
A causal Bayesian network. Example of inference: Cancer is independent of Age and Gender given exposure to Toxics and Smoking.

The Fully Bayesian Approach
- Bayes rule
- Bayesian probabilities
- the concept of conjugacy
- Monte Carlo sampling

Rules of Probability
Given random variables X and Y:
- Sum rule (gives the marginal probability): $p(X) = \sum_Y p(X, Y)$
- Product rule (joint probability in terms of conditional and marginal): $p(X, Y) = p(Y \mid X)\, p(X)$
Combining them, we get Bayes rule:

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \quad \text{where } p(X) = \sum_Y p(X \mid Y)\, p(Y)$$

Viewed as: posterior ∝ likelihood × prior.
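
A worked numeric sketch of Bayes rule (the disease-test numbers are invented for illustration):

```python
# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate
p_disease = 0.01
p_pos_given_disease = 0.90
p_pos_given_healthy = 0.05

# Sum rule over the two values of Y gives the marginal p(positive test)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes rule: posterior = likelihood * prior / evidence
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"p(disease | positive) = {p_disease_given_pos:.3f}")  # ~0.154
```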

Probability Distributions: Relationships
Discrete, binary:
- Bernoulli: single binary variable
- Binomial: N samples of a Bernoulli (N = 1 gives the Bernoulli; large N approaches a Gaussian)
- Conjugate prior: Beta, a continuous variable on [0, 1]
Discrete, multi-valued:
- Multinomial: one of K values, represented as a K-dimensional binary vector
- Conjugate prior: Dirichlet, K random variables on [0, 1] (K = 2 gives the Beta)
Continuous:
- Gaussian
- Student's t: generalization of the Gaussian, robust to outliers; an infinite mixture of Gaussians
- Gamma: conjugate prior of the univariate Gaussian precision
- Gaussian-Gamma: conjugate prior of the univariate Gaussian with unknown mean and precision
- Wishart: conjugate prior of the multivariate Gaussian precision matrix
- Gaussian-Wishart: conjugate prior of the multivariate Gaussian with unknown mean and precision matrix
- Exponential: special case of the Gamma
Angular:
- Von Mises
Uniform
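
A minimal sketch of conjugacy, using the Beta-Bernoulli pair from the table (the prior hyperparameters and the coin-flip data are invented): a Beta prior combined with a Bernoulli likelihood yields a Beta posterior by simple counting.

```python
# Beta(a, b) prior on a Bernoulli parameter; observe heads/tails counts
a, b = 2.0, 2.0                    # hypothetical prior pseudo-counts
flips = [1, 0, 1, 1, 1, 0, 1, 1]   # invented data: 6 heads, 2 tails

heads = sum(flips)
tails = len(flips) - heads

# Conjugacy: the posterior is again a Beta, with updated pseudo-counts
a_post, b_post = a + heads, b + tails
print(f"posterior: Beta({a_post}, {b_post})")
print(f"posterior mean = {a_post / (a_post + b_post):.3f}")
```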

Fully Bayesian Approach
This is now feasible with increased computational power. An intractable posterior distribution is handled using either variational Bayes or stochastic sampling, e.g., Markov chain Monte Carlo or Gibbs sampling.
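
A minimal random-walk Metropolis sampler sketch (the target, a standard-normal log-density, and the proposal width are assumed for illustration; a real posterior would replace log_target):

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: accept x' with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.normal(0.0, step)
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal  # accept; otherwise keep the current state
        samples.append(x)
    return np.array(samples)

# Example: sample from a standard normal (log density up to a constant)
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000)
print(draws.mean(), draws.std())  # close to 0 and 1
```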

Summary
ML is a systematic approach, instead of ad-hockery, to designing information processing systems. It is useful for solving problems of classification, regression, clustering, and modeling. The fully Bayesian approach, together with Monte Carlo sampling, is leading to high-performing systems of great practical value in many applications:
- computer science: search engines, text processing, document recognition
- engineering