Adapted Feature Extraction and Its Applications
Slide 1: Adapted Feature Extraction and Its Applications
Naoki Saito
Department of Mathematics, University of California, Davis, CA
saito@math.ucdavis.edu
URL: saito/
Slide 2: Acknowledgment
- Ronald R. Coifman (Yale/FMAH)
- Fred Warner (Yale/FMAH)
- Frank Geshwind (Yale/FMAH)
Slide 3: Outline
- Problem Formulation
- A Dictionary/Library of Orthonormal Bases
- Local Discriminant Basis (LDB)
- Improved LDB with Empirical Probability Density Estimation
- Example 1: Synthetic Waveform Classification
- Example 2: Geophysical Acoustic Waveform Classification
- Conclusion
- Future Directions
Slide 4: A Strategy for Pattern Recognition
- Normalization of input patterns: scaling, translation, rotation
- Feature extraction and selection
- Classification: LDA, CART, k-NN, Neural Networks, ...
[Diagram: an input $x \in \mathcal{X}$ is mapped by the feature extractor $f$ to $f(x) \in \mathcal{F}$, then by the classifier $g$ to a class label $y \in \mathcal{Y}$.]
Slide 5: Problem Formulation
Learning (or training): given a set of data $T = \{(x_i, y_i)\}_{i=1}^N \subset \mathcal{X} \times \mathcal{Y}$, where $\mathcal{X} \subset \mathbb{R}^n$ and $\mathcal{Y} = \{1, \ldots, K\}$ (class labels), find a map $d : \mathcal{X} \to \mathcal{Y}$ such that
$$R(d) = \frac{1}{N} \sum_{i=1}^N I(d(x_i) \neq y_i)$$
is small.
Prediction: apply the map $d$ obtained during training to a new dataset; then interpret the result and evaluate $d$.
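In code, the empirical risk above is a one-liner. A minimal Python sketch; the classifier `d` and the toy data are hypothetical, purely for illustration:

```python
import numpy as np

def empirical_risk(d, X, y):
    """Empirical misclassification rate R(d) = (1/N) sum_i I(d(x_i) != y_i)."""
    return np.mean(np.array([d(x) for x in X]) != y)

# Toy data and a threshold classifier on the first coordinate (hypothetical).
X = np.array([[0.2], [0.8], [0.4], [0.9]])
y = np.array([1, 2, 2, 2])
d = lambda x: 1 if x[0] < 0.5 else 2
print(empirical_risk(d, X, y))  # 0.25 -- one of four samples is misclassified
```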
Slide 6: Problem Formulation...
Let $(X, Y) \in \mathcal{X} \times \mathcal{Y}$ be a random sample from $P(X \in A, Y = y) = P(X \in A \mid Y = y)\, P(Y = y)$, where $P(Y = y) = \pi_y = N_y / N$ in practice. Let us assume the density $p(x \mid y)$ exists. The Bayes classifier (or rule) $d_B$ is
$$d_B(x) = \sum_{k \in \mathcal{Y}} k\, I(x \in A_k), \quad \text{where } A_k = \{x \in \mathcal{X} : p(x \mid k)\,\pi_k = \max_{y \in \mathcal{Y}} p(x \mid y)\,\pi_y\}.$$
In other words, assign $x$ to class $k$ if $x \in A_k$. Note $\bigcup_{y \in \mathcal{Y}} A_y = \mathcal{X}$.
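When the class densities and priors are known, the Bayes rule is easy to evaluate. A sketch for two 1D Gaussian classes; the means, variances, and priors are invented for illustration:

```python
import numpy as np
from scipy.stats import norm

# Two hypothetical 1D Gaussian classes with known densities p(x|y) and priors.
pi = {1: 0.5, 2: 0.5}
density = {1: norm(loc=-1.0, scale=1.0).pdf, 2: norm(loc=1.0, scale=1.0).pdf}

def bayes_classify(x):
    """Assign x to the class k that maximizes p(x|k) * pi_k."""
    return max(pi, key=lambda k: density[k](x) * pi[k])

print([bayes_classify(x) for x in (-2.0, 0.3, 1.5)])  # [1, 2, 2]
```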
Slide 7: Problem Formulation...
Difficulties of the Bayes classifier:
- We don't know $p(x \mid k)$, or
- it is too difficult to estimate $p(x \mid k)$ computationally because of the high dimensionality of the problem (the curse of dimensionality).
$\Longrightarrow$ We need dimensionality reduction without losing important information for classification.
Slide 8: Problem Formulation...
Therefore, we restrict our attention to maps $d : \mathcal{X} \to \mathcal{Y}$ of the following form:
$$d = g \circ f = g \circ \Theta_m \circ \Psi^T,$$
where $f : \mathcal{X} \to \mathcal{F} \subset \mathbb{R}^m$ is called a feature extractor and consists of:
- $\Psi$: an $n$-dimensional orthogonal matrix selected from a dictionary or library of orthonormal bases;
- $\Theta_m$: a selection rule that picks the most important $m$ ($\ll n$) coordinates out of the $n$ coordinates;
and $g : \mathcal{F} \to \mathcal{Y}$ is a conventional classifier, e.g., LDA, CART, ANN, etc.
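The composition $g \circ \Theta_m \circ \Psi^T$ maps cleanly onto code. In this sketch a random orthogonal matrix stands in for a dictionary-selected basis $\Psi$, and a simple variance criterion stands in for $\Theta_m$; both stand-ins are assumptions, not the talk's method:

```python
import numpy as np

n, m = 8, 3
rng = np.random.default_rng(0)

# Psi: an n x n orthogonal matrix (random here, purely for illustration).
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

X_train = rng.standard_normal((20, n))
coeffs = X_train @ Psi      # row i holds Psi^T x_i, the expansion coefficients

# Theta_m: keep the m coordinates with the largest variance over the training
# set (one simple stand-in for "most important m coordinates").
keep = np.argsort(coeffs.var(axis=0))[::-1][:m]

def f(x):
    """Feature extractor f = Theta_m o Psi^T."""
    return (x @ Psi)[keep]

print(f(X_train[0]))        # an m-dimensional feature vector
```

A conventional classifier $g$ (LDA, a classification tree, etc.) would then be trained on these $m$-dimensional features.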
Slide 9: A Library of Orthonormal Bases
A library consists of dictionaries of orthonormal bases: each dictionary is a binary tree whose nodes are subspaces of $\Omega_{0,0} = \mathbb{R}^n$ with different time-frequency localization characteristics.
[Diagram: the binary tree of subspaces $\Omega_{j,k}$, with root $\Omega_{0,0}$ splitting into $\Omega_{1,0}$ and $\Omega_{1,1}$, and so on down to $\Omega_{3,0}, \ldots, \Omega_{3,7}$.]
Slide 10: A Library of Orthonormal Bases...
- Examples of dictionaries include wavelet packet bases, local trigonometric bases, and local Fourier bases.
- It costs $O(n[\log n]^p)$ to generate a dictionary for a signal of length $n$ ($p = 1$ for wavelet packets, $p = 2$ for LTB/LFB).
- Each dictionary may contain up to $n(1 + \log_2 n)$ basis vectors and more than $2^n$ possible orthonormal bases.
- How to select the best possible basis for the problem at hand is a key issue.
Slide 11: Examples of Local Basis Functions
[Figure: sample basis functions from the standard basis, Haar basis, Walsh basis, C12 wavelet packet basis, local sine basis, and discrete sine basis.]
Slide 12: Time-Frequency Characteristics
[Figure: time-frequency plane tilings for the standard basis, Haar basis, Walsh basis, C12 wavelet packet basis, local sine basis, and discrete sine basis.]
Slide 13: Local Discriminant Basis
- Let $\mathcal{D} = \{w_i\}_{i=1}^{N_w}$ be a time-frequency dictionary, viewed as $\mathcal{D} = \{B_j\}_{j=1}^{N_B}$, a list of all possible orthonormal bases, where $B_j = (w_{j_1}, \ldots, w_{j_n})$.
- Let $\mathcal{M}^+(B_j)$ be a measure of the efficacy of $B_j$ for discrimination.
- Then the local discriminant basis (LDB) can be written as $\Psi = \arg\max_{B_j \in \mathcal{D}} \mathcal{M}^+(B_j)$.
Proposition: there exists a fast algorithm (divide-and-conquer, $O(n)$) to find $\Psi$ from each dictionary $\mathcal{D}$ if $\mathcal{M}^+$ is additive:
$$\mathcal{M}^+(\emptyset) = 0, \qquad \mathcal{M}^+(\{w_{j_1}, \ldots, w_{j_n}\}) = \sum_{i=1}^n \mathcal{M}^+(w_{j_i}).$$
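Additivity is what makes the divide-and-conquer search work: at each node of the binary tree, the score of the node's own coordinates is compared with the sum of the best scores of its two children. A minimal sketch, assuming each node already stores the sum of its coordinates' discriminant scores; the dict layout is hypothetical:

```python
def best_basis(node):
    """Bottom-up best-basis search for an additive measure M+.

    `node` is a dict with:
      'score'    -- M+ of this node's own coordinates (sum of their deltas),
      'children' -- a (left, right) pair of subtree nodes, or None at a leaf.
    Returns (best_score, list_of_chosen_nodes).
    """
    if node['children'] is None:
        return node['score'], [node]
    left, right = node['children']
    ls, lb = best_basis(left)
    rs, rb = best_basis(right)
    if node['score'] >= ls + rs:    # keep the parent subspace whole...
        return node['score'], [node]
    return ls + rs, lb + rb         # ...or split into the two children

# Tiny hypothetical tree: the root scores 1.0, its two children 0.7 and 0.6.
tree = {'score': 1.0, 'children': ({'score': 0.7, 'children': None},
                                   {'score': 0.6, 'children': None})}
print(best_basis(tree)[0])          # 1.3 -- the children beat the parent
```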
Slide 14: Discriminant Measures
- For $w_i \in \mathcal{D}$, consider the projection $Z_i = w_i^T X$ of an input random signal $X \in \mathcal{X}$.
- Let $\delta_i$ be the efficacy for discrimination (or discriminant power) of $w_i$.
Some possibilities for measuring the efficacy of the basis $B_j$:
- $\mathcal{M}^+(B_j) = \sum_{i=1}^n \delta_{j_i}$
- $\mathcal{M}^+(B_j) = \sum_{i=1}^k \delta_{(j_i)}$, where $\{\delta_{(j_i)}\}$ is the decreasing rearrangement of $\{\delta_{j_i}\}$ and $k < n$.
- $\mathcal{M}^+(B_j) = \sum_{i=1}^n \varepsilon_{j_i} \delta_{j_i}$, where $\varepsilon_{j_i} = 1$ if $E[Z_{j_i}^2] > \theta$ and $\varepsilon_{j_i} = 0$ otherwise.
Slide 15: Discriminant Measures...
What are the basic quantities to use for 1D efficacy?
- Differences in normalized energies of $Z_i$ among classes:
$$V_i^{(y)} = \frac{E[Z_i^2 \mid Y = y]}{\sum_{i=1}^n E[Z_i^2 \mid Y = y]}, \qquad \hat{V}_i^{(y)} = \frac{\sum_{k=1}^{N_y} |w_i^T x_k^{(y)}|^2}{\sum_{k=1}^{N_y} \|x_k^{(y)}\|^2}$$
- Differences in probability density functions (pdfs) of $Z_i$:
$$q_i^{(y)}(z) = \int_{w_i^T x = z} p(x \mid y)\, dx, \quad \text{estimated by } \hat{q}_i^{(y)}(z) \text{ via, e.g., ASH}$$
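The energy estimator $\hat{V}_i^{(y)}$ is straightforward to compute from the class-$y$ training signals. A sketch; the data and the basis vector are placeholders:

```python
import numpy as np

def normalized_energy(w, X_class):
    """Estimate V_i^(y): the energy of projections onto w, normalized by the
    total energy of the class-y training signals (rows of X_class)."""
    return np.sum((X_class @ w) ** 2) / np.sum(X_class ** 2)

rng = np.random.default_rng(1)
X1 = rng.standard_normal((50, 16))  # hypothetical class-1 training signals
w = np.eye(16)[0]                   # basis vector: first standard coordinate
print(normalized_energy(w, X1))     # about 1/16 for white-noise signals
```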
Slide 16: Discriminant Measures...
Some possibilities for discrepancy measures between two pdfs $p(x)$ and $q(x)$:
- Relative entropy (Kullback-Leibler divergence): $D_{KL}(p, q) = \int p(x) \log_2 \frac{p(x)}{q(x)}\, dx$
- Hellinger distance: $D_H(p, q) = \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 dx$
- $L^2$ distance: $D_2(p, q) = \int \left( p(x) - q(x) \right)^2 dx$
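On a common discretization grid, all three measures reduce to simple sums. A sketch with two invented Gaussian densities:

```python
import numpy as np

def kl_divergence(p, q, dz):
    """Relative entropy D_KL(p, q) on a shared grid with spacing dz."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask])) * dz

def hellinger(p, q, dz):
    """Squared-root difference (Hellinger-type) distance on the grid."""
    return np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dz

def l2_distance(p, q, dz):
    """Squared L2 distance between the two discretized pdfs."""
    return np.sum((p - q) ** 2) * dz

# Two hypothetical unit-variance Gaussian densities discretized on [-5, 5].
z = np.linspace(-5, 5, 1001)
dz = z[1] - z[0]
p = np.exp(-0.5 * (z + 1) ** 2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (z - 1) ** 2) / np.sqrt(2 * np.pi)
print(kl_divergence(p, q, dz), hellinger(p, q, dz), l2_distance(p, q, dz))
```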
Slide 17: Discriminant Measures...
Using normalized energies:
$$\delta_i = \hat{V}_i^{(1)} \log_2 \frac{\hat{V}_i^{(1)}}{\hat{V}_i^{(2)}}, \quad \left( \sqrt{\hat{V}_i^{(1)}} - \sqrt{\hat{V}_i^{(2)}} \right)^2, \quad \text{or} \quad \left( \hat{V}_i^{(1)} - \hat{V}_i^{(2)} \right)^2$$
Using empirical pdfs:
$$\delta_i = D_{KL}(\hat{q}_i^{(1)}, \hat{q}_i^{(2)}), \quad D_H(\hat{q}_i^{(1)}, \hat{q}_i^{(2)}), \quad \text{or} \quad D_2(\hat{q}_i^{(1)}, \hat{q}_i^{(2)})$$
Slide 18: Discriminant Measures...
- It is important to consider two-dimensional projections as well.
- This is applicable to the complex coefficients of local Fourier bases.
[Figure: scatter plot of a two-dimensional projection.]
Slide 19: Example 1: Signal Shape Classification
Objective: classify synthetic signals of length 128 into three possible classes:
$$c(i) = (6 + \eta)\, \chi_{[a,b]}(i) + \epsilon(i) \quad \text{for cylinder},$$
$$b(i) = (6 + \eta)\, \chi_{[a,b]}(i)\, (i - a)/(b - a) + \epsilon(i) \quad \text{for bell},$$
$$f(i) = (6 + \eta)\, \chi_{[a,b]}(i)\, (b - i)/(b - a) + \epsilon(i) \quad \text{for funnel},$$
where $a \sim \mathrm{unif}[16, 32]$, $b - a \sim \mathrm{unif}[32, 96]$, $\eta \sim N(0, 1)$, and $\epsilon(i)$ i.i.d. $N(0, 1)$.
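These formulas translate directly into a generator. A sketch following the recipe above; the function name and its defaults are mine:

```python
import numpy as np

def make_signal(kind, n=128, rng=np.random.default_rng(0)):
    """One synthetic signal following the slide's recipe for the given class."""
    a = rng.uniform(16, 32)
    b = a + rng.uniform(32, 96)
    eta = rng.standard_normal()
    i = np.arange(n)
    chi = ((i >= a) & (i <= b)).astype(float)       # indicator chi_[a,b](i)
    shape = {'cylinder': 1.0,
             'bell':   (i - a) / (b - a),
             'funnel': (b - i) / (b - a)}[kind]
    return (6 + eta) * chi * shape + rng.standard_normal(n)

signals = [make_signal(k) for k in ('cylinder', 'bell', 'funnel')]
print([s.shape for s in signals])                   # [(128,), (128,), (128,)]
```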
Slide 20: Example Waveforms
[Figure: sample "cylinder", "bell", and "funnel" signals.]
Slide 21: Example 1: Signal Shape Classification...
[Figure: the top 10 LDFs, the top 10 LDBs, and the nodes selected by the LDB algorithm.]
Slide 22: Example 1: Signal Shape Classification...
Table of misclassification rates (average over 10 simulations):

Method (Coordinates)  | Training Data | Test Data
LDA on STD            | 0.83 %        | —
CT on STD             | 2.83 %        | —
LDA on Top 10 LDB     | 7.00 %        | 8.37 %
CT on Top 10 LDB      | 2.67 %        | 5.54 %

- 100 training signals and 1000 test signals were generated for each class per simulation.
- A 12-tap coiflet filter was used for the LDB.
Slide 23: Example 1: Signal Shape Classification...
[Figure: the full classification tree grown on the top 10 LDB coordinates, splitting on coordinates ldc.1, ldc.3, ldc.5, ldc.7, and ldc.9 to separate the cylinder, bell, and funnel classes.]
Slide 24: Example 2: Classification of Geophysical Acoustic Waveforms
- 402 acoustic waveforms (256 time samples each) were recorded at a gas-producing well.
- The region consists of sand or shale layers.
- The sand layers contain either water or gas.
[Figure: the recorded waveforms; horizontal axis in time (sec).]
Slide 25: Sonic Waveform Measurement
[Figure: a borehole tool with transmitter T and receiver R, shown passing Layer #1 and Layer #2.]
Slide 26: Objectives
- Can we classify acoustic waveforms in terms of the mineral content of the layers?
- Can we automate the classification process?
- If so, what features (or wave components) are important?
[Figure: pressure (Pa) vs. time (sec) waveform with the P, S, and Stoneley wave arrivals marked.]
Slide 27: Classification Procedure
- Adopt 10-fold cross validation: split the waveforms randomly into training and test datasets, and repeat the experiments (a schematic version of this loop is sketched below).
- Decompose the training waveforms into the local sine dictionary.
- Choose the LDB using various discrepancy measures.
- Choose the 25, 50, 75, and 100 most discriminant coordinates.
- Construct classifiers (LDA and a classification tree) using these coordinates.
- Decompose the test waveforms into the LDB and classify them.
- Compute the average misclassification rates.
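A schematic stand-in for the split-and-average loop above; it uses repeated random splits and a toy nearest-mean classifier rather than the talk's LDA/CT pipeline, and every name in it is hypothetical:

```python
import numpy as np

def run_experiment(X, y, fit, predict, n_splits=10, test_frac=0.1,
                   rng=np.random.default_rng(0)):
    """Repeat random train/test splits and average the test error."""
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        n_test = int(test_frac * len(y))
        test, train = perm[:n_test], perm[n_test:]
        model = fit(X[train], y[train])
        errors.append(np.mean(predict(model, X[test]) != y[test]))
    return np.mean(errors)

# Toy usage with a nearest-mean classifier standing in for LDA/CT.
fit = lambda X, y: {k: X[y == k].mean(axis=0) for k in np.unique(y)}
predict = lambda m, X: np.array(
    [min(m, key=lambda k: np.linalg.norm(x - m[k])) for x in X])
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.repeat([1, 2], 50)
print(run_experiment(X, y, fit, predict))   # average misclassification rate
```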
Slide 28: Misclassification Rates for Test Data
[Figure: misclassification rate (%) vs. number of LDB features; solid lines are LDA, dotted lines are CT, with one curve per discrepancy measure.]
Slide 29: The Best LDB Vectors
[Figure: the best LDB vectors, plotted against time (sec).]
Slide 30: The Best LDB Pattern
[Figure: the best LDB pattern; horizontal axis in time (sec).]
Slide 31: The Best LDB Vectors...
[Figure: the top 10 most influential LDB vectors in LDA, plotted against time (sec).]
Slide 32: Observations
- LDA on the LDB coordinates gave better results than CT on the LDB coordinates $\Longrightarrow$ the features are oblique.
- The top 25 LDB features were clearly not sufficient.
- Supplying too many features degraded the performance.
- The most important LDB vectors are clustered around the P-wave components.
Slide 33: Improved LDB with Empirical PDF Estimation
- This can be viewed as a specific yet fast version of the projection pursuit algorithm (Friedman-Tukey, Huber).
- Compared to the top-down strategy of projection pursuit, LDB is not greedy; i.e., it is a bottom-up approach.
- This approach also works for complex-valued expansion coefficients (e.g., local Fourier bases).
- How to estimate the empirical pdf? Histograms, ASH (averaged shifted histograms), nearest-neighbor and kernel-based methods, ... (an ASH sketch follows this list).
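ASH smooths an ordinary histogram by averaging several histograms whose bin origins are shifted by a fraction of the bin width. A minimal sketch of the idea for one-dimensional coefficient samples; the function and its defaults are mine, not the talk's implementation:

```python
import numpy as np

def ash_density(samples, n_bins=30, n_shifts=8):
    """Averaged shifted histogram: average n_shifts histograms of bin width h
    whose bin origins are shifted by h/n_shifts, giving a smoother pdf."""
    lo, hi = samples.min(), samples.max()
    h = (hi - lo) / n_bins
    grid = np.linspace(lo, hi, 200)
    density = np.zeros_like(grid)
    for s in range(n_shifts):
        edges = np.arange(lo - h + s * h / n_shifts, hi + h, h)
        counts, edges = np.histogram(samples, bins=edges, density=True)
        idx = np.clip(np.searchsorted(edges, grid, side='right') - 1,
                      0, len(counts) - 1)
        density += counts[idx]
    return grid, density / n_shifts

x = np.random.default_rng(2).standard_normal(500)
grid, pdf = ash_density(x)
print(round(pdf.sum() * (grid[1] - grid[0]), 2))   # close to 1
```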
Slide 34: Conclusion
- LDB functions can capture relevant local features in data.
- Interpretation of the results becomes easier and more intuitive.
- The computational cost is at most $O(n[\log n]^2)$.
- These methods enhance both traditional and modern statistical methods.
Slide 35: Conclusion...
Our algorithms are being applied and tested on:
- Geophysical signal/image classification at Schlumberger
- Noise reduction in hearing aids at Northwestern University
- Mammography diagnostics at the University of Chicago Hospital
- Radar target discrimination at Lockheed-Martin
Slide 36: Future Directions
- What is a good basis for clustering?
- Parameterization of low-dimensional structures in high-dimensional space
- Explore the relationship with the David-Jones-Semmes geometric analysis
- Develop a two-dimensional version for the complex coefficients provided by the local Fourier bases or brushlets