PATTERN CLASSIFICATION


 Kathlyn Anis Sparks
 1 years ago
 Views:
Transcription
1 PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wileylnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto
2 CONTENTS PREFACE XVII INTRODUCTION 1.1 Machine Perception, An Example, Related Fields, Pattern Recognition Systems, Sensing, Segmentation and Grouping, Feature Extraction, Classification, Postprocessing, The Design Cycle, Data Collection, Feature Choice, Model Choice, Training, Evaluation, Computational Complexity, Learning and Adaptation, Supervised Learning, Unsupervised Learning, Reinforcement Learning, Conclusion, 17 Summary by Chapters, 17 Bibliographical and Historical Remarks, 18 Bibliography, 19 BAYESIAN DECISION THEORY Introduction, Bayesian Decision Theory Continuous Features, TwoCategory Classification, MinimumErrorRate Classification, 26 *2.3.1 Minimax Criterion, 27 VII
3 *2.3.2 NeymanPearson Criterion, Classifiers, Discriminant Functions, and Decision Surfaces, The Multicategory Case, The TwoCategory Case, The Normal Density, Univariate Density, Multivariate Density, Discriminant Functions for the Normal Density, Case 1: X, = a 2 I, Case 2: X; = X, Case 3: X, = arbitrary, 41 Example 1 Decision Regions for TwoDimensional Gaussian Data, 41 *2.7 Error Probabilities and Integrals, 45 *2.8 Error Bounds for Normal Densities, Chernoff Bound, Bhattacharyya Bound, 47 Example 2 Error Bounds for Gaussian Distributions, Signal Detection Theory and Operating Characteristics, Bayes Decision Theory Discrete Features, Independent Binary Features, 52 Example 3 Bayesian Decisions for ThreeDimensional Binary Data, 53 *2.10 Missing and Noisy Features, Missing Features, Noisy Features, 55 *2.11 Bayesian Belief Networks, 56 Example 4 Belief Network for Fish, 59 *2.12 Compound Bayesian Decision Theory and Context, 62 Summary, 63 Bibliographical and Historical Remarks, 64 Problems, 65 Computer exercises, 80 Bibliography, 82 MAXIMUMLIKELIHOOD AND BAYESIAN PARAMETER ESTIMATION Introduction, MaximumLikelihood Estimation, The General Principle, The Gaussian Case: Unknown /ut, The Gaussian Case: Unknown /u, and X, Bias, Bayesian Estimation, The ClassConditional Densities, The Parameter Distribution, Bayesian Parameter Estimation: Gaussian Case, The Univariate Case: p(p\t>), The Univariate Case: p(x V), The Multivariate Case, 95
4 CONTENTS ix 3.5 Bayesian Parameter Estimation: General Theory, 97 Example l Recursive Bayes Learning, When Do MaximumLikelihood and Bayes Methods Differ?, Noninformative Priors and Invariance, Gibbs Algorithm, 102 *3.6 Sufficient Statistics, Sufficient Statistics and the Exponential Family, Problems of Dimensionality, Accuracy, Dimension, and Training Sample Size, Computational Complexity, Overfitting, 113 *3.8 Component Analysis and Discriminants, Principal Component Analysis (PCA), Fisher Linear Discriminant, Multiple Discriminant Analysis, 121 *3.9 ExpectationMaximization (EM), 124 Example 2 ExpectationMaximization for a 2D Normal Model, Hidden Markov Models, FirstOrder Markov Models, FirstOrder Hidden Markov Models, Hidden Markov Model Computation, Evaluation, 131 Example 3 Hidden Markov Model, Decoding, 135 Example 4 HMM Decoding, Learning, 137 Summary, 139 Bibliographical and Historical Remarks, 139 Problems, 140 Computer exercises, 155 Bibliography, 159 NONPARAMEJRIC TECHNIQUES Introduction, Density Estimation, Parzen Windows, Convergence of the Mean, Convergence of the Variance, Illustrations, Classification Example, Probabilistic Neural Networks (PNNs), Choosing the Window Function, ^ NearestNeighbor Estimation, ^ NearestNeighbor and ParzenWindow Estimation, Estimation of A Posteriori Probabilities, The NearestNeighbor Rule, Convergence of the Nearest Neighbor, Error Rate for the NearestNeighbor Rule, Error Bounds, The ^NearestNeighbor Rule, 182 j
5 4.5.5 Computational Complexity of the кnearestneighbor Rule, Metrics and NearestNeighbor Classification, Properties of Metrics, Tangent Distance, 188 *4.7 Fuzzy Classification, 192 *4.8 Reduced Coulomb Energy Networks, Approximations by Series Expansions, 197 Summary, 199 Bibliographical and Historical Remarks, 200 Problems, 201 Computer exercises, 209 Bibliography, 213 LINEAR DISCRIMINANT FUNCTIONS Introduction, Linear Discriminant Functions and Decision Surfaces, The TwoCategory Case, The Multicategory Case, Generalized Linear Discriminant Functions, The TwoCategory Linearly Separable Case, Geometry and Terminology, Gradient Descent Procedures, Minimizing the Perceptron Criterion Function, The Perceptron Criterion Function, Convergence Proof for SingleSample Correction, Some Direct Generalizations, Relaxation Procedures, The Descent Algorithm, Convergence Proof, Nonseparable Behavior, Minimum SquaredError Procedures, Minimum SquaredError and the Pseudoinverse, 240 Example 1 Constructing a Linear Classifier by Matrix Pseudoinverse, Relation to Fisher's Linear Discriminant, Asymptotic Approximation to an Optimal Discriminant, The WidrowHoff or LMS Procedure, Stochastic Approximation Methods, The HoKashyap Procedures, The Descent Procedure, Convergence Proof, Nonseparable Behavior, Some Related Procedures, 253 *5.10 Linear Programming Algorithms, Linear Programming, The Linearly Separable Case, Minimizing the Perceptron Criterion Function, 258 *5.11 Support Vector Machines, SVM Training, 263 Example 2 SVM for the XOR Problem, 264
6 CONTENTS xi 5.12 Multicategory Generalizations, Kesler's Construction, Convergence of the FixedIncrement Rule, Generalizations for MSE Procedures, 268 Summary, 269 Bibliographical and Historical Remarks, 270 Problems, 271 Computer exercises, 278 Bibliography, 281 MULTILAYER NEURAL NETWORKS Introduction, Feedforward Operation and Classification, General Feedforward Operation, Expressive Power of Multilayer Networks, Backpropagation Algorithm, Network Learning, Training Protocols, Learning Curves, Error Surfaces, Some Small Networks, The ExclusiveOR (XOR), Larger Networks, How Important Are Multiple Minima?, Backpropagation as Feature Mapping, Representations at the Hidden Layer Weights, Backpropagation, Bayes Theory and Probability, Bayes Discriminants and Neural Networks, Outputs as Probabilities, 304 *6.7 Related Statistical Techniques, Practical Techniques for Improving Backpropagation, Activation Function, Parameters for the Sigmoid, Scaling Input, Target Values, Training with Noise, Manufacturing Data, Number of Hidden Units, Initializing Weights, Learning Rates, Momentum, Weight Decay, Hints, OnLine, Stochastic or Batch Training?, Stopped Training, Number of Hidden Layers, Criterion Function, 318 *6.9 SecondOrder Methods, Hessian Matrix, Newton's Method, 319
7 6.9.3 Quickprop, Conjugate Gradient Descent, 321 Example 1 Conjugate Gradient Descent, 322 *6.10 Additional Networks and Training Methods, Radial Basis Function Networks (RBFs), Special Bases, Matched Filters, Convolutional Networks, Recurrent Networks, CascadeCorrelation, Regularization, Complexity Adjustment and Praning, 330 Summary, 333 Bibliographical and Historical Remarks, 333 Problems, 335 Computer exercises, 343 Bibliography, 347 STOCHASTIC METHODS Introduction, Stochastic Search, Simulated Annealing, The Boltzmann Factor, Deterministic Simulated Annealing, Boltzmann Learning, Stochastic Boltzmann Learning of Visible States, Missing Features and Category Constraints, Deterministic Boltzmann Learning, Initialization and Setting Parameters, 367 *7.4 Boltzmann Networks and Graphical Models, Other Graphical Models, 372 *7.5 Evolutionary Methods, Genetic Algorithms, Further Heuristics, Why Do They Work?, 378 *7.6 Genetic Programming, 378 Summary, 381 Bibliographical and Historical Remarks, 381 Problems, 383 Computer exercises, 388 Bibliography, NONMETRIC METHODS Introduction, Decision Trees, CART, Number of Splits, Query Selection and Node Impurity, When to Stop Splitting, Pruning, 403
8 CONTENTS XIII Assignment of Leaf Node Labels, 404 Example 1 A Simple Tree, Computational Complexity, Feature Choice, Multivariate Decision Trees, Priors and Costs, Missing Attributes, 409 Example 2 Surrogate Splits and Missing Attributes, Other Tree Methods, ID3, C4.5, Which Tree Classifier Is Best?, 412 *8.5 Recognition with Strings, String Matching, Edit Distance, Computational Complexity, String Matching with Errors, String Matching with the "Don'tCare" Symbol, Grammatical Methods, Grammars, Types of String Grammars, 424 Example 3 A Grammar for Pronouncing Numbers, Recognition Using Grammars, Grammatical Inference, 429 Example 4 Grammatical Inference, 431 *8.8 RuleBased Methods, Learning Rules, 433 Summary, 434 Bibliographical and Historical Remarks, 435 Problems, 437 Computer exercises, 446 Bibliography, 450 ALGORITHMINDEPENDENT MACHINE LEARNING Introduction, Lack of Inherent Superiority of Any Classifier, No Free Lunch Theorem, 454 Example 1 No Free Lunch for Binary Data, 457 *9.2.2 Ugly Duckling Theorem, Minimum Description Length (MDL), Minimum Description Length Principle, Overfitting Avoidance and Occam's Razor, Bias and Variance, Bias and Variance for Regression, Bias and Variance for Classification, Resampling for Estimating Statistics, Jackknife, 472 Example 2 Jackknife Estimate of Bias and Variance of the Mode, Bootstrap, Resampling for Classifier Design, 475
9 9.5.1 Bagging, Boosting, Learning with Queries, Arcing, Learning with Queries, Bias and Variance, Estimating and Comparing Classifiers, Parametric Models, CrossValidation, Jackknife and Bootstrap Estimation of Classification Accuracy, MaximumLikelihood Model Comparison, Bayesian Model Comparison, The ProblemAverage Error Rate, Predicting Final Performance from Learning Curves, The Capacity of a Separating Plane, Combining Classifiers, Component Classifiers with Discriminant Functions, Component Classifiers without Discriminant Functions, 498 Summary, 499 Bibliographical and Historical Remarks, 500 Problems, 502 Computer exercises, 508 Bibliography, UNSUPERVISED LEARNING AND CLUSTERING Introduction, Mixture Densities and Identifiability, MaximumLikelihood Estimates, Application to Normal Mixtures, Case 1: Unknown Mean Vectors, Case 2: All Parameters Unknown, fcmeans Clustering, 526 * Fuzzy itmeans Clustering, Unsupervised Bayesian Learning, The Bayes Classifier, Learning the Parameter Vector, 531 Example 1 Unsupervised Learning of Gaussian Data, DecisionDirected Approximation, Data Description and Clustering, Similarity Measures, Criterion Functions for Clustering, The SumofSquaredError Criterion, Related Minimum Variance Criteria, Scatter Criteria, 544 Example 2 Clustering Criteria, 546 *10.8 Iterative Optimization, Hierarchical Clustering, Definitions, Agglomerative Hierarchical Clustering, StepwiseOptimal Hierarchical Clustering, Hierarchical Clustering and Induced Metrics, 556 * The Problem of Validity, 557
10 CONTENTS XV *10.11 Online clustering, Unknown Number of Clusters, Adaptive Resonance, Learning with a Critic, 565 *10.12 GraphTheoretic Methods, Component Analysis, Principal Component Analysis (PCA), Nonlinear Component Analysis (NLCA), 569 * Independent Component Analysis (ICA), LowDimensional Representations and Multidimensional Scaling (MDS), SelfOrganizing Feature Maps, Clustering and Dimensionality Reduction, 580 Summary, 581 Bibliographical and Historical Remarks, 582 Problems, 583 Computer exercises, 593 Bibliography, 598 MATHEMATICAL FOUNDATIONS 601 A.l Notation, 601 A.2 Linear Algebra, 604 A.2.1 Notation and Preliminaries, 604 A.2.2 Inner Product, 605 A.2.3 Outer Product, 606 A.2.4 Derivatives of Matrices, 606 A.2.5 Determinant and Trace, 608 A.2.6 Matrix Inversion, 609 A.2.7 Eigenvectors and Eigenvalues, 609 A.3 Lagrange Optimization, 610 A.4 Probability Theory, 611 A.4.1 Discrete Random Variables, 611 A.4.2 Expected Values, 611 A.4.3 Pairs of Discrete Random Variables, 612 A.4.4 Statistical Independence, 613 A.4.5 Expected Values of Functions of Two Variables, 613 A.4.6 Conditional Probability, 614 A.4.7 The Law of Total Probability and Bayes Rule, 615 A.4.8 Vector Random Variables, 616 A.4.9 Expectations, Mean Vectors and Covariance Matrices, 617 A.4.10 Continuous Random Variables, 618 A.4.11 Distributions of Sums of Independent Random Variables, 620 A.4.12 Normal Distributions, 621 A.5 Gaussian Derivatives and Integrals, 623 A.5.1 Multivariate Normal Densities, 624 A.5.2 Bivariate Normal Densities, 626 A.6 Hypothesis Testing, 628 A.6.1 ChiSquared Test, 629 A.7 Information Theory, 630 A.7.1 Entropy and Information, 630
11 xvi CONTENTS A.7.2 Relative Entropy, 632 A.7.3 Mutual Information, 632 A.8 Computational Complexity, 633 Bibliography, 635 INDEX 637
Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Nonparametric
More informationCondensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.
Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Spall John Wiley and Sons, Inc., 2003 Preface... xiii 1. Stochastic Search
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two onepage, twosided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationCourse in Data Science
Course in Data Science About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst. The course gives an
More informationLessons in Estimation Theory for Signal Processing, Communications, and Control
Lessons in Estimation Theory for Signal Processing, Communications, and Control Jerry M. Mendel Department of Electrical Engineering University of Southern California Los Angeles, California PRENTICE HALL
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationArtificial Neural Networks Examination, June 2005
Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either
More informationHoldout and CrossValidation Methods Overfitting Avoidance
Holdout and CrossValidation Methods Overfitting Avoidance Decision Trees Reduce error pruning Costcomplexity pruning Neural Networks Early stopping Adjusting Regularizers via CrossValidation Nearest
More informationInformation Dynamics Foundations and Applications
Gustavo Deco Bernd Schürmann Information Dynamics Foundations and Applications With 89 Illustrations Springer PREFACE vii CHAPTER 1 Introduction 1 CHAPTER 2 Dynamical Systems: An Overview 7 2.1 Deterministic
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) YungKyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More information10701/ Machine Learning, Fall
070/578 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sumofsquares error is the most common training
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009  The exam is openbook, opennotes, no electronics other than calculators.  The maximum possible score on this exam is 100. You have 3
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 MultiLayer Perceptrons The BackPropagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwthaachen.de leibe@vision.rwthaachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationMachine Learning, Midterm Exam
10601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv BarJoseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationCENG 793. On Machine Learning and Optimization. Sinan Kalkan
CENG 793 On Machine Learning and Optimization Sinan Kalkan 2 Now Introduction to ML Problem definition Classes of approaches KNN Support Vector Machines Softmax classification / logistic regression Parzen
More informationEMalgorithm for Training of Statespace Models with Application to Time Series Prediction
EMalgorithm for Training of Statespace Models with Application to Time Series Prediction Elia Liitiäinen, Nima Reyhani and Amaury Lendasse Helsinki University of Technology  Neural Networks Research
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I SmallSample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: MultiLayer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationMachine Learning. Regression. Manfred Huber
Machine Learning Regression Manfred Huber 2015 1 Regression Regression refers to supervised learning problems where the target output is one or more continuous values Continuous output values imply that
More informationPART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics
Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability
More informationTime Series: Theory and Methods
Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary
More informationPattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods
Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationComplex Valued Nonlinear Adaptive Filters
Complex Valued Nonlinear Adaptive Filters Noncircularity, Widely Linear and Neural Models Danilo P. Mandic Imperial College London, UK Vanessa Su Lee Goh Shell EP, Europe WILEY A John Wiley and Sons, Ltd,
More informationIntroduction to Machine Learning Midterm Exam Solutions
10701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv BarJoseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationArtificial Neural Networks (ANN)
Artificial Neural Networks (ANN) Edmondo Trentin April 17, 2013 ANN: Definition The definition of ANN is given in 3.1 points. Indeed, an ANN is a machine that is completely specified once we define its:
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationMidterm, Fall 2003
578 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more timeconsuming and is worth only three points, so do not attempt 9 unless you are
More informationPart I. Linear regression & LASSO. Linear Regression. Linear Regression. Week 10 Based in part on slides from textbook, slides of Susan Holmes
Week 10 Based in part on slides from textbook, slides of Susan Holmes Part I Linear regression & December 5, 2012 1 / 1 2 / 1 We ve talked mostly about classification, where the outcome categorical. If
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More information2 CONTENTS. Bibliography Index... 70
Contents 9 Algorithmindependent machine learning 3 9.1 Introduction................................. 3 9.2 Lack of inherent superiority of any classifier............... 4 9.2.1 No Free Lunch Theorem......................
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationKernel Methods. Barnabás Póczos
Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels
More informationNumerical Learning Algorithms
Numerical Learning Algorithms Example SVM for Separable Examples.......................... Example SVM for Nonseparable Examples....................... 4 Example Gaussian Kernel SVM...............................
More informationDETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION. Alexandre Iline, Harri Valpola and Erkki Oja
DETECTING PROCESS STATE CHANGES BY NONLINEAR BLIND SOURCE SEPARATION Alexandre Iline, Harri Valpola and Erkki Oja Laboratory of Computer and Information Science Helsinki University of Technology P.O.Box
More informationC4.5  pruning decision trees
C4.5  pruning decision trees Quiz 1 Quiz 1 Q: Is a tree with only pure leafs always the best classifier you can have? A: No. Quiz 1 Q: Is a tree with only pure leafs always the best classifier you can
More informationOverview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated
Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian
More informationIn the Name of God. Lectures 15&16: Radial Basis Function Networks
1 In the Name of God Lectures 15&16: Radial Basis Function Networks Some Historical Notes Learning is equivalent to finding a surface in a multidimensional space that provides a best fit to the training
More informationStatistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames
Statistical Methods in HYDROLOGY CHARLES T. HAAN The Iowa State University Press / Ames Univariate BASIC Table of Contents PREFACE xiii ACKNOWLEDGEMENTS xv 1 INTRODUCTION 1 2 PROBABILITY AND PROBABILITY
More informationLMS Algorithm Summary
LMS Algorithm Summary Step size tradeoff Other Iterative Algorithms LMS algorithm with variable step size: w(k+1) = w(k) + µ(k)e(k)x(k) When step size µ(k) = µ/k algorithm converges almost surely to optimal
More informationMultivariate Analysis Techniques in HEP
Multivariate Analysis Techniques in HEP Jan Therhaag IKTP Institutsseminar, Dresden, January 31 st 2013 Multivariate analysis in a nutshell Neural networks: Defeating the black box Boosted Decision Trees:
More informationMachine learning for pervasive systems Classification in highdimensional spaces
Machine learning for pervasive systems Classification in highdimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationNEURAL NETWORKS and PATTERN RECOGNITION
Anna Bartkowiak NEURAL NETWORKS and PATTERN RECOGNITION Lecture notes Fall 2004 Institute of Computer Science, University of Wroc law CONTENTS 2 Contents 2 Bayesian Decision theory 5 2.1 Setting Decision
More informationMachine Learning & Data Mining
Group M L D Machine Learning M & Data Mining Chapter 7 Decision Trees XinShun Xu @ SDU School of Computer Science and Technology, Shandong University Top 10 Algorithm in DM #1: C4.5 #2: KMeans #3: SVM
More informationExample  basketball players and jockeys. We will keep practical applicability in mind:
Sonka: Pattern Recognition Class 1 INTRODUCTION Pattern Recognition (PR) Statistical PR Syntactic PR Fuzzy logic PR Neural PR Example  basketball players and jockeys We will keep practical applicability
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte  CMPT 726 Bishop PRML Ch. 4 Classification: Handwritten Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review
More informationLecture 4: Perceptrons and Multilayer Perceptrons
Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II  Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and NonParametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationData Mining Techniques
Data Mining Techniques CS 6220  Section 3  Fall 2016 Lecture 21: Review JanWillem van de Meent Schedule Topics for Exam PreMidterm Probability Information Theory Linear Regression Classification Clustering
More informationIncremental Stochastic Gradient Descent
Incremental Stochastic Gradient Descent Batch mode : gradient descent w=w  η E D [w] over the entire data D E D [w]=1/2σ d (t d o d ) 2 Incremental mode: gradient descent w=w  η E d [w] over individual
More informationA Course in Time Series Analysis
A Course in Time Series Analysis Edited by DANIEL PENA Universidad Carlos III de Madrid GEORGE C. TIAO University of Chicago RUEY S. TSAY University of Chicago A WileyInterscience Publication JOHN WILEY
More informationCOMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS16
COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS6 Lecture 3: Classification with Logistic Regression Advanced optimization techniques Underfitting & Overfitting Model selection (Training
More informationMachine Learning 2nd Edition
ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://wwwcstauacil/~apartzin/machinelearning/ and wwwcsprincetonedu/courses/archive/fall01/cs302 /notes/11/emppt The MIT Press, 2010
More informationStochastic Processes. Theory for Applications. Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS
Stochastic Processes Theory for Applications Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv Swgg&sfzoMj ybr zmjfr%cforj owf fmdy xix Acknowledgements xxi 1 Introduction and review
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationMIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on
More informationLinks between Perceptrons, MLPs and SVMs
Links between Perceptrons, MLPs and SVMs Ronan Collobert Samy Bengio IDIAP, Rue du Simplon, 19 Martigny, Switzerland Abstract We propose to study links between three important classification algorithms:
More informationProbabilistic Methods in Bioinformatics. Pabitra Mitra
Probabilistic Methods in Bioinformatics Pabitra Mitra pabitra@cse.iitkgp.ernet.in Probability in Bioinformatics Classification Categorize a new object into a known class Supervised learning/predictive
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationArtificial neural networks
Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural
More informationPRINCIPLES OF STATISTICAL INFERENCE
Advanced Series on Statistical Science & Applied Probability PRINCIPLES OF STATISTICAL INFERENCE from a NeoFisherian Perspective Luigi Pace Department of Statistics University ofudine, Italy Alessandra
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongamro, Namgu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationFeature selection and extraction Spectral domain quality estimation Alternatives
Feature selection and extraction Error estimation Maa57.3210 Data Classification and Modelling in Remote Sensing Markus Törmä markus.torma@tkk.fi Measurements Preprocessing: Remove random and systematic
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 20052007 Carlos Guestrin 1 Generative
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationNew Introduction to Multiple Time Series Analysis
Helmut Lütkepohl New Introduction to Multiple Time Series Analysis With 49 Figures and 36 Tables Springer Contents 1 Introduction 1 1.1 Objectives of Analyzing Multiple Time Series 1 1.2 Some Basics 2
More informationCHAPTER17. Decision Tree Induction
CHAPTER17 Decision Tree Induction 17.1 Introduction 17.2 Attribute selection measure 17.3 Tree Pruning 17.4 Extracting Classification Rules from Decision Trees 17.5 Bayesian Classification 17.6 Bayes
More informationArtificial Neural Networks 2
CSC2515 Machine Learning Sam Roweis Artificial Neural s 2 We saw neural nets for classification. Same idea for regression. ANNs are just adaptive basis regression machines of the form: y k = j w kj σ(b
More informationNumerical Analysis for Statisticians
Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method
More informationoutput dimension input dimension Gaussian evidence Gaussian Gaussian evidence evidence from t +1 inputs and outputs at time t x t+2 x t1 x t+1
To appear in M. S. Kearns, S. A. Solla, D. A. Cohn, (eds.) Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 999. Learning Nonlinear Dynamical Systems using an EM Algorithm Zoubin
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 67 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable
More informationAdvanced statistical methods for data analysis Lecture 1
Advanced statistical methods for data analysis Lecture 1 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More informationCSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!
CSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!! November 18, 2015 THE EXAM IS CLOSED BOOK. Once the exam has started, SORRY, NO TALKING!!! No, you can t even say see ya
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationNonparametric Classification of Facial Features
Nonparametric Classification of Facial Features Hyun Sung Chang Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Problem statement In this project, I attempted
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More information10810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semisupervised, or lightly supervised learning. In this kind of learning either no labels are
More informationW vs. QCD Jet Tagging at the Large Hadron Collider
W vs. QCD Jet Tagging at the Large Hadron Collider Bryan Anenberg: anenberg@stanford.edu; CS229 December 13, 2013 Problem Statement High energy collisions of protons at the Large Hadron Collider (LHC)
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationDecision Trees. Lewis Fishgold. (Material in these slides adapted from Ray Mooney's slides on Decision Trees)
Decision Trees Lewis Fishgold (Material in these slides adapted from Ray Mooney's slides on Decision Trees) Classification using Decision Trees Nodes test features, there is one branch for each value of
More informationRobust Classification using Boltzmann machines by Vasileios Vasilakakis
Robust Classification using Boltzmann machines by Vasileios Vasilakakis The scope of this report is to propose an architecture of Boltzmann machines that could be used in the context of classification,
More informationArtifical Neural Networks
Neural Networks Artifical Neural Networks Neural Networks Biological Neural Networks.................................. Artificial Neural Networks................................... 3 ANN Structure...........................................
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai ShalevShwartz and Shai BenDavid, and others 1
More informationPart 8: Neural Networks
METU Informatics Institute Min720 Pattern Classification ith BioMedical Applications Part 8: Neural Netors  INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron:  A nerve cell as
More information