Introduction to Machine Learning


CS4731, Dr. Mihail, Fall 2017
Slide content based on books by Bishop and Barber.
https://www.microsoft.com/en-us/research/people/cmbishop/
http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=brml.homepage

Machine Learning (ML)
What is ML? Machine learning is the study of data-driven methods capable of mimicking, understanding, and aiding human and biological information processing tasks. Machine learning can also be thought of as the science of getting machines to solve problems without being explicitly programmed how to.
Related issues:
- how to find patterns in data
- how to compress data
- how to interpret data
- how to process data

Patterns
Finding patterns is fundamental. The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms, and with the use of these regularities to take actions such as classifying the data into different categories.

Hard problems for ML:
- Credit card fraud detection
- Face detection and recognition
- Self-driving cars
- Weather prediction
- Recommender systems
- Cyber security

Handwritten Digit Recognition
Commercial systems are currently in use by the USPS. How would you solve this problem?

Handwritten Digit Recognition Problem
Each digit corresponds to a 28 x 28 pixel image and so can be represented by a vector $\mathbf{x}$ comprising 784 real numbers. The goal is to build a machine that will take such a vector $\mathbf{x}$ as input and produce the identity of the digit 0, ..., 9 as the output. This is a nontrivial problem due to the wide variability of handwriting. It could be solved in several ways:
- Craft rules or heuristics, such as the shape of strokes. The shortcoming of this method is an uncontrolled growth of rules, which invariably gives poor results.
- Adopt an ML-based approach (sketched below), where a large set of $N$ digits $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, called a training set, is used to tune the parameters of an adaptive model. The categories are known ahead of time, typically (but not always) by hand-labeling, and are expressed by a target vector $\mathbf{t}$, which represents the identity of the digit.
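As a concrete sketch of the ML-based approach (an illustration only; the slides do not prescribe a library or model), the following Python snippet fits a classifier with scikit-learn. Its built-in digits dataset uses 8 x 8 images rather than the 28 x 28 images described above, but the workflow is the same: tune parameters on a labeled training set, then check generalization on a held-out test set.

```python
# Hedged sketch: digit classification with a labeled training set.
# Library and model choice (scikit-learn, logistic regression) are assumptions.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X, t = digits.data, digits.target     # each row of X is a flattened image vector

# Hold out a test set so generalization can be measured on unseen digits.
X_train, X_test, t_train, t_test = train_test_split(X, t, random_state=0)

model = LogisticRegression(max_iter=5000)  # the adaptive model to be tuned
model.fit(X_train, t_train)                # training (learning) phase

print("test accuracy:", model.score(X_test, t_test))
```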

ML Algorithm
- The result of running the machine learning algorithm can be expressed as a function $y(\mathbf{x})$ which takes a new digit image $\mathbf{x}$ as input and generates an output vector $\mathbf{y}$, encoded in the same way as the target vectors.
- The precise form of the function $y(\mathbf{x})$ is determined during the training phase, also known as the learning phase, on the basis of the training data.
- Once the model is trained, it can determine the identity of new digit images, which are said to comprise a test set.
- The ability to correctly categorize new examples that differ from those used for training is known as generalization.
- In practical applications, the variability of the input vectors will be such that the training data can comprise only a tiny fraction of all possible input vectors, so generalization is a central goal in pattern recognition.

Input data
- For most practical applications, the original input variables are typically preprocessed to transform them into some new space of variables where, it is hoped, the pattern recognition problem will be easier to solve.
- For instance, in the digit recognition problem, the images of the digits are typically translated and scaled so that each digit is contained within a box of a fixed size.
- This preprocessing stage is sometimes also called feature extraction; a minimal sketch follows below.
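The slides do not show preprocessing code; the following is a minimal sketch under the assumption that the fixed-size box is obtained by cropping the ink to its bounding box and rescaling. The function name normalize_digit and its parameters are hypothetical.

```python
# Hypothetical preprocessing sketch: translate and scale a digit image so
# the ink fills a fixed-size box (a form of feature extraction).
import numpy as np
from scipy.ndimage import zoom  # assumes SciPy is available

def normalize_digit(img, size=28):
    """Crop nonzero pixels to their bounding box, then rescale to size x size."""
    rows = np.any(img > 0, axis=1)
    cols = np.any(img > 0, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    crop = img[r0:r1 + 1, c0:c1 + 1]
    return zoom(crop, (size / crop.shape[0], size / crop.shape[1]), order=1)
```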

Types of ML
Supervised: applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems.
- Regression: the desired output consists of one or more continuous variables.
- Classification: the goal is to map an input vector to one of a finite number of discrete categories.
Unsupervised: in other pattern recognition problems, the training data consists of a set of input vectors $\mathbf{x}$ without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data (clustering), to determine the distribution of data within the input space (density estimation), or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization. A small clustering sketch follows below.
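To make the supervised/unsupervised contrast concrete, here is a hedged clustering sketch (the slides name no algorithm; k-means and scikit-learn are assumptions). Note that fit_predict receives only input vectors, never targets.

```python
# Hedged sketch: unsupervised clustering of digit vectors with k-means.
# Algorithm and library choice are assumptions, not from the slides.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X = load_digits().data  # input vectors only; no target values are used

clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
print(clusters[:10])    # cluster indices: groups of similar examples, not labels
```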

Types of ML
Reinforcement learning: finding suitable actions to take in a given situation in order to maximize a reward. The learning algorithm is not given examples of optimal outputs, in contrast to supervised learning, but must instead discover them by a process of trial and error.

Example problem: polynomial curve fitting (Regression)
- Suppose we observe a real-valued input variable $x$ and we wish to use this observation to predict the value of a real-valued target variable $t$.
- Data for this example is generated from the function $\sin(2\pi x)$ with random noise included in the target values.
- By generating data in this way, we are capturing a property of many real data sets, namely that they possess an underlying regularity, which we wish to learn, but that individual observations are corrupted by random noise.
- Noise might arise from intrinsically stochastic (i.e., random) processes such as radioactive decay, but more typically it is due to sources of variability that are themselves unobserved.

Example problem: polynomial curve fitting (Regression)
We are given a training set comprising $N$ observations of $x$, written as $\mathbf{x} \equiv (x_1, x_2, \dots, x_N)^T$, together with corresponding observations of $t$, denoted $\mathbf{t} \equiv (t_1, \dots, t_N)^T$. A data-generation sketch follows below.
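A minimal sketch of generating such a training set, assuming N = 10, Gaussian noise, and a fixed seed (all illustrative choices; the slides do not specify them):

```python
# Hedged sketch: synthesize N noisy observations of t = sin(2*pi*x).
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = rng.uniform(0.0, 1.0, N)                       # observed inputs x_1..x_N
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, N)  # targets with random noise
```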

Polynomial Curve Fitting
Sample data (figure: the $N$ noisy training points).

Polynomial Curve Fitting
Goal: make a prediction of the value $\hat{t}$ given an unobserved $\hat{x}$. This implicitly involves discovering the underlying generating function $\sin(2\pi x)$ from targets corrupted by the added noise.
Attempt: assume the underlying function is a polynomial,
$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
where $M$ is the order of the polynomial and the coefficients are collected in $\mathbf{w}$. Functions like the polynomial, which are linear in the unknown parameters, all fall under the heading of linear models. A sketch of evaluating $y(x, \mathbf{w})$ follows below.
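A minimal sketch of evaluating the model (the Vandermonde construction is one standard way to do it; the slides show only the formula):

```python
# Hedged sketch: evaluate y(x, w) = sum_{j=0}^{M} w_j * x**j.
import numpy as np

def y(x, w):
    # Column j of the Vandermonde matrix holds x**j, for j = 0..M.
    X = np.vander(np.atleast_1d(x), N=len(w), increasing=True)
    return X @ w  # linear in the parameters w
```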

Polynomial Curve Fitting
How? The values of $\mathbf{w}$ will be determined by fitting the polynomial to the training data. This is done by minimizing an error function, which measures the distance between the function $y(x, \mathbf{w})$, for any given value of $\mathbf{w}$, and the training set points. There are many choices of error function; one popular choice is the sum of the squares of the errors between the predictions and each known target value:
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2$$
A sketch of this computation follows below.
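A direct transcription of $E(\mathbf{w})$ as code, reusing the Vandermonde construction above (a sketch, not part of the slides):

```python
# Hedged sketch: the sum-of-squares error E(w) over the training set.
import numpy as np

def E(w, x, t):
    X = np.vander(x, N=len(w), increasing=True)
    residuals = X @ w - t        # y(x_n, w) - t_n for each n
    return 0.5 * np.sum(residuals ** 2)
```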

Polynomial Curve Fitting
Errors (figure: the displacements $y(x_n, \mathbf{w}) - t_n$ between the fitted curve and each training point).

Minimizing the error
Derivatives: choose the $\mathbf{w}^\star$ that makes the value of $E$ as small as possible. Because $E(\mathbf{w})$ is quadratic in $\mathbf{w}$, its derivatives with respect to the coefficients are linear in $\mathbf{w}$, so setting them to zero yields a unique closed-form minimizer. The resulting polynomial is $y(x, \mathbf{w}^\star)$; a least-squares sketch follows below.
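A hedged end-to-end sketch of the fit, solving the least-squares problem directly (the slides do not show a solver; M = 3 is an illustrative order):

```python
# Hedged sketch: find w* by linear least squares on the synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)

M = 3                                        # polynomial order (assumption)
X = np.vander(x, N=M + 1, increasing=True)

# E(w) is quadratic in w, so its minimizer is the least-squares solution.
w_star, *_ = np.linalg.lstsq(X, t, rcond=None)
print("coefficients of y(x, w*):", w_star)
```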
