Learning Air Traffic Controller Workload from Past Sector Operations

1 Learning Air Traffic Controller Workload from Past Sector Operations. David Gianazza

2 Outline: Introduction, Background, Model and methods, Results, Conclusion

3 ATC complexity (Introduction). [figure: a simple vs. a complex traffic situation]

4 ATC complexity (Introduction). [figure: a simple vs. a complex traffic situation] Workload & airspace configuration

5 Questions related to ATC complexity and workload:
- Can we predict the workload of ATC controllers, given a traffic situation as input?
- How do we measure the ATC complexity of a traffic situation?
- How do we quantify workload?
- Can we build a predictive model of the workload, taking ATC complexity into account? What model should we choose?
Many answers in the literature [Mogford et al., 1995, Hilburn, 2004, Delahaye and Puechmorel, 2000, Delahaye and Puechmorel, 2010, Flynn et al., 2005, Welch et al., 2007, Kopardekar and Magyarits, 2003, Chatterji and Sridhar, 1999], etc.

6 Focus of this publication: compare several machine learning methods on the workload prediction problem:
- Linear discriminant analysis (LDA)
- Quadratic discriminant analysis (QDA)
- Naive Bayes classifier (NBayes)
- Neural networks (NNet)
- Gradient-boosted trees (GBM)
Underlying model: workload = f(complexity), with complexity = g(traffic, sector, procedures)

7 Focus of this publication: the ATC complexity metrics are taken from the literature; the workload model is learned from past sector operations.

8 Outline: Introduction, Background, Model and methods, Results, Conclusion

9 ATC complexity. Literature reviews on ATC complexity factors:
- R. Mogford, J. A. Guttman, S. L. Morrow, and P. Kopardekar, The complexity construct in air traffic control: A review and synthesis of the literature, tech. rep., FAA Technical Center, Atlantic City, 1995.
- B. Hilburn, Cognitive complexity in air traffic control, a literature review, tech. rep., Eurocontrol Experimental Centre, 2004.
E.g.: density, incoming flows, climbing/descending aircraft, dispersion of speed direction/intensity, vertical velocity variance, number of potential conflicts, sector volume, number of crossing points, etc.

10 Research on ATC complexity: intrinsic complexity
- Delahaye, D. and Puechmorel, S. (2000). Air traffic complexity: towards intrinsic metrics. In Proceedings of the 3rd USA/Europe Air Traffic Management R&D Seminar.
- Delahaye, D. and Puechmorel, S. (2010). Air traffic complexity based on dynamical systems. In 49th IEEE Conference on Decision and Control (CDC). IEEE.
Analogy with entropy in physics: air traffic is modeled as a dynamical system (vector field + time), and complexity as the entropy of this system. Not related to workload.

11 Cognitive approach of workload
- Flynn, G., Benkouar, A., and Christien, R. (2005). Adaptation of workload model by optimisation algorithms. Technical report, Eurocontrol.
- Welch, J. D., Andrews, J. W., Martin, B. D., and Sridhar, B. (2007). Macroscopic workload model for estimating en route sector capacity. In Proc. of 7th USA/Europe ATM Research and Development Seminar, Barcelona, Spain.
Note: the bibliographies are not exhaustive.
Mental/physical workload is modeled as the duration of ATC tasks: G = Σ_{j=1}^{J} τ_j λ_j, where τ_j is the time required to execute task j and λ_j the occurrence ratio of task j.
The complexity-dependent model parameters are weighted so that the modeled workload threshold matches either the declared sector capacity or the peak traffic count.
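As a concrete illustration of this task-time model (not from the paper; the task list, durations and occurrence ratios below are made-up numbers), G is just a weighted sum:

```python
# Hypothetical illustration of the task-time workload model G = sum_j tau_j * lambda_j.
# Each entry is (tau_j: seconds per task execution, lambda_j: occurrences per unit time).
tasks = {
    "radio_exchange":  (10.0, 0.60),
    "conflict_search": (25.0, 0.20),
    "coordination":    (15.0, 0.10),
}

G = sum(tau * lam for tau, lam in tasks.values())
print(f"modeled workload G = {G:.1f} seconds of task time per unit time")
```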

12 Statistical models relating workload to complexity. Collected data:
- Measures of complexity metrics extracted from radar data and sector geometry. E.g.: density, incoming flows, climbing/descending aircraft, dispersion of speed direction/intensity, vertical velocity variance, number of potential conflicts, sector volume, number of crossing points, etc.
- Workload measures: EEG, physiological parameters, pupil diameter, eye fixation points, duration and content of radio exchanges, subjective ratings, etc.
Assumption: workload = f(complexity). The true relationship y = f(x) is unknown; it is approximated by y = h(x) + ε, where h is tuned on a set of examples (x_1, y_1), ..., (x_N, y_N).

13 Statistical models relating workload to complexity
- Kopardekar, P. and Magyarits, S. (2003). Measurement and prediction of dynamic density. In Proceedings of the 5th USA/Europe Air Traffic Management R&D Seminar.
- Chatterji, G. B. and Sridhar, B. (1999). Neural network based air traffic controller workload prediction. In Proceedings of the 1999 American Control Conference, volume 4. IEEE.
Model: y = h(x) + ε, where y (response variable) gathers subjective workload ratings collected from ATC controllers, and x (explanatory variables) gathers measures of ATC complexity.
Different choices for model h: linear model [Kopardekar and Magyarits, 2003], neural network [Chatterji and Sridhar, 1999].

14 Outline: Introduction, Background, Model and methods, Results, Conclusion

15 Our approach. Basic assumption: decisions to split or merge ATC sectors are caused by workload variations. Workload is therefore measured through the ATC sector status:
- Low workload when sector s is collapsed with other sectors
- Normal workload when sector s is operated as is
- High workload when s is split into several smaller sectors operated on different working positions

16-17 Supervised learning (our approach). The unknown relationship y = f(x) between workload y and ATC complexity metrics x is approximated by ŷ = h(x), using examples (x_1, y_1), ..., (x_N, y_N). This can be cast as a regression or a classification problem:
- Regression problem: y ∈ R or y ∈ R^p, as in [Kopardekar and Magyarits, 2003, Chatterji and Sridhar, 1999]
- Classification problem: y ∈ {C_1, ..., C_P} is a category
Here, the output is a category: low, normal or high workload.

18 Output variable y. The output can be modeled as a vector of posterior probabilities:
ŷ = (ŷ_1, ..., ŷ_P)^T = (P(C_1|x), ..., P(C_P|x))^T    (1)
With non-overlapping classes: ŷ_p ∈ [0, 1] for all p ∈ {1, ..., P}, and Σ_{p=1}^{P} ŷ_p = Σ_{p=1}^{P} P(C_p|x) = 1.

19-20 Output variable y. Examples extracted from past sector operations:
- Collapsed sector: y = (1, 0, 0)^T (underload)
- Opened sector: y = (0, 1, 0)^T (normal workload)
- Split sector: y = (0, 0, 1)^T (overload)
- Other: not used (unknown workload)
Advantages, compared to other workload measures: historical data is easy to collect (more than 510 ATC sectors for the 5 French ATCCs), and it is not subjective. Drawback: only three levels of workload.
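A minimal sketch (status names and record format are hypothetical, not from the paper) of how such one-hot targets could be derived from archived sector statuses:

```python
import numpy as np

# Hypothetical mapping from an archived sector status to a one-hot workload target,
# following the encoding on this slide. Any other status means unknown workload.
STATUS_TO_CLASS = {"collapsed": 0, "opened": 1, "split": 2}  # low / normal / high

def encode_status(status):
    """Return the one-hot target y for a sector status, or None if workload is unknown."""
    idx = STATUS_TO_CLASS.get(status)
    if idx is None:
        return None  # "other" cases are not used as training examples
    y = np.zeros(3)
    y[idx] = 1.0
    return y

print(encode_status("split"))  # -> [0. 0. 1.]
```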

21 Input variables? What are the most relevant ATC complexity metrics for our problem? Previous research:
- D. Gianazza and K. Guittet. Evaluation of air traffic complexity metrics using neural networks and sector status. In Proceedings of the 2nd International Conference on Research in Air Transportation. ICRAT, 2006.
- D. Gianazza and K. Guittet. Selection and evaluation of air traffic complexity metrics. In Proceedings of the 25th Digital Avionics Systems Conference. DASC, 2006.
- D. Gianazza. Smoothed traffic complexity metrics for airspace configuration schedules. In Proceedings of the 3rd International Conference on Research in Air Transportation. ICRAT, 2008.

22 Input variables. Among 27 ATC complexity metrics from the literature, we found the 6 most relevant variables for our problem:
- vol, the airspace volume
- nb, the number of aircraft
- flow15, the incoming traffic flow within the next 15 minutes
- flow60, the incoming traffic flow within a 1-hour time horizon
- avg_vs, the average absolute vertical speed of the aircraft within the sector
- inter_hori, the number of speed vector intersections with an angle greater than 20 degrees
All are computed from radar tracks and sector geometries.
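As an illustration only (the slides do not give implementation details, and the track format below is hypothetical), two of these metrics could be computed from a radar snapshot roughly as follows:

```python
import numpy as np

# Hypothetical radar-track snapshot: one row per aircraft inside the sector,
# columns = (x in NM, y in NM, altitude in ft, vertical speed in ft/min).
tracks = np.array([
    [10.0, 4.0, 35000.0,  1200.0],
    [22.5, 9.0, 33000.0, -1800.0],
    [ 5.0, 1.0, 36000.0,     0.0],
])

nb = len(tracks)                      # nb: number of aircraft in the sector
avg_vs = np.abs(tracks[:, 3]).mean()  # avg_vs: average absolute vertical speed

print(nb, avg_vs)  # -> 3 1000.0
```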

23 Methods. Objective: compare several classification methods on the workload prediction problem:
- LDA: Linear discriminant analysis
- QDA: Quadratic discriminant analysis
- NBayes: Naive Bayes classifier
- NNet: Neural networks for classification
- GBM: Gradient-boosted classification trees
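A minimal sketch of such a comparison using scikit-learn stand-ins for the five methods (the paper's own implementations and settings are not specified here; the synthetic data stands in for the 6 complexity metrics and 3 workload classes):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the complexity metrics (6 features, 3 classes);
# the real features would come from radar tracks and sector geometries.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "LDA":    LinearDiscriminantAnalysis(),
    "QDA":    QuadraticDiscriminantAnalysis(),
    "NBayes": GaussianNB(),
    "NNet":   MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0),
    "GBM":    GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))  # correct classification rate
```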

24 Dataset. Historical data from 5 French ATC centers (2 weeks, Oct. 2016): radar tracks and past sector operations. Data split into two datasets:
- Training set (Oct. 13th to 19th): examples of workload and complexity measurements, from 511 different ATC sectors
- Test set (Oct. 20th to 26th): examples from 513 ATC sectors
The three classes (low, normal, high) are equally represented.

25 Hyperparameter tuning. Some methods have hyperparameters that need to be selected (e.g. the number of hidden units in a neural network). We use 10-fold cross-validation, performed on the training set S.
Model selection: for each candidate model, and for each fold k, adjust the model on the training subset (S minus the k-th validation subset) and assess its error on the k-th validation subset; compute the cumulative error over the K folds, then select the model minimizing this error on the K validation subsets.
Prediction: adjust the best model found on the whole training set, then apply it to fresh entries.
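A sketch of this selection loop with scikit-learn's KFold (illustrative; the candidate models and the synthetic data are placeholders, not the paper's setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=6, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)

# Candidate models: neural nets with h hidden units (grid from the next slide).
candidates = [MLPClassifier(hidden_layer_sizes=(h,), max_iter=500, random_state=0)
              for h in (15, 20, 25)]

errors = []
for model in candidates:
    err = 0.0
    for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        model.fit(X[train_idx], y[train_idx])              # adjust on S minus fold k
        err += 1.0 - model.score(X[val_idx], y[val_idx])   # assess on fold k
    errors.append(err)                                     # cumulative error, K folds

best = candidates[int(np.argmin(errors))]  # model minimizing the CV error
best.fit(X, y)                             # refit on the whole training set
```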

26 Hyperparameter tuning: 10-fold cross-validation to select hyperparameter values.

Method | Hyperparameters | Grid
LDA    | -               | -
QDA    | -               | -
NBayes | -               | -
NNet   | (h, λ)          | h ∈ {15, 20, 25}; λ ∈ {10, 1, 1e-1, 1e-2, 1e-3, 1e-5, 0}
GBM    | (m, J, ν)       | m ∈ {5000, 6000, 7000}; J ∈ {2, 3, 4}; ν ∈ {1e-4, 5e-4, 1e-3, 1e-2, 1e-1}
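The same NNet search could be expressed with GridSearchCV (a sketch; scikit-learn's alpha parameter plays the role of the weight-decay λ, and the data is a synthetic placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=6, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)

# Grid from the table above; alpha stands in for the weight-decay lambda.
param_grid = {
    "hidden_layer_sizes": [(15,), (20,), (25,)],
    "alpha": [10, 1, 1e-1, 1e-2, 1e-3, 1e-5, 0],
}
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, cv=10)
search.fit(X, y)
print(search.best_params_)  # selected (h, lambda)
```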

27 Outline: Introduction, Background, Model and methods, Results, Conclusion

28 Results: correct classification rates.

Method | Overall | Low | Normal | High
LDA    |   ...   | ... |  ...   | ...
QDA    |   ...   | ... |  ...   | ...
NBayes |   ...   | ... |  ...   | ...
NNet   |   ...   | ... |  ...   | ...
GBM    |   ...   | ... |  ...   | ...

[numeric values lost in the transcription; see the conclusion: the best methods reach nearly 82% overall]

29 Why are correct classification rates not closer to 100%? [figure: time series of complexity metrics (nb, flux15, flux60, avg.chgt_niv, avg.inter_hori) for one sector N, with the predicted workload class (low/normal/high) overlaid]

30 Outline: Introduction, Background, Model and methods, Results, Conclusion

31 Conclusion: nearly 82% correct classification rate. Neural nets (NNet) and gradient-boosted trees (GBM) performed best; linear models performed poorly.

32-36 Have we answered our questions? Can we predict the workload of ATC controllers, given a traffic situation as input?
- How do we measure the ATC complexity of a traffic situation? A combination of several factors; no single metric.
- How do we quantify workload? Categories (C_1, C_2, C_3) for low, normal and high workload. Examples are extracted from past sector operations, considering the sector status (collapsed, opened, split).
- Can we build a predictive model of the workload, taking ATC complexity into account? Posterior probabilities P(C_p|x) in NNet, or classification trees in GBM. Models are learned from historical data.
- What model should we choose? NNet and GBM perform better than the other tested models.

37 Improved workload prediction: potential benefits
- Predict optimal sector configurations
- Anticipate overloads
- What-if evaluation of envisioned ATFCM measures

38 Further work
- Model performance on elementary sectors?
- Effects of seasonality?
- Use larger datasets (1 year of traffic and sector operations)

40 Linear and quadratic discriminant analysis. Model: p(x, C_k) = p(x|C_k) P(C_k). Assumptions:
- P(C_k) = π_k
- p(x|C_k) = (2π)^{-D/2} |Σ_k|^{-1/2} exp{ -(1/2) (x - µ_k)^T Σ_k^{-1} (x - µ_k) }
The parameters π_k, µ_k and Σ_k are approximated by their maximum-likelihood estimates.

41 LDA, QDA, and Naive Bayes (NBayes). Bayes' theorem is used to compute P(C_j|x):
ŷ = (ŷ_1, ..., ŷ_P)^T = (P(C_1|x), ..., P(C_P|x))^T, with
P(C_j|x) = p(x|C_j) P(C_j) / p(x) = p(x|C_j) P(C_j) / Σ_k p(x|C_k) P(C_k)
under a Gaussian assumption for p(x|C_k):
- LDA: same covariance matrix Σ for all classes
- QDA: one covariance matrix Σ_k for each class
- NBayes: conditional independence of the explanatory variables, p(x|C_j) = Π_{d=1}^{D} p(x_d|C_j) when x = (x_1, ..., x_D)^T
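A compact numpy sketch of this posterior computation under the QDA assumptions (per-class mean and covariance estimated by maximum likelihood; the two-class toy data is made up):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
# Toy 2-class, 2-feature data (made up) standing in for complexity metrics.
X0 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
X1 = rng.normal([3.0, 2.0], 1.5, size=(50, 2))

priors, params = [], []
for Xk in (X0, X1):
    priors.append(len(Xk) / (len(X0) + len(X1)))              # pi_k
    params.append((Xk.mean(axis=0), np.cov(Xk.T, bias=True)))  # ML mu_k, Sigma_k

def posterior(x):
    # Bayes' theorem: P(C_j|x) = p(x|C_j) pi_j / sum_k p(x|C_k) pi_k
    lik = np.array([multivariate_normal.pdf(x, mean=m, cov=S) * pi
                    for (m, S), pi in zip(params, priors)])
    return lik / lik.sum()

print(posterior([1.5, 1.0]))  # two posteriors summing to 1
```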

42 Neural networks for classification. [figure: feed-forward network with a hidden layer and an output layer] Each hidden unit j computes z_j = φ(a_j), where a_j is a weighted sum of the inputs (1, x_1, ..., x_D) and φ(a_j) is the sigmoid function 1/(1 + e^{-a_j}) or the hyperbolic tangent function tanh.

43 Neural networks for classification. [figure: the same network, highlighting the output layer] Output: ŷ = (ŷ_1, ..., ŷ_P)^T = (P(C_1|x), ..., P(C_P|x))^T

44 Neural networks for classification. The outputs are interpreted as probabilities: ŷ_p ∈ [0, 1] for all p ∈ {1, ..., P}, and Σ_{p=1}^{P} ŷ_p = Σ_{p=1}^{P} P(C_p|x) = 1. We take ŷ = γ(a), where γ is the softmax function:
γ: (a_1, ..., a_P)^T ↦ ( exp(a_1) / Σ_{p=1}^{P} exp(a_p), ..., exp(a_P) / Σ_{p=1}^{P} exp(a_p) )^T
where each a_p is a weighted sum of the outputs of the hidden layer.
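A minimal numpy sketch of this forward pass (one hidden layer with tanh, softmax output; the weights are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, P = 6, 20, 3                 # input metrics, hidden units, workload classes
W1 = rng.normal(size=(H, D + 1))   # hidden-layer weights (first column: bias)
W2 = rng.normal(size=(P, H + 1))   # output-layer weights (first column: bias)

def forward(x):
    z = np.tanh(W1 @ np.concatenate(([1.0], x)))  # hidden activations z_j = phi(a_j)
    a = W2 @ np.concatenate(([1.0], z))           # output pre-activations a_p
    e = np.exp(a - a.max())                       # softmax (numerically stabilized)
    return e / e.sum()                            # y_hat: class posteriors

print(forward(rng.normal(size=D)))  # three probabilities summing to 1
```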

45 Training NNet on examples S_T = {(x_1, y_1), ..., (x_N, y_N)}: adjust the weights so as to minimize the error between the computed outputs ŷ_1, ..., ŷ_N and the target values y_1, ..., y_N.
Maximum-likelihood principle: minimize the cross-entropy
E(w) = -Σ_{n=1}^{N} Σ_{p=1}^{P} y_np ln( ŷ_np(w) / y_np )
Maximum posterior: cross-entropy with weight decay
E_λ(w) = -Σ_{n=1}^{N} Σ_{p=1}^{P} y_np ln( ŷ_np(w) / y_np ) + λ Σ_j w_j²
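A sketch of this penalized loss in numpy (with one-hot targets the y_np ln y_np term vanishes, so the usual -Σ y ln ŷ form is recovered; all values below are placeholders):

```python
import numpy as np

def cross_entropy_wd(Y, Y_hat, w, lam):
    """E_lambda(w) = -sum_n sum_p y_np * ln(yhat_np / y_np) + lam * sum_j w_j^2."""
    eps = 1e-12                           # numerical safety for the log
    Y_safe = np.where(Y > 0, Y, 1.0)      # y * ln(yhat / y) is 0 when y = 0
    ce = -np.sum(Y * np.log((Y_hat + eps) / Y_safe))
    return ce + lam * np.sum(w ** 2)      # weight-decay penalty

Y = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)     # one-hot targets
Y_hat = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])  # network outputs
print(cross_entropy_wd(Y, Y_hat, w=np.ones(5), lam=1e-3))
```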

46 Classification and regression trees. [figure: recursive binary partition of the (X_1, X_2) plane by splits X_1 ≤ t_1, X_2 ≤ t_2, X_1 ≤ t_3, X_2 ≤ t_4 into regions R_1, ..., R_5, from Hastie, Tibshirani & Friedman, The Elements of Statistical Learning (2nd ed.), chap. 9]
Tree model: h_T(x) = Σ_{j=1}^{T} c_j 1_{R_j}(x)    (2)
- Regression with quadratic loss: c_j = (1/N_j) Σ_{x_n ∈ R_j} y_n
- Classification: c_j = C_p, where C_p is the most frequent class in R_j

47 Gradient-boosted trees: a sum of tree models, iteratively adjusted. Model update:
h_m: x ↦ h_{m-1}(x) + ν Σ_{R_j ∈ T_m} γ_mj 1_{R_j}(x)
Parameters:
- J: tree size. Usually, 4 ≤ J ≤ 8
- M: number of boosting iterations. Can be selected by an early stopping procedure
- ν: shrinkage parameter. Usually, ν ≤ 0.1
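In scikit-learn terms (a sketch, not the paper's implementation; GradientBoostingClassifier's n_estimators, max_leaf_nodes and learning_rate correspond to m, J and ν, and the data is a synthetic placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)

gbm = GradientBoostingClassifier(
    n_estimators=5000,    # m: boosting iterations (upper bound)
    max_leaf_nodes=4,     # J: tree size (number of leaves)
    learning_rate=1e-3,   # nu: shrinkage parameter
    n_iter_no_change=10,  # early stopping on a held-out validation fraction
    random_state=0,
)
gbm.fit(X, y)
print(gbm.n_estimators_)  # iterations actually used after early stopping
```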
