Lecture Machine Learning (past summer semester) If at least one English-speaking student is present.

Size: px
Start display at page:

Download "Lecture Machine Learning (past summer semester) If at least one English-speaking student is present."

Transcription

1 Advanced Machine Learning Lecture 1 Organization Lecturer Prof. Bastian Leibe (leibe@vision.rwth-aachen.de) Introduction Bastian Leibe Visual Computing Institute RWTH Aachen University leibe@vision.rwth-aachen.de Teaching Assistants Umer Rafi (rafi@vision.rwth-aachen.de) Lucas Beyer (beyer@vision.rwth-aachen.de) Course webpage Slides will be made available on the webpage There is also an L2P electronic repository Please subscribe to the lecture on the Campus system! Important to get announcements and L2P access! 2 Language Relationship to Previous Courses Official course language will be English Lecture Machine Learning (past summer semester) If at least one English-speaking student is present. Introduction to ML If not you can choose. Classification Graphical models However Please tell me when I m talking too fast or when I should repeat something in German for better understanding! You may at any time ask questions in German! You may turn in your eercises in German. This course: Advanced Machine Learning Natural continuation of ML course Deeper look at the underlying concepts But: will try to make it accessible also to newcomers Quick poll: Who hasn t heard the ML lecture? You may take the oral eam in German. This year: Lots of new material Large lecture block on Deep Learning First time for us to teach this (so, bear with us...) 3 4 New Content This Year Organization Structure: 3V (lecture) + 1Ü (eercises) 6 EECS credits Part of the area Applied Computer Science Place & Time Lecture/Eercises: Mon 14:15 15:45 room UMIC 025 Lecture/Eercises: Thu 10:15 11:45 room UMIC 025 Deep Learning Eam Oral or written eam, depending on number of participants Towards the end of the semester, there will be a proposed date 5 6 1

2 Course Webpage Eercises and Supplementary Material Eercises Typically 1 eercise sheet every 2 weeks. Pen & paper and programming eercises Matlab for early topics Theano for Deep Learning topics Monday: Matlab tutorial Hands-on eperience with the algorithms from the lecture. Send your solutions the night before the eercise class. Supplementary material Research papers and book chapters Will be provided on the webpage Tetbooks How to Find Us Most lecture topics will be covered in Bishop s book. Some additional topics can be found in Rasmussen & Williams. Office: UMIC Research Centre Mies-van-der-Rohe-Strasse 15, room 124 Christopher M. Bishop Pattern Recognition and Machine Learning Springer, 2006 (available in the library s Handapparat ) Carl E. Rasmussen, Christopher K.I. Williams Gaussian Processes for Machine Learning MIT Press, 2006 (also available online: Office hours If you have questions to the lecture, come see us. My regular office hours will be announced. Research papers will be given out for some topics. Send us an before to confirm a time slot. Tutorials and deeper introductions. Application papers Questions are welcome! 9 10 Machine Learning Statistical Machine Learning Principles, methods, and algorithms for learning and prediction on the basis of past evidence Already everywhere Speech recognition (e.g. speed-dialing) Computer vision (e.g. face detection) Hand-written character recognition (e.g. letter delivery) Information retrieval (e.g. image & video indeing) Operation systems (e.g. caching) Fraud detection (e.g. credit cards) Tet filtering (e.g. spam filters) Game playing (e.g. strategy prediction) Automatic Speech Recognition Robotics (e.g. prediction of battery lifetime)

3 Computer Vision (Object Recognition, Segmentation, Scene Understanding) 13 Information Retrieval (Retrieval, Categorization, Clustering,...) 14 Financial Prediction (Time series analysis,...) 15 Medical Diagnosis (Inference from partial observations) 16 Image from Kevin Murphy Bioinformatics (Modelling gene microarray data,...) 17 Robotics (DARPA Grand Challenge,...) 18 Image from Kevin Murphy 3

4 Machine Learning: Core Questions Learning to perform a task from eperience Machine Learning: Core Questions y = f(; w) Task Can often be epressed through a mathematical function y = f(; w) w: characterizes the family of functions w: indees the space of hypotheses w: vector, connection matri, graph, : Input y: Output w: Parameters (this is what is learned ) Classification vs. Regression Regression: continuous y Classification: discrete y E.g. class membership, sometimes also posterior probability A Look Back: Lecture Machine Learning This Lecture: Advanced Machine Learning Fundamentals Bayes Decision Theory Probability Density Estimation Classification Approaches Linear Discriminant Functions Support Vector Machines Ensemble Methods & Boosting Randomized Trees, Forests & Ferns Generative Models Bayesian Networks Markov Random Fields Etending lecture Machine Learning from last semester Regression Approaches Linear Regression Regularization (Ridge, Lasso) Gaussian Processes Learning with Latent Variables EM and Generalizations Approimate Inference Deep Learning Neural Networks CNNs, RNNs, RBMs, etc. 21 Let s Get Started Topics of This Lecture Some of you already have basic ML background Who hasn t? We ll start with a gentle introduction I ll try to make the lecture also accessible to newcomers We ll review the main concepts before applying them I ll point out chapters to review from ML lecture whenever knowledge from there is needed/helpful But please tell me when I m moving too fast (or too slow) Regression: Motivation Polynomial fitting General Least-Squares Regression Overfitting problem Regularization Ridge Regression Recap: Important Concepts from ML Lecture Probability Theory Bayes Decision Theory Maimum Likelihood Estimation Bayesian Estimation A Probabilistic View on Regression Least-Squares Estimation as Maimum Likelihood

5 Regression Eample: Polynomial Curve Fitting Learning to predict a continuous function value Given: training set X = { 1,, N } with target values T = {t 1,, t N }. Learn a continuous function y() to predict the function value for a new input. Steps towards a solution Choose a form of the function y(,w) with parameters w. Toy dataset Generated by function Small level of random noise with Gaussian distribution added (blue dots) Define an error function E(w) to optimize. Optimize E(w) for w to find a good solution. (This may involve math). Derive the properties of this solution and think about its limitations. 25 Goal: fit a polynomial function to this data Note: Nonlinear function of, but linear function of the w j. 26 Error Function Minimizing the Error How to determine the values of the coefficients w? We need to define an error function to be minimized. This function specifies how a deviation from the target value should be weighted. How do we minimize the error? Popular choice: sum-of-squares error Definition Solution (Always!) Compute the derivative and set it to zero. We ll discuss the motivation for this particular function later Since the error is a quadratic function of w, its derivative will be linear in w. Minimization has a unique solution Least-Squares Regression Least-Squares Regression We have given Training data points: Associated function values: Start with linear regressor: Try to enforce X = f 1 2 R d ; : : : ; n g T = ft 1 2 R; : : : ; t n g One linear equation for each training data point / label pair. Setup Step 1: Define Step 2: Rewrite Step 3: Matri-vector notation µ µ i w ~ i = ; ~w = 1 w 0 This is the same basic setup used for least-squares classification! Only the values are now continuous. Step 4: Find least-squares solution with Solution:

6 Regression with Polynomials Varying the Order of the Polynomial. How can we fit arbitrary polynomials using least-squares regression? We introduce a feature transformation (as before in ML). y() = w T Á() MX = w i Á i () i=0 E.g.: Á() = (1; ; 2 ; 3 ) T Fitting a cubic polynomial. assume Á 0 () = 1 basis functions Massive overfitting! 31 Which one should we pick? 32 Analysis of the Results Overfitting Results for different values of M Problem Training data contains some noise Best representation of the original function sin(2¼) with M = 3. Higher-order polynomial fitted perfectly to the noise. We say it was overfitting to the training data. Perfect fit to the training data with M = 9, but poor representation of the original function. Goal is a good prediction of future data Our target function should fit well to the training data, but also generalize. Measure generalization performance on independent test set. Why is that??? After all, M = 9 contains M = 3 as a special case! Measuring Generalization Analyzing Overfitting Eample: Polynomial of degree 9 Overfitting! E.g., Root Mean Square Error (RMS): Motivation Division by N lets us compare different data set sizes. Square root ensures E RMS is measured on the same scale (and in the same units) as the target variable t. 35 Relatively little data Overfitting typical Enough data Good estimate Overfitting becomes less of a problem with more data. 36 Slide adapted from Bernt Schiele 6

7 What Is Happening Here? Regularization The coefficients get very large: Fitting the data from before with various polynomials. Coefficients: What can we do then? How can we apply the approach to data sets of limited size? We still want to use relatively comple and fleible models. Workaround: Regularization Penalize large coefficient values Here we ve simply added a quadratic regularizer, which is simple to optimize The resulting form of the problem is called Ridge Regression. 37 (Note: w 0 is often omitted from the regularizer.) 38 Results with Regularization (M=9) RMS Error for Regularized Case 39 Effect of regularization The trade-off parameter now controls the effective model compleity and thus the degree of overfitting. 40 Summary Topics of This Lecture We ve seen several important concepts Linear regression Overfitting Role of the amount of data Role of model compleity Regularization How can we approach this more systematically? Would like to work with comple models. How can we prevent overfitting systematically? How can we avoid the need for validation on separate test data? What does it mean to do linear regression? What does it mean to do regularization? Regression: Motivation Polynomial fitting General Least-Squares Regression Overfitting problem Regularization Ridge Regression Recap: Important Concepts from ML Lecture Probability Theory Bayes Decision Theory Maimum Likelihood Estimation Bayesian Estimation A Probabilistic View on Regression Least-Squares Estimation as Maimum Likelihood

8 Recap: The Rules of Probability Recap: Bayes Decision Theory Basic rules p a p b Likelihood Sum Rule Product Rule p a p( a) p b p( b) Likelihood Prior From those, we can derive Decision boundary Bayes Theorem where 43 p a p b Likelihood Prior Posterior = NormalizationFactor 44 Recap: Gaussian (or Normal) Distribution Side Note One-dimensional case Mean ¹ Variance ¾ 2 N(j¹; ¾ 2 ) = p 1 ¾ ( ¹)2 ep ½ 2¼¾ 2¾ 2 Notation In many situations, it will be necessary to work with the inverse of the covariance matri : We call the precision matri. Multi-dimensional case Mean ¹ Covariance We can therefore also write the Gaussian as N(j¹; ) = ½ 1 ep 1 ¾ (2¼) D=2 j j1=2 2 ( ¹)T 1 ( ¹) Recap: Parametric Methods Recap: Maimum Likelihood Approach Given Data X = f 1 ; 2 ; : : : ; N g Parametric form of the distribution with parameters µ E.g. for Gaussian distrib.: µ = (¹; ¾) Learning Estimation of the parameters µ Likelihood of µ Probability that the data X have indeed been generated from a probability density with parameters µ L(µ) = p(xjµ) Slide adapted from Bernt Schiele 47 Computation of the likelihood Single data point: p( n jµ) Assumption: all data points X = f 1; : : : ; ng are independent NY L(µ) = p(xjµ) = p( n jµ) Log-likelihood NX E(µ) = ln L(µ) = ln p( n jµ) n=1 Estimation of the parameters µ (Learning) Maimize the likelihood (=minimize the negative log-likelihood) n=1 Take the derivative and set it to E(µ) = p( njµ) p( n jµ)! =

9 Recap: Maimum Likelihood Limitations Recap: Deeper Reason Maimum Likelihood has several significant limitations It systematically underestimates the variance of the distribution! E.g. consider the case N = 1; X = f 1 g Maimum Likelihood is a Frequentist concept In the Frequentist view, probabilities are the frequencies of random, repeatable events. These frequencies are fied, but can be estimated more precisely when more data is available. Maimum-likelihood estimate: We say ML overfits to the observed data. ^¾ = 0! We will still often use ML, but it is important to know about this effect. ^¹ This is in contrast to the Bayesian interpretation In the Bayesian view, probabilities quantify the uncertainty about certain states or events. This uncertainty can be revised in the light of new evidence. Bayesians and Frequentists do not like each other too well Slide adapted from Bernt Schiele Recap: Bayesian Learning Approach Bayesian view: Consider the parameter vector µ as a random variable. When estimating the parameters, what we compute is Z p(jx) = p(; µjx)dµ Assumption: given µ, this doesn t depend on X anymore Z p(jx) = Slide adapted from Bernt Schiele p(; µjx) = p(jµ; X)p(µjX) p(jµ)p(µjx)dµ This is entirely determined by the parameter µ (i.e. by the parametric form of the pdf). 51 Recap: Bayesian Learning Approach Discussion Estimate for based on parametric form µ Likelihood of the parametric form µ given the data set X. Z p(jµ)l(µ)p(µ) p(jx) = R dµ L(µ)p(µ)dµ Normalization: integrate over all possible values of µ Prior for the parameters µ The more uncertain we are about µ, the more we average over all possible parameter values. 52 Topics of This Lecture Regression: Motivation Polynomial fitting General Least-Squares Regression Overfitting problem Regularization Ridge Regression Recap: Important Concepts from ML Lecture Probability Theory Bayes Decision Theory Maimum Likelihood Estimation Bayesian Estimation Net lecture A Probabilistic View on Regression Least-Squares Estimation as Maimum Likelihood

10 References and Further Reading More information, including a short review of Probability theory and a good introduction in Bayes Decision Theory can be found in Chapters 1.1, 1.2 and 1.5 of Christopher M. Bishop Pattern Recognition and Machine Learning Springer,

Machine Learning Lecture 2

Machine Learning Lecture 2 Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on

More information

Machine Learning Lecture 2

Machine Learning Lecture 2 Announcements Machine Learning Lecture 2 Eceptional number of lecture participants this year Current count: 449 participants This is very nice, but it stretches our resources to their limits Probability

More information

Machine Learning Lecture 1

Machine Learning Lecture 1 Many slides adapted from B. Schiele Machine Learning Lecture 1 Introduction 18.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Organization Lecturer Prof.

More information

Machine Learning Lecture 2

Machine Learning Lecture 2 Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Announcements Machine Learning Lecture 3 Eam dates We re in the process of fiing the first eam date Probability Density Estimation II 9.0.207 Eercises The first eercise sheet is available on L2P now First

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Lecture 12. Talk Announcement. Neural Networks. This Lecture: Advanced Machine Learning. Recap: Generalized Linear Discriminants

Lecture 12. Talk Announcement. Neural Networks. This Lecture: Advanced Machine Learning. Recap: Generalized Linear Discriminants Advanced Machine Learning Lecture 2 Neural Networks 24..206 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Talk Announcement Yann LeCun (NYU & FaceBook AI) 28..

More information

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 12 Neural Networks 24.11.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Talk Announcement Yann LeCun (NYU & FaceBook AI)

More information

CS 540: Machine Learning Lecture 1: Introduction

CS 540: Machine Learning Lecture 1: Introduction CS 540: Machine Learning Lecture 1: Introduction AD January 2008 AD () January 2008 1 / 41 Acknowledgments Thanks to Nando de Freitas Kevin Murphy AD () January 2008 2 / 41 Administrivia & Announcement

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machine learning Mid-term eam October 8, 6 ( points) Your name and MIT ID: .5.5 y.5 y.5 a).5.5 b).5.5.5.5 y.5 y.5 c).5.5 d).5.5 Figure : Plots of linear regression results with different types of

More information

Mathematical Formulation of Our Example

Mathematical Formulation of Our Example Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot

More information

COURSE INTRODUCTION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

COURSE INTRODUCTION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception COURSE INTRODUCTION COMPUTATIONAL MODELING OF VISUAL PERCEPTION 2 The goal of this course is to provide a framework and computational tools for modeling visual inference, motivated by interesting examples

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Machine Learning (CS 567) Lecture 2

Machine Learning (CS 567) Lecture 2 Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

Machine Learning Lecture 10

Machine Learning Lecture 10 Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes

More information

Intelligent Systems I

Intelligent Systems I Intelligent Systems I 00 INTRODUCTION Stefan Harmeling & Philipp Hennig 24. October 2013 Max Planck Institute for Intelligent Systems Dptmt. of Empirical Inference Which Card? Opening Experiment Which

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Machine Learning CSE546

Machine Learning CSE546 http://www.cs.washington.edu/education/courses/cse546/17au/ Machine Learning CSE546 Kevin Jamieson University of Washington September 28, 2017 1 You may also like ML uses past data to make personalized

More information

Machine Learning Lecture 14

Machine Learning Lecture 14 Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de

More information

Machine Learning Lecture 12

Machine Learning Lecture 12 Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Probabilistic Graphical Models for Image Analysis - Lecture 1

Probabilistic Graphical Models for Image Analysis - Lecture 1 Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.

More information

Computer Vision Group Prof. Daniel Cremers. 3. Regression

Computer Vision Group Prof. Daniel Cremers. 3. Regression Prof. Daniel Cremers 3. Regression Categories of Learning (Rep.) Learnin g Unsupervise d Learning Clustering, density estimation Supervised Learning learning from a training data set, inference on the

More information

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as:

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as: CS 1: Machine Learning Spring 15 College of Computer and Information Science Northeastern University Lecture 3 February, 3 Instructor: Bilal Ahmed Scribe: Bilal Ahmed & Virgil Pavlu 1 Introduction Ridge

More information

Learning from Data. Amos Storkey, School of Informatics. Semester 1. amos/lfd/

Learning from Data. Amos Storkey, School of Informatics. Semester 1.   amos/lfd/ Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Introduction Welcome Administration Online notes Books: See website Assignments Tutorials Exams Acknowledgement: I would like to that David Barber and Chris

More information

Announcements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning

Announcements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning CS 188: Artificial Intelligence Spring 21 Lecture 22: Nearest Neighbors, Kernels 4/18/211 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!) Remaining

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

More information

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail

More information

Lecture 3: Pattern Classification. Pattern classification

Lecture 3: Pattern Classification. Pattern classification EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

3.2 Logarithmic Functions and Their Graphs

3.2 Logarithmic Functions and Their Graphs 96 Chapter 3 Eponential and Logarithmic Functions 3.2 Logarithmic Functions and Their Graphs Logarithmic Functions In Section.6, you studied the concept of an inverse function. There, you learned that

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 Discriminative vs Generative Models Discriminative: Just learn a decision boundary between your

More information

Bayesian Inference of Noise Levels in Regression

Bayesian Inference of Noise Levels in Regression Bayesian Inference of Noise Levels in Regression Christopher M. Bishop Microsoft Research, 7 J. J. Thomson Avenue, Cambridge, CB FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January, 1 Stat 406: Algorithms for classification and prediction Lecture 1: Introduction Kevin Murphy Mon 7 January, 2008 1 1 Slides last updated on January 7, 2008 Outline 2 Administrivia Some basic definitions.

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Lecture 2: Linear regression

Lecture 2: Linear regression Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued

More information

Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression. COMP 527 Danushka Bollegala Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

More information

An Introduction to Statistical and Probabilistic Linear Models

An Introduction to Statistical and Probabilistic Linear Models An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics)

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics) COMS 4771 Lecture 1 1. Course overview 2. Maximum likelihood estimation (review of some statistics) 1 / 24 Administrivia This course Topics http://www.satyenkale.com/coms4771/ 1. Supervised learning Core

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning

Bayesian Approach 2. CSC412 Probabilistic Learning & Reasoning CSC412 Probabilistic Learning & Reasoning Lecture 12: Bayesian Parameter Estimation February 27, 2006 Sam Roweis Bayesian Approach 2 The Bayesian programme (after Rev. Thomas Bayes) treats all unnown quantities

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

Machine Learning Basics III

Machine Learning Basics III Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

CSC321 Lecture 2: Linear Regression

CSC321 Lecture 2: Linear Regression CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,

More information

CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring Final Exam CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

More information

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a

More information

Week 3: Linear Regression

Week 3: Linear Regression Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to

More information

Learning from Data: Regression

Learning from Data: Regression November 3, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Classification or Regression? Classification: want to learn a discrete target variable. Regression: want to learn a continuous target variable. Linear

More information

Overview. IAML: Linear Regression. Examples of regression problems. The Regression Problem

Overview. IAML: Linear Regression. Examples of regression problems. The Regression Problem 3 / 38 Overview 4 / 38 IAML: Linear Regression Nigel Goddard School of Informatics Semester The linear model Fitting the linear model to data Probabilistic interpretation of the error function Eamples

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/22/2010 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements W7 due tonight [this is your last written for

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Where now? Machine Learning and Bayesian Inference

Where now? Machine Learning and Bayesian Inference Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone etension 67 Email: sbh@clcamacuk wwwclcamacuk/ sbh/ Where now? There are some simple take-home messages from

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

Regression with Input-Dependent Noise: A Bayesian Treatment

Regression with Input-Dependent Noise: A Bayesian Treatment Regression with Input-Dependent oise: A Bayesian Treatment Christopher M. Bishop C.M.BishopGaston.ac.uk Cazhaow S. Qazaz qazazcsgaston.ac.uk eural Computing Research Group Aston University, Birmingham,

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Unsupervised Learning

Unsupervised Learning CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

Machine Learning Lecture 14

Machine Learning Lecture 14 Machine Learning Lecture 14 Tricks of the Trade 07.12.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Introduction to Bayesian Learning

Introduction to Bayesian Learning Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

More information