Stat 406: Algorithms for classification and prediction
Lecture 1: Introduction
Kevin Murphy
Mon 7 January, 2008
(Slides last updated on January 7, 2008)

Outline
- Administrivia
- Some basic definitions.
- Simple examples of regression.
- Real-world applications of regression.
- Simple examples of classification.
- Real-world applications of classification.

Administrivia
- Web page: http://www.cs.ubc.ca/~murphyk/teaching/stat406/Spring08/index.html
- Check the news section before every class!
- Optional lab: Wed 4-5, LSK 310. The TA is Aline Tabet.
- My office hours are Fri 4-5pm, LSK 308d. Please send me email ahead of time if you plan to show up!

Grading
- There will be weekly homework assignments, worth 20%. Out on Mondays, returned on Mondays (in class). The homeworks will involve theory and programming; you can get help during the lab session.
- The midterm will be Feb 25th and is worth 35%.
- The final will be in April (15-29) and is worth 45%.

Pre-requisites
- Math: multivariate calculus, linear algebra, probability theory.
- Stats: Stat 306 or CS 340 or equivalent.
- CS: some experience with programming (e.g., in R) is required.

Matlab
- There will be weekly programming assignments. We will use Matlab.
- Matlab is very similar to R, but is somewhat faster.
- Matlab is widely used in the machine learning and Bayesian statistics communities.
- Unfortunately, Matlab is not free (unlike R). You can buy a copy from the bookstore for $150, or you can use the copy installed on the lab machines.
- In the first lab (this Wed), Aline will give an introduction to Matlab. More info on the class web page.

Textbook
- I am writing my own textbook, but it is not yet finished. You should buy a photocopy of the current draft (433 pages) at Copiesmart in the village (near McDonald's) for $30 (available on Wednesday).
- The following books are recommended additional reading, but not required:
  - Pattern Recognition and Machine Learning, Chris Bishop, 2006.
  - The Elements of Statistical Learning, Hastie, Tibshirani and Friedman, 2001.

Learning objectives
- Understand basic principles of machine learning and its connections to other fields.
- Derive, in a precise and concise fashion, the relevant mathematical equations needed for familiar and novel models/algorithms.
- Implement, in reasonably efficient Matlab, various familiar and novel ML models/algorithms.
- Know how to choose an appropriate method and apply it to various kinds of data/problem domains.

Syllabus
- We will closely follow my book.
- Since people have different backgrounds (CS 340, Stat 306, multiple versions), the exact syllabus may change as we go. See the web page for details.
- You will get a good feeling for the class during today's lecture.

Outline
- Administrivia
- Machine learning: some basic definitions.
- Simple examples of regression.
- Real-world applications of regression.
- Simple examples of classification.
- Real-world applications of classification.

Learning to predict
- This class is basically about machine learning. We will initially focus on supervised approaches.
- Given a training set of $n$ input-output pairs $D = \{(x_i, y_i)\}_{i=1}^{n}$, we attempt to construct a function $f$ which will accurately predict $f(x_*)$ on future (test) examples $x_*$.
- Each input $x_i$ is a vector of $d$ features or covariates. Each output $y_i$ is a target variable.
- The training data is stored in an $n \times d$ design matrix $X = [x_i^T]$. The training outputs are stored in an $n \times q$ matrix $Y = [y_i^T]$.

Classification vs regression
- If $y \in \mathbb{R}^q$ is a continuous-valued output, this is called regression. Often we will assume $q = 1$, i.e., scalar output.
- If $y \in \{1, \ldots, C\}$ is a discrete label, this is called classification or pattern recognition. The labels can be ordered (e.g., low, medium, high) or unordered (e.g., male, female). $C$ is the number of classes.
- If $C = 2$, this is called binary (dichotomous) classification.

Short/fat vs tall/skinny data
- In traditional applications, the design matrix is tall and skinny ($n \gg p$), i.e., there are many more training examples than inputs.
- In more recent applications (e.g., bio-informatics or text analysis), the design matrix is short and fat ($n \ll p$), so we will need to perform feature selection and/or dimensionality reduction.

Generalization performance
We care about performance on examples that are different from the training examples (so we can't just look up the answer).

No free lunch theorem
The no free lunch theorem says (roughly) that there is no single method that is better at predicting across all possible data sets than any other method. Different learning algorithms implicitly make different assumptions about the nature of the data, and if they work well, it is because the assumptions are reasonable in a particular domain.

Supervised vs unsupervised learning
- In supervised learning, we are given $(x_i, y_i)$ pairs and try to learn how to predict $y$ given $x$.
- In unsupervised learning, we are just given $x_i$ vectors. The goal in unsupervised learning is to learn a model that explains the data well. There are two main kinds (a sketch of the first follows below):
  - Dimensionality reduction (e.g., PCA)
  - Clustering (e.g., K-means)
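As a concrete illustration of dimensionality reduction, here is a minimal Matlab sketch (my own illustration on made-up data, not course code) of PCA computed via the SVD:

    % PCA via the SVD: project n examples in d dimensions down to 2.
    X  = randn(100, 5);                  % made-up n-by-d data, rows are examples
    mu = mean(X, 1);                     % per-feature means
    Xc = X - repmat(mu, size(X,1), 1);   % center each feature
    [U, S, V] = svd(Xc, 'econ');         % columns of V are the principal directions
    Z = Xc * V(:, 1:2);                  % n-by-2 low-dimensional representation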

Outline
- Administrivia
- Machine learning: some basic definitions.
- Simple examples of regression.
- Real-world applications of regression.
- Simple examples of classification.
- Real-world applications of classification.

Linear regression
The output density is a 1D Gaussian (Normal) conditional on $x$:
$$p(y \mid x) = \mathcal{N}(y; \beta^T x, \sigma) = \mathcal{N}(y; \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \sigma)$$
$$\mathcal{N}(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(y - \mu)^2\right)$$
For example, $y = a x_1 + b$ is represented as $x = (1, x_1)$ and $\beta = (b, a)$.
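To make this concrete, here is a minimal Matlab sketch (my own, on made-up data) of fitting $\beta$ by least squares; the backslash operator solves the linear system:

    % Fit y = a*x1 + b by least squares on synthetic data.
    n  = 20;
    x1 = linspace(0, 1, n)';           % raw inputs
    X  = [ones(n, 1) x1];              % design matrix with dummy column of 1s
    y  = 1 + 2*x1 + 0.1*randn(n, 1);   % noisy targets with b = 1, a = 2
    beta = X \ y;                      % beta(1) estimates b, beta(2) estimates a
    yhat = X * beta;                   % fitted values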

Polynomial regression
If we use linear regression with non-linear basis functions,
$$p(y \mid x_1) = \mathcal{N}(y \mid \beta^T [1, x_1, x_1^2, \ldots, x_1^k], \sigma)$$
we can produce curves like the one below. Note: in this class, we will often use $w$ instead of $\beta$ to denote the weight vector.
[Figure: a polynomial curve fit to data, $t$ vs. $x$]
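The same least-squares machinery handles the polynomial case. A sketch (again my own toy data) using the basis $[1, x_1, x_1^2, \ldots, x_1^k]$:

    % Polynomial regression as linear regression with basis functions.
    k   = 3;
    x1  = linspace(-1, 1, 30)';
    y   = sin(pi*x1) + 0.1*randn(30, 1);   % synthetic curve-like targets
    Phi = ones(30, 1);
    for j = 1:k
        Phi = [Phi x1.^j];                 % append basis function x1^j
    end
    w    = Phi \ y;                        % least-squares weights (w, not beta)
    yhat = Phi * w;                        % fitted curve values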

Piecewise linear regression
- How many pieces? Model selection problem.
- Where to put them? Segmentation problem.

2D linear regression

Piecewise linear 2D regression
- How many pieces? Model selection problem.
- Where to put them? Segmentation problem.

Outline
- Administrivia
- Machine learning: some basic definitions.
- Simple examples of regression.
- Real-world applications of regression.
- Simple examples of classification.
- Real-world applications of classification.

Real-world applications of regression
- x = amounts of various chemicals in my factory; y = amount of product produced.
- x = properties of a house (e.g., location, size); y = sales price.
- x = joint angles of my robot arm; y = location of the arm in 3-space.
- x = stock prices today; y = stock prices tomorrow. (Time series data is not iid, and is beyond the scope of this course.)

Collaborative filtering
- A very interesting ordinal regression problem is to build a system that can predict what rating (from 1 to 5) you would give to a new movie.
- The input might just be the name of the movie, plus your past voting patterns and those of other users.
- The collaborative filtering approach says you will give the same score as people who have similar movie tastes to you, which you can infer by looking at past voting patterns.
- For each movie and each user, you can infer a set of latent traits and use these to predict (related to the SVD of a matrix).
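To illustrate the SVD connection, here is a toy Matlab sketch (my own; real rating matrices are sparse with missing entries, which this complete-matrix toy ignores) of predicting ratings from a rank-k reconstruction:

    % Latent traits via truncated SVD of a tiny, made-up rating matrix.
    R = [5 4 1; 4 5 2; 1 2 5; 2 1 4];          % rows = users, cols = movies
    [U, S, V] = svd(R);
    k = 2;                                     % number of latent traits
    Rhat = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';  % rank-k reconstruction = predicted ratings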

Netflix prize
- The Netflix prize (http://netflixprize.com/) is an award of $1M USD for a system that can predict your movie preferences 10% more accurately than their current system (called Cinematch).
- A large training data set is provided: a sparse 18k × 480k (movies × users) matrix containing about 100M ratings (on a scale of 1 to 5) of various movies.
- The test (probe) set is 2.8M (movie, user) pairs, for which the rating is known but withheld from the training set.
- The performance measure is root mean square error:
$$\mathrm{rmse} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( R(u_i, m_i) - \hat{R}(u_i, m_i) \right)^2}$$
where $R(u_i, m_i)$ is the true rating of user $u_i$ on movie $m_i$, and $\hat{R}(u_i, m_i)$ is the prediction.
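Computing this error measure in Matlab takes one line, given vectors of true and predicted ratings over the probe pairs (the values below are made up):

    r    = [4; 3; 5; 1];               % true ratings R(u_i, m_i)
    rhat = [3.8; 2.5; 4.9; 1.7];       % predicted ratings
    rmse = sqrt(mean((r - rhat).^2));  % root mean square error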

Outline
- Administrivia
- Machine learning: some basic definitions.
- Simple examples of regression.
- Real-world applications of regression.
- Simple examples of classification.
- Real-world applications of classification.

Linearly separable 2D data
- 2D inputs $x_i \in \mathbb{R}^2$, binary outputs $y \in \{0, 1\}$.
- The line is called a decision boundary. Points to the right are classified as $y = 1$, points to the left as $y = 0$.

Logistic regression
A simple approach to binary classification is logistic regression (briefly studied in 306). The output density is Bernoulli conditional on $x$:
$$p(y \mid x) = \pi(x)^y (1 - \pi(x))^{1-y}$$
where $y \in \{0, 1\}$ and $\pi(x) = \sigma(w^T [1, x_1, x_2])$, where
$$\sigma(u) = \frac{1}{1 + e^{-u}}$$
is the sigmoid (logistic) function that maps $\mathbb{R}$ to $[0, 1]$. Hence
$$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_2)}}$$
where $w_0$ is the bias (offset) term corresponding to the dummy column of 1s added to the design matrix.
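Here is a sketch of the prediction rule in Matlab (the weights are made up for illustration; how to fit w from data comes later in the course):

    % Logistic regression predictions with given (made-up) weights.
    sigma = @(u) 1 ./ (1 + exp(-u));   % the sigmoid function
    w = [-1; 2; 2];                    % [w0; w1; w2], illustrative values
    X = [ones(5, 1) rand(5, 2)];       % design matrix with dummy 1s column
    p = sigma(X * w);                  % P(Y = 1 | x) for each example
    yhat = p > 0.5;                    % classify by thresholding at 0.5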

2D logistic regression

Non-linearly separable 2D data
- In 306, this is called checkerboard data. In machine learning, this is called the xor problem.
- The true function is $y = x_1 \oplus x_2$ (exclusive or).
- The decision boundary is non-linear.

Logistic regression with quadratic features
We can separate the classes using
$$P(Y = 1 \mid x_1, x_2) = \sigma(w^T [1, x_1, x_2, x_1^2, x_2^2, x_1 x_2])$$
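A quick Matlab sketch of this expansion on the four xor points (the weights below are hand-picked for illustration, not fitted; the cross term $x_1 x_2$ is what makes the classes separable):

    % Quadratic feature expansion for the xor data.
    x   = [0 0; 0 1; 1 0; 1 1];
    y   = xor(x(:,1), x(:,2));                 % the xor labels
    Phi = [ones(4,1) x(:,1) x(:,2) x(:,1).^2 x(:,2).^2 x(:,1).*x(:,2)];
    w   = [-0.5; 1; 1; 0; 0; -2];              % hand-picked weights
    p   = 1 ./ (1 + exp(-Phi * w));            % P(Y=1|x); p > 0.5 exactly on the xor-positive points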

Outline
- Administrivia
- Machine learning: some basic definitions.
- Simple examples of regression.
- Real-world applications of regression.
- Simple examples of classification.
- Real-world applications of classification.

Handwritten digit recognition
Multi-class classification.

Gene microarray expression data
- Rows = examples, columns = features (genes).
- Short, fat data ($p \gg n$). Might need to perform feature selection.

Other examples of classification
- Email spam filtering (spam vs. not spam)
- Detecting credit card fraud (fraudulent or legitimate)
- Face detection in images (face or background)
- Web page classification (sports vs. politics vs. entertainment, etc.)
- Steering an autonomous car across the US (turn left, turn right, or go straight)