Last Lecture Recap. UVA CS / Introduction to Machine Learning and Data Mining. Lecture 3: Linear Regression

UVA CS 4501-001 / 6501-007 Introduction to Machine Learning and Data Mining. Lecture 3: Linear Regression. Yanjun Qi / Jane, University of Virginia, Department of Computer Science.

Last Lecture Recap:
- Data Representation
- Linear Algebra Review

e.g. SUPERVISED LEARNING: find a function f to map input space X to output space Y, so that the difference between y and f(x) for each example x is small.

A Dataset:
- Data / points / instances / examples / samples / records: [rows]
- Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
- Target / outcome / response / label / dependent variable: the special column to be predicted [last column]

Main Types of Columns:
- Continuous: a real number, for example, age or height
- Discrete: a symbol, like "Good" or "Bad"

e.g. SUPERVISED LEARNING: the training dataset consists of input-output pairs. Here the target Y is a discrete variable, and we learn f(x?) to predict the label of an unseen example x?.

MATRIX OPERATIONS: 1) Transposition 2) Addition and Subtraction 3) Multiplication 4) Norm (of a vector) 5) Matrix Inversion 6) Matrix Rank 7) Matrix Calculus

Today:
- Linear regression (aka least squares)
- Learn to derive the least squares estimate by optimization
- Evaluation with cross-validation

For Example: machine learning for apartment hunting. Now you've moved to Charlottesville!! And you want to find the most reasonably priced apartment satisfying your needs: square-ft., # of bedrooms, distance to campus.

Living area (ft^2)   # bedroom   Rent ($)
230                  1           600
506                  2           1000
433                  2           1100
109                  1           500
150                  1           ?
270                  1.5         ?

Linear SUPERVISED LEARNING, e.g. $\hat{y} = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$. Features: living area, distance to campus, # bedroom. Target: rent.

Remember this: Linear? $Y = mX + B$? A slope of 2 (i.e. m = 2) means that every 1-unit change in X yields a 2-unit change in Y. [Figure: a line Y = mX + B, with slope m and intercept B labeled.]
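To make the hypothesis form above concrete, here is a minimal Python sketch; the parameter values are made up for illustration, not fitted or taken from the slides:

```python
import numpy as np

# Hypothetical parameter values, only to make the formula concrete:
# theta_0 is the intercept, theta_1 weights living area, theta_2
# weights the number of bedrooms.
theta = np.array([100.0, 1.5, 50.0])

def predict(living_area, bedrooms):
    """Linear hypothesis: y_hat = theta_0 + theta_1*x_1 + theta_2*x_2."""
    return theta[0] + theta[1] * living_area + theta[2] * bedrooms

print(predict(230, 1))  # 100 + 1.5*230 + 50*1 = 495.0
```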

[Figures: rent vs. living area, and rent vs. location and living area.]

A new representation: assume that each sample x is a column vector. Here we assume a vacuous "feature" $x_0 = 1$ (this is the intercept term), and define the feature vector to be $x^T = [x_0, x_1, x_2, \dots, x_{p-1}]$. The parameter vector $\theta = [\theta_0, \theta_1, \dots, \theta_{p-1}]^T$ is also a column vector, and the hypothesis becomes $\hat{y} = f(x) = x^T \theta$.

Training / learning problem. We can represent the whole training set as
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} = \begin{bmatrix} x_1^0 & x_1^1 & \cdots & x_1^{p-1} \\ x_2^0 & x_2^1 & \cdots & x_2^{p-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_n^0 & x_n^1 & \cdots & x_n^{p-1} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
The predicted output for each training sample is
$$\begin{bmatrix} f(x_1^T) \\ f(x_2^T) \\ \vdots \\ f(x_n^T) \end{bmatrix} = \begin{bmatrix} x_1^T \theta \\ x_2^T \theta \\ \vdots \\ x_n^T \theta \end{bmatrix} = X\theta.$$

Training / learning goal. Using matrix form, we get the following general representation of the linear function: $\hat{Y} = X\theta$. Our goal is to pick the optimal $\theta$ that minimizes the following cost function:
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( f(x_i) - y_i \right)^2.$$
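A minimal NumPy sketch of this matrix formulation, using the four labeled apartments from the earlier slide (the variable names are mine, not from the lecture):

```python
import numpy as np

# Labeled apartment data from the slides: living area, # bedrooms -> rent.
A = np.array([[230, 1], [506, 2], [433, 2], [109, 1]], dtype=float)
y = np.array([600, 1000, 1100, 500], dtype=float)

# Prepend the vacuous feature x_0 = 1 so theta_0 acts as the intercept,
# giving the design matrix X with one row x_i^T per training sample.
X = np.hstack([np.ones((A.shape[0], 1)), A])

def cost(theta):
    """J(theta) = 1/2 * sum_i (x_i^T theta - y_i)^2, computed in
    matrix form as 1/2 * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return 0.5 * (r @ r)

print(cost(np.zeros(3)))  # cost of the all-zero parameter vector
```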

Today:
- Linear regression (aka least squares)
- Learn to derive the least squares estimate by optimization
- Evaluation with cross-validation

Method I: normal equations. Write the cost function in matrix form:
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} (x_i^T \theta - y_i)^2 = \frac{1}{2} (X\theta - y)^T (X\theta - y) = \frac{1}{2} \left( \theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y \right).$$
To minimize $J(\theta)$, take the derivative and set it to zero:
$$X^T X \theta^* = X^T y \quad \text{(the normal equations)} \qquad \Rightarrow \qquad \theta^* = (X^T X)^{-1} X^T y,$$
where $X$ stacks the sample vectors $x_i^T$ as rows and $Y = [y_1, \dots, y_n]^T$ as before.
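A minimal sketch of solving the normal equations on the apartment data (a hedged illustration, assuming the same X and y as in the previous sketch):

```python
import numpy as np

# Same design matrix and targets as in the previous sketch.
A = np.array([[230, 1], [506, 2], [433, 2], [109, 1]], dtype=float)
y = np.array([600, 1000, 1100, 500], dtype=float)
X = np.hstack([np.ones((A.shape[0], 1)), A])

# Normal equations: solve X^T X theta = X^T y directly rather than
# forming the inverse (X^T X)^{-1} explicitly, which is numerically safer.
theta_star = np.linalg.solve(X.T @ X, X.T @ y)

# Predict rent for the two unlabeled apartments from the slides.
X_new = np.array([[1.0, 150, 1.0], [1.0, 270, 1.5]])
print(theta_star)
print(X_new @ theta_star)
```

Note that if X has less than full column rank, X^T X is singular and np.linalg.solve fails; np.linalg.lstsq(X, y, rcond=None), which works through the pseudo-inverse, still returns a minimum-norm solution, and regularization (covered later in the course) is another remedy.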

Review: Special Uses for Matrix Multiplication.

Dot (or inner) product of two vectors: $\langle x, y \rangle$ is the sum of products of elements in corresponding positions of the two vectors, i.e. $\langle x, y \rangle = \sum_i x_i y_i$, and $\langle x, y \rangle = \langle y, x \rangle$.

Sum the squared elements of a vector: premultiply a column vector $a$ by its transpose. If $a = [5, 2, 8]^T$, then premultiplication by the row vector $a^T = [5 \;\; 2 \;\; 8]$ yields the sum of the squared values of the elements of $a$:
$$a^T a = 5^2 + 2^2 + 8^2 = 93.$$
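A two-line NumPy check of the worked example above:

```python
import numpy as np

a = np.array([5, 2, 8])
print(a @ a)         # inner product <a, a> = 5^2 + 2^2 + 8^2 = 93
print(np.dot(a, a))  # equivalent spelling of the same dot product
```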

Review: Matrix Calculus: types of matrix derivatives. See Thomas Minka, "Old and New Matrix Algebra Useful for Statistics."

Comments on the normal equation:
- In most situations of practical interest, the number of data points N is larger than the dimensionality p of the input space, and the matrix X is of full column rank. If this condition holds, then it is easy to verify that $X^T X$ is necessarily invertible.
- The assumption that $X^T X$ is invertible implies that it is positive definite, thus the critical point we have found is a minimum.
- What if X has less than full column rank? → regularization (later).

Today:
- Linear regression (aka least squares)
- Learn to derive the least squares estimate by optimization
- Evaluation with train/test OR k-folds cross-validation

TYPICAL MACHINE LEARNING SYSTEM: low-level sensing → preprocessing → feature extraction → feature selection → inference, prediction, recognition, with label collection and evaluation alongside.

Evaluation Choice-I: Train and Test. The training dataset consists of input-output pairs; evaluation computes f(x?) on held-out data and measures the loss on the pair (f(x?), y?).

Evaluation Choice-I: e.g. for supervised classification.
- Training (learning): learn a model using the training data.
- Testing: test the model using unseen test data to assess the model accuracy (see the sketch after this slide):
$$\text{Accuracy} = \frac{\text{number of correct classifications}}{\text{total number of test cases}}$$

Evaluation Choice-II: Cross Validation.
Problem: we don't have enough data to set aside a test set.
Solution: each data point is used both as train and as test.
Common types:
- K-fold cross-validation (e.g. K=5, K=10)
- 2-fold cross-validation
- Leave-one-out cross-validation
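As flagged under Choice-I above, here is a minimal sketch of the accuracy computation on a held-out test set; the labels and predictions are made-up placeholders:

```python
import numpy as np

# Hypothetical test-set labels and model predictions, only to
# illustrate the accuracy formula from Choice-I.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])

# Accuracy = number of correct classifications / total test cases.
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 5 correct out of 6, about 0.833
```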

K-fold Cross Validation. Basic idea:
- Split the whole dataset into K pieces;
- Fit the model on K-1 pieces and test on the remaining one;
- Cycle through all K cases;
- K=10 folds is a common rule of thumb.
The advantage:
- all pieces are used for both training and validation;
- each observation is used for validation exactly once.

e.g. 10-fold Cross Validation: divide the data into 10 equal pieces P1..P10 and fit 10 models, each using 9 pieces as the training set and the remaining piece as the test set (model 1 tests on P10, model 2 on P9, ..., model 10 on P1, so the "test" cells form a diagonal). Collect the scores from that diagonal.
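A minimal NumPy sketch of this loop for a least-squares model; the function name k_fold_mse and the use of mean squared error as the per-fold score are my illustrative assumptions, not from the slides:

```python
import numpy as np

def k_fold_mse(X, y, k=10, seed=0):
    """Average held-out mean squared error over k folds, fitting an
    ordinary least squares model on the other k-1 pieces each time."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))   # shuffle, then split into k pieces
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        scores.append(np.mean((X[test] @ theta - y[test]) ** 2))
    return np.mean(scores)          # each observation validated exactly once
```

With k equal to the number of observations, this same loop reduces to leave-one-out cross-validation.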

e.g. 5-fold Cross Validation. [Figure: the same train/test fold layout, shown for 5 folds.]

Today Recap:
- Linear regression (aka least squares)
- Learn to derive the least squares estimate by optimization
- Evaluation with train/test OR k-folds cross-validation

References. Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides.
- http://www.cs.cmu.edu/~zkolter/course/15-884/linalg-review.pdf (please read)
- http://www.cs.cmu.edu/~aarti/class/10701/recitation/linearalgebra_matlab_review.ppt
- Prof. Alexander Gray's slides