Foundation of Intelligent Systems, Part I. Regression

Size: px
Start display at page:

Download "Foundation of Intelligent Systems, Part I. Regression"

Transcription

1 Foundation of Intelligent Systems, Part I Regression mcuturi@i.kyoto-u.ac.jp FIS

2 Before starting Please take this survey before the end of this week. Here are a few books which you can check beyond the slides. Elements of Statistical Learning, Hastie Tibshirani Friedman Pattern Recognition, Theodoridis Koutroumbras Pattern Recognition & Machine Learning, Bishop You can also check Andrew Ng s video lectures (Stanford) FIS

3 Fundamentals in Regression Can be studied from different viewpoints: statistical, linear algebra, AI... etc.. Linear regression is currently revived by different ideas in sparsity Lasso (1996 ) SVM for regression (1996 ) Compressed Sensing (2002 ) FIS

4 One of the most standard data analysis tasks: Regression Data: many observations of the same data type We have a database {x 1,,x N }. Each datapoint x j can be encoded as a vector of features x j = x 1,j x 2,j. x d,j Each feature x i,j (1 i d) of a given point x j is a number. FIS

5 One of the most standard data analysis tasks: Regression This database can be seen as a R d N matrix {x 1,,x N } x 1,1 x 1,2 x 1,N x 2,1 x 2,2 x 2,N x 3,1 x 3,2 x 3,N... x d,1 x d,2 x d,n FIS

6 Examples Credit card holder x j = Patient x j = Blog x j = Income Age. Work history (months) Family # Credit Incidents height weight. # minutes exercise/week LDL cholesterol HDL cholesterol avg. pages view/month # posts. avg. # comments/month revenue from ads/month FIS

7 Within such variables... Some variables are very cheap to measure, others very expensive. Some variables might have a causal effect on other variables. In the regression setting, the d variables are split between k regressor (or predictor) variables d k response (or predicted) variables. FIS

8 Regression In the regression setting, the d variables are split between k regressor (or predictor) variables d k response (or predicted) variables. guess response(expensive) variables using regressor(cheap) ones. FIS

9 The Regression Problem Given, A database {x 1,,x N } X = x 1,1 x 1,2 x 1,N x 2,1 x 2,2 x 2,N x 3,1 x 3,2 x 3,N... x d,1 x d,2 x d,n FIS

10 The Regression Problem Given, A database {x 1,,x N } X = x 1,1 x 1,2 x 1,N x 2,1 x 2,2 x 2,N x 3,1 x 3,2 x 3,N... x d 1,1 x d 1,2 x d 1,N x d,1 x d,2 x d,n A set of k regressors variables Reg {1,,d} A set of d k response variable Res {1,,d} FIS

11 The Regression Problem Regression = build a function f : R k R d-k such that, x,f((x i ) i Reg ) (x k ) k Res. e.g. if d = 6, k = 4, Reg= {1,2,3,4}, Res= {5,6} we look for a function f : R 4 R 2, f(x 1,x 2,x 3,x 4 ) (x 5,x 6 ) FIS

12 Examples continued Credit card holder x j = Patient x j = Blog x j = Income Age. Work history (months) Family # Credit Incidents height weight. # minutes exercise/week LDL cholesterol HDL cholesterol avg. pages view/month # posts. avg. # comments/month revenue from ads/month FIS

13 Examples continued Credit card holder x j = Patient x j = Blog x j = Income Age. Work history (months) Family # Credit Incidents height weight. # minutes exercise/week LDL cholesterol HDL cholesterol avg. pages view/month # posts. avg. # comments/month revenue from ads/month FIS

14 In the following slides... We only consider tasks with one response variable All other variables are regressors. We rename the response variable y and reassign x 1,,x d for the regressors predicting more than one variable? heavier mathematically, but similar. FIS

15 In the following slides... We assume that y takes continuous values. When y takes discrete values, notably binary {0, 1} things change a bit. Yet... binary real : regression techniques work on discrete data but real binary... we ll discuss that later. FIS

16 Today s Example: Your apartment Collected information about 285 (out of 1226) apartments close to Kyoto U. Source: FIS

17 Today s Example: Your apartment Collected information about 285 (out of 1226) apartments close to Kyoto U. Kept 4 variables: Surface, Rent, Age of Building, Walking distance to station. FIS

18 What does the matrix look like? imagecs(h); colorbar; 285 columns, 4 lines. Each column represents one apartment. In these slides, we will regress the rent using age, surface and distance FIS

19 Regression: one variable vs. another FIS

20 Rent vs. Surface 14 Scatter plot of Rent vs. Surface apartment 12 Rent (x JPY) Surface Note that the dataset has been censored above JPY FIS

21 Rent vs. Surface 14 y = 0.13*x Scatter plot of Rent vs. Surface apartment linear Rent (x JPY) Surface Using the linear tool in curve fitting, we obtain the approximation y = 0.13x+2.4 FIS

22 Rent vs. Surface Scatter plot of Rent vs. Surface y = 0.13*x y = *x *x 0.71 y = 6.2e 05*x *x *x 3.1 y = 1.4e 06*x *x *x *x 1.2 y = 1.8e 07*x e 05*x *x *x 2 1.1*x apartment linear quadratic cubic 4th degree 5th degree Rent (x JPY) Surface We can use higher order polynomials... yet look at the results. FIS

23 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. FIS

24 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. Among them.. Linear min b,a 1 R N (y j (b + a 1 x j )) 2 FIS

25 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. Among them.. Linear min b,a 1 R N (y j (b + a 1 x j )) 2 Quadratic min b,a 1,a 2 R N ) 2 (y j (b + a 1 x j +a 2 x 2j ) FIS

26 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. Among them.. Linear min b,a 1 R N (y j (b + a 1 x j )) 2 Quadratic min b,a 1,a 2 R Cubic etc. min b,a 1,a 2,a 3 R N ) 2 (y j (b + a 1 x j +a 2 x 2j ) N ) 2 (y j (b + a 1 x j +a 2 x 2j +a 3x 3j ) FIS

27 How can we solve this? The linear case Let s take a look at the function (a,b) N (y j (b+ax j )) 2. Using the notations Rent Y = [ y 1 y 2 y N ] Surface X = [ x 1 x 2 x N ] Constant 1 N = [ ] we have that N (y j (ax j +b)) 2 = Y ax b1 N 2 FIS

28 Contour plot of (a,b) Y ax b1 N 2 Since we only handle 2 parameters, we can make a contour plot 3 norm(h(:,4) a H(:,2) b) b a This validates the equation y = 0.13x+2.4. How to get there? FIS

29 We define the function L as Some linear algebra L : (a,b) N (y j (b+ax j )) 2 The partial derivatives of L can be computed. N L a = 2 L b = 2 N (y j (b+ax j ))x j y j (b+ax j ) Any minimum (a,b ) of L must be a saddle point. FIS

30 Some linear algebra Namely, the partial derivatives of L at (a,b ) must be zero L ( a = 2 a x 2 j +b x j ) y j x j L ( b = 2 Nb y j +a ) x j Hence (a,b ) must satisfy the linear system 0 = a x 2 j +b x j y j x j 0 = Nb y j +a x j Namely, [ a b ] = [ ] x 2 1 [ ] j xj yj yj x j xj N ans = FIS

31 Rent vs. Surface Scatter plot of Rent vs. Surface y = 0.13*x y = *x *x 0.71 y = 6.2e 05*x *x *x 3.1 y = 1.4e 06*x *x *x *x 1.2 y = 1.8e 07*x e 05*x *x *x 2 1.1*x apartment linear quadratic cubic 4th degree 5th degree Rent (x JPY) Surface We understood how to get the linear curve. What about the quadratic? FIS

32 same idea... define What about the quadratic case? Quadratic min b,a 1,a 2 R N ) (y j (b + a 1 x j + a 2 x 2j ) L : (a 1,a 2,b) N ( yj ( b+a 1 x j +a 2 x 2 2 j)) look at the objective s derivatives... L a 2 = 2 L a 1 = 2 N N L b = 2 N ( yj ( b+a 1 x j +a 2 x 2 j)) x 2 j ( yj ( b+a 1 x j +a 2 x 2 )) j xj ( yj ( b+a 1 x j +a 2 x 2 j)) FIS

33 What about the quadratic case? We consider the equations that a saddle point must verify: 0 = 0 = N N 0 = ( yj ( b +a 1x j +a 2x 2 j)) x 2 j ( yj ( )) b +a 1x j +a 2x 2 j xj N ( yj ( b +a 1x j +a 2x 2 j)) a 2 a 1 b = x 4 j x 3 j x 2 j x 3 j x 2 j xj x 2 j xj N 1 yj x 2 j yj yj x j ans = FIS

34 Higher order polynomials Intuitively, for polynomial up to degree p we would have to Build the corresponding Toeplitz Matrix Build the corresponding vector with y and x combined at different exponents Solve the linear system Surprisingly Finding the best p th order polynomial with least-squares Solving a p dimensional linear system Not so surprising after all: Least-squares: objective of degree 2 in coefficients Minimum saddle point system of degree 1.. Least-squares has been chosen because it yields a linear system... FIS

35 The general case: one vs. all rest What about using all other variables? Rent Y = [ ] y 1 y 2 y N... All other variables X = x 1 x 2 x N... FIS

36 The general case We assume that we have d regressor variables, 1 response variable. Consider again the linear approach. We look for a function f of the form f(x) = α 0 +α 1 x 1 +α 2 x 2 + +α d x d. We want to determine d+1 weights, a constant α 0 1 i d,α i weights for each variable. Least squares: L(α 0,α 1,α 2,,α d ) = N ( ) 2 y j (α 0 +α 1 x 1,j +α 2 x 2,j + +α d x d,j ) FIS

37 The general case Notice that L(α 0,α 1,α 2,,α d ) N i=1 y i α 0 + α 1. α d T x i 2 = α 0. α d T X Y 2, where and X = x 1 x 2 x N Rd+1 N... Y = [ y 1 y N ] R N. We write α for the d+1 dimensional vector α 0. α d. FIS

38 Linear least squares Expanding this expression, L(α) = ( α T XX T α 2YX T α+ Y 2) Consider the gradient of that function L = 2XX T α 2XY T Hence this gradient is zero for α = (XX T ) 1 XY T XX T S n +, that is XX T is a positive (semi)definite matrix. This works if XX T R d+1 is invertible, that is XX T S n ++. FIS

39 Considering again rents vs the rest Getting the data again, adding a line of 1 s >> (X X )\(X Y ) ans = FIS

40 What went wrong? 14 Scatter plot of Rent vs. Surface apartment 12 Rent (x JPY) Surface rent= age surf dist JPY FIS

41 What happens if we remove outliers? (surf> 40) 9 Scatter plot of Rent vs. Surface 8 7 Rent (x JPY) Surface >> (X X )\(X Y ) ans = x age x surface x distance JPY Moral of the story: easy to draw wrong conclusions even with simple tools FIS

42 What else can go wrong? Next time... What happens when d n? (XX T ) is no longer invertible... high-dimensional data in genomics, images analysis (lots of features) What happens when (XX T ) is badly conditioned ( λ min(xx T ) λ max (XX T ) 0)? if λ min (XX T ) = 1e 10, λ max ( (XX T ) 1) = 1e10!! Very bad numerical stability of the solution... When d n, we might want to do variable selection, i.e.pick a subset d of the d variables which is relevant to predict y. i.e.favor vectors β such that β 0 = cardβ i 0 is small. FIS

Foundation of Intelligent Systems, Part I. Regression 2

Foundation of Intelligent Systems, Part I. Regression 2 Foundation of Intelligent Systems, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Some Words on the Survey Not enough answers to say anything meaningful! Try again: survey. FIS - 2013 2 Last

More information

Statistical Machine Learning, Part I. Regression 2

Statistical Machine Learning, Part I. Regression 2 Statistical Machine Learning, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp SML-2015 1 Last Week Regression: highlight a functional relationship between a predicted variable and predictors SML-2015 2 Last

More information

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell

More information

Linear Regression. S. Sumitra

Linear Regression. S. Sumitra Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D

More information

Kernel Methods. Foundations of Data Analysis. Torsten Möller. Möller/Mori 1

Kernel Methods. Foundations of Data Analysis. Torsten Möller. Möller/Mori 1 Kernel Methods Foundations of Data Analysis Torsten Möller Möller/Mori 1 Reading Chapter 6 of Pattern Recognition and Machine Learning by Bishop Chapter 12 of The Elements of Statistical Learning by Hastie,

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

(Kernels +) Support Vector Machines

(Kernels +) Support Vector Machines (Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning

More information

Polyhedral Computation. Linear Classifiers & the SVM

Polyhedral Computation. Linear Classifiers & the SVM Polyhedral Computation Linear Classifiers & the SVM mcuturi@i.kyoto-u.ac.jp Nov 26 2010 1 Statistical Inference Statistical: useful to study random systems... Mutations, environmental changes etc. life

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information

... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology

... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology ..... SPARROW SPARse approximation Weighted regression Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Université de Montréal March 12, 2012 SPARROW 1/47 .....

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Sample questions for Fundamentals of Machine Learning 2018

Sample questions for Fundamentals of Machine Learning 2018 Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure

More information

ML (cont.): SUPPORT VECTOR MACHINES

ML (cont.): SUPPORT VECTOR MACHINES ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version

More information

Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

Convex Optimization & Machine Learning. Introduction to Optimization

Convex Optimization & Machine Learning. Introduction to Optimization Convex Optimization & Machine Learning Introduction to Optimization mcuturi@i.kyoto-u.ac.jp CO&ML 1 Why do we need optimization in machine learning We want to find the best possible decision w.r.t. a problem

More information

VBM683 Machine Learning

VBM683 Machine Learning VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

SUPPORT VECTOR MACHINE

SUPPORT VECTOR MACHINE SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition

More information

Machine Learning Linear Models

Machine Learning Linear Models Machine Learning Linear Models Outline II - Linear Models 1. Linear Regression (a) Linear regression: History (b) Linear regression with Least Squares (c) Matrix representation and Normal Equation Method

More information

Lecture 10: Logistic Regression

Lecture 10: Logistic Regression BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 10: Logistic Regression Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline An

More information

Machine Learning. Classification, Discriminative learning. Marc Toussaint University of Stuttgart Summer 2015

Machine Learning. Classification, Discriminative learning. Marc Toussaint University of Stuttgart Summer 2015 Machine Learning Classification, Discriminative learning Structured output, structured input, discriminative function, joint input-output features, Likelihood Maximization, Logistic regression, binary

More information

Regression with Numerical Optimization. Logistic

Regression with Numerical Optimization. Logistic CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

Discriminative Learning and Big Data

Discriminative Learning and Big Data AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Modeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop

Modeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

Mustafa Jarrar. Machine Learning Linear Regression. Birzeit University. Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018

Mustafa Jarrar. Machine Learning Linear Regression. Birzeit University. Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018 Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018 Version 1 Machine Learning Linear Regression Mustafa Jarrar Birzeit University 1 Watch this lecture and download the slides Course

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations

More information

Factoring. Expressions and Operations Factoring Polynomials. c) factor polynomials completely in one or two variables.

Factoring. Expressions and Operations Factoring Polynomials. c) factor polynomials completely in one or two variables. Factoring Strand: Topic: Primary SOL: Related SOL: Expressions and Operations Factoring Polynomials AII.1 The student will AII.8 c) factor polynomials completely in one or two variables. Materials Finding

More information

ES-2 Lecture: More Least-squares Fitting. Spring 2017

ES-2 Lecture: More Least-squares Fitting. Spring 2017 ES-2 Lecture: More Least-squares Fitting Spring 2017 Outline Quick review of least-squares line fitting (also called `linear regression ) How can we find the best-fit line? (Brute-force method is not efficient)

More information

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We

More information

Decision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014

Decision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014 Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Fundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015

Fundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015 Fundamentals of Machine Learning Mohammad Emtiyaz Khan EPFL Aug 25, 25 Mohammad Emtiyaz Khan 24 Contents List of concepts 2 Course Goals 3 2 Regression 4 3 Model: Linear Regression 7 4 Cost Function: MSE

More information

Machine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 4: SVM I Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d. from distribution

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

Learning From Data: Modelling as an Optimisation Problem

Learning From Data: Modelling as an Optimisation Problem Learning From Data: Modelling as an Optimisation Problem Iman Shames April 2017 1 / 31 You should be able to... Identify and formulate a regression problem; Appreciate the utility of regularisation; Identify

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

5.3. Polynomials and Polynomial Functions

5.3. Polynomials and Polynomial Functions 5.3 Polynomials and Polynomial Functions Polynomial Vocabulary Term a number or a product of a number and variables raised to powers Coefficient numerical factor of a term Constant term which is only a

More information

Lecture 12. Talk Announcement. Neural Networks. This Lecture: Advanced Machine Learning. Recap: Generalized Linear Discriminants

Lecture 12. Talk Announcement. Neural Networks. This Lecture: Advanced Machine Learning. Recap: Generalized Linear Discriminants Advanced Machine Learning Lecture 2 Neural Networks 24..206 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Talk Announcement Yann LeCun (NYU & FaceBook AI) 28..

More information

DATA MINING LECTURE 9. Minimum Description Length Information Theory Co-Clustering

DATA MINING LECTURE 9. Minimum Description Length Information Theory Co-Clustering DATA MINING LECTURE 9 Minimum Description Length Information Theory Co-Clustering MINIMUM DESCRIPTION LENGTH Occam s razor Most data mining tasks can be described as creating a model for the data E.g.,

More information

COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection

COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:

More information

CS 540: Machine Learning Lecture 1: Introduction

CS 540: Machine Learning Lecture 1: Introduction CS 540: Machine Learning Lecture 1: Introduction AD January 2008 AD () January 2008 1 / 41 Acknowledgments Thanks to Nando de Freitas Kevin Murphy AD () January 2008 2 / 41 Administrivia & Announcement

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Machine Learning - MT Linear Regression

Machine Learning - MT Linear Regression Machine Learning - MT 2016 2. Linear Regression Varun Kanade University of Oxford October 12, 2016 Announcements All students eligible to take the course for credit can sign-up for classes and practicals

More information

Fundamentals of Machine Learning (Part I)

Fundamentals of Machine Learning (Part I) Fundamentals of Machine Learning (Part I) Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp April 12, 2018 Mohammad Emtiyaz Khan 2018 1 Goals Understand (some) fundamentals

More information

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

Archdiocese of Washington Catholic Schools Academic Standards Mathematics

Archdiocese of Washington Catholic Schools Academic Standards Mathematics ALGEBRA 1 Standard 1 Operations with Real Numbers Students simplify and compare expressions. They use rational exponents, and simplify square roots. A1.1.1 A1.1.2 A1.1.3 A1.1.4 A1.1.5 Compare real number

More information

Randomness. What next?

Randomness. What next? Randomness What next? Random Walk Attribution These slides were prepared for the New Jersey Governor s School course The Math Behind the Machine taught in the summer of 2012 by Grant Schoenebeck Large

More information

Machine learning - HT Basis Expansion, Regularization, Validation

Machine learning - HT Basis Expansion, Regularization, Validation Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding

More information

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen

Lecture 12. Neural Networks Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers

10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How

More information

A Level Maths summer preparation work

A Level Maths summer preparation work A Level Maths summer preparation work Welcome to A Level Maths! We hope you are looking forward to two years of challenging and rewarding learning. You must make sure that you are prepared to study A Level

More information

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization

More information

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis

More information

MATH36061 Convex Optimization

MATH36061 Convex Optimization M\cr NA Manchester Numerical Analysis MATH36061 Convex Optimization Martin Lotz School of Mathematics The University of Manchester Manchester, September 26, 2017 Outline General information What is optimization?

More information

Chapter Six. Polynomials. Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring

Chapter Six. Polynomials. Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring Chapter Six Polynomials Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring Properties of Exponents The properties below form the basis

More information

COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma

COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released

More information

Lecture 14: Shrinkage

Lecture 14: Shrinkage Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the

More information

CSC321 Lecture 9: Generalization

CSC321 Lecture 9: Generalization CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions

More information

where a =, and k =. Example 1: Determine if the function is a power function. For those that are not, explain why not.

where a =, and k =. Example 1: Determine if the function is a power function. For those that are not, explain why not. . Power Functions with Modeling PreCalculus. POWER FUNCTIONS WITH MODELING Learning Targets: 1. Identify a power functions.. Model power functions using the regression capabilities of your calculator.

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future

More information

Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Math 128A: Homework 2 Solutions

Math 128A: Homework 2 Solutions Math 128A: Homework 2 Solutions Due: June 28 1. In problems where high precision is not needed, the IEEE standard provides a specification for single precision numbers, which occupy 32 bits of storage.

More information

Lecture 4: Training a Classifier

Lecture 4: Training a Classifier Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

SGD and Deep Learning

SGD and Deep Learning SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients

More information

An Introduction to Statistical and Probabilistic Linear Models

An Introduction to Statistical and Probabilistic Linear Models An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning

More information

Loss Functions and Optimization. Lecture 3-1

Loss Functions and Optimization. Lecture 3-1 Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday April 20, 11:59pm on Canvas (Extending

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

Solving Quadratic & Higher Degree Equations

Solving Quadratic & Higher Degree Equations Chapter 9 Solving Quadratic & Higher Degree Equations Sec 1. Zero Product Property Back in the third grade students were taught when they multiplied a number by zero, the product would be zero. In algebra,

More information

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima. http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT

More information

Day 4: Classification, support vector machines

Day 4: Classification, support vector machines Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far

More information

Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization. Some slides borrowed from Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

More information

Week 3: Linear Regression

Week 3: Linear Regression Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to

More information

Math 251 Midterm II Information Spring 2018

Math 251 Midterm II Information Spring 2018 Math 251 Midterm II Information Spring 2018 WHEN: Thursday, April 12 (in class). You will have the entire period (125 minutes) to work on the exam. RULES: No books or notes. You may bring a non-graphing

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

More information

Linear Regression (continued)

Linear Regression (continued) Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

More information

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations

More information

CITS 4402 Computer Vision

CITS 4402 Computer Vision CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

cxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c

cxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is

More information

Lecture 2: Linear regression

Lecture 2: Linear regression Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

Warm up: risk prediction with logistic regression

Warm up: risk prediction with logistic regression Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Parabolas and lines

Parabolas and lines Parabolas and lines Study the diagram at the right. I have drawn the graph y = x. The vertical line x = 1 is drawn and a number of secants to the parabola are drawn, all centred at x=1. By this I mean

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information