Foundation of Intelligent Systems, Part I. Regression
|
|
- Rhoda Sharp
- 5 years ago
- Views:
Transcription
1 Foundation of Intelligent Systems, Part I Regression mcuturi@i.kyoto-u.ac.jp FIS
2 Before starting Please take this survey before the end of this week. Here are a few books which you can check beyond the slides. Elements of Statistical Learning, Hastie Tibshirani Friedman Pattern Recognition, Theodoridis Koutroumbras Pattern Recognition & Machine Learning, Bishop You can also check Andrew Ng s video lectures (Stanford) FIS
3 Fundamentals in Regression Can be studied from different viewpoints: statistical, linear algebra, AI... etc.. Linear regression is currently revived by different ideas in sparsity Lasso (1996 ) SVM for regression (1996 ) Compressed Sensing (2002 ) FIS
4 One of the most standard data analysis tasks: Regression Data: many observations of the same data type We have a database {x 1,,x N }. Each datapoint x j can be encoded as a vector of features x j = x 1,j x 2,j. x d,j Each feature x i,j (1 i d) of a given point x j is a number. FIS
5 One of the most standard data analysis tasks: Regression This database can be seen as a R d N matrix {x 1,,x N } x 1,1 x 1,2 x 1,N x 2,1 x 2,2 x 2,N x 3,1 x 3,2 x 3,N... x d,1 x d,2 x d,n FIS
6 Examples Credit card holder x j = Patient x j = Blog x j = Income Age. Work history (months) Family # Credit Incidents height weight. # minutes exercise/week LDL cholesterol HDL cholesterol avg. pages view/month # posts. avg. # comments/month revenue from ads/month FIS
7 Within such variables... Some variables are very cheap to measure, others very expensive. Some variables might have a causal effect on other variables. In the regression setting, the d variables are split between k regressor (or predictor) variables d k response (or predicted) variables. FIS
8 Regression In the regression setting, the d variables are split between k regressor (or predictor) variables d k response (or predicted) variables. guess response(expensive) variables using regressor(cheap) ones. FIS
9 The Regression Problem Given, A database {x 1,,x N } X = x 1,1 x 1,2 x 1,N x 2,1 x 2,2 x 2,N x 3,1 x 3,2 x 3,N... x d,1 x d,2 x d,n FIS
10 The Regression Problem Given, A database {x 1,,x N } X = x 1,1 x 1,2 x 1,N x 2,1 x 2,2 x 2,N x 3,1 x 3,2 x 3,N... x d 1,1 x d 1,2 x d 1,N x d,1 x d,2 x d,n A set of k regressors variables Reg {1,,d} A set of d k response variable Res {1,,d} FIS
11 The Regression Problem Regression = build a function f : R k R d-k such that, x,f((x i ) i Reg ) (x k ) k Res. e.g. if d = 6, k = 4, Reg= {1,2,3,4}, Res= {5,6} we look for a function f : R 4 R 2, f(x 1,x 2,x 3,x 4 ) (x 5,x 6 ) FIS
12 Examples continued Credit card holder x j = Patient x j = Blog x j = Income Age. Work history (months) Family # Credit Incidents height weight. # minutes exercise/week LDL cholesterol HDL cholesterol avg. pages view/month # posts. avg. # comments/month revenue from ads/month FIS
13 Examples continued Credit card holder x j = Patient x j = Blog x j = Income Age. Work history (months) Family # Credit Incidents height weight. # minutes exercise/week LDL cholesterol HDL cholesterol avg. pages view/month # posts. avg. # comments/month revenue from ads/month FIS
14 In the following slides... We only consider tasks with one response variable All other variables are regressors. We rename the response variable y and reassign x 1,,x d for the regressors predicting more than one variable? heavier mathematically, but similar. FIS
15 In the following slides... We assume that y takes continuous values. When y takes discrete values, notably binary {0, 1} things change a bit. Yet... binary real : regression techniques work on discrete data but real binary... we ll discuss that later. FIS
16 Today s Example: Your apartment Collected information about 285 (out of 1226) apartments close to Kyoto U. Source: FIS
17 Today s Example: Your apartment Collected information about 285 (out of 1226) apartments close to Kyoto U. Kept 4 variables: Surface, Rent, Age of Building, Walking distance to station. FIS
18 What does the matrix look like? imagecs(h); colorbar; 285 columns, 4 lines. Each column represents one apartment. In these slides, we will regress the rent using age, surface and distance FIS
19 Regression: one variable vs. another FIS
20 Rent vs. Surface 14 Scatter plot of Rent vs. Surface apartment 12 Rent (x JPY) Surface Note that the dataset has been censored above JPY FIS
21 Rent vs. Surface 14 y = 0.13*x Scatter plot of Rent vs. Surface apartment linear Rent (x JPY) Surface Using the linear tool in curve fitting, we obtain the approximation y = 0.13x+2.4 FIS
22 Rent vs. Surface Scatter plot of Rent vs. Surface y = 0.13*x y = *x *x 0.71 y = 6.2e 05*x *x *x 3.1 y = 1.4e 06*x *x *x *x 1.2 y = 1.8e 07*x e 05*x *x *x 2 1.1*x apartment linear quadratic cubic 4th degree 5th degree Rent (x JPY) Surface We can use higher order polynomials... yet look at the results. FIS
23 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. FIS
24 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. Among them.. Linear min b,a 1 R N (y j (b + a 1 x j )) 2 FIS
25 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. Among them.. Linear min b,a 1 R N (y j (b + a 1 x j )) 2 Quadratic min b,a 1,a 2 R N ) 2 (y j (b + a 1 x j +a 2 x 2j ) FIS
26 Behind the curve fitting tool Matlab selects these curves using the least-squares criterion e.g min f F N (y j f(x j )) 2 where F is a class of functions Matlab considers a few function classes for F. Among them.. Linear min b,a 1 R N (y j (b + a 1 x j )) 2 Quadratic min b,a 1,a 2 R Cubic etc. min b,a 1,a 2,a 3 R N ) 2 (y j (b + a 1 x j +a 2 x 2j ) N ) 2 (y j (b + a 1 x j +a 2 x 2j +a 3x 3j ) FIS
27 How can we solve this? The linear case Let s take a look at the function (a,b) N (y j (b+ax j )) 2. Using the notations Rent Y = [ y 1 y 2 y N ] Surface X = [ x 1 x 2 x N ] Constant 1 N = [ ] we have that N (y j (ax j +b)) 2 = Y ax b1 N 2 FIS
28 Contour plot of (a,b) Y ax b1 N 2 Since we only handle 2 parameters, we can make a contour plot 3 norm(h(:,4) a H(:,2) b) b a This validates the equation y = 0.13x+2.4. How to get there? FIS
29 We define the function L as Some linear algebra L : (a,b) N (y j (b+ax j )) 2 The partial derivatives of L can be computed. N L a = 2 L b = 2 N (y j (b+ax j ))x j y j (b+ax j ) Any minimum (a,b ) of L must be a saddle point. FIS
30 Some linear algebra Namely, the partial derivatives of L at (a,b ) must be zero L ( a = 2 a x 2 j +b x j ) y j x j L ( b = 2 Nb y j +a ) x j Hence (a,b ) must satisfy the linear system 0 = a x 2 j +b x j y j x j 0 = Nb y j +a x j Namely, [ a b ] = [ ] x 2 1 [ ] j xj yj yj x j xj N ans = FIS
31 Rent vs. Surface Scatter plot of Rent vs. Surface y = 0.13*x y = *x *x 0.71 y = 6.2e 05*x *x *x 3.1 y = 1.4e 06*x *x *x *x 1.2 y = 1.8e 07*x e 05*x *x *x 2 1.1*x apartment linear quadratic cubic 4th degree 5th degree Rent (x JPY) Surface We understood how to get the linear curve. What about the quadratic? FIS
32 same idea... define What about the quadratic case? Quadratic min b,a 1,a 2 R N ) (y j (b + a 1 x j + a 2 x 2j ) L : (a 1,a 2,b) N ( yj ( b+a 1 x j +a 2 x 2 2 j)) look at the objective s derivatives... L a 2 = 2 L a 1 = 2 N N L b = 2 N ( yj ( b+a 1 x j +a 2 x 2 j)) x 2 j ( yj ( b+a 1 x j +a 2 x 2 )) j xj ( yj ( b+a 1 x j +a 2 x 2 j)) FIS
33 What about the quadratic case? We consider the equations that a saddle point must verify: 0 = 0 = N N 0 = ( yj ( b +a 1x j +a 2x 2 j)) x 2 j ( yj ( )) b +a 1x j +a 2x 2 j xj N ( yj ( b +a 1x j +a 2x 2 j)) a 2 a 1 b = x 4 j x 3 j x 2 j x 3 j x 2 j xj x 2 j xj N 1 yj x 2 j yj yj x j ans = FIS
34 Higher order polynomials Intuitively, for polynomial up to degree p we would have to Build the corresponding Toeplitz Matrix Build the corresponding vector with y and x combined at different exponents Solve the linear system Surprisingly Finding the best p th order polynomial with least-squares Solving a p dimensional linear system Not so surprising after all: Least-squares: objective of degree 2 in coefficients Minimum saddle point system of degree 1.. Least-squares has been chosen because it yields a linear system... FIS
35 The general case: one vs. all rest What about using all other variables? Rent Y = [ ] y 1 y 2 y N... All other variables X = x 1 x 2 x N... FIS
36 The general case We assume that we have d regressor variables, 1 response variable. Consider again the linear approach. We look for a function f of the form f(x) = α 0 +α 1 x 1 +α 2 x 2 + +α d x d. We want to determine d+1 weights, a constant α 0 1 i d,α i weights for each variable. Least squares: L(α 0,α 1,α 2,,α d ) = N ( ) 2 y j (α 0 +α 1 x 1,j +α 2 x 2,j + +α d x d,j ) FIS
37 The general case Notice that L(α 0,α 1,α 2,,α d ) N i=1 y i α 0 + α 1. α d T x i 2 = α 0. α d T X Y 2, where and X = x 1 x 2 x N Rd+1 N... Y = [ y 1 y N ] R N. We write α for the d+1 dimensional vector α 0. α d. FIS
38 Linear least squares Expanding this expression, L(α) = ( α T XX T α 2YX T α+ Y 2) Consider the gradient of that function L = 2XX T α 2XY T Hence this gradient is zero for α = (XX T ) 1 XY T XX T S n +, that is XX T is a positive (semi)definite matrix. This works if XX T R d+1 is invertible, that is XX T S n ++. FIS
39 Considering again rents vs the rest Getting the data again, adding a line of 1 s >> (X X )\(X Y ) ans = FIS
40 What went wrong? 14 Scatter plot of Rent vs. Surface apartment 12 Rent (x JPY) Surface rent= age surf dist JPY FIS
41 What happens if we remove outliers? (surf> 40) 9 Scatter plot of Rent vs. Surface 8 7 Rent (x JPY) Surface >> (X X )\(X Y ) ans = x age x surface x distance JPY Moral of the story: easy to draw wrong conclusions even with simple tools FIS
42 What else can go wrong? Next time... What happens when d n? (XX T ) is no longer invertible... high-dimensional data in genomics, images analysis (lots of features) What happens when (XX T ) is badly conditioned ( λ min(xx T ) λ max (XX T ) 0)? if λ min (XX T ) = 1e 10, λ max ( (XX T ) 1) = 1e10!! Very bad numerical stability of the solution... When d n, we might want to do variable selection, i.e.pick a subset d of the d variables which is relevant to predict y. i.e.favor vectors β such that β 0 = cardβ i 0 is small. FIS
Foundation of Intelligent Systems, Part I. Regression 2
Foundation of Intelligent Systems, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Some Words on the Survey Not enough answers to say anything meaningful! Try again: survey. FIS - 2013 2 Last
More informationStatistical Machine Learning, Part I. Regression 2
Statistical Machine Learning, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp SML-2015 1 Last Week Regression: highlight a functional relationship between a predicted variable and predictors SML-2015 2 Last
More informationFoundation of Intelligent Systems, Part I. SVM s & Kernel Methods
Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell
More informationLinear Regression. S. Sumitra
Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D
More informationKernel Methods. Foundations of Data Analysis. Torsten Möller. Möller/Mori 1
Kernel Methods Foundations of Data Analysis Torsten Möller Möller/Mori 1 Reading Chapter 6 of Pattern Recognition and Machine Learning by Bishop Chapter 12 of The Elements of Statistical Learning by Hastie,
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationPolyhedral Computation. Linear Classifiers & the SVM
Polyhedral Computation Linear Classifiers & the SVM mcuturi@i.kyoto-u.ac.jp Nov 26 2010 1 Statistical Inference Statistical: useful to study random systems... Mutations, environmental changes etc. life
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More information... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology
..... SPARROW SPARse approximation Weighted regression Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Université de Montréal March 12, 2012 SPARROW 1/47 .....
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationConvex Optimization & Machine Learning. Introduction to Optimization
Convex Optimization & Machine Learning Introduction to Optimization mcuturi@i.kyoto-u.ac.jp CO&ML 1 Why do we need optimization in machine learning We want to find the best possible decision w.r.t. a problem
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationMachine Learning Linear Models
Machine Learning Linear Models Outline II - Linear Models 1. Linear Regression (a) Linear regression: History (b) Linear regression with Least Squares (c) Matrix representation and Normal Equation Method
More informationLecture 10: Logistic Regression
BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 10: Logistic Regression Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline An
More informationMachine Learning. Classification, Discriminative learning. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Classification, Discriminative learning Structured output, structured input, discriminative function, joint input-output features, Likelihood Maximization, Logistic regression, binary
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationMustafa Jarrar. Machine Learning Linear Regression. Birzeit University. Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018
Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018 Version 1 Machine Learning Linear Regression Mustafa Jarrar Birzeit University 1 Watch this lecture and download the slides Course
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More informationFactoring. Expressions and Operations Factoring Polynomials. c) factor polynomials completely in one or two variables.
Factoring Strand: Topic: Primary SOL: Related SOL: Expressions and Operations Factoring Polynomials AII.1 The student will AII.8 c) factor polynomials completely in one or two variables. Materials Finding
More informationES-2 Lecture: More Least-squares Fitting. Spring 2017
ES-2 Lecture: More Least-squares Fitting Spring 2017 Outline Quick review of least-squares line fitting (also called `linear regression ) How can we find the best-fit line? (Brute-force method is not efficient)
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationFundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015
Fundamentals of Machine Learning Mohammad Emtiyaz Khan EPFL Aug 25, 25 Mohammad Emtiyaz Khan 24 Contents List of concepts 2 Course Goals 3 2 Regression 4 3 Model: Linear Regression 7 4 Cost Function: MSE
More informationMachine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 4: SVM I Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d. from distribution
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationLearning From Data: Modelling as an Optimisation Problem
Learning From Data: Modelling as an Optimisation Problem Iman Shames April 2017 1 / 31 You should be able to... Identify and formulate a regression problem; Appreciate the utility of regularisation; Identify
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More information5.3. Polynomials and Polynomial Functions
5.3 Polynomials and Polynomial Functions Polynomial Vocabulary Term a number or a product of a number and variables raised to powers Coefficient numerical factor of a term Constant term which is only a
More informationLecture 12. Talk Announcement. Neural Networks. This Lecture: Advanced Machine Learning. Recap: Generalized Linear Discriminants
Advanced Machine Learning Lecture 2 Neural Networks 24..206 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de Talk Announcement Yann LeCun (NYU & FaceBook AI) 28..
More informationDATA MINING LECTURE 9. Minimum Description Length Information Theory Co-Clustering
DATA MINING LECTURE 9 Minimum Description Length Information Theory Co-Clustering MINIMUM DESCRIPTION LENGTH Occam s razor Most data mining tasks can be described as creating a model for the data E.g.,
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationCS 540: Machine Learning Lecture 1: Introduction
CS 540: Machine Learning Lecture 1: Introduction AD January 2008 AD () January 2008 1 / 41 Acknowledgments Thanks to Nando de Freitas Kevin Murphy AD () January 2008 2 / 41 Administrivia & Announcement
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationMachine Learning - MT Linear Regression
Machine Learning - MT 2016 2. Linear Regression Varun Kanade University of Oxford October 12, 2016 Announcements All students eligible to take the course for credit can sign-up for classes and practicals
More informationFundamentals of Machine Learning (Part I)
Fundamentals of Machine Learning (Part I) Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp April 12, 2018 Mohammad Emtiyaz Khan 2018 1 Goals Understand (some) fundamentals
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationArchdiocese of Washington Catholic Schools Academic Standards Mathematics
ALGEBRA 1 Standard 1 Operations with Real Numbers Students simplify and compare expressions. They use rational exponents, and simplify square roots. A1.1.1 A1.1.2 A1.1.3 A1.1.4 A1.1.5 Compare real number
More informationRandomness. What next?
Randomness What next? Random Walk Attribution These slides were prepared for the New Jersey Governor s School course The Math Behind the Machine taught in the summer of 2012 by Grant Schoenebeck Large
More informationMachine learning - HT Basis Expansion, Regularization, Validation
Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding
More informationLecture 12. Neural Networks Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationA Level Maths summer preparation work
A Level Maths summer preparation work Welcome to A Level Maths! We hope you are looking forward to two years of challenging and rewarding learning. You must make sure that you are prepared to study A Level
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationMATH36061 Convex Optimization
M\cr NA Manchester Numerical Analysis MATH36061 Convex Optimization Martin Lotz School of Mathematics The University of Manchester Manchester, September 26, 2017 Outline General information What is optimization?
More informationChapter Six. Polynomials. Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring
Chapter Six Polynomials Properties of Exponents Algebraic Expressions Addition, Subtraction, and Multiplication Factoring Solving by Factoring Properties of Exponents The properties below form the basis
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationwhere a =, and k =. Example 1: Determine if the function is a power function. For those that are not, explain why not.
. Power Functions with Modeling PreCalculus. POWER FUNCTIONS WITH MODELING Learning Targets: 1. Identify a power functions.. Model power functions using the regression capabilities of your calculator.
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Week #1
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationMath 128A: Homework 2 Solutions
Math 128A: Homework 2 Solutions Due: June 28 1. In problems where high precision is not needed, the IEEE standard provides a specification for single precision numbers, which occupy 32 bits of storage.
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday April 20, 11:59pm on Canvas (Extending
More informationCS168: The Modern Algorithmic Toolbox Lecture #6: Regularization
CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models
More informationSolving Quadratic & Higher Degree Equations
Chapter 9 Solving Quadratic & Higher Degree Equations Sec 1. Zero Product Property Back in the third grade students were taught when they multiplied a number by zero, the product would be zero. In algebra,
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationDay 4: Classification, support vector machines
Day 4: Classification, support vector machines Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 21 June 2018 Topics so far
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More informationMath 251 Midterm II Information Spring 2018
Math 251 Midterm II Information Spring 2018 WHEN: Thursday, April 12 (in class). You will have the entire period (125 minutes) to work on the exam. RULES: No books or notes. You may bring a non-graphing
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationCS 231A Section 1: Linear Algebra & Probability Review
CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationCS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang
CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations
More informationCITS 4402 Computer Vision
CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is
More informationLecture 2: Linear regression
Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationParabolas and lines
Parabolas and lines Study the diagram at the right. I have drawn the graph y = x. The vertical line x = 1 is drawn and a number of secants to the parabola are drawn, all centred at x=1. By this I mean
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More information