ECE 592 Topics in Data Science

1 ECE 592 Topics in Data Science Dror Baron Associate Professor Dept. of Electrical and Computer Engr. North Carolina State University, NC, USA

2 Two Classification Examples [Hastie et al., Chapter ] Keywords: classification, least squares, linear models, nearest neighbors

3 Classification problem Simple binary classification problem (code online) Training samples in 2-dimensional space (200 samples) Each sample labeled blue/red (book uses green/red) Not (yet!) revealing how the samples were generated Goal: for test data, predict the class (blue/red)

4 Is goal realistic? Structure of classes not clear Some red/blue samples almost overlap We want two conflicting properties: a stable classification method (doesn't jump around a lot) and a low-bias method (usually accurate)

5 Nearest neighbors approach For each test point, find K nearest neighbors among training samples Classes (blue/red) of K nearest neighbors are known Classify using majority vote How big should K be?
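
A minimal sketch of the majority-vote rule in Python (NumPy assumed; the function name knn_classify and the 0/1 label encoding are illustrative, not the course code):

import numpy as np

def knn_classify(X_train, y_train, x_test, K=5):
    # Euclidean distance from the test point to every training sample
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the K closest training samples
    nearest = np.argsort(dists)[:K]
    # majority vote over their labels (0 = blue, 1 = red)
    return int(y_train[nearest].sum() > K / 2)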

6 Nearest neighbors results (K=50) Begin with large K=50 (recall 200 training samples) Each test sample classified using lots of neighbors Very stable (large areas of red and blue in classifier output) Also heavily biased (fails to adapt to small patches of the other color)

7 Nearest neighbors results (K=5) Smaller K=5 Less stable (areas of blue and red change quickly) Seems to classify more precisely Maybe we should use minimal K=1?

8 Nearest neighbor (K=1) Unstable (often have singleton samples) Classifies training data well but maybe prone to noise? Moderate K offers best stability/bias trade-off

9 How can we choose K automatically? Optimal value of K depends on: more randomness → need greater K for a confident majority vote; high-dimensional data affects neighborliness (known as the curse of dimensionality); large N → lots of computation to find nearest neighbors Can select a good K with a validation approach: partition into training/test data, then validate quality on the test data after training with different K (see the sketch below)
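
A hold-out validation sketch; it reuses the knn_classify function from the sketch above, and the 70/30 split and the candidate K values are arbitrary choices for illustration:

import numpy as np

def choose_K(X, y, candidate_Ks=(1, 5, 15, 50), frac_train=0.7, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tr = int(frac_train * len(y))
    tr, va = idx[:n_tr], idx[n_tr:]
    best_K, best_acc = None, -1.0
    for K in candidate_Ks:
        preds = np.array([knn_classify(X[tr], y[tr], x, K) for x in X[va]])
        acc = np.mean(preds == y[va])      # accuracy on the held-out split
        if acc > best_acc:
            best_K, best_acc = K, acc
    return best_K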

10 Linear model Consider the model for data $\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$, where p = # dimensions (here p=2), $X_j$ is the coordinate along dimension j, $\hat{\beta}_j$ is the estimated linear coefficient, and the first coefficient $\hat{\beta}_0$ corresponds to the intercept or bias Why is the linear model commonly used? Relatively simple (tractable) Often works well Many systems well-approximated as linear → lots of machinery developed for this

11 How do we compute coefficients? Let's simplify notation: $\hat{Y} = X^T \hat{\beta}$; interpret as the linear function $f(X) = X^T\beta$ Least squares method [Gauss, 1795]: minimize the sum of squared errors $\mathrm{Error}(\beta) = \sum_{n=1}^{N}(y_n - x_n^T\beta)^2$ Express in vector form: $\mathrm{Error}(\beta) = (y - X\beta)^T(y - X\beta)$, where X is an N×p matrix and y is the N-vector of training classes Want to minimize Error(β): take the derivative with respect to (w.r.t.) β and set it to zero, $X^T(y - X\beta) = 0$, giving $X^Ty = X^TX\beta$ Solution (pseudo-inverse): $\hat{\beta} = (X^TX)^{-1}X^Ty$
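
The closed-form solution in code; a small NumPy sketch that solves the normal equations rather than forming the inverse explicitly (more numerically stable, same answer when $X^TX$ is invertible):

import numpy as np

def least_squares(X, y):
    # beta_hat = (X^T X)^{-1} X^T y via the normal equations
    return np.linalg.solve(X.T @ X, X.T @ y)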

12 Linear classifier results We have two classification regions, red/blue A straight line separates them Very stable; terrible bias

13 Discussion Linear model very stable & biased Nearest neighbors less stable & less biased Need moderate value of K Could interpret nearest neighbors as better, but it suffers from the curse of dimensionality (scarcity of neighbors in high dimensions)

14 Decision Theory [Hastie et al., Section 2.4] Keywords: decision theory

15 Decision theory More general problem: minimize a loss function Squared error loss: we've seen $\mathrm{Error}(\beta) = \sum_{n=1}^{N}(y_n - x_n^T\beta)^2$; emphasizes/accentuates larger errors; Bayesian case (known statistics) → conditional expectation $\hat{Y} = E[Y|X]$ Absolute error: in our example, $\mathrm{Error}(\beta) = \sum_{n=1}^{N}|y_n - x_n^T\beta|$; smaller errors receive more weight; leads to the median estimator, typically viewed as more robust Bayesian binary classification selects the most probable class

16 Example Consider Y={2, 1, 1.5, 2, 17} Let's show for $\mathrm{Error}(\beta) = \|Y - \beta\|_2^2$ that the average $\hat{\beta} = 4.7$ minimizes the error (details in supplements) Will show for $\mathrm{Error}(\beta) = \|Y - \beta\|_1$ that the median $\hat{\beta} = 2$ minimizes the error (details in supplements)

17 Clustering (Related to Project 1) [Hastie et al., Section 14.3] Keywords: clustering, unsupervised learning

18 Goals Want to group data into clusters that seem related Central notion: degree of similarity between different clusters Typical algorithmic approach is iterative; move points between clusters, recalculate cluster centers

19 K means algorithm Initialize K cluster centers by selecting K points among the training data Iterate until convergence: associate each training datum with the nearest cluster center, then recompute cluster centers as the average of the training data in each cluster Sensitive to initialization (can get stuck in a local optimum) Other clustering algorithms use a model for each cluster (see the sketch below)
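
A bare-bones K-means sketch in NumPy (assumptions: Euclidean distance, random initialization from the training points, and no handling of empty clusters):

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # initialize from data
    for _ in range(n_iter):
        # associate each training datum with its nearest cluster center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute cluster centers as the average of the data in each cluster
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels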

20 More about K means Map datum $x_n$ in cluster $C(n)=k$ to representation level $r_k$, $k = k(n) = C(n)$ Squared error between $x_n$ and $r_k$: $d(x_n, r_k) = \sum_{p=1}^{P}(x_{np} - r_{kp})^2 = \|x_n - r_k\|^2$ Want $r_k = \arg\min_{r \in \mathbb{R}^P} \sum_{\{n: C(n)=k\}} \|x_n - r\|^2$ Select cluster center $r_k = \frac{1}{|\{n: C(n)=k\}|}\sum_{\{n: C(n)=k\}} x_n$ Summed squared error for mapping C: $\mathrm{Error}(C) = \sum_{n=1}^{N}\|x_n - r_{k(n)}\|^2$ (sum over N data) $= \sum_{k=1}^{K}\sum_{\{n: C(n)=k\}}\|x_n - r_k\|^2$ (sum over K clusters)

21 Linear Regression [Hastie et al., Chapter 3] Keywords: linear models, regression

22 Recall linear model Linear model for data $\hat{Y} = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$ Linear model is tractable, often works well, and many systems are approximately linear Favorable feedback loop between math/algorithms for linear models and hardware w/ linear properties

23 Meaning of X? $\hat{Y} = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$ Variables $X_j$ could be: 1. Actual inputs (example: data about patients) 2. Functions of inputs (example: log, square root) 3. Polynomials of inputs → Taylor series approximation 4. Codes of classes (example: red=1, green=2, blue=3) 5. Interactions between variables (example: $X_3 = X_1 X_2$) The function $f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$ could be nonlinear in the underlying factors

24 Classical solution Least squares method [Gauss, 1795] Minimize the sum of squared errors $\mathrm{Error}(\beta) = \sum_{n=1}^{N}(y_n - x_n^T\beta)^2$ Closed-form solution: $\hat{\beta} = (X^TX)^{-1}X^Ty$ Pseudo-inverse $X^+ = (X^TX)^{-1}X^T$ Geometric interpretation: interpret the inputs as a matrix $X = \{1, X_1, X_2, \ldots, X_p\}$; consider the vector space spanned by the columns of X; project Y onto this space, forming $\hat{Y} = XX^+Y$; the residual $Y - \hat{Y}$ is orthogonal to the space

25 Real world example Consider financial prediction $Y_n$ = future % change in price of stock $n \in \{1,\ldots,N\}$ $X_{n1}$ = earnings / price High is good, want high earnings/profits $X_{n2}$ = book / price Book value = value of company on paper; high is good $X_{n3}$ = % price change last 6 months Price momentum: stocks that went up keep going up (on average) Can have (many) more variables X Goal: predict future returns (% price change) from data

26 Example continued How to predict future returns? Form a linear model for returns $\hat{Y}_n = \beta_0 + \sum_{j=1}^{p} X_{nj}\beta_j$ Compute parameters β Can use least squares Apply parameters for the linear model to predict (future) returns

27 Multivariate and Univariate Regression [Hastie et al., Chapter 3.3] Keywords: multivariate linear regression, orthogonalization, univariate linear regression

28 Start with univariate Univariate linear model, Y = Xβ + ε Keep it simple - no intercept Will show $\hat{\beta} = \frac{\sum_{n=1}^{N} x_n y_n}{\sum_{n=1}^{N} x_n^2}$ (details in supplements) To do so, use vector notation: x = vector of $x_n$, y = vector of $y_n$, inner product $\langle x, y\rangle = \sum_{n=1}^{N} x_n y_n$, so $\hat{\beta} = \frac{\langle x, y\rangle}{\langle x, x\rangle}$ Can compute the residual (error), $r = y - x\hat{\beta}$
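
The univariate formula as a short sketch (no intercept, as on the slide; names illustrative):

import numpy as np

def univariate_fit(x, y):
    beta = np.dot(x, y) / np.dot(x, x)   # <x,y> / <x,x>
    r = y - beta * x                     # residual
    return beta, r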

29 From univariate to multivariate Suppose that columns $x_1, \ldots, x_p$ are orthogonal Orthogonal means $\langle x_j, x_k\rangle = 0$ for $j \neq k$ Can show $\hat{\beta}_j = \frac{\langle x_j, y\rangle}{\langle x_j, x_j\rangle}$ (for the orthogonal case) Challenge: real data isn't orthogonal Solution: orthogonalize it (e.g., Gram-Schmidt procedure) But orthogonalization can be numerically sensitive

30 Subset Selection [Hastie et al., Chapter 3.4] Keywords: feature selection, subset selection

31 Some problems with least squares Least squares often outputs an entirely nonzero vector Dense (non-sparse) output corresponds to a complicated linear model More convenient/insightful for a human operator to interpret a simpler model (humans comfortable w/5-10 variables, not 500) A slight loss of accuracy with a sparse output may be preferred Least squares doesn't use possible knowledge about the structure of β Example structures: sparse β, discrete-valued β, non-negative β, ... Least squares (multiplication by the pseudo-inverse) ignores these

32 Example where least squares fails Consider N users, each with a phone Most users are inactive K<<N active users transmit {-1,+1} Communication channel often modeled as linear: Y = HX + Noise; X comprised of {-1, 0 (inactive), +1} Least squares multiplies Y by the pseudo-inverse $H^+$ $H^+Y$ contains elements not in {-1,0,+1} But we want the output to be in {-1,0,+1}

33 Subset selection Want sparse β → select a subset of the columns of matrix X that approximate Y well Called subset selection Requires combinatorial complexity in general Various techniques provide faster runtime May approach the best solution but not find it Will study fast algorithms for this; but they require the elements of X to be independent and identically distributed (i.i.d.)

34 Another real world example Medical setting Columns of X are measurements relating to patients Examples: height, blood pressure, heart problems in family, ... Y is the medical outcome we're trying to predict Example: blood pressure, sugar level, future treatment costs, ... Want to understand relations between conditions, possibly saving $ Lots of columns (dozens, hundreds, ...) → challenging for a physician to comprehend the linear model A subset selection model based on few terms is simpler for a human to work with

35 Shrinkage [Hastie et al., Chapter 3.4.3] Keywords: LASSO, ridge regression, shrinkage

36 Shrinkage and ridge regression Subset selection throws out some columns → may have large errors Shrinkage methods modify β more subtly They shrink elements of β toward zero Ridge regression penalizes large β: $\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta} \|y - X\beta\|^2 + \lambda\|\beta\|^2$ Can be implemented with a modified pseudo-inverse, $\hat{\beta}_{\mathrm{ridge}} = (X^TX + \lambda I)^{-1}X^Ty$ (I is the identity matrix; the new term is an $\ell_2$-norm penalty)
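
The ridge closed form as a short NumPy sketch (function and parameter names illustrative; lam plays the role of λ):

import numpy as np

def ridge(X, y, lam):
    p = X.shape[1]
    # (X^T X + lambda I)^{-1} X^T y, solved rather than explicitly inverted
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)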

37 LASSO Least absolute shrinkage and selection operator (LASSO): $\hat{\beta}_{\mathrm{lasso}} = \arg\min_{\beta} \|y - X\beta\|^2$ s.t. $\|\beta\|_1 \le t$ Can express w/Lagrange multiplier λ: $\arg\min_{\beta} \|y - X\beta\|^2 + \lambda\|\beta\|_1$ Small t drives many coefficients to zero Various fast algorithms implement LASSO: gradient projection for sparse reconstruction (GPSR) [Nowak et al.], approximate message passing (AMP) [Donoho et al.]
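
For illustration only, a simple iterative soft-thresholding (ISTA) solver for the Lagrangian form; this is a slow but readable stand-in for the faster solvers named above (GPSR, AMP), the iteration count is an arbitrary assumption, and the objective uses a factor of 1/2 on the squared error (a rescaling of λ relative to the slide):

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    L = np.linalg.norm(X, ord=2) ** 2        # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of 0.5 * ||y - X beta||^2
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta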

38 Example: Estimating Sparse Inputs [Sarvotham, B, & Baraniuk, Sudocodes - Fast Measurement and Reconstruction of Sparse Signals, Int. Symp. Info. Theory, 2006] Keywords: sudocodes

39 How can we estimate sparse β? LASSO can drive many coeffs to zero; reasonable? Let's revisit the linear problem, Y = Xβ, and assume: sparse β (mostly zeros; will justify later in course), real-valued β, no noise Linear algebra 101: X must contain at least as many observations as unknowns Incorrect when β contains structure (e.g., sparsity)!

40 Example [figure: linear system Y = Xβ with a binary matrix X and an unknown sparse β, entries marked '?']

41 Example What does a zero measurement imply? Hint: sparse and real-valued β [figure: Y = Xβ with a zero-valued measurement highlighted]

42 Example Graph reduction! [figure: Y = Xβ after graph reduction; entries of β touching the zero measurement are resolved to 0]

43 Example What do matching measurements imply? Hint: non-zeros in x are real numbers [figure: Y = Xβ with two matching measurement values highlighted]

44 Example What is the last entry of x? [figure: the remaining unresolved entry of the sparse vector]

45 And more Resembles solving Sudoku puzzles BUT requires: sparse β, no noise, binary matrix X Can expand to noisy data, for example by evaluating almost-matching measurements Details in [Ma, B, & Needell, Two-Part Reconstruction with Noisy-Sudocodes, IEEE Trans. Signal Proc., 2014] Will study more powerful algorithms using sparse signal processing

46 Linear Methods for Classification [Hastie et al., Chapter 4] Keywords: decision boundaries, linear discriminant analysis, logistic regression

47 Classification problem Labels G(X) take values in the class set G = {1,...,K} Want to predict G(X) Main idea: create affine functions $f_k(X) = \beta_{k0} + \beta_k^TX$, $k \in \{1,\ldots,K\}$ Affine ≈ linear, need not go through the origin Note that β depends on k Predict $\hat{G}(X) = \arg\max_{k \in \{1,\ldots,K\}} f_k(X)$ Decision boundary between classes k and l: $f_k(X) = f_l(X)$

48 Illustration [Hastie et al. Fig. 4.1] Data from K=3 classes illustrated by colored numbers Linear boundaries Computed by linear discriminant analysis (LDA) LDA details later Quadratic boundaries Add cross terms ($X_1X_2$, $X_1^2$, $X_2^2$) to the space Run LDA

49 Linear regression of indicator matrix Define K indicator variables: $Y_k(X) = 1$ if $G(X) = k$, else 0, for $k \in \{1,\ldots,K\}$ Form the K variables into a vector, $Y = (Y_1,\ldots,Y_K)$ N training instances form an N×K indicator matrix Train linear regression on the indicator matrix: $\hat{Y} = XX^+Y$ Pseudo-inverse $X^+ = (X^TX)^{-1}X^T$ First column of X contains ones (bias)
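
A sketch of the indicator-matrix approach (class labels assumed to be coded 0,...,K-1; names illustrative):

import numpy as np

def fit_indicator_regression(X, g, K):
    N = len(g)
    Y = np.zeros((N, K))
    Y[np.arange(N), g] = 1.0                  # N x K indicator matrix
    Xb = np.hstack([np.ones((N, 1)), X])      # prepend a column of ones (bias)
    B = np.linalg.solve(Xb.T @ Xb, Xb.T @ Y)  # (p+1) x K coefficient matrix
    return B

def classify(B, x):
    f = np.hstack([1.0, x]) @ B               # f_k(x) for every class
    return int(np.argmax(f))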

50 How is this useful? Classify: $\hat{G}(X) = \arg\max_{k \in \{1,\ldots,K\}} f_k(X)$, where $f_k(X)$ uses linear regression Interpretation: $f_k(X)$ resembles Pr(G(X)=k | X) Due to E[indicator] = Probability In general, as N increases will add more basis expansions (e.g., polynomial terms) and run regression on those

51 Limitation Classes can be masked by others Here we have K=3 classes, but only 2 are recognized In general an order-(K-1) polynomial corrects this

52 Linear Discriminant Analysis [Hastie et al., Section 4.3] Keywords: linear discriminant analysis, quadratic discriminant analysis

53 Bayesian approach to classification Consider $f_k(X)$ being posterior densities, $f_k(X) = f(X|k)$ Class k has probability $\pi_k \ge 0$ ($\sum_k \pi_k = 1$) Bayes theorem: $\Pr(k|X) = \frac{\pi_k f(X|k)}{\sum_{k'=1}^{K}\pi_{k'} f(X|k')} = \frac{\pi_k f_k(X)}{\sum_{k'=1}^{K}\pi_{k'} f_{k'}(X)} = \frac{\Pr(X,k)}{\sum_{k'=1}^{K}\Pr(X,k')}$ Need a statistical model for the posterior densities

54 Example (Question 3, practice midterm 2016) Two classes with pdfs $f_{\mathrm{blue}} = 0.5\,\mathcal{N}(-1,1) + 0.5\,\mathcal{N}(2,1)$ and $f_{\mathrm{red}} = \mathcal{N}(0,1)$; Pr(blue) = Pr(red) = 0.5 Means of the Gaussian components differ Variances of all Gaussian components are 1 For the blue class, 0.5 = probabilities of the Gaussian components Let's derive the Bayesian classification approach Will work it out; detailed solution in supplement

55 What posterior model to use? IF we know the true $f_k(X) = f(X|k)$, we're done BIG IF Possible models: Gaussian densities (linear discriminant analysis) Gaussian mixtures → nonlinear decision boundaries Nonparametric densities Naïve Bayes (separable pdf)

56 Multivariate Gaussian posterior Choose the multivariate Gaussian density $f_k(X) = \frac{1}{(2\pi)^{p/2}|\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k)\right)$ Cluster mean $\mu_k$ Covariance matrix $\Sigma_k$ Entry ij captures the covariance between elements i & j Assume the same covariance matrix for all classes, $\Sigma_k = \Sigma$ Will revisit this assumption soon

57 Interpretation Each class has a Gaussian centered around $\mu_k$ Elliptical level set (contour) for each Gaussian component Structure of the ellipse depends on Σ

58 Linear discriminant analysis (LDA) Can show $\log\frac{\Pr(G=k|X)}{\Pr(G=l|X)} = \log\frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k + \mu_l)^T\Sigma^{-1}(\mu_k - \mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l)$ → linear decision boundary Classification: linear discriminant function $\delta_k(X) = \log\pi_k + x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k$ Choose the k that maximizes $\delta_k(X)$ In practice, will learn $\Sigma$, $\mu_k$, $\pi_k$ Closely related to the linear regression approach
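
A sketch of the discriminant computation, with Σ, the μ_k, and the π_k assumed to be already estimated from training data (names illustrative):

import numpy as np

def lda_predict(X, pi, mu, Sigma):
    # delta_k(x) = log(pi_k) + x^T Sigma^{-1} mu_k - 0.5 * mu_k^T Sigma^{-1} mu_k
    Sinv = np.linalg.inv(Sigma)
    deltas = np.stack(
        [np.log(p) + X @ Sinv @ m - 0.5 * m @ Sinv @ m for p, m in zip(pi, mu)],
        axis=1,
    )
    return deltas.argmax(axis=1)   # choose the class k maximizing delta_k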

59 Quadratic discriminant analysis (QDA) What if the covariance matrices $\Sigma_k$ are different? Each class has a different elliptical level set Decision boundaries become quadratic QDA requires more parameters than LDA → needs a somewhat larger N to work well In general, LDA and QDA have good performance Data seldom support complicated decision rules LDA and QDA yield robust/stable boundaries

60 Logistic Regression [Hastie et al., Section 4.4] Keywords: logistic regression

61 Motivation Want to model the posteriors linearly: $\log\frac{\Pr(G=l|X)}{\Pr(G=K|X)} = \beta_{l0} + \beta_l^TX$, class $l = 1,\ldots,K-1$ Can show: $\Pr(G=l|X) = \frac{\exp(\beta_{l0} + \beta_l^TX)}{1 + \sum_{m=1}^{K-1}\exp(\beta_{m0} + \beta_m^TX)}$ and $\Pr(G=K|X) = \frac{1}{1 + \sum_{m=1}^{K-1}\exp(\beta_{m0} + \beta_m^TX)}$
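
Evaluating these posteriors for given parameters; a sketch in which B0 holds the K-1 intercepts and B the K-1 coefficient rows (fitting the parameters, the hard part, is not shown):

import numpy as np

def logistic_posteriors(x, B0, B):
    logits = B0 + B @ x                          # beta_{l0} + beta_l^T x, l = 1..K-1
    num = np.exp(logits)
    denom = 1.0 + num.sum()
    return np.append(num / denom, 1.0 / denom)   # last entry is Pr(G = K | x)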

62 What s the hard part? Hard part is learning the parameters Book describes some details Left for possible final project 62

63 Basis Expansions [Hastie et al., Section ] Keywords: basis expansion, regularization, splines, Taylor approximation

64 Why linear models? So far have discussed linear models quite a bit Linear model ≈ first-order Taylor approximation Hard to fit more complex functions when N is small or P is large But real-world functions/data aren't linear

65 Augmented models Begin with a vector of inputs $(X_1,\ldots,X_P)$ Augment/replace the inputs with transformations of X Transformation $h_m(X): \mathbb{R}^P \to \mathbb{R}$ Linear basis expansion: $f(X) = \sum_{m=1}^{M}\beta_m h_m(X)$ Will fit parameters $\beta_m$

66 Example basis expansions 1. $h_m(X) = X_m$ → original linear model 2. $h_m(X) = X_j^2$ or $X_jX_k$ → second (or higher) order Taylor Beware growth in the number of terms (need to regularize) 3. Piecewise polynomials/splines Control complexity using variable selection or regularization (see the sketch below)
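
A small sketch of a second-order expansion (item 2 above): augment the inputs with squares and pairwise products before running ordinary linear regression (function name illustrative):

import numpy as np

def expand_quadratic(X):
    N, p = X.shape
    cols = [np.ones(N)] + [X[:, j] for j in range(p)]
    # add all squares and pairwise products X_j * X_k, j <= k
    cols += [X[:, j] * X[:, k] for j in range(p) for k in range(j, p)]
    return np.column_stack(cols)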

67 Why piecewise polynomials? Global fit fluctuates wildly Piecewise fits for local structure Enforce continuity Constraints reduce degrees of freedom

68 Common approach: Cubic splines Fits well within the region Fluctuates even more outside it Also have multi-dimensional splines, wavelets

69 Real world application Recall the financial example X contains financial data Prices, firm profits, sales, ... A linear model for future prices may be restrictive Augment X with basis functions → possibly a better fit

70 Kernel Methods [Hastie et al., Section ] Keywords: kernel method

71 Recall nearest neighbors $\hat{f}(X) = \mathrm{Average}(Y_i \mid X_i \in N_K(X))$ $N_K(X)$ is the set of K nearest neighbors Yellow rectangle: height represents the weights (1/K) Points leave/enter $N_K(X)$ → NN estimator is bumpy

72 Key idea Maybe give different weights to the $X_i \in N_K(X)$? Assign greater confidence (weight) to nearer points Taper away for distant points $\hat{f}(X_0) = \frac{\sum_{n=1}^{N} K_\lambda(X_0, X_n)\,Y_n}{\sum_{n=1}^{N} K_\lambda(X_0, X_n)}$, with kernel $K_\lambda(X_0, X) = D\!\left(\frac{|X - X_0|}{\lambda}\right)$ Width parameter λ Can choose it automatically
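
A Nadaraya-Watson style sketch with a Gaussian choice for the kernel D (one-dimensional inputs assumed; λ is the width parameter):

import numpy as np

def kernel_smooth(x0, x_train, y_train, lam=0.5):
    w = np.exp(-0.5 * ((x_train - x0) / lam) ** 2)   # K_lambda(x0, x_n)
    return np.sum(w * y_train) / np.sum(w)           # weighted average at x0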

73 Nearest neighbors (NN) for regression In the past we discussed nearest neighbors (NN) for classification using majority/plurality votes Can also be used for regression Perform locally-linear fit

74 Support Vector Machines [Hastie et al., Sections ] Keywords: separating hyperplane, support vector machines

75 From linear to nonlinear classifiers Main idea: find the hyperplane that best separates two classes of data Can seldom separate all the data → allow errors via slack variables Select points locally (w/kernel?) Nonlinear mapping to a higher dimension

76 Simple case How do we compute a perfectly separating hyperplane? Hyperplane = $\{X: f(X) = X^T\beta + \beta_0 = 0\}$, $\beta_0 \in \mathbb{R}$, $\beta \in \mathbb{R}^P$ Classifier $G(X) = \mathrm{sign}\{X^T\beta + \beta_0\}$ Perfect classification: $Y_n f(X_n) > 0\ \forall n$ Best hyperplane maximizes the margin M: $\max_{\beta,\beta_0,\|\beta\|=1} M$ subject to $Y_n(X_n^T\beta + \beta_0) \ge M\ \forall n$ Rewrite: $\min_{\beta,\beta_0} \|\beta\|$ subject to $Y_n(X_n^T\beta + \beta_0) \ge 1\ \forall n$

77 Slack variables A perfectly separating hyperplane may not exist Require $Y_n(X_n^T\beta + \beta_0) \ge M(1 - \xi_n)\ \forall n$ Slack variables $\{\xi_n\}$ should be $\ge 0$ and have a small $\ell_1$ norm Details in book

78 SVM + non-linear mappings A perfectly separating hyperplane may exist in a high-dimensional space Map $\mathbb{R}^P$ to a high-dimensional space Kernels, splines, polynomials Find the hyperplane w/best margin in the transformed space (see the sketch below)
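
A sketch using scikit-learn's SVC (assumed available): an RBF kernel performs the nonlinear mapping implicitly, C controls the slack penalty, and the synthetic data below are not separable by any hyperplane in the original R^2:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # circular class boundary

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("training accuracy:", clf.score(X, y))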
