COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection


1 COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof Based on slides by: Jackie Chi Kit Cheung Class web page: Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.

2 Stacking and overfitting To see the difference between the strategies for using the cross-validation set, I worked out an example here: 2

3 Instruction team Last lecture by me! Second half of the course taught by Ryan Lowe. I'll still have office hours until the end of February. For questions regarding the first half of the course, try to come by before then! 3

4 Find good non-linear expansions Outside of language data, we have mainly looked at polynomial expansions, e.g. polynomial expansion up to second order: φ(x_1, x_2) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2)^T When is this a good idea? What are the alternatives? 4
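As a concrete illustration, here is a minimal NumPy sketch of this second-order expansion for a 2-D input (the function name and usage are my own, not from the slides):

import numpy as np

def poly2_expand(x):
    # Second-order polynomial expansion of x = (x1, x2):
    # returns (1, x1, x2, x1^2, x2^2, x1*x2)
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

print(poly2_expand(np.array([2.0, 3.0])))  # [1. 2. 3. 4. 9. 6.]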

5 Find good non-linear expansions Sometimes, we can find good features by prior knowledge about the type of problem E.g. language features Robot arm example (figure): y = l cos(x), use φ(x) = cos(x) to find l 5

6 Find good non-linear expansions Sometimes, we can find good features by prior knowledge about the type of problem Other types of knowledge: Is there a periodic aspect to the data (e.g. temperature over days or years)? -> use cosine or sine with the proper period Are there special points in the function, e.g. a discontinuity at 0? -> add a binary feature to indicate when it happens Do we know certain factors to be independent? -> we can avoid adding cross-terms for these features 6

7 Find good non-linear expansions Sometimes, we can find good features by analyzing the data set [scatter plot of the data in the (x_1, x_2) plane] 7

8 Find good non-linear expansions Sometimes, we can find good features by analyzing the data set [scatter plot of the data in the (x_1, x_2) plane] It seems distance from the origin is a good feature; we could use d = sqrt(x_1^2 + x_2^2), or x_1^2 and x_2^2 8

9 Find good non-linear expansions Sometimes, we can find good features by analyzing the data set It seems distance from the origin is a good feature; we could use d = sqrt(x_1^2 + x_2^2), or x_1^2 and x_2^2 9

10 Consideration for generic expansions With enough terms, a polynomial can fit any function (generic) Polynomial features and fitted function (6 features): φ_i(x) = x^i Assumes the function is relatively flat around 0 Tends to extrapolate wildly away from 0: can we do better? 10

11 Consideration for generic expansions Alternative: Fourier-based transform (Konidaris et al., 2011) Scale x between 0 and 1, then φ_i(x) = cos(iπx) Behaves better outside of the data! But requires the range of x Easy to model periodic behavior (scale between -1 and 1) 11

12 Consideration for generic expansions Alternative: Radial basis functions Pick representative points x̄_i, then φ_i(x) = exp(-(x - x̄_i)^2 / (2σ^2)) Again well-behaved outside of the data! Flexible, but requires knowing the range of x and setting σ 12

13 Consideration for generic expansions Side-by-side comparison: each uses 6 features [plots: Polynomial, Fourier, RBF] All three can model any function given enough bases Polynomial does relatively poorly Alternatives require knowing / normalizing the range of x 13
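To make the comparison concrete, here is a small sketch (my own, not from the slides) of the three generic 1-D bases, each producing 6 features; x is assumed to be pre-scaled to [0, 1] for the Fourier and RBF bases, and the RBF centers and width are arbitrary choices:

import numpy as np

def poly_features(x, k=6):
    # phi_i(x) = x^i for i = 0..k-1
    return np.array([x**i for i in range(k)])

def fourier_features(x, k=6):
    # Fourier basis in the style of Konidaris et al. (2011): phi_i(x) = cos(i*pi*x)
    return np.array([np.cos(i * np.pi * x) for i in range(k)])

def rbf_features(x, k=6, sigma=0.2):
    # Gaussian bumps centered at representative points spread over [0, 1]
    centers = np.linspace(0.0, 1.0, k)
    return np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))

x = 0.3
print(poly_features(x), fourier_features(x), rbf_features(x), sep="\n")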

14 Main types of machine learning problems Supervised learning Classification Regression Unsupervised learning Dimensionality reduction Reinforcement learning 14

15 Applications with lots of features Any kind of task involving images or videos - object recognition, face recognition. Lots of pixels! Classifying from gene expression data. Lots of different genes! Number of data examples: 100 Number of variables: 6000 to 60,000 Natural language processing tasks. Lots of possible words! Tasks where we constructed many features using e.g. nonlinear expansion (e.g. polynomial) 15

16 Dimensionality reduction Thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier to understand learning machines. [diagram: n×m data matrix X reduced to m' columns] slide by Isabelle Guyon 16

17 Dimensionality reduction Principal Component Analysis (PCA) Combine existing features into a smaller set of features Also called (Truncated) Singular Value Decomposition, or Latent Semantic Indexing in NLP Variable Ranking Think of features as random variables Find how strongly they are associated with the output prediction, remove the ones that are not highly associated, either before training or during training Representation learning techniques like word2vec 17

18 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. 18

19 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. Consider a linear mapping, x_i -> Wx_i W is the compression matrix with dimension R^{m'×m}. Assume there is a decompression matrix U with dimension R^{m×m'}. 19

20 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. Consider a linear mapping, x_i -> Wx_i W is the compression matrix with dimension R^{m'×m}. Assume there is a decompression matrix U with dimension R^{m×m'}. Solve the following problem: argmin_{W,U} Σ_{i=1:n} ||x_i - UWx_i||^2 20

21 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. Consider a linear mapping, x_i -> Wx_i W is the compression matrix with dimension R^{m'×m}. Assume there is a decompression matrix U with dimension R^{m×m'}. Solve the following problem: argmin_{W,U} Σ_{i=1:n} ||x_i - UWx_i||^2 Select the projection dimension, m', using cross-validation. Typically center the examples before applying PCA (subtract the mean). 21

22 Principal Component Analysis (PCA) This problem has a closed-form solution: the rows of W are the m' eigenvectors corresponding to the largest eigenvalues of the covariance matrix of X (for centered data the sample covariance is simply X^T X) 22
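A minimal NumPy sketch of this closed-form solution (centering included; function and variable names are mine, not the course code):

import numpy as np

def pca_compress(X, m_prime):
    # Rows of X are examples. Returns the compression matrix W (m_prime x m)
    # whose rows are the eigenvectors of X^T X with the largest eigenvalues,
    # plus the centered data.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m_prime]
    W = eigvecs[:, order].T                        # top eigenvectors as rows
    return W, Xc

X = np.random.randn(100, 5) @ np.random.randn(5, 5)   # toy correlated data
W, Xc = pca_compress(X, m_prime=2)
Z = Xc @ W.T      # compressed representation (W x_i for each example)
X_rec = Z @ W     # reconstruction UWx_i with U = W^T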

23 Principal Component Analysis (PCA) Eigenvector equation: λw = Σw 23

24 Principal Component Analysis (PCA) [diagram: vectors u, v, w and their images Σu, Σv, Σw] Eigenvector equation: λw = Σw 24

25 Principal Component Analysis (PCA) [diagram: eigenvectors w_1, w_2 of the data cloud] Eigenvector equation: λw = Σw 25

26 Principal Component Analysis (PCA) [diagram: eigenvectors w_1, w_2] W specifies a new basis 26

27 Principal Component Analysis (PCA) [diagram: eigenvectors w_1, w_2 with eigenvalues λ_1, λ_2] W specifies a new basis 27

28 Principal Component Analysis (PCA) [diagram: eigenvectors w_1, w_2] Now it is easy to specify the most relevant dimension: Reduce the dimensionality to 1d The found dimension is a combination of x_1 & x_2 28

29 Principal Component Analysis (PCA) [diagram: data projected onto w_1] Now it is easy to specify the most relevant dimension: Reduce the dimensionality to 1d The found dimension is a combination of x_1 & x_2 29

30 Principal Component Analysis (PCA) PCA Calculate full matrix W (eigenvectors of covariance) Take the m' most relevant components (largest eigenvalues) Covariance between variables i and j after transformation? (Xw_i)^T (Xw_j) = w_i^T X^T X w_j (the w_j are eigenvectors, so X^T X w_j = λ_j w_j) = λ_j w_i^T w_j = 0 (i ≠ j), or λ_j (i = j) 30

31 Principal Component Analysis (PCA) PCA Calculate full matrix W (eigenvectors of covariance) Take the m' most relevant components (largest eigenvalues) Covariance between variables i and j after transformation? (Xw_i)^T (Xw_j) = w_i^T X^T X w_j (the w_j are eigenvectors, so X^T X w_j = λ_j w_j) = λ_j w_i^T w_j = 0 (i ≠ j), or λ_j (i = j) 31
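A quick numerical check of this derivation, reusing W, Xc, and Z from the PCA sketch above: the covariance matrix of the transformed variables is diagonal, with the retained eigenvalues on the diagonal.

import numpy as np

cov_Z = Z.T @ Z               # Z = Xc @ W.T, one row per example
print(np.round(cov_Z, 6))     # off-diagonal entries are (numerically) zero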

32 Principal Component Analysis (PCA) Whitening transform: multiply by W, then by Λ^{-1/2} Coloring transform (the inverse): multiply by Λ^{1/2}, then by W^T 32
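A self-contained sketch of these two transforms under the row-eigenvector convention used above (my own naming; the small eps guards against dividing by near-zero eigenvalues):

import numpy as np

def whiten(X, eps=1e-8):
    # Whitening: rotate by W (eigenvector basis), then rescale by Lambda^{-1/2}
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(X))
    W = eigvecs.T                                   # eigenvectors as rows
    Z = Xc @ W.T @ np.diag(1.0 / np.sqrt(eigvals + eps))
    return Z, W, eigvals                            # cov(Z) is (approx.) the identity

def color(Z, W, eigvals):
    # Coloring (the inverse): rescale by Lambda^{1/2}, then rotate back
    return Z @ np.diag(np.sqrt(eigvals)) @ W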

33 Eigenfaces Turk & Pentland (1991) used the PCA method to represent face images. Assume all faces are about the same size Represent each face image as a data vector. Each eigenvector is an image, called an eigenface. [figures: eigenfaces; average image] 33

34 Training images Original images in R^{50×50}. Projection to R^10 and reconstruction 34

35 Training images Original images in R^{50×50}. Projection to R^10 and reconstruction. [scatter plot: projection to R^2, different marks indicate different individuals] 35
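A sketch of this projection-and-reconstruction step for vectorized face images, reusing the pca_compress helper sketched earlier (the data here is random and the shapes are illustrative, not the actual training set):

import numpy as np

faces = np.random.rand(200, 50 * 50)            # each row: a flattened 50x50 image
W, faces_c = pca_compress(faces, m_prime=10)    # top-10 eigenvectors ("eigenfaces")
mean_face = faces.mean(axis=0)

codes = faces_c @ W.T                    # projection to R^10
reconstructed = codes @ W + mean_face    # back to pixel space for display
eigenfaces = W.reshape(10, 50, 50)       # each eigenvector viewed as an image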

36 Feature selection methods The goal: Find the input representation that produces the best generalization error. Two classes of approaches: Wrapper & Filter methods: Feature selection is applied as a preprocessing step. Embedded methods: Feature selection is integrated in the learning (optimization) method, e.g. Regularization 36

37 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons: 37

38 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons: Need to select a scoring function. Must select subset size (m'): cross-validation Simple and fast: just need to compute a scoring function m times and sort m scores. 38

39 Scoring function: Correlation Criteria R(j) = Σ_{i=1:n} (x_ij - x̄_j)(y_i - ȳ) / sqrt( Σ_{i=1:n} (x_ij - x̄_j)^2 · Σ_{i=1:n} (y_i - ȳ)^2 ) slide by Michel Verleysen 39
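A NumPy sketch of this scoring function applied to every column of a data matrix at once (my own naming):

import numpy as np

def correlation_score(X, y):
    # R(j) for each column j of X: Pearson correlation with the target y
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

# Rank features by |R(j)| and keep the highest-ranked ones, e.g.:
# ranking = np.argsort(-np.abs(correlation_score(X, y)))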

40 Scoring function: Mutual information Think of X_j and Y as random variables. Mutual information between variable X_j and target Y: I(j) = ∫∫ p(x_j, y) log[ p(x_j, y) / (p(x_j) p(y)) ] dx_j dy Empirical estimate from data (assume discretized variables): I(j) = Σ_{x_j} Σ_{y} P(X_j = x_j, Y = y) log[ P(X_j = x_j, Y = y) / (P(X_j = x_j) P(Y = y)) ] 40
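A sketch of the empirical (plug-in) estimate for one feature, after discretizing both variables into bins (the binning choice is mine, not from the slides):

import numpy as np

def mutual_information(xj, y, bins=10):
    # Estimate I(X_j; Y) from samples via a 2-D histogram
    joint, _, _ = np.histogram2d(xj, y, bins=bins)
    p_xy = joint / joint.sum()                    # joint P(X_j, Y)
    p_x = p_xy.sum(axis=1, keepdims=True)         # marginal P(X_j)
    p_y = p_xy.sum(axis=0, keepdims=True)         # marginal P(Y)
    nz = p_xy > 0                                 # skip zero cells to avoid log(0)
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz]))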

41 Nonlinear dependencies with MI Mutual information identifies nonlinear relationships between variables. Example: x uniformly distributed over [-1, 1] y = x^2 + noise z uniformly distributed over [-1, 1] z and x are independent slide by Michel Verleysen 41
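Running the mutual_information sketch above on this example gives a rough numerical illustration (not the slide's original figure): the estimate for (x, y) is clearly positive even though their linear correlation is near zero, while the estimate for (x, z) is close to zero up to binning noise.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x ** 2 + 0.1 * rng.standard_normal(10_000)   # nonlinear dependence on x
z = rng.uniform(-1, 1, 10_000)                   # independent of x

print(mutual_information(x, y))   # substantially > 0
print(mutual_information(x, z))   # close to 0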

42 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons? 42

43 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons? Need to select a scoring function. Must select subset size (m'): cross-validation Simple and fast: just need to compute a scoring function m times and sort m scores. Scoring function is defined for individual features (not subsets). 43

44 Best-Subset selection Idea: Consider all possible subsets of the features, measure performance on a validation set, and keep the subset with the best performance. Pros / cons? We get the best model! Very expensive to compute, since there is a combinatorial number of subsets. 44

45 Search space of subsets Kohavi-John, 1997 n features, 2^n - 1 possible feature subsets! slide by Isabelle Guyon 45

46 Subset selection in practice Formulate as a search problem, where the state is the feature set that is used, and search operators involve adding or removing features Constructive methods like forward/backward search Local search methods, genetic algorithms Use domain knowledge to help you group features together, to reduce the size of the search space e.g., In NLP, group syntactic features together, semantic features, etc. 46
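A sketch of greedy forward search for this formulation; the score function (e.g. validation accuracy of a model retrained on the candidate subset) is left abstract and must also handle the empty subset, and all names are mine:

def forward_selection(features, score, max_size):
    # Greedily add the single feature that most improves score(subset)
    selected, remaining = [], list(features)
    while remaining and len(selected) < max_size:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break                      # no candidate improves the score: stop
        selected.append(best)
        remaining.remove(best)
    return selected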

47 Steps to solving a supervised learning problem 1. Decide what the input-output pairs are. 2. Decide how to encode inputs and outputs. This defines the input space X and output space Y. 3. Choose a class of hypotheses / representations H. E.g. linear functions. 4. Choose an error function (cost function) to define the best hypothesis. E.g. least-mean squares. 5. Choose an algorithm for searching through the space of hypotheses. [diagram annotations: evaluate on validation set; evaluate on test set] If doing k-fold cross-validation, re-do feature selection for each fold. 47

48 Regularization Idea: Modify the objective function to constrain the model choice. Typically by adding a term (Σ_{j=1:m} |w_j|^p)^{1/p}. Linear regression -> Ridge regression, Lasso Challenge: Need to adapt the optimization procedure (e.g. handle non-convex objective). This approach is often used for very large natural (non-constructed) feature sets, e.g. images, speech, text, video. With lasso (p=1), some weights tend to become exactly 0, so it can be interpreted as feature selection 48
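A sketch of embedded selection with the lasso, using scikit-learn's Lasso (the data is synthetic and the regularization strength alpha=0.1 is a hypothetical choice that would normally be tuned by cross-validation):

import numpy as np
from sklearn.linear_model import Lasso

X = np.random.randn(200, 50)
y = X[:, 0] - 2 * X[:, 3] + 0.1 * np.random.randn(200)   # only features 0 and 3 matter

model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)   # weights driven exactly to 0 drop out
print(selected)                          # typically [0 3]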

49 Final comments Classic paper on this: I. Guyon, A. Elisseeff, An introduction to Variable and Feature Selection. Journal of Machine Learning Research 3 (2003), pp. 1157-1182 (and references therein). More recently, there has been a move towards learning the features end-to-end, using neural network architectures (more on this next week). 49

50 What you should know Properties of some families of generic features Basic idea and algorithm of principal component analysis Mutual information and correlation Feature selection approaches: Wrapper & Filter, Embedded 50
