COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
1 COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof Based on slides by:, Jackie Chi Kit Cheung Class web page: Unless otherwise noted, all material posted for this course is copyright of the instructor and cannot be reused or reposted without the instructor's written permission.
2 Stacking and overfitting To see the difference between the strategies for using the cross-validation set, I worked out an example here: 2
3 Instruction team Last lecture by me! Second half of the course taught by Ryan Lowe I'll still have office hours until the end of February For questions regarding the first half of the course, try to come by before then! 3
4 Find good non-linear expansions Outside of language data, we have mainly looked at polynomial expansions, e.g. polynomial expansion up to second order: φ(x1, x2) = (1, x1, x2, x1^2, x2^2, x1*x2)^T When is this a good idea? What are the alternatives? 4
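For concreteness, the second-order expansion above can be written as a minimal numpy sketch (the function name is mine, not from the slides):

```python
import numpy as np

def poly2_features(x1, x2):
    """Second-order polynomial expansion of a 2-d input:
    phi(x1, x2) = (1, x1, x2, x1^2, x2^2, x1*x2)."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

phi = poly2_features(2.0, 3.0)   # -> [1, 2, 3, 4, 9, 6]
```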
5 Find good non-linear expansions Sometimes, we can find good features by prior knowledge about the type of problem E.g. language features Robot arm: y = l cos(x); use φ(x) = cos(x) to find l 5
6 Find good non-linear expansions Sometimes, we can find good features by prior knowledge about the type of problem Other types of knowledge: Is there a periodic aspect to the data (e.g. temperature over days or years)? -> use cosine or sine with the proper period Are there special points in the function? E.g. a discontinuity at 0? -> add a binary feature to indicate when it happens Do we know certain factors to be independent? -> we can avoid adding cross-terms for these features 6
7 Find good non-linear expansions Sometimes, we can find good features by analyzing the data set [scatter plot of the data, x2 vs. x1] 7
8 Find good non-linear expansions Sometimes, we can find good features by analyzing the data set [scatter plot of the data, x2 vs. x1] It seems distance from the origin is a good feature; we could use d = sqrt(x1^2 + x2^2), or x1^2 and x2^2 8
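This engineered feature is a one-liner (a sketch; the function name is mine):

```python
import numpy as np

def distance_feature(X):
    """Engineered feature d = sqrt(x1^2 + x2^2): distance from the
    origin, which can make concentric classes linearly separable."""
    return np.sqrt((X ** 2).sum(axis=1))

d = distance_feature(np.array([[0.0, 1.0], [3.0, 4.0]]))   # -> [1.0, 5.0]
```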
10 Consideration for generic expansions With enough terms, a polynomial can fit any function (generic) Polynomial features and fitted function (6 features): φ_i(x) = x^i Assumes the function is relatively flat around 0 Tends to extrapolate wildly away from 0: can we do better? 10
11 Consideration for generic expansions Alternative: Fourier-based transform (Konidaris et al., 2011) Scale x between 0 and 1, then φ_i(x) = cos(iπx) Behaves better outside of the data! But requires the range of x Easy to model periodic behavior (scale between -1 and 1) 11
12 Consideration for generic expansions Alternative: Radial basis functions Pick representative points x_i, φ_i(x) = exp(-(x - x_i)^2 / (2σ^2)) Again well-behaved outside of the data! Flexible, but requires knowing the range of x and setting σ 12
13 Consideration for generic expansions Side by side comparison: each uses 6 features Polynomial Fourier RBF All three can model any function given enough bases Polynomial does relatively poorly Alternatives require knowing / normalizing range of x 13
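The three generic bases compared above can be sketched as follows (a minimal illustration; the [0, 1] scaling, the choice of centers, and σ for the RBFs are assumptions of mine, not values from the slides):

```python
import numpy as np

def poly_basis(x, k):
    """phi_i(x) = x^i for i = 0..k-1."""
    return np.array([x ** i for i in range(k)])

def fourier_basis(x, k):
    """phi_i(x) = cos(i * pi * x); assumes x is scaled into [0, 1]."""
    return np.array([np.cos(i * np.pi * x) for i in range(k)])

def rbf_basis(x, centers, sigma):
    """phi_i(x) = exp(-(x - c_i)^2 / (2 sigma^2)) for chosen centers c_i."""
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))

x = 0.5
p = poly_basis(x, 3)                                # [1, 0.5, 0.25]
f = fourier_basis(x, 3)                             # [1, ~0, -1]
r = rbf_basis(x, np.array([0.0, 0.5, 1.0]), 0.25)   # [e^-2, 1, e^-2]
```

Note how the polynomial features grow without bound away from the data, while the Fourier and RBF features stay bounded, matching the extrapolation behavior discussed above.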
14 Main types of machine learning problems Supervised learning Classification Regression Unsupervised learning Dimensionality reduction Reinforcement learning 14
15 Applications with lots of features Any kind of task involving images or videos - object recognition, face recognition. Lots of pixels! Classifying from gene expression data. Lots of different genes! Number of data examples: 100 Number of variables: 6000 to 60,000 Natural language processing tasks. Lots of possible words! Tasks where we constructed many features using e.g. nonlinear expansion (e.g. polynomial) 15
16 Dimensionality reduction Thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier-to-understand learning machines. [figure: n x m data matrix X reduced to n x m'] (slide by Isabelle Guyon) 16
17 Dimensionality reduction Principal Component Analysis (PCA) Combine existing features into a smaller set of features Also called (Truncated) Singular Value Decomposition, or Latent Semantic Indexing in NLP Variable Ranking Think of features as random variables Find how strongly they are associated with the output prediction; remove the ones that are not highly associated, either before training or during training Representation learning techniques like word2vec 17
18 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. 18
19 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. Consider a linear mapping, x_i -> Wx_i W is the compression matrix with dimension R^(m' x m). Assume there is a decompression matrix U with dimension R^(m x m'). 19
20 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. Consider a linear mapping, x_i -> Wx_i W is the compression matrix with dimension R^(m' x m). Assume there is a decompression matrix U with dimension R^(m x m'). Solve the following problem: argmin_{W,U} Σ_{i=1:n} ||x_i - UWx_i||^2 20
21 Principal Component Analysis (PCA) Idea: Project data into a lower-dimensional sub-space, R^m -> R^m', where m' < m. Consider a linear mapping, x_i -> Wx_i W is the compression matrix with dimension R^(m' x m). Assume there is a decompression matrix U with dimension R^(m x m'). Solve the following problem: argmin_{W,U} Σ_{i=1:n} ||x_i - UWx_i||^2 Select the projection dimension, m', using cross-validation. Typically center the examples before applying PCA (subtract the mean). 21
22 Principal Component Analysis (PCA) This problem has a closed-form solution: W are the m' eigenvectors corresponding to the largest eigenvalues of the covariance matrix of X (for centered data the sample covariance is simply X^T X) 22
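The closed-form recipe above can be sketched in a few lines of numpy (a minimal implementation on toy data; all variable names and the toy data are mine):

```python
import numpy as np

def pca(X, m_prime):
    """PCA via eigendecomposition of the sample covariance.
    Returns the compression matrix W (m' x m), whose rows are the
    eigenvectors with the largest eigenvalues, and the centered data.
    U = W.T serves as the decompression matrix."""
    Xc = X - X.mean(axis=0)                 # center the examples
    cov = Xc.T @ Xc / len(Xc)               # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m_prime]
    return eigvecs[:, top].T, Xc

# Toy data: 2-d points scattered tightly around the line x2 = 2*x1
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

W, Xc = pca(X, 1)
Z = Xc @ W.T        # compressed representation (n x 1)
X_hat = Z @ W       # reconstruction UWx_i
```

On this toy set the leading eigenvector nearly aligns with the (1, 2) direction, and the reconstruction error is just the small residual orthogonal to it.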
23 Principal Component Analysis (PCA) Eigenvector equation: λw= Σw 23
24 Principal Component Analysis (PCA) [figure: vectors u, v, w and their images Σu, Σv, Σw; the eigenvector keeps its direction] Eigenvector equation: λw = Σw 24
25 Principal Component Analysis (PCA) [figure: data cloud with eigenvectors w1 and w2] Eigenvector equation: λw = Σw 25
26 Principal Component Analysis (PCA) [figure: eigenvectors w1 and w2] W specifies a new basis 26
27 Principal Component Analysis (PCA) [figure: eigenvectors w1 and w2 with eigenvalues λ1 and λ2] W specifies a new basis 27
28 Principal Component Analysis (PCA) [figure: data with w1 and w2] Now it is easy to specify the most relevant dimension: Reduce the dimensionality to 1d The found dimension is a combination of x1 & x2 28
29 Principal Component Analysis (PCA) [figure: data projected onto w1] Now it is easy to specify the most relevant dimension: Reduce the dimensionality to 1d The found dimension is a combination of x1 & x2 29
30 Principal Component Analysis (PCA) PCA: Calculate full matrix W (eigenvectors of covariance) Take the m' most relevant components (largest eigenvalues) Covariance between variables i and j after transformation? (Xw_i)^T (Xw_j) = w_i^T X^T X w_j (the w_j are eigenvectors, so λ_j w_j = X^T X w_j) = λ_j w_i^T w_j = 0 (i ≠ j), λ_j (i = j) 30
32 Principal Component Analysis (PCA) Coloring transform: multiply by W^T, multiply by Λ^(1/2), multiply by W Whitening transform: multiply by Λ^(-1/2) 32
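One common way to implement the whitening step is to project onto the eigenvectors and rescale each direction by Λ^(-1/2) (a sketch under that convention; the names and toy data are mine):

```python
import numpy as np

def whiten(X, eps=1e-12):
    """Whitening: center, project onto the eigenvectors of the
    covariance, and rescale each direction by 1/sqrt(eigenvalue),
    so the result has (approximately) identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    lam, W = np.linalg.eigh(cov)        # cov = W diag(lam) W^T
    return Xc @ W / np.sqrt(lam + eps)  # broadcasts Lambda^{-1/2}

# Correlated toy data, then whiten it
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
Z = whiten(X)
C = Z.T @ Z / len(Z)    # ~ identity covariance after whitening
```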
33 Eigenfaces Turk & Pentland (1991) used the PCA method to represent face images. Assume all faces are about the same size Represent each face image as a data vector. Each eigenvector is an image, called an eigenface. [figure: eigenfaces and the average image] 33
34 Training images Original images in R^(50x50). Projection to R^10 and reconstruction 34
35 Training images Original images in R^(50x50). Projection to R^10 and reconstruction Projection to R^2: [scatter plot; different marks (o, x, *) indicate different individuals] 35
36 Feature selection methods The goal: Find the input representation that produces the best generalization error. Two classes of approaches: Wrapper & Filter methods: Feature selection is applied as a preprocessing step. Embedded methods: Feature selection is integrated in the learning (optimization) method, e.g. Regularization 36
37 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons: 37
38 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons: Need to select a scoring function. Must select the subset size (m'): cross-validation Simple and fast: just need to compute a scoring function m times and sort the m scores. 38
39 Scoring function: Correlation Criteria R(j) = Σ_{i=1:n} (x_ij - x̄_j)(y_i - ȳ) / sqrt( Σ_{i=1:n} (x_ij - x̄_j)^2 · Σ_{i=1:n} (y_i - ȳ)^2 ) (slide by Michel Verleysen) 39
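A vectorized numpy sketch of this score, computing the absolute Pearson correlation of each feature column with the target (function name and toy data are mine):

```python
import numpy as np

def corr_score(X, y):
    """|Pearson correlation| of each feature column with the target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)

# Three random features; only feature 0 actually drives the target
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=100)
scores = corr_score(X, y)   # feature 0 scores near 1, the others near 0
```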
40 Scoring function: Mutual information Think of X_j and Y as random variables. Mutual information between variable X_j and target Y: I(j) = ∫_{X_j} ∫_Y p(x_j, y) log[ p(x_j, y) / (p(x_j) p(y)) ] dx dy Empirical estimate from data (assume discretized variables): I(j) = Σ_{x_j} Σ_y P(X_j = x_j, Y = y) log[ P(X_j = x_j, Y = y) / (P(X_j = x_j) P(Y = y)) ] 40
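The empirical estimate above can be sketched for discrete variables as a plug-in, count-based estimator (names are mine; natural log, so the result is in nats):

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) for discrete variables:
    sum over values of P(x, y) * log( P(x, y) / (P(x) P(y)) )."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(xi, yi):
        joint[i, j] += 1            # co-occurrence counts
    joint /= joint.sum()            # joint probabilities
    px = joint.sum(axis=1, keepdims=True)   # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(y)
    nz = joint > 0                          # avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

x = np.array([0, 0, 1, 1])
mi_dependent = mutual_information(x, x)                         # = log 2
mi_independent = mutual_information(x, np.array([0, 1, 0, 1]))  # = 0
```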
41 Nonlinear dependencies with MI Mutual information identifies nonlinear relationships between variables. Example: x uniformly distributed over [-1, 1] y = x^2 + noise z uniformly distributed over [-1, 1] z and x are independent (slide by Michel Verleysen) 41
42 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons? 42
43 Variable Ranking Idea: Rank features by a scoring function defined for individual features, independently of the context of others. Choose the m highest ranked features. Pros / cons? Need to select a scoring function. Must select the subset size (m'): cross-validation Simple and fast: just need to compute a scoring function m times and sort the m scores. Scoring function is defined for individual features (not subsets). 43
44 Best-Subset selection Idea: Consider all possible subsets of the features, measure performance on a validation set, and keep the subset with the best performance. Pros / cons? We get the best model! Very expensive to compute, since there is a combinatorial number of subsets. 44
45 Search space of subsets Kohavi-John, 1997 n features, 2^n - 1 possible feature subsets! (slide by Isabelle Guyon) 45
46 Subset selection in practice Formulate as a search problem, where the state is the feature set that is used, and search operators involve adding or removing features Constructive methods like forward/backward search Local search methods, genetic algorithms Use domain knowledge to help you group features together, to reduce the size of the search space e.g., in NLP, group syntactic features together, semantic features, etc. 46
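Forward search can be sketched as a greedy loop (a minimal illustration; here each candidate subset is scored by the training MSE of a least-squares refit, as a stand-in for the validation-set score one would use in practice, and all names are mine):

```python
import numpy as np

def fit_mse(X, y, subset):
    """Training MSE of a least-squares fit using only the given features."""
    A = X[:, list(subset)]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(((y - A @ w) ** 2).mean())

def forward_search(X, y, k):
    """Greedy forward selection: at each step add the single feature
    that most reduces the error of the refit model."""
    selected, remaining = [], set(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: fit_mse(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Six candidate features; only features 1 and 4 carry signal
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 1] - 3 * X[:, 4] + 0.1 * rng.normal(size=200)
sel = forward_search(X, y, 2)   # recovers features 1 and 4
```

This explores only O(n·k) subsets instead of the 2^n - 1 of best-subset selection, at the cost of possibly missing interactions that only a larger subset would reveal.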
47 Steps to solving a supervised learning problem 1. Decide what the input-output pairs are. 2. Decide how to encode inputs and outputs. This defines the input space X and output space Y. 3. Choose a class of hypotheses / representations H. E.g. linear functions. 4. Choose an error function (cost function) to define the best hypothesis. E.g. least-mean squares. 5. Choose an algorithm for searching through the space of hypotheses. [margin notes: evaluate on validation set; evaluate on test set] If doing k-fold cross-validation, re-do feature selection for each fold. 47
48 Regularization Idea: Modify the objective function to constrain the model choice. Typically by adding a term (Σ_{j=1:m} |w_j|^p)^(1/p). Linear regression -> Ridge regression, Lasso Challenge: Need to adapt the optimization procedure (e.g. handle a non-smooth or non-convex objective). This approach is often used for very large natural (non-constructed) feature sets, e.g. images, speech, text, video. With lasso (p=1), some weights tend to become exactly 0, so it can be interpreted as feature selection 48
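The feature-selecting effect of the lasso can be sketched with ISTA, i.e. proximal gradient descent with soft-thresholding, which is one standard way to optimize the L1-regularized objective (the data, seed, and regularization strength here are illustrative choices of mine):

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=2000):
    """Lasso via ISTA (proximal gradient descent): a gradient step on
    0.5*||Xw - y||^2 followed by soft-thresholding, whose kink at zero
    drives some weights to exactly 0."""
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(n_iter):
        u = w - step * (X.T @ (X @ w - y))   # gradient step
        w = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)
    return w

# Five features; only 0 and 2 carry signal
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X[:, 0] - 2 * X[:, 2] + 0.05 * rng.normal(size=100)
w = lasso_ista(X, y, lam=30.0)
# weights on the irrelevant features 1, 3, 4 are shrunk to (near) zero
```

The nonzero pattern of the returned weights is the selected feature subset, which is exactly the embedded-method view described above.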
49 Final comments Classic paper on this: I. Guyon, A. Elisseeff, An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3 (2003), pp. (and references therein). More recently, a move towards learning the features end-to-end, using neural network architectures (more on this next week). 49
50 What you should know Properties of some families of generic features Basic idea and algorithm of principal component analysis Mutual information and correlation Feature selection approaches: Wrapper & Filter, Embedded 50
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationPrincipal Component Analysis and Linear Discriminant Analysis
Principal Component Analysis and Linear Discriminant Analysis Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/29
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationPattern Recognition 2
Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error
More informationExample: Face Detection
Announcements HW1 returned New attendance policy Face Recognition: Dimensionality Reduction On time: 1 point Five minutes or more late: 0.5 points Absent: 0 points Biometrics CSE 190 Lecture 14 CSE190,
More informationDimensionality Reduction
Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationModeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System 2 Overview
4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System processes System Overview Previous Systems:
More informationFace detection and recognition. Detection Recognition Sally
Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification
More informationNovember 28 th, Carlos Guestrin 1. Lower dimensional projections
PCA Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University November 28 th, 2007 1 Lower dimensional projections Rather than picking a subset of the features, we can new features that are
More informationMulti-Label Informed Latent Semantic Indexing
Multi-Label Informed Latent Semantic Indexing Shipeng Yu 12 Joint work with Kai Yu 1 and Volker Tresp 1 August 2005 1 Siemens Corporate Technology Department of Neural Computation 2 University of Munich
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationProbabilistic & Unsupervised Learning
Probabilistic & Unsupervised Learning Week 2: Latent Variable Models Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationMachine Learning for Software Engineering
Machine Learning for Software Engineering Dimensionality Reduction Prof. Dr.-Ing. Norbert Siegmund Intelligent Software Systems 1 2 Exam Info Scheduled for Tuesday 25 th of July 11-13h (same time as the
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationSpectral Regularization
Spectral Regularization Lorenzo Rosasco 9.520 Class 07 February 27, 2008 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationMachine Learning 11. week
Machine Learning 11. week Feature Extraction-Selection Dimension reduction PCA LDA 1 Feature Extraction Any problem can be solved by machine learning methods in case of that the system must be appropriately
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More informationOnline Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?
Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification
More informationMachine Learning Techniques
Machine Learning Techniques ( 機器學習技法 ) Lecture 13: Deep Learning Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系
More informationRegularization via Spectral Filtering
Regularization via Spectral Filtering Lorenzo Rosasco MIT, 9.520 Class 7 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationVariable Selection in Data Mining Project
Variable Selection Variable Selection in Data Mining Project Gilles Godbout IFT 6266 - Algorithmes d Apprentissage Session Project Dept. Informatique et Recherche Opérationnelle Université de Montréal
More information