STATS 306B: Unsupervised Learning, Spring 2014, Lecture 8, April 23
Lecturer: Lester Mackey
Scribe: Kexin Nie, Nan Bi

8.1 Principal Component Analysis

Last time we introduced the mathematical framework underlying Principal Component Analysis (PCA); next we will consider some of its applications. Please refer to the accompanying slides.

8.1.1 Examples

Example 1. Digit data

(Slide 2:) Here is an example taken from the textbook. This set of handwritten digit images contains 130 threes, and each three is a $16 \times 16$ greyscale image. Hence we may represent each datapoint as a vector of 256 greyscale pixels.

(Slide 3:) The figure on the left shows the first two principal components of these images. The rectangular grid is computed from selected quantiles of the two principal components. The circled points mark the images whose projected coordinates along the two directions are closest to the vertices of the grid. The figure on the right displays the threes corresponding to the circled points. The vertical component appears to capture changes in line thickness / darkness, while the horizontal component appears to capture changes in the length of the bottom of the three.

(Slide 4:) This is a visual representation of the learned two-component PCA model. The first term is the mean of all images, and the following $v_1$ and $v_2$ are the two visualized principal directions (the loadings), which can also be called "eigen-threes".

Example 2. Eigen-faces

(Slide 5:) PCA is widely used in face recognition. Suppose $X \in \mathbb{R}^{d \times n}$ is the pixel-image matrix, where each column is a face image, $d$ is the number of pixels, and $x_{ji}$ is the intensity of the $j$-th pixel in image $i$. The loadings returned by PCA are linear combinations of faces, which can be called eigen-faces. The working assumption is that the PC scores $z_i$, obtained by projecting the original image onto the eigen-face space, give a more meaningful and compact representation of the $i$-th face than the raw pixel representation. The $z_i$ can be used in place of $x_i$ for nearest-neighbor classification.
Since the dimension of face-space has decreased from $d$ to $k$, the computational complexity of a nearest-neighbor query becomes $O(dk + nk)$ instead of $O(dn)$. This is a substantial saving when $n, d \gg k$.

Example 3. Latent semantic analysis

(Slide 6:) Another application of PCA is in text analysis. Let $d$ be the total number of words in the vocabulary; then each document $x_i \in \mathbb{R}^d$ is a vector of word counts, and $x_{ji}$ is the frequency of word $j$ in document $i$. After we apply PCA, the similarity between two documents is measured by $z_i^T z_j$, which is often more informative than the raw measure $x_i^T x_j$. Notice that there may not be significant computational savings, since the original word-document matrix is sparse, while the reduced representation is typically dense.
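To make the use of PC scores in Examples 2 and 3 concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the lecture's own code: datapoints are stored as rows (the eigen-face example stores them as columns), the data are synthetic, and the helper names `pca_fit`, `pca_scores`, and `nearest_neighbor` are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal directions of X (rows are datapoints)."""
    mean = X.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T                      # shapes (d,) and (d, k)

def pca_scores(X, mean, V):
    """Project datapoints onto the principal directions: O(dk) per datapoint."""
    return (X - mean) @ V

def nearest_neighbor(query_scores, train_scores):
    """Index of the closest training score vector for each query: O(nk) per query."""
    diffs = query_scores[:, None, :] - train_scores[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

# Synthetic data lying near a k-dimensional subspace of R^d (an assumption for the demo).
rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))

mean, V = pca_fit(X, k)
Z = pca_scores(X, mean, V)                     # n x k matrix of scores z_i
idx = nearest_neighbor(Z[:10], Z)              # nearest neighbors in score space
similarity = Z @ Z.T                           # entries z_i^T z_j, as in Example 3
```

Each query costs one projection plus $n$ comparisons of $k$-dimensional vectors, which is where the $O(dk + nk)$ count comes from.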
Example 4. Anomaly detection

(Slide 7:) PCA can be used in network anomaly detection. In the time-link matrix $X$, $x_{ji}$ represents the amount of traffic on link $j$ in the network during time-interval $i$. In the two pictures on the left, traffic appears periodic and reasonably deterministic on the selected principal component, which suggests that these two show normal behavior. In contrast, traffic "spikes" in the pictures on the right, which indicates anomalous behavior in this flow.

Example 5. Part-of-speech tagging

(Slide 8:) Unsupervised part-of-speech (POS) tagging is a common task in natural language processing, as manually tagging a large corpus is expensive and time-consuming. Here it is common to model each word in a vocabulary by its context distribution, i.e., $x_{ji}$ is the number of times that word $i$ appears in context $j$. The key idea of unsupervised POS tagging is that words appearing in similar contexts tend to have the same POS tags. Hence, a typical tagging technique is to cluster words according to their contexts. However, in any given corpus, any given context may occur rather infrequently (the vectors $x_i$ are too sparse), so PCA has been used to find a more suitable, comparable representation for each word before clustering is applied.

Example 6. Multi-task learning

(Slide 9:) In multi-task learning, one attempts to solve related learning tasks simultaneously, e.g., classifying documents as relevant or not for $n$ users. Often task $i$ reduces to learning a weight vector $x_i$ which produces, for example, the classification rule. Our goal is to exploit the similarities amongst these tasks to do more effective learning overall. One way to accomplish this is to use PCA to identify a small set of eigen-classifiers among the learned rules $x_1, \dots, x_n$. Then the classifiers can be retrained with an added regularization term encouraging each $x_i$ to lie near the subspace spanned by the principal directions. These two steps of PCA and retraining are iterated until convergence.
In this way, a low-dimensional representation of the classifiers can help to detect the shared structure between independent tasks.

8.1.2 Choosing a number of components

(Slide 10:) As in the clustering setting, we face a model selection question: how do we choose the number of principal components? While there is no agreed-upon solution to this problem, here are some guidelines. The number of principal components might be constrained by the problem goal, by your computational or storage resources, or by the minimum fraction of variance to be explained. For example, it is common to choose 3 or fewer principal components for visualization problems.

Recall that eigenvalue magnitudes determine the explained variance. In the accompanying figure, the first 5 principal components already explain nearly all of the variance, so a small number of principal components may be sufficient (although one must use care in drawing this conclusion, since small differences in reconstruction error may still be semantically significant; consider face recognition, for example). Furthermore, we may apply an elbow criterion or compare the explained variance with that obtained under a reference distribution.
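The "minimum fraction of variance explained" guideline can be sketched in a few lines of NumPy. This is a hedged illustration only: the function names `explained_variance_ratio` and `choose_k`, the 95% and 99% thresholds, and the synthetic data are assumptions made for the example, not part of the lecture.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance explained by each principal component."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(Xc, compute_uv=False)
    eigvals = s ** 2 / (X.shape[0] - 1)
    return eigvals / eigvals.sum()

def choose_k(X, min_fraction=0.95):
    """Smallest k whose leading components explain at least min_fraction of the variance."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, min_fraction)) + 1
    return min(k, cumulative.size)             # guard against round-off at 100%

# Synthetic data: three strong directions plus low-variance noise coordinates.
rng = np.random.default_rng(0)
scales = np.array([10.0, 5.0, 2.0] + [0.1] * 7)
X = rng.normal(size=(500, 10)) * scales

ratios = explained_variance_ratio(X)
k95 = choose_k(X, 0.95)
k99 = choose_k(X, 0.99)
```

Plotting `ratios` (or its cumulative sum) against the component index gives the kind of curve on which one would look for an elbow.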
8.1.3 PCA limitations and extensions

While PCA has a great number of applications, it has its limitations as well:

- Squared Euclidean reconstruction error is not appropriate for all data types. Various extensions, such as exponential family PCA, have been developed for binary, categorical, count, and nonnegative data.
- PCA can only find linear compressions of data. Kernel PCA is an important generalization designed for non-linear dimensionality reduction.

8.2 Non-linear dimensionality reduction with kernel PCA

8.2.1 Intuition

Figure 8.1. Data lying near a linear subspace
Figure 8.2. Data lying near a parabola

Figure 8.1 displays a 2D example in which PCA is effective because the data lie near a linear subspace. However, in Figure 8.2 PCA is ineffective, because the data lie near a parabola. In this case, the PCA compression of the data might project all points onto the orange line, which is far from ideal.

Let us consider the differences between these two settings mathematically.

Linear subspace (Figure 8.1): In this example we have ambient dimension $p = 2$ and component dimension $k = 1$. Since the blue line is a $k$-dimensional linear subspace of $\mathbb{R}^p$, we know that there is some matrix $U \in \mathbb{R}^{p \times k}$ such that the subspace $S$ takes the form

$$S = \{x \in \mathbb{R}^p : x = Uz,\ z \in \mathbb{R}^k\} = \{(x_1, x_2) : x_1 = u_1 z,\ x_2 = u_2 z\} = \{(x_1, x_2) : x_2 = \tfrac{u_2}{u_1} x_1\},$$

where $U = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$, since $(p, k) = (2, 1)$ in our example.
Parabola (Figure 8.2): In this example we again have ambient dimension $p = 2$ and component dimension $k = 1$. Moreover, there is some fixed matrix $U \in \mathbb{R}^{p \times k}$ such that the underlying blue parabola takes the form

$$S = \{(x_1, x_2) : x_2 = \tfrac{u_2}{u_1} x_1^2\},$$

which is similar to the representation derived in the linear model. Indeed, if we introduce an auxiliary variable $z$, we get

$$S = \{(x_1, x_2) : x_1^2 = u_1 z,\ x_2 = u_2 z, \text{ for } z \in \mathbb{R}\} = \{x \in \mathbb{R}^p : \phi(x) = Uz,\ z \in \mathbb{R}^k\},$$

where $\phi(x) = \begin{bmatrix} x_1^2 \\ x_2 \end{bmatrix}$ is a non-linear function of $x$. In this final representation, $U$ is still a linear mapping of the latent components $z$, but the representation being reconstructed linearly is no longer $x$ itself but rather a potentially non-linear mapping $\phi$ of $x$.

8.2.2 Take-away

We should be able to capture non-linear dimensionality reduction in $x$ space by performing linear dimensionality reduction in $\phi(x)$ space (we often call $\phi(x)$ the feature space). Of course we still need to find the right feature space in which to perform dimensionality reduction. One option is to hand-design the feature mapping $\phi$ explicitly, coordinate by coordinate, e.g., $\phi(x) = (x_1, x_2^2, x_1 x_2, \sin(x_1), \dots)$. However, this process quickly becomes tedious and is necessarily ad hoc. Moreover, working in feature space becomes expensive if $\phi(x)$ is very large. For example, the number of all quadratic terms $x_i x_j$ is $O(p^2)$. An alternative, which we will explore next, is to encode $\phi$ implicitly via its inner products using the kernel trick.

8.2.3 The Kernel Trick

Our path to the kernel trick begins with an interesting claim: the PCA solution depends on the data matrix

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}$$

only through the Gram matrix (a.k.a. the kernel matrix) $K = XX^T \in \mathbb{R}^{n \times n}$. The kernel matrix is the matrix of inner products, $K_{ij} = \langle x_i, x_j \rangle$.

Proof. Each principal component loading $u_j$ is an eigenvector of $X^T X$

$$\Rightarrow\ X^T X u_j = \lambda_j u_j \text{ for some } \lambda_j \ \Rightarrow\ u_j = \frac{1}{\lambda_j} X^T X u_j = X^T \alpha_j = \sum_{i=1}^n \alpha_{ji} x_i \text{ for some weights } \alpha_j$$

(for instance $\alpha_j = X u_j / \lambda_j$ when $\lambda_j \neq 0$). That is, $u_j$ is a linear combination of the datapoints. This is called a representer theorem for the PCA solution. It is analogous to representer theorems you may have seen for Support Vector Machines or ridge regression. Therefore one can restrict attention to candidate loadings $u_j$ of this form. Now consider the PCA objective

$$\max_{u_j}\ u_j^T X^T X u_j \quad \text{s.t. } \|u_j\|_2 = 1,\ u_j^T X^T X u_l = 0,\ \forall l < j$$

$$\Leftrightarrow\ \max_{\alpha_j}\ \alpha_j^T X (X^T X) X^T \alpha_j \quad \text{s.t. } \alpha_j^T X X^T \alpha_j = 1,\ \alpha_j^T X (X^T X) X^T \alpha_l = 0,\ \forall l < j$$

$$\Leftrightarrow\ \max_{\alpha_j}\ \alpha_j^T K^2 \alpha_j \quad \text{s.t. } \alpha_j^T K \alpha_j = 1,\ \alpha_j^T K^2 \alpha_l = 0,\ \forall l < j \qquad (8.1)$$

which only depends on the data through $K$!

The final representation of PCA in kernel form (8.1) is an example of a generalized eigenvalue problem, so we know how to compute its solution. However, we will give a more explicit derivation of its solution by converting this problem into an equivalent eigenvalue problem. Hereafter we will assume $K$ is non-singular. Let $\beta_j = K^{1/2} \alpha_j$ so that $\alpha_j = K^{-1/2} \beta_j$. Now problem (8.1) becomes

$$\max_{\beta_j}\ \beta_j^T K \beta_j \quad \text{s.t. } \beta_j^T \beta_j = 1,\ \beta_j^T K \beta_l = 0,\ \forall l < j.$$

This is an eigenvalue problem with solution given by $\beta_j$ = the $j$-th leading eigenvector of $K$, and hence

$$\alpha_j = K^{-1/2} \beta_j = \frac{\beta_j}{\sqrt{\lambda_j(K)}}.$$

Furthermore, we can recover the principal component scores from this representation, writing $U = [u_1, \dots, u_k] = X^T [\alpha_1, \dots, \alpha_k]$:

$$Z = U^T X^T = [\alpha_1, \dots, \alpha_k]^T X X^T = [\alpha_1, \dots, \alpha_k]^T K.$$

The punchline is that we can solve PCA by finding the eigenvectors and eigenvalues of $K$; this is kernel PCA, the kernelized form of the PCA algorithm (note that the solution is equivalent to the original PCA solution if $K = XX^T$). Hence, the inner products of $X$ are sufficient, and we do not need additional access to the explicit datapoints.

Why is this relevant? Suppose we want to run kernel PCA on a non-linear mapping of the data,

$$\Phi = \begin{bmatrix} \phi(x_1)^T \\ \phi(x_2)^T \\ \vdots \\ \phi(x_n)^T \end{bmatrix}.$$

Then we do not need to compute or store $\Phi$ explicitly; $K^\phi = \Phi \Phi^T$ suffices to run kernel PCA. Moreover, we can often compute the entries $K^\phi_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$ via a kernel function $K(x_i, x_j)$ without forming $\phi(x_i)$ explicitly. This is the kernel trick. Here are a few common examples:
Kernel trick examples

| Kernel | $K(x_i, x_j)$ | $\phi(x)$ |
| Linear | $\langle x_i, x_j \rangle$ | $x$ |
| Quadratic | $(1 + \langle x_i, x_j \rangle)^2$ | $(1, x_1, \dots, x_p, x_1^2, \dots, x_p^2, x_1 x_2, \dots, x_{p-1} x_p)$ |
| Polynomial | $(1 + \langle x_i, x_j \rangle)^d$ | all monomials of order $d$ or less |
| Gaussian / radial basis function | $\exp\left(-\|x_i - x_j\|_2^2 / \sigma^2\right)$ | infinite-dimensional feature vector |

A principal advantage of the kernel trick is that one can carry out non-linear dimension reduction with little dependence on the dimension of the non-linear feature space. However, one has to form and operate on an $n \times n$ matrix (which can be quite expensive). It is common to approximate the kernel when $n$ is large using random (e.g., the Nyström method of Williams & Seeger, 2000) or deterministic (e.g., the incomplete Cholesky decomposition of Fine & Scheinberg, 2001) low-rank approximations.
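The kernel PCA recipe derived above (eigenvectors $\beta_j$ of $K$, weights $\alpha_j = \beta_j / \sqrt{\lambda_j(K)}$, scores $[\alpha_1, \dots, \alpha_k]^T K$) can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the data are synthetic, the Gaussian kernel uses the $\sigma^2$ scaling from the table, and no centering of the kernel matrix is performed because the derivation in these notes works with $K$ directly.

```python
import numpy as np

def linear_kernel(A, B):
    """K(x_i, x_j) = <x_i, x_j> for all pairs of rows of A and B."""
    return A @ B.T

def gaussian_kernel(A, B, sigma=1.0):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2), computed without forming phi."""
    sq_dists = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma ** 2)

def kernel_pca(K, k):
    """Return the k x n score matrix Z and the n x k weight matrix [alpha_1, ..., alpha_k]."""
    eigvals, eigvecs = np.linalg.eigh(K)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k leading eigenvalues
    lam, B = eigvals[order], eigvecs[:, order]
    A = B / np.sqrt(lam)                       # alpha_j = beta_j / sqrt(lambda_j(K))
    Z = A.T @ K                                # scores [alpha_1, ..., alpha_k]^T K
    return Z, A

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))

# With the linear kernel K = X X^T, the scores match ordinary (uncentered) PCA up to sign.
Z_lin, _ = kernel_pca(linear_kernel(X, X), k=2)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Z_pca = (X @ Vt[:2].T).T

# With a Gaussian kernel, the same routine performs non-linear dimensionality reduction.
K_rbf = gaussian_kernel(X, X, sigma=2.0)
Z_rbf, A_rbf = kernel_pca(K_rbf, k=2)
```

Only the $n \times n$ matrix `K` is ever passed to `kernel_pca`, which is the point of the kernel trick: the feature map never has to be formed. The cost of the eigendecomposition grows with $n$, which is what motivates the low-rank approximations mentioned above.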
More informationt distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference
EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The
More informationDefinition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.
4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationIntroduction to Artificial Intelligence CAP 4601 Summer 2013 Midterm Exam
Itroductio to Artificial Itelligece CAP 601 Summer 013 Midterm Exam 1. Termiology (7 Poits). Give the followig task eviromets, eter their properties/characteristics. The properties/characteristics of the
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationCov(aX, cy ) Var(X) Var(Y ) It is completely invariant to affine transformations: for any a, b, c, d R, ρ(ax + b, cy + d) = a.s. X i. as n.
CS 189 Itroductio to Machie Learig Sprig 218 Note 11 1 Caoical Correlatio Aalysis The Pearso Correlatio Coefficiet ρ(x, Y ) is a way to measure how liearly related (i other words, how well a liear model
More information1 Last time: similar and diagonalizable matrices
Last time: similar ad diagoalizable matrices Let be a positive iteger Suppose A is a matrix, v R, ad λ R Recall that v a eigevector for A with eigevalue λ if v ad Av λv, or equivaletly if v is a ozero
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationECON 3150/4150, Spring term Lecture 3
Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio
More informationMath 61CM - Solutions to homework 3
Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationProbability, Expectation Value and Uncertainty
Chapter 1 Probability, Expectatio Value ad Ucertaity We have see that the physically observable properties of a quatum system are represeted by Hermitea operators (also referred to as observables ) such
More informationMachine Learning. Ilya Narsky, Caltech
Machie Learig Ilya Narsky, Caltech Lecture 4 Multi-class problems. Multi-class versios of Neural Networks, Decisio Trees, Support Vector Machies ad AdaBoost. Reductio of a multi-class problem to a set
More information1 Adiabatic and diabatic representations
1 Adiabatic ad diabatic represetatios 1.1 Bor-Oppeheimer approximatio The time-idepedet Schrödiger equatio for both electroic ad uclear degrees of freedom is Ĥ Ψ(r, R) = E Ψ(r, R), (1) where the full molecular
More information(3) If you replace row i of A by its sum with a multiple of another row, then the determinant is unchanged! Expand across the i th row:
Math 5-4 Tue Feb 4 Cotiue with sectio 36 Determiats The effective way to compute determiats for larger-sized matrices without lots of zeroes is to ot use the defiitio, but rather to use the followig facts,
More informationTMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.
Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx
More information1 Approximating Integrals using Taylor Polynomials
Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................
More informationInverse Matrix. A meaning that matrix B is an inverse of matrix A.
Iverse Matrix Two square matrices A ad B of dimesios are called iverses to oe aother if the followig holds, AB BA I (11) The otio is dual but we ofte write 1 B A meaig that matrix B is a iverse of matrix
More information