
STATS 306B: Unsupervised Learning                                  Spring 2014
Lecture 8: April 23
Lecturer: Lester Mackey                              Scribes: Kexin Nie, Na Bi

8.1 Principal Component Analysis

Last time we introduced the mathematical framework underlying Principal Component Analysis (PCA); next we consider some of its applications. Please refer to the accompanying slides.

8.1.1 Examples

Example 1. Digit data. (Slide 2) Here is an example taken from the textbook. This set of handwritten digit images contains 130 threes, and each three is a 16 x 16 greyscale image. Hence we may represent each datapoint as a vector of 256 greyscale pixel intensities.

(Slide 3) The figure on the left shows the first two principal component scores of these images. The rectangular grid is defined by selected quantiles of the two score coordinates, and the circled points mark the images whose projected coordinates lie closest to the vertices of the grid. The figure on the right displays the threes corresponding to the circled points. The vertical component appears to capture changes in line thickness/darkness, while the horizontal component appears to capture changes in the length of the bottom of the three.

(Slide 4) This is a visual representation of the learned two-component PCA model. The first term is the mean of all images, and the following $v_1$ and $v_2$ are the two visualized principal directions (the loadings), which can also be called "eigen-threes."

Example 2. Eigenfaces. (Slide 5) PCA is widely used in face recognition. Suppose $X \in \mathbb{R}^{d \times n}$ is the pixel-image matrix, where each column is a face image, $d$ is the number of pixels, and $x_{ji}$ is the intensity of the $j$-th pixel in image $i$. The loadings returned by PCA are linear combinations of faces, which can be called eigenfaces. The working assumption is that the PC scores $z_i$, obtained by projecting the original image onto the eigenface space, represent a more meaningful and compact representation of the $i$-th face than the raw pixel representation. The $z_i$ can then be used in place of $x_i$ for nearest-neighbor classification. Since the dimension of face space has decreased from $d$ to $k$, the computational complexity becomes $O(dk + kn)$ instead of $O(dn)$; this is a substantial saving when $n, d \gg k$.

Example 3. Latent semantic analysis. (Slide 6) Another application of PCA is in text analysis. Let $d$ be the total number of words in the vocabulary; then each document $x_i \in \mathbb{R}^d$ is a vector of word counts, and $x_{ji}$ is the frequency of word $j$ in document $i$. After we apply PCA, the similarity between two documents is measured by $z_i^T z_j$, which is often more informative than the raw measure $x_i^T x_j$. Notice that there may not be significant computational savings here, since the original word-document matrix is sparse, while the reduced representation is typically dense.
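To make the eigenface and LSA recipes concrete, here is a minimal NumPy sketch, added for illustration (it is not part of the original notes): it fits a $k$-component PCA to a synthetic column-per-datapoint matrix and compares the reduced-space similarity $z_i^T z_j$ with the raw similarity $x_i^T x_j$. All sizes, variable names, and the synthetic data are assumptions.

```python
import numpy as np

# Illustrative sketch (not from the lecture): PCA scores as compact
# representations, as in the eigenface and LSA examples above.
rng = np.random.default_rng(0)
d, n, k = 256, 130, 10                 # pixels/words, datapoints, components
X = rng.normal(size=(d, n))            # synthetic stand-in; columns are x_i

# Center across datapoints and take the top-k left singular vectors
# of the centered matrix as the loadings ("eigenfaces").
mu = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = U[:, :k]                           # d x k matrix of loadings

# PC scores: column z_i is the compact representation of image/document i.
Z = V.T @ (X - mu)                     # k x n

# Reduced-space similarity z_i^T z_j vs. the raw measure x_i^T x_j.
print(Z[:, 0] @ Z[:, 1], X[:, 0] @ X[:, 1])
```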

Example 4. Anomaly detection. (Slide 7) PCA can be used in network anomaly detection. In the time-link matrix $X$, $x_{ji}$ represents the amount of traffic on link $j$ of the network during time interval $i$. In the two plots on the left, the traffic projected onto the selected principal component appears periodic and reasonably deterministic, suggesting normal behavior. In contrast, the traffic spikes in the plots on the right, indicating anomalous behavior in this flow.

Example 5. Part-of-speech tagging. (Slide 8) Unsupervised part-of-speech (POS) tagging is a common task in natural language processing, as manually tagging a large corpus is expensive and time-consuming. Here it is common to model each word in a vocabulary by its context distribution, i.e., $x_{ji}$ is the number of times that word $i$ appears in context $j$. The key idea of unsupervised POS tagging is that words appearing in similar contexts tend to have the same POS tags; hence, a typical tagging technique is to cluster words according to their contexts. However, in any given corpus, any given context may occur rather infrequently (the vectors $x_i$ are too sparse), so PCA has been used to find a more suitable, comparable representation for each word before clustering is applied.

Example 6. Multi-task learning. (Slide 9) In multi-task learning, one attempts to solve related learning tasks simultaneously, e.g., classifying documents as relevant or not for each of $n$ users. Often task $i$ reduces to learning a weight vector $x_i$ which defines, for example, the classification rule. Our goal is to exploit the similarities amongst these tasks to learn more effectively overall. One way to accomplish this is to use PCA to identify a small set of eigen-classifiers among the learned rules $x_1, \ldots, x_n$. The classifiers can then be retrained with an added regularization term encouraging each $x_i$ to lie near the subspace spanned by the principal directions. These two steps of PCA and retraining are iterated until convergence. In this way, a low-dimensional representation of the classifiers can help to detect shared structure across otherwise independent tasks.

8.1.2 Choosing a number of components

(Slide 10) As in the clustering setting, we face a model selection question: how do we choose the number of principal components? While there is no agreed-upon solution to this problem, here are some guidelines. The number of principal components might be constrained by the problem goal, by your computational or storage resources, or by the minimum fraction of variance to be explained. For example, it is common to choose 3 or fewer principal components for visualization. Recall that the eigenvalue magnitudes determine the explained variance. In the accompanying figure, the first 5 principal components already explain nearly all of the variance, so a small number of principal components may be sufficient (although one must use care in drawing this conclusion, since small differences in reconstruction error may still be semantically significant; consider face recognition, for example). Furthermore, we may look for an elbow in the plot of eigenvalues or compare the explained variance with that obtained under a reference distribution.
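The explained-variance guideline is easy to operationalize. The following NumPy sketch, included here for illustration (the planted structure, the 95% threshold, and all names are assumptions, not from the notes), computes the cumulative fraction of variance explained and picks the smallest sufficient $k$:

```python
import numpy as np

# Illustrative sketch: choosing k by the fraction of variance explained.
rng = np.random.default_rng(0)
n, p, k_true = 500, 50, 5
scales = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
X = (rng.normal(size=(n, k_true)) * scales) @ rng.normal(size=(k_true, p))
X += 0.1 * rng.normal(size=(n, p))     # rows are datapoints

# Eigenvalues of the sample covariance give the per-component variances.
Xc = X - X.mean(axis=0)
eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / n
frac = np.cumsum(eigvals) / eigvals.sum()

# Smallest k explaining 95% of the variance; plotting `eigvals` and
# looking for an elbow is the other heuristic mentioned above.
k = int(np.searchsorted(frac, 0.95)) + 1
print(k, np.round(frac[:8], 3))
```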

8.1.3 PCA limitations and extensions

While PCA has a great number of applications, it has its limitations as well:

- Squared Euclidean reconstruction error is not appropriate for all data types. Various extensions, such as exponential family PCA, have been developed for binary, categorical, count, and nonnegative data.

- PCA can only find linear compressions of the data. Kernel PCA is an important generalization designed for non-linear dimensionality reduction.

8.2 Non-linear dimensionality reduction with kernel PCA

8.2.1 Intuition

[Figure 8.1: Data lying near a linear subspace. Figure 8.2: Data lying near a parabola.]

Figure 8.1 displays a 2D example in which PCA is effective because the data lie near a linear subspace. However, in Figure 8.2 PCA is ineffective, because the data lie near a parabola. In this case, the PCA compression of the data might project all points onto the orange line, which is far from ideal. Let us consider the differences between these two settings mathematically.

Linear subspace (Figure 8.1): In this example we have ambient dimension $p = 2$ and component dimension $k = 1$. Since the blue line is a $k$-dimensional linear subspace of $\mathbb{R}^p$, we know that there is some matrix $U \in \mathbb{R}^{p \times k}$ such that the subspace $S$ takes the form
$$S = \{x \in \mathbb{R}^p : x = Uz, \; z \in \mathbb{R}^k\} = \{(x_1, x_2) : x_1 = u_1 z, \; x_2 = u_2 z\} = \left\{(x_1, x_2) : x_2 = \frac{u_2}{u_1} x_1\right\},$$
where $U = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$, since $(p, k) = (2, 1)$ in our example.
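As a quick numerical check of the linear case, here is a small sketch assuming NumPy and synthetic data (an illustration, nothing here is from the original notes): points are generated near a one-dimensional subspace of $\mathbb{R}^2$, and the leading principal direction recovers the line, as Figure 8.1 suggests.

```python
import numpy as np

# Illustrative sketch of Figure 8.1: data near a 1-D subspace of R^2.
rng = np.random.default_rng(0)
u = np.array([1.0, 2.0]) / np.sqrt(5.0)   # direction with x_2 = (u_2/u_1) x_1
z = rng.normal(size=200)                  # latent coordinates
X = np.outer(z, u) + 0.05 * rng.normal(size=(200, 2))

# The top right singular vector of the centered data recovers u (up to sign).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print(Vt[0], u)
```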

Parabola (Figure 8.2): In this example we again have ambient dimension $p = 2$ and component dimension $k = 1$. Moreover, there is some fixed matrix $U \in \mathbb{R}^{p \times k}$ such that the underlying blue parabola takes the form
$$S = \left\{(x_1, x_2) : x_2 = \frac{u_2}{u_1} x_1^2\right\},$$
which is similar to the representation derived in the linear model. Indeed, if we introduce an auxiliary variable $z$, we get
$$S = \{(x_1, x_2) : x_1^2 = u_1 z, \; x_2 = u_2 z, \text{ for } z \in \mathbb{R}\} = \{x \in \mathbb{R}^p : \phi(x) = Uz, \; z \in \mathbb{R}^k\},$$
where $\phi(x) = \begin{bmatrix} x_1^2 \\ x_2 \end{bmatrix}$ is a non-linear function of $x$. In this final representation, $U$ is still a linear mapping of the latent components $z$, but the representation being reconstructed linearly is no longer $x$ itself but rather a potentially non-linear mapping $\phi$ of $x$.

8.2.2 Take-away

We should be able to capture non-linear dimensionality reduction in $x$ space by performing linear dimensionality reduction in $\phi(x)$ space (we often call $\phi(x)$ the feature space). Of course, we still need to find the right feature space in which to perform dimensionality reduction. One option is to hand-design the feature mapping $\phi$ explicitly, coordinate by coordinate, e.g., $\phi(x) = (x_1, x_2^2, x_1 x_2, \sin(x_1), \ldots)$. However, this process quickly becomes tedious and tends to be ad hoc. Moreover, working in feature space becomes expensive if $\phi(x)$ is very large: for example, the number of quadratic terms $x_i x_j$ alone is $O(p^2)$. An alternative, which we will explore next, is to encode $\phi$ implicitly via its inner products using the kernel trick.

8.2.3 The Kernel Trick

Our path to the kernel trick begins with an interesting claim: the PCA solution depends on the data matrix
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times p}$$
only through the Gram matrix (a.k.a. the kernel matrix) $K = XX^T \in \mathbb{R}^{n \times n}$, the matrix of inner products $K_{ij} = \langle x_i, x_j \rangle$.

Proof. Each principal component loading $u_j$ is an eigenvector of $\frac{1}{n} X^T X$:
$$\frac{1}{n} X^T X u_j = \lambda_j u_j \text{ for some } \lambda_j \quad \Longrightarrow \quad u_j = \frac{1}{n \lambda_j} X^T X u_j = X^T \alpha_j = \sum_{i=1}^{n} \alpha_{ji} x_i \text{ for some weights } \alpha_j.$$
That is, $u_j$ is a linear combination of the datapoints. This is called a representer theorem for the PCA solution.

It is analogous to representer theorems you may have seen for Support Vector Machines or ridge regression. Therefore one can restrict attention to candidate loadings $u_j$ of this form.

Now consider the PCA objective:
$$\max_{u_j} \; u_j^T \frac{1}{n} X^T X u_j \quad \text{s.t.} \quad \|u_j\|_2 = 1, \quad u_j^T X^T X u_l = 0, \; l < j.$$
Substituting $u_j = X^T \alpha_j$ yields
$$\max_{\alpha_j} \; \alpha_j^T \frac{1}{n} X (X^T X) X^T \alpha_j \quad \text{s.t.} \quad \alpha_j^T X X^T \alpha_j = 1, \quad \alpha_j^T X (X^T X) X^T \alpha_l = 0, \; l < j,$$
that is,
$$\max_{\alpha_j} \; \frac{1}{n} \alpha_j^T K^2 \alpha_j \quad \text{s.t.} \quad \alpha_j^T K \alpha_j = 1, \quad \alpha_j^T K^2 \alpha_l = 0, \; l < j, \tag{8.1}$$
which depends on the data only through $K$!

The final representation of PCA in kernel form (8.1) is an example of a generalized eigenvalue problem, so we know how to compute its solution. However, we will give a more explicit derivation of its solution by converting this problem into an equivalent eigenvalue problem. Hereafter we assume $K$ is non-singular. Let $\beta_j = K^{1/2} \alpha_j$, so that $\alpha_j = K^{-1/2} \beta_j$. Now the problem (8.1) becomes
$$\max_{\beta_j} \; \frac{1}{n} \beta_j^T K \beta_j \quad \text{s.t.} \quad \beta_j^T \beta_j = 1, \quad \beta_j^T K \beta_l = 0, \; l < j.$$
This is an eigenvalue problem whose solution is given by $\beta_j =$ the $j$-th leading eigenvector of $K$, and hence
$$\alpha_j = K^{-1/2} \beta_j = \frac{\beta_j}{\sqrt{\lambda_j(K)}}.$$
Furthermore, we can recover the principal component scores from this representation:
$$Z = U^T X^T = [\alpha_1, \ldots, \alpha_k]^T X X^T = [\alpha_1, \ldots, \alpha_k]^T K.$$

The punchline is that we can solve PCA by finding the eigenvectors and eigenvalues of $K$; this is kernel PCA, the kernelized form of the PCA algorithm (note that the solution is equivalent to the original PCA solution when $K = XX^T$). Hence, the inner products of $X$ are sufficient, and we do not need additional access to the explicit datapoints.

Why is this relevant? Suppose we want to run kernel PCA on a non-linear mapping of the data,
$$\Phi = \begin{bmatrix} \phi(x_1)^T \\ \phi(x_2)^T \\ \vdots \\ \phi(x_n)^T \end{bmatrix}.$$
Then we do not need to compute or store $\Phi$ explicitly; $K^{\phi} = \Phi \Phi^T$ suffices to run kernel PCA. Moreover, we can often compute the entries $K^{\phi}_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$ via a kernel function $K(x_i, x_j)$ without ever forming $\phi(x_i)$ explicitly. This is the kernel trick.
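The derivation above translates directly into a few lines of code. Here is a minimal kernel-PCA sketch assuming NumPy and an RBF kernel for illustration (the kernel choice, the omission of feature-space centering, and all names are assumptions on top of the notes): eigendecompose $K$, rescale the eigenvectors to obtain the $\alpha_j$, and read off the scores $Z = [\alpha_1, \ldots, \alpha_k]^T K$.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); an assumed example kernel.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # rows are the datapoints x_i
K = rbf_kernel(X)                      # n x n kernel matrix

# Leading eigenvectors beta_j of K (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(K)
lam, B = eigvals[::-1], eigvecs[:, ::-1]

k = 2
A = B[:, :k] / np.sqrt(lam[:k])        # alpha_j = beta_j / sqrt(lambda_j(K))
Z = A.T @ K                            # k x n principal component scores
print(Z.shape)
```

In practice one would also center $K$ in feature space before the eigendecomposition; that step is omitted here to stay close to the derivation in the notes.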

Here are a few common examples:

Kernel trick examples

    Kernel                             $K(x_i, x_j)$                                             $\phi(x)$
    Linear                             $\langle x_i, x_j \rangle$                                $x$
    Quadratic                          $(1 + \langle x_i, x_j \rangle)^2$                        $(1, x_1, \ldots, x_p, x_1^2, \ldots, x_p^2, x_1 x_2, \ldots, x_{p-1} x_p)$
    Polynomial                         $(1 + \langle x_i, x_j \rangle)^d$                        all monomials of order $d$ or less
    Gaussian / radial basis function   $\exp\left(-\frac{\|x_i - x_j\|_2^2}{2\sigma^2}\right)$   infinite-dimensional feature vector

A principal advantage of the kernel trick is that one can carry out non-linear dimensionality reduction with little dependence on the dimension of the non-linear feature space. However, one has to form and operate on an $n \times n$ matrix, which can be quite expensive. It is common to approximate the kernel matrix when $n$ is large using random (e.g., the Nyström method of Williams & Seeger, 2000) or deterministic (e.g., the incomplete Cholesky decomposition of Fine & Scheinberg, 2001) low-rank approximations, as sketched below.
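To illustrate the low-rank idea in the last paragraph, here is a Nyström-style sketch in NumPy (a rough illustration in the spirit of the Williams & Seeger reference, not their exact method; the landmark count, kernel, and names are assumptions): sample $m$ landmark points and approximate $K \approx C W^{+} C^T$ without ever forming all $n^2$ entries.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Cross-kernel matrix between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
n, m = 2000, 100
X = rng.normal(size=(n, 2))

S = rng.choice(n, size=m, replace=False)   # random landmark indices
C = rbf(X, X[S])                           # n x m block of K
W = C[S]                                   # m x m block K[S, S]

# Rank-m factored approximation K_hat = C W^+ C^T: its top eigenvectors
# approximate those of K at roughly O(n m^2) cost instead of O(n^2) storage.
W_pinv = np.linalg.pinv(W)
row0 = C[0] @ W_pinv @ C.T                 # reconstruct one row of K lazily
exact0 = rbf(X[:1], X)[0]
print(np.abs(row0 - exact0).max())         # small approximation error
```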
