Distributional Similarity Models (cont.)


1 Distributional Similarity Models (cont.)
Regina Barzilay, EECS Department, MIT
October 19, 2004

2 Last Time
- Semantic Similarity: Vector Space Model
- Similarity Measures: cosine, Euclidean distance, ...
- Clustering: k-means, hierarchical

3 Example
Delicately handling the beautiful satin bindings, Emma looked with dazzled eyes at the names of the unknown authors.
The orange blossoms were yellow with dust and the silver-bordered satin ribbons frayed at the borders.
The confessional forms a pendant to a statuette of the Virgin, clothed in a satin robe.
Never had Emma been so beautiful as at this period.
He picked up a cigar-case with a green silk border.
(Table: co-occurrence counts for the words border, Emma, ribbon, beautiful, satin, silk in these sentences; the counts were lost in transcription.)

4 EM Clustering
Soft version of k-means clustering.
Input: m-dimensional objects $X = \{x_1, \ldots, x_n\} \subseteq \mathbb{R}^m$ to be clustered into k groups.
Observable data: $X = \{X_i\}$, where $x_i = (x_{i1}, \ldots, x_{im})$.
Unobservable data: $Z = \{Z_i\}$, where within each $z_i = (z_{i1}, \ldots, z_{ik})$ the component $z_{ij}$ is 1 if object i is a member of cluster j and 0 otherwise.
Clustering is viewed as estimating a mixture of probability distributions.

5 Example of the EM Algorithm for Soft Clustering
(Figure: two cluster centers $c_1$ and $c_2$ over the data, shown at the initial state, after iteration 1, and after iteration 2.)

6 Multivariate Normal Distributions
Key assumption: the data are generated by k Gaussians.
The probability density function for a Gaussian:

$$n(\vec{x}; \mu_j, \Sigma_j) = \frac{1}{\sqrt{(2\pi)^m |\Sigma_j|}} \exp\left[-\frac{1}{2}(\vec{x} - \mu_j)^T \Sigma_j^{-1} (\vec{x} - \mu_j)\right]$$

Goal: find the maximum likelihood model of the form

$$\sum_{j=1}^{k} \pi_j \, n(\vec{x}; \mu_j, \Sigma_j)$$
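
As a quick illustration, here is a minimal NumPy sketch of this density; the function name and the toy point, mean, and covariance are illustrative choices, not part of the original slides.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Multivariate normal density n(x; mu, sigma) for an m-dimensional point x."""
    m = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** m * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Toy example: a 2-dimensional point under a unit-covariance Gaussian.
x = np.array([1.0, 2.0])
mu = np.array([0.0, 1.0])
sigma = np.eye(2)
print(gaussian_density(x, mu, sigma))
```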

7 The EM Algorithm for Gaussian Mixtures
Hidden parameters: $\Theta_j = (\mu_j, \Sigma_j, \pi_j)$.
Log likelihood of the data:

$$L(X \mid \Theta) = \sum_{i=1}^{n} \log P(x_i) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \, n(x_i; \mu_j, \Sigma_j)$$

8 Iterative Solution
Estimate: if we knew the value of $\Theta$, we could compute the expected values of the hidden structure of the model.
Maximize: if we knew the expected values of the hidden structure of the model, then we could compute the maximum likelihood value of $\Theta$.

9 Initialization
The covariance matrices $\Sigma_j$ are initialized as the identity matrix.
The means $\mu_j$ are selected to be a random perturbation away from a data point randomly selected from X.

10 Expectation Step
Given the current parameters, compute the cluster membership probabilities:

$$h_{ij} = E(z_{ij} \mid x_i; \Theta) = \frac{P(x_i \mid n_j; \Theta)}{\sum_{l=1}^{k} P(x_i \mid n_l; \Theta)}$$
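
A minimal NumPy/SciPy sketch of this step (the toy data and names are illustrative). Note that a common formulation also weights each cluster by its prior $\pi_j$, as below; with equal priors this reduces to the normalized-density formula on the slide.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy setup: n=10 points in m=2 dimensions, k=2 clusters (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
mus = [np.zeros(2), np.ones(2)]
sigmas = [np.eye(2), np.eye(2)]
pis = np.array([0.5, 0.5])

# E-step: membership probabilities h[i, j] = E(z_ij | x_i; Theta), here with
# each cluster weighted by its prior pi_j before normalizing over clusters.
dens = np.column_stack([pis[j] * multivariate_normal.pdf(X, mus[j], sigmas[j])
                        for j in range(2)])
h = dens / dens.sum(axis=1, keepdims=True)
print(h.round(3))   # each row sums to 1
```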

11 Maximization Step
Given the cluster membership probabilities (expected values), compute the most likely parameters $\Theta$:

$$\mu_j = \frac{\sum_{i=1}^{n} h_{ij}\, x_i}{\sum_{i=1}^{n} h_{ij}} \qquad
\Sigma_j = \frac{\sum_{i=1}^{n} h_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{n} h_{ij}} \qquad
\pi_j = \frac{\sum_{i=1}^{n} h_{ij}}{\sum_{j=1}^{k} \sum_{i=1}^{n} h_{ij}}$$
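
Continuing the sketch, the M-step re-estimates the parameters from the responsibilities; the random X and h below merely stand in for real data and E-step output.

```python
import numpy as np

# M-step sketch: given responsibilities h[i, j] (e.g., from the E-step above),
# re-estimate mu_j, Sigma_j, and pi_j for each of the k Gaussians.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))        # n=10 points, m=2 dims (stand-in data)
h = rng.random((10, 2))
h /= h.sum(axis=1, keepdims=True)   # stand-in responsibilities

n, k = h.shape
mus, sigmas = [], []
for j in range(k):
    w = h[:, j]
    mu_j = (w[:, None] * X).sum(axis=0) / w.sum()
    diff = X - mu_j
    # Weighted sum of outer products (x_i - mu_j)(x_i - mu_j)^T.
    sigma_j = np.einsum('i,im,il->ml', w, diff, diff) / w.sum()
    mus.append(mu_j)
    sigmas.append(sigma_j)
pis = h.sum(axis=0) / h.sum()       # matches the slide's double-sum ratio
print(pis)
```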

12 Example of a Gaussian Mixture
(Table: posterior probabilities $P(w_i \mid c_j)$ and the main cluster for words such as ballot, polls, Gov., seats, profit, finance, payments, NFL, Reds; the numeric values were lost in transcription.)

13 Other Methods of Dimensionality Reduction
Latent Semantic Indexing:
- Similar objects are projected onto the same dimensions.
- The representation in the original space is changed as little as possible.

14 Document-by-Word Matrix
(Table of counts for the terms cosmonaut, astronaut, moon, car, truck across documents d1 through d6; the counts were lost in transcription.)

15 Least-Squares Methods: Linear Regression
(Figure: scatter plot of data points in the (x, y) plane.)

16 Least-Squares Methods: Linear Regression
Input: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Goal: find $f(x) = mx + b$ that minimizes the sum of the squares of the differences:

$$SS(m, b) = \sum_{i=1}^{n} (y_i - f(x_i))^2$$

17 Linear Regression
Minimize

$$SS(m, b) = \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - m x_i - b)^2$$

Setting the partial derivative with respect to b to zero,

$$\frac{\partial SS(m, b)}{\partial b} = \sum_{i=1}^{n} 2(y_i - m x_i - b)(-1) = 0,$$

gives $b = \bar{y} - m\bar{x}$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Setting $\frac{\partial SS(m, b)}{\partial m} = 0$ likewise gives

$$m = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
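
The closed-form solution is easy to check numerically; here is a small sketch with made-up data.

```python
import numpy as np

# Closed-form least-squares fit of f(x) = m*x + b (toy data, illustrative).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()
m = ((y - y_bar) * (x - x_bar)).sum() / ((x - x_bar) ** 2).sum()
b = y_bar - m * x_bar
print(m, b)   # slope near 2, intercept near 0 for this toy data
```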

18 Singular Value Decomposition (SVD)
Rationale: increase similarity in representation by dimensionality reduction.
SVD projects an n-dimensional space onto a k-dimensional space, where n > k.
Example: word-document matrices in information retrieval, where n is the number of word types in the collection; k can be 100.
Constraint: the projection is chosen so that the distance $\delta = \|A - \hat{A}\|$ between the original matrix A and its reduced-rank approximation $\hat{A}$ is minimal.

19 Singular Value Decomposition
Any m-by-n matrix A can be factored into

$$A = T \Sigma D^T = (\text{orthogonal})(\text{diagonal})(\text{orthogonal})$$

The columns of T (m by m) are eigenvectors of $AA^T$, and the columns of D (n by n) are eigenvectors of $A^T A$. The r singular values on the diagonal of $\Sigma$ (m by n) are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^T A$.
SVD is unique (up to sign flips in D and T).
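
These properties can be verified directly with NumPy's SVD routine; the variable names follow the slide, and the random matrix is illustrative.

```python
import numpy as np

# SVD of a random m-by-n matrix: A = T Sigma D^T (NumPy returns T, the
# singular values, and D^T).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
T, s, Dt = np.linalg.svd(A)

Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)
print(np.allclose(A, T @ Sigma @ Dt))           # factorization reconstructs A

# The singular values are square roots of the eigenvalues of A^T A.
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s ** 2, eig))                 # True
```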

20 Intuition
SVD rotates the axes of the n-dimensional space so that the first axis runs along the largest variation among the documents, the second axis runs along the second-largest variation, and so on.
The matrices T and D represent the terms and documents in the new space.

21 Original Matrix
(The document-by-word count matrix from slide 14, repeated: terms cosmonaut, astronaut, moon, car, truck across documents d1 through d6; the counts were lost in transcription.)

22 T Matrix
(Table: the term matrix T, with rows cosmonaut, astronaut, moon, car, truck and columns Dim 1 through Dim 5; the numeric entries were lost in transcription.)

23 D^T Matrix
(Table: the document matrix $D^T$, with rows Dim 1 through Dim 5 and columns d1 through d6; the numeric entries were lost in transcription.)

24 Matrix of Singular Values
(The diagonal matrix $\Sigma$ of singular values; the numeric entries were lost in transcription.)

25 Reduction
Restrict the matrices T, S, and D to their first k < n columns. Then

$$T_{t \times k}\, S_{k \times k}\, (D_{d \times k})^T$$

is the best least-squares approximation of A by a matrix of rank k.
Term similarity can be computed as $(T_{t \times k} S_{k \times k})(T_{t \times k} S_{k \times k})^T$, since

$$AA^T = TSD^T (TSD^T)^T = TSD^T D S^T T^T = (TS)(TS)^T$$
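
A sketch of the rank-k reduction and the reduced-space term similarity. The count matrix below follows the classic cosmonaut/astronaut example these slides appear to draw on; the slides' own numeric entries did not survive transcription, so treat these values as an assumption.

```python
import numpy as np

# Word-by-document counts, assumed to match the classic example behind the
# slides (rows: cosmonaut, astronaut, moon, car, truck; columns: d1..d6).
A = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)

T, s, Dt = np.linalg.svd(A, full_matrices=False)
k = 2
Tk, Sk, Dtk = T[:, :k], np.diag(s[:k]), Dt[:k, :]

A_hat = Tk @ Sk @ Dtk                  # best rank-k approximation of A
term_sim = (Tk @ Sk) @ (Tk @ Sk).T     # term-term similarity, (TS)(TS)^T
print(np.round(term_sim, 2))
```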

26 Pros and Cons
+ Clean formal framework with a clearly defined optimization criterion
+ Used in a variety of applications (from IR to dialogue processing)
- Computationally expensive
- Assumes normally-distributed data

27 Conclusions
- The EM algorithm for Gaussian mixtures
- Latent Semantic Indexing
- Singular Value Decomposition
