Non-Linear Maximum Likelihood Feature Transformation For Speech Recognition


Mohamed Kamal Omar, Mark Hasegawa-Johnson
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL

Abstract

Most automatic speech recognition (ASR) systems use hidden Markov models (HMMs) with a diagonal-covariance Gaussian mixture model for the state-conditional probability density function. The diagonal-covariance Gaussian mixture can model discrete sources of variability like speaker variations, gender variations, or local dialect, but it cannot model continuous types of variability that account for correlation between the elements of the feature vector. In this paper, we present a transformation of the acoustic feature vector that minimizes an empirical estimate of the relative entropy between the likelihood based on the diagonal-covariance Gaussian mixture HMM and the true likelihood. We show that this minimization is equivalent to maximizing the likelihood in the original feature space. Based on this formulation, we provide a computationally efficient solution to the problem based on volume-preserving maps; existing linear feature transform designs are shown to be special cases of the proposed solution. Since most of the acoustic features used in ASR are not linear functions of the sources of correlation in the speech signal, we use a non-linear transformation of the features to minimize this objective function. We describe an iterative algorithm that jointly estimates the parameters of the volume-preserving feature transformation and of the hidden Markov models so as to optimize the objective function for an HMM-based speech recognizer. Using this algorithm, we achieved a 2% improvement in phoneme recognition accuracy compared to a system that uses the original Mel-frequency cepstral coefficient (MFCC) acoustic features. Our approach is also compared to similar previous linear approaches like MLLT and ICA.

1. Introduction

An important goal for designers of ASR systems is to achieve a high level of performance while minimizing the number of parameters used by the system: a large parameter count not only increases the computational load and the storage requirements, but also increases the amount of training data required to estimate the parameters reliably. One way of controlling the number of parameters is to adjust the structure of the conditional joint PDF used by the recognizer. For example, the dimensionality of the acoustic feature vectors in a Gaussian mixture HMM is too large for their conditional joint PDFs to have full covariance matrices. On the other hand, approximating the conditional PDF by a diagonal-covariance Gaussian PDF degrades the performance of the recognizer [?], as the acoustic features used in ASR systems are neither decorrelated nor independent given the Gaussian component index. The mixture of Gaussian components can model discrete sources of variability like speaker variations, gender variations, or local dialect, but it cannot model continuous types of variability that account for correlation between the elements of the feature vector, like coarticulation effects and background noise. Recent approaches to this problem can be classified into two major categories. The first category tries to decrease the number of parameters required for full covariance matrices. This category includes a variety of choices for covariance structure other than diagonal or full; two examples that can be used in ASR systems are block-diagonal [?] and banded-diagonal matrices. Another method often used by ASR systems is tying, where certain parameters are shared among a number of different models.
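To make the parameter-count trade-off concrete, here is a small illustration (my own, not from the paper) of how the number of free covariance parameters per Gaussian component varies with the structures just mentioned, for the 26-dimensional feature vector used later in the experiments:

```python
# Illustration (not from the paper): free covariance parameters per Gaussian
# component for a d-dimensional feature vector under the structures above.
def covariance_params(d: int, structure: str, block: int = 13) -> int:
    if structure == "full":        # symmetric d x d matrix
        return d * (d + 1) // 2
    if structure == "diagonal":    # variances only
        return d
    if structure == "block":       # d/block full blocks on the diagonal
        return (d // block) * (block * (block + 1) // 2)
    raise ValueError(f"unknown structure: {structure}")

for s in ("full", "block", "diagonal"):
    print(s, covariance_params(26, s))   # 351, 182, 26 for d = 26
```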
An example of tying is the semi-tied covariance matrices approach, which estimates a transform in a maximum likelihood fashion given the current model parameters, described in [?]. Factor analysis was also used in [?] to model the covariance matrix of each Gaussian component of the Gaussian mixture used within each state of the HMM recognizer. The second category chooses to transform the original feature space to a new feature space that satisfies the diagonal-covariance models better. This is achieved by optimizing the transform based on a criterion that measures the validity of the assumption. One example is the state-specific principal component analysis (PCA) approach introduced in [?]. Another example is independent component analysis (ICA), which was used to develop features for speaker recognition [?] and speech recognition [?], [?], [?]. The maximum likelihood linear transform (MLLT) introduced in [?] is also an example of feature-based solutions.

All previous approaches assume that independent or decorrelated components are mixed linearly to generate the observation data. However, for most acoustic features used in ASR, this assumption is unjustified or unacceptable. An example is cepstral features like MFCC and PLPCC: in the cepstral domain, coarticulation effects and additive noise are examples of independent sources in the speech signal that are nonlinearly combined with the information about the vocal tract shape that is important for recognition. The source-filter model proposes that the excitation signal and the vocal tract filter are linearly combined in the cepstral domain, but the source-filter model is unrealistic in many cases, especially for consonants; time-varying filters and filter-dependent sources result in nonlinear source-filter combination in the cepstral domain [?]. In [?], we formulated the problem as a non-linear independent component analysis (NICA) problem. We showed that using the features generated by NICA in speech recognition increased phoneme recognition accuracy compared to linear feature transforms like ICA [?], linear discriminant analysis (LDA) [?], and MLLT.

However, using PCA or ICA approaches is justified only if a different feature transform is designed for each Gaussian component in the model, as each assumes that the probabilistic model imposes independence or decorrelation on the features.

In this work, we introduce a unified information-theoretic approach to feature transformation that makes no assumptions about the true probability density function of the original features and can be applied to any probabilistic model with arbitrary constraints. It estimates a nonlinear transform and the parameters of the probabilistic model that jointly minimize the relative entropy between the true likelihood and its estimate based on the model. Unlike previous approaches, this formulation justifies using a single transform for observations generated by different classes. In the next section, an information-theoretic formulation of the problem is described and a solution based on volume-preserving maps is introduced. An iterative algorithm to jointly estimate the parameters of the feature transform and the parameters of the model is described in section 3. Then, experiments based on an efficient implementation of this algorithm are described in section 4. Finally, section 5 provides a discussion of the results and a summary of this work.

2. Problem Formulation

We take here a different approach to the problem, motivated by the discussion of the previous section. Instead of focusing on specific model assumptions, we choose any hypothesized parametric family of distributions to be used in our probabilistic model, and search for a map of the features that improves the validity of our model. To do that, we need the following proposition.

Proposition: Let \( y = f(x) \) be an arbitrary one-to-one map of the feature random vector \( X \in \mathbb{R}^n \) to \( Y \in \mathbb{R}^n \), and let \( \hat{P}_\Lambda(y) \) be the likelihood of the new features using the HMM. The map \( f^*(\cdot) \) and the set of parameters \( \Lambda^* \) minimize the relative entropy between the hypothesized and the true likelihoods of \( Y \) if and only if they also maximize the objective function

\[ L = E_{P(Y)}\left[ \log \left| \det \frac{\partial f}{\partial x} \right| + \log \hat{P}_\Lambda(Y) \right], \tag{1} \]

where \( \frac{\partial f}{\partial x} \) is the Jacobian matrix of the map \( f(\cdot) \).

This can be shown by writing the expression for the relative entropy after an arbitrary transformation, \( y = f(x) \), of the input random vector \( X \in \mathbb{R}^n \), as

\[ R(P(Y), \hat{P}(Y)) = -H(P(Y)) - E_{P(Y)}\left[ \log \hat{P}(Y) \right], \tag{2} \]

where \( H(P(Y)) \) is the differential entropy of the random vector \( Y \) based on its true PDF \( P(Y) \). The relation between the output differential entropy and the input differential entropy is in general [?]

\[ H(P(Y)) \le H(P(X)) + \int_{\mathbb{R}^n} P(x) \log \left| \det \frac{\partial f}{\partial x} \right| dx, \tag{3} \]

where \( P(x) \) is the probability density function of the random vector \( X \), for an arbitrary transformation \( y = f(x) \) of the random vector \( X \in \mathbb{R}^n \), with equality if \( f(x) \) is invertible. Therefore the relative entropy can be written as

\[ R(P(Y), \hat{P}(Y)) = -H(P(X)) - E_{P(X)}\left[ \log \left| \det \frac{\partial f}{\partial x} \right| + \log \hat{P}(Y) \right], \tag{4} \]

for an invertible map \( y = f(x) \). The expectation of a function \( g(x) \) under an arbitrary one-to-one map \( y = f(x) \) can be written as [?]

\[ E_{P(X)}\left[ g(X) \right] = E_{P(Y)}\left[ g(f^{-1}(Y)) \right], \tag{5} \]

where \( f^{-1}(\cdot) \) is the inverse map. Therefore

\[ R(P(Y), \hat{P}(Y)) = -H(P(X)) - E_{P(Y)}\left[ \log \left| \det \frac{\partial f}{\partial x} \right| + \log \hat{P}(Y) \right]. \tag{6} \]

Since \( H(P(X)) \) depends on neither the map nor the model parameters, Equation (6) proves the proposition.
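To make the objective concrete, the following is a minimal Monte Carlo sketch (my own illustration, not from the paper; it assumes NumPy and SciPy) that estimates \( L \) in Equation (1) from samples of \( X \), for a toy invertible map and a diagonal-Gaussian model \( \hat{P} \):

```python
# A minimal Monte Carlo sketch (illustrative, assuming NumPy and SciPy) of
# the objective in Eq. (1): L = E[ log|det(dy/dx)| + log P_hat(Y) ],
# estimated from samples of X.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=10_000)

def f(x):
    """Invertible shear: y1 = x1 + tanh(x2), y2 = x2 (Jacobian det = 1)."""
    return np.stack([x[:, 0] + np.tanh(x[:, 1]), x[:, 1]], axis=1)

y = f(x)
log_det_jac = np.zeros(len(y))            # volume-preserving: log|det J| = 0
mu, sd = y.mean(axis=0), y.std(axis=0)    # fit a diagonal Gaussian to Y
log_p_hat = norm.logpdf(y, mu, sd).sum(axis=1)
L = np.mean(log_det_jac + log_p_hat)
print(f"empirical objective L = {L:.3f}")
```

Because this toy map is a shear with unit Jacobian determinant, the log-determinant term vanishes and \( L \) reduces to the average model log-likelihood of the mapped features; maximizing \( L \) over the map and model parameters is then equivalent to minimizing the relative entropy.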
The proposition states that minimizing the relative entropy is equivalent to maximizing the likelihood in the original feature space, but with the new features, instead of the original features, modeled by the HMM.

2.1. A Maximum Likelihood Approach

An important special case that reduces the problem to maximum likelihood estimation (MLE) of the model and map parameters is given in the following lemma; first we need to define volume-preserving maps in \( \mathbb{R}^n \), where \( n \) is an arbitrary positive integer.

Definition: A \( C^1 \) map \( f: S_x \to S_y \), where \( S_x \subset \mathbb{R}^n \) and \( S_y \subset \mathbb{R}^n \), is said to be volume-preserving if and only if \( \left| \det \frac{\partial f}{\partial x} \right| = 1 \ \forall x \in S_x \).

Lemma: Let \( y = f(x) \) be an arbitrary one-to-one \( C^1 \) volume-preserving map of the random vector \( X \in \mathbb{R}^n \) to \( Y \in \mathbb{R}^n \), and let \( \hat{P}_\Lambda(y) \) be the estimated likelihood using the HMM. The map \( f^*(\cdot) \) and the set of parameters \( \Lambda^* \) jointly minimize the relative entropy between the hypothesized and the true likelihoods of \( Y \) if and only if they also maximize the expected log-likelihood based on the hypothesized PDF.

Using the definition of volume-preserving maps, the proof of the lemma is straightforward: the log-Jacobian term in Equation (1) vanishes. By reducing the problem to an MLE problem, efficient algorithms based on the incremental EM algorithm can be designed [?].
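As a quick illustration of the definition (my own sketch, assuming NumPy; not part of the paper), the volume-preserving property \( |\det(\partial f / \partial x)| = 1 \) can be checked numerically at random test points:

```python
# A numerical check that a map is volume-preserving in the sense of the
# definition above: |det(df/dx)| = 1 everywhere. The Jacobian is estimated
# by central finite differences.
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

def shear(x):
    """y1 = x1 + tanh(x2), y2 = x2: triangular Jacobian with unit diagonal."""
    return np.array([x[0] + np.tanh(x[1]), x[1]])

rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.standard_normal(2)
    print(abs(np.linalg.det(numerical_jacobian(shear, x))))  # ~1.0 each time
```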

2.2. Generality of the Approach

Our approach generalizes previous approaches to feature transforms for speech recognition in two ways. First, transforms can be designed to satisfy arbitrary constraints on the model, not necessarily those that impose an independence or decorrelation constraint on the features. Second, it can be applied to any parameterized probabilistic model, not necessarily a Gaussian one. Therefore, it can be used to design a single transform of the observations, if the whole HMM recognizer is taken as our probabilistic model, and it can be used to design state-dependent or phoneme-dependent transforms, if the state or the phoneme probabilistic models in the recognizer are used, respectively.

To show the generality of our approach and its wide range of applications, we relate it to previous methods. PCA may be viewed as a special case of the proposition under two equivalent constraints. First, if the transform is constrained to be linear and the model PDF is constrained to be a diagonal-covariance Gaussian, then the proposition reduces to PCA. Equivalently, if the true feature PDF is assumed to be Gaussian, and the model PDF is constrained to be a diagonal-covariance Gaussian, the proposition reduces to PCA. ICA can also be shown to be a special case of the proposition when the hypothesized model assumes statistical independence of the transformed features and the transform is constrained to be linear; nonlinear ICA removes the constraint that the transform must be linear. Factor analysis is also a special case of the proposition, obtained by assuming that the hypothesized joint PDF is Gaussian with a special covariance structure. MLLT is a special case of the proposition obtained by using a linear volume-preserving map of the features and assuming the hypothesized joint PDF is Gaussian or a mixture of Gaussians; the two assumptions of linearity and Gaussianity together are equivalent to the assumption that the original features are Gaussian. It should be noted that all linear maps designed to improve how well the features satisfy a given model are special cases of the lemma, as any linear map is equivalent to a linear volume-preserving map multiplied by a scalar.

3. Implementation of the Maximum Likelihood Approach

In the previous section, we showed that by using a volume-preserving map, the problem is reduced to maximizing the likelihood of the output components. In this section, we use a symplectic map to generate the new set of features.

3.1. Symplectic Maps

Symplectic maps are volume-preserving maps that can be represented by scalar functions. This very interesting result allows us to jointly optimize the parameters of the symplectic map and the model parameters using the EM algorithm or one of its incremental forms [?]. Let \( x = (x_1, x_2) \) and \( y = (y_1, y_2) \), with \( x_1, x_2, y_1, y_2 \in \mathbb{R}^{n/2} \); then any reflecting symplectic map can be represented by

\[ y_1 = x_1 + \frac{\partial V(x_2)}{\partial x_2}, \tag{7} \]

\[ y_2 = x_2 + \frac{\partial T(y_1)}{\partial y_1}, \tag{8} \]

where \( V(\cdot) \) and \( T(\cdot) \) are two arbitrary scalar functions [?]. We use two multi-layer feed-forward neural networks to get a good approximation of these scalar functions [?]:

\[ V(u; A, C) = \sum_{j=1}^{M} c_j S(a_j u), \tag{9} \]

\[ T(u; B, D) = \sum_{j=1}^{M} d_j S(b_j u), \tag{10} \]

where \( S(\cdot) \) is a nonlinear function like the sigmoid or hyperbolic tangent, \( a_j \) is the \( j \)th row of the \( M \times \frac{n}{2} \) matrix \( A \), \( c_j \) is the \( j \)th element of the \( M \times 1 \) vector \( C \), \( b_j \) is the \( j \)th row of the \( M \times \frac{n}{2} \) matrix \( B \), and \( d_j \) is the \( j \)th element of the \( M \times 1 \) vector \( D \). The parameters of these two neural networks and the parameters of the model are jointly optimized to maximize the likelihood of the training data.
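The following sketch (my reconstruction of Equations (7)-(10) as restored above, assuming NumPy and \( S = \tanh \)) implements the map: each half of the feature vector is shifted by the gradient of a scalar function of the other half, so each step is a shear and the Jacobian determinant is exactly 1.

```python
# A sketch of Eqs. (7)-(10): y1 = x1 + dV/dx2, y2 = x2 + dT/dy1, with
# V(u) = sum_j c_j tanh(a_j . u) and T(u) = sum_j d_j tanh(b_j . u).
import numpy as np

def grad_scalar_net(u, W, coef):
    """Gradient w.r.t. u of V(u) = sum_j coef_j * tanh(w_j . u)."""
    s = np.tanh(W @ u)                      # (M,)
    return W.T @ (coef * (1.0 - s ** 2))    # (n/2,); d tanh(z)/dz = 1 - tanh^2

def symplectic_map(x, A, c, B, d):
    half = x.size // 2
    x1, x2 = x[:half], x[half:]
    y1 = x1 + grad_scalar_net(x2, A, c)     # Eq. (7)
    y2 = x2 + grad_scalar_net(y1, B, d)     # Eq. (8)
    return np.concatenate([y1, y2])

rng = np.random.default_rng(2)
n, M = 26, 16                               # feature dimension, hidden units
A = rng.standard_normal((M, n // 2)) * 0.1
B = rng.standard_normal((M, n // 2)) * 0.1
c, d = rng.standard_normal(M), rng.standard_normal(M)
y = symplectic_map(rng.standard_normal(n), A, c, B, d)
```

The map is also easy to invert: given \( y \), recover \( x_2 = y_2 - \partial T(y_1)/\partial y_1 \) and then \( x_1 = y_1 - \partial V(x_2)/\partial x_2 \), which is what makes the invertibility assumption of section 2 applicable.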
3.2. Joint Optimization of the Map and Model Parameters

In this section we explain how the parameters of the volume-preserving map and of the probabilistic model can be jointly optimized to maximize the likelihood of the estimated features. We assume that the system is an HMM-based recognizer [?]; however, this approach can be applied to any statistical classification, detection, or recognition system. We also assume that the scalar functions in the symplectic map are represented by three-layer feed-forward neural networks (NNs) with the nonlinearity in the NNs represented by hyperbolic tangent functions; the derivation for any other non-linear function is a straightforward replication of the derivation provided here. Using the EM algorithm, the auxiliary function [?] to be maximized is

\[ Q(\Phi_k, \Phi_{k+1}) = E_\xi\left[ \log P(y, \xi \mid \Phi_{k+1}) \mid y, \Phi_k \right], \tag{11} \]

where \( \xi \) is the state sequence corresponding to the sequence of observations \( x \in \mathbb{R}^{n \times T} \) that are transformed to the sequence \( y \in \mathbb{R}^{n \times T} \), \( T \) is the sequence length in frames, and \( \Phi_k = (\Lambda_k, W_k) \) is the set of recognizer parameters and symplectic parameters at iteration \( k \) of the algorithm. The updating equations for the HMM parameters are the same as those mentioned in [?], and therefore will not be given here. We assume that the recognizer models the conditional PDF of the observations as a mixture of diagonal-covariance Gaussians, and therefore, for each frame \( y^i \),

\[ \frac{\partial Q}{\partial y^i_j} = \sum_{m=1}^{K} \frac{P(y^i, m \mid \Phi_k)}{P(y^i \mid \Phi_k)} \, \frac{\mu_{mj} - y^i_j}{\sigma^2_{mj}}, \tag{12} \]

where \( \mu_{mj} \) and \( \sigma^2_{mj} \) are the mean and the variance of the \( j \)th element of the \( m \)th PDF, respectively, and the parameter gradients below are accumulated over all frames \( i = 1, 2, \ldots, T \).
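As an illustration of Equation (12), here is a sketch (my own, assuming NumPy) of the per-frame gradient for a \( K \)-component diagonal-covariance Gaussian mixture; the component posterior \( P(y, m \mid \Phi)/P(y \mid \Phi) \) is computed in the log domain for numerical stability:

```python
# Illustration of Eq. (12): gradient of the auxiliary function with respect
# to one transformed frame y under a diagonal Gaussian mixture.
import numpy as np

def dQ_dy(y, w, mu, var):
    """y: (n,), w: (K,), mu: (K, n), var: (K, n) -> gradient, shape (n,)."""
    log_joint = (np.log(w)
                 - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                 - 0.5 * np.sum((y - mu) ** 2 / var, axis=1))   # log w_m P(y|m)
    gamma = np.exp(log_joint - np.logaddexp.reduce(log_joint))  # posteriors
    return np.sum(gamma[:, None] * (mu - y) / var, axis=0)      # Eq. (12)

# Tiny usage example with two components in two dimensions:
print(dQ_dy(np.zeros(2), np.array([0.7, 0.3]),
            np.array([[1.0, 0.0], [-1.0, 0.0]]), np.ones((2, 2))))
```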

Starting with \( A \) and \( B \): to update the symplectic parameters \( a_{qr} \) and \( b_{qr} \), for \( q = 1, 2, \ldots, M \) and \( r = 1, 2, \ldots, n/2 \), we have to calculate the partial derivatives of the auxiliary function with respect to these parameters. These partial derivatives are related to the partial derivatives of the auxiliary function with respect to the features by the relations

\[ \frac{\partial Q}{\partial a_{qr}} = \sum_{j=1}^{n/2} \frac{dQ}{dy_{1j}} \frac{\partial y_{1j}}{\partial a_{qr}}, \tag{13} \]

and

\[ \frac{\partial Q}{\partial b_{qr}} = \sum_{j=1}^{n/2} \frac{\partial Q}{\partial y_{2j}} \frac{\partial y_{2j}}{\partial b_{qr}}, \tag{14} \]

where

\[ \frac{\partial y_{1j}}{\partial a_{qr}} = \begin{cases} -2 x_{2r}\, c_q a_{qj}\, S(a_q x_2)\left[1 - S^2(a_q x_2)\right], & r \ne j, \\ -2 x_{2r}\, c_q a_{qj}\, S(a_q x_2)\left[1 - S^2(a_q x_2)\right] + c_q\left[1 - S^2(a_q x_2)\right], & r = j, \end{cases} \tag{15} \]

\[ \frac{dQ}{dy_{1j}} = \frac{\partial Q}{\partial y_{1j}} + \sum_{k=1}^{n/2} \frac{\partial Q}{\partial y_{2k}} \frac{\partial y_{2k}}{\partial y_{1j}}, \tag{16} \]

\[ \frac{\partial y_{2k}}{\partial y_{1j}} = -2 \sum_{h=1}^{M} d_h b_{hj} b_{hk}\, S(b_h y_1)\left[1 - S^2(b_h y_1)\right], \tag{17} \]

and

\[ \frac{\partial y_{2j}}{\partial b_{qr}} = \begin{cases} -2 y_{1r}\, d_q b_{qj}\, S(b_q y_1)\left[1 - S^2(b_q y_1)\right], & r \ne j, \\ -2 y_{1r}\, d_q b_{qj}\, S(b_q y_1)\left[1 - S^2(b_q y_1)\right] + d_q\left[1 - S^2(b_q y_1)\right], & r = j. \end{cases} \tag{18} \]

For \( C \) and \( D \): to update the symplectic parameters \( c_q \) and \( d_q \) for \( q = 1, 2, \ldots, M \), we have to calculate the partial derivatives of the auxiliary function with respect to these parameters. These partial derivatives are related to the partial derivatives of the auxiliary function with respect to the features by the relations

\[ \frac{\partial Q}{\partial c_q} = \sum_{j=1}^{n/2} \frac{dQ}{dy_{1j}} \frac{\partial y_{1j}}{\partial c_q}, \tag{19} \]

and

\[ \frac{\partial Q}{\partial d_q} = \sum_{j=1}^{n/2} \frac{\partial Q}{\partial y_{2j}} \frac{\partial y_{2j}}{\partial d_q}, \tag{20} \]

where

\[ \frac{\partial y_{1j}}{\partial c_q} = a_{qj}\left[1 - S^2(a_q x_2)\right], \tag{21} \]

\[ \frac{dQ}{dy_{1j}} = \frac{\partial Q}{\partial y_{1j}} - 2 \sum_{k=1}^{n/2} \frac{\partial Q}{\partial y_{2k}} \sum_{h=1}^{M} d_h b_{hj} b_{hk}\, S(b_h y_1)\left[1 - S^2(b_h y_1)\right], \tag{22} \]

and

\[ \frac{\partial y_{2j}}{\partial d_q} = b_{qj}\left[1 - S^2(b_q y_1)\right]. \tag{23} \]

Using Equations (12) to (23), the values of the symplectic map parameters can be updated in each iteration using any gradient-based optimization algorithm.
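The following toy end-to-end sketch (my own illustration, assuming NumPy) shows the alternating optimization just described: transform the data, refit the diagonal-Gaussian model (M-step), then take a gradient step on the map parameters. For brevity, the gradient here is approximated by central finite differences rather than Equations (13)-(23), and a single Gaussian stands in for the HMM.

```python
# A toy sketch of the alternating optimization, not the paper's system.
import numpy as np

def loglik(y, mu, var):
    """Mean diagonal-Gaussian log-likelihood of the transformed frames."""
    return np.mean(-0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var))

def vp_map(p, x):
    """Toy 2-D volume-preserving map: y1 = x1 + p0*tanh(p1*x2), y2 = x2."""
    return np.stack([x[:, 0] + p[0] * np.tanh(p[1] * x[:, 1]), x[:, 1]], axis=1)

rng = np.random.default_rng(3)
x = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=2_000)
p = np.array([0.1, 0.1])
for _ in range(100):
    y = vp_map(p, x)
    mu, var = y.mean(axis=0), y.var(axis=0)        # M-step (single Gaussian)
    grad = np.zeros_like(p)
    for i in range(p.size):                        # finite-difference gradient
        e = np.zeros_like(p)
        e[i] = 1e-5
        grad[i] = (loglik(vp_map(p + e, x), mu, var)
                   - loglik(vp_map(p - e, x), mu, var)) / 2e-5
    p += 0.5 * grad                                # gradient ascent on the map
y = vp_map(p, x)
print("learned p:", p, " residual correlation:", np.corrcoef(y.T)[0, 1])
```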
4. EXPERIMENTS AND RESULTS

The symplectic maximum likelihood algorithm described in section 3 is used to study the optimal feature space for diagonal-covariance Gaussian mixture HMM modeling of the TIMIT database. We used the conjugate-gradient algorithm to update the values of the symplectic map parameters in each iteration. The Mel-frequency cepstral coefficients are calculated for 4500 utterances from the TIMIT database. The overall 26-dimensional feature vector consists of 12 MFCC coefficients, energy, and their deltas. In each iteration, the new feature vector is calculated from the current symplectic transformation parameters using the symplectic mapping equations; the maximum likelihood estimates of the HMM parameters are then calculated, and finally the maximum likelihood estimates of the symplectic map parameters are obtained using the conjugate-gradient algorithm. After the iterative algorithm converges to a set of locally optimal HMM and symplectic parameters, the training data are transformed by the symplectic map, yielding the final symplectic maximum likelihood transform (SMLT) feature vector. The new features are compared to LDA, linear ICA, and MLLT in their phoneme recognition accuracy.

In our experiments, the 61 phonemes defined in the TIMIT database are mapped to 48 phoneme labels for each frame of speech as described in [?]. These 48 phonemes are collapsed to 39 phonemes for testing purposes as in [?]. A three-state left-to-right model for each triphone is trained using the EM algorithm; the number of mixtures per state was fixed to four. After training the overall system and obtaining the symplectic map parameters, the approximately independent output coefficients of the symplectic map are used as the input acoustic features to a Gaussian mixture hidden Markov model speech recognizer [?]. The parameters of the recognizer are trained using the training portion of the TIMIT database. The parameters of the triphone models are then tied together using the same approach as in [?].

To compare the performance of the proposed algorithm with other approaches, we generated acoustic features using LDA, linear ICA, and MLLT. We used the maximum likelihood approach to LDA [?] and kept the dimensions of the output of LDA the same as the input. We also used the maximum likelihood approach to linear ICA as described in [?] and briefly overviewed in section 2. Finally, we implemented MLLT as described in [?] and briefly overviewed in section 2. All these techniques used a feature vector that consists of twelve MFCC coefficients, the energy, and their deltas as their input.

Testing this recognizer, using the test data in the TIMIT database, we get the phoneme recognition results in Table 1. These results are obtained by using a bigram phoneme language model and by keeping the insertion error around 10% as in [?]. The table compares these recognition results to the ones obtained by MFCC, LDA, linear ICA, and MLLT.

Table 1: Phoneme Recognition Accuracy

Acoustic Features    Recognition Accuracy
MFCC                 73.7%
Linear ICA           73.5%
LDA                  73.8%
MLLT                 74.6%
SMLT                 75.5%
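For reference, the 26-dimensional front end described in this section can be roughly approximated as follows (my sketch, assuming librosa; the paper's exact front end is not specified beyond its description above, and here the zeroth cepstral coefficient stands in for the energy term):

```python
# Rough approximation of the 26-dim feature vector: 12 cepstral coefficients
# plus an energy term, and their deltas. "utterance.wav" is a hypothetical file.
import numpy as np
import librosa

wav, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)    # c0 ~ energy term
feats = np.vstack([mfcc, librosa.feature.delta(mfcc)])  # shape (26, T)
```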

5. DISCUSSION

In this work, we described a framework for feature transformation for speech recognition and introduced a nonlinear symplectic maximum likelihood feature transform algorithm. The SMLT features improved phoneme recognition accuracy by about 2% over the baseline MFCC features. This improvement can be attributed to the ability of the algorithm to find a better representation of the acoustic cues of the different phonemes. Since the transform is invertible, the new features carry the same amount of information about the phonemes as the input MFCC features; the improvement is therefore due to the approximate independence property of the new features, which allows more efficient probabilistic modeling of the conditional probabilities with the same model complexity.

6. ACKNOWLEDGMENT

This work was supported by NSF award number
