
Lecture 7: Linear Gaussian Models
Kevin Murphy, 7 November 2004

Bayes nets with tabular CPDs

We have mostly focused on graphs where all latent nodes are discrete, and all CPDs/potentials are full tables.

[Figure: example Bayes net structures over discrete nodes X1, ..., X6.]

Mixtures of Gaussians

We have also considered discrete latent variables with continuous observed variables, e.g., in a mixture of Gaussians:

P(Z = i) = θ_i
p(X = x | Z = i) = N(x; µ_i, Σ_i)

This can be used for classification (supervised) and clustering / vector quantization (unsupervised).

Linear Gaussian model

We now consider the case where all CPDs are linear-Gaussian:

p(Z = z) = N(z; µ_z, Σ_z)
p(X = x | Z = z) = N(x; µ_x + Λz, Σ)

i.e., child = linear function of parent plus Gaussian noise:

X = ΛZ + noise,   noise ~ N(µ_x, Σ_x)

For the supervised case, this is just linear regression.
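As a concrete illustration of the generative process just described, here is a minimal sketch (not from the lecture; all dimensions and parameter values are arbitrary) of ancestral sampling in a linear Gaussian model: sample the parent Z, then sample the child X as a linear function of Z plus Gaussian noise.

```python
import numpy as np

# Minimal sketch of ancestral sampling in a linear Gaussian model (illustrative values).
rng = np.random.default_rng(0)
k, p = 2, 5                                   # latent and observed dimensions
mu_z, Sigma_z = np.zeros(k), np.eye(k)        # p(z) = N(mu_z, Sigma_z)
mu_x, Sigma_x = np.zeros(p), 0.1 * np.eye(p)  # noise covariance for the child
Lam = rng.standard_normal((p, k))             # linear map from parent to child

z = rng.multivariate_normal(mu_z, Sigma_z)            # sample the parent
x = rng.multivariate_normal(mu_x + Lam @ z, Sigma_x)  # child = linear function of parent + noise
print(z, x)
```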

Factor analysis (Jordan ch 14)

Unsupervised linear regression is called factor analysis:

p(x) = N(x; 0, I)
p(y | x) = N(y; µ + Λx, Ψ)

where Λ is the factor loading matrix and Ψ is diagonal.

[Figure: to generate data, first generate a point within the manifold spanned by the columns λ_1, λ_2 of Λ (centred at µ), then add noise; the coordinates of the point are the components of the latent variable.]

Review: Means, Variances and Covariances

Remember the definition of the mean and covariance of a vector random variable:

E[x] = ∫ x p(x) dx = m
Cov[x] = E[(x - m)(x - m)^T] = ∫ (x - m)(x - m)^T p(x) dx = V

which is the expected value of the outer product of the variable with itself, after subtracting the mean.

Also, the covariance between two variables:

Cov[x, y] = E[(x - m_x)(y - m_y)^T] = ∫∫ (x - m_x)(y - m_y)^T p(x, y) dx dy = C_xy

which is the expected value of the outer product of one variable with another, after subtracting their means. Note: C_xy is not symmetric.

Marginal Data Distribution

Since a marginal Gaussian times a conditional Gaussian is a joint Gaussian, we can compute the marginal density p(y | θ) by integrating out x:

p(y | θ) = ∫_x p(x) p(y | x, θ) dx = N(y; µ, ΛΛ^T + Ψ)

which can be done by completing the square in the exponent. However, since the marginal is Gaussian, we can also just compute its mean and covariance. (Assume the noise n is uncorrelated with the data.)

E[y] = E[µ + Λx + n] = µ + Λ E[x] + E[n] = µ + Λ·0 + 0 = µ
Cov[y] = E[(y - µ)(y - µ)^T] = E[(µ + Λx + n - µ)(µ + Λx + n - µ)^T]
       = E[(Λx + n)(Λx + n)^T] = Λ E[xx^T] Λ^T + E[nn^T] = ΛΛ^T + Ψ

FA = Constrained Covariance Gaussian

Marginal density for factor analysis (y is p-dimensional, x is k-dimensional):

p(y | θ) = N(y; µ, ΛΛ^T + Ψ)

So the effective covariance is the low-rank outer product of two long skinny matrices plus a diagonal matrix:

Cov[y] = ΛΛ^T + Ψ

In other words, factor analysis is just a constrained Gaussian model. (If Ψ were not diagonal then we could model any Gaussian and it would be pointless.)
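The mean and covariance calculation above can be checked empirically. The sketch below (my own illustration, not code from the lecture; all parameter values are arbitrary) samples from the FA generative model and verifies that the sample covariance of y approaches ΛΛ^T + Ψ.

```python
import numpy as np

# Monte Carlo check that the FA marginal covariance is Lambda Lambda^T + Psi.
rng = np.random.default_rng(1)
p, k, N = 4, 2, 200_000
Lam = rng.standard_normal((p, k))
Psi = np.diag(rng.uniform(0.1, 0.5, size=p))          # diagonal noise covariance
mu = rng.standard_normal(p)

X = rng.standard_normal((N, k))                       # x ~ N(0, I)
noise = rng.multivariate_normal(np.zeros(p), Psi, size=N)
Y = mu + X @ Lam.T + noise                            # y = mu + Lambda x + noise

print(np.allclose(np.cov(Y.T), Lam @ Lam.T + Psi, atol=0.05))  # True, up to Monte Carlo error
```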

Probabilistic Principal Component Analysis (PPCA)

In factor analysis, we can write the marginal density explicitly:

p(y | θ) = ∫_x p(x) p(y | x, θ) dx = N(y; µ, ΛΛ^T + Ψ)

The noise Ψ must be restricted for the model to be interesting. (Why?) In factor analysis the restriction is that Ψ is diagonal (axis-aligned).

What if we further restrict Ψ = σ^2 I (i.e., spherical)? We get the probabilistic principal component analysis (PPCA) model, where µ is the mean vector, the columns of Λ are the principal components (usually orthogonal), and σ^2 is the global sensor noise.

PCA: Zero Noise Limit

The traditional PCA model is actually the limit as σ^2 → 0. More precisely, PCA is the limit as the ratio of the noise variance on the output to the prior variance on the latent variables goes to zero. We can achieve this either with zero noise or with infinite-variance priors.

(P)PCA is very useful for dimensionality reduction, e.g., for visualizing high-dimensional data.

Difference between FA and PPCA

Recall the intuition that Gaussians are hyperellipsoids:
mean = centre of the football,
eigenvectors of the covariance matrix = axes of the football,
eigenvalues = lengths of the axes.

In FA, our football (the noise ellipsoid Ψ) is an axis-aligned cigar; in PPCA it is a sphere of radius σ^2.

In FA, the variance of the noise is independent for each dimension; in PPCA the noise is assumed to be the same in every direction. PCA looks for directions of large variance; conversely, FA looks for directions of large correlation.

Model Invariance and Identifiability

There is a degeneracy in the FA model. Since Λ only appears as the outer product ΛΛ^T, the model is invariant to rotations and axis flips of the latent space. We can replace Λ with ΛQ for any unitary matrix Q and the model remains the same:

(ΛQ)(ΛQ)^T = Λ(QQ^T)Λ^T = ΛΛ^T

This means that there is no single best setting of the parameters: an infinite number of parameter settings all achieve the ML score. Such models are called unidentifiable, since two people both fitting ML parameters to identical data are not guaranteed to identify the same parameters.
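The rotation invariance behind this unidentifiability is easy to confirm numerically; the following sketch (illustrative only, not from the lecture) draws a random orthogonal Q and checks that the marginal covariance is unchanged.

```python
import numpy as np

# (Lambda Q)(Lambda Q)^T = Lambda Lambda^T for any orthogonal Q.
rng = np.random.default_rng(2)
p, k = 5, 3
Lam = rng.standard_normal((p, k))
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))   # a random orthogonal matrix
print(np.allclose((Lam @ Q) @ (Lam @ Q).T, Lam @ Lam.T))   # True
```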

Latent Covariance in Factor Analysis and PCA

What if we allow the latent variable x to have a covariance matrix of its own: p(x) = N(x; 0, P)? We can still compute the marginal probability:

p(y | θ) = ∫_x p(x) p(y | x, θ) dx = N(y; µ, ΛPΛ^T + Ψ)

We can always absorb P into the loading matrix Λ by diagonalizing it: P = EDE^T and setting Λ' = ΛED^{1/2}. Thus there is another degeneracy in FA, between P and Λ: we can set P to be the identity, to be diagonal, whatever we want.

Traditionally we break this degeneracy by either:
- setting the covariance P of the latent variable to be I (FA), or
- forcing the columns of Λ to be orthonormal (PCA).

Model FA joint distribution

p(x) = N(x; 0, I)
p(y | x, θ) = N(y; µ + Λx, Ψ)

Hence the joint distribution of x and y is

p([x; y]) = N( [x; y]; [0; µ], [ I, Λ^T; Λ, ΛΛ^T + Ψ ] )

where the corner elements Λ^T and Λ come from Cov[x, y]:

Cov[x, y] = E[(x - 0)(y - µ)^T] = E[x(µ + Λx + n - µ)^T] = E[x(Λx + n)^T] = Λ^T

(Assume the noise n is uncorrelated with the data and the latent variables.)

Review: Gaussian Conditioning (Jordan ch 13)

Remember the formulas for conditional Gaussian distributions:

p([x_1; x_2]) = N( [x_1; x_2]; [µ_1; µ_2], [ Σ_11, Σ_12; Σ_21, Σ_22 ] )
p(x_1 | x_2) = N(x_1; m_{1|2}, V_{1|2})
m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (x_2 - µ_2)
V_{1|2} = Σ_11 - Σ_12 Σ_22^{-1} Σ_21

Inference in Factor Analysis

Apply the Gaussian conditioning formulas to the joint distribution we derived above, where

Σ_11 = I,   Σ_12 = Λ^T,   Σ_22 = ΛΛ^T + Ψ

This gives

V_{1|2} = Σ_11 - Σ_12 Σ_22^{-1} Σ_21 = I - Λ^T (ΛΛ^T + Ψ)^{-1} Λ
m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (x_2 - µ_2) = Λ^T (ΛΛ^T + Ψ)^{-1} (y - µ)
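A small sketch of this conditioning step (my own illustration, with arbitrary parameters): build the joint covariance [ I, Λ^T; Λ, ΛΛ^T + Ψ ] and apply the conditioning formulas to obtain the posterior p(x | y).

```python
import numpy as np

# FA inference by Gaussian conditioning on the joint of (x, y).
rng = np.random.default_rng(3)
p, k = 4, 2
Lam = rng.standard_normal((p, k))
Psi = np.diag(rng.uniform(0.1, 0.5, size=p))
mu = rng.standard_normal(p)
y = rng.standard_normal(p)                      # an arbitrary observation

S11, S12 = np.eye(k), Lam.T                     # blocks of the joint covariance
S21, S22 = Lam, Lam @ Lam.T + Psi
S22_inv = np.linalg.inv(S22)

m = S12 @ S22_inv @ (y - mu)                    # posterior mean Lambda^T (LL^T + Psi)^-1 (y - mu)
V = S11 - S12 @ S22_inv @ S21                   # posterior covariance
print(m, V)
```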

Inference in Factor Analysis (continued)

The previous formula requires inverting a matrix of size p × p (the dimension of y):

p(x | y) = N(x; m, V)
V = I - Λ^T (ΛΛ^T + Ψ)^{-1} Λ
m = Λ^T (ΛΛ^T + Ψ)^{-1} (y - µ)

We now use a good trick for inverting a matrix when it can be decomposed into the sum of an easily inverted matrix (D) and a low-rank outer product. It is called the matrix inversion lemma:

(D - AB^{-1}A^T)^{-1} = D^{-1} + D^{-1} A (B - A^T D^{-1} A)^{-1} A^T D^{-1}

Applying the matrix inversion lemma, we get:

p(x | y) = N(x; m, V)
V = (I + Λ^T Ψ^{-1} Λ)^{-1}
m = V Λ^T Ψ^{-1} (y - µ)

which inverts a matrix of size k × k (the dimension of x).

Inference is a Linear Projection

The posterior is

p(x | y) = N(x; m, V)
V = I - Λ^T (ΛΛ^T + Ψ)^{-1} Λ = (I + Λ^T Ψ^{-1} Λ)^{-1}
m = Λ^T (ΛΛ^T + Ψ)^{-1} (y - µ) = V Λ^T Ψ^{-1} (y - µ)

The posterior covariance does not depend on the observed data y! (It can be precomputed.) Also, computing the posterior mean is just a linear operation: m = β(y - µ), where β (the "hat matrix") can be precomputed.

Incomplete Data Log Likelihood Function

Using the marginal density of p(y):

ℓ(θ; D) = -(N/2) log|ΛΛ^T + Ψ| - (1/2) Σ_n (y_n - µ)^T (ΛΛ^T + Ψ)^{-1} (y_n - µ)
        = -(N/2) log|V| - (1/2) trace[ V^{-1} Σ_n (y_n - µ)(y_n - µ)^T ]
        = -(N/2) log|V| - (N/2) trace[ V^{-1} S ]

where V here denotes the model covariance ΛΛ^T + Ψ and S is the sample data covariance. In other words, we are trying to make the constrained model covariance as close as possible to the observed covariance, where "close" means the trace of the ratio. Thus the sufficient statistics are the same as for a Gaussian: the mean Σ_n y_n and the covariance Σ_n (y_n - µ)(y_n - µ)^T.

EM for Factor Analysis

The parameters are coupled nonlinearly in the log-likelihood:

ℓ(θ; D) = -(N/2) log|ΛΛ^T + Ψ| - (N/2) trace[ (ΛΛ^T + Ψ)^{-1} S ]

We could use conjugate gradient methods, but EM is simpler. For the E-step we do inference (the linear projection above); for the M-step we do linear regression on the expected values. More precisely, we maximize the expected complete log likelihood:

E-step: q_n^{t+1} = p(x_n | y_n, θ^t) = N(x_n; m_n, V_n)
M-step: Λ^{t+1} = argmax_Λ Σ_n ⟨ℓ_c(x_n, y_n)⟩_{q_n^{t+1}}
        Ψ^{t+1} = argmax_Ψ Σ_n ⟨ℓ_c(x_n, y_n)⟩_{q_n^{t+1}}
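The equivalence of the two forms of the posterior derived above, one inverting a p × p matrix and one (via the matrix inversion lemma) a k × k matrix, can be verified numerically. A sketch, with arbitrary parameter values:

```python
import numpy as np

# Check that the two expressions for the FA posterior agree.
rng = np.random.default_rng(4)
p, k = 6, 2
Lam = rng.standard_normal((p, k))
Psi = np.diag(rng.uniform(0.2, 1.0, size=p))
y_minus_mu = rng.standard_normal(p)

big = np.linalg.inv(Lam @ Lam.T + Psi)                       # p x p inverse
V_big = np.eye(k) - Lam.T @ big @ Lam
m_big = Lam.T @ big @ y_minus_mu

Psi_inv = np.linalg.inv(Psi)
V_small = np.linalg.inv(np.eye(k) + Lam.T @ Psi_inv @ Lam)   # k x k inverse
m_small = V_small @ Lam.T @ Psi_inv @ y_minus_mu

print(np.allclose(V_big, V_small), np.allclose(m_big, m_small))  # True True
```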

Complete Data Likelihood

We know the optimal µ is the data mean. Assume the mean has been subtracted off y_n from now on. The complete log likelihood (ignoring the mean) is

ℓ_c(Λ, Ψ) = Σ_n log p(x_n, y_n) = Σ_n [ log p(x_n) + log p(y_n | x_n) ]
          = -(N/2) log|Ψ| - (1/2) Σ_n x_n^T x_n - (1/2) Σ_n (y_n - Λx_n)^T Ψ^{-1} (y_n - Λx_n)
          = -(N/2) log|Ψ| - (N/2) trace[ S Ψ^{-1} ] + const

where S = (1/N) Σ_n (y_n - Λx_n)(y_n - Λx_n)^T and the constant collects the term in Σ_n x_n^T x_n, which does not involve Λ or Ψ.

M-step: Optimize Parameters (Jordan ch 13)

Take the derivatives of the complete log likelihood with respect to the parameters. We will need these identities:

∂ log|A| / ∂A = (A^{-1})^T
∂ trace[B^T A] / ∂A = B
∂ trace[B A^T C A] / ∂A = 2CAB   (for symmetric B and C)

Hence

∂ℓ_c(Λ, Ψ) / ∂Λ = Ψ^{-1} Σ_n y_n x_n^T - Ψ^{-1} Λ Σ_n x_n x_n^T
∂ℓ_c(Λ, Ψ) / ∂Ψ^{-1} = +(N/2) Ψ - (N/2) S

Take the expectation of the gradient with respect to q^{t+1} from the E-step:

⟨∂ℓ_c / ∂Λ⟩ = Ψ^{-1} Σ_n y_n m_n^T - Ψ^{-1} Λ Σ_n V_n
⟨∂ℓ_c / ∂Ψ^{-1}⟩ = +(N/2) Ψ - (N/2) ⟨S⟩

where

m_n = E[X_n | y_n]
V_n = Var(X_n | y_n) + E[X_n | y_n] E[X_n | y_n]^T
⟨S⟩ = (1/N) Σ_n [ y_n y_n^T - y_n m_n^T Λ^T - Λ m_n y_n^T + Λ V_n Λ^T ]

Finally, set the derivatives to zero to solve for the optimal parameters:

Λ^{t+1} = ( Σ_n y_n m_n^T ) ( Σ_n V_n )^{-1}
Ψ^{t+1} = (1/N) diag[ Σ_n ( y_n y_n^T - Λ^{t+1} m_n y_n^T ) ]
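Putting the E-step and these M-step updates together gives the EM iteration summarized on the next slide. The sketch below is my own compact implementation of those updates (not code from the lecture); variable names and the synthetic data are illustrative.

```python
import numpy as np

def fa_em(Y, k, n_iters=100, seed=0):
    """EM for factor analysis on an N x p data matrix Y with k latent factors."""
    rng = np.random.default_rng(seed)
    N, p = Y.shape
    mu = Y.mean(axis=0)                       # the optimal mu is the sample mean
    Yc = Y - mu                               # work with centred data from here on
    Lam = rng.standard_normal((p, k))
    Psi = np.diag(np.var(Yc, axis=0))
    for _ in range(n_iters):
        # E-step: p(x_n | y_n) = N(m_n, V) with shared V = (I + Lam^T Psi^-1 Lam)^-1
        Psi_inv = np.linalg.inv(Psi)
        V = np.linalg.inv(np.eye(k) + Lam.T @ Psi_inv @ Lam)
        M = Yc @ Psi_inv @ Lam @ V            # row n is m_n^T
        # Expected sufficient statistics
        sum_ym = Yc.T @ M                     # sum_n y_n m_n^T        (p x k)
        sum_xx = N * V + M.T @ M              # sum_n E[x_n x_n^T]     (k x k)
        # M-step
        Lam = sum_ym @ np.linalg.inv(sum_xx)
        Psi = np.diag(np.diag(Yc.T @ Yc - Lam @ sum_ym.T)) / N
    return mu, Lam, Psi

# Usage on synthetic data: fit a 2-factor model to 6-dimensional observations.
rng = np.random.default_rng(1)
Y = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((500, 6))
mu, Lam, Psi = fa_em(Y, k=2)
print(np.round(np.diag(Psi), 3))   # small values, reflecting the low observation noise
```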

Final Algorithm: EM for Factor Analysis

First, set µ equal to the sample mean (1/N) Σ_n y_n, and subtract this mean from all the data. Now run the following iterations:

E-step: q_n^{t+1} = p(x_n | y_n, θ^t) = N(x_n; m_n, V)
        V = (I + Λ^T Ψ^{-1} Λ)^{-1}
        m_n = V Λ^T Ψ^{-1} (y_n - µ)

M-step: Λ^{t+1} = ( Σ_n y_n m_n^T ) ( Σ_n V_n )^{-1}
        Ψ^{t+1} = (1/N) diag[ Σ_n ( y_n y_n^T - Λ^{t+1} m_n y_n^T ) ]

where V_n = V + m_n m_n^T as on the previous slide.

Learning for PPCA

For FA the parameters are coupled in a way that makes it impossible to solve for the ML parameters directly. We must use EM or other nonlinear optimization techniques.

But for (P)PCA, the ML parameters can be solved for directly: the k-th column of Λ is the k-th largest eigenvalue of the sample covariance S times the associated eigenvector, and the global sensor noise σ^2 is the average of the eigenvalues smaller than the k-th one.

This technique is also good for initializing FA. Note: EM can be used as a fast way to compute eigenvectors!

Independent Components Analysis (ICA)

ICA is similar to FA, except it assumes the latent source has a non-Gaussian density. Hence ICA can extract higher-order moments (not just second-order ones). It is commonly used to solve blind source separation (the cocktail party problem).

Independent Factor Analysis (IFA) is an approximation to ICA in which we model the source using a mixture of Gaussians.

[Figure: graphical models for FA, mixture of FA, and IFA.]

Hidden Markov Models

You can think of an HMM as:
- a Markov chain with stochastic measurements, or
- a mixture of Gaussians with the latent variable changing over time (the "lily pad" model).

[Figure: HMM graphical model, a chain x_1 → x_2 → ... → x_T with an observation y_t attached to each state x_t.]
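For comparison with the EM procedure, here is a sketch of the closed-form (P)PCA fit described above under "Learning for PPCA". Note that I follow the exact ML form due to Tipping and Bishop, which is slightly more precise than the slide's shorthand: σ^2 is the average of the discarded eigenvalues and each retained eigenvector is scaled by sqrt(λ_j - σ^2).

```python
import numpy as np

def ppca_closed_form(Y, k):
    """Closed-form ML fit of PPCA (Tipping & Bishop form) to an N x p data matrix Y."""
    mu = Y.mean(axis=0)
    S = np.cov((Y - mu).T)                       # sample covariance (p x p)
    evals, evecs = np.linalg.eigh(S)             # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]   # sort into descending order
    sigma2 = evals[k:].mean()                    # average of the discarded eigenvalues
    Lam = evecs[:, :k] * np.sqrt(np.maximum(evals[:k] - sigma2, 0.0))
    return mu, Lam, sigma2

# Usage: this also makes a good initializer for the EM algorithm above.
rng = np.random.default_rng(8)
Y = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 5)) + 0.2 * rng.standard_normal((1000, 5))
mu, Lam, sigma2 = ppca_closed_form(Y, k=2)
print(sigma2)   # close to the true noise variance 0.04
```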

State space models (SSMs)

An SSM is like an HMM, except the hidden state is continuous-valued. In general, we can write

x_t = f(x_{t-1}) + w_t
y_t = g(x_t) + v_t

where f is the dynamical (process) model and g is the observation model. w_t is the process noise, embedded inside P(X_t | X_{t-1}); v_t is the observation noise, embedded inside P(Y_t | X_t).

For a controlled Markov chain, we condition all quantities on the observed input signals u_t.

Linear dynamical systems (LDS) (Jordan ch 15)

An LDS is a special case of an SSM where f and g are linear functions and the noise terms are Gaussian:

x_t = A x_{t-1} + w_t,   w_t ~ N(0, Q)
y_t = B x_t + v_t,       v_t ~ N(0, R)

An LDS is like factor analysis through time.

Kalman filtering and smoothing: online vs offline inference

The Kalman filter is a way to perform exact online inference (sequential Bayesian updating) in an LDS. It is the Gaussian analog of the forwards algorithm for HMMs:

P(X_t = i | y_{1:t}) = α_t(i) ∝ p(y_t | X_t = i) Σ_j P(X_t = i | X_{t-1} = j) α_{t-1}(j)

The Rauch-Tung-Striebel smoother is a way to perform exact offline inference in an LDS. It is the Gaussian analog of the forwards-backwards algorithm:

P(X_t = i | y_{1:T}) ∝ α_t(i) β_t(i)
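A minimal sketch of simulating such an LDS (illustrative only; A, B, Q, R are arbitrary choices, with the state holding a position and velocity and only the position observed):

```python
import numpy as np

# Sample a trajectory from x_t = A x_{t-1} + w_t,  y_t = B x_t + v_t.
rng = np.random.default_rng(5)
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # position/velocity dynamics
B = np.array([[1.0, 0.0]])                 # observe the position only
Q, R = 0.01 * np.eye(2), 0.25 * np.eye(1)  # process and observation noise covariances

T = 50
x = np.zeros(2)
xs, ys = [], []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)   # hidden state evolves linearly
    y = B @ x + rng.multivariate_normal(np.zeros(1), R)   # noisy linear observation
    xs.append(x)
    ys.append(y)
```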

LDS for 2D tracking

Dynamics: new position = old position + Δ × velocity + noise (constant velocity model, Gaussian acceleration):

[ x^1_t; x^2_t; ẋ^1_t; ẋ^2_t ] = [ 1 0 Δ 0; 0 1 0 Δ; 0 0 1 0; 0 0 0 1 ] [ x^1_{t-1}; x^2_{t-1}; ẋ^1_{t-1}; ẋ^2_{t-1} ] + noise

Observation: project out the first two components (we observe the Cartesian position of the object - linear!):

[ y^1_t; y^2_t ] = [ 1 0 0 0; 0 1 0 0 ] [ x^1_t; x^2_t; ẋ^1_t; ẋ^2_t ] + noise

[Figure: (a) 2D tracking with Kalman filtering and (b) with Kalman smoothing; each panel plots the true, observed, and estimated trajectories in the X-Y plane.]

Kalman filtering in the brain?

[Figure: (a) the world: a hidden internal state r(t) with temporal dynamics (matrix V) is converted to the visible state, an image I(t), via a mapping U; (b) the estimator: an internal model of the world whose Kalman filter predicts the dynamics of the hidden state r and corrects the estimate using the observed images.]

Kalman filtering derivation

Since all CPDs are linear-Gaussian, the system defines a large multivariate Gaussian, so all marginals are Gaussian. Hence we can represent the belief state P(X_t | y_{1:t}) as a Gaussian with mean x̂_{t|t} and covariance P_{t|t}. It is common to work instead with the inverse covariance (precision) matrix P_{t|t}^{-1}; this is called information form.

Kalman filtering is a recursive procedure to update the belief state:

Predict step: compute P(X_{t+1} | y_{1:t}) from the prior belief P(X_t | y_{1:t}) and the dynamical model P(X_{t+1} | X_t).
Update step: compute the new belief P(X_{t+1} | y_{1:t+1}) from the prediction P(X_{t+1} | y_{1:t}), the observation y_{t+1}, and the observation model P(Y_{t+1} | X_{t+1}).
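In code, the constant-velocity model above corresponds to the following system matrices (a sketch; dt is my notation for the time step Δ). These can be fed directly into a Kalman filter such as the one sketched after the next section.

```python
import numpy as np

dt = 1.0   # time step (Delta on the slide); an arbitrary illustrative value
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)   # state = (x1, x2, xdot1, xdot2)
C = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # observe the Cartesian position only
```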

Predict step

Dynamical model: x_{t+1} = A x_t + G w_t, with w_t ~ N(0, Q).

One-step-ahead prediction of the state:

x̂_{t+1|t} = E[X_{t+1} | y_{1:t}] = A x̂_{t|t}
P_{t+1|t} = E[(X_{t+1} - x̂_{t+1|t})(X_{t+1} - x̂_{t+1|t})^T | y_{1:t}] = A P_{t|t} A^T + G Q G^T

Observation model: y_t = C x_t + v_t, with v_t ~ N(0, R).

One-step-ahead prediction of the observation:

ŷ_{t+1|t} = E[Y_{t+1} | y_{1:t}] = E[C X_{t+1} + v_{t+1} | y_{1:t}] = C x̂_{t+1|t}
E[(Y_{t+1} - ŷ_{t+1|t})(Y_{t+1} - ŷ_{t+1|t})^T | y_{1:t}] = C P_{t+1|t} C^T + R
E[(Y_{t+1} - ŷ_{t+1|t})(X_{t+1} - x̂_{t+1|t})^T | y_{1:t}] = C P_{t+1|t}

Update step

Summarizing the results from the previous slide, we have P(X_{t+1}, Y_{t+1} | y_{1:t}) = N(m_{t+1}, V_{t+1}), where

m_{t+1} = [ x̂_{t+1|t}; C x̂_{t+1|t} ],   V_{t+1} = [ P_{t+1|t}, P_{t+1|t} C^T; C P_{t+1|t}, C P_{t+1|t} C^T + R ]

Remember the formulas for conditional Gaussian distributions:

p([x_1; x_2]) = N( [x_1; x_2]; [µ_1; µ_2], [ Σ_11, Σ_12; Σ_21, Σ_22 ] )
p(x_1 | x_2) = N(x_1; m_{1|2}, V_{1|2})
m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (x_2 - µ_2)
V_{1|2} = Σ_11 - Σ_12 Σ_22^{-1} Σ_21

Kalman filter equations

Using the conditioning formulas, we get:

x̂_{t+1|t} = A x̂_{t|t}
P_{t+1|t} = A P_{t|t} A^T + G Q G^T
x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (y_{t+1} - C x̂_{t+1|t})
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}

where K_{t+1} is the Kalman gain matrix:

K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}

K_t can be precomputed (since it is independent of the data). It rapidly converges to a constant matrix (Riccati equation).

Example of a KF in 1D (Russell & Norvig 2e, p. 554)

Consider noisy observations of a 1D particle doing a random walk:

x_t = x_{t-1} + N(0, σ_x),   z_t = x_t + N(0, σ_z)

Hence

P_{t+1|t} = A P_{t|t} A^T + G Q G^T = σ_t + σ_x
K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} = (σ_t + σ_x) / (σ_t + σ_x + σ_z)
x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (z_{t+1} - C x̂_{t+1|t}) = [ (σ_t + σ_x) z_{t+1} + σ_z µ_t ] / (σ_t + σ_x + σ_z)
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t} = (σ_t + σ_x) σ_z / (σ_t + σ_x + σ_z)

[Figure: the prior P(x_0), the prediction P(x_1), and the posterior P(x_1 | z_1 = 2.5), plotted against x position.]
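The predict and update equations above translate directly into a few lines of code. A sketch (my own, not the lecture's; G is taken to be the identity so the process noise enters with covariance Q):

```python
import numpy as np

def kf_step(xhat, P, y, A, C, Q, R):
    """One Kalman filter recursion: predict with (A, Q), then update with observation y."""
    # Predict step
    xhat_pred = A @ xhat
    P_pred = A @ P @ A.T + Q
    # Update step
    S = C @ P_pred @ C.T + R                        # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)             # Kalman gain
    xhat_new = xhat_pred + K @ (y - C @ xhat_pred)  # correct the prediction with the innovation
    P_new = P_pred - K @ C @ P_pred
    return xhat_new, P_new
```

With scalar A = C = G = 1 this reduces to the 1D random-walk example above.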

KF: intuition

The KF update of the mean is

x̂_{t+1|t+1} = A x̂_{t|t} + K_{t+1} (y_{t+1} - C A x̂_{t|t}) = [ (σ_t + σ_x) z_{t+1} + σ_z µ_t ] / (σ_t + σ_x + σ_z)

The term (y_{t+1} - C x̂_{t+1|t}) is called the innovation.

The new belief is a convex combination of the updates from the prior and the observation, weighted by the Kalman gain matrix:

K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}

If the observation is unreliable, σ_z is large, so K_{t+1} is small and we pay more attention to the prediction. If the old prior is unreliable (large σ_t) or the process is very unpredictable (large σ_x), we pay more attention to the observation.

KF, RLS and LMS

The KF update of the mean is

x̂_{t+1|t+1} = A x̂_{t|t} + K_{t+1} (y_{t+1} - C A x̂_{t|t})

Consider the special case where the hidden state is a constant, x_t = θ, but the observation matrix C is a time-varying vector, C_t = x_t^T (reusing x_t here to denote the t-th input vector). Hence y_t = x_t^T θ + v_t, as in linear regression. We can estimate θ recursively using the Kalman filter:

θ̂_{t+1} = θ̂_t + P_{t+1} R^{-1} (y_{t+1} - x_t^T θ̂_t) x_t

This is called the recursive least squares (RLS) algorithm.

We can approximate P_{t+1} R^{-1} ≈ η_{t+1} by a scalar constant; this is called the least mean squares (LMS) algorithm. We can adapt η_t online using stochastic approximation theory.
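A sketch of RLS viewed as this degenerate Kalman filter (identity dynamics, no process noise); sigma2 plays the role of the observation noise R and the large initial P encodes a vague prior on θ. These names and values are my own illustrative choices.

```python
import numpy as np

def rls(X, y, sigma2=1.0, p0=1e3):
    """Recursive least squares for y_t = x_t^T theta + noise, one observation at a time."""
    n, d = X.shape
    theta = np.zeros(d)
    P = p0 * np.eye(d)                            # large initial uncertainty about theta
    for t in range(n):
        x_t = X[t]
        K = P @ x_t / (x_t @ P @ x_t + sigma2)    # Kalman gain for a scalar observation
        theta = theta + K * (y[t] - x_t @ theta)  # innovation-weighted update
        P = P - np.outer(K, x_t) @ P              # posterior covariance of theta
    return theta

# Usage: recovers the least-squares solution on a toy regression problem.
rng = np.random.default_rng(6)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
print(rls(X, y))   # close to [1.0, -2.0, 0.5]
```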
