Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Petr Pošík. Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics.

Outline: MLE (Likelihood, Incomplete data, General EM); K-means (Algorithm, Illustration, EM view); EM for Mixtures (General mixture, EM for Mixtures, GMM, EM for GMM); EM for HMM (HMM, HMM learning, Sufficient statistics, Baum-Welch); Summary (Competencies).

Maximum likelihood estimation

Likelihood maximization
Let's have a random variable X with probability distribution p_X(x | θ). This emphasizes that the distribution is parameterized by θ ∈ Θ, i.e. the distribution comes from a certain parametric family; Θ is the space of possible parameter values.
Learning task: assume the parameters θ are unknown, but we have an i.i.d. training dataset T = {x_1, ..., x_n} which can be used to estimate the unknown parameters.
The probability of observing dataset T given some parameter values θ is
  p(T | θ) = ∏_{j=1}^{n} p_X(x_j | θ) =: L(θ; T).
This probability can be interpreted as the degree to which the model parameters θ conform to the data T. It is thus called the likelihood of parameters θ w.r.t. data T. The optimal θ is obtained by maximizing the likelihood:
  θ* = argmax_{θ∈Θ} L(θ; T) = argmax_{θ∈Θ} ∏_{j=1}^{n} p_X(x_j | θ).
Since argmax_x f(x) = argmax_x log f(x), we often maximize the log-likelihood l(θ; T) = log L(θ; T):
  θ* = argmax_{θ∈Θ} l(θ; T) = argmax_{θ∈Θ} log ∏_{j=1}^{n} p_X(x_j | θ) = argmax_{θ∈Θ} Σ_{j=1}^{n} log p_X(x_j | θ),
which is often easier than maximization of L.

Incomplete data
Assume we cannot observe the objects completely: the r.v. X describes the observable part, the r.v. K describes the unobservable, hidden part. We assume there is an underlying distribution p_XK(x, k | θ) of objects (x, k).
Learning task: we want to estimate the model parameters θ, but the training set contains i.i.d. samples for the observable part only, i.e. T_X = {x_1, ..., x_n}. (Still, there also exists a hidden, unobservable dataset T_K = {k_1, ..., k_n}.)
If we had the complete data (T_X, T_K), we could directly optimize l(θ; T_X, T_K) = log p(T_X, T_K | θ). But we do not have access to T_K.
If we would like to maximize l(θ; T_X) = log p(T_X | θ) = log Σ_{T_K} p(T_X, T_K | θ), the summation inside log(·) results in complicated expressions, or we would have to use numerical methods.
Our state of knowledge about T_K is given by p(T_K | T_X, θ). The complete-data likelihood L(θ; T_X, T_K) = p(T_X, T_K | θ) is a random variable, since T_K is unknown and random, but governed by the underlying distribution. Instead of optimizing it directly, consider its expected value under the posterior distribution over the latent variables (E-step), and then maximize this expectation (M-step).
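The following short sketch is not part of the original slides; it illustrates the MLE principle on one concrete parametric family (a univariate Gaussian, an assumption made only for this example), where the maximizer of the log-likelihood l(θ; T) has a closed form.

```python
# Minimal MLE sketch (illustrative): fit a univariate Gaussian to an i.i.d. sample
# by maximizing the log-likelihood l(theta; T) = sum_j log p_X(x_j | theta).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)      # training set T = {x_1, ..., x_n}

def log_likelihood(x, mu, sigma2):
    """l(theta; T) for theta = (mu, sigma2) under the Gaussian model."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2))

# For the Gaussian family the maximizers are available in closed form:
mu_hat = x.mean()                                  # ML estimate of the mean
sigma2_hat = x.var()                               # ML estimate of the variance (1/n, not 1/(n-1))

print(mu_hat, sigma2_hat)
# Any other parameter values yield a lower log-likelihood:
assert log_likelihood(x, mu_hat, sigma2_hat) >= log_likelihood(x, mu_hat + 0.5, sigma2_hat)
```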

Expectation-Maximization algorithm

EM algorithm: a general method of finding the MLE of probability distribution parameters from a given dataset when the data are incomplete (hidden variables, or missing values).
Hidden variables: mixture models, Hidden Markov models, ...
It is a family of algorithms, or a recipe to derive an ML estimation algorithm for various kinds of probabilistic models.
1. Pretend that you know θ. (Use some initial guess θ^(0).) Set the iteration counter i = 1.
2. E-step: use the current parameter values θ^(i−1) to find the posterior distribution of the latent variables, p(T_K | T_X, θ^(i−1)). Use this posterior distribution to find the expectation of the complete-data log-likelihood evaluated for some general parameter values θ:
  Q(θ, θ^(i−1)) = Σ_{T_K} p(T_K | T_X, θ^(i−1)) log p(T_X, T_K | θ).
3. M-step: maximize the expectation, i.e. compute an updated estimate of θ as
  θ^(i) = argmax_{θ∈Θ} Q(θ, θ^(i−1)).
4. Check for convergence: finish, or advance the iteration counter, i = i + 1, and repeat from 2.

EM algorithm features
Pros:
- Among the possible optimization methods, EM exploits the structure of the model.
- For p_X|K from the exponential family, the M-step can be done analytically and there is a unique optimizer.
- The expected value in the E-step can be expressed as a function of θ without solving it explicitly for each θ.
- p_X(T_X | θ^(i+1)) ≥ p_X(T_X | θ^(i)), i.e. the process finds a local optimum.
- Works well in practice.
Cons:
- Not guaranteed to find the globally optimal estimate.
- MLE can overfit; use MAP instead (EM can be used as well).
- Convergence may be slow.
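As a concrete illustration of the generic loop above, here is a minimal sketch (an assumed toy example, not from the lecture) of EM for a model with a hidden binary variable: each group of 10 coin tosses was produced by one of two coins with unknown biases, the coin identity is hidden, and the mixing of the two coins is assumed uniform and known.

```python
# Toy EM sketch (illustrative): estimate the biases theta = (p1, p2) of two coins
# when we only see the head counts of each group, not which coin produced it.
import numpy as np

heads = np.array([5.0, 9.0, 8.0, 4.0, 7.0])       # heads observed in each group of 10 tosses
tails = 10.0 - heads

def e_step(theta):
    """Posterior responsibility that each group came from coin 1 (uniform prior over coins)."""
    p1, p2 = theta
    lik1 = p1 ** heads * (1 - p1) ** tails
    lik2 = p2 ** heads * (1 - p2) ** tails
    return lik1 / (lik1 + lik2)                    # p(K = 1 | x_j, theta)

def m_step(gamma1):
    """Maximize the expected complete-data log-likelihood Q(theta, theta_old)."""
    gamma2 = 1.0 - gamma1
    p1 = (gamma1 * heads).sum() / (gamma1 * 10.0).sum()
    p2 = (gamma2 * heads).sum() / (gamma2 * 10.0).sum()
    return p1, p2

theta = (0.6, 0.5)                                 # initial guess theta^(0)
for i in range(100):
    theta_new = m_step(e_step(theta))
    if max(abs(a - b) for a, b in zip(theta_new, theta)) < 1e-9:
        break                                      # convergence check
    theta = theta_new
print(theta)                                       # estimated coin biases
```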

K-means

K-means algorithm
Clustering is one of the tasks of unsupervised learning. The K-means algorithm for clustering [Mac67]: K is the a priori given number of clusters.
Algorithm (see the sketch below):
1. Choose K centroids μ_k (in almost any way, but every cluster should have at least one example).
2. For all x_i, assign x_i to its closest μ_k.
3. Compute the new position of each centroid μ_k based on all examples x_i, i ∈ I_k, in cluster k.
4. If the positions of the centroids changed, repeat from 2.
Algorithm features:
- The algorithm minimizes the intracluster variance
  J = Σ_{j=1}^{K} Σ_{i=1}^{n_j} ||x_{i,j} − c_j||²,  (1)
  where x_{i,j} is the i-th example of cluster j and c_j is its centroid.
- The algorithm is fast, but each run can converge to a different local optimum of J.

[Mac67] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, Berkeley. University of California Press.
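A possible NumPy implementation of the four steps above (an illustrative sketch; the toy data generator and all names are assumptions of this example):

```python
# K-means sketch (illustrative): alternate the assignment step and the centroid
# update until the assignments no longer change.
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]        # step 1: initial centroids
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # step 2: assign every x_i to its closest centroid
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # step 3: recompute each centroid from the examples currently in its cluster
        for k in range(K):
            if np.any(new_labels == k):
                mu[k] = X[new_labels == k].mean(axis=0)
        # step 4: stop once the assignments (and hence the centroids) are stable
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return mu, labels

# toy data: three well-separated blobs in 2-D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 3), (0, 3)]])
centroids, labels = kmeans(X, K=3)
```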

Illustration
[Figures: K-means clustering shown over several successive iterations.]
K-means: EM view
Assume: an object can be in one of K states with equal probabilities, and all p_X|K(x | k) are isotropic Gaussians: p_X|K(x | k) = N(x | μ_k, σI).
Recognition (part of E-step): the task is to decide the state k for each x, assuming all μ_k are known. The Bayesian strategy (minimizing the probability of error) chooses the cluster whose center is closest to the observation x:
  q*(x) = argmin_{k∈K} (x − μ_k)².
If the μ_k, k ∈ K, are not known, it is a parametrized strategy q_Θ(x), where Θ = (μ_k)_{k=1}^{K}. Deciding the state k for each x assuming known μ_k is actually the computation of a degenerate probability distribution p(T_K | T_X, θ^(i−1)), i.e. the first part of the E-step.
Learning (the rest of the E-step and the M-step): find the maximum-likelihood estimates of μ_k based on the known (x_1, k_1), ..., (x_l, k_l):
  μ_k = (1 / |I_k|) Σ_{i∈I_k} x_i,
where I_k is the set of indices of the training examples (currently) belonging to state k. This completes the E-step and implements the M-step.

EM for Mixture Models

General mixture distributions
Assume the data are samples from a distribution factorized as p_XK(x, k) = p_K(k) p_X|K(x | k), i.e.
  p_X(x) = Σ_{k∈K} p_K(k) p_X|K(x | k),
and that the distribution is known (except for the distribution parameters).
Recognition (part of E-step): let's define the result of recognition not as a single decision for some state k (as done in K-means), but rather as a set of posterior probabilities (sometimes called responsibilities) for all k given x_i,
  γ_k(x_i) = p_K|X(k | x_i, θ^(t)) = p_X|K(x_i | k) p_K(k) / Σ_{k'∈K} p_X|K(x_i | k') p_K(k'),
i.e. the probability that an object was in state k when observation x_i was made. The γ_k(x) functions can be viewed as discriminant functions.

General mixture distributions (cont.)
Learning (the rest of E-step and M-step): given the training multiset T = (x_i, k_i)_{i=1}^{n} (or the respective γ_k(x_i) instead of k_i), assume γ_k(x) is known, p_K(k) are not known, and p_X|K(x | k) are known except for the parameter values Θ_k, i.e. we shall write p_X|K(x | k, Θ_k). Let the object model m be the set of all unknown parameters, m = (p_K(k), Θ_k)_{k∈K}.
The log-likelihood of model m if we assume the k_i are known:
  log L(m) = log ∏_{i=1}^{n} p_XK(x_i, k_i) = Σ_{i=1}^{n} log p_K(k_i) + Σ_{i=1}^{n} log p_X|K(x_i | k_i, Θ_{k_i}).
The log-likelihood of model m if we assume a distribution (γ) over k is known:
  log L(m) = Σ_{i=1}^{n} Σ_{k∈K} γ_k(x_i) log p_K(k) + Σ_{i=1}^{n} Σ_{k∈K} γ_k(x_i) log p_X|K(x_i | k, Θ_k).
We search for the optimal model using maximum likelihood,
  m* = (p*_K(k), Θ*_k) = argmax_m log L(m),
i.e. we compute
  p*_K(k) = (1/n) Σ_{i=1}^{n} γ_k(x_i)
and solve |K| independent tasks
  Θ*_k = argmax_{Θ_k} Σ_{i=1}^{n} γ_k(x_i) log p_X|K(x_i | k, Θ_k).
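The recognition step above can be sketched generically in code; the helper below (an assumed, illustrative function, not part of the slides) takes the priors p_K(k) and the component densities p_X|K(· | k) and returns the matrix of responsibilities γ_k(x_i).

```python
# Responsibilities for a general mixture (illustrative sketch).
import numpy as np
from scipy.stats import norm

def responsibilities(X, priors, component_pdfs):
    """X: (n, d) data; priors: length-K array p_K(k); component_pdfs: K callables returning p_X|K(x | k)."""
    # unnormalized posteriors p_K(k) * p_X|K(x_i | k) for every point and component
    joint = np.column_stack([priors[k] * component_pdfs[k](X) for k in range(len(priors))])
    return joint / joint.sum(axis=1, keepdims=True)          # normalize over k

# usage with two 1-D Gaussian components (made-up parameters)
X = np.array([[-1.0], [0.2], [3.1]])
gamma = responsibilities(
    X, np.array([0.5, 0.5]),
    [lambda x: norm.pdf(x[:, 0], loc=0.0, scale=1.0),
     lambda x: norm.pdf(x[:, 0], loc=3.0, scale=1.0)])
print(gamma)            # each row sums to 1
```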

EM for mixture distributions
Unsupervised learning algorithm for general mixture distributions:
1. Initialize the model parameters m = ((p_K(k), Θ_k) for all k).
2. Perform the recognition task, i.e. assuming m is known, compute
  γ_k(x_i) = p̂_K|X(k | x_i) = p_K(k) p_X|K(x_i | k, Θ_k) / Σ_{j∈K} p_K(j) p_X|K(x_i | j, Θ_j).
3. Perform the learning task, i.e. assuming the γ_k(x_i) are known, update the ML estimates of the model parameters p_K(k) and Θ_k for all k:
  p_K(k) = (1/n) Σ_{i=1}^{n} γ_k(x_i),
  Θ_k = argmax_{Θ_k} Σ_{i=1}^{n} γ_k(x_i) log p_X|K(x_i | k, Θ_k).
4. Iterate 2 and 3 until the model stabilizes.
Features:
- The algorithm does not specify how to update Θ_k in step 3; it depends on the chosen form of p_X|K.
- The model created in iteration t is always at least as good as the model from iteration t−1, i.e. L(m) = p(T | m) increases.

Special case: Gaussian Mixture Model
Each k-th component is a Gaussian distribution:
  N(x | μ_k, Σ_k) = (2π)^{−D/2} |Σ_k|^{−1/2} exp{ −(1/2) (x − μ_k)^T Σ_k^{−1} (x − μ_k) }.
Gaussian Mixture Model (GMM):
  p(x) = Σ_{k=1}^{K} p_K(k) p_X|K(x | k, Θ_k) = Σ_{k=1}^{K} α_k N(x | μ_k, Σ_k),
assuming Σ_{k=1}^{K} α_k = 1 and 0 ≤ α_k ≤ 1.
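For concreteness, the GMM density above can be evaluated directly from its definition; the following sketch uses SciPy's multivariate normal, and the particular mixture parameters are made-up values for illustration only.

```python
# Evaluate the GMM density p(x) = sum_k alpha_k N(x | mu_k, Sigma_k)  (illustrative sketch).
import numpy as np
from scipy.stats import multivariate_normal

alphas = np.array([0.5, 0.3, 0.2])                           # mixing weights, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([0.0, 4.0])]
Sigmas = [np.eye(2), np.diag([0.5, 1.5]), np.array([[1.0, 0.6], [0.6, 1.0]])]

def gmm_pdf(x):
    return sum(a * multivariate_normal(mean=m, cov=S).pdf(x)
               for a, m, S in zip(alphas, mus, Sigmas))

print(gmm_pdf(np.array([1.0, 1.0])))
```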

EM for GMM
1. Initialize the model parameters m = ((p_K(k), μ_k, Σ_k) for all k).
2. Perform the recognition task as in the general case, i.e. assuming m is known, compute
  γ_k(x_i) = p̂_K|X(k | x_i) = p_K(k) p_X|K(x_i | k, Θ_k) / Σ_{j∈K} p_K(j) p_X|K(x_i | j, Θ_j) = α_k N(x_i | μ_k, Σ_k) / Σ_{j∈K} α_j N(x_i | μ_j, Σ_j).
3. Perform the learning task, i.e. assuming the γ_k(x_i) are known, update the ML estimates of the model parameters α_k, μ_k and Σ_k for all k:
  α_k = p_K(k) = (1/n) Σ_{i=1}^{n} γ_k(x_i),
  μ_k = Σ_{i=1}^{n} γ_k(x_i) x_i / Σ_{i=1}^{n} γ_k(x_i),
  Σ_k = Σ_{i=1}^{n} γ_k(x_i) (x_i − μ_k)(x_i − μ_k)^T / Σ_{i=1}^{n} γ_k(x_i).
4. Iterate 2 and 3 until the model stabilizes.
Remarks:
- Each data point belongs to all components to a certain degree γ_k(x_i).
- The equation for μ_k is just a weighted average of the x_i.
- The equation for Σ_k is just a weighted covariance matrix.
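Putting the E-step and the M-step updates above together, a compact EM-for-GMM implementation might look as follows (an illustrative sketch, not the lecture's reference code; the small ridge added to the covariances is an assumption made here to keep them invertible).

```python
# EM for a Gaussian Mixture Model (illustrative sketch).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alphas = np.full(K, 1.0 / K)                             # p_K(k), initially uniform
    mus = X[rng.choice(n, size=K, replace=False)].copy()     # initial means: random data points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: gamma[i, k] = alpha_k N(x_i | mu_k, Sigma_k) / sum_j alpha_j N(x_i | mu_j, Sigma_j)
        dens = np.column_stack([alphas[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted ML estimates of alpha_k, mu_k, Sigma_k
        Nk = gamma.sum(axis=0)
        alphas = Nk / n
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return alphas, mus, Sigmas

# toy data: three Gaussian blobs in 2-D
X = np.random.default_rng(1).normal(size=(300, 2)) + np.repeat(np.array([[0, 0], [4, 4], [0, 4]]), 100, axis=0)
alphas, mus, Sigmas = em_gmm(X, K=3)
```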

Example: Source data
[Figure: source data generated from 3 Gaussians.]

Example: Input to EM algorithm
[Figure: the same data given to the EM algorithm as an unlabeled dataset.]

Example: EM Iterations
[Figures: the evolution of the GMM estimate over successive EM iterations.]

Example: Ground Truth and EM Estimate
The ground truth (left) and the EM estimate (right) are very close because we have enough data, we know the right number of components, and we were lucky that EM converged to the right local optimum of the likelihood function.

Baum-Welch Algorithm: EM for HMM

Hidden Markov Model
A 1st-order HMM is a generative probabilistic model formed by
- a sequence of hidden variables X_0, ..., X_t, the domain of each of them being the set of states {s_1, ..., s_N},
- a sequence of observed variables E_1, ..., E_t, the domain of each of them being the set of observations {v_1, ..., v_M},
- an initial distribution over hidden states P(X_0),
- a transition model P(X_t | X_{t−1}), and
- an emission model P(E_t | X_t).
Simulating an HMM (a code sketch follows after this section):
1. Generate an initial state x_0 according to P(X_0). Set t ← 1.
2. Generate a new current state x_t according to P(X_t | x_{t−1}).
3. Generate an observation e_t according to P(E_t | x_t).
4. Advance time: t ← t + 1.
5. Finish, or repeat from step 2.
With an HMM, efficient algorithms exist for solving inference tasks; but we have no idea (so far) how to learn the HMM parameters from the observation sequence, because we do not have access to the hidden states.

Learning an HMM from data
Is it possible to learn an HMM from data?
- There is no known way to analytically solve for the model which maximizes the probability of the observations.
- There is no optimal way of estimating the model parameters from the observation sequences.
- We can, however, find model parameters such that the probability of the observations is (locally) maximized: the Baum-Welch algorithm (a special case of EM).
Let's use a slightly different notation to emphasize the model parameters:
  π = [π_i] = [P(X_1 = s_i)] ... the vector of initial probabilities of the states,
  A = [a_{i,j}] = [P(X_t = s_j | X_{t−1} = s_i)] ... the matrix of transition probabilities to the next state given the current state,
  B = [b_{i,k}] = [P(E_t = v_k | X_t = s_i)] ... the matrix of observation probabilities given the current state.
The whole set of HMM parameters is then θ = (π, A, B).
The algorithm (presented below) will compute the expected numbers of being in a state or taking a transition, given the observations and the current model parameters θ = (π, A, B), and then compute a new estimate of the model parameters θ' = (π', A', B') such that P(e_{1:T} | θ') ≥ P(e_{1:T} | θ).
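A direct translation of the simulation procedure above into code might look like this (a sketch; the concrete π, A and B matrices are made-up values for illustration).

```python
# Simulate a first-order HMM with parameters theta = (pi, A, B)  (illustrative sketch).
import numpy as np

pi = np.array([0.6, 0.4])                   # initial state distribution P(X_0)
A = np.array([[0.7, 0.3],                   # A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                   # B[i, k] = P(E_t = v_k | X_t = s_i)
              [0.2, 0.8]])

def simulate_hmm(pi, A, B, T, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice(len(pi), p=pi)           # step 1: initial state x_0 ~ P(X_0)
    states, observations = [], []
    for _ in range(T):
        x = rng.choice(A.shape[1], p=A[x])                    # step 2: x_t ~ P(X_t | x_{t-1})
        observations.append(rng.choice(B.shape[1], p=B[x]))   # step 3: e_t ~ P(E_t | x_t)
        states.append(x)                    # steps 4-5: advance time and repeat
    return states, observations

states, obs = simulate_hmm(pi, A, B, T=20)
```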

Sufficient statistics
Let's define the probability of a transition from state s_i at time t to state s_j at time t+1, given the model and the observation sequence e_{1:T}:
  ξ_t(i, j) = P(X_t = s_i, X_{t+1} = s_j | e_{1:T}, θ) = α_t(s_i) a_{ij} b_{j,e_{t+1}} β_{t+1}(s_j) / P(e_{1:T} | θ)
            = α_t(s_i) a_{ij} b_{j,e_{t+1}} β_{t+1}(s_j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(s_i) a_{ij} b_{j,e_{t+1}} β_{t+1}(s_j),
where α_t and β_t are the forward and backward messages computed by the forward-backward algorithm, and the probability of being in state s_i at time t, given the model and the observation sequence:
  γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j).
Then we can interpret
  Σ_{t=1}^{T−1} γ_t(i) as the expected number of transitions from state s_i, and
  Σ_{t=1}^{T−1} ξ_t(i, j) as the expected number of transitions from s_i to s_j.

Baum-Welch algorithm
The re-estimation formulas are
  π'_i = expected frequency of being in state s_i at time t = 1 = γ_1(i),
  a'_{ij} = (expected number of transitions from s_i to s_j) / (expected number of transitions from s_i) = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i),
  b'_{jk} = (expected number of times being in state s_j and observing v_k) / (expected number of times being in state s_j) = Σ_{t=1}^{T} I(e_t = v_k) γ_t(j) / Σ_{t=1}^{T} γ_t(j).
As with other EM variants, with the old model parameters θ = (π, A, B) and the new, re-estimated parameters θ' = (π', A', B'), the new model is at least as likely as the old one: P(e_{1:T} | θ') ≥ P(e_{1:T} | θ). The above equations are applied iteratively, with θ' taking the place of θ.
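One full Baum-Welch re-estimation pass, including the forward and backward messages α_t and β_t, could be sketched as follows (an illustrative sketch with unscaled messages, so it is only suitable for short sequences; the toy parameters at the bottom are assumptions).

```python
# One Baum-Welch re-estimation pass for a discrete-observation HMM (illustrative sketch).
import numpy as np

def baum_welch_step(obs, pi, A, B):
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    # forward messages: alpha[t, i] = P(e_1..e_{t+1}, X_{t+1} = s_i)   (0-based index t)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # backward messages: beta[t, i] = P(e_{t+2}..e_T | X_{t+1} = s_i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                               # P(e_{1:T} | theta)
    # sufficient statistics xi_t(i, j) and gamma_t(i)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    xi /= evidence
    gamma = alpha * beta / evidence                          # P(X_t = s_i | e_{1:T}, theta)
    # re-estimation formulas
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new

# toy usage: iterate the re-estimation, with theta' taking the place of theta
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
obs = [0, 1, 1, 0, 0, 1, 0, 0]
for _ in range(20):
    pi, A, B = baum_welch_step(obs, pi, A, B)
```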

Summary

Competencies
After this lecture, a student shall be able to:
- define and explain the task of maximum likelihood estimation;
- explain why we can maximize the log-likelihood instead of the likelihood, and describe the advantages;
- describe the issues we face when trying to maximize the likelihood in the case of incomplete data;
- explain the general high-level principle of the Expectation-Maximization algorithm;
- describe the pros and cons of the EM algorithm, especially what happens to the likelihood in one EM iteration;
- describe the EM algorithm for mixture distributions, including the notion of responsibilities;
- explain the Baum-Welch algorithm, i.e. the application of EM to HMMs: what parameters are learned and how (conceptually).
