16 EXPECTATION MAXIMIZATION
"A hen is only an egg's way of making another egg." – Samuel Butler

Suppose you were building a naive Bayes model for a text categorization problem. After you were done, your boss told you that it became prohibitively expensive to obtain labeled data. You now have a probabilistic model that assumes access to labels, but you don't have any labels! Can you still do something? Amazingly, you can. You can treat the labels as hidden variables, and attempt to learn them at the same time as you learn the parameters of your model. A very broad family of algorithms for solving problems just like this is the expectation maximization family. In this chapter, you will derive expectation maximization (EM) algorithms for clustering and dimensionality reduction, and then see why EM works.

Learning Objectives:
- Explain the relationship between parameters and hidden variables.
- Construct generative stories for clustering and dimensionality reduction.
- Draw a graph explaining how EM works by constructing convex lower bounds.
- Implement EM for clustering with mixtures of Gaussians, contrasting it with k-means.
- Evaluate the differences between EM and gradient descent for hidden variable models.

Dependencies:

16.1 Grading an Exam without an Answer Key

Alice's machine learning professor Carlos gives out an exam that consists of 50 true/false questions. Alice's class of 100 students takes the exam, and Carlos goes to grade their solutions. If Carlos made an answer key, this would be easy: he would just count the fraction of correctly answered questions each student got, and that would be their score. But, like many professors, Carlos was really busy and didn't have time to make an answer key. Can he still grade the exam?

There are two insights that suggest that he might be able to. Suppose he knows ahead of time that Alice is an awesome student, and is basically guaranteed to get 100% on the exam. In that case, Carlos can simply use Alice's answers as the ground truth.
More generally, if Carlos assumes that on average students are better than random guessing, he can hope that the majority answer for each question is likely to be correct. Combining this with the previous insight, when doing the voting, he might want to pay more attention to the answers of the better students. To be a bit more pedantic, suppose there are N = 100 students and M = 50 questions. Each student n has a score s_n, between 0 and 1, that denotes how well they do on the exam. The score is what we really want to compute. For each question m and each student n, the student has provided an answer a_{n,m}, which is either zero or one. There is also an unknown ground truth answer for each question m, which we'll call t_m, also either zero or one.

As a starting point, let's consider a simple heuristic and then complexify it. The heuristic is the majority vote heuristic, and works as follows. First, we estimate t_m as the most common answer for question m: t_m = argmax_t ∑_n 1[a_{n,m} = t]. Once we have a guess for each true answer, we estimate each student's score as the fraction of answers they produced that match this guessed key: s_n = (1/M) ∑_m 1[a_{n,m} = t_m]. Once we have these scores, however, we might want to trust some of the students more than others. In particular, answers from students with high scores are perhaps more likely to be correct, so we can recompute the ground truth according to weighted votes. The weight of each vote will be precisely the score of the corresponding student:

t_m = argmax_t ∑_n s_n 1[a_{n,m} = t]   (16.1)

You can recognize this as a chicken and egg problem. If you knew the students' scores, you could estimate an answer key. If you had an answer key, you could compute student scores. A very common strategy in computer science for dealing with such chicken and egg problems is to iterate: take a guess at the first, compute the second, recompute the first, and so on.

In order to develop this idea formally, we have to cast the problem in terms of a probabilistic model with a generative story. The generative story we'll use is:

1. For each question m, choose a true answer t_m ~ Ber(0.5).
2. For each student n, choose a score s_n ~ Uni(0, 1).
3. For each question m and each student n, choose an answer a_{n,m} ~ Ber(s_n)^{t_m} Ber(1 − s_n)^{1 − t_m}.

In the first step, we generate the true answers independently by flipping a fair coin. In the second step, each student's overall score is determined to be a uniform random number between zero and one. The tricky step is step three, where each student's answer is generated for each question.
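The iterate strategy for the majority-vote heuristic can be sketched directly in code. This is a toy sketch, not from the book: the function name and the tie-breaking rule (ties go to answer 1) are choices made here, and the first pass uses uniform weights, so it reduces to plain majority vote before switching to score-weighted voting.

```python
def grade_exam(answers, iterations=10):
    """Iterate the chicken-and-egg heuristic: guess an answer key by
    (score-weighted) majority vote, then re-score students against it.

    answers[n][m] is student n's 0/1 answer to question m."""
    N, M = len(answers), len(answers[0])
    scores = [1.0] * N  # uniform weights at first: plain majority vote
    for _ in range(iterations):
        # Eq (16.1): t_m = argmax_t sum_n s_n 1[a_{n,m} = t]  (ties -> 1)
        key = []
        for m in range(M):
            vote1 = sum(scores[n] for n in range(N) if answers[n][m] == 1)
            vote0 = sum(scores[n] for n in range(N) if answers[n][m] == 0)
            key.append(1 if vote1 >= vote0 else 0)
        # re-score: fraction of each student's answers matching the key
        scores = [sum(1 for m in range(M) if answers[n][m] == key[m]) / M
                  for n in range(N)]
    return key, scores
```

On a small class this settles after a round or two: students who agree with the consensus end up with high scores, and their votes then carry more weight in the next round.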
Consider student n answering question m, and suppose that s_n = 0.9. If t_m = 1, then a_{n,m} should be 1 (i.e., correct) 90% of the time; this can be accomplished by drawing the answer from Ber(0.9). On the other hand, if t_m = 0, then a_{n,m} should be 1 (i.e., incorrect) 10% of the time; this can be accomplished by drawing the answer from Ber(0.1). The exponent in step 3 selects which of the two Bernoulli distributions to draw from, and thereby implements this rule. This can be translated into the following likelihood:

p(a, t, s) = [∏_m 0.5^{t_m} 0.5^{1 − t_m}] ∏_n ∏_m s_n^{a_{n,m} t_m} (1 − s_n)^{(1 − a_{n,m}) t_m} s_n^{(1 − a_{n,m})(1 − t_m)} (1 − s_n)^{a_{n,m} (1 − t_m)}   (16.2)
           = 0.5^M ∏_n ∏_m s_n^{a_{n,m} t_m} (1 − s_n)^{(1 − a_{n,m}) t_m} s_n^{(1 − a_{n,m})(1 − t_m)} (1 − s_n)^{a_{n,m} (1 − t_m)}   (16.3)

Suppose we knew the true labels t. We could take the log of this likelihood and differentiate it with respect to the score s_n of some student n (note: we can drop the 0.5^M term because it is just a constant):

log p(a, t, s) = ∑_n ∑_m [ a_{n,m} t_m log s_n + (1 − a_{n,m})(1 − t_m) log s_n + (1 − a_{n,m}) t_m log(1 − s_n) + a_{n,m} (1 − t_m) log(1 − s_n) ]   (16.4)

∂ log p(a, t, s) / ∂s_n = ∑_m [ (a_{n,m} t_m + (1 − a_{n,m})(1 − t_m)) / s_n − ((1 − a_{n,m}) t_m + a_{n,m} (1 − t_m)) / (1 − s_n) ]   (16.5)

The derivative has the form A/s_n − B/(1 − s_n). If we set this equal to zero and solve for s_n, we get an optimum of s_n = A/(A + B). In this case:

A = ∑_m [ a_{n,m} t_m + (1 − a_{n,m})(1 − t_m) ]   (16.6)
B = ∑_m [ (1 − a_{n,m}) t_m + a_{n,m} (1 − t_m) ]   (16.7)
A + B = ∑_m 1 = M   (16.8)

Putting this together, we get:

s_n = (1/M) ∑_m [ a_{n,m} t_m + (1 − a_{n,m})(1 − t_m) ]   (16.9)

In the case of known t's, this matches exactly what we had in the heuristic. However, we do not know t, so instead of using the true values of t, we're going to use their expectations. In particular, we will compute s_n by maximizing its likelihood under the expected values of t, hence the name "expectation maximization." If we are going to compute expectations of t, we have to say: expectations according to which probability distribution? We will use the distribution p(t | a, s). Let t̃_m denote E_{t_m ∼ p(t_m | a, s)}[t_m]. Because t_m is a binary variable, its expectation is equal to its probability; namely, t̃_m = p(t_m = 1 | a, s).

How can we compute this? We will compute C = p(t_m = 1, a | s) and D = p(t_m = 0, a | s) and then compute t̃_m = C/(C + D). The computation is straightforward:

C = 0.5 ∏_n s_n^{a_{n,m}} (1 − s_n)^{1 − a_{n,m}} = 0.5 ∏_{n : a_{n,m} = 1} s_n ∏_{n : a_{n,m} = 0} (1 − s_n)   (16.10)
D = 0.5 ∏_n s_n^{1 − a_{n,m}} (1 − s_n)^{a_{n,m}} = 0.5 ∏_{n : a_{n,m} = 1} (1 − s_n) ∏_{n : a_{n,m} = 0} s_n   (16.11)

If you inspect the value of C, it is basically voting (in a product form, not a sum form) the scores of those students who agree that the answer is 1 with one-minus-the-scores of those students who do not. The value of D does the reverse. This is a form of multiplicative voting, which has the effect that if a given student has a perfect score of 1.0, their vote carries completely.

We now have a way to:
1. Compute expected ground truth values t̃_m, given scores.
2. Optimize scores s_n, given expected ground truth values.

The full solution is then to alternate between these two. You can start by initializing the ground truth values at the majority vote (this seems like a safe initialization). Given those, compute new scores. Given those new scores, compute new ground truth values. And repeat until tired.

In the next two sections, we will consider a more complex unsupervised learning model for clustering, and then a generic mathematical framework for expectation maximization, which will answer questions like: will this process converge, and, if so, to what?

16.2 Clustering with a Mixture of Gaussians

In Chapter 9, you learned about probabilistic models for classification based on density estimation. Let's start with a fairly simple classification model that assumes we have labeled data. We will shortly remove this assumption. Our model will state that we have K classes, and data from class k is drawn from a Gaussian with mean µ_k and variance σ²_k.
The choice of classes is parameterized by θ. The generative story for this model is:
1. For each example n = 1 ... N:
   (a) Choose a label y_n ~ Disc(θ).
   (b) Choose example x_n ~ Nor(µ_{y_n}, σ²_{y_n}).

This generative story can be directly translated into a likelihood as before:

p(D) = ∏_n Mult(y_n | θ) Nor(x_n | µ_{y_n}, σ²_{y_n})   (16.12)
     = ∏_n θ_{y_n} (2π σ²_{y_n})^{−D/2} exp( −(1/(2σ²_{y_n})) ||x_n − µ_{y_n}||² )   (16.13)

where, within each example's factor, the θ_{y_n} term chooses the label and the Gaussian term chooses the feature values. If you had access to labels, this would be all well and good, and you could obtain closed form solutions for the maximum likelihood estimates of all parameters by taking a log and then taking gradients of the log likelihood:

θ_k = fraction of training examples in class k = (1/N) ∑_n 1[y_n = k]   (16.14)
µ_k = mean of training examples in class k = ∑_n 1[y_n = k] x_n / ∑_n 1[y_n = k]   (16.15)
σ²_k = variance of training examples in class k = ∑_n 1[y_n = k] ||x_n − µ_k||² / ∑_n 1[y_n = k]   (16.16)

? You should be able to derive the maximum likelihood solution results formally by now.

Suppose that you don't have labels. Analogously to the K-means algorithm, one potential solution is to iterate. You can start off with guesses for the values of the unknown variables, and then iteratively improve them over time. In K-means, the approach was to assign examples to labels (or clusters). This time, instead of making hard assignments ("example 10 belongs to cluster 4"), we'll make soft assignments ("example 10 belongs half to cluster 4, a quarter to cluster 2, and a quarter to cluster 5"). So as not to confuse ourselves too much, we'll introduce a new variable, z_n = ⟨z_{n,1}, ..., z_{n,K}⟩ (that sums to one), to denote a fractional assignment of examples to clusters.

This notion of soft assignments is visualized in Figure 16.1. Here, we've depicted each example as a pie chart, and its coloring denotes the degree to which it's been assigned to each (of three) clusters. The sizes of the pie pieces correspond to the z_n values.

[Figure 16.1: soft assignments of examples to three clusters, depicted as pie charts]
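The closed-form estimates (16.14)-(16.16) amount to per-class counts, means, and variances. A minimal sketch (the helper name is chosen here; x_n are scalars for brevity, although the chapter allows D-dimensional examples; it assumes every class appears at least once in the labels):

```python
def ml_estimates(xs, ys, K):
    """Closed-form maximum likelihood estimates for the labeled model:
    theta_k = fraction of examples in class k, mu_k = their mean,
    sigma2_k = their variance. Assumes each class k occurs in ys."""
    N = len(xs)
    theta, mu, sigma2 = [], [], []
    for k in range(K):
        members = [x for x, y in zip(xs, ys) if y == k]  # class k's examples
        theta.append(len(members) / N)
        m = sum(members) / len(members)
        mu.append(m)
        sigma2.append(sum((x - m) ** 2 for x in members) / len(members))
    return theta, mu, sigma2
```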
Formally, z_{n,k} denotes the probability that example n is assigned to cluster k:

z_{n,k} = p(y_n = k | x_n)   (16.17)
        = p(y_n = k, x_n) / p(x_n)   (16.18)
        = (1/Z_n) Mult(k | θ) Nor(x_n | µ_k, σ²_k)   (16.19)

Here, the normalizer Z_n ensures that z_n sums to one. Given a set of parameters (the θs, µs and σ²s), the fractional assignments z_{n,k} are easy to compute. Now, akin to K-means, given fractional assignments, you need to recompute estimates of the model parameters. In analogy to the maximum likelihood solution (Eqs (16.14)-(16.16)), you can do this by counting fractional points rather than full points. This gives the following re-estimation updates:

θ_k = fraction of training examples in class k = (1/N) ∑_n z_{n,k}   (16.20)
µ_k = mean of fractional examples in class k = ∑_n z_{n,k} x_n / ∑_n z_{n,k}   (16.21)
σ²_k = variance of fractional examples in class k = ∑_n z_{n,k} ||x_n − µ_k||² / ∑_n z_{n,k}   (16.22)

All that has happened here is that the hard assignments 1[y_n = k] have been replaced with soft assignments z_{n,k}. As a bit of foreshadowing of what is to come, what we've done is essentially replace known labels with expected labels, hence the name "expectation maximization."

Putting this together yields Algorithm 38. This is the GMM ("Gaussian Mixture Models") algorithm, because the probabilistic model being learned describes a dataset as being drawn from a mixture distribution, where each component of this distribution is a Gaussian. Just as in the K-means algorithm, this approach is susceptible to local optima and to the quality of initialization. The heuristics for computing better initializers for K-means are also useful here.

? Aside from the fact that GMMs use soft assignments and K-means uses hard assignments, there are other differences between the two approaches. What are they?

Algorithm 38 GMM(X, K)
1: for k = 1 to K do
2:   µ_k ← some random location   // randomly initialize mean for kth cluster
3:   σ²_k ← 1   // initialize variances
4:   θ_k ← 1/K   // each cluster equally likely a priori
5: end for
6: repeat
7:   for n = 1 to N do
8:     for k = 1 to K do
9:       z_{n,k} ← θ_k (2π σ²_k)^{−D/2} exp( −(1/(2σ²_k)) ||x_n − µ_k||² )   // compute (unnormalized) fractional assignments
10:    end for
11:    z_n ← z_n / ∑_k z_{n,k}   // normalize fractional assignments
12:  end for
13:  for k = 1 to K do
14:    θ_k ← (1/N) ∑_n z_{n,k}   // re-estimate prior probability of cluster k
15:    µ_k ← ∑_n z_{n,k} x_n / ∑_n z_{n,k}   // re-estimate mean of cluster k
16:    σ²_k ← ∑_n z_{n,k} ||x_n − µ_k||² / ∑_n z_{n,k}   // re-estimate variance of cluster k
17:  end for
18: until converged
19: return z   // return cluster assignments

16.3 The Expectation Maximization Framework

At this point, you've seen a method for learning in a particular probabilistic model with hidden variables. Two questions remain: (1) can you apply this idea more generally, and (2) why is it even a reasonable thing to do?

Expectation maximization is a family of algorithms for performing maximum likelihood estimation in probabilistic models with hidden variables. The general flavor of how we will proceed is as follows. We want to maximize the log likelihood L, but this will turn out to be difficult to do directly. Instead, we'll pick a surrogate function L̃ that's a lower bound on L (i.e., L̃ ≤ L everywhere) that's (hopefully) easier to maximize. We'll construct the surrogate in such a way that increasing it will force the true likelihood to also go up. After maximizing L̃, we'll construct a new lower bound and optimize that. This process is shown pictorially in Figure 16.2.

To proceed, consider an arbitrary probabilistic model p(x, y | θ), where x denotes the observed data, y denotes the hidden data, and θ denotes the parameters. In the case of Gaussian Mixture Models, x was the data points, y was the (unknown) labels, and θ included the cluster prior probabilities, the cluster means, and the cluster variances. Now, given access only to a number of examples x_1, ..., x_N, you would like to estimate the parameters (θ) of the model. Probabilistically, this means that some of the variables are unknown, and therefore you need to marginalize (or sum) over their possible values.
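Algorithm 38 can be rendered as a short program. The following is a minimal sketch with several simplifying assumptions made here: one-dimensional data (so the (2πσ²_k)^{−D/2} constant has D = 1), evenly spaced initial means instead of random locations, a fixed iteration count instead of a convergence test, and a small floor on the variances to keep them from collapsing to zero.

```python
import math

def gmm_em(xs, K, iters=50):
    """EM for a 1-D mixture of Gaussians (Algorithm 38, simplified):
    alternate soft assignments (E step) with parameter re-estimation (M step)."""
    N = len(xs)
    lo, hi = min(xs), max(xs)
    mu = [lo + (k + 0.5) * (hi - lo) / K for k in range(K)]  # spread initial means
    sigma2 = [1.0] * K       # initial variances
    theta = [1.0 / K] * K    # uniform cluster priors
    for _ in range(iters):
        # E step: unnormalized z[n][k] = theta_k * Nor(x_n | mu_k, sigma2_k)
        z = []
        for x in xs:
            row = [theta[k] / math.sqrt(2 * math.pi * sigma2[k])
                   * math.exp(-(x - mu[k]) ** 2 / (2 * sigma2[k]))
                   for k in range(K)]
            s = sum(row)
            z.append([r / s for r in row])  # normalize so z_n sums to one
        # M step: re-estimate parameters from fractional counts
        for k in range(K):
            nk = sum(z[n][k] for n in range(N))  # fractional size of cluster k
            theta[k] = nk / N
            mu[k] = sum(z[n][k] * xs[n] for n in range(N)) / nk
            sigma2[k] = sum(z[n][k] * (xs[n] - mu[k]) ** 2 for n in range(N)) / nk
            sigma2[k] = max(sigma2[k], 1e-6)  # guard against variance collapse
    return z, theta, mu, sigma2
```

For example, on two well-separated clumps such as `[0.0, 0.2, -0.2, 10.0, 10.2, 9.8]` with K = 2, the means settle near 0 and 10 and the priors near 0.5 each; with badly chosen initial means it can instead settle at a local optimum, just as the text warns.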
[Figure 16.2: a figure showing successive lower bounds]

Now, your data consists only of X = ⟨x_1, x_2, ..., x_N⟩,
not the (x_n, y_n) pairs in D. You can then write the likelihood as:

p(X | θ) = ∑_{y_1} ∑_{y_2} ... ∑_{y_N} p(X, y_1, y_2, ..., y_N | θ)   marginalization   (16.23)
         = ∑_{y_1} ∑_{y_2} ... ∑_{y_N} ∏_n p(x_n, y_n | θ)   examples are independent   (16.24)
         = ∏_n ∑_{y_n} p(x_n, y_n | θ)   algebra   (16.25)

At this point, the natural thing to do is to take logs and then start taking gradients. However, once you start taking logs, you run into a problem: the log cannot eat the sum!

L(X | θ) = ∑_n log ∑_{y_n} p(x_n, y_n | θ)   (16.26)

Namely, the log gets stuck outside the sum and cannot move in to decompose the rest of the likelihood term!

The next step is to apply the somewhat strange, but strangely useful, trick of multiplying by 1. In particular, let q_n(·) be an arbitrary probability distribution. We will multiply the p(...) term above by q_n(y_n)/q_n(y_n), a valid step so long as q_n is never zero. This leads to:

L(X | θ) = ∑_n log ∑_{y_n} q_n(y_n) p(x_n, y_n | θ) / q_n(y_n)   (16.27)

We will now construct a lower bound using Jensen's inequality. This is a very useful (and easy to prove!) result that states that f(∑_i λ_i x_i) ≥ ∑_i λ_i f(x_i), so long as (a) λ_i ≥ 0 for all i, (b) ∑_i λ_i = 1, and (c) f is concave. If this looks familiar, that's just because it's a direct result of the definition of concavity. Recall that f is concave if f(ax + by) ≥ a f(x) + b f(y) whenever a + b = 1.

? Prove Jensen's inequality using the definition of concavity and induction.

You can now apply Jensen's inequality to the log likelihood by identifying the list of q_n(y_n)s as the λs, log as f (which is, indeed, concave), and each p/q term as an x. This yields:

L(X | θ) = ∑_n log ∑_{y_n} q_n(y_n) p(x_n, y_n | θ) / q_n(y_n)   (16.28)
         ≥ ∑_n ∑_{y_n} q_n(y_n) log [ p(x_n, y_n | θ) / q_n(y_n) ]   (16.29)
         = ∑_n ∑_{y_n} [ q_n(y_n) log p(x_n, y_n | θ) − q_n(y_n) log q_n(y_n) ] ≜ L̃(X | θ)   (16.30)

Note that this inequality holds for any choice of functions q_n, so long as they are non-negative and sum to one. In particular, it needn't even be the
same function q_n for each n. We will need to take advantage of both of these properties.

We have succeeded in our first goal: constructing a lower bound on L. When you go to optimize this lower bound for θ, the only part that matters is the first term. The second term, q log q, drops out as a function of θ. This means that the maximization you need to be able to compute, for fixed q_ns, is:

θ^(new) ← argmax_θ ∑_n ∑_{y_n} q_n(y_n) log p(x_n, y_n | θ)   (16.31)

This is exactly the sort of maximization done for Gaussian mixture models when we recomputed new means, variances, and cluster prior probabilities.

The second question is: what should q_n(·) actually be? Any reasonable q will lead to a lower bound, so in order to choose one q over another, we need another criterion. Recall that we are hoping to maximize L by instead maximizing a lower bound. In order to ensure that an increase in the lower bound implies an increase in L, we need to ensure that L̃(X | θ) = L(X | θ). In words: L̃ should be a lower bound on L that makes contact at the current point, θ.

16.4 Further Reading

TODO further reading
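As a closing sanity check on the chapter's central bound (a toy constructed here, not an example from the text): for a tiny, made-up joint distribution over a single example x with a binary hidden variable y, any valid q gives L̃ ≤ L as in (16.29), and choosing q(y) = p(y | x) makes the bound tight.

```python
import math

def log_likelihood(pxy):
    """L = log sum_y p(x, y); pxy maps each y to the joint probability p(x, y)."""
    return math.log(sum(pxy.values()))

def lower_bound(pxy, q):
    """L~ = sum_y q(y) log(p(x, y) / q(y)), the Jensen bound of Eq (16.29)."""
    return sum(q[y] * math.log(pxy[y] / q[y]) for y in pxy if q[y] > 0)

# a made-up joint distribution: p(x, y) for y in {0, 1}, so p(x) = 0.4
pxy = {0: 0.12, 1: 0.28}
L = log_likelihood(pxy)

# any valid q gives a lower bound ...
assert lower_bound(pxy, {0: 0.5, 1: 0.5}) <= L
# ... and the posterior q(y) = p(y | x) makes the bound touch L
post = {y: p / sum(pxy.values()) for y, p in pxy.items()}
assert abs(lower_bound(pxy, post) - L) < 1e-9
```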
More informationLecture 12: September 27
36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.
More informationHomework 5 Solutions
Homework 5 Solutios p329 # 12 No. To estimate the chace you eed the expected value ad stadard error. To do get the expected value you eed the average of the box ad to get the stadard error you eed the
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More information6.867 Machine learning, lecture 13 (Jaakkola)
Lecture topics: Boostig, argi, ad gradiet descet copleity of classifiers, geeralizatio Boostig Last tie we arrived at a boostig algorith for sequetially creatig a eseble of base classifiers. Our base classifiers
More informationPARTIAL DIFFERENTIAL EQUATIONS SEPARATION OF VARIABLES
Diola Bagayoko (0 PARTAL DFFERENTAL EQUATONS SEPARATON OF ARABLES. troductio As discussed i previous lectures, partial differetial equatios arise whe the depedet variale, i.e., the fuctio, varies with
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationSimulation. Two Rule For Inverting A Distribution Function
Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLecture 6: Integration and the Mean Value Theorem. slope =
Math 8 Istructor: Padraic Bartlett Lecture 6: Itegratio ad the Mea Value Theorem Week 6 Caltech 202 The Mea Value Theorem The Mea Value Theorem abbreviated MVT is the followig result: Theorem. Suppose
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationDiscrete Mathematics and Probability Theory Summer 2014 James Cook Note 15
CS 70 Discrete Mathematics ad Probability Theory Summer 2014 James Cook Note 15 Some Importat Distributios I this ote we will itroduce three importat probability distributios that are widely used to model
More information1 Approximating Integrals using Taylor Polynomials
Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More informationDiscrete Mathematics: Lectures 8 and 9 Principle of Inclusion and Exclusion Instructor: Arijit Bishnu Date: August 11 and 13, 2009
Discrete Matheatics: Lectures 8 ad 9 Priciple of Iclusio ad Exclusio Istructor: Arijit Bishu Date: August ad 3, 009 As you ca observe by ow, we ca cout i various ways. Oe such ethod is the age-old priciple
More informationOptimally Sparse SVMs
A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but
More informationUnderstanding Samples
1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationBernoulli Polynomials Talks given at LSBU, October and November 2015 Tony Forbes
Beroulli Polyoials Tals give at LSBU, October ad Noveber 5 Toy Forbes Beroulli Polyoials The Beroulli polyoials B (x) are defied by B (x), Thus B (x) B (x) ad B (x) x, B (x) x x + 6, B (x) dx,. () B 3
More informationSECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES
SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,
More informationFirst Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise
First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >
More informationMA131 - Analysis 1. Workbook 2 Sequences I
MA3 - Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................
More informationBinomial transform of products
Jauary 02 207 Bioial trasfor of products Khristo N Boyadzhiev Departet of Matheatics ad Statistics Ohio Norther Uiversity Ada OH 4580 USA -boyadzhiev@ouedu Abstract Give the bioial trasfors { b } ad {
More informationName Period ALGEBRA II Chapter 1B and 2A Notes Solving Inequalities and Absolute Value / Numbers and Functions
Nae Period ALGEBRA II Chapter B ad A Notes Solvig Iequalities ad Absolute Value / Nubers ad Fuctios SECTION.6 Itroductio to Solvig Equatios Objectives: Write ad solve a liear equatio i oe variable. Solve
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More informationSTAT Homework 1 - Solutions
STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationRandomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)
Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black
More informationIntegrals of Functions of Several Variables
Itegrals of Fuctios of Several Variables We ofte resort to itegratios i order to deterie the exact value I of soe quatity which we are uable to evaluate by perforig a fiite uber of additio or ultiplicatio
More informationProbability Theory. Exercise Sheet 4. ETH Zurich HS 2017
ETH Zurich HS 2017 D-MATH, D-PHYS Prof. A.-S. Szita Coordiator Yili Wag Probability Theory Exercise Sheet 4 Exercise 4.1 Let X ) N be a sequece of i.i.d. rado variables i a probability space Ω, A, P ).
More informationElement sampling: Part 2
Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig
More information