Appendix

Xi Chen, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Qihang Lin, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Dengyong Zhou, Machine Learning Group, Microsoft Research, Redmond, WA 98052, USA
1 Proof of Corollary 1

Recall that

$$I(a,b) \equiv \Pr\left(\theta \geq \tfrac{1}{2} \,\Big|\, \theta \sim \mathrm{Beta}(a,b)\right) = \frac{1}{B(a,b)} \int_{1/2}^{1} t^{a-1}(1-t)^{b-1}\,dt,$$

where $B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ is the beta function. It is easy to see that $I(a+1,b) > I(a,b) > I(a,b+1)$.

We re-write $I(a,b)$ as follows:

$$I(a,b) = \frac{1}{B(a,b)} \int_{1/2}^{1} t^{a-1}(1-t)^{b-1}\,dt = \frac{1}{B(a,b)} \int_{0}^{1/2} t^{b-1}(1-t)^{a-1}\,dt,$$

where the second equality is obtained by setting $t := 1-t$. Then we have:

$$I(a,b) - \left(1 - I(a,b)\right) = \frac{1}{B(a,b)} \int_{1/2}^{1} \left[ t^{a-1}(1-t)^{b-1} - t^{b-1}(1-t)^{a-1} \right] dt = \frac{1}{B(a,b)} \int_{1/2}^{1} \left[ \left(\frac{t}{1-t}\right)^{a-b} - 1 \right] t^{b-1}(1-t)^{a-1}\,dt.$$

When $a > b$: since $t > 1/2$, we have $\frac{t}{1-t} > 1$ and hence $\left(\frac{t}{1-t}\right)^{a-b} > 1$, so $I(a,b) - (1 - I(a,b)) > 0$, i.e., $I(a,b) > \frac{1}{2}$. When $a = b$: $\left(\frac{t}{1-t}\right)^{a-b} = 1$ and $I(a,b) = \frac{1}{2}$. When $a < b$: $\left(\frac{t}{1-t}\right)^{a-b} < 1$ and $I(a,b) < \frac{1}{2}$.

2 Proof of Proposition 2

We use the proof technique in (Xie & Frazier, 2012) to prove Proposition 2. By Proposition 1, we first obtain the value function:

$$V(S^0) = \sup_\pi \mathbb{E}\left[ \sum_{i=1}^{K} \mathbb{E}\left( \mathbb{1}(i \in H^T)\mathbb{1}(i \in H^*) + \mathbb{1}(i \in (H^T)^c)\mathbb{1}(i \in (H^*)^c) \,\Big|\, \mathcal{F}^T \right) \right] = \sup_\pi \mathbb{E}\left[ \sum_{i=1}^{K} h(P_i^T) \right].$$

To decompose the final accuracy $\sum_{i=1}^{K} h(P_i^T)$ into intermediate rewards at each stage, we define $G^0 = \sum_{i=1}^{K} h(P_i^0)$ and $G^{t+1} = \sum_{i=1}^{K} h(P_i^{t+1}) - \sum_{i=1}^{K} h(P_i^t)$. Then $\sum_{i=1}^{K} h(P_i^T)$ can be decomposed as:

$$\sum_{i=1}^{K} h(P_i^T) = G^0 + \sum_{t=0}^{T-1} G^{t+1}.$$

The value function can now be re-written as follows:

$$V(S^0) = G^0(S^0) + \sup_\pi \mathbb{E}\left[ \sum_{t=0}^{T-1} G^{t+1} \right] = G^0(S^0) + \sup_\pi \mathbb{E}\left[ \sum_{t=0}^{T-1} \mathbb{E}\left( G^{t+1} \mid \mathcal{F}^t \right) \right] = G^0(S^0) + \sup_\pi \mathbb{E}\left[ \sum_{t=0}^{T-1} \mathbb{E}\left( G^{t+1} \mid S^t, i_t \right) \right].$$

Here, the first equality is true because $G^0$ is deterministic and independent of $\pi$; the second equality is due to the tower property of conditional expectation; and the third equality holds because $P_i^{t+1}$ and $P_i^t$, and thus $G^{t+1}$, depend on $\mathcal{F}^t$ only through $S^t$ and $i_t$. We define the intermediate expected reward gained by labeling the $i_t$-th instance at the state $S^t$ as follows:

$$R(S^t, i_t) \equiv \mathbb{E}\left( G^{t+1} \mid S^t, i_t \right) = \mathbb{E}\left( \sum_{i=1}^{K} h(P_i^{t+1}) - \sum_{i=1}^{K} h(P_i^t) \,\Big|\, S^t, i_t \right) = \mathbb{E}\left( h(P_{i_t}^{t+1}) - h(P_{i_t}^t) \mid S^t, i_t \right). \tag{3}$$
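Corollary 1 can be checked numerically. For integer $a, b$, the tail probability $I(a,b) = \Pr(\theta \geq 1/2)$ has an exact binomial form via the standard Beta–binomial CDF identity, so it can be evaluated in rational arithmetic with the standard library alone; a quick sketch (the helper name `I` is ours):

```python
from fractions import Fraction
from math import comb

def I(a: int, b: int) -> Fraction:
    """I(a,b) = Pr(theta >= 1/2) for theta ~ Beta(a,b), integer a, b.
    Uses Pr(Beta(a,b) <= x) = Pr(Binomial(a+b-1, x) >= a), hence
    Pr(theta >= 1/2) = Pr(Binomial(a+b-1, 1/2) <= a-1)."""
    n = a + b - 1
    return Fraction(sum(comb(n, j) for j in range(a)), 2 ** n)

# Corollary 1: I(a,b) > 1/2 iff a > b, = 1/2 iff a = b, < 1/2 iff a < b.
for a in range(1, 8):
    for b in range(1, 8):
        if a > b:
            assert I(a, b) > Fraction(1, 2)
        elif a == b:
            assert I(a, b) == Fraction(1, 2)
        else:
            assert I(a, b) < Fraction(1, 2)

# Monotonicity used throughout the proofs: I(a+1,b) > I(a,b) > I(a,b+1).
assert I(3, 2) > I(2, 2) > I(2, 3)
```

The exact rational arithmetic avoids any floating-point ambiguity near the $1/2$ boundary.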
The last equality is due to the fact that only $P_{i_t}^t$ will change if the $i_t$-th instance is labeled next. With the expected reward function in place, our value function takes the following form:

$$V(S^0) = G^0(S^0) + \sup_\pi \mathbb{E}\left[ \sum_{t=0}^{T-1} R(S^t, i_t) \,\Big|\, S^0 \right]. \tag{4}$$

3 Proof of Proposition 3

To prove the failure of deterministic KG, we first show a key property of the expected reward function:

$$R(a,b) = \frac{a}{a+b}\left( h(I(a+1,b)) - h(I(a,b)) \right) + \frac{b}{a+b}\left( h(I(a,b+1)) - h(I(a,b)) \right). \tag{5}$$

Lemma 3. When $a, b$ are positive integers, if $a = b$, then $R(a,b) = \frac{(1/2)^{2a}}{a B(a,a)}$, and if $a \neq b$, then $R(a,b) = 0$.

To prove Lemma 3, we first present several basic properties of $B(a,b)$ and $I(a,b)$, which will be used in all the following theorems and proofs.

Properties of $B(a,b)$:

$$B(a,b) = B(b,a), \tag{6}$$
$$B(a+1,b) = B(a,b)\,\frac{a}{a+b}, \tag{7}$$
$$B(a,b+1) = B(a,b)\,\frac{b}{a+b}. \tag{8}$$

Properties of $I(a,b)$:

$$I(a,b) = 1 - I(b,a), \tag{9}$$
$$I(a+1,b) = I(a,b) + \frac{(1/2)^{a+b}}{a\,B(a,b)}, \tag{10}$$
$$I(a,b+1) = I(a,b) - \frac{(1/2)^{a+b}}{b\,B(a,b)}. \tag{11}$$

The properties of $I(a,b)$ are derived from the basic properties of the regularized incomplete beta function.

Proof. When $a = b$, by Corollary 1 we have $I(a+1,a) > \frac{1}{2}$, $I(a,a) = \frac{1}{2}$ and $I(a,a+1) < \frac{1}{2}$. Therefore, the expected reward (5) takes the following form:

$$R(a,a) = \frac{1}{2}\left( I(a+1,a) - I(a,a) \right) + \frac{1}{2}\left( I(a,a) - I(a,a+1) \right) = I(a+1,a) - I(a,a) = \frac{(1/2)^{2a}}{a\,B(a,a)},$$

where the last two equalities use (10) and (11), which give $I(a+1,a) - I(a,a) = I(a,a) - I(a,a+1) = \frac{(1/2)^{2a}}{a B(a,a)}$.

When $a > b$: since $a, b$ are integers, we have $a \geq b+1$ and hence $I(a+1,b) > \frac{1}{2}$, $I(a,b) > \frac{1}{2}$, $I(a,b+1) \geq \frac{1}{2}$ according to Corollary 1. The expected reward (5) now becomes:

$$R(a,b) = \frac{a}{a+b}\, I(a+1,b) + \frac{b}{a+b}\, I(a,b+1) - I(a,b) = \frac{1}{B(a,b)} \int_{1/2}^{1} t^{a}(1-t)^{b-1}\,dt + \frac{1}{B(a,b)} \int_{1/2}^{1} t^{a-1}(1-t)^{b}\,dt - I(a,b)$$
$$= \frac{1}{B(a,b)} \int_{1/2}^{1} t^{a-1}(1-t)^{b-1}\left( t + (1-t) \right) dt - I(a,b) = I(a,b) - I(a,b) = 0.$$

Here we use (7) and (8) to show that $\frac{a}{a+b} = \frac{B(a+1,b)}{B(a,b)}$ and $\frac{b}{a+b} = \frac{B(a,b+1)}{B(a,b)}$, so that $\frac{a}{a+b} I(a+1,b) = \frac{1}{B(a,b)}\int_{1/2}^{1} t^{a}(1-t)^{b-1} dt$ and $\frac{b}{a+b} I(a,b+1) = \frac{1}{B(a,b)}\int_{1/2}^{1} t^{a-1}(1-t)^{b} dt$. When $a < b$, we can prove $R(a,b) = 0$ in a similar way. ∎

With Lemma 3 in place, the proof of Proposition 3 is straightforward. Recall that the deterministic KG policy chooses the next instance according to $i_t = \arg\max_i R(S^t, i) = \arg\max_i R(a_i^t, b_i^t)$, and breaks ties by selecting the one with the smallest index. Since $R(a,b) > 0$ if and only if $a = b$, at the initial stage $t = 0$ we have $R(a_i^0, b_i^0) > 0$ exactly for the instances $i \in E \equiv \{i : a_i^0 = b_i^0\}$. The policy will first select the $i_0 \in E$ with the largest $R(a_i^0, b_i^0)$. After obtaining the label $y_{i_0}$, either $a_{i_0}^0$ or $b_{i_0}^0$ will increase by one, and hence $a_{i_0}^1 \neq b_{i_0}^1$ and $R(a_{i_0}^1, b_{i_0}^1) = 0$. The policy will then select another instance $i_1 \in E$ with the currently largest expected reward, and the expected reward of $i_1$ will in turn become zero after the label $y_{i_1}$ is obtained. As a consequence, the KG policy labels each instance in $E$ once during the first $|E|$ stages, after which $R(a_i^{|E|}, b_i^{|E|}) = 0$ for all $i \in \{1, \ldots, K\}$. Then the deterministic policy breaks the tie by selecting the first instance to label. From now on, for any $t \geq |E|$, if $a_1^t \neq b_1^t$, then the expected reward $R(a_1^t, b_1^t) = 0$; since the expected rewards of all other instances are also zero, the policy will still label the first instance. On the other hand, if $a_1^t = b_1^t$, the first instance is the only one with a positive expected reward and the policy will label it. Thus Proposition 3 is proved.

Remark. For randomized KG, after getting one label for each instance in $E$ during the first $|E|$ stages, the expected reward of every instance has become zero. Then randomized KG will uniformly select one instance to label. At any stage $t \geq |E|$, if there exists an instance $i$ (there is at most one) with $a_i^t = b_i^t$,
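As a sanity check on Lemma 3 and Proposition 3, the reward (5) can be evaluated exactly in rational arithmetic and the deterministic policy simulated; a minimal sketch (the helper names `I`, `h`, `R`, `beta_fn` are ours), assuming all instances start from the uniform prior $\mathrm{Beta}(1,1)$, so that $E = \{1, \ldots, K\}$:

```python
import random
from fractions import Fraction
from math import comb, factorial

def I(a, b):
    # Pr(theta >= 1/2), theta ~ Beta(a,b), via the binomial-tail identity.
    n = a + b - 1
    return Fraction(sum(comb(n, j) for j in range(a)), 2 ** n)

def h(x):
    return max(x, 1 - x)

def beta_fn(a, b):
    # B(a,b) for integers: (a-1)!(b-1)!/(a+b-1)!.
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def R(a, b):
    # Expected KG reward (5) for one more label on an instance in state (a, b).
    p, q = Fraction(a, a + b), Fraction(b, a + b)
    return p * (h(I(a + 1, b)) - h(I(a, b))) + q * (h(I(a, b + 1)) - h(I(a, b)))

# Lemma 3: R(a,b) = (1/2)^{2a} / (a B(a,a)) when a = b, and 0 otherwise.
for a in range(1, 6):
    for b in range(1, 6):
        if a == b:
            assert R(a, b) == Fraction(1, 4 ** a) / (a * beta_fn(a, a))
        else:
            assert R(a, b) == 0

# Proposition 3: with uniform priors, deterministic KG labels each instance
# once and then gets stuck on the first instance forever.
random.seed(0)
K, T = 5, 30
ab = [[1, 1] for _ in range(K)]
picks = []
for t in range(T):
    rewards = [R(a, b) for a, b in ab]
    i = max(range(K), key=lambda j: (rewards[j], -j))  # ties -> smallest index
    picks.append(i)
    ab[i][random.random() < 0.5] += 1                  # a random +/- label
assert picks[:K] == [0, 1, 2, 3, 4]
assert all(p == 0 for p in picks[K:])
```

The final assertion holds on every sample path, which is exactly the failure mode Proposition 3 describes.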
the KG policy will provide the next label for $i$; otherwise, it will randomly select an instance to label.

4 Proof of Theorem 3

To prove the consistency of the optimistic KG policy, we first derive the exact values of $R^+(a,b) = \max\left( R_1(a,b),\, R_2(a,b) \right)$:

1. When $a \geq b+1$: $R_1(a,b) = I(a+1,b) - I(a,b) = \frac{(1/2)^{a+b}}{a B(a,b)} > 0$ and $R_2(a,b) = I(a,b+1) - I(a,b) = -\frac{(1/2)^{a+b}}{b B(a,b)} < 0$. Therefore, $R^+(a,b) = R_1(a,b) = \frac{(1/2)^{a+b}}{a B(a,b)} > 0$.

2. When $a = b$: $R_1(a,b) = I(a+1,a) - I(a,a) = \frac{(1/2)^{2a}}{a B(a,a)}$ and $R_2(a,b) = \left(1 - I(a,a+1)\right) - I(a,a) = \frac{(1/2)^{2a}}{a B(a,a)}$. Therefore, $R_1 = R_2$ and $R^+(a,b) = R_1(a,b) = R_2(a,b) = \frac{(1/2)^{2a}}{a B(a,a)} > 0$.

3. When $b \geq a+1$: $R_1(a,b) = I(a,b) - I(a+1,b) = -\frac{(1/2)^{a+b}}{a B(a,b)} < 0$ and $R_2(a,b) = I(a,b) - I(a,b+1) = \frac{(1/2)^{a+b}}{b B(a,b)} > 0$. Therefore, $R^+(a,b) = R_2(a,b) = \frac{(1/2)^{a+b}}{b B(a,b)} > 0$.

For better visualization, we plot the values of $R^+(a,b)$ for different $(a,b)$ in Figure 1.

(Figure 1. Illustration of $R^+(a,b)$.)

As we can see, $R^+(a,b) > 0$ for any positive integers $a, b$. We now prove that $\lim_{a+b \to \infty} R^+(a,b) = 0$ via the following lemma.

Lemma 4. Properties of $R^+(a,b)$:

1. $R^+(a,b)$ is symmetric, i.e., $R^+(a,b) = R^+(b,a)$.
2. $\lim_{a \to \infty} R^+(a,a) = 0$.
3. For any fixed $a$, $R^+(a+k, a-k) = R^+(a-k, a+k)$ is monotonically decreasing in $k$ for $k = 0, 1, \ldots, a-1$.
4. When $a \geq b$, for any fixed $b$, $R^+(a,b)$ is monotonically decreasing in $a$. By the symmetry of $R^+(a,b)$, when $b \geq a$, for any fixed $a$, $R^+(a,b)$ is monotonically decreasing in $b$.

By the above four properties, we have $\lim_{a+b \to \infty} R^+(a,b) = 0$.

Proof. We first prove these four properties.

Property 1: By the fact that $B(a,b) = B(b,a)$, the symmetry of $R^+(a,b)$ is straightforward.

Property 2: For $a > 1$, $\frac{R^+(a,a)}{R^+(a-1,a-1)} = \frac{2a-1}{2a} < 1$, and hence $R^+(a,a)$ is monotonically decreasing in $a$. Moreover,

$$R^+(a,a) = R^+(1,1) \prod_{i=2}^{a} \frac{R^+(i,i)}{R^+(i-1,i-1)} = R^+(1,1) \prod_{i=2}^{a} \left(1 - \frac{1}{2i}\right) \leq R^+(1,1)\, e^{-\sum_{i=2}^{a} \frac{1}{2i}}.$$

Since $\lim_{a\to\infty} \sum_{i=2}^{a} \frac{1}{2i} = \infty$ and $R^+(a,a) \geq 0$, we conclude $\lim_{a\to\infty} R^+(a,a) = 0$.

Property 3: For any $k \geq 0$,

$$\frac{R^+(a+k+1, a-k-1)}{R^+(a+k, a-k)} = \frac{(a+k)\,B(a+k, a-k)}{(a+k+1)\,B(a+k+1, a-k-1)} = \frac{a-k-1}{a+k+1} < 1.$$
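The three cases above collapse to a single closed form, $R^+(a,b) = \frac{(1/2)^{a+b}}{\max(a,b)\,B(a,b)}$, which is easy to check against the direct definition $\max(R_1, R_2)$; a quick numerical sketch (helper names are ours, `I` computed via the binomial-tail identity for integer parameters):

```python
from fractions import Fraction
from math import comb, factorial

def I(a, b):
    # Pr(theta >= 1/2) for theta ~ Beta(a,b), integer a, b.
    n = a + b - 1
    return Fraction(sum(comb(n, j) for j in range(a)), 2 ** n)

def h(x):
    return max(x, 1 - x)

def beta_fn(a, b):
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def R_plus(a, b):
    # Optimistic reward: best-case change of h(I(a,b)) after one more label.
    return max(h(I(a + 1, b)) - h(I(a, b)), h(I(a, b + 1)) - h(I(a, b)))

def R_plus_closed(a, b):
    # Closed form implied by the case analysis: (1/2)^{a+b} / (max(a,b) B(a,b)).
    return Fraction(1, 2 ** (a + b)) / (max(a, b) * beta_fn(a, b))

for a in range(1, 8):
    for b in range(1, 8):
        assert R_plus(a, b) == R_plus_closed(a, b) > 0

# Lemma 4: symmetry, and monotone decay toward 0 as a + b grows.
assert R_plus(3, 5) == R_plus(5, 3)
assert R_plus(6, 6) < R_plus(5, 5) < R_plus(4, 4)
assert R_plus(7, 2) < R_plus(6, 2) < R_plus(5, 2)
```

In particular, $R^+(1,1) = 1/4$ is the largest attainable value, which is what makes the optimistic policy keep returning to rarely-labeled instances.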
Property 4: When $a \geq b$, for any fixed $b$:

$$\frac{R^+(a+1,b)}{R^+(a,b)} = \frac{1}{2} \cdot \frac{a\,B(a,b)}{(a+1)\,B(a+1,b)} = \frac{a+b}{2(a+1)} < 1,$$

where we use (7) and $a \geq b$.

According to the third property, when $a+b$ is an even number, we have $R^+(a,b) \leq R^+\left(\frac{a+b}{2}, \frac{a+b}{2}\right)$. According to the fourth property, when $a+b$ is an odd number and $a \geq b+1$, we have $R^+(a,b) < R^+(a-1,b) \leq R^+\left(\frac{a+b-1}{2}, \frac{a+b-1}{2}\right)$; while when $a+b$ is an odd number and $b \geq a+1$, we have $R^+(a,b) < R^+(a,b-1) \leq R^+\left(\frac{a+b-1}{2}, \frac{a+b-1}{2}\right)$. Therefore, $R^+(a,b) \leq R^+\left( \lfloor \frac{a+b}{2} \rfloor, \lfloor \frac{a+b}{2} \rfloor \right)$. According to the second property, $\lim_{a\to\infty} R^+(a,a) = 0$, and we conclude that $\lim_{a+b\to\infty} R^+(a,b) = 0$. ∎

Using Lemma 4, we first show that, on any sample path, optimistic KG labels each instance infinitely many times as $T$ goes to infinity. Let $\eta_i(T)$ be a random variable representing the number of times that the $i$-th instance has been labeled up to stage $T$ under optimistic KG. Given a sample path $\omega$, let $I(\omega) = \{i : \lim_{T\to\infty} \eta_i(T)(\omega) < \infty\}$ be the set of instances labeled only a finite number of times on this sample path. We prove by contradiction that $I(\omega)$ is an empty set for every $\omega$. Assume that $I(\omega)$ is not empty; then after a certain stage $T_1$, instances in $I(\omega)$ will never be labeled. By Lemma 4, for any $j \in I(\omega)^c$, $\lim_{T\to\infty} R^+(a_j^T(\omega), b_j^T(\omega)) = 0$. Therefore, there exists $T_2 > T_1$ such that:

$$\max_{j \in I(\omega)^c} R^+\left(a_j^{T_2}(\omega), b_j^{T_2}(\omega)\right) < \max_{i \in I(\omega)} R^+\left(a_i^{T_2}(\omega), b_i^{T_2}(\omega)\right) = \max_{i \in I(\omega)} R^+\left(a_i^{T_1}(\omega), b_i^{T_1}(\omega)\right).$$

Then according to the optimistic KG policy, the next instance to be labeled must be in $I(\omega)$, which leads to a contradiction. Therefore, $I(\omega)$ is an empty set for every sample path $\omega$.

Let $Y_i^s$ be the random variable that takes the value $1$ if the $s$-th label of the $i$-th instance is $1$ and the value $0$ if the $s$-th label is $-1$. It is easy to see that $\mathbb{E}(Y_i^s \mid \theta_i) = \Pr(Y_i^s = 1 \mid \theta_i) = \theta_i$. Hence, conditioned on $\theta_i$, the $Y_i^s$, $s = 1, 2, \ldots$ are i.i.d. random variables. By the fact that $\lim_{T\to\infty} \eta_i(T) = \infty$ on all sample paths and using the strong law of large numbers, we conclude that, conditioning on $\theta_i$, $i = 1, \ldots, K$, the conditional probability of the event

$$\lim_{T\to\infty} \frac{a_i^T}{a_i^T + b_i^T} = \lim_{T\to\infty} \frac{1}{\eta_i(T)} \sum_{s=1}^{\eta_i(T)} Y_i^s = \mathbb{E}(Y_i^s \mid \theta_i) = \theta_i, \quad \text{for all } i = 1, \ldots, K,$$

is one.
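The mechanism behind consistency — $R^+$ shrinking on frequently-labeled instances so that every instance keeps being revisited — can be observed in a small simulation (a sketch with our own helper names; the true $\theta_i$ are drawn from the uniform prior, and `I` uses the binomial-tail identity):

```python
import random
from math import comb

def I(a, b):
    # Pr(theta >= 1/2) for theta ~ Beta(a,b), integer parameters.
    n = a + b - 1
    return sum(comb(n, j) for j in range(a)) / 2 ** n

def h(x):
    return max(x, 1 - x)

def R_plus(a, b):
    return max(h(I(a + 1, b)) - h(I(a, b)), h(I(a, b + 1)) - h(I(a, b)))

random.seed(1)
K, T = 10, 600
theta = [random.random() for _ in range(K)]
ab = [[1, 1] for _ in range(K)]       # uniform Beta(1,1) priors
for t in range(T):
    i = max(range(K), key=lambda j: (R_plus(*ab[j]), -j))
    ab[i][0 if random.random() < theta[i] else 1] += 1

counts = [a + b - 2 for a, b in ab]
assert sum(counts) == T
# R_plus(1,1) = 1/4 is the global maximum, so every instance gets labeled.
assert min(counts) >= 1

# Empirical accuracy of H^T = {i : a_i > b_i} against H* = {i : theta_i > 1/2}.
acc = sum((a > b) == (th > 0.5) for (a, b), th in zip(ab, theta)) / K
print(f"accuracy with T={T}: {acc:.2f}")
```

The printed accuracy fluctuates run to run for finite $T$; Theorem 3 is the statement that it converges to one almost surely as $T \to \infty$.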
According to Proposition, we have H T {i : a T i b T i } and H {i : θ i } The accuracy is AccT K H T H + HT c H c We have: Pr lim AccT T {θi}k i Pr lim T HT H + HT c H c K {θ i} K i a T i b T i Pr lim θ i, i,, K {θ i} K i T η it, whenever θ i for all i The last inequality is due to the fact that, as long as θ i is not in any i, any sample path that gives the event a lim T i bt i T η it θ i, i,, K also gives the event lim T a T i bt i sgnθ i +, which further implies lim T H T H + HT c H c K Finally, we have: Pr lim AccT T [ ] E {θi } K Pr lim AccT i T {θi}k i [ ] E {θi :θ i } K Pr lim AccT i T {θi}k i [], E {θi :θ i } K i where the second equality is because {θ i : i, θ i } is a zero measure set 5 Incorporate Workers Reliability We assume there are K instances with the soft-label θ i Betaa 0 i, b0 i and M workers with the reliability ρ j Betac 0 j, d0 j Given the decision on labeling the i- th instance by the j-th worker, we have the probability of the outcome Z ij : PrZ ij θ i, ρ j θ i ρ j + θ i ρ j PrZ ij θ i, ρ j θ i ρ j + θ i ρ j 3 We approximate the posterior so that at any stage for all i, j, θ i and ρ j will follow Beta distributions In particular, assuming at the current state θ i Betaa i, b i and ρ j Betac j, d j, the posterior distribution conditioned on Z ij takes the following form: PrZij θi, ρjbetaai, bibetacj, dj pθ i, ρ j Z ij PrZ ij PrZij θi, ρjbetaai, bibetacj, dj pθ i, ρ j Z ij PrZ ij
where the likelihood $\Pr(Z_{ij} = z \mid \theta_i, \rho_j)$ for $z = 1, -1$ is defined in (12) and (13) respectively, and

$$\Pr(Z_{ij} = 1) = \mathbb{E}\left[\Pr(Z_{ij} = 1 \mid \theta_i, \rho_j)\right] = \mathbb{E}(\theta_i)\mathbb{E}(\rho_j) + (1 - \mathbb{E}(\theta_i))(1 - \mathbb{E}(\rho_j)) = \frac{a_i c_j + b_i d_j}{(a_i+b_i)(c_j+d_j)},$$
$$\Pr(Z_{ij} = -1) = \mathbb{E}\left[\Pr(Z_{ij} = -1 \mid \theta_i, \rho_j)\right] = \mathbb{E}(\theta_i)(1 - \mathbb{E}(\rho_j)) + (1 - \mathbb{E}(\theta_i))\mathbb{E}(\rho_j) = \frac{b_i c_j + a_i d_j}{(a_i+b_i)(c_j+d_j)}.$$

The posterior distribution $p(\theta_i, \rho_j \mid Z_{ij} = z)$ no longer takes the form of a product of Beta distributions in $\theta_i$ and $\rho_j$. Therefore, we use a variational approximation, first assuming the conditional independence of $\theta_i$ and $\rho_j$:

$$p(\theta_i, \rho_j \mid Z_{ij} = z) \approx p(\theta_i \mid Z_{ij} = z)\, p(\rho_j \mid Z_{ij} = z).$$

In particular, we have the exact form of the marginal distributions:

$$p(\theta_i \mid Z_{ij} = 1) = \frac{\left( \theta_i \mathbb{E}(\rho_j) + (1-\theta_i)(1 - \mathbb{E}(\rho_j)) \right)\mathrm{Beta}(\theta_i; a_i, b_i)}{\Pr(Z_{ij} = 1)}, \qquad p(\rho_j \mid Z_{ij} = 1) = \frac{\left( \mathbb{E}(\theta_i)\rho_j + (1 - \mathbb{E}(\theta_i))(1-\rho_j) \right)\mathrm{Beta}(\rho_j; c_j, d_j)}{\Pr(Z_{ij} = 1)},$$

and analogously for $Z_{ij} = -1$ with the likelihood (13).

To approximate each marginal distribution by a Beta distribution, we use the moment-matching technique. In particular, we approximate $\theta_i \mid Z_{ij} = z \sim \mathrm{Beta}(\tilde{a}_i(z), \tilde{b}_i(z))$ such that

$$\tilde{\mathbb{E}}_z(\theta_i) \equiv \mathbb{E}_{p(\theta_i \mid Z_{ij}=z)}(\theta_i) = \frac{\tilde{a}_i(z)}{\tilde{a}_i(z) + \tilde{b}_i(z)}, \tag{14}$$
$$\tilde{\mathbb{E}}_z(\theta_i^2) \equiv \mathbb{E}_{p(\theta_i \mid Z_{ij}=z)}(\theta_i^2) = \frac{\tilde{a}_i(z)\left(\tilde{a}_i(z)+1\right)}{\left(\tilde{a}_i(z)+\tilde{b}_i(z)\right)\left(\tilde{a}_i(z)+\tilde{b}_i(z)+1\right)}, \tag{15}$$

where the right-hand sides are the first and second moments of $\mathrm{Beta}(\tilde{a}_i(z), \tilde{b}_i(z))$. To make (14) and (15) hold, we set:

$$\tilde{a}_i(z) = \frac{\tilde{\mathbb{E}}_z(\theta_i)\left( \tilde{\mathbb{E}}_z(\theta_i) - \tilde{\mathbb{E}}_z(\theta_i^2) \right)}{\tilde{\mathbb{E}}_z(\theta_i^2) - \left(\tilde{\mathbb{E}}_z(\theta_i)\right)^2}, \tag{16}$$
$$\tilde{b}_i(z) = \frac{\left(1 - \tilde{\mathbb{E}}_z(\theta_i)\right)\left( \tilde{\mathbb{E}}_z(\theta_i) - \tilde{\mathbb{E}}_z(\theta_i^2) \right)}{\tilde{\mathbb{E}}_z(\theta_i^2) - \left(\tilde{\mathbb{E}}_z(\theta_i)\right)^2}. \tag{17}$$

Similarly, we approximate $\rho_j \mid Z_{ij} = z \sim \mathrm{Beta}(\tilde{c}_j(z), \tilde{d}_j(z))$ by matching the first and second moments $\tilde{\mathbb{E}}_z(\rho_j)$ and $\tilde{\mathbb{E}}_z(\rho_j^2)$ to those of $\mathrm{Beta}(\tilde{c}_j(z), \tilde{d}_j(z))$ (the analogues of (14) and (15), labeled (18) and (19)), which gives:

$$\tilde{c}_j(z) = \frac{\tilde{\mathbb{E}}_z(\rho_j)\left( \tilde{\mathbb{E}}_z(\rho_j) - \tilde{\mathbb{E}}_z(\rho_j^2) \right)}{\tilde{\mathbb{E}}_z(\rho_j^2) - \left(\tilde{\mathbb{E}}_z(\rho_j)\right)^2}, \tag{20}$$
$$\tilde{d}_j(z) = \frac{\left(1 - \tilde{\mathbb{E}}_z(\rho_j)\right)\left( \tilde{\mathbb{E}}_z(\rho_j) - \tilde{\mathbb{E}}_z(\rho_j^2) \right)}{\tilde{\mathbb{E}}_z(\rho_j^2) - \left(\tilde{\mathbb{E}}_z(\rho_j)\right)^2}. \tag{21}$$

Furthermore, we can compute the exact values of $\tilde{\mathbb{E}}_z(\theta_i)$, $\tilde{\mathbb{E}}_z(\theta_i^2)$, $\tilde{\mathbb{E}}_z(\rho_j)$ and $\tilde{\mathbb{E}}_z(\rho_j^2)$ as follows:

$$\tilde{\mathbb{E}}_1(\theta_i) = \frac{\mathbb{E}(\theta_i^2)\mathbb{E}(\rho_j) + \left(\mathbb{E}(\theta_i) - \mathbb{E}(\theta_i^2)\right)\left(1-\mathbb{E}(\rho_j)\right)}{\Pr(Z_{ij}=1)} = \frac{a_i\left( (a_i+1)c_j + b_i d_j \right)}{(a_i+b_i+1)(a_i c_j + b_i d_j)},$$

$$\tilde{\mathbb{E}}_1(\theta_i^2) = \frac{\mathbb{E}(\theta_i^3)\mathbb{E}(\rho_j) + \left(\mathbb{E}(\theta_i^2) - \mathbb{E}(\theta_i^3)\right)\left(1-\mathbb{E}(\rho_j)\right)}{\Pr(Z_{ij}=1)} = \frac{a_i(a_i+1)\left( (a_i+2)c_j + b_i d_j \right)}{(a_i+b_i+1)(a_i+b_i+2)(a_i c_j + b_i d_j)},$$

$$\tilde{\mathbb{E}}_{-1}(\theta_i) = \frac{a_i\left( b_i c_j + (a_i+1) d_j \right)}{(a_i+b_i+1)(b_i c_j + a_i d_j)}, \qquad \tilde{\mathbb{E}}_{-1}(\theta_i^2) = \frac{a_i(a_i+1)\left( b_i c_j + (a_i+2) d_j \right)}{(a_i+b_i+1)(a_i+b_i+2)(b_i c_j + a_i d_j)},$$

$$\tilde{\mathbb{E}}_1(\rho_j) = \frac{c_j\left( a_i(c_j+1) + b_i d_j \right)}{(c_j+d_j+1)(a_i c_j + b_i d_j)}, \qquad \tilde{\mathbb{E}}_1(\rho_j^2) = \frac{c_j(c_j+1)\left( a_i(c_j+2) + b_i d_j \right)}{(c_j+d_j+1)(c_j+d_j+2)(a_i c_j + b_i d_j)},$$
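The moment-matching step (14)–(17) can be sketched numerically: compute the first two moments of the tilted marginal $p(\theta_i \mid Z_{ij}=1) \propto \left(\theta\,\mathbb{E}(\rho) + (1-\theta)(1-\mathbb{E}(\rho))\right)\mathrm{Beta}(a,b)$ and solve for the matched Beta parameters (function names are ours):

```python
from fractions import Fraction

def beta_moments(a, b):
    # First three moments of Beta(a, b).
    m1 = Fraction(a, a + b)
    m2 = m1 * Fraction(a + 1, a + b + 1)
    m3 = m2 * Fraction(a + 2, a + b + 2)
    return m1, m2, m3

def tilted_theta_moments(a, b, c, d):
    # Moments of p(theta | Z=1) ∝ (theta*E[rho] + (1-theta)*(1-E[rho])) Beta(a,b).
    r = Fraction(c, c + d)                      # E[rho]
    m1, m2, m3 = beta_moments(a, b)
    p1 = m1 * r + (1 - m1) * (1 - r)            # Pr(Z=1), rho integrated out
    e1 = (m2 * r + (m1 - m2) * (1 - r)) / p1    # E[theta   | Z=1]
    e2 = (m3 * r + (m2 - m3) * (1 - r)) / p1    # E[theta^2 | Z=1]
    return e1, e2

def match_beta(e1, e2):
    # Solve (14)-(15): Beta parameters with the given first two moments.
    common = (e1 - e2) / (e2 - e1 * e1)
    return e1 * common, (1 - e1) * common       # (a~, b~) as in (16)-(17)

a, b, c, d = 3, 2, 4, 1
e1, e2 = tilted_theta_moments(a, b, c, d)
at, bt = match_beta(e1, e2)

# The matched Beta reproduces the target moments exactly.
m1 = at / (at + bt)
m2 = m1 * (at + 1) / (at + bt + 1)
assert m1 == e1 and m2 == e2
```

For this example the closed forms above give $\tilde{\mathbb{E}}_1(\theta) = 9/14$ and $\tilde{\mathbb{E}}_1(\theta^2) = 22/49$, which the numeric computation reproduces; note the matched $(\tilde{a}, \tilde{b})$ are in general non-integer.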
$$\tilde{\mathbb{E}}_{-1}(\rho_j) = \frac{c_j\left( b_i(c_j+1) + a_i d_j \right)}{(c_j+d_j+1)(b_i c_j + a_i d_j)}, \qquad \tilde{\mathbb{E}}_{-1}(\rho_j^2) = \frac{c_j(c_j+1)\left( b_i(c_j+2) + a_i d_j \right)}{(c_j+d_j+1)(c_j+d_j+2)(b_i c_j + a_i d_j)}.$$

Assume at a certain stage that $\theta_i$ for the $i$-th instance has the Beta posterior $\mathrm{Beta}(a_i, b_i)$ and $\rho_j$ for the $j$-th worker has the Beta posterior $\mathrm{Beta}(c_j, d_j)$. The rewards of getting label $1$ and getting label $-1$ for the $i$-th instance from the $j$-th worker are:

$$R_1(a_i, b_i, c_j, d_j) = h\left( I(\tilde{a}_i(1), \tilde{b}_i(1)) \right) - h\left( I(a_i, b_i) \right), \tag{22}$$
$$R_2(a_i, b_i, c_j, d_j) = h\left( I(\tilde{a}_i(-1), \tilde{b}_i(-1)) \right) - h\left( I(a_i, b_i) \right), \tag{23}$$

where $\tilde{a}_i(\pm 1)$ and $\tilde{b}_i(\pm 1)$ are defined in (16) and (17), which further depend on $c_j$ and $d_j$. With the reward in place, we present the optimistic knowledge gradient algorithm for budget allocation with modeling of workers' reliability in Algorithm 1.

Algorithm 1: Optimistic Knowledge Gradient with Workers' Reliability

Input: Parameters of the prior distributions for instances $\{(a_i^0, b_i^0)\}_{i=1}^K$ and for workers $\{(c_j^0, d_j^0)\}_{j=1}^M$; the total budget $T$.
for $t = 0, \ldots, T-1$ do
  1. Select the next instance $i_t$ to label and the next worker $j_t$ to label it according to:
     $(i_t, j_t) = \arg\max_{i \in \{1,\ldots,K\},\, j \in \{1,\ldots,M\}} R^+(a_i^t, b_i^t, c_j^t, d_j^t)$,
     where $R^+(a_i^t, b_i^t, c_j^t, d_j^t) = \max\left( R_1(a_i^t, b_i^t, c_j^t, d_j^t),\, R_2(a_i^t, b_i^t, c_j^t, d_j^t) \right)$.
  2. Acquire the label $Z_{i_t j_t} \in \{1, -1\}$ of the $i_t$-th instance from the $j_t$-th worker.
  3. Update the posterior by setting:
     $a_{i_t}^{t+1} = \tilde{a}_{i_t}(Z_{i_t j_t})$, $b_{i_t}^{t+1} = \tilde{b}_{i_t}(Z_{i_t j_t})$, $c_{j_t}^{t+1} = \tilde{c}_{j_t}(Z_{i_t j_t})$, $d_{j_t}^{t+1} = \tilde{d}_{j_t}(Z_{i_t j_t})$,
     and all parameters for $i \neq i_t$ and $j \neq j_t$ remain the same.
end for
Output: The positive set $H^T = \{i : a_i^T \geq b_i^T\}$.

6 Extensions

6.1 Incorporating Feature Information

When each instance is associated with a $p$-dimensional feature vector $x_i \in \mathbb{R}^p$, we incorporate the feature information into our budget allocation problem by assuming:

$$\theta_i = \sigma(\langle w, x_i \rangle) = \frac{1}{1 + \exp\{-\langle w, x_i\rangle\}}, \tag{24}$$

where $\sigma(x) = \frac{1}{1+\exp\{-x\}}$ is the sigmoid function and $w$ is assumed to be drawn from a Gaussian prior $N(\mu^0, \Sigma^0)$. At the $t$-th stage with the state $S^t = (\mu^t, \Sigma^t)$ and $w \sim N(\mu^t, \Sigma^t)$, one decides to label the $i_t$-th instance according to a certain policy (e.g., KG) and
observes the label $y_{i_t} \in \{1, -1\}$. The posterior distribution $p(w \mid y_{i_t}, S^t) \propto p(y_{i_t} \mid w)\, p(w \mid S^t)$ has the following log-likelihood:

$$\ln p(w \mid y_{i_t}, S^t) = \ln p(y_{i_t} \mid w) + \ln p(w \mid S^t) + \mathrm{const} = \frac{1+y_{i_t}}{2} \ln \sigma(\langle w, x_{i_t}\rangle) + \frac{1-y_{i_t}}{2} \ln\left(1 - \sigma(\langle w, x_{i_t}\rangle)\right) - \frac{1}{2}(w - \mu^t)^\top \Omega^t (w - \mu^t) + \mathrm{const},$$

where $\Omega^t = (\Sigma^t)^{-1}$ is the precision matrix. To approximate $p(w \mid y_{i_t}, \mu^t, \Sigma^t)$ by a Gaussian distribution $N(\mu^{t+1}, \Sigma^{t+1})$, we use the Laplace method (see Chapter 4.4 in Bishop, 2007). In particular, we define the mean of the posterior Gaussian to be the MAP (maximum a posteriori) estimator of $w$:

$$\mu^{t+1} = \arg\max_w\, \ln p(w \mid y_{i_t}, S^t), \tag{25}$$

which can be solved by Newton's method, and the precision matrix to be the negative Hessian of the log-posterior at $\mu^{t+1}$:

$$\Omega^{t+1} = -\nabla^2 \ln p(w \mid y_{i_t}, S^t)\big|_{w = \mu^{t+1}} = \Omega^t + \sigma(\langle \mu^{t+1}, x_{i_t}\rangle)\left(1 - \sigma(\langle \mu^{t+1}, x_{i_t}\rangle)\right) x_{i_t} x_{i_t}^\top.$$

By the Sherman–Morrison formula, the covariance matrix is

$$\Sigma^{t+1} = (\Omega^{t+1})^{-1} = \Sigma^t - \frac{\sigma(\langle \mu^{t+1}, x_{i_t}\rangle)\left(1 - \sigma(\langle \mu^{t+1}, x_{i_t}\rangle)\right)\left(\Sigma^t x_{i_t}\right)\left(\Sigma^t x_{i_t}\right)^\top}{1 + \sigma(\langle \mu^{t+1}, x_{i_t}\rangle)\left(1 - \sigma(\langle \mu^{t+1}, x_{i_t}\rangle)\right) x_{i_t}^\top \Sigma^t x_{i_t}}.$$

We also calculate the transition probabilities of $y_{i_t} = 1$ and $y_{i_t} = -1$ as follows:

$$\Pr(y_{i_t} = 1 \mid S^t, i_t) = \int p(y_{i_t} = 1 \mid w)\, p(w \mid S^t)\, dw = \int \sigma(w^\top x_{i_t})\, p(w \mid S^t)\, dw \approx \sigma\left( \kappa(s_{i_t})\, \mu_{i_t} \right),$$
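The Laplace update (25) can be sketched compactly; for readability we take a scalar $w$ (so the Sherman–Morrison rank-one update reduces to a scalar precision update) — the function names and the toy prior are ours:

```python
from math import exp, pi, sqrt

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def laplace_update(mu, var, x, y):
    """One Bayesian logistic-regression update (scalar w for simplicity):
    Laplace-approximate the posterior after observing label y in {1,-1}
    for an instance with feature x, starting from the prior N(mu, var)."""
    omega = 1.0 / var                  # prior precision
    t = (1 + y) / 2                    # map label {1,-1} -> {1,0}
    w = mu
    for _ in range(50):                # Newton's method for the MAP (25)
        p = sigmoid(w * x)
        grad = (t - p) * x - omega * (w - mu)
        hess = -p * (1 - p) * x * x - omega
        w_new = w - grad / hess
        if abs(w_new - w) < 1e-12:
            w = w_new
            break
        w = w_new
    p = sigmoid(w * x)
    omega_new = omega + p * (1 - p) * x * x   # posterior precision
    return w, 1.0 / omega_new

def pr_label_one(mu, var, x):
    # Probit-style approximation of the logistic-Gaussian integral,
    # with kappa(s) = (1 + pi*s/8)^{-1/2}, mu_i = mu*x, s_i = x*var*x.
    s = x * var * x
    kappa = 1.0 / sqrt(1.0 + pi * s / 8.0)
    return sigmoid(kappa * mu * x)

mu1, var1 = laplace_update(0.0, 4.0, 1.5, 1)
assert mu1 > 0.0 and 0.0 < var1 < 4.0   # a +1 label pulls w up, shrinks variance
p1 = pr_label_one(mu1, var1, 1.5)
assert 0.5 < p1 < 1.0
```

For vector-valued $w$, the same Newton iteration runs on the gradient and Hessian given above, and the covariance update uses the Sherman–Morrison formula instead of a scalar reciprocal.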
where $\kappa(s_i) = \left(1 + \pi s_i/8\right)^{-1/2}$, $\mu_i = \langle \mu^t, x_i \rangle$ and $s_i = x_i^\top \Sigma^t x_i$. To calculate the reward function, in addition to the transition probability, we also need to compute:

$$P_i^t = \Pr\left(\theta_i \geq \tfrac{1}{2} \mid \mathcal{F}^t\right) = \Pr\left( \frac{1}{1+\exp\{-w^\top x_i\}} \geq \frac{1}{2} \,\Big|\, w \sim N(\mu^t, \Sigma^t) \right) = \Pr\left( w^\top x_i \geq 0 \mid w \sim N(\mu^t, \Sigma^t) \right) = \int_0^\infty \left( \int_w \delta\left(c - \langle w, x_i\rangle\right) N(w \mid \mu^t, \Sigma^t)\, dw \right) dc,$$

where $\delta(\cdot)$ is the Dirac delta function. Let $p(c) = \int_w \delta(c - \langle w, x_i \rangle)\, N(w \mid \mu^t, \Sigma^t)\, dw$. Since the marginal of a Gaussian distribution is still Gaussian, $p(c)$ is a univariate Gaussian distribution with mean and variance:

$$\mu_i = \mathbb{E}(c) = \mathbb{E}\langle w, x_i \rangle = \langle \mu^t, x_i \rangle, \qquad s_i = \mathrm{Var}(c) = x_i^\top \mathrm{Cov}(w, w)\, x_i = x_i^\top \Sigma^t x_i.$$

Therefore, we have:

$$P_i^t = \int_0^\infty p(c)\, dc = \Phi\left( \frac{\mu_i}{\sqrt{s_i}} \right), \tag{26}$$

where $\Phi$ is the Gaussian CDF. With $P_i^t$ and the transition probability in place, the expected reward in the value function takes the following form:

$$R(S^t, i_t) = \mathbb{E}\left( \sum_{i=1}^{K} h(P_i^{t+1}) - \sum_{i=1}^{K} h(P_i^t) \,\Big|\, S^t, i_t \right). \tag{27}$$

We note that since $w$ affects all of the $P_i$, the summation from $1$ to $K$ in (27) cannot be omitted, and hence (27) cannot be written as $\mathbb{E}\left( h(P_{i_t}^{t+1}) - h(P_{i_t}^t) \mid S^t, i_t \right)$ as in (3). In this problem, myopic KG or optimistic KG needs to solve $O(TK)$ optimization problems to compute the means of the posteriors as in (25), which could be computationally quite expensive. One possibility for addressing this problem is to use variational Bayesian logistic regression (Jaakkola & Jordan, 2000), which could lead to a faster optimization procedure.

6.2 Multi-Class Setting

In the multi-class setting with $C$ different classes, we assume that the $i$-th instance is associated with a probability vector $\theta_i = (\theta_{i1}, \ldots, \theta_{iC})$, where $\theta_{ic}$ is the probability that the $i$-th instance belongs to class $c$ and $\sum_{c=1}^{C} \theta_{ic} = 1$. We assume that $\theta_i$ has a Dirichlet prior $\theta_i \sim \mathrm{Dir}(\alpha_i^0)$, and our initial state $S^0$ is a $K \times C$ matrix with $\alpha_i^0$ as its $i$-th row. At each stage $t$ with the current state $S^t$, we determine an instance $i_t$ to label and collect its label $y_{i_t} \in \{1, \ldots, C\}$, which follows the categorical distribution $p(y_{i_t}) = \prod_{c=1}^{C} \theta_{i_t c}^{\mathbb{1}(y_{i_t} = c)}$. Since the Dirichlet is the conjugate prior of the categorical distribution, the next state induced by the posterior distribution is $S_{i_t}^{t+1} = S_{i_t}^t + \delta_{y_{i_t}}$ and $S_i^{t+1} = S_i^t$ for all $i \neq i_t$. Here $\delta_c$ is a row vector with one
at the $c$-th entry and zeros at all other entries. The transition probability is:

$$\Pr(y_{i_t} = c \mid S^t, i_t) = \mathbb{E}\left(\theta_{i_t c} \mid S^t\right) = \frac{\alpha_{i_t c}^t}{\sum_{c'=1}^{C} \alpha_{i_t c'}^t}.$$

In the multi-class problem, at the final stage $T$ when all the budget is used up, we construct a set $H_c^T$ for each class $c$ to maximize the conditional expected classification accuracy:

$$\{H_c^T\}_{c=1}^{C} = \arg\max_{\substack{H_c \subseteq \{1,\ldots,K\} \\ \{H_c\}_{c=1}^C \text{ a partition}}} \mathbb{E}\left( \frac{1}{K} \sum_{i=1}^{K} \sum_{c=1}^{C} \mathbb{1}(i \in H_c)\,\mathbb{1}(i \in H_c^*) \,\Big|\, \mathcal{F}^T \right) = \arg\max_{\substack{H_c \subseteq \{1,\ldots,K\} \\ \{H_c\}_{c=1}^C \text{ a partition}}} \frac{1}{K} \sum_{i=1}^{K} \sum_{c=1}^{C} \mathbb{1}(i \in H_c)\, \Pr(i \in H_c^* \mid \mathcal{F}^T). \tag{28}$$

Here, $H_c^* = \{i : \theta_{ic} \geq \theta_{ic'},\ \forall c' \neq c\}$ is the true set of instances in class $c$, and $H_c^T$ consists of the instances predicted to belong to class $c$; therefore $\{H_c^T\}_{c=1}^{C}$ should form a partition of all instances $\{1, \ldots, K\}$. Let

$$P_{ic}^T = \Pr(i \in H_c^* \mid \mathcal{F}^T) = \Pr\left(\theta_{ic} \geq \theta_{ic'},\ \forall c' \neq c \mid \mathcal{F}^T\right). \tag{29}$$

To maximize the right-hand side of (28), we take

$$H_c^T = \{i : P_{ic}^T \geq P_{ic'}^T,\ \forall c' \neq c\}. \tag{30}$$

If some $i$ belongs to more than one $H_c^T$, we assign it only to the one with the smallest index $c$. The maximum conditional expected accuracy takes the form $\frac{1}{K}\sum_{i=1}^{K} \max_{c \in \{1,\ldots,C\}} P_{ic}^T$. The value function can then be defined as:

$$V(S^0) = \sup_\pi \mathbb{E}\left( \frac{1}{K} \sum_{i=1}^{K} \sum_{c=1}^{C} \mathbb{1}(i \in H_c^T)\,\mathbb{1}(i \in H_c^*) \,\Big|\, S^0 \right) = \sup_\pi \mathbb{E}\left( \frac{1}{K} \sum_{i=1}^{K} \mathbb{E}\left( \sum_{c=1}^{C} \mathbb{1}(i \in H_c^T)\,\mathbb{1}(i \in H_c^*) \,\Big|\, \mathcal{F}^T \right) \Big|\, S^0 \right) = \sup_\pi \mathbb{E}\left( \frac{1}{K} \sum_{i=1}^{K} h(P_i^T) \,\Big|\, S^0 \right), \tag{31}$$

where $P_i^T = (P_{i1}^T, \ldots, P_{iC}^T)$ and

$$h(P_i^T) = \max_{c \in \{1,\ldots,C\}} P_{ic}^T. \tag{32}$$
8 Following Proposition, let Pic t Pri H c F t and P t i P,, P ic t By defining intermediate reward function at each stage: RS t, E hp t+ hp t S t, The value function can be re-written as: T V S 0 G 0 S 0 + sup E RS t, S 0, t0 where G 0 S 0 K i hp0 i Since the reward function only depends on S t α t R C +, we can define the reward function in a more explicit way by defining: References Bishop, C M Pattern Recognition and Machine Learning Springer, 007 Jaakkola, T and Jordan, M I Bayesian parameter estimation via variational methods Statistics and Computing, 0:5 37, 000 Xie, J and Frazier, P I Sequential bayes-optimal policies for multiple comparions with a control Technical report, Cornell University, 0 Rα C c α c C c α c hiα + δ c hiα Here δ c be a row vector of length C with one at the c-th entry and zeros at all other entries; and Iα I α,, I C α where I c α Prθ c θ c, c c θ Dirα 33 Therefore, we have RS t, Rα t To evaluate the reward Rα, the major bottleneck is how to compute I c α efficiently Directly taking the C-dimensional integration on the region {θ c θ c, c c} C will be computationally very expensive, where C denotes the C-dimensional simplex Therefore, we propose a method to convert the computation of I c α into a one-dimensional integration It is known that to generate θ Dirα, it is equivalent to generate {X c } C c with X c Gammaα c, X and let θ c C c Then θ θ,, θ C will fol- c Xc low Dirα Therefore, we have: I c α PrX c X c, c c X c Gammaα c, 34 It is easy to see that I cα 35 C f Gammax c; α c, dx dx C 0 x xc xc 0 0 x C xc c f Gammax c; α c, c c F Gammax c; α c, dx c, xc 0 where f Gamma x; α c, is the density function of Gamma distribution with the parameter α c, and F Gamma x c ; α c, is the CDF of Gamma distribution at x c with the parameter α c, In many softwares, F Gamma x c ; α c, can be calculated very efficiently without an explicit integration Therefore, we can evaluate I c α by performing only a one-dimensional numerical integration as in 35 
We could also use Monte-Carlo approximation to further accelerate the computation in 35
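The one-dimensional reduction (35) is easy to check numerically for integer $\alpha$, where the Gamma CDF has a closed Poisson-tail form; a sketch with our own helper names, cross-checked against a Monte Carlo evaluation of (34):

```python
import random
from math import exp, factorial

def gamma_pdf(x, k):
    # Gamma(k, 1) density for integer shape k.
    return x ** (k - 1) * exp(-x) / factorial(k - 1)

def gamma_cdf(x, k):
    # Gamma(k, 1) CDF via the Poisson-tail identity (integer k).
    return 1.0 - exp(-x) * sum(x ** j / factorial(j) for j in range(k))

def I_c(alpha, c, upper=60.0, n=3000):
    # One-dimensional integral (35), composite Simpson's rule (n even).
    def integrand(x):
        v = gamma_pdf(x, alpha[c])
        for cp, k in enumerate(alpha):
            if cp != c:
                v *= gamma_cdf(x, k)
        return v
    h = upper / n
    s = integrand(0.0) + integrand(upper)
    s += 4.0 * sum(integrand((2 * m - 1) * h) for m in range(1, n // 2 + 1))
    s += 2.0 * sum(integrand(2 * m * h) for m in range(1, n // 2))
    return s * h / 3.0

alpha = [3, 1, 2]
probs = [I_c(alpha, c) for c in range(len(alpha))]
assert abs(sum(probs) - 1.0) < 1e-5   # the I_c partition the simplex
assert probs[0] == max(probs)         # largest alpha_c gives the largest I_c

# Monte Carlo check of the Gamma representation (34).
random.seed(0)
N = 20000
wins = [0] * len(alpha)
for _ in range(N):
    xs = [random.gammavariate(k, 1.0) for k in alpha]
    wins[xs.index(max(xs))] += 1
assert all(abs(p - w / N) < 0.02 for p, w in zip(probs, wins))
```

For non-integer $\alpha_c$ one would replace the closed-form CDF with a regularized lower incomplete gamma routine; the one-dimensional quadrature itself is unchanged.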
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationLecture 2: Conjugate priors
(Spring ʼ) Lecture : Conjugate priors Julia Hockenmaier juliahmr@illinois.edu Siebel Center http://www.cs.uiuc.edu/class/sp/cs98jhm The binomial distribution If p is the probability of heads, the probability
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationOverview of Course. Nevin L. Zhang (HKUST) Bayesian Networks Fall / 58
Overview of Course So far, we have studied The concept of Bayesian network Independence and Separation in Bayesian networks Inference in Bayesian networks The rest of the course: Data analysis using Bayesian
More informationComputer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression
Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationVariational Bayesian Logistic Regression
Variational Bayesian Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic
More informationLecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)
Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 3 Stochastic Gradients, Bayesian Inference, and Occam s Razor https://people.orie.cornell.edu/andrew/orie6741 Cornell University August
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationLogistic regression and linear classifiers COMS 4771
Logistic regression and linear classifiers COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random
More informationIntroduction to Bayesian Statistics
School of Computing & Communication, UTS January, 207 Random variables Pre-university: A number is just a fixed value. When we talk about probabilities: When X is a continuous random variable, it has a
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationLogistic Regression. Sargur N. Srihari. University at Buffalo, State University of New York USA
Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative 1. Fixed basis
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationIntroduction to Machine Learning
Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability
More informationBayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory
Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More information6-1. Canonical Correlation Analysis
6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationMachine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More information