Multiple Experts with Binary Features


Ye Jin & Lingen Zhang, December 9, 2010

1 Introduction

Our intuition for the project comes from the paper "Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit" by Raykar, Yu, et al. The paper analyzed a classification problem where, instead of observing a true classification for each data point, we observe classifications from several experts. This project attempts to solve a variation of the problem in which the features are binary instead of real-valued. In addition, we generalize the problem to N-class classification instead of binary classification.

2 Model Description

2.1 Training Data

The data set contains m training examples: {(x^(i), y_r^(i)); i = 1, ..., m}, where x^(i) = (x_1^(i), x_2^(i), ..., x_k^(i)) with x_j^(i) ∈ {0, 1}, and y_r^(i) ∈ A for r = 1, ..., R, with A = {1, 2, ..., N} representing the N different classes. That is, the feature space is k-dimensional and there are R experts providing estimates of the true y^(i).

2.2 Model Assumptions

2.2.1 Naive Bayes Assumption

Similar to the spam classification example given in class, we make the Naive Bayes assumption: for the i-th training example, the x_j^(i) are conditionally independent given the true y^(i). Then we have the following convenient property:

  p(x_1^(i), x_2^(i), ..., x_k^(i) | y^(i)) = ∏_{j=1}^k p(x_j^(i) | y^(i)).    (1)

2.2.2 Characteristic Matrix for Each Expert

For the r-th expert, we define his/her characteristic matrix to be M^(r), where

  M^(r)_{p,q} = P(y_r = q | y = p)  for p, q ∈ {1, 2, ..., N},

i.e. the entry in the p-th row and q-th column is the probability that the r-th expert gives classification q given that the true classification is p. Notice that each row of this matrix has to add up to one, so the number of degrees of freedom is N(N−1) instead of N^2; we could really describe each expert using an N × (N−1) matrix instead of an N × N matrix, but for symmetry and simplicity we keep it as is.
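As a quick numerical illustration of these two modeling objects, the sketch below builds a set of per-class Bernoulli feature parameters and a row-stochastic characteristic matrix; all sizes and probabilities are invented for illustration, not estimates from data:

```python
import numpy as np

rng = np.random.default_rng(0)

k, N = 5, 3  # hypothetical sizes: 5 binary features, 3 classes

# One Bernoulli parameter per (class, feature): phi_cond[n, j] = P(x_j = 1 | y = n+1)
phi_cond = rng.uniform(0.1, 0.9, size=(N, k))

def p_x_given_y(x, n):
    """Naive Bayes factorization (1): p(x | y) = prod_j p(x_j | y)."""
    p = phi_cond[n]
    return float(np.prod(p**x * (1 - p)**(1 - x)))

x = rng.integers(0, 2, size=k)                  # one binary feature vector
probs = [p_x_given_y(x, n) for n in range(N)]   # one likelihood per class

# A characteristic matrix M^(r): row p is expert r's label distribution given
# true class p, so every row must sum to one (N(N-1) free parameters).
M = rng.dirichlet(np.ones(N), size=N)
assert np.allclose(M.sum(axis=1), 1.0)
```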

3 Single Expert Case: A Generalization of Spam Classification

We use two sets of parameters to model this problem:

  φ_n = P(y^(i) = n),
  φ_{j|n} = P(x_j^(i) = 1 | y^(i) = n).

Note that we only consider φ_1, φ_2, ..., φ_{N−1} as parameters; φ_N can be calculated as φ_N = 1 − Σ_{p=1}^{N−1} φ_p. The joint likelihood is

  L(φ_n, φ_{j|n}) = ∏_{i=1}^m p(x_1^(i), x_2^(i), ..., x_k^(i), y^(i))
                  = ∏_{i=1}^m p(y^(i)) ∏_{j=1}^k p(x_j^(i) | y^(i))
                  = ∏_{i=1}^m φ_{y^(i)} ∏_{j=1}^k φ_{j|y^(i)}^{x_j^(i)} (1 − φ_{j|y^(i)})^{1 − x_j^(i)}.

Setting the partial derivatives of L to 0, we derive the maximum likelihood estimators:

  φ_n = Σ_{i=1}^m 1{y^(i) = n} / m,
  φ_{j|n} = Σ_{i=1}^m 1{y^(i) = n} x_j^(i) / Σ_{i=1}^m 1{y^(i) = n}.

4 Multiple Expert Case

4.1 Likelihood Function

We need to use the characteristic matrices M^(r) as well as the parameters used in the single expert case (φ_n, φ_{j|n}). Let Θ = (M^(1), ..., M^(R), φ_n, φ_{j|n}). We can calculate the likelihood function:
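The closed-form estimators above amount to per-class frequency counts, and can be sketched in a few lines; the four-example data set here is a made-up toy (with classes 0-indexed, unlike the 1..N convention in the text):

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, k = 3 binary features, N = 2 classes
# (classes 0-indexed here; the write-up uses 1..N).
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([0, 0, 1, 1])
m, k = X.shape
N = 2

# MLEs from Section 3: class frequencies and per-class feature frequencies.
phi = np.array([(y == n).mean() for n in range(N)])              # phi_n
phi_cond = np.array([X[y == n].mean(axis=0) for n in range(N)])  # phi_{j|n}

assert np.isclose(phi.sum(), 1.0)
```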

  L(Θ) = ∏_{i=1}^m P(y_1^(i), ..., y_R^(i), x^(i); Θ)
       = ∏_{i=1}^m Σ_{n=1}^N P(y_1^(i), ..., y_R^(i) | y^(i) = n, x^(i); Θ) P(x^(i) | y^(i) = n; Θ) P(y^(i) = n; Θ)
       = ∏_{i=1}^m Σ_{n=1}^N [ ∏_{r=1}^R M^(r)_{n, y_r^(i)} ] [ ∏_{j=1}^k φ_{j|n}^{x_j^(i)} (1 − φ_{j|n})^{1 − x_j^(i)} ] φ_n.

However, L(Θ) is quite difficult to maximize because of the summation inside the product, so taking the log-likelihood does not simplify the problem very much. The solution is to use the EM algorithm with y = (y^(1), ..., y^(m)) as latent variables. Now consider the new likelihood function:

  L(y, Θ) = ∏_{i=1}^m P(y_1^(i), ..., y_R^(i), x^(i), y^(i); Θ)
          = ∏_{i=1}^m p(y_1^(i), ..., y_R^(i) | y^(i), x^(i); Θ) p(x^(i) | y^(i); Θ) p(y^(i); Θ)
          = ∏_{i=1}^m [ ∏_{r=1}^R M^(r)_{y^(i), y_r^(i)} ] [ ∏_{j=1}^k φ_{j|y^(i)}^{x_j^(i)} (1 − φ_{j|y^(i)})^{1 − x_j^(i)} ] φ_{y^(i)}.

4.2 The EM Algorithm

4.2.1 E-step

We need Q_i(y^(i)) ∝ p(y_1^(i), ..., y_R^(i) | y^(i), x^(i); Θ) p(x^(i) | y^(i); Θ) p(y^(i); Θ), and thus

  Q_i(y^(i)) = [ ∏_{r=1}^R M^(r)_{y^(i), y_r^(i)} ∏_{j=1}^k φ_{j|y^(i)}^{x_j^(i)} (1 − φ_{j|y^(i)})^{1 − x_j^(i)} φ_{y^(i)} ]
             / [ Σ_{n=1}^N ∏_{r=1}^R M^(r)_{n, y_r^(i)} ∏_{j=1}^k φ_{j|n}^{x_j^(i)} (1 − φ_{j|n})^{1 − x_j^(i)} φ_n ].

To initialize (during the first E-step we do not yet have a Θ from which to calculate Q_i(y^(i))), we can set

  Q_i(y^(i)) = (1/R) Σ_{r=1}^R 1{y_r^(i) = y^(i)}.
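The E-step formula and its normalization can be sketched numerically as follows; every parameter value here is an invented placeholder, not a fitted quantity, and classes are 0-indexed:

```python
import numpy as np

# Placeholder model: N = 2 classes, k = 3 binary features, R = 2 experts.
N, k, R = 2, 3, 2
phi = np.array([0.5, 0.5])                       # phi_n = P(y = n)
phi_cond = np.array([[0.9, 0.2, 0.7],
                     [0.1, 0.8, 0.4]])           # P(x_j = 1 | y = n)
M = np.array([[[0.8, 0.2], [0.3, 0.7]],          # M[r][p][q] = P(y_r = q | y = p)
              [[0.9, 0.1], [0.4, 0.6]]])

def e_step(x, labels):
    """Posterior Q_i(n) over the true class (Section 4.2.1)."""
    q = np.empty(N)
    for n in range(N):
        expert = np.prod([M[r, n, labels[r]] for r in range(R)])
        feats = np.prod(phi_cond[n]**x * (1 - phi_cond[n])**(1 - x))
        q[n] = expert * feats * phi[n]
    return q / q.sum()   # normalize over n, as in the displayed formula

# Both experts say class 0, and the features also fit class 0 well.
Q = e_step(np.array([1, 0, 1]), np.array([0, 0]))
assert np.isclose(Q.sum(), 1.0)
```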

4.2.2 M-step

Given the Q_i(y^(i)) calculated in the E-step, we want to maximize

  l(Θ) = Σ_{i=1}^m Σ_{n=1}^N Q_i(n) log( [ ∏_{r=1}^R M^(r)_{n, y_r^(i)} ∏_{j=1}^k φ_{j|n}^{x_j^(i)} (1 − φ_{j|n})^{1 − x_j^(i)} φ_n ] / Q_i(n) )
       = Σ_{i=1}^m Σ_{n=1}^N Q_i(n) [ Σ_{r=1}^R log M^(r)_{n, y_r^(i)} + Σ_{j=1}^k ( x_j^(i) log φ_{j|n} + (1 − x_j^(i)) log(1 − φ_{j|n}) ) + log φ_n − log Q_i(n) ].

Setting ∂l/∂M^(r)_{n,p} = 0 for p = 1, 2, ..., N−1 (bearing in mind that M^(r)_{n,N} = 1 − Σ_{p=1}^{N−1} M^(r)_{n,p} is not a free parameter), we get

  M^(r)_{n,p} = Σ_{i=1}^m Q_i(n) 1{y_r^(i) = p} / Σ_{i=1}^m Q_i(n).    (2)

Setting ∂l/∂φ_n = 0 for n = 1, 2, ..., N−1 (again, φ_N = 1 − Σ_{n=1}^{N−1} φ_n is not a free parameter), we get

  φ_n = Σ_{i=1}^m Q_i(n) / Σ_{p=1}^N Σ_{i=1}^m Q_i(p) = Σ_{i=1}^m Q_i(n) / m,    (3)

where the last equality holds because each Q_i sums to one over the N classes. Lastly, we set

  ∂l/∂φ_{j|n} = Σ_{i=1}^m Q_i(n) [ x_j^(i)/φ_{j|n} − (1 − x_j^(i))/(1 − φ_{j|n}) ]

equal to zero, and we get

  φ_{j|n} = Σ_{i=1}^m Q_i(n) x_j^(i) / Σ_{i=1}^m Q_i(n).    (4)

It is worth noting that the φ_n and φ_{j|n} derived here in the M-step are the same as the MLEs in the single expert case, except that all the 1{y^(i) = n} terms are substituted with Q_i(n).

4.3 Missing Labels

One of the technical details that we need to deal with is missing labels: not all experts give classifications for all training examples, i.e. y_r^(i) does not necessarily exist for all i and r. It turns out that we can make a small change to our algorithm to take care of this issue. Let R_i be the set of experts who classified training example i. Then the likelihood function becomes

  L(Θ) = ∏_{i=1}^m Σ_{n=1}^N [ ∏_{r ∈ R_i} M^(r)_{n, y_r^(i)} ] ∏_{j=1}^k φ_{j|n}^{x_j^(i)} (1 − φ_{j|n})^{1 − x_j^(i)} φ_n.
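Stepping back to the complete-label case for a moment, the three M-step updates (2)-(4) are weighted counts and translate directly into array operations; the sketch below uses made-up posteriors Q and expert labels (classes 0-indexed):

```python
import numpy as np

# Placeholders: m = 4 examples, N = 2 classes, R = 2 experts, k = 3 features.
m, N, R = 4, 2, 2
Q = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.8],
              [0.1, 0.9]])                 # Q[i, n] from the E-step
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0]])
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])  # labels[i, r] = y_r^(i)

# Eq. (3): phi_n = sum_i Q_i(n) / m
phi = Q.sum(axis=0) / m

# Eq. (4): phi_{j|n} = sum_i Q_i(n) x_j^(i) / sum_i Q_i(n)
phi_cond = (Q.T @ X) / Q.sum(axis=0)[:, None]

# Eq. (2): M_{n,p}^{(r)} = sum_i Q_i(n) 1{y_r^(i) = p} / sum_i Q_i(n)
M = np.zeros((R, N, N))
for r in range(R):
    for p in range(N):
        M[r, :, p] = (Q * (labels[:, r] == p)[:, None]).sum(axis=0)
    M[r] /= Q.sum(axis=0)[:, None]

assert np.allclose(M.sum(axis=2), 1.0)     # each row of each M^(r) sums to one
```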

Consequently, the E-step can be modified to

  Q_i(y^(i)) = [ ∏_{r ∈ R_i} M^(r)_{y^(i), y_r^(i)} ∏_{j=1}^k φ_{j|y^(i)}^{x_j^(i)} (1 − φ_{j|y^(i)})^{1 − x_j^(i)} φ_{y^(i)} ]
             / [ Σ_{n=1}^N ∏_{r ∈ R_i} M^(r)_{n, y_r^(i)} ∏_{j=1}^k φ_{j|n}^{x_j^(i)} (1 − φ_{j|n})^{1 − x_j^(i)} φ_n ],

with the initial step

  Q_i(y^(i)) = (1/|R_i|) Σ_{r ∈ R_i} 1{y_r^(i) = y^(i)}.

For the M-step, the update formulae for φ_n and φ_{j|n} do not change; the formula for M^(r)_{n,p} can be rewritten as

  M^(r)_{n,p} = Σ_{i: r ∈ R_i} Q_i(n) 1{y_r^(i) = p} / Σ_{i: r ∈ R_i} Q_i(n).

4.4 Laplace Smoothing

Another technical detail that we may encounter is that the denominators in the E-step and M-step formulae may be zero. As a result, we need to apply Laplace smoothing. In the M-step, Σ_i Q_i(n) and Σ_{i: r ∈ R_i} Q_i(n) might be zero (consider the case where nobody ever gave a classification of n in the training set, and the first M-step right after the initial E-step, which sets Q_i(y^(i)) = (1/|R_i|) Σ_{r ∈ R_i} 1{y_r^(i) = y^(i)}), so the formulae can be replaced with

  M^(r)_{n,p} = ( Σ_{i: r ∈ R_i} Q_i(n) 1{y_r^(i) = p} + 1 ) / ( Σ_{i: r ∈ R_i} Q_i(n) + N ),
  φ_n = ( Σ_{i=1}^m Q_i(n) + 1 ) / ( m + N ),
  φ_{j|n} = ( Σ_{i=1}^m Q_i(n) x_j^(i) + 1 ) / ( Σ_{i=1}^m Q_i(n) + 2 ).

One can sanity-check that Σ_{p=1}^N M^(r)_{n,p} = 1 and Σ_{n=1}^N φ_n = 1 under the smoothed formulae.

In the E-step, applying Laplace smoothing might be difficult, since each term

  ∏_{r ∈ R_i} M^(r)_{y^(i), y_r^(i)} ∏_{j=1}^k φ_{j|y^(i)}^{x_j^(i)} (1 − φ_{j|y^(i)})^{1 − x_j^(i)} φ_{y^(i)}

can be very small, and it is hard to estimate the order of magnitude of these terms. As a result, adding 1 to the numerator and N to the denominator, or more generally adding some pre-determined constant c to the numerator and Nc to the denominator, may destroy most of the meaningful information in the Q_i(y^(i)) distribution, making them all close to 1/N. However, the good news is that, because of the smoothing applied in the M-step, one is guaranteed that M^(r)_{n,p}, φ_n, φ_{j|n} ∈ (0, 1) and the denominators will be non-zero, hence there is no need to apply smoothing in the E-step.
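A small sketch of the smoothed φ updates, with made-up posteriors in which one class receives zero mass, showing that the smoothed estimates stay strictly inside (0, 1) and that the φ_n still sum to one:

```python
import numpy as np

# Placeholders: m = 3 examples, k = 2 features, N = 3 classes; class 2 (0-indexed)
# never receives any posterior mass, so the unsmoothed denominators would be zero.
m, k, N = 3, 2, 3
Q = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0],
              [0.7, 0.3, 0.0]])              # Q[i, n] from the E-step
X = np.array([[1, 0], [0, 1], [1, 1]])

qn = Q.sum(axis=0)                           # sum_i Q_i(n); zero for class 2

phi = (qn + 1) / (m + N)                     # smoothed version of Eq. (3)
phi_cond = (Q.T @ X + 1) / (qn[:, None] + 2) # smoothed version of Eq. (4)

# The empty class falls back to uniform estimates rather than dividing by zero.
assert np.isclose(phi.sum(), 1.0)
assert np.all((phi_cond > 0) & (phi_cond < 1))
```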