Multiple Experts with Binary Features
Ye Jin & Lingen Zhang
December 9, 2010
1 Introduction

Our intuition for the project comes from the paper "Supervised Learning from Multiple Experts: Whom to trust when everyone lies a bit" by Raykar, Yu, et al. The paper analyzes a classification problem where, instead of observing the true classification of each data point, we observe classifications from several experts. This project attempts to solve a variation of the problem in which the features are binary instead of real-valued. In addition, we generalize the problem to N-class classification instead of binary classification.

2 Model Description

2.1 Training Data

The data set contains m training examples $\{(x^{(i)}, y_r^{(i)}); i = 1, \ldots, m\}$, where $x^{(i)} = (x_1^{(i)}, x_2^{(i)}, \ldots, x_k^{(i)})$ with $x_j^{(i)} \in \{0, 1\}$, and $y_r^{(i)} \in A$ for $r = 1, \ldots, R$ with $A = \{1, 2, \ldots, N\}$ representing the N different classes. That is, the feature space is k-dimensional and there are R experts providing estimates of the true $y^{(i)}$.

2.2 Model Assumptions

2.2.1 Naive Bayes Assumption

Similar to the spam classification example given in class, we make the Naive Bayes assumption: for the ith training example, the $x_j^{(i)}$'s are conditionally independent given the true $y^{(i)}$. Then we have the following convenient property:

$$p(x_1^{(i)}, x_2^{(i)}, \ldots, x_k^{(i)} \mid y^{(i)}) = \prod_{j=1}^{k} p(x_j^{(i)} \mid y^{(i)}). \tag{1}$$

2.2.2 Characteristic Matrix for Each Expert

For the rth expert, we define his/her characteristic matrix to be $M^{(r)}$, where $M_{p,q}^{(r)} = P(y_r = q \mid y = p)$ for $p, q \in \{1, 2, \ldots, N\}$, i.e. the entry in the pth row and qth column is the probability that the rth expert gives classification q given that the true classification is p. Notice that each row of this matrix has to add up to one, hence the degrees of freedom are $N(N-1)$ instead of $N^2$; i.e. we could really describe each expert using an $N \times (N-1)$ matrix instead of an $N \times N$ matrix, but for symmetry and simplicity we keep it that way for now.
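As a small illustration of the characteristic matrix, an expert's labels can be simulated by drawing from the row of $M^{(r)}$ indexed by the true class. This is a minimal NumPy sketch with our own function and variable names (the paper specifies no implementation), and classes are indexed from 0 rather than 1:

```python
import numpy as np

def simulate_expert(y_true, M, rng):
    """Draw one expert's labels: given true class p, the expert reports
    label q with probability M[p, q] (row p of the characteristic matrix)."""
    return np.array([rng.choice(M.shape[1], p=M[p]) for p in y_true])

# A hypothetical 2-class expert: right 90% of the time on class 0, 80% on class 1
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])
assert np.allclose(M.sum(axis=1), 1.0)  # each row must sum to one
rng = np.random.default_rng(0)
labels = simulate_expert(np.array([0, 0, 1, 1]), M, rng)
```

Because each row is a probability distribution over the N possible reported labels, only $N(N-1)$ of the entries are free, matching the degrees-of-freedom count above.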
3 Single Expert Case: A Generalization of Spam Classification

We use two sets of parameters to model this problem:

$$\phi_y = P(y^{(i)} = y), \qquad \phi_{j|y} = P(x_j^{(i)} = 1 \mid y^{(i)} = y).$$

Note that we only consider $\phi_1, \phi_2, \ldots, \phi_{N-1}$ as parameters; $\phi_N$ can be calculated as $\phi_N = 1 - \sum_{p=1}^{N-1} \phi_p$. The joint likelihood is:

$$L(\phi_y, \phi_{j|y}) = \prod_{i=1}^{m} p(x_1^{(i)}, \ldots, x_k^{(i)}, y^{(i)}) = \prod_{i=1}^{m} \phi_{y^{(i)}} \prod_{j=1}^{k} \phi_{j|y^{(i)}}^{x_j^{(i)}} \left(1 - \phi_{j|y^{(i)}}\right)^{1 - x_j^{(i)}}.$$

Setting the partial derivatives of the log-likelihood to 0, we derive the maximum likelihood estimators:

$$\phi_y = \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = y\}, \qquad \phi_{j|y} = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = y\}\, x_j^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = y\}}.$$

4 Multiple Expert Case

4.1 Likelihood Function

We need to use the characteristic matrices $M^{(r)}$ as well as the parameters used in the single expert case ($\phi_y$, $\phi_{j|y}$). Let $\Theta = (\{M^{(r)}\}_{r=1}^{R}, \phi_y, \phi_{j|y})$. We can calculate the likelihood function:
$$L(\Theta) = \prod_{i=1}^{m} P(y_1^{(i)}, \ldots, y_R^{(i)}, x^{(i)}; \Theta) = \prod_{i=1}^{m} \sum_{n=1}^{N} P(y_1^{(i)}, \ldots, y_R^{(i)} \mid y^{(i)} = n, x^{(i)}; \Theta)\, P(x^{(i)} \mid y^{(i)} = n; \Theta)\, P(y^{(i)} = n; \Theta)$$
$$= \prod_{i=1}^{m} \sum_{n=1}^{N} \left( \prod_{r=1}^{R} M_{n, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|n}^{x_j^{(i)}} \left(1 - \phi_{j|n}\right)^{1 - x_j^{(i)}} \right) \phi_n.$$

However, $L(\Theta)$ is quite difficult to maximize because of the summation inside the product, so taking the log-likelihood does not simplify the problem very much. The solution is to use the EM algorithm with $y = (y^{(1)}, \ldots, y^{(m)})$ as latent variables. Now consider the new likelihood function:

$$L(y, \Theta) = \prod_{i=1}^{m} P(y_1^{(i)}, \ldots, y_R^{(i)}, x^{(i)}, y^{(i)}; \Theta) = \prod_{i=1}^{m} \left( \prod_{r=1}^{R} M_{y^{(i)}, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|y^{(i)}}^{x_j^{(i)}} \left(1 - \phi_{j|y^{(i)}}\right)^{1 - x_j^{(i)}} \right) \phi_{y^{(i)}}.$$

4.2 The EM Algorithm

4.2.1 E-step

We need $Q_i(y^{(i)}) \propto p(y_1^{(i)}, \ldots, y_R^{(i)} \mid y^{(i)}, x^{(i)}; \Theta)\, p(x^{(i)} \mid y^{(i)}; \Theta)\, p(y^{(i)}; \Theta)$, and thus

$$Q_i(y^{(i)}) = \frac{\left( \prod_{r=1}^{R} M_{y^{(i)}, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|y^{(i)}}^{x_j^{(i)}} \left(1 - \phi_{j|y^{(i)}}\right)^{1 - x_j^{(i)}} \right) \phi_{y^{(i)}}}{\sum_{n=1}^{N} \left( \prod_{r=1}^{R} M_{n, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|n}^{x_j^{(i)}} \left(1 - \phi_{j|n}\right)^{1 - x_j^{(i)}} \right) \phi_n}.$$

To initialize (during the first E-step we do not yet have a $\Theta$ with which to calculate $Q_i(y^{(i)})$), we can set

$$Q_i(y^{(i)}) = \frac{1}{R} \sum_{r=1}^{R} 1\{y_r^{(i)} = y^{(i)}\}.$$
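The E-step above can be sketched in NumPy as follows. All names are ours, classes are indexed from 0, and it assumes every expert labels every example; we also compute in log space for numerical stability, a standard implementation choice the paper does not discuss:

```python
import numpy as np

def e_step(X, Y, M, phi_y, phi_jy):
    """E-step: posterior Q_i(n) over the true class of each example.

    X      : (m, k) binary features
    Y      : (m, R) expert labels in {0, ..., N-1}
    M      : (R, N, N) characteristic matrices, M[r, p, q] = P(y_r = q | y = p)
    phi_y  : (N,) class prior
    phi_jy : (N, k) Bernoulli parameters P(x_j = 1 | y = n)
    Returns Q : (m, N), each row a normalized distribution over classes.
    """
    R = Y.shape[1]
    # log P(x | y = n) under the Naive Bayes assumption, for every (i, n)
    log_q = X @ np.log(phi_jy).T + (1 - X) @ np.log(1 - phi_jy).T  # (m, N)
    log_q += np.log(phi_y)                      # plus log P(y = n)
    for r in range(R):
        # plus log M[r][n, Y[i, r]] for every example i and candidate class n
        log_q += np.log(M[r][:, Y[:, r]]).T
    log_q -= log_q.max(axis=1, keepdims=True)   # stabilize before exponentiating
    Q = np.exp(log_q)
    return Q / Q.sum(axis=1, keepdims=True)     # normalize, as in the formula
```

On an example whose experts agree and whose features match the corresponding class, the posterior concentrates on that class.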
4.2.2 M-step

Given the $Q_i(y^{(i)})$ calculated in the E-step, we want to maximize

$$l(\Theta) = \sum_{i=1}^{m} \sum_{n=1}^{N} Q_i(n) \left[ \sum_{r=1}^{R} \log M_{n, y_r^{(i)}}^{(r)} + \log \phi_n + \sum_{j=1}^{k} \left( x_j^{(i)} \log \phi_{j|n} + \left(1 - x_j^{(i)}\right) \log\left(1 - \phi_{j|n}\right) \right) \right].$$

Setting $\frac{\partial l}{\partial M_{n,p}^{(r)}} = 0$ for $p = 1, 2, \ldots, N-1$ (bearing in mind that $M_{n,N}^{(r)} = 1 - \sum_{p=1}^{N-1} M_{n,p}^{(r)}$ is not a free parameter), we get

$$M_{n,p}^{(r)} = \frac{\sum_{i=1}^{m} Q_i(n)\, 1\{y_r^{(i)} = p\}}{\sum_{i=1}^{m} Q_i(n)}. \tag{2}$$

Setting $\frac{\partial l}{\partial \phi_n} = 0$ for $n = 1, 2, \ldots, N-1$ (again, $\phi_N = 1 - \sum_{n=1}^{N-1} \phi_n$ is not a free parameter), we get

$$\phi_n = \frac{\sum_{i=1}^{m} Q_i(n)}{\sum_{i=1}^{m} \sum_{p=1}^{N} Q_i(p)} = \frac{1}{m} \sum_{i=1}^{m} Q_i(n). \tag{3}$$

Lastly, we set

$$\frac{\partial l}{\partial \phi_{j|n}} = \sum_{i=1}^{m} Q_i(n) \left( \frac{x_j^{(i)}}{\phi_{j|n}} - \frac{1 - x_j^{(i)}}{1 - \phi_{j|n}} \right)$$

equal to zero, and we get

$$\phi_{j|n} = \frac{\sum_{i=1}^{m} Q_i(n)\, x_j^{(i)}}{\sum_{i=1}^{m} Q_i(n)}. \tag{4}$$

It is worth noting that the $\phi_n$ and $\phi_{j|n}$ derived here in the M-step are the same as the MLE in the single expert case, except that all the $1\{y^{(i)} = n\}$ terms are substituted with $Q_i(n)$.

4.3 Missing Labels

One of the technical details we need to deal with is missing labels: not all experts classify all training examples, i.e. $y_r^{(i)}$ does not necessarily exist for every pair $(i, r)$. It turns out that we can make a small change to our algorithm to take care of this issue. Let $R_i$ be the set of experts who classified training example i. Then the likelihood function becomes

$$L(\Theta) = \prod_{i=1}^{m} \sum_{n=1}^{N} \left( \prod_{r \in R_i} M_{n, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|n}^{x_j^{(i)}} \left(1 - \phi_{j|n}\right)^{1 - x_j^{(i)}} \right) \phi_n.$$
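The M-step updates (2)-(4) are soft-count versions of the single-expert MLE, and with one-hot $Q_i$ they reduce to exact counting. A minimal NumPy sketch (our own naming, classes 0-indexed, no smoothing and no missing labels):

```python
import numpy as np

def m_step(X, Y, Q):
    """Closed-form M-step updates (2)-(4) given E-step posteriors Q.

    X : (m, k) binary features
    Y : (m, R) expert labels in {0, ..., N-1}
    Q : (m, N) posteriors from the E-step
    Returns (M, phi_y, phi_jy).
    """
    m, N = Q.shape
    R = Y.shape[1]
    phi_y = Q.sum(axis=0) / m                    # (3): phi_n = (1/m) sum_i Q_i(n)
    phi_jy = (Q.T @ X) / Q.sum(axis=0)[:, None]  # (4): soft counts of x_j = 1 per class
    M = np.zeros((R, N, N))
    for r in range(R):
        for p in range(N):
            # numerator of (2): sum_i Q_i(n) 1{y_r^(i) = p}, for every n at once
            M[r, :, p] = Q[Y[:, r] == p].sum(axis=0)
        M[r] /= Q.sum(axis=0)[:, None]           # denominator of (2)
    return M, phi_y, phi_jy
```

With hard (one-hot) posteriors, the outputs match the single-expert counting estimators, illustrating the substitution of $1\{y^{(i)} = n\}$ by $Q_i(n)$ noted above.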
Consequently, the E-step is modified to

$$Q_i(y^{(i)}) = \frac{\left( \prod_{r \in R_i} M_{y^{(i)}, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|y^{(i)}}^{x_j^{(i)}} \left(1 - \phi_{j|y^{(i)}}\right)^{1 - x_j^{(i)}} \right) \phi_{y^{(i)}}}{\sum_{n=1}^{N} \left( \prod_{r \in R_i} M_{n, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|n}^{x_j^{(i)}} \left(1 - \phi_{j|n}\right)^{1 - x_j^{(i)}} \right) \phi_n},$$

with the initial step being

$$Q_i(y^{(i)}) = \frac{1}{|R_i|} \sum_{r \in R_i} 1\{y_r^{(i)} = y^{(i)}\}.$$

For the M-step, the update formulae for $\phi_n$ and $\phi_{j|n}$ do not change; the formula for $M_{n,p}^{(r)}$ can be rewritten as

$$M_{n,p}^{(r)} = \frac{\sum_{i:\, r \in R_i} Q_i(n)\, 1\{y_r^{(i)} = p\}}{\sum_{i:\, r \in R_i} Q_i(n)}.$$

4.4 Laplace Smoothing

Another technical detail we may encounter is that the denominators in the E-step and M-step formulae may be zero. As a result, we need to apply Laplace smoothing. In the M-step, $\sum_{i=1}^{m} Q_i(n)$ and $\sum_{i:\, r \in R_i} Q_i(n)$ might be zero (consider the case where nobody ever gave a classification of n in the training set, and the first M-step right after the initial E-step, which sets $Q_i(y^{(i)}) = \frac{1}{|R_i|} \sum_{r \in R_i} 1\{y_r^{(i)} = y^{(i)}\}$), so the formulae can be replaced with

$$M_{n,p}^{(r)} = \frac{\sum_{i:\, r \in R_i} Q_i(n)\, 1\{y_r^{(i)} = p\} + 1}{\sum_{i:\, r \in R_i} Q_i(n) + N}, \qquad \phi_n = \frac{\sum_{i=1}^{m} Q_i(n) + 1}{m + N}, \qquad \phi_{j|n} = \frac{\sum_{i=1}^{m} Q_i(n)\, x_j^{(i)} + 1}{\sum_{i=1}^{m} Q_i(n) + 2}.$$

One can sanity-check that $\sum_{p=1}^{N} M_{n,p}^{(r)} = 1$ and $\sum_{n=1}^{N} \phi_n = 1$ using the smoothed formulae. In the E-step, applying Laplace smoothing would be difficult, since each $\left( \prod_{r \in R_i} M_{y^{(i)}, y_r^{(i)}}^{(r)} \right) \left( \prod_{j=1}^{k} \phi_{j|y^{(i)}}^{x_j^{(i)}} \left(1 - \phi_{j|y^{(i)}}\right)^{1 - x_j^{(i)}} \right) \phi_{y^{(i)}}$ term can be very small and it is hard to estimate the order of magnitude of these terms. As a result, adding 1 to the numerator and N to the denominator, or generally
adding some pre-determined constant c to the numerator and Nc to the denominator, may destroy most of the meaningful information in the $Q_i(y^{(i)})$ distribution, making them all close to $\frac{1}{N}$. However, the good news is that, because of the smoothing applied in the M-step, one is guaranteed that $M_{n,p}^{(r)}, \phi_n, \phi_{j|n} \in (0, 1)$ and the denominator will be non-zero, hence there is no need to apply smoothing in the E-step.
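Putting the pieces together, the complete algorithm (vote-based initial E-step, Laplace-smoothed M-step handling missing labels, E-step requiring no smoothing) might look like the following sketch. All names are ours and classes are 0-indexed; missing labels are encoded as -1, every example is assumed to have at least one label, and the E-step is computed in log space for numerical stability, an implementation choice not discussed in the text:

```python
import numpy as np

def em_multi_expert(X, Y, N, n_iter=50):
    """EM for the multi-expert model with binary features.

    X : (m, k) binary features; Y : (m, R) expert labels, -1 = missing;
    N : number of classes. Returns (M, phi_y, phi_jy, Q).
    """
    m, k = X.shape
    R = Y.shape[1]
    seen = Y >= 0                                 # seen[i, r] iff r is in R_i
    # Initial E-step: Q_i(n) = fraction of example i's experts voting n
    Q = np.zeros((m, N))
    for n in range(N):
        Q[:, n] = ((Y == n) & seen).sum(axis=1) / seen.sum(axis=1)
    for _ in range(n_iter):
        # Laplace-smoothed M-step (the smoothed formulae of section 4.4)
        phi_y = (Q.sum(axis=0) + 1) / (m + N)
        phi_jy = (Q.T @ X + 1) / (Q.sum(axis=0)[:, None] + 2)
        M = np.ones((R, N, N))                    # "+1" numerator smoothing
        for r in range(R):
            for p in range(N):
                M[r, :, p] += Q[seen[:, r] & (Y[:, r] == p)].sum(axis=0)
            M[r] /= (Q[seen[:, r]].sum(axis=0) + N)[:, None]
        # E-step in log space; smoothing guarantees all logs are finite
        log_q = X @ np.log(phi_jy).T + (1 - X) @ np.log(1 - phi_jy).T
        log_q += np.log(phi_y)
        for r in range(R):
            rows = seen[:, r]
            log_q[rows] += np.log(M[r][:, Y[rows, r]]).T
        log_q -= log_q.max(axis=1, keepdims=True)
        Q = np.exp(log_q)
        Q /= Q.sum(axis=1, keepdims=True)
    return M, phi_y, phi_jy, Q
```

On a toy data set where features separate the classes and the experts are mostly correct, the recovered posteriors concentrate on the intended labels even when individual experts disagree.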