Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics



CSE 473: Artificial Intelligence
Bayesian Networks - Learning
Dieter Fox
Slides adapted from Dan Weld, Jack Breese, Dan Klein, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer

Space of ML Problems
What is being learned? Discrete function, continuous function, or policy.
Type of supervision (e.g., experience, feedback):
- Labeled examples: classification, regression, learning from demonstration
- Reward: reinforcement learning
- Nothing: clustering

Learning Topics
- Learning parameters for a Bayesian network
  - Fully observable
  - Hidden variables (EM algorithm)
- Learning structure of Bayesian networks

Parameter Estimation
We have: Bayes net structure and observations.
We need: Bayes net parameters.
- P(B) = ?, P(¬B) = 1 - P(B); from the observations, P(B) = 0.4 and P(¬B) = 0.6
- P(A | E, B) = ? for each combination of values of E and B
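With fully observed data, parameters of this kind can be estimated by counting. The following is a minimal sketch, assuming each observation is a dict of booleans. The function name `estimate_cpt`, the reading of B, E, A as burglary, earthquake, alarm, and the five records are made up for illustration and are not taken from the slides.

```python
from collections import Counter

def estimate_cpt(records, child, parents):
    """Maximum-likelihood estimate of P(child = True | parents) by counting.

    Returns a dict mapping each observed parent configuration (a tuple of
    parent values) to the fraction of matching records where child is True.
    """
    child_true = Counter()
    config_total = Counter()
    for record in records:
        config = tuple(record[p] for p in parents)
        config_total[config] += 1
        if record[child]:
            child_true[config] += 1
    return {config: child_true[config] / total
            for config, total in config_total.items()}

# Hypothetical observations (assumed meaning: B = burglary, E = earthquake, A = alarm).
records = [
    {"B": True,  "E": False, "A": True},
    {"B": True,  "E": False, "A": True},
    {"B": False, "E": False, "A": False},
    {"B": False, "E": True,  "A": True},
    {"B": False, "E": True,  "A": False},
]

p_b = estimate_cpt(records, "B", [])[()]        # P(B), no parents
p_a = estimate_cpt(records, "A", ["E", "B"])    # P(A | E, B), keyed by (E, B)
```

Parent configurations that never occur in the data (here E and B both true) get no estimate at all, which is one reason to combine counts with a prior.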

Counting the observations fills in the conditional entries in the same way, e.g. a value of 0.5 for one of the P(A | E, B) entries.

Bayesian Estimation
P(B | data) = ?
Prior + data = posterior. Now compute either the MAP or the Bayesian estimate.
- Prior Beta(1, 4) + data = Beta(3, 7), giving P(B | data) = 0.3 and P(¬B | data) = 0.7
- The prior corresponds to P(B) = 1/(1 + 4) = 20% with an equivalent sample size of 5

The entries P(A | E, B) = ? are estimated the same way, each with its own prior:
- Prior Beta(,3) + data = (3,)
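The prior-plus-data update can be sketched in a few lines. This is an illustration under assumptions, not the slides' own code: the function names are hypothetical, and the Beta(1, 4) prior with counts of 2 and 3 is assumed here so that the posterior comes out as Beta(3, 7).

```python
def beta_posterior(a, b, n_true, n_false):
    """Add observed counts to the pseudo-counts of a Beta(a, b) prior."""
    return a + n_true, b + n_false

def bayes_estimate(a, b):
    """Posterior mean of a Beta(a, b) distribution."""
    return a / (a + b)

def map_estimate(a, b):
    """Posterior mode of a Beta(a, b) distribution (defined for a, b > 1)."""
    if a <= 1 or b <= 1:
        raise ValueError("the mode formula needs a > 1 and b > 1")
    return (a - 1) / (a + b - 2)

# Assumed prior Beta(1, 4) and assumed counts: B observed 2 times, not-B 3 times.
a_post, b_post = beta_posterior(1, 4, n_true=2, n_false=3)
p_bayes = bayes_estimate(a_post, b_post)   # 3 / 10
p_map = map_estimate(a_post, b_post)       # 2 / 8
```

The two estimates differ because the Bayesian estimate is the posterior mean while the MAP estimate is the posterior mode. They move closer together as more data arrives.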

Hidden Variables
- But we can't observe the disease variable
- Can't we learn without it?
- We could, but we'd get a fully-connected network with 708 parameters (vs. 78)
- Much harder to learn!

Chicken & Egg Problem
- If we knew that a training instance (patient) had the disease, then it would be easy to learn P(symptom | disease)
- If we knew the parameters, e.g. P(symptom | disease), then it would be easy to estimate whether the patient had the disease

1977: The EM Algorithm (Dempster, Laird, and Rubin)
General framework for likelihood-based parameter estimation with missing data:
- Start with initial guesses of the parameters
- E-step: estimate memberships given parameters
- M-step: estimate parameters given memberships
- Repeat until convergence
Converges to a local maximum of the likelihood. The E-step and M-step are often computationally simple. Can incorporate priors over parameters.

Expectation Maximization (EM) (high-level version)
- Pretend we do know the parameters: initialize randomly
- [E step] Compute the probability of each instance having each possible value of the hidden variable
- [M step] Treating each instance as fractionally having both values, compute the new parameter values
- Iterate until convergence!

Expectation Maximization and Gaussian Mixtures
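The high-level EM loop can be sketched for the disease example as follows. This is a sketch under assumptions: symptoms are binary and conditionally independent given a hidden binary disease variable, the function name `em_hidden_disease` and the ten patients are invented, and a fixed asymmetric starting point replaces random initialization so the run is reproducible.

```python
def clamp(p, eps=1e-6):
    """Keep a probability away from exactly 0 or 1."""
    return min(max(p, eps), 1.0 - eps)

def em_hidden_disease(patients, n_iters=50):
    """EM for a hidden binary 'disease' variable with binary symptoms.

    patients: list of tuples of 0/1 symptom indicators.
    Returns (P(disease), [P(symptom_k | disease)], [P(symptom_k | no disease)]).
    """
    n_sym = len(patients[0])
    prior = 0.5
    p_d = [0.6] * n_sym   # asymmetric start so the two classes can separate
    p_n = [0.4] * n_sym
    for _ in range(n_iters):
        # E-step: probability that each patient has the disease, given params.
        resp = []
        for x in patients:
            like_d, like_n = prior, 1.0 - prior
            for k, v in enumerate(x):
                like_d *= p_d[k] if v else 1.0 - p_d[k]
                like_n *= p_n[k] if v else 1.0 - p_n[k]
            resp.append(like_d / (like_d + like_n))
        # M-step: re-estimate params, counting each patient fractionally.
        total_d = sum(resp)
        total_n = len(patients) - total_d
        prior = clamp(total_d / len(patients))
        p_d = [clamp(sum(r * x[k] for r, x in zip(resp, patients)) / total_d)
               for k in range(n_sym)]
        p_n = [clamp(sum((1.0 - r) * x[k] for r, x in zip(resp, patients)) / total_n)
               for k in range(n_sym)]
    return prior, p_d, p_n

# Invented data: six patients with all three symptoms, four with none.
patients = [(1, 1, 1)] * 6 + [(0, 0, 0)] * 4
prior, p_d, p_n = em_hidden_disease(patients)
```

Which of the two learned classes ends up playing the role of "disease" depends on the starting point, since the hidden variable's values carry no labels of their own.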

The problem of finding labels for unlabeled data
In nature, items often do not come with labels. How can we learn labels without a teacher?
[Figure: unlabeled data vs. labeled data]
[Figure: Anemia patients and controls]
From Shadmehr & Diedrichsen

Fitting a Gaussian PDF to Data
[Figure: good fit vs. poor fit]

Mixture Model
[Figures: component models and the resulting mixture model, plotted against x]

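Bayes Net for Gaussian Mixtures
- Hidden variable $y$ with prior $p(y)$
- Measured variable $x$ with class-conditional densities $p(x \mid y = i, \mu_i, \sigma_i)$, $i = 1, 2, 3$
- Marginal density: $p(x) = \sum_{i=1}^{3} p(y = i)\, p(x \mid y = i, \mu_i, \sigma_i)$

Learning of mixture models

Learning Mixtures from Data
Consider fixed $K = 2$, e.g., unknown parameters $\Theta = \{\mu_1, \sigma_1, \mu_2, \sigma_2, \alpha_1\}$.
Given data $D = \{x_1, \ldots, x_N\}$, we want to find the parameters $\Theta$ that best fit the data.

EM for Mixture of Gaussians
E-step: compute the probability $p_{ij}$ that point $x_j$ was generated by component $i$:
$p_{ij} = \alpha\, P(x_j \mid C = i)\, P(C = i)$, $p_i = \sum_j p_{ij}$
(in this step $\alpha$ is the normalizing constant that makes $\sum_i p_{ij} = 1$)
M-step: compute the new mean, covariance, and component weights:
$\mu_i \leftarrow \sum_j p_{ij} x_j / p_i$
$\sigma_i^2 \leftarrow \sum_j p_{ij} (x_j - \mu_i)^2 / p_i$
$w_i \leftarrow p_i$ (divided by $N$ so that the weights sum to 1)

[Figure: Anemia patients and controls, EM iteration]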
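The E-step and M-step for a mixture of Gaussians can be written out directly. Below is a minimal one-dimensional sketch using only the standard library. The function names, the synthetic data (two clusters drawn around 0 and 5), and the starting values are assumptions for illustration, and the small variance floor is an added safeguard rather than part of the update rules.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def em_gmm_1d(xs, mus, sigmas, weights, n_iters=100):
    """EM for a 1-D Gaussian mixture; returns updated (mus, sigmas, weights)."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    n, k = len(xs), len(mus)
    for _ in range(n_iters):
        # E-step: p[j][i] = probability that point j came from component i.
        p = []
        for x in xs:
            joint = [weights[i] * normal_pdf(x, mus[i], sigmas[i]) for i in range(k)]
            total = sum(joint)
            p.append([v / total for v in joint])
        # M-step: weighted mean, weighted variance, and component weight.
        for i in range(k):
            p_i = sum(row[i] for row in p)
            mus[i] = sum(row[i] * x for row, x in zip(p, xs)) / p_i
            var = sum(row[i] * (x - mus[i]) ** 2 for row, x in zip(p, xs)) / p_i
            sigmas[i] = math.sqrt(max(var, 1e-6))   # floor avoids a collapsed component
            weights[i] = p_i / n
    return mus, sigmas, weights

# Synthetic data with a fixed seed: 200 points near 0 and 200 points near 5.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
mus, sigmas, weights = em_gmm_1d(data, mus=[1.0, 4.0], sigmas=[1.0, 1.0], weights=[0.5, 0.5])
```

Because EM only reaches a local maximum of the likelihood, a different starting point can give a different fit. In practice several starts are often compared.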

[Figures: Anemia patients and controls at successive EM iterations]
[Figure: Log-likelihood as a function of EM iterations; x-axis: EM iteration, y-axis: log-likelihood]


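Anemia data with labels
[Figure: anemia group and control group]

Beam-based Sensor Model
The scan likelihood factors over the $K$ beams:
$P(z \mid x, m) = \prod_{k=1}^{K} P(z_k \mid x, m)$

Proximity Measurement
A measurement can be caused by:
- a known obstacle
- cross-talk
- an unexpected obstacle (people, furniture, ...)
- missing all obstacles (total reflection, glass, ...)
Noise is due to uncertainty:
- in measuring the distance to a known obstacle
- in the position of known obstacles
- in the position of additional obstacles
- whether an obstacle is missed

Beam-based Proximity Model
Measurement noise:
$P_{hit}(z \mid x, m) = \eta \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(z - z_{exp})^2}{2\sigma^2}}$
Unexpected obstacles:
$P_{unexp}(z \mid x, m) = \eta\, \lambda e^{-\lambda z}$ for $z < z_{exp}$, and 0 otherwise
Random measurement:
$P_{rand}(z \mid x, m) = \eta \frac{1}{z_{max}}$
Max range:
$P_{max}(z \mid x, m) = \eta \frac{1}{z_{small}}$
[Figures: each density plotted over $z$ from 0 to $z_{max}$, with $z_{exp}$ marked]

Mixture Density
$P(z \mid x, m) = (\alpha_{hit}, \alpha_{unexp}, \alpha_{max}, \alpha_{rand})^T \cdot (P_{hit}(z \mid x, m), P_{unexp}(z \mid x, m), P_{max}(z \mid x, m), P_{rand}(z \mid x, m))$

How can we determine the model parameters?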
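The four-component beam model can be sketched as a weighted sum of densities. This is an illustration under stated assumptions: all parameter values in `PARAMS` are invented, the function names are hypothetical, each $\eta$ is computed so that its component integrates to 1 over $[0, z_{max}]$, and the max-range component is modeled as a narrow uniform window of width `z_small` ending at `z_max`.

```python
import math

def p_hit(z, z_exp, sigma, z_max):
    """Gaussian noise around the expected distance, renormalized to [0, z_max]."""
    if not 0.0 <= z <= z_max:
        return 0.0
    cdf = lambda v: 0.5 * (1.0 + math.erf((v - z_exp) / (sigma * math.sqrt(2.0))))
    eta = 1.0 / (cdf(z_max) - cdf(0.0))
    gauss = math.exp(-0.5 * ((z - z_exp) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return eta * gauss

def p_unexp(z, z_exp, lam):
    """Exponential density for unexpected obstacles, cut off at z_exp."""
    if not 0.0 <= z < z_exp:
        return 0.0
    eta = 1.0 / (1.0 - math.exp(-lam * z_exp))
    return eta * lam * math.exp(-lam * z)

def p_rand(z, z_max):
    """Uniform density for random measurements."""
    return 1.0 / z_max if 0.0 <= z <= z_max else 0.0

def p_max(z, z_max, z_small):
    """Narrow uniform window at maximum range (assumed width z_small)."""
    return 1.0 / z_small if z_max - z_small <= z <= z_max else 0.0

def beam_likelihood(z, z_exp, weights, sigma, lam, z_max, z_small):
    """Mixture density P(z | x, m) for one beam with expected distance z_exp."""
    a_hit, a_unexp, a_max, a_rand = weights
    return (a_hit * p_hit(z, z_exp, sigma, z_max)
            + a_unexp * p_unexp(z, z_exp, lam)
            + a_max * p_max(z, z_max, z_small)
            + a_rand * p_rand(z, z_max))

def scan_likelihood(scan, expected, **params):
    """Product of per-beam likelihoods over the K beams of a scan."""
    result = 1.0
    for z, z_exp in zip(scan, expected):
        result *= beam_likelihood(z, z_exp, **params)
    return result

# Invented parameter values (distances in cm); the weights sum to 1.
PARAMS = dict(weights=(0.7, 0.1, 0.05, 0.15), sigma=10.0, lam=0.01,
              z_max=500.0, z_small=1.0)
likelihood = scan_likelihood([298.0, 120.0], [300.0, 300.0], **PARAMS)
```

With weights that sum to 1 and normalized components, the mixture is itself a proper density over $[0, z_{max}]$. The weights, $\sigma$, and $\lambda$ are the quantities that have to be fit to data.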


Raw Sensor Data
Measured distances for an expected distance of 300 cm.
[Figures: sonar and laser]

Approximation
- Maximize the log likelihood of the data $z$: $P(z \mid z_{exp})$
- Search the parameter space
- EM to find the mixture parameters:
  - Assign measurements to densities
  - Estimate densities using assignments
  - Reassign measurements

Approximation Results
[Figures: laser and sonar]

What if we don't know structure?

Learning the Structure of Bayesian Networks
Search through the space of possible network structures! (For now, assume we observe all variables.)
- For each structure, learn parameters
- Pick the one that fits the observed data best
Caveat: won't we end up fully connected?
When scoring, add a penalty for model complexity: the Bayesian Information Criterion (BIC).
BIC = -log(P(D | BN)) + penalty
Penalty = ½ (# parameters) log (# data points)

Learning the Structure of Bayesian Networks
Search through the space:
- For each structure, learn parameters
- Pick the one that fits the observed data best
- Penalize complex models
Problem? Exponential number of networks! And we need to learn parameters for each! Exhaustive search is out of the question!
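The BIC score as written above (negative log-likelihood plus penalty, so lower is better) can be computed for a discrete network by counting. This is a sketch under assumptions: data are fully observed rows stored as dicts, a structure is a dict from each variable to its list of parents, parameters are maximum-likelihood estimates, natural logarithms are used, and each variable's number of values is taken from the data. The name `bic_score` and both toy datasets are invented.

```python
import math
from collections import Counter

def bic_score(data, structure):
    """BIC = -log P(D | BN) + 0.5 * (#parameters) * log(#data points)."""
    n = len(data)
    cardinality = {v: len({row[v] for row in data}) for v in structure}
    log_lik = 0.0
    n_params = 0
    for var, parents in structure.items():
        # Counts of (parent configuration, value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in parents), row[var]) for row in data)
        config = Counter(tuple(row[p] for p in parents) for row in data)
        for (cfg, _value), count in joint.items():
            log_lik += count * math.log(count / config[cfg])
        # Free parameters: (values - 1) per parent configuration.
        n_configs = math.prod(cardinality[p] for p in parents)
        n_params += (cardinality[var] - 1) * n_configs
    return -log_lik + 0.5 * n_params * math.log(n)

empty = {"A": [], "B": []}        # no edges
linked = {"A": [], "B": ["A"]}    # edge A -> B

# Toy data where B always copies A, and toy data where A and B are unrelated.
dependent = [{"A": 0, "B": 0}] * 10 + [{"A": 1, "B": 1}] * 10
independent = [{"A": a, "B": b} for a in (0, 1) for b in (0, 1)] * 5

scores_dependent = (bic_score(dependent, empty), bic_score(dependent, linked))
scores_independent = (bic_score(independent, empty), bic_score(independent, linked))
```

On the dependent data the extra edge pays for itself through a better fit. On the independent data the fit is unchanged, so the penalty makes the empty network score better.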

Structure Learning as Search
Local search:
1. Start with some network structure
2. Try to make a change (add or delete or reverse an edge)
3. See if the new network is any better
What should the initial state be?
- Uniform prior over random networks?
- Based on prior knowledge?
- Empty network?
How do we evaluate networks?
[Figure: four candidate network structures over the nodes A, B, C, D, E]
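The three local-search steps can be sketched as greedy hill climbing over edge sets. This is one possible reading, not the slides' algorithm: a network is a frozenset of (parent, child) edges, changes that create a cycle are discarded, and the score function is supplied by the caller with lower meaning better. The toy score in the usage lines simply counts edges that differ from a made-up target, where a real run would use a data-based score such as BIC.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Depth-first check that the directed graph has no cycle."""
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
    state = {n: 0 for n in nodes}   # 0 = unvisited, 1 = in progress, 2 = done

    def visit(node):
        if state[node] == 1:
            return False            # came back to a node on the current path
        if state[node] == 2:
            return True
        state[node] = 1
        ok = all(visit(c) for c in children[node])
        state[node] = 2
        return ok

    return all(visit(n) for n in nodes)

def neighbors(nodes, edges):
    """All acyclic graphs one edge addition, deletion, or reversal away."""
    edges = frozenset(edges)
    result = set()
    for p, c in permutations(nodes, 2):
        if (p, c) in edges:
            result.add(edges - {(p, c)})                   # delete the edge
            flipped = (edges - {(p, c)}) | {(c, p)}        # reverse the edge
            if is_acyclic(nodes, flipped):
                result.add(flipped)
        elif (c, p) not in edges:
            added = edges | {(p, c)}                       # add a new edge
            if is_acyclic(nodes, added):
                result.add(added)
    return result

def hill_climb(nodes, start, score, max_steps=100):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current = frozenset(start)
    for _ in range(max_steps):
        best = min(neighbors(nodes, current), key=score, default=current)
        if score(best) >= score(current):
            break
        current = best
    return current

# Toy usage: the score is the number of edges that differ from a made-up target.
nodes = ["A", "B", "C"]
target = frozenset({("A", "B"), ("B", "C")})
found = hill_climb(nodes, frozenset(), score=lambda g: len(g ^ target))
```

Starting from the empty network is only one of the initial states the slide lists. Greedy search of this kind can stop at a local optimum, so the choice of start matters.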
