Structure Learning. Instructor: Su-In Lee University of Washington, Seattle. Score-based structure learning


Readings: K&F 18.3, 18.4, 18.5, 18.6. Structure Learning. Lecture 11, May 2, 2011. CSE 515, Statistical Methods, Spring 2011. Instructor: Su-In Lee, University of Washington, Seattle.

Last Time. Score-based structure learning: candidate structures; a score function; search for the high-scoring structure. Scoring functions: the maximum likelihood score, $\mathrm{Score}_L(G : D) = \log P(D \mid G, \hat\theta_G)$, where $\hat\theta_G$ is the MLE for $G$ (prone to overfitting); the Bayesian score.

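Bayesian Score. Main principle of the Bayesian approach: whenever we have uncertainty over anything, place a distribution over it. What uncertainty? Over both the structure $G$ and its parameters $\Theta_G$. By Bayes' rule,
$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D)},$$
where $P(D \mid G)$ is the marginal likelihood, $P(G)$ is the prior over structures, and the marginal probability of the data $P(D)$ does not depend on the network. Bayesian score: $\mathrm{Score}_B(G : D) = \log P(D \mid G) + \log P(G)$.

Marginal Likelihood of Data Given G. The marginal likelihood integrates the likelihood against the prior over parameters:
$$P(D \mid G) = \int_{\theta_G} P(D \mid G, \theta_G)\,P(\theta_G \mid G)\,d\theta_G.$$
Note the similarity to the maximum likelihood score, with the key difference that ML takes the maximum of the likelihood, whereas here we average the likelihood over the parameter space, weighted by the prior.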
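To make the max-versus-average distinction concrete, here is a minimal sketch for a single coin (one variable, no parents). It assumes a uniform prior on $\theta$ (a Dirichlet(1, 1) prior, chosen only for illustration) and approximates the integral with a midpoint rule; all function names are hypothetical.

```python
import math


def likelihood(theta: float, m_heads: int, m_tails: int) -> float:
    """P(D | theta) for one ordered coin-toss sequence with the given counts."""
    return theta ** m_heads * (1.0 - theta) ** m_tails


def max_likelihood(m_heads: int, m_tails: int) -> float:
    """Likelihood at the MLE theta_hat = M_H / M (the quantity the ML score uses)."""
    theta_hat = m_heads / (m_heads + m_tails)
    return likelihood(theta_hat, m_heads, m_tails)


def marginal_likelihood_numeric(m_heads: int, m_tails: int, steps: int = 100_000) -> float:
    """Integral of P(D | theta) P(theta) over [0, 1] with P(theta) = 1 (uniform prior),
    approximated by the midpoint rule: an average of the likelihood, not its maximum."""
    h = 1.0 / steps
    return h * sum(likelihood((k + 0.5) * h, m_heads, m_tails) for k in range(steps))


def marginal_likelihood_exact(m_heads: int, m_tails: int) -> float:
    """Closed form under the uniform prior: M_H! M_T! / (M + 1)!."""
    return (math.factorial(m_heads) * math.factorial(m_tails)
            / math.factorial(m_heads + m_tails + 1))


if __name__ == "__main__":
    mh, mt = 3, 1
    print("ML:", max_likelihood(mh, mt))
    print("marginal (numeric):", marginal_likelihood_numeric(mh, mt))
    print("marginal (exact):", marginal_likelihood_exact(mh, mt))
```

For 3 heads and 1 tail, the ML value is $0.75^3 \cdot 0.25 \approx 0.105$, while the marginal likelihood is $3!\,1!/5! = 0.05$. Because the prior integrates to one, the average can never exceed the maximum.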

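Marginal Likelihood: Binomial Case. Assume a sequence of $M$ coin tosses. By the chain rule for probabilities,
$$P(x[1], \ldots, x[M]) = P(x[1])\,P(x[2] \mid x[1]) \cdots P(x[M] \mid x[1], \ldots, x[M-1]).$$
Recall that for Dirichlet priors
$$P(x[m+1] = H \mid x[1], \ldots, x[m]) = \frac{m_H + \alpha_H}{m + \alpha},$$
where $m_H$ is the number of heads in the first $m$ examples and $\alpha = \alpha_H + \alpha_T$. Multiplying these predictive terms gives a product that depends only on the counts $M_H$ and $M_T$, not on the order of the tosses, so we may list the heads first:
$$P(x[1], \ldots, x[M]) = \frac{\alpha_H (\alpha_H + 1) \cdots (\alpha_H + M_H - 1) \cdot \alpha_T (\alpha_T + 1) \cdots (\alpha_T + M_T - 1)}{\alpha (\alpha + 1) \cdots (\alpha + M - 1)}.$$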
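The chain-rule computation can be checked directly. This minimal sketch assumes integer hyperparameters so that exact rational arithmetic (`fractions.Fraction`) applies; the function names are hypothetical. It multiplies the sequential Dirichlet predictive probabilities and compares the result with the heads-first product.

```python
from fractions import Fraction


def sequential_marginal(seq: str, alpha_h: int, alpha_t: int) -> Fraction:
    """Chain rule: multiply the Dirichlet predictive probabilities
    P(x[m+1] = v | x[1..m]) = (m_v + alpha_v) / (m + alpha), one toss at a time."""
    alpha = alpha_h + alpha_t
    m_h = m_t = 0
    prob = Fraction(1)
    for x in seq:
        m = m_h + m_t
        if x == "H":
            prob *= Fraction(m_h + alpha_h, m + alpha)
            m_h += 1
        elif x == "T":
            prob *= Fraction(m_t + alpha_t, m + alpha)
            m_t += 1
        else:
            raise ValueError(f"expected 'H' or 'T', got {x!r}")
    return prob


def rising(a: int, n: int) -> int:
    """Rising factorial a (a + 1) ... (a + n - 1); equals 1 when n == 0."""
    out = 1
    for k in range(n):
        out *= a + k
    return out


def product_form(m_h: int, m_t: int, alpha_h: int, alpha_t: int) -> Fraction:
    """Heads-first product: depends only on the counts M_H and M_T."""
    return Fraction(rising(alpha_h, m_h) * rising(alpha_t, m_t),
                    rising(alpha_h + alpha_t, m_h + m_t))


if __name__ == "__main__":
    print(sequential_marginal("HTTH", 1, 1))  # prints 1/30
    print(sequential_marginal("HHTT", 1, 1))  # prints 1/30 (same counts, other order)
    print(product_form(2, 2, 1, 1))           # prints 1/30
```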

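Simplify using $\Gamma(x + 1) = x\,\Gamma(x)$:
$$P(x[1], \ldots, x[M]) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + M)} \cdot \frac{\Gamma(\alpha_H + M_H)}{\Gamma(\alpha_H)} \cdot \frac{\Gamma(\alpha_T + M_T)}{\Gamma(\alpha_T)}.$$
For multinomials with a Dirichlet prior:
$$P(x[1], \ldots, x[M]) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + M)} \prod_{k=1}^{K} \frac{\Gamma(\alpha_k + M[x^k])}{\Gamma(\alpha_k)}.$$

Marginal Likelihood: Bayesian Networks. The network structure determines the form of the marginal likelihood. Consider seven joint observations $(X[m], Y[m])$, $m = 1, \ldots, 7$, of two binary variables with values H and T. Empty network $G_0$ ($X$ and $Y$ independent): two Dirichlet marginal likelihoods, $P(X[1], \ldots, X[7])\,P(Y[1], \ldots, Y[7])$.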
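In practice the Gamma form is evaluated in log space, since $\Gamma$ overflows quickly. A minimal sketch using Python's `math.lgamma`; the function name and the dictionary-of-hyperparameters interface are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Mapping, Sequence


def log_dirichlet_marginal(data: Sequence[Hashable],
                           alphas: Mapping[Hashable, float]) -> float:
    """log P(x[1..M]) = log Gamma(alpha) - log Gamma(alpha + M)
                        + sum_k [log Gamma(alpha_k + M[x^k]) - log Gamma(alpha_k)],
    with alpha = sum_k alpha_k."""
    counts = Counter(data)
    unknown = set(counts) - set(alphas)
    if unknown:
        raise ValueError(f"values without a hyperparameter: {unknown}")
    alpha = sum(alphas.values())
    total = math.lgamma(alpha) - math.lgamma(alpha + len(data))
    for value, alpha_k in alphas.items():
        total += math.lgamma(alpha_k + counts[value]) - math.lgamma(alpha_k)
    return total


if __name__ == "__main__":
    # Binomial case with alpha_H = alpha_T = 1.
    print(math.exp(log_dirichlet_marginal("HTTH", {"H": 1.0, "T": 1.0})))
    # A three-valued multinomial with a uniform Dirichlet prior.
    print(math.exp(log_dirichlet_marginal("abca", {"a": 1.0, "b": 1.0, "c": 1.0})))
```

The first value agrees with the sequential chain-rule product for the same sequence ($1/30$); the second equals $1/3 \cdot 1/4 \cdot 1/5 \cdot 2/6 = 1/180$.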

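Network $G_1$ ($X \to Y$): three Dirichlet marginal likelihoods, $P(X[1], \ldots, X[7])\,P(Y[1], Y[4], Y[6], Y[7])\,P(Y[2], Y[3], Y[5])$, where the $Y$ values are grouped by the value of their parent (instances 1, 4, 6, 7 share one value of $X$; instances 2, 3, 5 share the other), with separate parameters for $P(Y \mid X = H)$ and $P(Y \mid X = T)$.

Idealized Experiment. $P(X = H) = 0.5$, $P(Y = H \mid X = H) = 0.5 + p$, $P(Y = H \mid X = T) = 0.5 - p$.

[Figure: $\log P(D \mid G)/M$ versus $M$ for $G_0$ (any $p$) and for $G_1$ with $p = 0.05, 0.10, 0.15, \ldots$]

As we get more data, the Bayesian score prefers $G_1$, where $X$ and $Y$ are dependent (for $p > 0$).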
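The idealized experiment can be reproduced in a few lines. This sketch assumes Dirichlet(1, 1) hyperparameters for every family (the slides do not state the values used) and encodes H as `True`; `sample`, `log_ml_g0`, and `log_ml_g1` are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Pair = Tuple[bool, bool]  # (x, y); True means H


def log_beta_marginal(n_h: int, n_t: int, a_h: float = 1.0, a_t: float = 1.0) -> float:
    """Log Dirichlet (Beta) marginal likelihood of a binary sequence with n_h heads, n_t tails."""
    a = a_h + a_t
    return (math.lgamma(a) - math.lgamma(a + n_h + n_t)
            + math.lgamma(a_h + n_h) - math.lgamma(a_h)
            + math.lgamma(a_t + n_t) - math.lgamma(a_t))


def log_ml_g0(data: List[Pair]) -> float:
    """Empty network G0: P(X[1..M]) * P(Y[1..M]), two Dirichlet marginal likelihoods."""
    m = len(data)
    n_x = sum(x for x, _ in data)
    n_y = sum(y for _, y in data)
    return log_beta_marginal(n_x, m - n_x) + log_beta_marginal(n_y, m - n_y)


def log_ml_g1(data: List[Pair]) -> float:
    """G1 = X -> Y: P(X[1..M]) times one Dirichlet marginal likelihood for the
    Y values in each group of instances sharing the same value of X."""
    m = len(data)
    n_x = sum(x for x, _ in data)
    score = log_beta_marginal(n_x, m - n_x)
    for x_val in (True, False):
        ys = [y for x, y in data if x == x_val]
        n_y = sum(ys)
        score += log_beta_marginal(n_y, len(ys) - n_y)
    return score


def sample(m: int, p: float, rng: random.Random) -> List[Pair]:
    """P(X=H) = 0.5, P(Y=H | X=H) = 0.5 + p, P(Y=H | X=T) = 0.5 - p."""
    data = []
    for _ in range(m):
        x = rng.random() < 0.5
        y = rng.random() < (0.5 + p if x else 0.5 - p)
        data.append((x, y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.0, 0.05, 0.10, 0.15):
        for m in (100, 1000, 10000):
            d = sample(m, p, rng)
            # Positive values mean the Bayesian score prefers G1.
            print(p, m, (log_ml_g1(d) - log_ml_g0(d)) / m)
```

On perfectly balanced data the extra parameters of $G_1$ cost more than they gain, so $G_0$ wins; when $Y$ copies $X$, $G_1$ wins by a wide margin.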

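Marginal Likelihood: Bayesian Networks. The marginal likelihood has the form
$$P(D \mid G) = \prod_i \prod_{\mathrm{pa}_i^G} \frac{\Gamma(\alpha(\mathrm{pa}_i^G))}{\Gamma(\alpha(\mathrm{pa}_i^G) + M[\mathrm{pa}_i^G])} \prod_{x_i} \frac{\Gamma(\alpha(x_i, \mathrm{pa}_i^G) + M[x_i, \mathrm{pa}_i^G])}{\Gamma(\alpha(x_i, \mathrm{pa}_i^G))},$$
that is, one Dirichlet marginal likelihood for the sequence of values of $X_i$ observed when $X_i$'s parents have a particular value. Here $M[\cdot]$ are the counts from the data, $\alpha(\cdot)$ are the hyperparameters for each family given $G$, and $\alpha(\mathrm{pa}_i^G) = \sum_{x_i} \alpha(x_i, \mathrm{pa}_i^G)$. Decomposability of the Bayesian score: $\log P(D \mid G)$ is a sum of terms, one per family $(X_i, \mathrm{Pa}_i^G)$.

Bayesian Score: Asymptotic Behavior. For $M \to \infty$, a network $G$ with Dirichlet priors satisfies
$$\log P(D \mid G) = \ell(\hat\theta_G : D) - \frac{\log M}{2}\,\mathrm{Dim}[G] + O(1),$$
where $\mathrm{Dim}[G]$ is the number of independent parameters in $G$. This approximation is called the BIC score:
$$\mathrm{Score}_{BIC}(G : D) = \ell(\hat\theta_G : D) - \frac{\log M}{2}\,\mathrm{Dim}[G] = M \sum_{i=1}^{n} I_{\hat P}(X_i ; \mathrm{Pa}_i^G) - M \sum_{i=1}^{n} H_{\hat P}(X_i) - \frac{\log M}{2}\,\mathrm{Dim}[G].$$
The BIC score exhibits a tradeoff between fit to data and complexity: the mutual information term grows linearly with $M$, while the complexity penalty grows logarithmically with $M$. As $M$ grows, more emphasis is given to fit to the data.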
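The BIC score decomposes into per-family terms computed from counts. This minimal sketch assumes data given as tuples of integer-coded values (column $i$ holds $X_i$), a structure given as a hypothetical `{child: parent_tuple}` dictionary, and natural logarithms. It also checks the identity $\ell_i = M\,I_{\hat P}(X_i ; \mathrm{Pa}_i) - M\,H_{\hat P}(X_i)$ for a single-parent family.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Row = Tuple[int, ...]                   # one instance; column i holds the value of X_i
Structure = Dict[int, Tuple[int, ...]]  # child index -> tuple of parent indices


def family_loglik(data: Sequence[Row], child: int, parents: Tuple[int, ...]) -> float:
    """Maximized log-likelihood of one family: sum_{x,u} M[x,u] log(M[x,u] / M[u])."""
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / parent_counts[u]) for (_, u), n in joint.items())


def family_dim(cards: Sequence[int], child: int, parents: Tuple[int, ...]) -> int:
    """Independent parameters of P(X_child | parents): (|Val(X)| - 1) * prod |Val(U)|."""
    q = 1
    for p in parents:
        q *= cards[p]
    return (cards[child] - 1) * q


def bic_score(data: Sequence[Row], cards: Sequence[int], structure: Structure) -> float:
    """Score_BIC(G : D) = sum_i [loglik_i - (log M / 2) * Dim_i]."""
    penalty = 0.5 * math.log(len(data))
    return sum(family_loglik(data, c, ps) - penalty * family_dim(cards, c, ps)
               for c, ps in structure.items())


def entropy(data: Sequence[Row], col: int) -> float:
    """Empirical entropy of X_col."""
    m = len(data)
    return -sum(n / m * math.log(n / m) for n in Counter(r[col] for r in data).values())


def mutual_info(data: Sequence[Row], a: int, b: int) -> float:
    """Empirical mutual information I(X_a ; X_b)."""
    m = len(data)
    joint = Counter((r[a], r[b]) for r in data)
    ca = Counter(r[a] for r in data)
    cb = Counter(r[b] for r in data)
    return sum(n / m * math.log(n * m / (ca[x] * cb[y])) for (x, y), n in joint.items())


if __name__ == "__main__":
    d = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]
    cards = [2, 2]
    print("empty :", bic_score(d, cards, {0: (), 1: ()}))
    print("X0->X1:", bic_score(d, cards, {0: (), 1: (0,)}))
    # Identity check: loglik(X1 | X0) == M * I(X0; X1) - M * H(X1)
    print(family_loglik(d, 1, (0,)), len(d) * (mutual_info(d, 0, 1) - entropy(d, 1)))
```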

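Bayesian Score: Asymptotic Behavior. As $M \to \infty$, the same expansion gives
$$\log P(D \mid G) = M \sum_{i=1}^{n} I_{\hat P}(X_i ; \mathrm{Pa}_i^G) - M \sum_{i=1}^{n} H_{\hat P}(X_i) - \frac{\log M}{2}\,\mathrm{Dim}[G] + O(1).$$
The Bayesian score is consistent: as $M \to \infty$, the true structure $G^*$ (or a structure I-equivalent to it) maximizes the score. Spurious edges do not contribute to the likelihood and are penalized; required edges are added because the likelihood term grows linearly in $M$, compared to the logarithmic growth of the model-complexity penalty.

Priors. Bayesian score: $\mathrm{Score}_B(G : D) = \log P(D \mid G) + \log P(G)$. Structure prior $P(G)$: either a uniform prior, $P(G) \propto$ constant, or a prior penalizing the number of edges, $P(G) \propto c^{|G|}$ with $0 < c < 1$, where $|G|$ is the number of edges. The normalizing constant is the same for all networks and can thus be ignored.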
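Because the normalizing constant is shared, the edge-penalizing prior simply adds $|G| \log c$ to the log marginal likelihood. A minimal sketch; the function names are hypothetical, and the two networks in the demo use illustrative numbers, not results.

```python
import math
from typing import Optional


def log_structure_prior(num_edges: int, c: Optional[float] = None) -> float:
    """Unnormalized log P(G).
    c is None   -> uniform prior: a constant, dropped (returns 0.0).
    0 < c < 1   -> edge-penalizing prior, P(G) proportional to c ** |G|."""
    if c is None:
        return 0.0
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie in (0, 1)")
    return num_edges * math.log(c)


def bayesian_score(log_marginal_likelihood: float, num_edges: int,
                   c: Optional[float] = None) -> float:
    """Score_B(G : D) = log P(D | G) + log P(G), up to the shared normalizing constant."""
    return log_marginal_likelihood + log_structure_prior(num_edges, c)


if __name__ == "__main__":
    # Illustrative numbers: the denser network fits slightly better,
    # but with c = 0.5 the edge penalty reverses the ranking.
    sparse, dense = (-120.0, 2), (-119.0, 4)
    for c in (None, 0.5):
        print(c, bayesian_score(*sparse, c=c), bayesian_score(*dense, c=c))
```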

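Priors. Bayesian score: $\mathrm{Score}_B(G : D) = \log P(D \mid G) + \log P(G)$. Parameter prior $P(\theta \mid G)$: the BDe prior, specified by an equivalent sample size $M'$ and a prior network $B_0$ representing the prior probability of events. Set $\alpha(x_i, \mathrm{pa}_i^G) = M' \cdot P(x_i, \mathrm{pa}_i^G \mid B_0)$. Note that $\mathrm{Pa}_i^G$ need not be the same as the parents of $X_i$ in $B_0$; compute $P(x_i, \mathrm{pa}_i^G \mid B_0)$ using standard inference in $B_0$. BDe requires assessing the prior network $B_0$, but can naturally incorporate prior knowledge. BDe is consistent and asymptotically equivalent (up to a constant) to BIC.

Summary: Network Scores. Decomposability: the likelihood, BIC, and log BDe scores all have the form $\mathrm{Score}(G : D) = \sum_i \mathrm{Score}(X_i \mid \mathrm{Pa}_i^G : D)$. All are score-equivalent: if $G$ is I-equivalent to $G'$, then $\mathrm{Score}(G : D) = \mathrm{Score}(G' : D)$.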
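A minimal sketch of log BDe, under the assumption that $B_0$ is the empty network with uniform marginals, so that $P(x_i, \mathrm{pa}_i \mid B_0) = 1/(r_i q_i)$ with $r_i = |\mathrm{Val}(X_i)|$ and $q_i$ the number of parent configurations (this special case is usually called BDeu). The parameter `ess` plays the role of $M'$; the function names are hypothetical. The demo checks score equivalence on two I-equivalent structures.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Row = Tuple[int, ...]
Structure = Dict[int, Tuple[int, ...]]


def family_log_bdeu(data: Sequence[Row], cards: Sequence[int], child: int,
                    parents: Tuple[int, ...], ess: float) -> float:
    """Log Dirichlet marginal likelihood of one family with
    alpha(x_i, pa_i) = M' * P(x_i, pa_i | B0), where B0 is ASSUMED to be the empty
    network with uniform marginals, so P(x_i, pa_i | B0) = 1 / (r * q)."""
    q = 1
    for p in parents:
        q *= cards[p]
    r = cards[child]
    alpha_xu = ess / (r * q)   # alpha(x_i, pa_i)
    alpha_u = ess / q          # alpha(pa_i) = sum over x_i of alpha(x_i, pa_i)
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Configurations with zero counts contribute exactly 0, so only observed ones are summed.
    score = sum(math.lgamma(alpha_u) - math.lgamma(alpha_u + n_u)
                for n_u in parent_counts.values())
    score += sum(math.lgamma(alpha_xu + n_xu) - math.lgamma(alpha_xu)
                 for n_xu in joint.values())
    return score


def log_bdeu(data: Sequence[Row], cards: Sequence[int], structure: Structure,
             ess: float = 1.0) -> float:
    """Decomposable: the sum of the family scores."""
    return sum(family_log_bdeu(data, cards, c, ps, ess) for c, ps in structure.items())


if __name__ == "__main__":
    d = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]
    cards = [2, 2]
    print(log_bdeu(d, cards, {0: (), 1: (0,)}))  # X0 -> X1
    print(log_bdeu(d, cards, {1: (), 0: (1,)}))  # X1 -> X0 (I-equivalent, same score)
```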

So far, we discussed scores for evaluating the quality of different candidate BN structures. Let's now examine how to find a structure with a high score.

STRUCTURE SEARCH

Optimization Problem. Input: training data $D = \{x[1], \ldots, x[M]\}$; a scoring function (including priors, if needed); a set of possible structures (the search space), including prior knowledge about structure. Output: a network (or networks) that maximizes the score. Key property, decomposability: the score of a network is a sum of terms, $\mathrm{Score}(G : D) = \sum_i \mathrm{Score}(X_i \mid \mathrm{Pa}_i^G : D)$.

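Learning Trees. Trees: at most one parent per variable. Why trees? Elegant math: we can solve the optimization problem efficiently (with a greedy algorithm). Sparse parameterization: we avoid overfitting while adapting to the data.

Learning Trees. Let $p(i)$ denote the parent of $X_i$, or 0 if $X_i$ has no parent. We can write the score as
$$\mathrm{Score}(G : D) = \sum_{i : p(i) > 0} \mathrm{Score}(X_i \mid X_{p(i)}) + \sum_{i : p(i) = 0} \mathrm{Score}(X_i) = \sum_{i : p(i) > 0} \big(\mathrm{Score}(X_i \mid X_{p(i)}) - \mathrm{Score}(X_i)\big) + \sum_i \mathrm{Score}(X_i).$$
The first sum is a sum of edge scores (each edge's improvement over the empty network); the second is the score of the empty network, a constant that does not depend on the tree.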
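The rewriting above is a pure identity, which this minimal sketch verifies numerically with the likelihood family score. It assumes 0-based variable indices, with `None` standing in for the slide's $p(i) = 0$ ("no parent"); the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, ...]


def family_loglik(data: Sequence[Row], child: int, parents: Tuple[int, ...]) -> float:
    """Likelihood family score: sum_{x,u} M[x,u] log(M[x,u] / M[u])."""
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / parent_counts[u]) for (_, u), n in joint.items())


def tree_score_direct(data: Sequence[Row], parent: List[Optional[int]]) -> float:
    """Sum of family scores; parent[i] is None when X_i is a root."""
    return sum(family_loglik(data, i, () if p is None else (p,))
               for i, p in enumerate(parent))


def tree_score_via_edges(data: Sequence[Row], parent: List[Optional[int]]) -> float:
    """Score of the empty network plus the edge scores
    Score(X_i | X_p(i)) - Score(X_i), summed over the edges of the tree."""
    empty = sum(family_loglik(data, i, ()) for i in range(len(parent)))
    gains = sum(family_loglik(data, i, (p,)) - family_loglik(data, i, ())
                for i, p in enumerate(parent) if p is not None)
    return empty + gains


if __name__ == "__main__":
    d = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 0, 1), (0, 1, 0), (1, 1, 1)]
    chain = [None, 0, 1]  # X0 -> X1 -> X2
    print(tree_score_direct(d, chain), tree_score_via_edges(d, chain))
```

With the likelihood score, each edge score equals $M \cdot I_{\hat P}(X_i ; X_{p(i)}) \ge 0$, so adding tree edges never lowers this particular score.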

Learning Trees. Algorithm: construct a graph with vertices $1, \ldots, n$. For all $i, j$, set the edge weight $w(i \to j) = \mathrm{Score}(X_j \mid X_i) - \mathrm{Score}(X_j)$. If the score satisfies score equivalence, $w(i \to j) = w(j \to i)$. The structure learning problem becomes: find the tree structure with maximum sum of weights. Solve an undirected maximum spanning tree (forest) problem and determine the directions of the edges afterwards. This can be done with standard algorithms in low-order polynomial time by building a tree in a greedy fashion (e.g., Kruskal's maximum spanning tree algorithm). Theorem: the procedure finds the tree with maximal score (the sum of $w(i \to j)$ over all edges $i \to j$). When the score is the likelihood, $w(i \to j)$ is proportional to $I(X_i ; X_j)$. This is known as the Chow & Liu method.

Learning Trees: Example. Tree learned from data $D = \{x[1], \ldots, x[M]\}$ sampled from the ICU-Alarm network.

[Figure: learned tree over the ALARM network variables (PULMEMBOLUS, PAP, SHUNT, INTUBATION, ...), with correct edges and spurious edges marked.]

Not every edge in the tree is in the original network. Tree direction is arbitrary: we can't learn about arc direction.
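A minimal Chow & Liu sketch: Kruskal's algorithm with union-find over the empirical mutual information weights (the likelihood-score case), followed by orienting edges away from an arbitrary root. Skipping zero-weight edges, so that independent variables remain unconnected (a forest), is a design choice of this sketch; the function names are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, ...]


def mutual_info(data: Sequence[Row], a: int, b: int) -> float:
    """Empirical mutual information I(X_a ; X_b) in nats."""
    m = len(data)
    joint = Counter((r[a], r[b]) for r in data)
    ca = Counter(r[a] for r in data)
    cb = Counter(r[b] for r in data)
    return sum(n / m * math.log(n * m / (ca[x] * cb[y])) for (x, y), n in joint.items())


def chow_liu_edges(data: Sequence[Row], n: int) -> List[Tuple[int, int]]:
    """Kruskal's algorithm for a maximum-weight spanning forest, w(i, j) = I(X_i ; X_j)."""
    root = list(range(n))                      # union-find parent pointers

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]            # path halving
            i = root[i]
        return i

    weighted = sorted(((mutual_info(data, i, j), i, j)
                       for i, j in combinations(range(n), 2)), reverse=True)
    edges = []
    for w, i, j in weighted:
        if w <= 0.0:
            break                              # remaining edges cannot improve the score
        ri, rj = find(i), find(j)
        if ri != rj:                           # (i, j) joins two components: no cycle
            root[ri] = rj
            edges.append((i, j))
    return edges


def orient(edges: List[Tuple[int, int]], n: int) -> List[Optional[int]]:
    """Direct edges away from an arbitrary root in each component (BFS).
    Under a score-equivalent score, every rooted orientation of a tree scores the same."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    parent: List[Optional[int]] = [None] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    parent[v] = u
                    queue.append(v)
    return parent


if __name__ == "__main__":
    d = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 0), (0, 0, 1, 1),
         (1, 1, 1, 0), (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 1, 1)]
    edges = chow_liu_edges(d, 4)
    print(edges, orient(edges, 4))
```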

Beyond Trees. The problem is not easy for more complex networks. Example: when we allow two parents, the greedy algorithm is no longer guaranteed to find the optimal network. Theorem: finding the maximal scoring network structure with at most $k$ parents for each variable is NP-hard for $k > 1$. In fact, unless P = NP, no efficient algorithm exists.

Fixed Ordering. For any decomposable scoring function $\mathrm{Score}(G : D) = \sum_i \mathrm{Score}(X_i \mid \mathrm{Pa}_i^G : D)$ and ordering $\prec$, the maximal scoring network has
$$\mathrm{Pa}_i^G = \arg\max_{U \subseteq \{X_j : X_j \prec X_i\}} \mathrm{Score}(X_i \mid U : D),$$
since the choice at $X_i$ does not constrain other choices (any parent sets drawn from predecessors yield an acyclic graph). For a fixed ordering, the structure learning problem becomes a set of independent problems of finding the parents of each $X_i$. If we bound the in-degree per variable by $d$, then the complexity is exponential in $d$.
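For a fixed ordering, the per-variable problems can be solved by exhaustive search over bounded-size parent sets drawn from the predecessors. This minimal sketch assumes BIC as the decomposable family score (any decomposable score would do); `best_network_for_order` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

Row = Tuple[int, ...]


def family_bic(data: Sequence[Row], cards: Sequence[int], child: int,
               parents: Tuple[int, ...]) -> float:
    """BIC family score: max log-likelihood minus (log M / 2) * number of parameters."""
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(n * math.log(n / parent_counts[u]) for (_, u), n in joint.items())
    dim = cards[child] - 1
    for p in parents:
        dim *= cards[p]
    return loglik - 0.5 * math.log(len(data)) * dim


def best_network_for_order(data: Sequence[Row], cards: Sequence[int],
                           order: Sequence[int],
                           max_parents: int) -> Dict[int, Tuple[int, ...]]:
    """For each X_i, independently pick the best parent set among its predecessors
    in the ordering, with at most max_parents parents (the in-degree bound d)."""
    structure = {}
    for pos, child in enumerate(order):
        preds = list(order[:pos])
        candidates = [ps for k in range(min(max_parents, len(preds)) + 1)
                      for ps in combinations(preds, k)]
        structure[child] = max(candidates,
                               key=lambda ps: family_bic(data, cards, child, ps))
    return structure


if __name__ == "__main__":
    # X1 copies X0; X2 is balanced and independent of both.
    data = [(x0, x0, x2) for x0 in (0, 1) for x2 in (0, 1) for _ in range(4)]
    print(best_network_for_order(data, [2, 2, 2], [0, 1, 2], max_parents=2))
```

The number of candidate parent sets per variable is $\sum_{k \le d} \binom{i-1}{k}$, which is where the exponential dependence on $d$ comes from.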

Heuristic Search. We address the problem by using heuristic search. Define a search space: nodes are possible structures, and edges denote adjacency of structures. Traverse this space looking for high-scoring structures. Search techniques: greedy hill-climbing, best-first search, simulated annealing, ...

[Figure: search space.]

Heuristic Search. Typical operations: add an edge, delete an edge, reverse an edge.

Exploiting Decomposability. Decomposability: $\mathrm{Score}(G : D) = \sum_i \mathrm{Score}(X_i \mid \mathrm{Pa}_i^G : D)$. Caching: to update the score after a local change, we only need to re-score the families that were changed.

Greedy Hill Climbing. Simplest heuristic local search. Start with a given network: the empty network, the best tree (from tree learning), or a random network. At each iteration: evaluate all possible changes; apply the change that leads to the best improvement in score; reiterate. Stop when no modification improves the score. Each step requires evaluating $O(n^2)$ candidate changes.
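A minimal greedy hill-climbing sketch over the add, delete, and reverse operators, starting from the empty network (one of the start options above). Assumptions: BIC family scores, a cache keyed by (variable, parent set) so that only changed families are re-scored, and a small tolerance `tol` so that floating-point ties between I-equivalent neighbors do not cause endless edge reversals. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, Iterator, Optional, Sequence, Tuple

Row = Tuple[int, ...]
Parents = Dict[int, FrozenSet[int]]


def family_bic(data: Sequence[Row], cards: Sequence[int], child: int,
               parents: Tuple[int, ...]) -> float:
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(n * math.log(n / parent_counts[u]) for (_, u), n in joint.items())
    dim = cards[child] - 1
    for p in parents:
        dim *= cards[p]
    return loglik - 0.5 * math.log(len(data)) * dim


def is_acyclic(parents: Parents) -> bool:
    """Kahn's algorithm: a DAG iff every node can be removed in topological order."""
    indegree = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for u in ps:
            children[u].append(v)
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for c in children[u]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return removed == len(parents)


def candidate_changes(parents: Parents) -> Iterator[Parents]:
    """One-edge operators, each given as {node: new parent set} for changed families only."""
    for i in parents:
        for j in parents:
            if i == j:
                continue
            if i in parents[j]:
                yield {j: parents[j] - {i}}                       # delete i -> j
                yield {j: parents[j] - {i}, i: parents[i] | {j}}  # reverse i -> j
            elif j not in parents[i]:
                yield {j: parents[j] | {i}}                       # add i -> j


def hill_climb(data: Sequence[Row], cards: Sequence[int],
               max_parents: Optional[int] = None, tol: float = 1e-9) -> Parents:
    cache: Dict[Tuple[int, FrozenSet[int]], float] = {}

    def fam(i: int, ps: FrozenSet[int]) -> float:
        if (i, ps) not in cache:
            cache[(i, ps)] = family_bic(data, cards, i, tuple(sorted(ps)))
        return cache[(i, ps)]

    parents: Parents = {i: frozenset() for i in range(len(cards))}  # empty network
    current = {i: fam(i, ps) for i, ps in parents.items()}
    while True:
        best_delta, best_change = tol, None
        for change in candidate_changes(parents):
            if max_parents is not None and any(len(ps) > max_parents for ps in change.values()):
                continue
            if not is_acyclic({**parents, **change}):
                continue
            # Decomposability: only the changed families contribute to the delta.
            delta = sum(fam(k, ps) - current[k] for k, ps in change.items())
            if delta > best_delta:
                best_delta, best_change = delta, change
        if best_change is None:
            return parents  # no modification improves the score by more than tol
        parents.update(best_change)
        for k in best_change:
            current[k] = fam(k, parents[k])


if __name__ == "__main__":
    data = [(x0, x0, x2) for x0 in (0, 1) for x2 in (0, 1) for _ in range(4)]
    print(hill_climb(data, [2, 2, 2]))
```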

Greedy Hill Climbing Pitfalls. Greedy hill-climbing can get stuck in local maxima (all one-edge changes reduce the score) and on plateaus (some one-edge changes leave the score unchanged). Plateaus happen because I-equivalent networks receive the same score and are neighbors in the search space. Both occur during structure search. Standard heuristics can escape from both: randomization and restarts; TABU search, which keeps a list of recently applied operators and, at each step, does not consider operators that reverse the effect of recently applied operators.

Model Selection. So far, we focused on a single model: given $D = \{x[1], \ldots, x[M]\}$, find the best scoring model $\tilde G = \arg\max_G P(G \mid D)$ and use it to predict the next example, $P(x[M+1] \mid D) \approx P(x[M+1] \mid \tilde G, D)$. Implicit assumption: we are making predictions based on the Bayesian estimation rule
$$P(x[M+1] \mid D) = \sum_G P(x[M+1] \mid D, G)\,P(G \mid D),$$
and the best scoring model dominates the weighted sum. This is valid with many data instances ($M$ very large). Pros: we get a single structure, which allows for efficient use in our tasks. Cons: we are committing to the independencies of a particular structure, while other structures might be as probable given the data.
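With two binary variables there are only three structures (empty, $X \to Y$, $Y \to X$), so the Bayesian estimation rule can be computed exactly and compared with the single-model approximation. This sketch assumes a uniform structure prior and a Dirichlet(1, 1) prior on every family (a simple K2-style choice, not the BDe prior); the count-table format and function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Counts = Dict[Tuple[int, int], int]  # (x, y) -> number of instances; x, y in {0, 1}


def _log_beta(n1: int, n0: int) -> float:
    """Log marginal likelihood of a binary sequence under a Dirichlet(1, 1) prior."""
    return math.lgamma(2) - math.lgamma(2 + n1 + n0) + math.lgamma(1 + n1) + math.lgamma(1 + n0)


def _margins(c: Counts):
    mx = {v: c[(v, 0)] + c[(v, 1)] for v in (0, 1)}
    my = {v: c[(0, v)] + c[(1, v)] for v in (0, 1)}
    return mx, my


def structure_posterior(c: Counts) -> Dict[str, float]:
    """P(G | D) for the three structures over {X, Y}, with a uniform structure prior."""
    mx, my = _margins(c)
    log_ml = {
        "empty": _log_beta(mx[1], mx[0]) + _log_beta(my[1], my[0]),
        "X->Y": _log_beta(mx[1], mx[0]) + sum(_log_beta(c[(x, 1)], c[(x, 0)]) for x in (0, 1)),
        "Y->X": _log_beta(my[1], my[0]) + sum(_log_beta(c[(1, y)], c[(0, y)]) for y in (0, 1)),
    }
    top = max(log_ml.values())  # log-sum-exp for numerical stability
    z = sum(math.exp(s - top) for s in log_ml.values())
    return {g: math.exp(s - top) / z for g, s in log_ml.items()}


def predictive(c: Counts, g: str, x: int, y: int) -> float:
    """P(x[M+1] = (x, y) | D, G): product of Dirichlet(1, 1) posterior-predictive family terms."""
    mx, my = _margins(c)
    m = sum(c.values())
    if g == "empty":
        return (mx[x] + 1) / (m + 2) * (my[y] + 1) / (m + 2)
    if g == "X->Y":
        return (mx[x] + 1) / (m + 2) * (c[(x, y)] + 1) / (mx[x] + 2)
    if g == "Y->X":
        return (my[y] + 1) / (m + 2) * (c[(x, y)] + 1) / (my[y] + 2)
    raise ValueError(f"unknown structure {g!r}")


def averaged_predictive(c: Counts, x: int, y: int) -> float:
    """Bayesian estimation rule: sum_G P(x[M+1] | D, G) P(G | D)."""
    post = structure_posterior(c)
    return sum(p * predictive(c, g, x, y) for g, p in post.items())


if __name__ == "__main__":
    dep = {(0, 0): 50, (1, 1): 50, (0, 1): 0, (1, 0): 0}
    print(structure_posterior(dep))
    print(averaged_predictive(dep, 0, 0), predictive(dep, "X->Y", 0, 0))
```

With strongly dependent counts, the posterior mass of the empty network is negligible and the averaged prediction is essentially that of a single dependent structure, which is the regime where selecting one model is a safe approximation.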

Announcements. Solution for PS #1 uploaded. Typo in Q5 of PS #2 ("Let ... be some clique such that Scope[φ] ..."). 1 free late day for PS #2 (due 5/3 at noon; CSE 536). PS #3 is ready; please pick it up.

Acknowledgement. These lecture notes were generated based on the slides from Prof. Eran Segal.
