Context-specific independence Parameter learning: MLE

1 Use Chapter 3 of K&F as a reference for CSI. Reading for parameter learning: Chapter 12 of K&F. Context-specific independence; Parameter learning: MLE. Graphical Models, Carlos Guestrin, Carnegie Mellon University, October 5th, 2005

2 Announcements. Homework 2: out today/tomorrow; programming part in groups of 2-3. Class project: teams of 2-3 students; ideas on the class webpage, but you can do your own. Timeline: 10/19: 1-page project proposal; 11/14: 5-page progress report (20% of project grade); 12/2: poster session (20% of project grade); 12/5: 8-page paper (60% of project grade). All write-ups in NIPS format (see class webpage).

3 Clique trees versus VE. Clique tree advantages: multi-query settings; incremental updates; pre-computation makes complexity explicit. Clique tree disadvantages: space requirements (no factors are deleted); slower for a single query; local structure in factors may be lost when they are multiplied together into the initial clique potential.

4 Clique tree summary. Solve marginal queries for all variables at only twice the cost of a query for one variable. Cliques correspond to maximal cliques in the induced graph. Two message passing approaches: VE (the one that multiplies messages) and BP (the one that divides by the old message). Clique tree invariant: the clique tree potential is always the same; we are only reparameterizing clique potentials. Constructing a clique tree for a BN: from an elimination order, or from a triangulated (chordal) graph. Running time is (only) exponential in the size of the largest clique. Solve exactly problems with thousands (or millions, or more) of variables, and cliques with tens of nodes (or less).

5 Global Structure: Treewidth w. Inference time O(n exp(w)).

6 Local Structure 1: Context-specific independence. [Figure: Bayesian network over Battery Age, Alternator, Fan Belt, Battery, Charge Delivered, Fuel Pump, Fuel Line, Starter, Gas, Distributor, Battery Power, Gas Gauge, Spark Plugs, Radio, Lights, Engine Turn Over, Engine Start]

7 Local Structure 1: Context-specific independence. Context-Specific Independence (CSI): after observing a variable, some variables become independent. [Figure: same network as slide 6]

8 CSI example: Tree CPD. [Figure: Job with parents Apply, SAT, Letter] Represent P(X_i | Pa_Xi) using a decision tree. The path to a leaf is an assignment to (a subset of) Pa_Xi. Leaves are distributions over X_i given the assignment of Pa_Xi on the path to the leaf. Interpretation of a leaf: for the specific assignment of Pa_Xi on the path to this leaf, X_i is independent of the other parents. The representation can be exponentially smaller than the equivalent table.
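A minimal Python sketch of a tree CPD for the Job example on this slide, assuming binary variables. The tree shape, the `Split` and `Leaf` classes, and every probability are illustrative assumptions, not values from the slide.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    dist: dict          # distribution over the child variable (here: Job)

@dataclass
class Split:
    var: str            # parent variable tested at this node
    children: dict      # parent value -> subtree

def tree_lookup(node, assignment):
    """Follow the path selected by `assignment` and return the leaf distribution."""
    while isinstance(node, Split):
        node = node.children[assignment[node.var]]
    return node.dist

def count_leaves(node):
    """Number of distributions the tree stores."""
    if isinstance(node, Leaf):
        return 1
    return sum(count_leaves(child) for child in node.children.values())

# Hypothetical numbers for P(Job | Apply, SAT, Letter).
JOB_TREE = Split("Apply", {
    0: Leaf({0: 0.8, 1: 0.2}),              # did not apply: SAT and Letter are never tested
    1: Split("SAT", {
        1: Leaf({0: 0.1, 1: 0.9}),          # high SAT: Letter is never tested
        0: Split("Letter", {
            0: Leaf({0: 0.9, 1: 0.1}),
            1: Leaf({0: 0.4, 1: 0.6}),
        }),
    }),
})

if __name__ == "__main__":
    print(tree_lookup(JOB_TREE, {"Apply": 0, "SAT": 1, "Letter": 0}))
    print(count_leaves(JOB_TREE), "leaves versus", 2 ** 3, "table rows")
```

With three binary parents the full table needs 8 rows, while this tree stores 4 leaves; the gap grows with the number of parents that a context makes irrelevant.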

9 Tabular VE with Tree CPDs. If we turn a tree CPD into a table, sparsity is lost! Need an inference approach that deals with tree CPDs directly!

10 Local Structure 2: Determinism. If Battery Power = Dead, then Lights = OFF. [Figure: same network as slide 6, with a table for Lights (ON, OFF) given Battery Power (OK, WEAK, DEAD)]

11 Determinism and inference. Determinism gives a little sparsity in the table, but has a much bigger impact on inference. Multiplying a deterministic factor with another factor introduces many new zeros. Operations are related to theorem proving, e.g., unit resolution. [Table: Lights (ON, OFF) given Battery Power (OK, WEAK, DEAD)]
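A small Python sketch of how zeros spread when deterministic factors are multiplied. The two factors and all of their numbers are hypothetical; only the rule "Battery Power = DEAD implies Lights = OFF" comes from the slides, and the matching rule for Radio is an added assumption.

```python
from itertools import product

BATTERY_POWER = ["OK", "WEAK", "DEAD"]
ON_OFF = ["ON", "OFF"]

# Hypothetical factors keyed by (battery_power, device_state).
lights = {("OK", "ON"): 0.9, ("OK", "OFF"): 0.1,
          ("WEAK", "ON"): 0.5, ("WEAK", "OFF"): 0.5,
          ("DEAD", "ON"): 0.0, ("DEAD", "OFF"): 1.0}   # deterministic row
radio = {("OK", "ON"): 0.8, ("OK", "OFF"): 0.2,
         ("WEAK", "ON"): 0.6, ("WEAK", "OFF"): 0.4,
         ("DEAD", "ON"): 0.0, ("DEAD", "OFF"): 1.0}    # assumed deterministic row

def multiply(f, g):
    """Product factor over (battery_power, lights, radio)."""
    return {(b, l, r): f[(b, l)] * g[(b, r)]
            for b, l, r in product(BATTERY_POWER, ON_OFF, ON_OFF)}

def count_zeros(factor):
    return sum(1 for value in factor.values() if value == 0.0)

if __name__ == "__main__":
    joint = multiply(lights, radio)
    print(count_zeros(lights), count_zeros(radio), count_zeros(joint))
```

Each input factor has a single zero, but the product has three: every entry that extends a zero entry of either input is itself zero, so an inference procedure that detects these cases can skip them.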

12 Today's Models. Often characterized by: richness in local structure (determinism, CSI); massiveness in size (10,000s of variables); high connectivity (treewidth). Enabled by: high-level modeling tools (relational, first order); advances in machine learning; new application areas (synthesis): bioinformatics (e.g., linkage analysis), sensor networks. Exploiting local structure is a must!

13 Exact inference in large models is possible. [Figure: BN from a relational model]

14 Recursive Conditioning. Treewidth complexity (worst case). Better than treewidth complexity with local structure. Provides a framework for time-space tradeoffs. Only quick intuition today; details: Koller&Friedman: , and "Recursive Conditioning," Adnan Darwiche. In Artificial Intelligence Journal, 125:1, pages 5-41

15 The Computational Power of Assumptions. [Figure: network as on slide 6, with a Leak node in place of Fuel Pump]

17 Decomposition. [Figure: network as on slide 6, with a Leak node in place of Fuel Pump]

18 Case Analysis. [Figure: two copies of the network, one per case of the conditioned variable; the case probabilities are summed: p + p]

19 Case Analysis. [Figure: as on slide 18, with the first case decomposed into left and right parts: p_l * p_r + p]

20 Case Analysis. [Figure: as on slide 18, with both cases decomposed: p_l * p_r + p_l * p_r]

21 Case Analysis. [Figure: as on slide 20: p_l * p_r + p_l * p_r]

23 Decomposition Tree. [Figure: dtree whose leaves hold the network's factors and whose internal nodes are labeled with cutsets]

25 Decomposition Tree. Time: O(n exp(w log n)). Space: linear (using an appropriate dtree). [Figure: dtree whose leaves hold the network's factors and whose internal nodes are labeled with cutsets]

26 RC1
RC1(T, e)   // compute probability of evidence e on dtree T
  if T is a leaf node
    return Lookup(T, e)
  else
    p := 0
    for each instantiation c of cutset(T) - e do
      p := p + RC1(T_l, ec) * RC1(T_r, ec)
    return p

27 Lookup(T, e)
  Θ_{X|U}: CPT associated with leaf T
  if X is instantiated in e, then
    x: value of X in e
    u: value of U in e
    return θ_{x|u}
  else
    return 1 = Σ_x θ_{x|u}
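The recursion on slide 26 and the leaf lookup on slide 27 can be sketched in Python as follows. This is a sketch under assumptions, not the reference implementation: the `Leaf` and `Node` classes, the `assign_cutsets` helper, the chain A -> B -> C, and all CPT numbers are hypothetical. The cutset of a node is taken to be the variables shared by its two subtrees minus the cutsets of its ancestors.

```python
from itertools import product

class Leaf:
    """dtree leaf holding the CPT of `var` given `parents`."""
    def __init__(self, var, parents, cpt):
        self.var, self.parents, self.cpt = var, tuple(parents), cpt
        self.vars = frozenset((var, *parents))

class Node:
    """Internal dtree node; `cutset` is filled in by assign_cutsets."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.vars = left.vars | right.vars
        self.cutset = ()

def assign_cutsets(t, acutset=frozenset()):
    """cutset(T) = (vars(T_l) & vars(T_r)) minus the cutsets of T's ancestors."""
    if isinstance(t, Leaf):
        return
    t.cutset = tuple(sorted((t.left.vars & t.right.vars) - acutset))
    below = acutset | frozenset(t.cutset)
    assign_cutsets(t.left, below)
    assign_cutsets(t.right, below)

def lookup(leaf, e):
    """Return theta_{x|u} if X is instantiated in e, else 1 (the sum over x)."""
    if leaf.var not in e:
        return 1.0
    u = tuple(e[p] for p in leaf.parents)
    return leaf.cpt[u][e[leaf.var]]

def rc1(t, e, domains):
    """Probability of evidence e (a dict var -> value) on dtree t."""
    if isinstance(t, Leaf):
        return lookup(t, e)
    free = [v for v in t.cutset if v not in e]
    p = 0.0
    for values in product(*(domains[v] for v in free)):
        ec = {**e, **dict(zip(free, values))}   # extend e with the case c
        p += rc1(t.left, ec, domains) * rc1(t.right, ec, domains)
    return p

def chain_abc():
    """Hypothetical binary chain A -> B -> C and one possible dtree for it."""
    leaf_a = Leaf("A", (), {(): [0.7, 0.3]})
    leaf_b = Leaf("B", ("A",), {(0,): [0.8, 0.2], (1,): [0.4, 0.6]})
    leaf_c = Leaf("C", ("B",), {(0,): [0.9, 0.1], (1,): [0.5, 0.5]})
    root = Node(Node(leaf_a, leaf_b), leaf_c)
    assign_cutsets(root)
    return root, {v: [0, 1] for v in "ABC"}

if __name__ == "__main__":
    root, domains = chain_abc()
    print(rc1(root, {"C": 1}, domains))   # P(C = 1) under the assumed CPTs
```

No results are stored between calls, which is why this version needs only linear space and pays for it with repeated work.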

28 Caching. [Figure: dtree with a context at each internal node and a cache holding values such as .27 and .39]

29 Caching. Recursive Conditioning: an any-space algorithm with treewidth complexity (Darwiche, AIJ-01). Time: O(n exp(w)). Space: O(n exp(w)) (using an appropriate dtree). [Figure: dtree with a context at each internal node and a cache holding values such as .27 and .39]

30 RC2
RC2(T, e)
  if T is a leaf node, return Lookup(T, e)
  y := instantiation of context(T)
  if cache_T[y] <> nil, return cache_T[y]
  p := 0
  for each instantiation c of cutset(T) - e do
    p := p + RC2(T_l, ec) * RC2(T_r, ec)
  cache_T[y] := p
  return p
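A hedged Python sketch of the cached recursion on this slide. It assumes context(T) is the set of variables of T that appear in the cutsets of T's ancestors; the classes, the `annotate` helper, the `stats` counter, the chain A -> B -> C -> D, and its CPT numbers are all hypothetical. Caches are valid for one fixed evidence set, so the example builds a fresh dtree per query.

```python
from itertools import product

class Leaf:
    """dtree leaf holding the CPT of `var` given `parents`."""
    def __init__(self, var, parents, cpt):
        self.var, self.parents, self.cpt = var, tuple(parents), cpt
        self.vars = frozenset((var, *parents))

class Node:
    """Internal dtree node with its cutset, context, and cache."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.vars = left.vars | right.vars
        self.cutset, self.context, self.cache = (), (), {}

def annotate(t, acutset=frozenset()):
    """Set cutset(T) and context(T) from the cutsets of T's ancestors."""
    if isinstance(t, Leaf):
        return
    t.cutset = tuple(sorted((t.left.vars & t.right.vars) - acutset))
    t.context = tuple(sorted(t.vars & acutset))
    below = acutset | frozenset(t.cutset)
    annotate(t.left, below)
    annotate(t.right, below)

def lookup(leaf, e):
    if leaf.var not in e:
        return 1.0
    u = tuple(e[p] for p in leaf.parents)
    return leaf.cpt[u][e[leaf.var]]

def rc2(t, e, domains, stats):
    """Like the uncached recursion, but reuses results keyed by context."""
    if isinstance(t, Leaf):
        return lookup(t, e)
    y = tuple(e[v] for v in t.context)      # instantiation of context(T)
    if y in t.cache:
        return t.cache[y]
    stats["expanded"] += 1                  # count real (uncached) expansions
    free = [v for v in t.cutset if v not in e]
    p = 0.0
    for values in product(*(domains[v] for v in free)):
        ec = {**e, **dict(zip(free, values))}
        p += rc2(t.left, ec, domains, stats) * rc2(t.right, ec, domains, stats)
    t.cache[y] = p
    return p

def build_chain():
    """Hypothetical binary chain A -> B -> C -> D with a right-leaning dtree."""
    step = {(0,): [0.8, 0.2], (1,): [0.3, 0.7]}
    leaves = [Leaf("A", (), {(): [0.6, 0.4]}), Leaf("B", ("A",), step),
              Leaf("C", ("B",), step), Leaf("D", ("C",), step)]
    root = Node(leaves[0], Node(leaves[1], Node(leaves[2], leaves[3])))
    annotate(root)
    return root, {v: [0, 1] for v in "ABCD"}

if __name__ == "__main__":
    root, domains = build_chain()
    stats = {"expanded": 0}
    print(rc2(root, {"D": 1}, domains, stats), stats)
```

In this dtree the deepest internal node is reached once per (A, B) case, four times in total, but its context is only B, so it is expanded twice and the other two calls are cache hits.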

31 Decomposition with Local Structure. X is independent of some variables given particular values of others. [Figure: X and the variables it depends on]

33 Decomposition with Local Structure. X is independent of some variables given particular values of others. No need to consider an exponential number of cases (in the cutset size) given local structure.

34 Caching with Local Structure. [Figure: structural cache for X]

36 Caching with Local Structure. No need to cache an exponential number of results (in the context size) given local structure. [Figure: non-structural cache versus structural cache for X]

37 Determinism. A natural setup to incorporate SAT technology: unit resolution to derive values of variables and to detect/skip inconsistent cases; dependency-directed backtracking; clause learning.

38 CSI Summary. Exploit local structure: context-specific independence, determinism. Significantly speed up inference. Tackle problems with treewidth in the thousands. Acknowledgements: recursive conditioning slides courtesy of Adnan Darwiche. Implementation available:

39 Where are we? Bayesian networks: represent exponentially large probability distributions compactly. Inference in BNs: exact inference is very fast for problems with low treewidth; exploit local structure for fast inference. Now: learning BNs. Given structure, estimate parameters.

40 Thumbtack: Binomial Distribution. P(Heads) = θ, P(Tails) = 1-θ. Flips are i.i.d.: independent events, identically distributed according to a Binomial distribution. Sequence D of α_H Heads and α_T Tails.

41 Maximum Likelihood Estimation. Data: observed set D of α_H Heads and α_T Tails. Hypothesis: Binomial distribution. Learning θ is an optimization problem. What's the objective function? MLE: choose θ that maximizes the probability of the observed data: θ_MLE = argmax_θ P(D | θ) = argmax_θ θ^α_H (1-θ)^α_T
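To make the optimization view concrete, here is a small Python sketch that evaluates the log-likelihood of α_H Heads and α_T Tails on a grid of θ values and picks the best one. The function names, the grid resolution, and the counts used in the example are assumptions for illustration.

```python
import math

def log_likelihood(theta, alpha_h, alpha_t):
    """ln P(D | theta) for alpha_h Heads and alpha_t Tails (i.i.d. flips)."""
    return alpha_h * math.log(theta) + alpha_t * math.log(1.0 - theta)

def grid_mle(alpha_h, alpha_t, steps=1000):
    """Brute-force maximizer over theta in {1/steps, ..., (steps-1)/steps}."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, alpha_h, alpha_t))

if __name__ == "__main__":
    # Hypothetical data: 3 Heads and 2 Tails.
    print(grid_mle(3, 2))
```

Maximizing the log-likelihood gives the same θ as maximizing the likelihood itself, because the logarithm is monotone; the grid search is only a sanity check, since this problem has a closed-form solution.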

42 Your first learning algorithm Set derivative to zero:
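A worked version of this step, written out as a sketch. It assumes the binomial likelihood for α_H Heads and α_T Tails and takes the logarithm first, which does not change the maximizer.

```latex
\begin{align*}
\ln P(D \mid \theta) &= \alpha_H \ln\theta + \alpha_T \ln(1-\theta) \\
\frac{d}{d\theta} \ln P(D \mid \theta)
  &= \frac{\alpha_H}{\theta} - \frac{\alpha_T}{1-\theta} = 0 \\
\alpha_H \,(1-\theta) &= \alpha_T \,\theta \\
\hat{\theta}_{\mathrm{MLE}} &= \frac{\alpha_H}{\alpha_H + \alpha_T}
\end{align*}
```

The estimate is simply the fraction of flips that came up Heads.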

43 MLE for conditional probabilities. MLE estimate of P(X=x) = Count(X=x) / m, where m is the number of data points. MLE estimate of P(X=x | Y=y) = Count(X=x, Y=y) / Count(Y=y): only consider the subset of data where Y=y.
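The "only consider the subset of data where Y=y" rule can be sketched in a few lines of Python. The function name, the representation of the data as (x, y) pairs, and the sample values are assumptions for illustration.

```python
def mle_conditional(data, x, y):
    """MLE of P(X = x | Y = y) from a list of (x, y) pairs."""
    subset = [xi for xi, yi in data if yi == y]     # keep only rows with Y = y
    if not subset:
        raise ValueError("no data points with Y = %r" % (y,))
    return subset.count(x) / len(subset)

if __name__ == "__main__":
    # Hypothetical observations of (X, Y).
    data = [(1, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 1)]
    print(mle_conditional(data, x=1, y=1))
```

When no data point has Y=y the estimate is undefined, which the sketch reports as an error rather than returning a made-up value.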

44 Learning the CPTs. Data: x^(1), ..., x^(m)

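45 MLE learning CPTs for general BN. Vars X_1, ..., X_n and BN structure given. Each i.i.d. data point assigns a value to all vars. Likelihood of the data: P(D | θ) = Π_{j=1..m} Π_{i=1..n} P(x_i^(j) | pa_Xi^(j)). MLE for CPT P(X_i | Pa_Xi): θ_{x_i | pa} = Count(X_i = x_i, Pa_Xi = pa) / Count(Pa_Xi = pa)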
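A minimal Python sketch of count-based CPT learning for a known structure: each entry is the count of (parent assignment, value) divided by the count of the parent assignment. The `learn_cpts` function, the dict-based data format, and the two-node Flu -> Fever network are hypothetical.

```python
from collections import Counter

def learn_cpts(data, parents):
    """MLE CPTs from fully observed data.

    data:    list of dicts mapping variable name -> value
    parents: dict mapping variable name -> tuple of parent names
    Returns  {var: {(parent_values, value): probability}}.
    """
    cpts = {}
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        cpts[var] = {(u, x): n / marginal[u] for (u, x), n in joint.items()}
    return cpts

if __name__ == "__main__":
    # Hypothetical structure and data.
    parents = {"Flu": (), "Fever": ("Flu",)}
    rows = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
    data = [{"Flu": f, "Fever": v} for f, v in rows]
    for var, table in learn_cpts(data, parents).items():
        print(var, table)
```

Combinations that never occur in the data are simply absent from the returned tables; a parent assignment with no data points has no MLE at all.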
