Parameter Estimation: Cracking Incomplete Data



1 Parameter Estimation: Cracking Incomplete Data. Khaled S. Refaat. Collaborators: Arthur Choi and Adnan Darwiche

2 Agenda: Learning Graphical Models; Complete vs. Incomplete Data; Exploiting Data for Decomposition; EDML vs. EM

3 Learning Graphical Models

4 Learning Graphical Model Parameters

[Figure: network over V, X, Y, and Z with its parameters]

V      X      Y      Z
False  True   ?      False
True   False  ?      True
True   True   ?      False

Our goal is to find parameter estimates that maximize the likelihood:
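The formula on this slide did not survive extraction. A standard way to write the maximum-likelihood objective, assuming a dataset D = {d_1, ..., d_N} of independent examples (this notation is an assumption, not taken from the slide), is:

```latex
\theta^\star = \arg\max_{\theta} L(\theta \mid \mathcal{D}),
\qquad
L(\theta \mid \mathcal{D}) = \prod_{i=1}^{N} \Pr\nolimits_{\theta}(d_i)
```

When an example d_i has missing values, Pr_θ(d_i) is a marginal probability that sums over every completion of those missing values.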

5 Complete vs. Incomplete Data

6 Complete Data

V      X      Y      Z
False  True   True   False
True   False  False  True
True   True   True   False

Incomplete Data

V      X      Y      Z
False  True   ?      False
?      False  ?      True
True   True   ?      False

7-9 Complete data: closed-form or a convex optimization problem. Incomplete data: hard non-convex optimization problem.
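For a Bayesian network with complete data, the closed-form solution amounts to counting: each conditional probability is the relative frequency of the child value within its parent configuration. The sketch below illustrates this on the complete dataset from the slide. The edge V -> X and the helper name `estimate_cpt` are assumptions made for illustration only; the slide's network structure is not recoverable from the text.

```python
from collections import Counter

# Complete dataset from the slide; column order is V, X, Y, Z.
COLUMNS = ("V", "X", "Y", "Z")
COMPLETE_DATA = [
    (False, True, True, False),
    (True, False, False, True),
    (True, True, True, False),
]

def estimate_cpt(data, child, parents):
    """Maximum-likelihood CPT for `child` given `parents`, by relative frequency."""
    c = COLUMNS.index(child)
    p = [COLUMNS.index(name) for name in parents]
    # Count each (parent configuration, child value) pair.
    joint = Counter((tuple(row[i] for i in p), row[c]) for row in data)
    # Count each parent configuration on its own.
    parent_totals = Counter(tuple(row[i] for i in p) for row in data)
    return {
        (pa, value): count / parent_totals[pa]
        for (pa, value), count in joint.items()
    }

# Hypothetical edge V -> X, used only to have a parent set to condition on.
cpt_x_given_v = estimate_cpt(COMPLETE_DATA, "X", ["V"])
```

No iteration is needed here, which is the contrast with the incomplete-data case.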

10 Incomplete Data

V      X      Y      Z
False  True   ?      False
?      False  ?      True
True   True   ?      False

11 Incomplete Data: fully observed variables have a value in every example (X and Z in the table on slide 10).

12 Incomplete Data: hidden variables have no value in any example (Y in the table on slide 10).
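The distinction between fully observed and hidden variables can be read directly off the dataset. The following sketch classifies each column of the slide's incomplete dataset, using `None` for a missing value; the function name and the "partially observed" label for the remaining case are assumptions of this example.

```python
# Incomplete dataset from the slide; None marks a missing value ("?").
COLUMNS = ("V", "X", "Y", "Z")
INCOMPLETE_DATA = [
    (False, True, None, False),
    (None, False, None, True),
    (True, True, None, False),
]

def classify_variables(columns, data):
    """Label each variable by how often it is observed across the examples."""
    status = {}
    for index, name in enumerate(columns):
        observed = sum(row[index] is not None for row in data)
        if observed == len(data):
            status[name] = "fully observed"
        elif observed == 0:
            status[name] = "hidden"
        else:
            status[name] = "partially observed"
    return status

status = classify_variables(COLUMNS, INCOMPLETE_DATA)
```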

13 Exploi*ng Data for Decomposi*on

14 Optimization [Figure: an optimization algorithm connected to an inference engine] The optimization algorithm (e.g., EM, EDML, or a gradient method) calls the inference engine with every unique data example at each iteration.
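To make the interaction concrete, here is a minimal EM sketch for a toy two-node network H -> S in which H is sometimes missing. The network, the dataset, the starting parameters, and the function names are all hypothetical. The point it illustrates is that the optimizer makes one inference call per unique example per iteration, weighting each result by how often that example occurs.

```python
from collections import Counter

def posterior_h(example, theta_h, theta_s):
    """Toy inference engine: Pr(H = True | example) in the network H -> S."""
    h, s = example
    if h is not None:                       # H observed: posterior is a point mass
        return 1.0 if h else 0.0
    like_true = theta_s[True] if s else 1.0 - theta_s[True]
    like_false = theta_s[False] if s else 1.0 - theta_s[False]
    joint_true = theta_h * like_true
    joint_false = (1.0 - theta_h) * like_false
    return joint_true / (joint_true + joint_false)

def em(data, iterations=50, theta_h=0.6, theta_s=None):
    """Run EM; theta_s[h] is Pr(S = True | H = h). Returns estimates and call count."""
    theta_s = dict(theta_s or {True: 0.7, False: 0.4})
    unique = Counter(data)                  # unique examples with multiplicities
    calls = 0
    for _ in range(iterations):
        n_h = {True: 0.0, False: 0.0}       # expected counts of H = h
        n_hs = {True: 0.0, False: 0.0}      # expected counts of H = h and S = True
        for example, count in unique.items():
            p = posterior_h(example, theta_h, theta_s)  # one call per unique example
            calls += 1
            _, s = example
            for h, weight in ((True, p), (False, 1.0 - p)):
                n_h[h] += count * weight
                if s:
                    n_hs[h] += count * weight
        theta_h = n_h[True] / (n_h[True] + n_h[False])
        theta_s = {h: n_hs[h] / n_h[h] for h in (True, False)}
    return theta_h, theta_s, calls

# Hypothetical dataset of (H, S) pairs; None marks a missing H.
DATA = ([(True, True)] * 3 + [(False, False)] * 2
        + [(None, True)] * 4 + [(None, False)])
theta_h, theta_s, calls = em(DATA, iterations=50)
```

With 10 examples but only 4 unique ones, each iteration makes 4 inference calls rather than 10.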

15 Inference Decomposition Techniques [Figure: the network over V, X, Y, and Z before and after pruning] Prune edges outgoing from observed nodes before computing probabilities.
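The pruning step can be sketched as a graph operation: drop every edge whose source node is observed, then collect the connected components that remain. The edge list below is an assumption for illustration (the slide's figure did not survive extraction), and `prune_and_split` is a hypothetical helper name.

```python
def prune_and_split(nodes, edges, observed):
    """Delete edges leaving observed nodes, then return the connected components."""
    kept = [(u, v) for u, v in edges if u not in observed]
    neighbors = {n: set() for n in nodes}
    for u, v in kept:
        neighbors[u].add(v)
        neighbors[v].add(u)
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search over the undirected version of the pruned graph.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical structure over the slide's variables: V -> X, X -> Y, X -> Z.
NODES = ["V", "X", "Y", "Z"]
EDGES = [("V", "X"), ("X", "Y"), ("X", "Z")]
components = prune_and_split(NODES, EDGES, observed={"X", "Z"})
```

Under this assumed structure, observing X cuts both of its outgoing edges, so Y and Z end up in components of their own.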

16 Main Idea (NIPS 14) [Figure: three optimization algorithms, each with its own inference engine] We decompose the optimization problem itself to get decomposed convergence and data compression.

17 Learning from Incomplete Data

[Figure: network over V, X, Y, and Z with its parameters]

V      X      Y      Z
False  True   ?      False
True   False  ?      True
True   True   ?      False

18 Decomposing the Optimization Problem [Figure: the network over V, X, Y, and Z split into sub-networks] Get three components:

19 The components of a network partition its parameters into groups:

20 [Figure: the components over V, X, Y, and Z, with boundary variables labeled]

21 [Figure: the components over V, X, Y, and Z]

22 Learned Parameters [Figure: the parameters learned in each component]

23-24 Theorem (NIPS 14): Any stationary points for the sub-problems combine to create a stationary point for the original problem. Every stationary point for the original problem induces stationary points for the sub-problems.

25-27 Experimental Setting
EM: uses an inference engine that decomposes inference.
D-EM: decomposes the optimization problem itself, solves each sub-problem using EM, and combines the solutions.

28 The Computational Benefit of Decomposition [Two plots: speed-up (y-axis) against observed % (x-axis)] Figure: Speed-up of D-EM over EM on chain networks (180, 380, and 500 variables; left) and on tree networks (63, 127, 255, and 511 variables; right).

29 Table: Speed-up of D-EM over EM on UAI networks.

30 Reasons for Speed-up

31 Decomposed Convergence [Plot: # iterations (y-axis, ticks 0 to 4000) against sub-network (x-axis, ticks 0 to 400)] Figure: Number of iterations required by each sub-network, sorted in descending order.


38 Data Compression

39-41 Data Compression

Dataset columns: A B C D E F G H I J K L

A      B      Count
True   True   1000
True   False  5000
False  True   3000
False  False  8000
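The count table suggests how compression works: once a sub-problem only involves a few variables, many examples become identical and can be stored as one row with a count. A minimal sketch, assuming a small hypothetical dataset over three columns and a hypothetical helper `compress`:

```python
from collections import Counter

def compress(data, columns, keep):
    """Project each example onto `keep` and count identical projections."""
    idx = [columns.index(name) for name in keep]
    return Counter(tuple(row[i] for i in idx) for row in data)

# Hypothetical dataset; the slide's actual data is not available.
COLUMNS = ("A", "B", "C")
DATA = [
    (True, True, False),
    (True, True, True),
    (True, False, True),
    (False, False, False),
    (True, True, False),
]
counts = compress(DATA, COLUMNS, keep=("A", "B"))
```

Five examples collapse to three unique rows over A and B, so an algorithm that calls the inference engine once per unique example does less work per iteration.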

42 Data Compression [Plot: speed-up (y-axis) against dataset size (x-axis)] Figure: Speed-up of D-EM over EM as a function of dataset size (log scale).

43 EDML vs. EM

44 Soft Evidence

45 Hard Evidence [Node X] X ∈ {S_1, S_2, S_3}

46 Hard Evidence [Node X] X = S_1

47 Soft Evidence [Node X] X = S_1 with some probability

48 Soft Evidence [Figure: nodes X and η] η = true

49 Soft Evidence [Figure: nodes X and η] η = true

η     X    p(η | X)
true  S_1  λ_1
true  S_2  λ_2
true  S_3  λ_3
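One way to read this table: observing η = true reweights the distribution over X by p(η = true | X) and renormalizes. The sketch below shows that update. The prior and the strength values are hypothetical numbers chosen for illustration, and `apply_soft_evidence` is an assumed helper name.

```python
def apply_soft_evidence(prior, strength):
    """Posterior Pr(X | eta = true), where strength[x] = Pr(eta = true | X = x)."""
    unnormalized = {x: prior[x] * strength[x] for x in prior}
    total = sum(unnormalized.values())
    return {x: value / total for x, value in unnormalized.items()}

prior = {"S1": 0.5, "S2": 0.3, "S3": 0.2}      # hypothetical Pr(X)
strength = {"S1": 0.8, "S2": 0.1, "S3": 0.1}   # hypothetical Pr(eta = true | X)
posterior = apply_soft_evidence(prior, strength)

# Hard evidence X = S1 is the special case where only S1 has nonzero strength.
hard = apply_soft_evidence(prior, {"S1": 1.0, "S2": 0.0, "S3": 0.0})
```

Only the ratios between the strength values matter, since the normalization cancels any common factor.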

50 Edge Deletion

51 Edge Deletion (cont.)

52 Choi et al. 2006: Assert Soft Evidence

53 Problem Definition

Original Bayesian Network: [Figure: network over H, S, and E]

H     S     E
?     true  ?
true  ?     ?
?     ?     true

54 Meta Network Creation [Figure: one copy of the network per example, with nodes H_1, S_1, E_1 (Example 1), H_2, S_2, E_2 (Example 2), and H_3, S_3, E_3 (Example 3)]

55-56 Meta Network Creation (cont.) [Figure: the same meta network with parameter nodes added for H, for S given H, and for E given H] Prior knowledge on parameters

57 Assert Data as Evidence [Figure: the meta network with the dataset asserted as evidence]

58 [Figure: the meta network shown next to the dataset]

H     S     E
?     true  ?
true  ?     ?
?     ?     true

59 EDML (Delete Edges) [Figure: the meta network with edges deleted]

60-61 EDML (Learning from Soft Evidence) [Figure: the parameter node for H with children H_1, H_2, and H_3, which receive soft evidence from Example 1, Example 2, and Example 3 respectively]

62 EDML (Learning from Soft Evidence) Maximizing the posterior probability is a convex optimization problem (UAI 11, UAI 12).

64 EDML Fixed Points (UAI 12) Theorem: EDML fixed points are precisely the EM fixed points.

65 Convergence (UAI 11) Theorem: When only leaves have missing values, EDML converges in one iteration, whereas EM may not. [Figure: nodes H_2, S_2, and E_2]

66 Experiment: EM vs. EDML (iterations)

Category     % EDML better  % EM better  EDML speed-up %  EM speed-up %
Hiding 10%   93.82%         6.18%        84.59%           87.13%
Hiding 25%   90.95%         9.05%        83.83%           75.70%
Hiding 35%   82.24%         17.76%       86.26%           75.09%
Hiding 50%   77.61%         22.39%       87.80%           80.21%
Hiding 70%   75.65%         24.35%       84.48%           74.21%
Average      83.05%         16.95%       85.41%           76.96%


68 Andes (Hiding 25% of the nodes)

69 EDML Generalization (NIPS 13) We generalized EDML as a parallel coordinate descent algorithm. This helps derive new EDML algorithms for other graphical models.

70 EDML for Learning MRFs from Complete Data (NIPS 13)

71-73 Conclusion Learning from incomplete data can be difficult. Good news: patterns of incompleteness may be exploited. EDML becomes more exact as the data becomes more complete.

74 Thanks!
