CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings

1 CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1

2 Motivation CS8803: STR Hilbert Space Embeddings 2

3 Overview
Multinomial Distributions: Marginal, Joint, Conditional; Sum, Product, Bayes rules
Hilbert Space Embeddings: Marginal, Joint, Conditional; Sum, Product, Bayes rules
Gram/Kernel Matrices

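4 Multinomial Distributions
Marginal probabilities: $P[Y] \leftrightarrow \mu_Y$, a vector with one entry per value of $Y$
Joint probabilities: $P[Y, X] \leftrightarrow \mathcal{C}_{YX}$, a matrix with rows indexed by $Y$ and columns by $X$
Conditional probabilities: $P[Y \mid X] \leftrightarrow \mathcal{C}_{Y \mid X}$, a matrix whose column for $X = x$ holds $P[Y \mid X = x]$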

5 Sum Rule
$P[Y] = \sum_X P[Y, X]$
$\mu_Y = \mathcal{C}_{YX} \mathbf{1}$

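6 Product Rule
$P[Y, X] = P[Y \mid X]\, P[X]$
$\mathcal{C}_{Y \mid X} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \iff \mathcal{C}_{YX} = \mathcal{C}_{Y \mid X} \mathcal{C}_{XX}$
Here $\mathcal{C}_{XX}$ is the diagonal matrix with $P[X]$ on its diagonal.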

7 Sum Rule (Revisited)
$P[Y] = \sum_X P[Y, X] = \sum_X P[Y \mid X]\, P[X]$
$\mu_Y = \mathcal{C}_{YX} \mathbf{1} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \mu_X$

8 Conditioning
$P[Y \mid X = x] = P[Y \mid X]\, \delta(X = x)$
$\mu_{Y \mid x} = \mathcal{C}_{Y \mid X}\, \mu_x = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \mu_x$
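The discrete sum rule, product rule, and conditioning steps on slides 5 to 8 are plain matrix operations. The following is a minimal NumPy sketch under stated assumptions: the joint table values are made up for illustration, and the variable names (`C_YX`, `C_Y_given_X`, `mu_x`) are our own labels for the joint table, the conditional table, and the indicator vector for the observed value of X.

```python
import numpy as np

# Hypothetical joint table P[Y, X]: rows index Y (3 values), columns index X (2 values).
C_YX = np.array([[0.10, 0.20],
                 [0.30, 0.05],
                 [0.10, 0.25]])

# Sum rule: multiplying by a vector of ones marginalizes out X.
mu_Y = C_YX @ np.ones(C_YX.shape[1])

# Marginal of X and its diagonal matrix (plays the role of C_XX in the discrete case).
mu_X = C_YX.T @ np.ones(C_YX.shape[0])
C_XX = np.diag(mu_X)

# Product rule rearranged: conditional table = joint table times inverse marginal.
C_Y_given_X = C_YX @ np.linalg.inv(C_XX)

# Sum rule revisited: conditional table applied to the marginal of X.
mu_Y_again = C_Y_given_X @ mu_X

# Conditioning: an indicator vector selects the column for the observed X (second value here).
mu_x = np.array([0.0, 1.0])
mu_Y_given_x = C_Y_given_X @ mu_x

print(mu_Y, mu_Y_again, mu_Y_given_x)
```

Each column of `C_Y_given_X` sums to one, which is a quick sanity check that it is a valid conditional probability table.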

9 Bayes Rule etc.
$P[X \mid Y] = \dfrac{P[Y \mid X]\, P[X]}{P[Y]}$
$\mathcal{C}_{X \mid Y} = (\mathcal{C}_{Y \mid X} \mathcal{C}_{XX})^\top \mathcal{C}_{YY}^{-1} = \mathcal{C}_{XY} \mathcal{C}_{YY}^{-1}$

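10 Bayes Rule etc.
$P[X \mid Y] = \dfrac{P[Y \mid X]\, P[X]}{P[Y]}$
$\mathcal{C}_{X \mid Y} = (\mathcal{C}_{Y \mid X} \mathcal{C}_{XX})^\top \mathcal{C}_{YY}^{-1} = \mathcal{C}_{XY} \mathcal{C}_{YY}^{-1}$
$P[X \mid Y = y] = \dfrac{P[Y = y \mid X]\, P[X]}{P[Y = y]}$
$\mu_{X \mid y} = (\mathcal{C}_{Y \mid X} \mathcal{C}_{XX})^\top \mathcal{C}_{YY}^{-1} \mu_y = \mathcal{C}_{XY} \mathcal{C}_{YY}^{-1} \mu_y$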
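The discrete Bayes rule above can be checked numerically. This is a small sketch, assuming a made-up joint table and our own variable names; it builds the posterior table from the conditional table and the prior, then reads off the posterior for one observed value of Y.

```python
import numpy as np

# Hypothetical joint table P[Y, X]: rows index Y, columns index X.
C_YX = np.array([[0.10, 0.20],
                 [0.30, 0.05],
                 [0.10, 0.25]])

mu_X = C_YX.sum(axis=0)          # prior P[X]
mu_Y = C_YX.sum(axis=1)          # evidence P[Y]
C_XX = np.diag(mu_X)
C_YY = np.diag(mu_Y)

# Likelihood table P[Y | X].
C_Y_given_X = C_YX @ np.linalg.inv(C_XX)

# Bayes rule: (likelihood * prior)^T, normalized by the evidence.
C_X_given_Y = (C_Y_given_X @ C_XX).T @ np.linalg.inv(C_YY)

# Posterior over X after observing the first value of Y.
mu_y = np.array([1.0, 0.0, 0.0])
mu_X_given_y = C_X_given_Y @ mu_y

print(mu_X_given_y)
```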

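11 Bayes Rule etc.
$P[X \mid Y = y, Z = z] = \dfrac{P[X, Y = y \mid Z]\, \delta(Z = z)}{P[Y = y \mid Z]\, \delta(Z = z)}$
$\mathcal{C}_{XY \mid z} = \mathcal{C}_{(XY)Z}\, \mathcal{C}_{ZZ}^{-1} \mu_z$
$\mathcal{C}_{YY \mid z} = \mathcal{C}_{(YY)Z}\, \mathcal{C}_{ZZ}^{-1} \mu_z$
$\mu_{X \mid y, z} = \mathcal{C}_{XY \mid z}\, \mathcal{C}_{YY \mid z}^{-1} \mu_y$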

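12 Learning
$\mathcal{C}_{XYZ} \approx \hat{\mathcal{C}}_{XYZ} = \frac{1}{N} \sum_{i=1}^{N} x_i \otimes y_i \otimes z_i$
$\mathcal{C}_{YX} \approx \hat{\mathcal{C}}_{YX} = \frac{1}{N} \sum_{i=1}^{N} y_i x_i^\top$
$\mu_X \approx \hat{\mu}_X = \frac{1}{N} \sum_{i=1}^{N} x_i$
where $x_i$, $y_i$, $z_i$ are indicator vectors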
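Learning the discrete tables amounts to averaging outer products of indicator (one-hot) vectors. The sketch below assumes a tiny hand-written sample of six observations; the index arrays and variable names are hypothetical, and the three-way table is stored with axes ordered (X, Y, Z).

```python
import numpy as np

# Hypothetical observations, given as category indices.
x_idx = np.array([0, 1, 1, 0, 1, 1])   # X takes 2 values
y_idx = np.array([2, 0, 1, 2, 0, 0])   # Y takes 3 values
z_idx = np.array([0, 0, 1, 1, 0, 1])   # Z takes 2 values
N = len(x_idx)

# Indicator vectors stacked as rows: one one-hot row per sample.
X = np.eye(2)[x_idx]
Y = np.eye(3)[y_idx]
Z = np.eye(2)[z_idx]

# Empirical marginal: average of the indicator vectors.
mu_X_hat = X.mean(axis=0)

# Empirical joint table: average of outer products y_i x_i^T.
C_YX_hat = Y.T @ X / N

# Empirical three-way table: average of x_i (outer) y_i (outer) z_i.
C_XYZ_hat = np.einsum("ni,nj,nk->ijk", X, Y, Z) / N

print(mu_X_hat)
print(C_YX_hat)
```

Summing the three-way table over its Z axis gives back the two-way joint table (transposed, because of the axis order chosen here).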

13 Generalization
How do we make a conditional probability table out of this?
How do we learn parameters? (What are the parameters?)
How do we perform inference?

14 Could Discretize the Distribution
Loses information; hard to learn for high cardinality

15 Key Idea: Sufficient Statistics
$P[Y] \to \mu_Y = E[Y]$
Problem: lots of distributions have the same mean.
$P[Y] \to \mu_Y = \begin{pmatrix} E[Y] \\ E[Y^2] \end{pmatrix}$
Better, but lots of distributions still have the same mean and variance.
$P[Y] \to \mu_Y = \begin{pmatrix} E[Y] \\ E[Y^2] \\ E[Y^3] \end{pmatrix}$
Even better, but lots of distributions still share the same first three moments.

16 Key Idea: Sufficient Statistics
$P[Y] \to \mu_Y = \begin{pmatrix} E[Y] \\ E[Y^2] \\ E[Y^3] \\ \vdots \end{pmatrix}$
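To see why a finite list of moments is not enough, here is a short sketch with two hand-picked discrete distributions (our own illustrative choice, not taken from the slides). They agree on the first three moments and differ only in the fourth.

```python
import numpy as np

def moments(values, probs, k_max):
    """Return the vector (E[Y], E[Y^2], ..., E[Y^k_max]) of a discrete distribution."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return np.array([np.sum(probs * values**k) for k in range(1, k_max + 1)])

# Distribution A: Y is -1 or +1 with equal probability.
moments_a = moments([-1.0, 1.0], [0.5, 0.5], k_max=4)

# Distribution B: Y is -sqrt(2), 0, or +sqrt(2) with probabilities 1/4, 1/2, 1/4.
moments_b = moments([-np.sqrt(2), 0.0, np.sqrt(2)], [0.25, 0.5, 0.25], k_max=4)

print(moments_a)  # first three moments match those of B
print(moments_b)  # fourth moment differs
```

Adding more moments separates more pairs of distributions, which motivates letting the feature vector grow without bound.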

17 Overview
Multinomial Distributions: Marginal, Joint, Conditional; Sum, Product, Bayes rules
Hilbert Space Embeddings: Marginal, Joint, Conditional; Sum, Product, Bayes rules
Gram/Kernel Matrices

18 David Hilbert

19 Representation
Marginal distributions: $P[Y]$
Joint distributions: $P[Y, X]$
Conditional distributions: $P[Y \mid X]$
Use kernel representations for distributions

20 Embedding Distributions
Summary statistics for distributions $P[Y]$:
$E[Y]$: mean
$E[Y Y^\top]$: covariance
$E[\delta_{y_0}(Y)]$: probability $P[y_0]$
$E[\phi(Y)]$: expected features
Pick a kernel $k(y, y') = \langle \phi(y), \phi(y') \rangle$, and generate a different statistic

21 Embedding Marginal Distributions
$P[Y]$
$\phi(Y) = k(y, \cdot) \in \mathcal{F}$ (RKHS)
$\mu_Y = E[\phi(Y)]$
$\hat{\mu}_Y = \frac{1}{T} \sum_{i=1}^{T} \phi(y_i)$

22 Embedding Marginal Distributions
$P[Y]$
$\phi(Y) = k(y, \cdot) \in \mathcal{F}$ (RKHS)
$\mu_Y = E[\phi(Y)]$
$\hat{\mu}_Y = \frac{1}{T} \sum_{i=1}^{T} \phi(y_i)$
One-to-one mapping from $P[Y]$ to $\mu_Y$ for certain kernels (e.g. Gaussian, Laplacian RBF kernels)
Recover discrete probability with delta kernel
Sample average converges to true mean at $O_p(m^{-1/2})$ for $m$ samples (here $m = T$)
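The empirical mean embedding is never formed explicitly; it is evaluated at a query point through the kernel, since the inner product of the sample-average embedding with the feature of a query point y' equals the average of k(y_i, y'). The sketch below assumes one-dimensional data and two hypothetical helper kernels (`delta_kernel`, `gaussian_kernel`) written for this example. With the delta kernel, the evaluation returns the empirical probabilities, as the slide states.

```python
import numpy as np

def delta_kernel(a, b):
    """k(a, b) = 1 if a == b else 0, evaluated for all pairs."""
    a = np.asarray(a)
    b = np.asarray(b)
    return (a[:, None] == b[None, :]).astype(float)

def gaussian_kernel(a, b, bandwidth=1.0):
    """k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)) for scalar inputs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth**2))

def mean_embedding_at(samples, queries, kernel):
    """Evaluate the empirical mean embedding at each query: (1/T) * sum_i k(y_i, q)."""
    return kernel(samples, queries).mean(axis=0)

# Hypothetical sample of a discrete variable with values 0, 1, 2.
samples = np.array([0, 1, 1, 2, 1, 0, 2, 1])

# Delta kernel: recovers the empirical probability of each value.
probs = mean_embedding_at(samples, [0, 1, 2], delta_kernel)

# Gaussian kernel: a smoothed statistic of the same sample, evaluated at y' = 1.
smooth = mean_embedding_at(samples, [1.0], gaussian_kernel)

print(probs, smooth)
```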

23 Embedding Joint Distributions
Embedding joint distributions $P[Y, X]$ using the outer-product feature map $\phi(Y) \varphi(X)^\top$
$\mu_{YX} = E[\phi(Y) \varphi(X)^\top]$
$\hat{\mu}_{YX} = \frac{1}{m} \sum_{i=1}^{m} \phi(y_i) \varphi(x_i)^\top$
$\mu_{YX}$ is also the covariance operator $C_{YX}$
Recover discrete probabilities with delta kernels
Empirical estimate converges at $O_p(m^{-1/2})$

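24 Embedding Conditional Distributions
[Figure: conditional densities $P[Y \mid x_1]$ and $P[Y \mid x_2]$ over $Y$ at two values $x_1$, $x_2$ of $X$, with their embeddings $\mu_{Y \mid x_1}$ and $\mu_{Y \mid x_2}$]
$\phi(Y) = l(y, \cdot) \in \mathcal{G}$ (RKHS)
$E[\phi(Y) \mid x]$
For each value $X = x$, return the summary statistic for $P[Y \mid X = x]$
Some $X = x$ are never observed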

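25 Embedding Conditional Distributions
[Figure: conditional densities $P[Y \mid x_1]$ and $P[Y \mid x_2]$ with embeddings $\mu_{Y \mid x_1}$ and $\mu_{Y \mid x_2}$, obtained from the features $\varphi(x_1)$ and $\varphi(x_2)$]
$\phi(Y) = l(y, \cdot) \in \mathcal{G}$ (RKHS)
$\varphi(X) = k(x, \cdot) \in \mathcal{F}$ (RKHS)
$\mu_{Y \mid x} = E[\phi(Y) \mid x] = U_{Y \mid X}\, \varphi(x)$
$U_{Y \mid X}$: conditional embedding operator
Avoids data partitioning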

26 Embedding Conditional Distributions
Estimation via covariance operators:
$U_{Y \mid X} := C_{YX} C_{XX}^{-1}$
$\hat{U}_{Y \mid X} = \Phi (K + \lambda I)^{-1} \Upsilon^\top$
$\Phi := (\phi(y_1), \ldots, \phi(y_m))$, $L = \Phi^\top \Phi$
$\Upsilon := (\varphi(x_1), \ldots, \varphi(x_m))$, $K = \Upsilon^\top \Upsilon$
Gaussian: covariance matrices
Discrete: joint probability matrix divided by marginal
Empirical estimate converges at $O_p((\lambda m)^{-1/2} + \lambda^{1/2})$
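In practice the conditional embedding estimate reduces to a weight vector over the training samples: solve a regularized linear system in the Gram matrix of the X samples, then combine kernel evaluations on the Y samples with those weights. The sketch below assumes delta kernels, a tiny hand-written data set, and a small regularizer `lam`; the function names are hypothetical. In this discrete setting the result should approximate the empirical conditional probability table, matching the "joint divided by marginal" remark on the slide.

```python
import numpy as np

def delta_kernel(a, b):
    """k(a, b) = 1 if a == b else 0, evaluated for all pairs."""
    a = np.asarray(a)
    b = np.asarray(b)
    return (a[:, None] == b[None, :]).astype(float)

def conditional_weights(x_train, x_query, kernel, lam):
    """Weights beta = (K + lam * I)^{-1} k_x, so that the conditional embedding
    at x_query is sum_i beta_i * phi(y_i)."""
    K = kernel(x_train, x_train)                 # Gram matrix on X samples
    k_x = kernel(x_train, [x_query])[:, 0]       # kernel between samples and query
    return np.linalg.solve(K + lam * np.eye(len(x_train)), k_x)

# Hypothetical paired observations (category indices).
x_train = np.array([0, 1, 1, 0, 1, 1])
y_train = np.array([2, 0, 1, 2, 0, 0])

# Weights for conditioning on X = 1.
beta = conditional_weights(x_train, 1, delta_kernel, lam=1e-6)

# Evaluate the conditional embedding at each value of Y via the Y kernel.
L_query = delta_kernel(y_train, [0, 1, 2])       # shape (m, 3)
p_y_given_x1 = beta @ L_query

print(p_y_given_x1)
```

With a Gaussian kernel in place of `delta_kernel`, the same weights give a smoothed estimate that also covers values of x never observed in training.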

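27 Direct Correspondence
Discrete:
$\mathcal{C}_{YX} \approx \hat{\mathcal{C}}_{YX} = \frac{1}{N} \sum_{i=1}^{N} y_i x_i^\top$
$\mu_X \approx \hat{\mu}_X = \frac{1}{N} \sum_{i=1}^{N} x_i$
HSE:
$C_{YX} \approx \hat{C}_{YX} = \frac{1}{N} \sum_{i=1}^{N} \phi(y_i) \varphi(x_i)^\top$
$\mu_X \approx \hat{\mu}_X = \frac{1}{N} \sum_{i=1}^{N} \varphi(x_i)$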

28 Key Rules for Inference
Sum rule: $P[Y] = \int_X P[Y \mid X]\, P[X]$
Product rule: $P[Y, X] = P[Y \mid X]\, P[X]$
Bayes rule: $P[X \mid Y] = \dfrac{P[Y \mid X]\, P[X]}{\int_X P[Y \mid X]\, P[X]}$
Do probabilistic inference in feature space

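29 Product Rule
$P[Y, X] = P[Y \mid X]\, P[X]$
Discrete: $\mathcal{C}_{Y \mid X} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \iff \mathcal{C}_{YX} = \mathcal{C}_{Y \mid X} \mathcal{C}_{XX}$
HSE: $C_{Y \mid X} = C_{YX} C_{XX}^{-1} \iff C_{YX} = C_{Y \mid X} C_{XX}$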

30 Sum Rule
$P[Y] = \sum_X P[Y, X] = \sum_X P[Y \mid X]\, P[X]$
Discrete: $\mu_Y = \mathcal{C}_{YX} \mathbf{1} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \mu_X$
HSE: $\mu_Y = C_{YX} C_{XX}^{-1} \mu_X$

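31 Bayes Rule
$P[X \mid Y] = \dfrac{P[Y \mid X]\, P[X]}{P[Y]}$
Discrete: $\mathcal{C}_{X \mid Y} = (\mathcal{C}_{Y \mid X} \mathcal{C}_{XX})^\top \mathcal{C}_{YY}^{-1} = \mathcal{C}_{XY} \mathcal{C}_{YY}^{-1}$
HSE: $C_{X \mid Y} = (C_{Y \mid X} C_{XX})^\top C_{YY}^{-1} = C_{XY} C_{YY}^{-1}$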

32 Overview
Multinomial Distributions: Marginal, Joint, Conditional; Sum, Product, Bayes rules
Hilbert Space Embeddings: Marginal, Joint, Conditional; Sum, Product, Bayes rules
Gram/Kernel Matrices

33 Jørgen Gram

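34 Gram/Kernel Matrices
$\hat{C}_{YX} = \frac{1}{N} \sum_{i=1}^{N} \phi(y_i) \varphi(x_i)^\top = \frac{1}{N} \Phi_Y \Phi_X^\top \in \mathbb{R}^{\infty \times \infty}$
$\hat{C}_{XX} = \frac{1}{N} \sum_{i=1}^{N} \varphi(x_i) \varphi(x_i)^\top = \frac{1}{N} \Phi_X \Phi_X^\top \in \mathbb{R}^{\infty \times \infty}$
$\mu_x = \varphi(x) \in \mathbb{R}^{\infty \times 1}$
Would like to calculate: $\mu_{Y \mid x} = \hat{C}_{YX} \hat{C}_{XX}^{-1} \mu_x$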

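35 Gram/Kernel Matrices
$\mu_{Y \mid x} = \hat{C}_{YX} \hat{C}_{XX}^{-1} \mu_x$
With regularization $\lambda$ and the $\frac{1}{N}$ factors cancelled:
$\hat{\mu}_{Y \mid x} = \Phi_Y \Phi_X^\top (\Phi_X \Phi_X^\top + \lambda N I)^{-1} \varphi(x)$
By the (Woodbury) matrix inversion lemma:
$\hat{\mu}_{Y \mid x} = \Phi_Y (\Phi_X^\top \Phi_X + \lambda N I)^{-1} \Phi_X^\top \varphi(x)$
For a training point $x = x_i$:
$\hat{\mu}_{Y \mid x_i} = \Phi_Y (G_{XX} + \lambda N I)^{-1} G_{XX}(:, i)$
where $G_{XX} = \Phi_X^\top \Phi_X \in \mathbb{R}^{N \times N}$ and $G_{XX}(:, i) = \Phi_X^\top \varphi(x_i) \in \mathbb{R}^{N \times 1}$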
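The matrix inversion lemma step can be checked numerically when the features are finite-dimensional. The sketch below is an illustration only: it assumes random 5-dimensional features for X and 3-dimensional features for Y (stand-ins for the infinite-dimensional RKHS features), with hypothetical names `Phi_X`, `Phi_Y`, and `phi_x`, and confirms that the feature-space form and the Gram-matrix form give the same vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, N, lam = 5, 3, 8, 0.1      # feature dims, sample count, regularizer (all arbitrary)

Phi_X = rng.normal(size=(d, N))  # columns are features of the X samples
Phi_Y = rng.normal(size=(p, N))  # columns are features of the Y samples
phi_x = rng.normal(size=d)       # feature vector of the query point x

# Feature-space (primal) form: inverts a d x d matrix.
primal = Phi_Y @ Phi_X.T @ np.linalg.solve(
    Phi_X @ Phi_X.T + lam * N * np.eye(d), phi_x
)

# Gram-matrix (dual) form: inverts an N x N matrix and touches features only
# through inner products, which is what makes kernels usable.
G_XX = Phi_X.T @ Phi_X
dual = Phi_Y @ np.linalg.solve(G_XX + lam * N * np.eye(N), Phi_X.T @ phi_x)

print(np.max(np.abs(primal - dual)))
```

The dual form is the one that survives when the feature dimension is infinite, because `G_XX` and `Phi_X.T @ phi_x` can be filled in with kernel evaluations.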

36 Hilbert Space Embeddings of Distributions
An alternative to (for example) exponential families and Parzen windows (KDE)
Represent arbitrary distributions in feature spaces; reason using Hilbert space sum, product, and Bayes rules
Linear algebra for learning and inference
Can extend state space models non-parametrically to domains defined by kernels
