Achievable rates for pattern recognition

1 Achievable rates for pattern recognition. M. Brandon Westover, Joseph A. O'Sullivan. Washington University in Saint Louis, Departments of Physics and Electrical Engineering

2 Goals of Information Theory: Models, Bounds, Design?

3 The Pattern Recognition Problem. Model 1: The Naïve View

4 [Diagram: recognition environment. Training set X_1, X_2, ..., X_{M_c}; a selection process picks one pattern; the test pattern is x_h.]

5 [Diagram: recognition environment. Training set X_1, X_2, ..., X_{M_c}; a selection process picks one pattern; the test pattern is x_h. Recognition system: a memory holding X_1, X_2, ..., X_{M_c} and a recognition module g that outputs ĥ.] Objective: g(x_h) = h

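6 [Diagram: patterns X_1, X_2, ..., X_{M_c}; one is selected, giving x_h; the stored patterns X_1, X_2, ..., X_{M_c} and x_h feed g, which outputs ĥ.] Objective: g(x_h) = h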

7 Mumford, 2002

8 Mumford, 2002

9 "Easy things are hard" - M. Minsky. What makes recognition problems hard? Problem-intrinsic challenges: data ambiguity, data complexity

10 Mumford, 95

11 The infinity of signature variation Kersten, 1998

13 Yuille and Kersten, 03

14 The Pattern Recognition Problem. Model 2: The Probabilistic View

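15 [Diagram: patterns X_1, X_2, ..., X_{M_c}; one is selected, giving x_h; the stored patterns X_1, X_2, ..., X_{M_c} and x_h feed g, which outputs ĥ.] Objective: g(x_h) = h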

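16 [Diagram: data model: patterns X_1, X_2, ..., X_{M_c} drawn from p(x); one is selected with p(h), giving x_h. Imaging model: p(y|x) produces the observation y. The stored patterns X_1, X_2, ..., X_{M_c} and y feed g, which outputs ĥ.] Objective: Pr{g(x_h) = h} > 1 - ε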
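
The probabilistic model on this slide can be tried out numerically. The sketch below is a toy instance, not the authors' setup: it assumes binary patterns with p(x) Bernoulli(1/2), a uniform p(h), a binary symmetric channel as the imaging model p(y|x), and a minimum-Hamming-distance rule for g. The function name `simulate_recognition` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_recognition(num_patterns=16, n=64, flip_prob=0.1, trials=2000, seed=0):
    """Estimate Pr{g(y) = h} for a toy binary recognition problem."""
    rng = np.random.default_rng(seed)
    # Data model (assumed): num_patterns i.i.d. patterns, each bit Bernoulli(1/2).
    patterns = rng.integers(0, 2, size=(num_patterns, n))
    correct = 0
    for _ in range(trials):
        # Selection process (assumed): p(h) uniform over the stored patterns.
        h = int(rng.integers(num_patterns))
        # Imaging model (assumed): each bit is flipped independently with flip_prob.
        noise = (rng.random(n) < flip_prob).astype(patterns.dtype)
        y = patterns[h] ^ noise
        # Recognizer g (assumed): pick the stored pattern closest to y in Hamming distance.
        distances = np.count_nonzero(patterns != y, axis=1)
        h_hat = int(np.argmin(distances))
        correct += int(h_hat == h)
    return correct / trials

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3, 0.5):
        print(p, simulate_recognition(flip_prob=p))
```

With few patterns and long blocks the estimate stays close to 1 for mild noise and drops toward chance, 1/num_patterns, as flip_prob approaches 0.5.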

18 "Easy things are hard" - M. Minsky. What makes pattern recognition hard? Problem-intrinsic challenges: data ambiguity, data complexity. Problem-solver-intrinsic challenges: faulty components, data storage limitations, data processing limitations

19 The simplest recognition circuit. Barlow, 1959

20 Redundancy; Absolute; Perceptual; Data Compression; Data volume ~10^8 bits/sec; Data complexity; Data cost ~10^9 ATP/bit; Cortical energy budget ~10^20 ATP/sec; Storage capacity; Accessibility; Simplicity; Stability; Modeling; Prediction; Inference

21 The Pattern Recognition Problem. Model 3: Information Theoretic View

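22 [Diagram: data model: patterns X_1, X_2, ..., X_{M_c} drawn from p(x); one is selected with p(h), giving x_h. Imaging model: p(y|x) produces the observation y. The stored patterns X_1, X_2, ..., X_{M_c} and y feed g, which outputs ĥ.] Objective: Pr{g(x_h) = h} > 1 - ε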

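23 [Diagram: patterns X_1, X_2, ..., X_{M_c} drawn from p(x); one is selected with p(h), giving x_h; p(y|x) produces y. The memory encoder f maps the patterns to the memory representation U_1, U_2, ..., U_{M_x}; the sensory encoder φ maps y to the sensory representation V; g outputs ĥ.] Objective: Pr{g(V) = h} > 1 - ε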

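24 [Diagram: patterns X_1, X_2, ..., X_{M_c} drawn from p(x); one is selected with p(h), giving x_h; p(y|x) produces y. The memory encoder f, at rate R_x, maps the patterns to the memory representation U_1, U_2, ..., U_{M_x}; the sensory encoder φ, at rate R_y, maps y to the sensory representation V; g outputs ĥ.] Objective: Pr{g(V) = h} > 1 - ε, s.t. R = (R_c, R_x, R_y)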

25 Goals of Information Theory: Models, Bounds, Design?

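26 Problem statement: determine the admissible rates R_x, R_y, and R_c for reliable recognition. [Diagram: patterns X_1, X_2, ..., X_{M_c} drawn from p(x); one is selected with p(h), giving x_h; p(y|x) produces y. The memory encoder f, at rate R_x, produces U_1, U_2, ..., U_{M_x}; the sensory encoder φ, at rate R_y, produces v; g outputs ĥ.]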
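
The three rates in the problem statement can be read as exponents of set sizes. The block below is a sketch under the usual block-coding convention, which the slides do not spell out: the block length n, the name R_c for the pattern rate (written Rc on later slides), and the identification of M_c, M_x, M_y with powers of two are all assumptions.

```latex
\begin{align*}
M_c &= 2^{nR_c}
  && \text{number of patterns to be distinguished} \\
f &: \mathcal{X}^n \to \{1, \dots, M_x\}, \quad M_x = 2^{nR_x}
  && \text{memory encoder} \\
\phi &: \mathcal{Y}^n \to \{1, \dots, M_y\}, \quad M_y = 2^{nR_y}
  && \text{sensory encoder} \\
\Pr\{\hat{h} = h\} &> 1 - \epsilon
  && \text{reliability requirement}
\end{align*}
```

Under this reading, a rate triple is admissible when codes meeting the reliability requirement exist for every ε > 0 once n is large enough.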

27 Pattern Recognition Codes: Definitions. [Diagram: patterns x_1, x_2, x_3, ..., x_{M_c} drawn from p(x); select pattern with p(h), giving x_h; p(y|x) produces y. Blocks f, φ, g_1, g_2, and γ; labels u(i), v(i), M_x, M_y.]

28 Achievable rates

29 Characterizing

30 Characterizing

31 Outer bound: proof strategy. Given (f, φ, g), construct p**(x,y,u,v). [Diagram: within the set of all p(x,y,u,v), the region R and the distributions satisfying U-X-Y, X-Y-V, etc.]

32 Characterizing

33 Inner bound: proof strategy. Given p*(x,y,u,v), construct (f, φ, g). [Diagram: within the set of all p(x,y,u,v), the region R and the distributions satisfying U-X-Y-V, etc.]

34 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V) - I(U;V|X,Y). [Plot: horizontal axis I(X;U), from 0 to H(X); vertical axis I(Y;V), from 0 to H(Y).]

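35 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V) - I(U;V|X,Y). On the border, U-X-Y-V, so R* = R** = R. [Plot: horizontal axis I(X;U), from 0 to H(X), with U=X at the far end; vertical axis I(Y;V), from 0 to H(Y), with V=Y at the far end.]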

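36 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V). [Plot: horizontal axis I(X;U), from 0 to H(X), with U=X at the far end; vertical axis I(Y;V), from 0 to H(Y), with V=Y at the far end.]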
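
The two forms of the bound on the pattern rate (written R_c here, Rc on later slides) can be set side by side. This is one way to read the slides rather than the authors' proof: the extra term in the outer bound vanishes whenever the long Markov chain U-X-Y-V holds, which is why the two bounds agree on the border.

```latex
\begin{align*}
\text{Outer bound } (U - X - Y,\ X - Y - V):\quad
  & R_c < I(U;V) - I(U;V \mid X,Y) \\
\text{Inner bound } (U - X - Y - V):\quad
  & R_c < I(U;V) \\
U - X - Y - V \;\Rightarrow\; p(v \mid u,x,y) = p(v \mid y) \;\Rightarrow\;
  & I(U;V \mid X,Y) = H(V \mid X,Y) - H(V \mid U,X,Y) = 0
\end{align*}
```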

37 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V). Unlimited U, V capacity: U = X, V = Y, so R_c < I(X;Y). Channel coding! [Plot: horizontal axis I(X;U), from 0 to H(X); vertical axis I(Y;V), from 0 to H(Y); the corner U=X, V=Y is marked =I(X;Y).]
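
In the unlimited-capacity corner the bound is the mutual information I(X;Y) of the imaging model, as in channel coding. A minimal sketch for computing it from a joint pmf follows; the binary symmetric channel and the helper names `mutual_information`, `bsc_joint`, and `binary_entropy` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as a 2-D array joint[x, y]."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (|X|, 1)
    p_y = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, |Y|)
    product = p_x @ p_y                      # p(x) p(y) on the same grid
    mask = joint > 0                         # 0 log 0 is taken as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

def bsc_joint(flip_prob, p_one=0.5):
    """Joint pmf of (X, Y) for a binary symmetric channel (assumed imaging model)."""
    p_x = np.array([1.0 - p_one, p_one])
    channel = np.array([[1.0 - flip_prob, flip_prob],
                        [flip_prob, 1.0 - flip_prob]])
    return p_x[:, None] * channel

def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))

if __name__ == "__main__":
    # For a uniform input, I(X;Y) = 1 - H_b(flip_prob).
    print(mutual_information(bsc_joint(0.1)), 1.0 - binary_entropy(0.1))
```

For a flip probability of 0.1 both printed values are about 0.531 bits, so in this toy model roughly 2^(0.531 n) patterns of length n could be told apart with unlimited memory and sensory capacity.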

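38 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V). Poor memory: U = 0, so R_c < I(0;V) = 0. Poor senses: V = 0, so R_c < I(U;0) = 0. [Plot: horizontal axis I(X;U), from 0 to H(X); vertical axis I(Y;V), from 0 to H(Y); both axes are marked =0 and the corner U=X, V=Y is marked =I(X;Y).]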

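39 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V). Unlimited V capacity: V = Y, so R_c < I(X;Y) - I(X;Y|U). [Plot: horizontal axis I(X;U), from 0 to H(X); vertical axis I(Y;V), from 0 to H(Y); the edge V=Y is marked =I(X;Y)-I(X;Y|U), the corner U=X, V=Y is marked =I(X;Y), and both axes are marked =0.]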
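
With V = Y the bound I(U;V) becomes I(U;Y), and under the Markov chain U-X-Y this equals I(X;Y) - I(X;Y|U). The sketch below checks that identity numerically on a randomly drawn distribution; the alphabet sizes, the random construction, and the function names are assumptions made for illustration.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits for a joint pmf given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    p_a = joint.sum(axis=1, keepdims=True)
    p_b = joint.sum(axis=0, keepdims=True)
    product = p_a @ p_b
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

def markov_chain_terms(seed=0, nu=3, nx=4, ny=5):
    """Return (I(U;Y), I(X;Y), I(X;Y|U)) for a random chain U-X-Y."""
    rng = np.random.default_rng(seed)
    p_x = rng.dirichlet(np.ones(nx))                   # p(x)
    p_u_given_x = rng.dirichlet(np.ones(nu), size=nx)  # rows are p(u|x)
    p_y_given_x = rng.dirichlet(np.ones(ny), size=nx)  # rows are p(y|x)
    # U and Y are conditionally independent given X: p(u,x,y) = p(x) p(u|x) p(y|x).
    joint = np.einsum("x,xu,xy->uxy", p_x, p_u_given_x, p_y_given_x)
    i_xy = mutual_information(joint.sum(axis=0))       # marginalize out U
    i_uy = mutual_information(joint.sum(axis=1))       # marginalize out X
    p_u = joint.sum(axis=(1, 2))
    # I(X;Y|U) is the p(u)-weighted average of I(X;Y | U = u).
    i_xy_given_u = sum(p_u[u] * mutual_information(joint[u] / p_u[u])
                       for u in range(nu))
    return i_uy, i_xy, float(i_xy_given_u)

if __name__ == "__main__":
    i_uy, i_xy, i_xy_given_u = markov_chain_terms()
    print(i_uy, i_xy - i_xy_given_u)
```

The two printed numbers agree up to floating-point error, and I(U;Y) never exceeds I(X;Y), consistent with a limited memory representation only lowering the bound.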

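40 R_x > I(X;U), R_y > I(Y;V), R_c < I(U;V). Unlimited U capacity: U = X, so R_c < I(X;Y) - I(X;Y|V). [Plot: horizontal axis I(X;U), from 0 to H(X); vertical axis I(Y;V), from 0 to H(Y); the edge V=Y is marked =I(X;Y)-I(X;Y|U), the corner U=X, V=Y is marked =I(X;Y), and both axes are marked =0.]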

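41 [Plot: horizontal axis I(X;U), from 0 to H(X); vertical axis I(Y;V), from 0 to H(Y). Marked values: =I(X;Y) at the corner U=X, V=Y; =I(X;Y)-I(X;Y|U) along V=Y; =I(X;Y)-I(X;Y|V) along U=X; =0 along both axes.]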

42 Revisiting GAP

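43 The gap. [Diagram: within the set of all p(x,y,u,v), the region R lies between the distributions satisfying U-X-Y, X-Y-V, etc. and those satisfying U-X-Y-V, etc.]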

44 A related gap: the distributed source coding problem. [Diagram: (X, Y) drawn from p(x,y); encoders f and φ; decoder g; outputs (U, V).]
- Problem: characterize the achievable (R_x, R_y, D_x, D_y)
- Posed in the early 1970s
- Only partial solutions so far

45 Comparison of the gaps: Pattern Recognition vs. Distributed Source Coding

46 Closing comments
- Objective framework for normalizing recognition system performance / guiding system design
- Closing the gap
- Extensions for finite n?
- Learning codebooks for real examples
- Connections to the information bottleneck framework. Similar philosophy: distortion should be defined by the task!

48 References for borrowed images
- David Mumford, 1995, in "Neuronal Architectures for Pattern-theoretic Problems", from the book Large Scale Neuronal Theories of the Brain, edited by C. Koch and J. L. Davis
- Dan Kersten, 1998, slide from NIPS tutorial titled "Computational Vision: Principles of Perceptual Inference". Available at:
- Kersten, D., Mamassian, P. & Yuille, A., 2003 (in press), "Object perception as Bayesian inference", Annual Review of Psychology
- David Mumford and Agnes Desolneux, 2002, in the introductory chapter to Pattern Theory Through Examples. Available from
