A latent variable model of configural conditioning

1 A latent variable model of configural conditioning. Aaron C. Courville, Robotics Institute, CMU. Work with: Nathaniel D. Daw, UCL (Gatsby); David S. Touretzky, CMU; and Geoff Gordon, CMU.

2 Similarity & Discrimination in Animal Learning. Similarity: How do animals respond to novel patterns of stimuli? Discrimination: How do animals learn to discriminate between overlapping patterns of stimuli? We recognize these issues as a tradeoff between generalization and data-fitting.

4 Perspectives on modeling conditioning. [Figure: two diagrams. Discriminative: Bell, Light and Tone predict Food. Generative: a latent state generates Bell, Light, Tone and Food.]

5 Models of Conditioning: Rescorla-Wagner (1972). Predicts reinforcement intensity as a linear function of stimuli, $X = [A\ (\text{light}), B\ (\text{bell}), \ldots]$: $V = \sum_i w_i X_i$. Learning rule is gradient descent on prediction error: $\Delta w_i = \alpha_i \beta (r - V) X_i$.
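
The prediction and update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the single shared learning rate `alpha` (the slide allows a per-stimulus rate), and the trial counts are assumptions made for the example.

```python
def predict(w, x):
    """V = sum_i w_i X_i for a binary stimulus vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rescorla_wagner(trials, n_stimuli, alpha=0.2, beta=1.0, epochs=200):
    """Apply the Rescorla-Wagner update over repeated passes through trials.

    trials: list of (x, r) pairs; x is a list of 0/1 stimulus indicators,
            r is the reinforcement intensity on that trial.
    alpha:  one learning rate shared by all stimuli (an assumption here).
    """
    w = [0.0] * n_stimuli
    for _ in range(epochs):
        for x, r in trials:
            err = r - predict(w, x)                # prediction error (r - V)
            for i in range(n_stimuli):
                w[i] += alpha * beta * err * x[i]  # delta w_i
    return w


# A+ and B+ training: each element alone is reinforced.
trials = [([1, 0], 1.0), ([0, 1], 1.0)]
w = rescorla_wagner(trials, n_stimuli=2)
print(predict(w, [1, 0]), predict(w, [1, 1]))
```

Because the prediction is linear, the compound AB sums the two element weights. Adding the trial `([1, 1], 0.0)` (AB-) to `trials` gives three constraints that no pair of weights can satisfy at once, which is the XOR problem the following slides address.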

6 Stimulus configurations. Configural conditioning: discrimination and generalization between patterns of stimuli. Training (XOR): A+ B+ AB-. [Figure: observed responses (per min.) to A / B and to AB over trial blocks; Rescorla-Wagner response strength for A / B and for AB over trial blocks.]

7 Modeling Configurations. Two dominant perspectives: 1. Added elements, RW [R&W, 1972]. 2. Configural model [Pearce, 1994]. Augment the stimulus representation with a configural unit, e.g. XOR: X = [A, B, AB]. Which units are active when we observe AB? [R&W 1972]: all units present, X = [A=1, B=1, AB=1]. [Pearce 1994]: graded activation by generalization rule, X = [A=.5, B=.5, AB=1].
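
The two activation schemes can be compared with a short Python sketch. The slide only says that Pearce's units are activated "by generalization rule"; the similarity formula used below (squared number of shared elements divided by the product of the two pattern sizes) is an assumption that is commonly attributed to Pearce and that reproduces the slide's values for AB. Function names are hypothetical.

```python
def added_elements(pattern, units):
    """Added-elements coding: a unit is fully on when all its elements are present."""
    present = set(pattern)
    return [1.0 if set(u) <= present else 0.0 for u in units]


def pearce_activation(pattern, units):
    """Graded coding: each unit is active in proportion to its similarity
    to the presented (non-empty) pattern.

    Assumed similarity: |common|^2 / (|pattern| * |unit|).
    """
    present = set(pattern)
    acts = []
    for u in units:
        common = len(present & set(u))
        acts.append(common ** 2 / (len(present) * len(u)))
    return acts


units = ["A", "B", "AB"]
print(added_elements("AB", units))     # [1.0, 1.0, 1.0]
print(pearce_activation("AB", units))  # [0.5, 0.5, 1.0]
```

Under the assumed rule, presenting A alone gives `[1.0, 0.0, 0.5]`: the AB unit is partially active, which is how the configural model generalizes from a compound to its elements.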

8 Expt. 1 Paired Compounds [Rescorla, 2003]. Training trials: AB+ CD+. Test trials: Trained: AB, CD; Transfer: AD, BC; Elements: A, B, C, D. [Figure: responses (per min.) by probe stimulus: Trained, Transfer, Element.]

9 Modeling Paired Compounds. Training: AB+ CD+. [Figure: three panels by probe stimulus (Trained, Transfer, Element): observed responses (per min.); Rescorla-Wagner response strength; Pearce response strength.]

10 Expt. 2 Asymmetric XOR [Redhead & Pearce, 1995]. Training trials: A+ BC+ ABC-. Test trials: A, BC, ABC. [Figure: responses (per min.) to A, BC and ABC over trial blocks.]

11 Modeling Asymmetric XOR. Training: A+ BC+ ABC-. [Figure: three panels over trial blocks for A+, BC+ and ABC-: observed responses (per min.); Rescorla-Wagner response strength; Pearce response strength.]

12 Issues in Modeling Configural Conditioning. How do we choose between the two models? Similarity: How to measure similarity between patterns of stimuli? Discrimination: How do we choose a representation that is flexible enough? A formal Bayesian approach can guide us.

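13 Perspectives on modeling conditioning. Discriminative: $P(R \mid A, B, C, D)$. Generative: $P(R, A, B, C, D)$. [Figure: discriminative network with inputs A, B, C, D and configural units AB, CD; generative network with latent variables $x_1$, $x_2$ over A, B, C, D.]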

14 A latent variable model. Generative model: sigmoid belief network, $P(S_i = 1 \mid x) = \left(1 + \exp(-w_i^\top x)\right)^{-1}$. Stimuli and latent variables are binary (on = 1, off = 0). Latent variables correlate stimuli, playing the role of a configural unit. [Figure: latent variables $x_1$, $x_2$ over stimuli A, B, C, D.]
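
A minimal Python sketch of the conditional on this slide, for illustration only. The `bias` term and the example weights are assumptions added for the example; the slide shows only the weighted sum of latent states.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def stimulus_prob(w_i, x, bias=0.0):
    """P(S_i = 1 | x): probability that stimulus i is on given binary latents x.

    w_i:  weights from each latent variable to stimulus i.
    bias: hypothetical offset, not shown on the slide; a negative value
          makes the stimulus unlikely when every latent is off.
    """
    return sigmoid(sum(w * xj for w, xj in zip(w_i, x)) + bias)


# One latent variable with a strong positive weight to a stimulus.
p_off = stimulus_prob([4.0], [0], bias=-2.0)  # latent off
p_on = stimulus_prob([4.0], [1], bias=-2.0)   # latent on
print(p_off, p_on)
```

When one latent variable has positive weights to several stimuli, turning it on raises all of their probabilities together, which is the sense in which a latent variable correlates stimuli.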

15 Model Inference. Learning: $P(w, m \mid D)$. Prediction: $P(R \mid \mathrm{Stim}, D)$. [Figure: latent variables $x_1$, $x_2$ over stimuli A, B, C, D.]

16 Learning in the L.V. model. Learning = Bayesian inference over weights and model structure, conditional on training data: $P(w_m, m \mid D) \propto P(D \mid w_m, m)\, P(w_m, m)$. The latent variable is unknown and unwanted, so we compute the marginal likelihood: $P(D \mid w_m, m) = \prod_t \sum_x \left[ \prod_i P(S_{t,i} \mid x, w_m, m) \right] P(x \mid w_m, m)$. [Figure: latent variables $x_1$, $x_2$ over stimuli A, B, C, D.]
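
For a small network the marginal likelihood can be computed by enumerating every latent configuration. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the latent variables are given independent Bernoulli priors (`prior_on`), each stimulus has a bias `b[i]`, and all names and example values are hypothetical.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def trial_likelihood(s, W, b, prior_on):
    """P(S_t | w, m) = sum_x [prod_i P(S_ti | x)] P(x) for one trial.

    s:        tuple of 0/1 stimulus observations on this trial.
    W[i][j]:  weight from latent j to stimulus i.
    b[i]:     bias of stimulus i (assumed, not on the slide).
    prior_on: P(x_j = 1) for each latent (assumed independent).
    """
    n_latent = len(prior_on)
    total = 0.0
    for x in itertools.product([0, 1], repeat=n_latent):
        p_x = 1.0
        for xj, pj in zip(x, prior_on):
            p_x *= pj if xj else 1.0 - pj
        p_s = 1.0
        for i, s_i in enumerate(s):
            p_i = sigmoid(sum(W[i][j] * x[j] for j in range(n_latent)) + b[i])
            p_s *= p_i if s_i else 1.0 - p_i
        total += p_s * p_x
    return total


def log_marginal_likelihood(data, W, b, prior_on):
    """log P(D | w, m), treating trials as independent."""
    return sum(math.log(trial_likelihood(s, W, b, prior_on)) for s in data)


# Two stimuli sharing one latent cause.
W = [[3.0], [3.0]]
b = [-2.0, -2.0]
prior_on = [0.3]
print(log_marginal_likelihood([(1, 1), (0, 0), (1, 1)], W, b, prior_on))
```

Enumeration costs 2^k terms per trial for k latent variables, so it is only practical for the small structures shown on these slides.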

17 Approximate inference. Inference is analytically intractable: use reversible-jump MCMC. Reversible-jump mixes slowly: use the Exchange MCMC method to help. [Figure: exchange of states between parallel chains.]
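
The swap step of exchange MCMC (also known as parallel tempering) can be sketched as follows. The slide does not give details, so this assumes the common setup in which chain k targets the posterior raised to an inverse temperature `betas[k]`; the function names are hypothetical and the within-chain reversible-jump moves are not shown.

```python
import math
import random


def swap_accept_prob(beta_i, beta_j, logp_i, logp_j):
    """Metropolis probability of exchanging the states of chains i and j.

    logp_i, logp_j: untempered log posterior of each chain's current state.
    Chain k is assumed to target p(state) ** beta_k.
    """
    log_ratio = (beta_i - beta_j) * (logp_j - logp_i)
    return math.exp(min(0.0, log_ratio))


def maybe_swap(states, logps, betas, i, j, rng=random):
    """Swap the states of chains i and j in place with the Metropolis probability."""
    if rng.random() < swap_accept_prob(betas[i], betas[j], logps[i], logps[j]):
        states[i], states[j] = states[j], states[i]
        logps[i], logps[j] = logps[j], logps[i]
        return True
    return False


# Cold chain (beta = 1) holds the better state, so the swap is only sometimes accepted.
print(swap_accept_prob(1.0, 0.5, -1.0, -3.0))
```

A swap that moves the higher-probability state into the colder chain is always accepted, which lets states discovered by the freely mixing hot chains migrate toward the chain that samples the actual posterior.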

18 L.V. model priors. Prior over number of latent variables: Geometric(0.1). Prior over weight magnitudes: Laplace(2.0). [Figure: prior distributions over the number of latent variables and over weight.] Additional assumption: stimuli are a priori rare.

19 Prediction. Generalization => inference over latents. [Figure: latent $x_1$ over A and B.] $P(R \mid A, B, m, w_m) = \sum_{x_1} P(R \mid x_1, m, w_m)\, P(x_1 \mid A, B, m, w_m)$, where $P(x_1 \mid A, B, m, w_m) \propto P(A \mid x_1, m, w_m)\, P(B \mid x_1, m, w_m)\, P(x_1 \mid m, w_m)$. Posterior reinforcement prediction: marginalize over choice of weights and model structure. $P(R \mid \mathrm{Stim}, m, D) = \int P(R \mid \mathrm{Stim}, m, w_m, D)\, P(w_m \mid m, D)\, dw_m$ and $P(R \mid \mathrm{Stim}, D) = \sum_m P(R \mid \mathrm{Stim}, m, D)\, P(m \mid D)$.
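
The two-step prediction on this slide (infer the latent from the observed stimuli, then predict reinforcement from the latent) can be sketched for a single latent variable. This is an illustration under assumptions: the biases, the Bernoulli prior on $x_1$, and the use of a plain average over posterior samples in place of the integral are choices made for the example, and all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bern(p, v):
    """Probability of binary value v under a Bernoulli with P(1) = p."""
    return p if v else 1.0 - p


def predict_reinforcement(a, b, w, bias, prior_x1):
    """P(R = 1 | A = a, B = b) for a network with one latent x1 over A, B and R."""
    post = []
    for x1 in (0, 1):
        p_a = sigmoid(w["A"] * x1 + bias["A"])
        p_b = sigmoid(w["B"] * x1 + bias["B"])
        # Unnormalized posterior: P(A | x1) P(B | x1) P(x1)
        post.append(bern(p_a, a) * bern(p_b, b) * bern(prior_x1, x1))
    z = sum(post)
    post = [p / z for p in post]
    return sum(post[x1] * sigmoid(w["R"] * x1 + bias["R"]) for x1 in (0, 1))


def posterior_average(a, b, samples):
    """Monte Carlo stand-in for the integral over weights: average the
    prediction over posterior samples, each a (w, bias, prior_x1) tuple."""
    return sum(predict_reinforcement(a, b, *s) for s in samples) / len(samples)


w = {"A": 4.0, "B": 4.0, "R": 4.0}
bias = {"A": -2.0, "B": -2.0, "R": -2.0}
print(predict_reinforcement(1, 1, w, bias, 0.3))
print(predict_reinforcement(0, 0, w, bias, 0.3))
```

If the samples come from a sampler that also moves between model structures, the same average approximates the sum over structures as well, provided each sample is evaluated with a prediction function for its own structure.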

20 L.V. Model of Paired Compounds. Training: AB+ CD+. MAP model structure: latent variables $x_1$, $x_2$ over A, B, C, D. [Figure: observed responses (per min.) by probe stimulus and model $P(R \mid \mathrm{Test}, D)$ by test stimulus: Trained, Transfer, Element.]

21 L.V. Model of Asymmetric XOR. Training: A+ BC+ ABC-. MAP model structures over A, B, C: 4 trials: $x_1$; 10 trials: $x_1$, $x_2$; 20 trials: $x_1$, $x_2$, $x_3$. [Figure: observed responses (per min.) to A, BC and ABC over trial blocks; model $P(R \mid A, D)$, $P(R \mid B, C, D)$ and $P(R \mid A, B, C, D)$ over trial blocks.]

22 What's a configuration? The model can account for experiments that are traditionally deemed configural conditioning. Previous models cast a configuration as the result of stimuli being trained together. We view it as the result of model complexity pressures to group stimuli.

23 Expt: Second-order conditioning versus Conditioned Inhibition [Yin et al, 1994]. Training: A+, AB-, C+. Test results by group:
Group No B: test B, -; test BC, Resp.
Group Few B: test B, Resp.; test BC, Resp.
Group Many B: test B, -; test BC, -.

24 Bayesian Model of Second-Order Conditioning / Conditioned Inhibition. Training: A+ AB- C+. MAP model structures over A, B, C: 4 trials: $x_1$; 18 trials: $x_1$, $x_2$; an alternative structure with $x_1$, $x_2$ is also shown. [Figure: model $P(R \mid B, D)$, $P(R \mid C, D)$ and $P(R \mid B, C, D)$ against number of AB- pairings.]

25 Dealing with Reinforcement. Are reinforcers really just like other stimuli? Train: A+ B+. [Figure: latent $x_1$ over A and B.] Do animals do this?

26 Acquired Relational Equivalence [Honey & Watt, 1999].
Biconditional training: AY-food, AZ-no food, BY-food, BZ-no food, CY-no food, CZ-food, DY-no food, DZ-food.
Revaluation: A-shock, C-no shock.
Test: A vs C, B vs D.
[Figure: Activity (%) for A vs C and for B vs D.]

27 Modeling Acq. Rel. Equiv. [Figure: observed Activity (%) and model $P(R \mid \mathrm{Stim}, D)$ for A vs C and for B vs D.]

28 Modeling Acq. Rel. Equiv.
Biconditional training: AY-food, AZ-no food, BY-food, BZ-no food, CY-no food, CZ-food, DY-no food, DZ-food.
Revaluation: A-shock, C-no shock.
Test: A vs C, B vs D.
[Figure: model structure with several latent variables over A, B, C, D, Y, Z and Food.]

29 Expt: Food Devaluation [Holland, 1998]. Training trials: Phase 1: A-F1, B-F2; Phase 2: B. Test: F1, F2. [Figure: consumption (ml) of Food 1 and Food 2 against number of B-Food 2 trials.]

30 L.V. model of Devaluation. Training: A-Food1 / B-Food2, B-. Model structures over A, B, Food 1, Food 2: few B-Food2 trials: $x_1$, $x_2$; many B-Food2 trials: $x_1$, $x_2$, $x_3$. [Figure: consumption (ml) of Food 1 and Food 2 against number of B-Food 2 trials; model quantities P( Food 1, D) and 1-P( Food 2, D) against number of B-Food 2 trials.]

31 Not the whole story... (Variant on Devaluation) [Holland, 1998]. Training trials: Phase 1: A-F1, B-F2; Phase 2: F(1,2)-. Test: B. [Figure: % time in food cup for Food 1 and Food 2 against number of B-Food 2 trials (16, 160).] Model structure? [Figure: latent variables $x_1$, $x_2$ over A, B, Food 1, Food 2.]

32 Future Directions. Explore the priors: experimentally manipulable. Remove the independent trial assumption.

33 Modeling Change. Should reflect our understanding of how the world is believed to change. Example: causal model parameter drift. The marginal distribution of the diffusion process should reflect your prior.

34 Conclusions. Similarity and Discrimination are recognized as the tradeoff between complexity and data fidelity arising in Bayesian inference. A latent variable model is a natural (causal) setting for the study of classical conditioning. It accounts for configural conditioning data and more.
