A latent variable model of configural conditioning
Aaron C. Courville, Robotics Institute, CMU
Work with: Nathaniel D. Daw, UCL (Gatsby), David S. Touretzky, CMU, and Geoff Gordon, CMU
Similarity & Discrimination in Animal Learning
Similarity: how do animals respond to novel patterns of stimuli?
Discrimination: how do animals learn to discriminate between overlapping patterns of stimuli?
We recognize these issues as a tradeoff between generalization and data fitting.
Perspectives on modeling conditioning
[Figure: a discriminative model mapping stimuli (bell, light, tone) to a food prediction, versus a generative model in which a latent state generates bell, light, tone, and food]
Models of Conditioning: Rescorla-Wagner (1972)
Predicts reinforcement intensity as a linear function of the stimuli, X = [A (light), B (bell), ...]:
V = ∑_i w_i X_i
The learning rule is gradient descent on the prediction error:
Δw_i = α_i β (r − V) X_i
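The linear prediction and error-driven update above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the learning rates and trial schedule are illustrative choices, and the negative-patterning (XOR) trials anticipate the discrimination discussed on the next slides.

```python
import numpy as np

def rescorla_wagner(trials, n_stimuli, alpha=0.3, beta=1.0, n_epochs=50):
    """Rescorla-Wagner: V = sum_i w_i X_i, with update
    delta w_i = alpha_i * beta * (r - V) * X_i."""
    w = np.zeros(n_stimuli)
    for _ in range(n_epochs):
        for X, r in trials:
            X = np.asarray(X, dtype=float)
            V = w @ X                        # linear prediction of reinforcement
            w += alpha * beta * (r - V) * X  # error-driven weight update
    return w

# Negative patterning (XOR): A+, B+, AB- with purely elemental stimuli.
trials = [([1, 0], 1.0), ([0, 1], 1.0), ([1, 1], 0.0)]
w = rescorla_wagner(trials, n_stimuli=2)
V_A, V_AB = w[0], w[0] + w[1]
```

Without a configural unit the model ends up predicting more reinforcement for the compound AB than for A alone (V_AB > V_A), the opposite of what animals learn, which motivates the augmented representations below.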
Stimulus configurations
Configural conditioning: discrimination and generalization between patterns of stimuli.
Training (XOR): A+ B+ AB-
[Figures: animal responses (per min.) to A/B and AB over trial blocks; Rescorla-Wagner response strength for A/B and AB over trial blocks]
Modeling Configurations
Two dominant perspectives:
1. Added elements [Wagner & Rescorla, 1972]: augment the stimulus representation with a configural unit, e.g., for XOR: X = [A, B, AB].
2. Configural model [Pearce, 1994].
Which units are active when we observe AB?
[Wagner & Rescorla, 1972]: all present units, X = [A=1, B=1, AB=1]
[Pearce, 1994]: graded activation by a generalization rule, X = [A=0.5, B=0.5, AB=1]
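The two representational schemes can be made concrete. A sketch under my reading of the slide: the added-elements rule activates a unit fully when all of its elements are present, while Pearce's graded rule scores a unit by the product of the proportion of shared elements in the unit and in the observed pattern, which reproduces the activations quoted above.

```python
def added_elements(observed, units):
    # Wagner & Rescorla: a unit is fully active iff all of its
    # elemental stimuli are present in the observed pattern.
    return {u: 1.0 if set(u) <= set(observed) else 0.0 for u in units}

def pearce(observed, units):
    # Pearce: graded activation -- product of the fraction of shared
    # elements in the unit and in the observed pattern.
    obs = set(observed)
    act = {}
    for u in units:
        common = len(set(u) & obs)
        act[u] = (common / len(u)) * (common / len(obs))
    return act

units = ["A", "B", "AB"]
ae = added_elements("AB", units)  # {'A': 1.0, 'B': 1.0, 'AB': 1.0}
pe = pearce("AB", units)          # {'A': 0.5, 'B': 0.5, 'AB': 1.0}
```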
Expt. 1: Paired Compounds [Rescorla, 2003]
Training trials: AB+ CD+
Test trials: Trained: AB, CD; Transfer: AD, BC; Elements: A, B, C, D
[Figure: responses (per min.) to Trained, Transfer, and Element probe stimuli]
Modeling Paired Compounds
Training: AB+ CD+
[Figures: animal responses (per min.), Rescorla-Wagner response strength, and Pearce response strength for Trained, Transfer, and Element probe stimuli]
Expt. 2: Asymmetric XOR [Redhead & Pearce, 1995]
Training trials: A+ BC+ ABC-
Test trials: A, BC, ABC
[Figure: responses (per min.) to A, BC, and ABC over trial blocks]
Modeling Asymmetric XOR
Training: A+ BC+ ABC-
[Figures: animal responses (per min.) to A, BC, ABC over trial blocks; Rescorla-Wagner and Pearce response strengths for A+, BC+, ABC- over trial blocks]
Issues in Modeling Configural Conditioning
How do we choose between the two models?
Similarity: how do we measure similarity between patterns of stimuli?
Discrimination: how do we choose a representation that is flexible enough?
A formal Bayesian approach can guide us.
Perspectives on modeling conditioning
Discriminative: P(R | A, B, C, D). Generative: P(R, A, B, C, D).
[Figure: a discriminative network mapping A, B, C, D (and compounds AB, CD) to R, versus a generative network with latents x_1, x_2 generating A, B, C, D]
A latent variable model
Generative model: sigmoid belief network.
P(S_i | x) = (1 + exp(−w_i · x))^−1
Stimuli and latent variables are binary (on = 1, off = 0).
Latent variables correlate stimuli: a latent variable plays the role of a configural unit.
[Figure: latents x_1, x_2 generating stimuli A, B, C, D]
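The observation model is just a per-stimulus logistic function of the latent vector. A minimal sketch, with hand-set illustrative weights (x_1 driving A and B, x_2 driving C and D); the negative bias term is my addition to encode the deck's "stimuli are a priori rare" assumption, since the slide's formula could equally absorb it into a constant latent unit.

```python
import numpy as np

def stimulus_probs(x, W, b):
    """Sigmoid belief network observations:
    P(S_i = 1 | x) = 1 / (1 + exp(-(w_i . x + b_i)))."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

# Hypothetical weights: latent x_1 drives A and B, x_2 drives C and D.
W = np.array([[4.0, 0.0],   # A
              [4.0, 0.0],   # B
              [0.0, 4.0],   # C
              [0.0, 4.0]])  # D
b = np.full(4, -2.0)        # stimuli rare when no latent cause is active

probs_on = stimulus_probs(np.array([1, 0]), W, b)  # x_1 on, x_2 off
```

With x_1 on, A and B become likely (sigmoid(2) ≈ 0.88) while C and D stay rare (sigmoid(−2) ≈ 0.12), so a single latent acts like a configural unit binding A and B together.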
Model Inference
Learning: P(w, m | D)
Prediction: P(R | Stim, D)
[Figure: latents x_1, x_2 over stimuli A, B, C, D]
Learning in the L.V. Model
Learning = Bayesian inference over weights and model structure, conditional on the training data:
P(w_m, m | D) ∝ P(D | w_m, m) P(w_m, m)
The latent variables are unobserved (and not of direct interest), so we compute the marginal likelihood:
P(D | w_m, m) = ∏_t ∑_x ∏_i P(S_{t,i} | x, w_m, m) P(x | w_m, m)
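For small models the sum over latent configurations in the marginal likelihood can be enumerated directly. A sketch under assumptions not stated on the slide: independent Bernoulli priors on the latents and the illustrative weights from earlier (x_1 → {A, B}, x_2 → {C, D}); the deck's actual prior over x comes from its generative model.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def marginal_log_likelihood(trials, W, b, p_latent=0.5):
    """log P(D | w_m, m) = sum_t log sum_x [prod_i P(S_{t,i} | x)] P(x),
    enumerating all binary latent vectors x (feasible for small m only)."""
    n_latents = W.shape[1]
    logp = 0.0
    for s in trials:                          # s: binary stimulus vector, trial t
        total = 0.0
        for x in itertools.product([0, 1], repeat=n_latents):
            x = np.array(x)
            p = sigmoid(W @ x + b)            # per-stimulus Bernoulli means
            lik = np.prod(np.where(s == 1, p, 1 - p))
            prior = np.prod(np.where(x == 1, p_latent, 1 - p_latent))
            total += lik * prior
        logp += np.log(total)
    return logp

# Data matching the latent grouping (A with B) scores higher than
# data that cuts across it (A with C).
W = np.array([[4., 0.], [4., 0.], [0., 4.], [0., 4.]])
b = np.full(4, -2.0)
ll_ab = marginal_log_likelihood([np.array([1, 1, 0, 0])], W, b)
ll_ac = marginal_log_likelihood([np.array([1, 0, 1, 0])], W, b)
```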
Approximate Inference
Inference is analytically intractable: use reversible-jump MCMC.
Reversible-jump mixes slowly: an exchange MCMC method helps.
[Figure: exchange (swap) moves among a ladder of parallel chains]
L.V. Model Priors
Prior over the number of latent variables: Geometric(0.1).
Prior over weight magnitudes: Laplace(2.0).
Additional assumption: stimuli are a priori rare.
[Figures: Geometric(0.1) density over the number of latent variables; Laplace(2.0) density over weights]
Prediction
Generalization ⇒ inference over latents.
P(R | A, B, m, w_m) = ∑_{x_1} P(R | x_1, m, w_m) P(x_1 | A, B, m, w_m)
P(x_1 | A, B, m, w_m) ∝ P(A | x_1, m, w_m) P(B | x_1, m, w_m) P(x_1 | m, w_m)
Posterior reinforcement prediction: marginalize over the choice of weights and model structure.
P(R | Stim, m, D) = ∫ P(R | Stim, m, w_m) P(w_m | m, D) dw_m
P(R | Stim, D) = ∑_m P(R | Stim, m, D) P(m | D)
[Figure: latent x_1 generating A, B, and R]
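The first two equations (infer the latents from the observed stimuli, then predict R through them) can be sketched for a single fixed model. Everything here is illustrative: the weights are hand-set rather than learned, reinforcement R is treated as one more sigmoid observation driven by x_1, and only stimuli observed to be present are conditioned on.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_r(observed, W, b, r_row, r_bias=-2.0, p_latent=0.5):
    """P(R | Stim) = sum_x P(R | x) P(x | Stim), with
    P(x | Stim) proportional to [prod_{i in observed} P(S_i | x)] P(x)."""
    n_latents = W.shape[1]
    num = den = 0.0
    for x in itertools.product([0, 1], repeat=n_latents):
        x = np.array(x)
        p = sigmoid(W @ x + b)
        lik = np.prod([p[i] for i in observed])   # observed stimuli are on
        prior = p_latent ** x.sum() * (1 - p_latent) ** (n_latents - x.sum())
        post = lik * prior                        # unnormalized P(x | Stim)
        num += sigmoid(r_row @ x + r_bias) * post
        den += post
    return num / den

# Latent x_1 -> {A, B, R}: observing A alone raises the inferred
# probability of x_1, and hence the predicted reinforcement.
W = np.array([[4., 0.], [4., 0.], [0., 4.], [0., 4.]])  # A, B, C, D
b = np.full(4, -2.0)
r_row = np.array([4., 0.])                              # R driven by x_1
p_r_given_A = predict_r([0], W, b, r_row)
p_r_given_C = predict_r([2], W, b, r_row)
```

Because A shares a latent cause with R while C does not, p_r_given_A comes out higher than p_r_given_C: generalization falls out of posterior inference over the latents.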
L.V. Model of Paired Compounds
Training: AB+ CD+
MAP model structure: [Figure: latents x_1 and x_2 over stimuli A, B, C, D]
[Figures: animal responses (per min.) and model P(R | Test, D) for Trained, Transfer, and Element probe stimuli]
L.V. Model of Asymmetric XOR
Training: A+ BC+ ABC-
MAP model structures: after 4 trials, one latent (x_1); after 10 trials, two latents (x_1, x_2); after 20 trials, three latents (x_1, x_2, x_3), each over stimuli A, B, C.
[Figures: animal responses (per min.) to A, BC, ABC over trial blocks; model P(R | A, D), P(R | B, C, D), and P(R | A, B, C, D) over trial blocks]
What's a configuration?
The model can account for experiments traditionally deemed configural conditioning.
Previous models cast configuration as the result of stimuli being trained together; we view it as the result of model-complexity pressures to group stimuli.
Expt: Second-Order Conditioning versus Conditioned Inhibition [Yin et al., 1994]

Group    A+   AB-  C+   Test  Result  Test  Result
No B     96   0    8    B     -       BC    Resp.
Few B    96   4    8    B     Resp.   BC    Resp.
Many B   96   48   8    B     -       BC    -
Bayesian Model of Second-Order Conditioning / Conditioned Inhibition
Training: A+ AB- C+
MAP model structures: after 4 trials, one latent (x_1) over A, B, C; after 18 trials, two latents (x_1, x_2), with an alternative two-latent structure also arising at 18 trials.
[Figure: P(R | B, D), P(R | C, D), and P(R | B, C, D) as a function of the number of AB- pairings]
Dealing with Reinforcement
Are reinforcers really just like any other stimulus?
Train: A+ B+
[Figure: a single latent x_1 generating A, B, and the reinforcer]
Do animals do this?
Acquired Relational Equivalence [Honey & Watt, 1999]

Biconditional training                       Revaluation   Test
AY-food   AZ-no food  BY-food    BZ-no food  A-shock       A vs C
CY-no food  CZ-food   DY-no food  DZ-food    C-no shock    B vs D

[Figures: activity (%) for A vs C and for B vs D]
Modeling Acq. Rel. Equiv.
[Figures: animal activity (%) and model 1 − P(R | Stim, D) for A vs C and for B vs D]
Modeling Acq. Rel. Equiv.
Biconditional training, revaluation, and test design as above.
[Figure: MAP structure with latents x_1 through x_5 over A, B, C, D, Y, Z, and Food]
Expt: Food Devaluation [Holland, 1998]
Training trials: Phase 1: A-F1, B-F2 (16 trials each); Phase 2: B- (number of trials varied, 0 to 40). Test: F1, F2.
[Figure: consumption (ml) of Food 1 and Food 2 versus the number of B-Food 2 trials]
L.V. Model of Devaluation
Training: A-Food1 / B-Food2, then B-
MAP structures: after few B-Food2 trials, latents x_1 and x_2 over A, B, Food 1, and Food 2; after many B-Food2 trials, an additional latent appears.
[Figures: consumption (ml); model 1 − P(R | Food 1, D) and 1 − P(R | Food 2, D), each versus the number of B-Food 2 trials]
Not the whole story... (variant on devaluation) [Holland, 1998]
Training trials: Phase 1: A-F1 (16 trials), B-F2 (16, 40, or 160 trials); Phase 2: F(1,2)-. Test: B.
[Figure: % time in food cup for Food 1 and Food 2 at 16, 40, and 160 B-Food 2 trials]
Model structure? [Figure: latents x_1, x_2 over A, B, Food 1, Food 2]
Future Directions
Explore the priors: they are experimentally manipulable.
Remove the independent-trials assumption.
Modeling Change
The model should reflect our understanding of how the world is believed to change.
Example: causal model parameter drift. The marginal distribution of the diffusion process should reflect your prior.
Conclusions
Similarity and discrimination are recognized as the tradeoff between complexity and data fidelity arising in Bayesian inference.
A latent variable model is a natural (causal) setting for the study of classical conditioning.
It accounts for configural conditioning data and more.