Squeezing Every Ounce of Information from An Experiment: Adaptive Design Optimization


1 Squeezing Every Ounce of Information from An Experiment: Adaptive Design Optimization Jay Myung Department of Psychology Ohio State University UCI Department of Cognitive Sciences Colloquium (May 21, 2014) 1

2 Wordle View of Current Research Efforts 2

3 Outline o Introduction o Adaptive Design Optimization (ADO) o Example Applications of ADO n Memory Retention Experiment n Risky Choice Experiment o Conclusions 3

4 Introduction o Experiments are fundamental to the advancement of psychological science 4

5 Introduction o Data obtained from experiments are used to fit formal models. Multinomial Processing Trees Structural Equations (SEM) Neural Network 5

6 Introduction o Often, there are many competing models to describe the same cognitive/perceptual process. o Example: Some models of memory retention (forgetting) Power Exponential Hyperbolic 6

7 Experiments to discriminate between models Flow chart of a typical investigation: Formulate models → Experiment/collect data → Analyze data (statistical model selection methods, e.g., AIC & BIC). Model selection criteria are limited by the data they have to work with.

8 Experiments to discriminate between models Flow chart of a typical investigation: Formulate models → Experiment/collect data → Analyze data. Collect smarter data to highlight differences between models

9 Collecting smarter data o The design of an experiment determines the quality of data that are collected. Design variables include treatment levels, stimulus levels, and the number of observations

10 Hard-to-recruit participants 10

11 Outline o Introduction o Adaptive Design Optimization (ADO) o Example Applications of ADO n Memory Retention Experiment n Risky Choice Experiment o Conclusions 11

12 Adaptive design optimization (ADO) o Adaptively designed experiments n Conduct the full experiment as a sequence of mini-experiments n Improve the design of the next mini-experiment using knowledge gained from the previous mini-experiments Myung & Pitt (2009) Psychological Review; Cavagnaro, Myung, Pitt & Kujala (2010) Neural Computation; Myung, Cavagnaro & Pitt (2013) Journal of Mathematical Psychology

13 Adaptive design optimization (ADO) o Sequential design framework: designs d_1, d_2, d_3, ..., d_s yield observations Y_obs(d_1), Y_obs(d_2), ..., Y_obs(d_s). Adapt the design of the next experiment based on the results of preceding experiments

14 Adaptive design optimization (ADO) o Bayesian decision-theoretic framework: a cycle in which design optimization turns the current prior into an optimal design, the experiment run with that design yields an observed outcome, and Bayesian updating turns prior and outcome into the posterior, which serves as the prior for the next stage
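The cycle on this slide can be sketched as a simple loop (a minimal sketch with placeholder callables; `run_ado` and its argument names are illustrative, not the authors' code):

```python
def run_ado(prior, n_stages, optimize_design, run_experiment, bayes_update):
    """Alternate design optimization, experimentation, and Bayesian updating."""
    belief = prior                                      # p_0(m), p_0(theta_m)
    for _ in range(n_stages):
        design = optimize_design(belief)                # prior -> optimal design
        outcome = run_experiment(design)                # design -> observed outcome
        belief = bayes_update(belief, design, outcome)  # outcome -> posterior, the next prior
    return belief
```

Any concrete choice of the three callables (grid-based, SMC-based, etc.) plugs into the same loop.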

15 Finding optimal designs o A principled approach from Bayesian decision theory n Treat each possible design as a gamble whose payoff is determined by the outcome of an experiment carried out with that design. n Compute an expected utility of each design by taking an expectation over models, parameters, and experiment outcomes: U(d) = Σ_m p(m) ∫∫ u(d, θ_m, y) p(y | θ_m, d) p(θ_m) dy dθ_m n The design with the highest expected utility is then chosen as the optimal design: d* = argmax_d U(d)

16-19 Finding optimal designs o Terms in the expected utility U(d): n u(d, θ_m, y): the value of a hypothetical experiment with design d when the true model is m with parameters θ_m and outcome y is observed n p(y | θ_m, d): the likelihood function, e.g., binomial n p(m) and p(θ_m): the priors over models and parameters; these priors can be updated between stages

20 Bayesian updating o Model posterior at stage s = 1, 2, ... n Updated from Bayes factor calculation: p_s(m) = p_0(m) / [ Σ_{k=1}^K p_0(k) BF_(k,m)(y_s(d_s)) ], m = 1, ..., K, where the Bayes factors are computed under the current parameter priors p_{s-1}(θ) o Parameter posterior at stage s = 1, 2, ... n Updated using Bayes rule: p_s(θ_m) = p(y_s | θ_m, d_s) p_{s-1}(θ_m) / ∫ p(y_s | θ_m, d_s) p_{s-1}(θ_m) dθ_m, m = 1, ..., K
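Under a binomial likelihood (as in the retention example later in the talk), one stage of this updating can be carried out on a parameter grid; the model posterior is updated by each model's marginal likelihood, which is equivalent to the Bayes factor form above. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def binom_lik(y, n, p):
    """Binomial likelihood of y successes in n trials (constant factor
    omitted, since it cancels in the normalizations below)."""
    return p**y * (1.0 - p)**(n - y)

def update_stage(param_posts, model_post, preds, y, n):
    """One stage of Bayesian updating.

    param_posts: list of arrays, p_{s-1}(theta_m) on a grid, one per model
    model_post:  array, p_{s-1}(m)
    preds:       list of arrays, each model's predicted success probability
                 at the chosen design, one value per grid point
    y, n:        observed successes out of n Bernoulli trials
    """
    marg = np.empty(len(param_posts))
    new_param_posts = []
    for m, (post, p) in enumerate(zip(param_posts, preds)):
        lik = binom_lik(y, n, p)
        marg[m] = np.sum(lik * post)                  # marginal likelihood of model m
        new_param_posts.append(lik * post / marg[m])  # Bayes rule: p_s(theta_m)
    new_model_post = model_post * marg                # p_s(m) prop. to p_{s-1}(m) * marg. lik.
    return new_param_posts, new_model_post / new_model_post.sum()
```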

21 Adaptive design optimization (ADO) o Across stages, optimal designs d_1, d_2, ..., d_S yield outcomes y_obs(d_1), y_obs(d_2), ..., y_obs(d_S); each stage's posteriors p_s(θ_m) and p_s(m) become the priors for choosing the next design (starting from p_0(θ_m) and p_0(m))

22 Utility function o Selection of a utility function that adequately captures the goals of the experiment is an integral part of ADO. 22

23 Utility function o Selection of a utility function that adequately captures the goals of the experiment is an integral part of ADO. Take U(d) to be the mutual information of the random variables Y_d and M: U(d) = I(M; Y_d) = H(M) - H(M | Y_d), where H(M) is the entropy of M and H(M | Y_d) is the conditional entropy of M given Y and d. Essentially, U(d) measures the amount of information about the true model that would be provided by an experiment with design d.
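For a design whose outcome is the number of successes in n Bernoulli trials, U(d) can be computed directly from this definition. A sketch (the point predictions stand in for parameters that have already been marginalized out; names are illustrative):

```python
import math
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def utility(model_prior, pred_probs, n):
    """Mutual information U(d) = H(M) - H(M | Y_d) for a binomial outcome.

    model_prior: p(m) over the candidate models
    pred_probs:  each model's success probability at design d
    n:           number of Bernoulli trials run at design d
    """
    model_prior = np.asarray(model_prior, dtype=float)
    # p(y | m) for y = 0..n under each model
    p_y_given_m = np.array([
        [math.comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)]
        for p in pred_probs
    ])
    p_y = model_prior @ p_y_given_m                  # marginal outcome distribution
    post = model_prior[:, None] * p_y_given_m / p_y  # p(m | y), one column per outcome
    h_m_given_y = np.sum(p_y * np.array([entropy(post[:, y]) for y in range(n + 1)]))
    return entropy(model_prior) - h_m_given_y
```

Designs where the models predict the same outcome distribution get utility near zero; designs where the predictions diverge approach H(M).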

24 Computation o Finding an optimal design requires simultaneous optimization and high-dimensional integration (Müller, Sansó & De Iorio, 2004, JASA). o Computation is achieved by a sequential Monte Carlo (SMC) particle filtering algorithm with simulated annealing (Amzal, Bois, Parent & Robert, 2006, JASA).

25 Outline o Introduction o Adaptive Design Optimization (ADO) o Example Applications of ADO n Memory Retention Experiment n Risky Choice Experiment o Conclusions 25

26 Example Application 1 of ADO: Discriminating Models of Memory Retention 26

27 Designing a retention experiment: What Time Intervals Should be Employed? o Retention: the rate of retrieval failure over time 27

28 Designing a retention experiment: What Time Intervals Should be Employed? o Two models of retention: the power model (POW) and the exponential model (EXP)
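The two forms as they are usually written (the exact parameterization is not shown on the slide; the +1 offset in POW is one common convention that keeps the prediction finite at t = 0):

```python
import numpy as np

def pow_model(t, a, b):
    """Power model of retention: p(recall) = a * (t + 1)^(-b)."""
    return a * (t + 1.0)**(-b)

def exp_model(t, a, b):
    """Exponential model of retention: p(recall) = a * exp(-b * t)."""
    return a * np.exp(-b * t)
```

Both predict recall probability a at t = 0 and decay toward zero, which is what makes them hard to tell apart without well-chosen retention intervals.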

29 Designing a retention experiment: What Time Intervals Should be Employed? o Model mimicry between POW and EXP [Figure: POW and EXP fitted curves that closely mimic each other]

30 Designing a retention experiment: What Time Intervals Should be Employed? Model predictions for a narrow range of parameters (100 Bernoulli trials) [Figure: recall rate vs. time (seconds) for POW and EXP]

31 Designing a retention experiment: What Time Intervals Should be Employed? o A good design choice can aid in discrimination. … seconds: Bad designs! [Figure: recall rate vs. time (seconds) for POW and EXP]

32 Designing a retention experiment: What Time Intervals Should be Employed? o A good design choice can aid in discrimination. 2-4 seconds: Good designs! [Figure: recall rate vs. time (seconds) for POW and EXP]

33 Designing a retention experiment: What Time Intervals Should be Employed? o More realistic situations require a more principled approach to finding optimal designs. [Figure: predicted recall curves over time under broad priors; POW: a ~ Beta(2,1), b ~ Beta(1,4); EXP: a ~ Beta(2,1), b ~ Beta(1,80)]

34 ADO in Action 34

35 Simulation experiment o Data generated from EXP with a=0.71 and b=

36 Simulation results o Stage 1: compute optimal design and generate data (30 Bernoulli trials). P_0(POW) = 0.5, P_0(EXP) = 0.5; optimal design: 16 seconds; observed 7 correct responses

37 Simulation results o Stage 1: update model and parameter probabilities given 7 correct responses at 16 seconds. P_1(POW) = 0.65, P_1(EXP) = 0.35

38 Simulation results o Stage 2: compute optimal design and generate data (30 Bernoulli trials). P_1(POW) = 0.65, P_1(EXP) = 0.35; optimal design: 96.4 seconds; observed 0 correct responses

39 Simulation results o Stage 2: update model and parameter probabilities given 0 correct responses at 96.4 seconds. P_2(POW) = 0.30, P_2(EXP) = 0.70

40 Simulation results o Stage 3: compute optimal design and generate data (30 Bernoulli trials). P_2(POW) = 0.30, P_2(EXP) = 0.70; optimal design: 96.8 seconds; observed 0 correct responses

41 Simulation results o Stage 3: update model probabilities given 0 correct responses at 96.8 seconds. P_3(POW) = …, P_3(EXP) = …

42 Simulation results Bayes Factor = 20 (very strong evidence for EXP) 42

44 Outline o Introduction o Adaptive Design Optimization (ADO) o Example Applications of ADO n Memory Retention Experiment n Risky Choice Experiment o Conclusions 44

45 Example Application 2 of ADO: Discriminating Models of Risky Choice 45

46 Risky Choice Experiment q 100 choices over the course of 60 minutes.

47 Probability Weighting Functions Empirical evidence has shown that decision makers do not weight probabilities linearly. These distortions of the probability scale affect how people choose between uncertain prospects. In Cumulative Prospect Theory (CPT), such distortions are quantified with a probability weighting function. Numerous functional forms have been proposed.

48 Probability Weighting Functions Tversky and Kahneman (1992)

49 Probability Weighting Functions Axiomatically derived functions (Prelec, 1998)

50 Probability Weighting Functions Linear in log odds (Gonzalez and Wu, 1999)
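The slides display these functions graphically; as commonly written in the cited papers, the forms are (γ, r, s, δ denote the free parameters):

```latex
\begin{align*}
\text{TK (Tversky \& Kahneman, 1992):} \quad & w(p) = \frac{p^{\gamma}}{\bigl(p^{\gamma} + (1-p)^{\gamma}\bigr)^{1/\gamma}} \\
\text{Prelec-1 (Prelec, 1998):} \quad & w(p) = \exp\bigl(-(-\ln p)^{r}\bigr) \\
\text{Prelec-2 (Prelec, 1998):} \quad & w(p) = \exp\bigl(-s(-\ln p)^{r}\bigr) \\
\text{LinLog (Gonzalez \& Wu, 1999):} \quad & w(p) = \frac{\delta p^{\gamma}}{\delta p^{\gamma} + (1-p)^{\gamma}}
\end{align*}
```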

51 Probability Weighting Functions Previous attempts to discriminate probability weighting functions have yielded ambiguous results (Gonzalez & Wu, 1999; Stott, 2006)

52 Model Mimicry Given that these different functions can mimic each other so closely, does it really matter which one we use? Are there ANY situations in which these two functions would imply different choice predictions?

53 Choice Predictions of CPT Which do you prefer? Gamble A or Gamble B [each a gamble over the outcomes $0, $500, $1000, with probabilities shown graphically]

54 Choice Predictions of CPT Which do you prefer? Gamble A or Gamble B [gambles over $0, $500, $1000] The weight attached to outcome x_i is the weight of the probability of getting at least x_i minus the weight of getting something strictly better than x_i; v(x) gives the subjective value of money

55 Choice Predictions of CPT Which do you prefer? Gamble A or Gamble B [gambles over $0, $500, $1000] Assume WLOG that v($0) = 0, v($1000) = 1, and v($500) = v where 0 < v < 1.

56 Choice Predictions of CPT Which do you prefer? Gamble A or Gamble B [gambles over $0, $500, $1000] Assuming v = 0.5 and a Prelec-2 weighting function with r = 0.58 and s = 1.18, U(A) > U(B), so A is preferred to B

57 Choice Predictions of CPT Which do you prefer? Gamble A or Gamble B [gambles over $0, $500, $1000] Assuming v = 0.5 and a LinLog weighting function with r = 0.60 and s = 0.65, U(A) < U(B), so B is preferred to A!

58 Choice Predictions of CPT Was this a pathological case, or are there many gamble pairs with this property? One way to answer this question would be to consider the space of all possible gambles on three fixed outcomes and search for pairs of gambles with this property. General form of a three-outcome gamble: outcomes x_Low, x_Mid, x_High with probabilities p(x_Low), p(x_Mid), p(x_High)
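The rank-dependent utility described on the preceding slides can be written out for a three-outcome gamble; a sketch using the Prelec-2 weighting function (function names are illustrative):

```python
import math

def prelec2(p, r, s):
    """Two-parameter Prelec weighting function: w(p) = exp(-s * (-ln p)^r)."""
    if p <= 0.0:
        return 0.0
    return math.exp(-s * (-math.log(p))**r)

def cpt_utility(probs, values, w):
    """CPT utility of a gamble over gains; probs/values ordered from the
    lowest outcome to the highest.  The decision weight of outcome i is
    w(P(at least x_i)) - w(P(strictly better than x_i))."""
    total = 0.0
    for i in range(len(probs)):
        p_at_least = sum(probs[i:])    # probability of x_i or better
        p_better = sum(probs[i + 1:])  # probability of something strictly better
        total += (w(p_at_least) - w(p_better)) * values[i]
    return total
```

With w as the identity, `cpt_utility` reduces to expected subjective value; swapping in `prelec2` versus a LinLog w is what flips the preference in the A-versus-B example above.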

59 Designing an experiment that optimally discriminates among different forms of the probability weighting function: T-K, Prelec-1, Prelec-2, LinLog

60 ADO-based Risky Choice Experiment q 100 choices over the course of 60 minutes. q Possible outcomes were $25, $350, $1000. q Gambles were hypothetical. Participants were paid $10. q Compared models (functional forms) based on their posterior probabilities. q Goal was to identify one form as superior (probability > 0.95)

61 Risky Choice Experiment

62 Individual results q Results for a typical participant Model probabilities across stages of the experiment MLE of each form at the conclusion of the experiment

63 Individual results q Results for another typical participant Model probabilities across stages of the experiment MLE of each form at the conclusion of the experiment

64 Individual results q Results for yet another typical participant Model probabilities across stages of the experiment MLE of each form at the conclusion of the experiment

65 Summary of results [Table: best model and its posterior probability for each participant; best models include LinLog, Prl2, and EU, with posterior probabilities such as .57, .80, .92, and .95]

66 Summary of results q Using ADO, different functional forms of the probability weighting function are discriminated decisively. q One-parameter functions (TK and Prl1) do not fit well at the individual-participant level. q There is striking heterogeneity in individual weighting functions.

67 Outline o Introduction o Adaptive Design Optimization (ADO) o Example Applications of ADO n Memory Retention Experiment n Risky Choice Experiment o Conclusions 67

68 Conclusions o Adaptive design optimization (ADO) is a promising new experimental tool that facilitates efficient collection of data in experiments discriminating and estimating formal models o Current and future work n ADO for visual psychophysics n ADO for cognitive neuroscience (e.g., fMRI) n ADO for skill learning (with AFRL) n ADO for inter-temporal choice n ADO with clinical patients (e.g., OCD) n Hierarchical Bayes ADO

69 Hierarchical Bayes ADO (HADO) (Kim, Pitt, Lu, Steyvers & Myung, 2014) [Diagram: the ADO cycle of design optimization, experiment, observed outcome, and parameter updating, extended with hierarchical updating of a joint posterior whose posterior predictive supplies the prior]

70 Lab team and collaborators Mark Pitt Dan Cavagnaro Woojae Kim Zhong-Lin Lu (OSU) Hairong Gu (OSU) Yun Tang (OSU) Mark Steyvers (UCI) Rich Gonzalez (U Michigan) Gabe Aranovich (Stanford) Sam McClure (Stanford) 70

71 Much ADO about Nothing - William Shakespeare 71

72 Thank You! 72


More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28

More information

Clustering and Gaussian Mixture Models

Clustering and Gaussian Mixture Models Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap

More information

Hybrid Bayesian-frequentist approaches for small sample trial design: examples and discussion on concepts.

Hybrid Bayesian-frequentist approaches for small sample trial design: examples and discussion on concepts. Hybrid Bayesian-frequentist approaches for small sample trial design: examples and discussion on concepts. Stavros Nikolakopoulos Kit Roes UMC Utrecht Outline Comfortable or not with hybrid Bayesian-frequentist

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian

More information

Algebraic Geometry and Model Selection

Algebraic Geometry and Model Selection Algebraic Geometry and Model Selection American Institute of Mathematics 2011/Dec/12-16 I would like to thank Prof. Russell Steele, Prof. Bernd Sturmfels, and all participants. Thank you very much. Sumio

More information

Introduction to Bayesian Inference

Introduction to Bayesian Inference Introduction to Bayesian Inference p. 1/2 Introduction to Bayesian Inference September 15th, 2010 Reading: Hoff Chapter 1-2 Introduction to Bayesian Inference p. 2/2 Probability: Measurement of Uncertainty

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

More information

Generative Models for Discrete Data

Generative Models for Discrete Data Generative Models for Discrete Data ddebarr@uw.edu 2016-04-21 Agenda Bayesian Concept Learning Beta-Binomial Model Dirichlet-Multinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers

More information

Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier

Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically

More information

Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4)

Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4) Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4) Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/bayes/ Bayesian

More information

Inference for a Population Proportion

Inference for a Population Proportion Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Uniform Sources of Uncertainty for Subjective Probabilities and

Uniform Sources of Uncertainty for Subjective Probabilities and Uniform Sources of Uncertainty for Subjective Probabilities and Ambiguity Mohammed Abdellaoui (joint with Aurélien Baillon and Peter Wakker) 1 Informal Central in this work will be the recent finding of

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping : Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology

More information

Towards a Theory of Decision Making without Paradoxes

Towards a Theory of Decision Making without Paradoxes Towards a Theory of Decision Making without Paradoxes Roman V. Belavkin (R.Belavkin@mdx.ac.uk) School of Computing Science, Middlesex University London NW4 4BT, United Kingdom Abstract Human subjects often

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Statistics for the LHC Lecture 1: Introduction

Statistics for the LHC Lecture 1: Introduction Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

A Bayesian model for event-based trust

A Bayesian model for event-based trust A Bayesian model for event-based trust Elements of a foundation for computational trust Vladimiro Sassone ECS, University of Southampton joint work K. Krukow and M. Nielsen Oxford, 9 March 2007 V. Sassone

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

Statistical Rock Physics

Statistical Rock Physics Statistical - Introduction Book review 3.1-3.3 Min Sun March. 13, 2009 Outline. What is Statistical. Why we need Statistical. How Statistical works Statistical Rock physics Information theory Statistics

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

Bayesian Inference. Will Penny. 24th February Bayesian Inference. Will Penny. Bayesian Inference. References

Bayesian Inference. Will Penny. 24th February Bayesian Inference. Will Penny. Bayesian Inference. References 24th February 2011 Given probabilities p(a), p(b), and the joint probability p(a, B), we can write the conditional probabilities p(b A) = p(a B) = p(a, B) p(a) p(a, B) p(b) Eliminating p(a, B) gives p(b

More information

Supplementary Material: A Robust Approach to Sequential Information Theoretic Planning

Supplementary Material: A Robust Approach to Sequential Information Theoretic Planning Supplementar aterial: A Robust Approach to Sequential Information Theoretic Planning Sue Zheng Jason Pacheco John W. Fisher, III. Proofs of Estimator Properties.. Proof of Prop. Here we show how the bias

More information

A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace

A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I kevin small & byron wallace today a review of probability random variables, maximum likelihood, etc. crucial for clinical

More information

Lectures on Statistical Data Analysis

Lectures on Statistical Data Analysis Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk

More information

Bayesian Statistics for Personalized Medicine. David Yang

Bayesian Statistics for Personalized Medicine. David Yang Bayesian Statistics for Personalized Medicine David Yang Outline Why Bayesian Statistics for Personalized Medicine? A Network-based Bayesian Strategy for Genomic Biomarker Discovery Part One Why Bayesian

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks

The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks Yingfei Wang, Chu Wang and Warren B. Powell Princeton University Yingfei Wang Optimal Learning Methods June 22, 2016

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 5 Bayesian Learning of Bayesian Networks CS/CNS/EE 155 Andreas Krause Announcements Recitations: Every Tuesday 4-5:30 in 243 Annenberg Homework 1 out. Due in class

More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

Bayesian model comparison and distinguishability

Bayesian model comparison and distinguishability Bayesian model comparison and distinguishability Julien Diard (julien.diard@upmf-grenoble.fr) Laboratoire de Psychologie et NeuroCognition CNRS-UPMF Grenoble, France Abstract This paper focuses on Bayesian

More information

Value of Information Analysis with Structural Reliability Methods

Value of Information Analysis with Structural Reliability Methods Accepted for publication in Structural Safety, special issue in the honor of Prof. Wilson Tang August 2013 Value of Information Analysis with Structural Reliability Methods Daniel Straub Engineering Risk

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 17. Bayesian inference; Bayesian regression Training == optimisation (?) Stages of learning & inference: Formulate model Regression

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Intro to Probability. Andrei Barbu

Intro to Probability. Andrei Barbu Intro to Probability Andrei Barbu Some problems Some problems A means to capture uncertainty Some problems A means to capture uncertainty You have data from two sources, are they different? Some problems

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information