Hierarchical Bayes. Peter Lenk. Stephen M Ross School of Business at the University of Michigan September 2004
1 Hierarchical Bayes. Peter Lenk. Stephen M. Ross School of Business at the University of Michigan. September 2004
2 Outline. Bayesian Decision Theory. Simple Bayes and Shrinkage Estimates. Hierarchical Bayes. Numerical Methods. Batting Averages. HB Interaction Model.
3 Bayesian Decision Model. Decision Model: Actions, States, Consequences. Inference Model: Prior, Likelihood. Separable parts: don't mix them. Updating: Bayes Theorem gives the Posterior. Loss Function: Integration gives the Result. Optimal Decision: Marketing Action & Inference.
4 Bayes Theorem. Model for the data given parameters: f(y | θ), where θ = unknown parameters. E.g. Y_i = µ + ε_i and θ = (µ, σ). Likelihood: l(θ) = f(y_1 | θ) f(y_2 | θ) ··· f(y_n | θ). Prior distribution of parameters: p(θ). Update the prior: p(θ | Data) = l(θ) p(θ) / f(y), where f(y) = marginal distribution of the data.
5 Easy Example: Estimate the mean of a normal distribution. Y_i = µ + ε_i. The error terms {ε_i} are iid normal with mean zero and standard deviation σ. Assume that σ is known.
6 Conjugate Prior for the Mean. The prior distribution for µ is normal with prior mean m_0 and prior variance v_0². Precision is 1/v_0².
7 Posterior Distribution. Observe n data points. The posterior distribution is normal with mean m_n and variance v_n²: m_n = w m_0 + (1 − w) ȳ, where w = (1/v_0²) / (1/v_0² + n/σ²) and 0 < w < 1, and the posterior precision is 1/v_n² = 1/v_0² + n/σ².
8 Shrinkage Estimators. Bayes estimators combine prior guesses with sample estimates. If the prior precision is larger than the sample precision (the prior has more information), then put more weight on the prior mean. If the sample precision is larger than the prior precision (the sample has more information), then put more weight on the sample average.
9 Example. Y is normal with mean 10 and variance 16. Normal prior for the population mean: mean = 5 & variance = 2. The prior is informative and way off. Data: n = 5, average = 10.9, variance = 4.7. The posterior is normal with mean = 7.4 and variance = 1.2.
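The conjugate update on the previous slides can be sketched in a few lines. This is a minimal illustration, not code from the lecture; it reproduces the example's figures up to the slide's rounding, for both the informative prior here and the flatter prior used later.

```python
# Conjugate normal update with known data variance sigma^2 = 16.
# m0, v0sq are the prior mean and variance; the names are mine.

def normal_posterior(m0, v0sq, ybar, n, sigma_sq):
    """Posterior mean and variance for a normal mean with known variance."""
    prior_prec = 1.0 / v0sq          # precision = 1/variance
    data_prec = n / sigma_sq         # precision of the sample average
    post_var = 1.0 / (prior_prec + data_prec)
    w = prior_prec * post_var        # weight on the prior mean
    post_mean = w * m0 + (1 - w) * ybar
    return post_mean, post_var

# Informative prior (variance 2): posterior pulled toward the prior mean 5.
mean_inf, var_inf = normal_posterior(m0=5, v0sq=2, ybar=10.9, n=5, sigma_sq=16)
# Flatter prior (variance 10): posterior stays near the sample average 10.9.
mean_flat, var_flat = normal_posterior(m0=5, v0sq=10, ybar=10.9, n=5, sigma_sq=16)
```

With the flatter prior the weight on the prior mean drops from about 0.62 to about 0.24, which is exactly the shrinkage behavior the slides describe.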
10 Prior & Posterior, n = 5. [figure: prior, likelihood, and posterior densities, with the prior mean, sample mean, and posterior mean marked]
11 Prior & Posterior, n = 50. [figure: prior, likelihood, and posterior densities]
12 Use a Less Informative Prior. Y is normal with mean 10 and variance 16. Normal prior for the population mean: mean = 5 & variance = 10 instead of 2. The prior is flatter. Data: n = 5, average = 10.9, variance = 4.7. The posterior is normal with mean = 9.6 and variance = 2.3.
13 Prior & Posterior, n = 5. [figure: prior, likelihood, and posterior densities]
14 Prior & Posterior, n = 50. [figure: prior, likelihood, and posterior densities]
15 Summary. The prior has less effect as the sample size increases. Very informative priors give good results with smaller samples if the prior information is correct. If you really don't know, then use flatter or less informative priors.
16 What about Marketing? The HB revolution in how we think about customers.
17 Henry Ford: All Customers are the Same.
18 Alfred Sloan: Several Common Preferences.
19 Continuous Heterogeneity.
20 Profit Maximization.
21 It Can Get Wild!
22 HB Model for Weekly Spending. Within-subject model: Y_ij = µ_i + ε_ij and var(ε_ij) = σ². Heterogeneity in mean weekly spending, or between-subjects: µ_i = θ + δ_i and var(δ_i) = τ². Prior distribution: θ is N(u_0, v_0²). The variances are known.
23 Variances & Covariances. Var(Y_ij | µ_i) = σ² (known µ_i). Var(Y_ij | θ) = τ² + σ² (unknown µ_i). Cov(Y_ij, Y_ik) = τ² for j not equal to k. Observations from different subjects are independent.
24 Precisions = 1/Variance. The prior precision is Pr(θ) = 1/v_0². Pr(µ_i | θ) = 1/τ² and Pr(Ȳ_i | µ_i) = n_i/σ², where Ȳ_i is the average of subject i's n_i observations. Pr(Ȳ_i | θ) = 1/(τ² + σ²/n_i).
25 Joint Distribution. P(Y, µ, θ) = h(θ | u_0, v_0²) × Π_i g(µ_i | θ, τ²) × Π_i Π_j f(y_ij | µ_i, σ²): the prior, the between-subjects model, and the within-subjects model.
26 Bayes Theorem. P(µ, θ | Y) = P(Y, µ, θ) / ∫∫ P(Y, µ, θ) dµ dθ = P(Y, µ, θ) / P(Y), so P(µ, θ | Y) ∝ P(Y, µ, θ). The denominator is constant because Y is fixed & known.
27 Bayes Estimator. Posterior means are optimal under squared error loss: E(µ_i | Y) and E(θ | Y). The measure of accuracy is the posterior variance: var(µ_i | Y) and var(θ | Y).
28 Posterior Distribution of θ. Normal distribution. The posterior mean is u_N, the posterior variance is v_N², and the posterior precision is Pr(θ | Y) = 1/v_N².
29 Posterior Precision of θ. Pr = Precision = 1/Variance. Pr(θ | Y) = Pr(θ) + Σ_i Pr(Ȳ_i | θ), where Pr(θ) = 1/v_0² and Pr(Ȳ_i | θ) = 1/(τ² + σ²/n_i).
30 Posterior Mean of θ. u_N = w_0 u_0 + Σ_i w_i Ȳ_i, where w_0 = Pr(θ) / Pr(θ | Y) and w_i = Pr(Ȳ_i | θ) / Pr(θ | Y).
31 Updating of θ. Prior mean u_0 → posterior mean u_N. Prior variance v_0² → posterior variance v_N².
32 Posterior Mean of µ_i. E(µ_i | Y) = α_i Ȳ_i + (1 − α_i) u_N, where α_i = Pr(Ȳ_i | µ_i) / [Pr(Ȳ_i | µ_i) + Pr(µ_i | θ)] = (n_i/σ²) / (n_i/σ² + 1/τ²).
33 Between-Subject Heterogeneity in Mean Household Spending. [figure: heterogeneity distribution of spending]
34 Between & Within Subjects Distributions. [figure: heterogeneity distribution and within-subject distributions for subjects 1, 2, and 3]
35 2 Observations per Subject. [figure: heterogeneity distribution with two observations per subject]
36 Subject Averages. [figure: heterogeneity distribution with subject averages marked]
37 Pooled Estimate of Mean. [figure: heterogeneity distribution, subject distributions, and the pooled estimate of population average spending]
38 Sample Estimates. The disaggregate estimate of µ_i uses only the observations for subject i. Super if there are 30 or more observations per subject. The pooled or aggregate estimator Ȳ of θ has smaller sampling error but ignores individual differences.
39 HB Shrinkage Estimator. Take a combination of the individual average and the pooled average: w_i Ȳ_i + (1 − w_i) Ȳ. What are the correct weights? HB automatically gives optimal weights based on: the prior variance of µ_i, the number of observations for subject i, the variance of past spending for subject i, the number of subjects, and the amount of heterogeneity in household means.
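A minimal sketch of these weights, assuming the within-subject variance σ² and the between-subject variance τ² are known; the function name and the two example subjects are illustrative, not from the lecture.

```python
# Shrink a subject's own average toward the pooled mean. The weight on the
# subject's average grows with n_i (more data for that subject) and with
# tau^2 (more heterogeneity, so the pooled mean is less informative).

def shrinkage_estimate(ybar_i, n_i, pooled_mean, sigma_sq, tau_sq):
    data_prec = n_i / sigma_sq       # precision of the subject average
    prior_prec = 1.0 / tau_sq        # precision of the between-subject model
    w = data_prec / (data_prec + prior_prec)
    return w * ybar_i + (1 - w) * pooled_mean

few_obs = shrinkage_estimate(ybar_i=120.0, n_i=2, pooled_mean=100.0,
                             sigma_sq=400.0, tau_sq=25.0)
many_obs = shrinkage_estimate(ybar_i=120.0, n_i=20, pooled_mean=100.0,
                              sigma_sq=400.0, tau_sq=25.0)
# With only 2 observations the estimate is pulled strongly toward 100;
# with 20 observations it stays much closer to the subject's own 120.
```

This is the same weighting as the posterior mean of µ_i on slide 32, just written as a standalone function.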
40 Shrinkage Estimates. [figure: heterogeneity distribution with shrinkage estimates for subjects 1, 2, and 3]
41 20 Observations per Subject. [figure: heterogeneity distribution with 20 observations per subject]
42 Bayes & Shrinkage Estimates. Bayes estimators automatically determine the optimal amount of shrinkage to minimize MSE for the true parameters and predictions. They borrow strength from all subjects, trading off some bias for variance reduction.
43 Good & Bad News. Only simple models result in closed-form equations. The models we use in marketing require numerical methods to compute posterior means, posterior standard deviations, predictions, and so on.
44 Numerical Integration. Compute the posterior mean of a function T(θ): E[T(θ) | y] = ∫ T(θ) p(θ | y) dθ.
45 T(x) and f(x). [figure: T(x)f(x) plotted against x]
46 Trapezoid Rule. [figure: T(x)f(x) approximated by trapezoids]
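The trapezoid rule applied to the posterior-mean integral can be sketched as follows. The posterior here is a standard normal and T(θ) = θ², chosen as stand-ins because the exact answer is 1; none of this is from the slides.

```python
# Trapezoid-rule approximation of E[T(theta) | y] = integral of T(theta) p(theta | y).
import math

def trapezoid(fn, lo, hi, k):
    """Integrate fn over [lo, hi] with k equal-width trapezoids."""
    h = (hi - lo) / k
    total = 0.5 * (fn(lo) + fn(hi))
    total += sum(fn(lo + i * h) for i in range(1, k))
    return total * h

def posterior_density(theta):      # N(0, 1) density standing in for p(theta | y)
    return math.exp(-0.5 * theta * theta) / math.sqrt(2 * math.pi)

def integrand(theta):              # T(theta) * p(theta | y) with T(theta) = theta^2
    return theta * theta * posterior_density(theta)

post_mean_T = trapezoid(integrand, -8.0, 8.0, 400)   # close to the exact value 1
```

The grid must cover where the posterior mass is ("where the action is"); the [-8, 8] range is wide enough here, which is exactly the point the next slide makes about grid methods.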
47 Grid Methods. Very accurate with few function evaluations. You need to know where the action is. They do not scale well to higher dimensions. You need to be very smart to make them work.
48 Monte Carlo. Generate random draws θ_1, θ_2, …, θ_m from the posterior distribution using a random number generator. Then E[T(θ) | y] ≈ (1/m) Σ_{j=1}^m T(θ_j). What happened to the density of θ?
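The same expectation as the trapezoid example, now by Monte Carlo: the density never appears explicitly because sampling from the posterior already weights the draws. The standard-normal "posterior" and T(θ) = θ² are illustrative stand-ins, not from the slides.

```python
# Monte Carlo estimate of E[T(theta) | y] from posterior draws.
import random

random.seed(0)
m = 200_000
draws = [random.gauss(0.0, 1.0) for _ in range(m)]   # pretend posterior draws
mc_estimate = sum(t * t for t in draws) / m          # E[theta^2] = 1 exactly
```

The estimate's error shrinks like 1/sqrt(m) regardless of the dimension of θ, which is why Monte Carlo scales where grid methods do not.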
49 Good & Bad News. If your computer has a random number generator for the posterior distribution, Monte Carlo is a snap to do. But your computer almost never has the correct random number generator.
50 Importance Sampling. You would like to sample from density f, but you have a good random number generator for density g. Importance sampling lets you generate random deviates from g to evaluate expectations with respect to f. Generate φ_1, …, φ_m from g.
51 Importance Sampling Approximation. ∫ T(θ) f(θ) dθ = ∫ T(φ) [f(φ)/g(φ)] g(φ) dφ ≈ Σ_{j=1}^m w_j T(φ_j), where r(φ) = f(φ)/g(φ) and w_j = r(φ_j) / Σ_{k=1}^m r(φ_k).
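The self-normalized approximation above can be sketched directly. Here the target f is a standard normal, the proposal g is a wider normal, and T(θ) = θ²; all three choices are illustrative, not from the lecture.

```python
# Self-normalized importance sampling: draws come from g, expectations under f.
import math
import random

random.seed(1)

def f(x):  # target density, N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def g_density(x, scale=3.0):  # proposal density, N(0, 3^2)
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2 * math.pi))

m = 100_000
phis = [random.gauss(0.0, 3.0) for _ in range(m)]        # draws from g
ratios = [f(p) / g_density(p) for p in phis]             # r(phi) = f/g
total = sum(ratios)
weights = [r / total for r in ratios]                    # normalized weights w_j
is_estimate = sum(w * p * p for w, p in zip(weights, phis))  # E_f[theta^2] = 1
```

Because the weights are normalized to sum to one, f only needs to be known up to a constant, which is exactly the situation with an unnormalized posterior.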
52 Markov Chain Monte Carlo. An extension of Monte Carlo where the random draws are not independent. The joint distribution f(β, φ) does not have a convenient random number generator, but the conditional distributions g(φ | β) and h(β | φ) are easy to generate from.
53 Iterative Generation from Full Conditionals. Start at φ_0. Generate β_1 from h(β | φ_0). Generate φ_1 from g(φ | β_1). … Generate β_{m+1} from h(β | φ_m). Generate φ_{m+1} from g(φ | β_{m+1}).
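The alternating scheme above is Gibbs sampling. A minimal sketch, using a bivariate normal with correlation ρ as the joint f(β, φ) because both of its full conditionals are one-dimensional normals; the target and all numbers are mine, not the slides'.

```python
# Gibbs sampler for a bivariate standard normal with correlation rho:
# beta | phi ~ N(rho * phi, 1 - rho^2) and symmetrically for phi | beta.
import random

random.seed(2)
rho = 0.8
cond_sd = (1 - rho * rho) ** 0.5
phi, draws = 0.0, []
for it in range(50_000):
    beta = random.gauss(rho * phi, cond_sd)   # draw from h(beta | phi)
    phi = random.gauss(rho * beta, cond_sd)   # draw from g(phi | beta)
    if it >= 1_000:                           # discard burn-in
        draws.append((beta, phi))

mean_beta = sum(b for b, _ in draws) / len(draws)         # approx 0
corr = sum(b * p for b, p in draws) / len(draws)          # approx rho = 0.8
```

The draws are serially dependent, so more iterations are needed than for independent Monte Carlo, but the long-run averages still converge to the right posterior quantities.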
54 Baseball Example. 90 MLB players in the 2000 season. Observe at bats (AB) and batting averages (BA) in May. Infer the distribution of batting averages across players. Predict batting averages in October using data from May.
55 Baseball Batting Averages: The Cleveland Indians. [table: May and October BA and AB for Murray, Belle, Vizquel, and Pena]
56 Estimating a Probability. n at bats in May. X = number of hits. p = batting average for the season. X has a binomial distribution with mean np and variance np(1 − p).
57 Binomial Distribution, n = 50, p = 0.3: mean = 15.00, STD = 3.24. [figure: probability of each number of successes]
58 Need a Prior for the Batting Average p, 0 < p < 1. The beta distribution is a popular choice. It has two parameters, α and β, and its density is proportional to p^(α−1) (1−p)^(β−1). Prior mean = α/(α+β).
59 Beta Prior for p. f(p) = Beta(α, β) = [Γ(α+β) / (Γ(α) Γ(β))] p^(α−1) (1−p)^(β−1) for 0 ≤ p ≤ 1, where Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx.
60 Mean and Variance. E(p) = α/(α+β) and V(p) = E(p)[1 − E(p)] / (α + β + 1).
61 Beta Distribution, alpha = 1, beta = 1, mean = 0.50. [figure: density of p]
62 Beta Distribution, alpha = 5, beta = 1, mean = 0.83. [figure: density of p]
63 Beta Distribution, alpha = 5, beta = 10, mean = 0.33. [figure: density of p]
64 Beta Distribution, alpha = 0.5, beta = 2, mean = 0.20. [figure: density of p]
65 Beta Distribution, alpha = 0.5, beta = 0.5, mean = 0.50. [figure: density of p]
66 Bayes Theorem: Update the prior for p after observing n and x. f(p | x, n) = Pr[x | p] f(p | α, β) / ∫_0^1 Pr[x | q] f(q | α, β) dq = Beta(α + x, β + n − x).
67 Inference About p: The Posterior is also Beta. Prior parameter α becomes posterior parameter α + x; prior parameter β becomes β + n − x.
68 Posterior Mean of p: It's another shrinkage estimator. E(p | x, n) = w_n p̂_n + (1 − w_n)(prior mean), where p̂_n = x/n and w_n = n / (α + β + n).
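The beta-binomial update and the shrinkage form of its posterior mean give the same number, which is easy to verify in code. The prior mean 0.25 matches the slides; the particular alpha, beta, and hit record below are illustrative.

```python
# Beta-binomial posterior mean, computed two ways: as a shrinkage average
# of the MLE x/n and the prior mean, and directly from the updated
# Beta(alpha + x, beta + n - x) parameters.

def beta_posterior_mean(alpha, beta, x, n):
    post_alpha, post_beta = alpha + x, beta + n - x   # conjugate update
    w = n / (alpha + beta + n)                        # weight on the data
    mle = x / n
    prior_mean = alpha / (alpha + beta)
    shrunk = w * mle + (1 - w) * prior_mean
    # Same number from the posterior parameters directly:
    assert abs(shrunk - post_alpha / (post_alpha + post_beta)) < 1e-12
    return shrunk

post_mean = beta_posterior_mean(alpha=5, beta=15, x=18, n=50)  # 23/70, about 0.33
```

Note that alpha + beta acts like a "prior sample size": the more prior pseudo-observations, the less the posterior mean moves toward the observed batting average.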
69 Prior p = Beta & x = Binomial, n = 50 & x = 18. Prior mean = 0.25 (alpha = 5, beta = 15). [figure: prior and posterior densities of p]
70 Hierarchical Bayes Model. Variation within batter i: X_i given p_i has a binomial distribution. Variation among batters: p_i is a beta distribution with parameters α and β. Prior distribution for α and β: gamma (chi-square) distributions.
71 Gamma Distribution. g(y) = G(r, s) = s^r y^(r−1) e^(−sy) / Γ(r) for y > 0; E(Y) = r/s and V(Y) = E(Y)/s.
72 Gamma Distribution, alpha = 1, beta = 0.1: mean = 10.00, STD = 10.00. [figure: density]
73 Gamma Distribution, alpha = 10, beta = 0.5: mean = 20.00, STD = 6.32. [figure: density]
74 Specify Prior Parameters: r, s, u & v. Priors: α is G(r, s) & β is G(u, v). E(α) = r/s and V(α) = E(α)/s, so s determines the variance relative to the mean. I used s = 0.25, so the variance is four times larger than the mean. Same for v.
75 E[p] ≈ E[α/(α+β)] ≈ r/(r+u) = p_0 and E[V(p)] ≈ c. Solving: r = p_0 (p_0(1−p_0)/c − 1) and u = (1−p_0)(p_0(1−p_0)/c − 1).
76 Specify Prior Parameters. Guess a mean of all batting averages: p_0 = 0.25. Measure of my uncertainty of that guess: c = 0.01. Parameter r = 4.4. Parameter u = 13.3.
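The moment-matching step on slide 75 can be sketched as a two-line function: pick r and u so the implied beta prior for batting averages has mean p_0 and variance c. The function name is mine; the numbers are the slides'.

```python
# Solve the beta mean/variance equations for the prior hyperparameters:
# alpha + beta = p0 * (1 - p0) / c - 1, then split it in proportion to p0.

def beta_params_from_moments(p0, c):
    total = p0 * (1 - p0) / c - 1        # implied alpha + beta
    return p0 * total, (1 - p0) * total  # r (for alpha), u (for beta)

r, u = beta_params_from_moments(p0=0.25, c=0.01)   # r = 4.4, u = 13.3 (rounded)
```

Plugging in p_0 = 0.25 and c = 0.01 reproduces the r = 4.4 on this slide, with u following from the constraint r/(r+u) = p_0.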
77 MCMC for Batting Averages. Need the full conditionals for p_i given α and β: a beta distribution. Need the full conditional for α and β given the p_i: an unknown distribution, so use the Metropolis algorithm.
78 MCMC: Full Conditionals for Player i's Batting Average p_i. f(p_i | x_i, α, β) ∝ Pr[x_i | p_i] f(p_i | α, β) ∝ p_i^(α + x_i − 1) (1 − p_i)^(β + n_i − x_i − 1) = Beta(α + x_i, β + n_i − x_i).
79 MCMC: Full Conditional for α and β. g(α, β | x_1, …, x_n, p_1, …, p_n) ∝ [Π_i f(p_i | α, β)] g(α | r, s) g(β | u, v).
80 Metropolis Algorithm. Want to generate θ from f. Instead, generate a candidate value φ from g(· | θ); the density g can depend on θ, e.g. a random walk: φ = θ + δ. With probability α(θ, φ), accept φ as the new value of θ; with probability 1 − α(θ, φ), keep θ.
81 Transition Probability. α(θ, φ) = min{1, [f(φ) g(θ | φ)] / [f(θ) g(φ | θ)]}, where f is the full conditional density of θ. Ratios: you do not need to know the normalizing constants. Usually compute α on the log scale. Works if the densities are not zero; works better if g is close to f.
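A random-walk Metropolis sketch of the acceptance rule above. The target is an unnormalized Beta(26, 68)-like density, roughly matching the posterior parameter estimates later in the talk, with constants dropped on purpose since only ratios matter; the step size and iteration counts are my own choices.

```python
# Random-walk Metropolis: symmetric proposal, so g cancels in the ratio and
# log alpha = log f(candidate) - log f(current), computed on the log scale.
import math
import random

random.seed(3)

def log_f(p):
    """Log of an unnormalized Beta(26, 68) density."""
    if not 0 < p < 1:
        return float("-inf")     # candidates outside (0, 1) are always rejected
    return 25 * math.log(p) + 67 * math.log(1 - p)

theta, draws = 0.3, []
for it in range(60_000):
    cand = theta + random.gauss(0.0, 0.05)        # random walk proposal
    log_alpha = log_f(cand) - log_f(theta)        # min{1, .} handled by the test below
    if math.log(random.random()) < log_alpha:     # accept with probability alpha
        theta = cand
    if it >= 5_000:                               # discard burn-in
        draws.append(theta)

post_mean = sum(draws) / len(draws)   # close to 26 / (26 + 68), about 0.277
```

Comparing log(u) to the log ratio automatically implements min{1, ·}: when the ratio exceeds 1 its log is positive and the candidate is always accepted.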
82 Alpha and Beta vs Iteration. [figure: trace plots of ALPHA and BETA across MCMC iterations]
83 Posterior of Alpha. [figure: histogram of posterior draws of alpha]
84 Posterior of Beta. [figure: histogram of posterior draws of beta]
85 Parameter Estimates.
            α (std)       β (std)
Prior       17.8 (8.4)    53.2 (14.6)
Posterior   26.2 (4.6)    68.2 (1.7)
86 Distribution of Batting Averages. [figure: prior density and the posterior density after May]
87 Prediction of Season Averages. [table: RMSE and MAPE for the MLE and Bayes predictions]
88 Batting Averages: Bayes Shrinks the MLE. [figure: season averages plotted against May averages for the MLE and the Bayes estimates]
89 HB Conjoint: Lenk, DeSarbo, Green, Young (1996). Subjects evaluated computer profiles on a 0 to 10 scale for likelihood to purchase: 0 = would not buy, 10 = would definitely buy. Design: 178 subjects, 13 attributes with two levels each, 20 profiles per subject.
90 Attributes: Effect Coding. 1. Hotline support 2. RAM 3. Screen Size 4. CPU 5. Hard Disk 6. Multimedia 7. Cache 8. Color 9. Retail Store 10. Warranty 11. Software 12. Guarantee 13. Price
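Effect coding for two-level attributes can be sketched as follows: each attribute enters the design matrix as +1 for one level and −1 for the other, so the two level effects are ±β and sum to zero. The attribute names follow the slide; the example profile is illustrative, not one from the study.

```python
# Build one effect-coded design-matrix row from a profile of 13 two-level
# attributes, coded 0/1 per attribute and mapped to -1/+1.

ATTRIBUTES = ["Hotline", "RAM", "ScreenSize", "CPU", "HardDisk", "Multimedia",
              "Cache", "Color", "RetailStore", "Warranty", "Software",
              "Guarantee", "Price"]

def effect_code(profile):
    """Map a dict of {attribute: level in {0, 1}} to a +1/-1 design row."""
    return [1 if profile[a] == 1 else -1 for a in ATTRIBUTES]

row = effect_code({a: i % 2 for i, a in enumerate(ATTRIBUTES)})
# Alternating levels give an alternating row of -1 and +1 entries.
```

With effect coding (unlike 0/1 dummy coding), the intercept is the grand mean across profiles, which makes the constant term in the partworth tables directly interpretable.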
91 Subject Covariates. Female: 1 if female and 0 if male. Years: # years of work experience. Own: 1 if owns a computer & 0 else. Nerd: 1 if technical background & 0 else. Apply: # of software applications. Expert: self-reported expertise rating.
92 Summary Statistics for Covariates. [table: mean, standard deviation, minimum, and maximum for FEMALE, YEARS, OWN, NERD, APPLY, and EXPERT]
93 Interaction Model. Within-subjects: Y_i = X_i β_i + ε_i for i = 1, …, m, with ε_i ~ N(0, σ² I_{n_i}). Between-subjects heterogeneity: β_i = Θ z_i + δ_i, with δ_i ~ N_p(0, Λ).
94 Average over Posterior Means and Std Dev of Partworths Across Subjects. [table: posterior mean and posterior std of the partworths for the constant, hotline, RAM, screen size, CPU, hard disk, multimedia, cache, color, distribution, warranty, software, guarantee, and price]
95 Impact of Covariates on Partworths. [table: effects of the constant, FEMALE, YEARS, OWN, NERD, APPLY, and EXPERT on the constant, RAM, CPU, distribution, warranty, guarantee, and price partworths]
96 Summary. HB allows individual-level coefficients. It is a two-level model: within subjects and between subjects (heterogeneity). HB shrinks unstable subject-level estimators toward the population mean.
97 Summary. BDT provides an integrated framework for making decisions and inference. Good models consider all sources of uncertainty. Good methods keep track of all sources of uncertainty. Bayes does both.
More informationJAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger
JAB Chan Long-tal clams development ASTIN - September 2005 B.Verder A. Klnger Outlne Chan Ladder : comments A frst soluton: Munch Chan Ladder JAB Chan Chan Ladder: Comments Black lne: average pad to ncurred
More informationBAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup
BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS Darusz Bskup 1. Introducton The paper presents a nonparaetrc procedure for estaton of an unknown functon f n the regresson odel y = f x + ε = N. (1) (
More informationTarget tracking example Filtering: Xt. (main interest) Smoothing: X1: t. (also given with SIS)
Target trackng example Flterng: Xt Y1: t (man nterest) Smoothng: X1: t Y1: t (also gven wth SIS) However as we have seen, the estmate of ths dstrbuton breaks down when t gets large due to the weghts becomng
More informationEvaluation for sets of classes
Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)
Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of
More informationLimited Dependent Variables and Panel Data. Tibor Hanappi
Lmted Dependent Varables and Panel Data Tbor Hanapp 30.06.2010 Lmted Dependent Varables Dscrete: Varables that can take onl a countable number of values Censored/Truncated: Data ponts n some specfc range
More informationLecture 3 Stat102, Spring 2007
Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationImproved conditional approximations of the population Fisher Information Matrix
Improved condtonal approxmatons of the populaton Fsher Informaton Matrx Joakm Nyberg, Sebastan Ueckert and Andrew C. Hooker Pharmacometrcs research group Department of Pharmaceutcal Boscences Uppsala Unversty
More information8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF
10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experment-I MODULE VIII LECTURE - 34 ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS EFFECTS MODEL Dr Shalabh Department of Mathematcs and Statstcs Indan
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationBias-correction under a semi-parametric model for small area estimation
Bas-correcton under a sem-parametrc model for small area estmaton Laura Dumtrescu, Vctora Unversty of Wellngton jont work wth J. N. K. Rao, Carleton Unversty ICORS 2017 Workshop on Robust Inference for
More informationMultiple Choice. Choose the one that best completes the statement or answers the question.
ECON 56 Homework Multple Choce Choose the one that best completes the statement or answers the queston ) The probablty of an event A or B (Pr(A or B)) to occur equals a Pr(A) Pr(B) b Pr(A) + Pr(B) f A
More information8/25/17. Data Modeling. Data Modeling. Data Modeling. Patrice Koehl Department of Biological Sciences National University of Singapore
8/5/17 Data Modelng Patrce Koehl Department of Bologcal Scences atonal Unversty of Sngapore http://www.cs.ucdavs.edu/~koehl/teachng/bl59 koehl@cs.ucdavs.edu Data Modelng Ø Data Modelng: least squares Ø
More informationProbability Theory (revisited)
Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted
More information( ) ( ) ( ) ( ) STOCHASTIC SIMULATION FOR BLOCKED DATA. Monte Carlo simulation Rejection sampling Importance sampling Markov chain Monte Carlo
SOCHASIC SIMULAIO FOR BLOCKED DAA Stochastc System Analyss and Bayesan Model Updatng Monte Carlo smulaton Rejecton samplng Importance samplng Markov chan Monte Carlo Monte Carlo smulaton Introducton: If
More informationLossy Compression. Compromise accuracy of reconstruction for increased compression.
Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost
More informationBasic Business Statistics, 10/e
Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson
More informationComputing MLE Bias Empirically
Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.
More informationBayesian Learning. Smart Home Health Analytics Spring Nirmalya Roy Department of Information Systems University of Maryland Baltimore County
Smart Home Health Analytcs Sprng 2018 Bayesan Learnng Nrmalya Roy Department of Informaton Systems Unversty of Maryland Baltmore ounty www.umbc.edu Bayesan Learnng ombnes pror knowledge wth evdence to
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationStatistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals
Internatonal Journal of Scentfc World, 2 1) 2014) 1-9 c Scence Publshng Corporaton www.scencepubco.com/ndex.php/ijsw do: 10.14419/jsw.v21.1780 Research Paper Statstcal nference for generalzed Pareto dstrbuton
More informationHere is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)
Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,
More informationLet the Shape Speak - Discriminative Face Alignment using Conjugate Priors
Let the Shape Speak - Dscrmnatve Face Algnment usng Conjugate Prors Pedro Martns, Ru Casero, João F. Henrques, Jorge Batsta http://www.sr.uc.pt/~pedromartns Insttute of Systems and Robotcs Unversty of
More information