Threshold models with fixed and random effects for ordered categorical data

Size: px

Start display at page:

Download "Threshold models with fixed and random effects for ordered categorical data"

David Poole
6 years ago
Views:

1 Threshold models with fixed and random effects for ordered categorical data Hans-Peter Piepho Universität Hohenheim, Germany Edith Kalka Universität Kassel, Germany

2 Contents 1. Introduction. Case studies 3. Numerical issues 4. Simulations 5. Conclusion

3 1. Introduction Analysis of ordered categorical data: ANOVA, LSD Transformation to normality Nonparametric methods Threshold model

4 Percent Consumer Employee Fig. 1.1: Observed frequencies for snack food scores (Best et al., 000).

5 0.16 Consumers π π 1 π π 6 π 3 π 4 π 7 π 8 π θ 1 θ θ 3 θ 4 θ 5 θ 6 θ 7 θ 8 m Latent variable

6 0.16 Employees θ 1 θ θ 3 θ 4 θ 5 θ 6 θ 7 θ 8 m Latent variable

7 Latent variables Latent variable m (quantitative) observed score y (ordered categorical) " increasing the energy of the physical stimulus, or the concentration or amount of food ingredient, should result in an increase in how strong something feels, looks, smells, or tastes. We increase the salt in a soup, and it tastes saltier." (Lawless and Heymann, 1998, p. 08)

8 Multinomial probabilities Threshold values θ 1, θ,... Multinomial probabilities π 1, π, π1 π π... 3 π I π = Φ 1 Φ Φ ( θ η) ( θ η) Φ( θ η) ( θ η) Φ( θ η) = 1 = 3 ( θ I η) Φ( θ η) ( θ η) = Φ Φ 1 I 1 I = 1 I Φ = standard normal c.d.f. η = mean on the latent scale I = number of categories

9 Linear modelling Expected value on latent scale: η k = µ + α k where η k = mean for the k-th panel (k = 1: consumers k = : employees) α k = effect of k-th panel Random variable on latent scale: m jk = η k + e jk e jk ~ N ( 0,1)

10 Percent Consumer Employee Fig. 1.3: Fitted frequencies of scores for snack food scores.

11 Percent Mean= 0 Mean= Var(scores) = 3.65 Var(scores) = Fig. 1.4: Fitted frequencies based on threshold model for snack food scores, assuming means of η = 0 and η = 3 for the latent variable.

12 Percent Sample1 Sample Var(scores)=1.77 Var(scores)= Fig. 1.5: Histogram of observed overall acceptance data (Villanueva et al., 000). Scores display significant variance heterogeneity by an LR-test, assuming normality (p = < ).

13 Percent sample1 sample Fig. 1.6: Fitted histogram of overall acceptance data (Villanueva et al., 000), assuming threshold model with homogeneous variance ( log L = ).

14 Percent sample1 sample σ 1 = 1 σ = Fig. 1.7: Fitted histogram of overall acceptance data (Villanueva et al., 000), assuming threshold model with heterogeneous variance ( log L = ). σ k = standard deviation on latent scale for k-th product.

15 Unequal variance on latent scale invalidates ANOVA of observed scores! Example: Date ganerated by threshold model with three categories (θ 1 = 1 and θ = 1) Parameter Group 1 Group Latent scale: Mean Variance Observed scale: Mean.4.

16 . Case studies Case study 1: Consumer preferences for on farm and industrially processed quarg samples nine point hedonic scale 1 = dislike extremely to 9 = like extremely 15 consumers Seven quarg products

17 An initial model m ij = α i + u j + e ij (.1) where m ij = latent variable for i-th product and j-th assessor α i = expected value of i-th product u j = random effect of j-th assessor; u j ~ N(0, σ u e ij = random deviation of j-th assessor for i-th product; e ij N(0, 1) σ u corr( m ij, m i ' j ) = σ + 1 u )

18 Table.1: The Johnson System for u j. Distribution Model for u j Support ( ) S L u j exp w j S B u j = 0 < u j exp = φ 1+ exp ( w ) j ( w ) ( ) S U u j φ sinh w j ( ) : w N µ, σ j ~ w w j 0 u j φ = < u j

19 Factor-analytic models m ij = α i + λ i u j + e ij λ i = factor loading for i-th product u j = latent value for j-th assessor; u j N(0,1) e ij = error; e ij N(0,1) Model implies heterogeneity: var( m ) λ ij = i + corr( m ij, m i ' j 1 ) = i ( )( λ + 1 λ + 1) i λ λ i i

20 Model selection by AIC Akaike Information Criterion = AIC: AIC = log L + p where log L = maximized log-likelihood p = number of parameters "Smaller is better"

21 Table.: Model fits for quarg data. Model for latent variable Distribution for random AIC effects m ij = α i + e ij ( 0, ) m ij = α i + u j + e ij j ~ N u u σ ( u, µ, σ ) u u j ~ S L w w φ (, µ, ) w σ (, µ, σ ) j ~ S B w φ j ~ S U w w m ij = α i + λ i u j + e ij ~ N( 0,1 ) m ij = α i + λ i u j1 + λ i u j + e ij ~ N( 0, 1) m ij = α i + λ i u j1 + λ i u j + λ i3 u j3 + e ij ~ N( 0,1) φ u j u jk u jk

22 Factor-analytic threshold model (two multiplicative terms): Estimate of Product α i ANOVA of scores: Product Score mean 1.10 bc.0 c d be de 6.11 bc 7.88 a SED b 5.60 b c b c b a LSD Estimates in a column followed by the same letter are not significantly different.

23 Artificial example Four treatments (T1, T, T3, T4) Comparison of T1 and T4 significant Initial display: T1 a T a T3 a misses significance T4 a of T1-T4 comparison Insertion: T1 a b T1 a b T1 b T a b T a b T a b T3 a b T3 a b T3 a b T4 a b T4 a b T4 a

24 7 Observed score mean Estimated effect Fig..1: Plot of observed score means (ANOVA) versus estimated effects under threshold model for seven quarg products.

25 Expected score mean Linear predictor Fig..4: Plot of score mean versus linear predictor under estimated threshold model for quarg data.

26 Case study : Sensory descriptive analysis of on farm and industrial processed quarg samples by a trained panel trained panel of 15 assessors 8 quarg samples (B1-B8) 1 sensory attributes 1 sessions ordinal scale (0 = non- detectable to 5 = very strong)

27 Mixed linear model (per attribute): m ijk = u j + τ i + α k + (ατ) ik + e ijk (.6) where u j τ i α k = effect of j-th assessor = effect of i-th product = effect of k-th session (random) (ατ) ik = interaction of i-th product and k-th session (random)

28 Principal component analysis Two standardizations: σ = I i= 1 ( τ i τ ) ( I 1) (.7a) σ = σ ατ +1 (.7b) singular value decomposition of standardized values

29 PC Axis.5 B5 1.5 e b l B7 g B B8 B3 B4 i B1 f h c k a j d B PC Axis 1 Fig..5: Biplot for PCA of 8 quarg samples and 1 sensory attributes based on standardization (.7a). 83% of variance are explained. B1-B8: samples; letters: attributes)

30 PC Axis.5 B5 1.5 e g B3 B4 c d k b l B i a d B1 j B6-1.5 B7 B8 h PC Axis 1 Fig..6: Biplot for PCA of 8 quarg samples and 1 sensory attributes based on standardization (.7b). 91% of variance are explained. B1-B8: samples; letters: attributes).

31 Heterogeneity of variance We hypothesized that variance was larger for samples B1 through B4 Fit model with heterogeneity in variance of interaction (ατ) ik. σ σ ατ (1) ατ () : variance for products B1 to B4 : variance for products B5 to B8

32 Table.7: Variance components (case study ). *: p-value < Homogeneous Heterogeneous LR-test B1-B8 B1-B4 B5-B8 σ σ σ ατ ατ 1) ατ () Attributes ( Granular ns Separation of whey * Creamy yellow color * Bitter ns Sour ns Flavour of starter cultures * Flavour of yoghurt * Rancid * Other flavour ns Firm * Gritty ns Creamy ns

33 Inter-assessor agreement Assessed separately for each session m ij = u j + τ i + e ij (.8) u j = effect of j-th assessor τ i = effect of i-th product τ i ~ N(0, σ τ ) e ij = random error; standard normal Intra-class correlation: ρ IC = σ τ 1+ σ τ

34 Table.9: Estimates of the intra-class correlation ρ IC and Kendall's coefficient of concordance for quarg data. Attributes Session granular colour sour Kendall's measure of concordance: Intraclass correlation (ρ IC ):

35 Case study 3 Consumer preferences for raspberry beverages Two regions (geographies) 8 beverages Panel 1 (13 assessors): 1 = dislike extremely, 9 = like extremely Panel (7 assessors): 1 =bad, 5 =good Questions: Difference in accuracy among scales Differences in ratings among regions Correlation among attributes

36 Difference in accuracy Linear model: m ij = α i + u jk + e ijk (.9) α i = effect of i-th product u jk = effect of j-th assessor on k-th scale u ~ N(0, σ ) jk e ijk = error e ~ ijk N(0, σ u(k ) e(k ) ) Separate set of thresholds for both scales! Hypotheses of interest: H 01 : H 0 : σ = σ u( 1) u() σ = σ e( 1) e()

37 Table.9: Likelihood statistics and variance component estimates for liking, raspberry beverages. Geography 1: Model restriction log L σ e(1) σ e() σ u(1) σ u() None σ = σ u( 1) u() σ e( 1) = σ e() σ e( 1) = σ e() ; σ u( 1) = σ u()

38 Geography : Model restriction log L σ e(1) σ e() σ u(1) σ u() None σ = σ u( 1) u() σ e( 1) = σ e() σ e( 1) = σ e() ; σ u( 1) = σ u()

39 Differences in ratings among regions m ijh = α i + γ ih + u jh + e ijh (.11) where α i = main effect of i-th product γ ih = deviation of i-th product in h-th geography (h = 1, ) u jh = effect of j-th assessor in h-th geography, u ~ N(0, ) jh σ e ijh = error (standard normal) u Ratings the same? H 0 : γ i1 = γ i for every product i

40 Table.1: Comparison of mixed threshold models for testing H 0 : γ 1i = γ i (consumers behave identically in both regions). Attribute: liking (case study 3). Scale Model for m ijh AIC Test of H 0 : γ 1i = γ i Wald-χ p-value 5 point α i + γ ih + u jh + e ijh α i + γ ih + λ i u jh + e ijh point α i + γ ih + u jh + e ijh < α i + γ ih + λ i u jh + e ijh <0.0001

41 Correlation among attributes liking and appearance geography 1 five-point ordinal scale Bivariate mixed model: m ij1 = α i1 + u j1 + e ij1 (.13) m ij = α i + u j + e ij where α ih = effect of i-th product for h-th attribute u jh = effect of j-th assessor for h-th attribute e ijh = residual corresponding to m ijh

42 Distributional assumptions: , 0 0 ~ u u u u u u u u j j BVN u u σ σ σ ρ σ σ ρ σ 1 1, 0 0 ~ 1 e e ij ij BVN e e ρ ρ where BVN(.,.) = bivariate normal distribution ρ u = between-product correlation ρ e = within-product correlation

43 Bivariate multinomial probabilities of observed scores: π ab with = ( θ η, θ η ρ ) P( y π 1 = a, y = b) = Φ 1a 1 b, e a b r = 1 s= 1 rs< ab (a = 1,, 5; b = 1,, 5) rs (.14) y 1, y = observed scores for liking and appearance Φ = bivariate standard normal c.d.f θ 1a, θ b = thresholds; θ 15 = θ 5 = η 1, η = linear predictors (means) on latent scale Result: ˆ ρ = ( s. e. = 0.011) (within-product analysis) e

44 Case study 4 Recovery of inter-assessor information Sensory panels of 33 assessors Three groups of cheese products Hedonic 9-point scale Tests in incomplete blocks: five or six products per assessor Type of analysis Intra-assessor analysis Assumption about assessor effects fixed Recovery of inter-assessor information random

45 Table.14: Average standard errors of a difference (s.e.d.). Average s.e.d. Use of inter-assessor Hard Fresh cheese Fresh Information cheese with additives cheese yes no

46 3. Numerical issues y = observed scores u = random effects Joint density of u and y: φ yu (y, u) = φ yu (y u)φ u (u) (3.1) where φ yu (y u) = conditional density of y, given u φ u (u) (multinomial) = marginal density of u (normal) Estimation maximize φ y ( y) = φyu ( y v) φu ( v) dv (3.)

47 Numerical integration Approximations of the likelihood poor performance for small samples Gaussian quadrature Computationally demanding Handles only one subject level All methods rely on asymptotic theory!

48 4. Simulations 4.1 No random effects Hypothesis: Show that ANOVA may be problematic with ordinal data Simulate according to: m ijk = α i + u j + e ijk (4.1) where α i = product effect (i = 1,, I) u j = random assessor effect (j = 1,, J), e ijk = error (standard normal distribution) (k = 1,.., R) Three (nine) categories

49 Simulations under: global null hypothesis departure from global null Teste evaluated: Global H 0 : ANOVA F-test LR test for threshold model H 0 : α 1 = α : ANOVA-LSD paired t-test Wald-test for threshold model

50 Table 4.5: Simulation results for threshold model with fixed effects, Type I error at nominal significance level of 5%, Global H 0 false, nine categories (θ 1 = 0.5, θ = 0.4, θ 3 = 0.3, θ 4 = 0.1, θ 5 = 0.1, θ 6 = 0., θ 7 = 0.3, θ 8 = 0.5, σ u = 0). Test of H 0 : α 1 = α. R J I α 1 α α i> ANOVA Threshold Paired LSD Wald t-test

51 4. Pairwise comparison with heteroscedasticity on latent scale Hypothesis: Heteroscedasticity on latent scale of threshold model invalidates ANOVA Simulated according to: m ij = α i + e ij (4.) where α i = product effect (i = 1, ) e ij = error of j-th replicate for i-th product (j = 1,, J i ) σ 1 = standard deviation of e 1j (product 1) σ = standard deviation of e j (product ) Nine categories.

52 Tests evaluated: Simple t-test Satterthwaite t-test (heteroscedastic) Standard threshold model Threshold model with heteroscedastic errors

53 Table 4.6: Simulation results for heteroscedastic threshold model, Type I error at nominal significance level of 5%, α 1 = α = 0. Test of H 0 : α 1 = α. Threshold model θ 1 θ θ 3 θ 4 θ 5 θ 6 θ 7 θ 8 σ t-test σ 1 = σ σ 1 σ J 1 = 50; J = 150:

54 Threshold model θ 1 θ θ 3 θ 4 θ 5 θ 6 θ 7 θ 8 σ t-test σ 1 = σ σ 1 σ J 1 = 150; J = 50:

55 4.3 One random effect Hypothessis: Asymptotics are even more critical for mixed models Simulated according to: m ijk = α i + u j + f ij + e ijk (4.3) where f ij = random assessor product interaction

56 Table 4.7: Simulation results for threshold model with fixed and random effects (4.3), Type I errors at nominal significance level of 5%. Global H 0 true, R = 4 replicates, three categories. Test of H 0 : α 1 = α. $: Wald-Test, : LR-test. Overall ANOVA Threshold Paired model ANOVA θ 1 θ σ u σ f J I global $ α 1 = α $ global α 1 = α $ α 1 = α

57 5. Conclusion ANOVA can fail badly with ordinal data Threshold model well suited to analyse ordinal data Many experimental settings call for mixed models Threshold model can be extended to cover random effects Allows great flexibility in answering relevant research questions Maximum likelihood estimation computationally demanding Asymptotics work for large samples only Tests in particular need to be taken with a gain of salt More simulations needed

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the