Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain


CORRECTIONS TO CLASSICAL PROCEDURES FOR ESTIMATING THURSTONE'S CASE V MODEL FOR RANKING DATA

Alberto Maydeu-Olivares
Instituto de Empresa, Marketing Dept.
C/Maria de Molina 11-15, 28006 Madrid, Spain
Alberto.Maydeu@ie.edu

Abstract

The classical method (Mosteller, 1951) for estimating Thurstone's Case V model for ranking data consists in (a) transforming the observed ranking patterns to patterns of binary paired comparisons, (b) obtaining the normal deviate corresponding to the mean of each binary variable, and (c) estimating the model parameters from these deviates by least squares. However, classical procedures do not take into account the dependencies among the deviates and, as a result, asymptotic standard errors (SEs) and goodness of fit (GOF) tests are incorrect. We provide formulae that yield correct asymptotic SEs and GOF tests in this situation. A small simulation study shows that adequate standard errors and goodness of fit tests can be obtained with rather small sample sizes even in very large models.

Keywords: categorical data analysis, preference data, random utility

IE WORKING PAPER WP 24/06 29//2006

1. Introduction

To model ranking data, Thurstone (1931) proposed transforming the observed ranking patterns to patterns of binary paired comparisons and fitting his paired comparisons model (Thurstone, 1927) to the transformed data. The classical method for estimating this model (Mosteller, 1951a; Torgerson, 1958) consists in obtaining the normal deviate corresponding to each paired comparisons mean, and then estimating the model parameters from these deviates by least squares. This is a mean structure approach to estimating Thurstone's Case V model, as it only uses univariate information (the means) from the paired comparisons.

Most Thurstonian models for paired comparisons and ranking data are not identified when estimated as a mean structure. Estimating them as a mean structure requires introducing unnecessary identification restrictions on the models. The most notable exception is Thurstone's Case V model for ranking data. This model is identified if estimated only from the means of the paired comparisons. Here, we provide asymptotically correct standard errors and a goodness of fit test for this model when it is estimated as a mean structure using the classical estimation procedure described.

2. Mean structure estimation of Thurstone's Case V ranking model

Suppose a random sample of N individuals ranks n stimuli according to some preference criterion. To transform the rankings to paired comparisons we construct a dichotomous variable $y_{i,i'}$ for each ordered pairwise combination of stimuli to indicate which stimulus was ranked above the other,

$$ y_{i,i'} = \begin{cases} 1 & \text{if stimulus } i \text{ is ranked above stimulus } i' \\ 0 & \text{if stimulus } i \text{ is ranked below stimulus } i', \end{cases} \tag{1} $$

where $i = 1, \dots, n-1$; $i' = i+1, \dots, n$. With n stimuli there are $\tilde{n} = \binom{n}{2} = \frac{n(n-1)}{2}$ paired comparisons.

Maydeu-Olivares (1999) showed that the probability of any such binary pattern under Thurstone's model is

$$ \Pr\Big[ \bigcap_{i,i'} y_{i,i'} \Big] = \int \cdots \int_{R} \phi_{\tilde{n}}(z^{*} : 0, \mathrm{P}) \, dz^{*}, \tag{2} $$

where $\phi_{\tilde{n}}(\cdot)$ denotes a $\tilde{n}$-variate normal density function, and R is a rectangular region with limits $(-\infty, \tau_{i,i'})$ if $y_{i,i'} = 1$ and $(\tau_{i,i'}, \infty)$ if $y_{i,i'} = 0$. Let $\tau$ be a vector obtained by stacking all thresholds $\tau_{i,i'}$ in lexicographic order. Thurstone's model imposes the following restrictions on $\tau$ and on the correlation matrix $\mathrm{P}$ (Maydeu-Olivares, 1999):

$$ \tau = D A_n \mu, \qquad \mathrm{P} = D \left( A_n \Sigma A_n' \right) D. \tag{3} $$

In (3), $\mu$ and $\Sigma$ are the mean vector and covariance matrix of the unobserved preferences (discriminal processes) assumed by Thurstone's model, $A_n$ is a $\tilde{n} \times n$ design matrix where each column corresponds to one of the stimuli and each row to one of the paired comparisons, and $D = \left( \mathrm{Diag}(A_n \Sigma A_n') \right)^{-1/2}$.

Thurstone's Case V model is a popular restricted form of Thurstone's general model in which it is assumed that $\Sigma = \sigma^2 I$. Not all parameters of model (2) subject to the Case V restrictions are identified. Given the comparative nature of the data, we need to introduce one restriction among the elements of $\mu$. Also, $\sigma^2$ is not identified. To identify Thurstone's Case V ranking model it is convenient to set $\mu_n = 0$ and $\sigma^2 = \tfrac{1}{2}$. With these identification restrictions, $A_n \Sigma A_n' = \tfrac{1}{2} A_n A_n'$ has unit diagonal, so $D = I$ and (3) simplifies to

$$ \tau = A_n \mu, \tag{4} $$

$$ \mathrm{P} = \tfrac{1}{2} A_n A_n', \tag{5} $$

where $\mathrm{P}$ is a correlation matrix whose off-diagonal elements, $\rho_{ii',jj'}$, can take the values ½, -½, or 0.

Now, since each of the variables $y_{i,i'}$ is binary, their mean is simply $\Pr(y_{i,i'} = 1)$, which we shall denote by $\pi_{i,i'}$. Under Thurstone's Case V model,

$$ \pi_{i,i'} := \Pr(y_{i,i'} = 1) = \int_{-\infty}^{\tau_{i,i'}} \phi_1(z^{*} : 0, 1) \, dz^{*} = \Phi_1(\tau_{i,i'}), \tag{6} $$

where $\Phi_1(\cdot)$ denotes a univariate standard normal distribution function. Let $\pi$ and $p$ denote the $\tilde{n}$-dimensional vectors obtained by stacking all population means and sample means, respectively, in lexicographic order. We observe in (6) that the relation between $\tau$ and $\pi$ is one-to-one.
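The coding in (1) and the restrictions (4) to (6) can be illustrated with a short Python sketch. The function names, the 0-based stimulus indices, and the convention that a ranking is stored as a list of rank positions (1 = most preferred) are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import combinations
from statistics import NormalDist


def pairs(n):
    """Pairs (i, i') with i < i' in lexicographic order (0-based indices)."""
    return list(combinations(range(n), 2))


def ranking_to_binary(ranking):
    """Eq. (1): y = 1 if stimulus i is ranked above stimulus i', else 0.

    ranking[k] is the rank position of stimulus k (1 = most preferred);
    this input convention is an assumption of the sketch.
    """
    return [1 if ranking[i] < ranking[j] else 0 for i, j in pairs(len(ranking))]


def design_matrix(n):
    """A_n: one row per pair, +1 in column i and -1 in column i'."""
    rows = []
    for i, j in pairs(n):
        row = [0] * n
        row[i], row[j] = 1, -1
        rows.append(row)
    return rows


def correlation_matrix(n):
    """Eq. (5): P = (1/2) A_n A_n'."""
    A = design_matrix(n)
    return [[0.5 * sum(x * y for x, y in zip(r, s)) for s in A] for r in A]


def case_v_means(mu):
    """Eqs. (4) and (6): pi = Phi(A_n mu), with mu_n = 0 and sigma^2 = 1/2."""
    Phi = NormalDist().cdf
    return [Phi(sum(a * m for a, m in zip(row, mu))) for row in design_matrix(len(mu))]


# Four stimuli, second stimulus ranked first, then the first, third and fourth.
y = ranking_to_binary([2, 1, 3, 4])
P = correlation_matrix(4)
```

For n = 4 this gives 6 paired comparisons. Two comparisons that share a stimulus are correlated ½ or -½ in `P`, and comparisons with no stimulus in common are uncorrelated, which is the dependence among the deviates that the classical procedure ignores.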

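Let $\mu = (\mu_1, \dots, \mu_{n-1})'$ denote the vector of identified parameters in Thurstone's Case V model for ranking data. Then, (4) can be written as $\tau = K\mu$, where $K$ is a $\tilde{n} \times (n-1)$ matrix of full column rank obtained by deleting the last column of $A_n$. Thus, Thurstone's Case V model for ranking data can be estimated as a mean structure model, as the means $\pi$ of the binary variables $y$ suffice to identify this model. The simplest estimation approach for this mean structure is (Mosteller, 1951a; Torgerson, 1958) to first estimate each threshold separately from the sample mean using $\hat{\tau}_{i,i'} = \Phi_1^{-1}(p_{i,i'})$, and then estimate the model parameters $\mu$ by least squares as

$$ \hat{\mu} = H \hat{\tau}, \qquad H = (K'K)^{-1} K'. \tag{7} $$

We shall next provide asymptotic standard errors for these estimates of $\mu$, as well as a goodness of fit test of the model.

3. Standard errors and goodness of fit tests

Let $e := (p - \pi)$. First, we notice that $\sqrt{N}\, e \xrightarrow{d} N(0, \Gamma)$, where $\xrightarrow{d}$ denotes convergence in distribution. $\Gamma$ has diagonal elements $N \,\mathrm{Var}(p_{i,i'}) = \pi_{i,i'}(1 - \pi_{i,i'})$ and off-diagonal elements $N \,\mathrm{Cov}(p_{i,i'}, p_{j,j'}) = \pi_{ii',jj'} - \pi_{i,i'}\pi_{j,j'}$. Under the model,

$$ \pi_{ii',jj'} := \Pr\left[ (y_{i,i'} = 1) \cap (y_{j,j'} = 1) \right] = \int_{-\infty}^{\tau_{i,i'}} \int_{-\infty}^{\tau_{j,j'}} \phi_2(z_1^{*}, z_2^{*} : 0, 0, 1, 1, \rho_{ii',jj'}) \, dz_2^{*} \, dz_1^{*} = \Phi_2(\tau_{i,i'}, \tau_{j,j'}, \rho_{ii',jj'}), $$

where $\rho_{ii',jj'}$ denotes an element of $\mathrm{P}$ in (5). Thus, the sample means $p$ need not be statistically independent.

We shall now obtain the asymptotic distribution of $\hat{\mu}$ in (7). Let $\Delta = \partial\pi / \partial\tau'$ be a diagonal matrix with elements $\phi_1(\tau_{i,i'} : 0, 1)$. By the multivariate delta theorem, $\sqrt{N}(\hat{\tau} - \tau) \overset{a}{=} \Delta^{-1} \sqrt{N}\, e$, where $\overset{a}{=}$ denotes asymptotic equality. Coupling this result with (7), $\sqrt{N}(\hat{\mu} - \mu) \overset{a}{=} H \Delta^{-1} \sqrt{N}\, e$. Finally, $\sqrt{N}(\hat{\mu} - \mu) \xrightarrow{d} N(0, H \Delta^{-1} \Gamma \Delta^{-1} H')$. Thus, letting $\hat{\Delta}$ and $\hat{\Gamma}$ denote $\Delta$ and $\Gamma$ evaluated at $\hat{\mu}$,

$$ \mathrm{SE}(\hat{\mu}) = \sqrt{ \mathrm{VecDiag}\left( H \hat{\Delta}^{-1} \hat{\Gamma} \hat{\Delta}^{-1} H' \right) / N }. \tag{8} $$

To test the goodness of fit of the model, consider the residual vector $\hat{e} := (p - \hat{\pi})$, where $\hat{\pi} := \pi(\hat{\mu})$. By a Taylor expansion, $\sqrt{N}(\hat{\pi} - \pi) \overset{a}{=} \Delta K \sqrt{N}(\hat{\mu} - \mu)$.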

Since $\hat{e} = e - (\hat{\pi} - \pi)$, $\sqrt{N}\, \hat{e} \overset{a}{=} (I - \Delta K H \Delta^{-1}) \sqrt{N}\, e$. Let now $\Delta_c$ be an orthogonal complement to $\bar{\Delta} := \Delta K$, that is, $\Delta_c' \bar{\Delta} = 0$. Then, $\Delta_c' \sqrt{N}\, \hat{e} \overset{a}{=} \Delta_c' \sqrt{N}\, e$, and $\Delta_c' \sqrt{N}\, e \xrightarrow{d} N(0, \Delta_c' \Gamma \Delta_c)$. Thus,

$$ T_B = N \hat{e}' \hat{\Delta}_c \left( \hat{\Delta}_c' \hat{\Gamma} \hat{\Delta}_c \right)^{-1} \hat{\Delta}_c' \hat{e} \xrightarrow{d} \chi^2_{\tilde{n}-n+1}. \tag{9} $$

Now, partition the $\tilde{n} \times \tilde{n}$ diagonal matrix $\Delta$ as

$$ \Delta = \begin{pmatrix} \Delta_{n-1} & 0 \\ 0 & \Delta_{\tilde{n}-n+1} \end{pmatrix}. $$

Then it can be readily verified that

$$ \Delta_c' = \begin{pmatrix} \Delta_{\tilde{n}-n+1} A_{n-1} \Delta_{n-1}^{-1} & I_{\tilde{n}-n+1} \end{pmatrix}. \tag{10} $$

Then, $\hat{\Delta}_c$ is simply obtained by evaluating (10) at $\hat{\mu}$.

3. Some numerical results

To illustrate the present discussion, we asked 56 Psychology undergraduate students to express their preferences using a ranking experiment for the following Psychology career areas: {A = Academic, C = Clinical, E = Educational, I = Industrial}. Their rankings were transformed to 6 paired comparisons variables using (1). The proportion of respondents that chose the first career in each of the following pairs {{A, C}, {A, E}, {A, I}, {C, E}, {C, I}, {E, I}} was p = (4, 6, 15, 41, 41, 31)/56. We estimated Thurstone's Case V ranking model from these univariate proportions, obtaining $T_B = 6.03$ on 3 d.f., p = 0.11. Thus, the model reproduces these proportions well. The estimated mean preferences, with asymptotic standard errors in parentheses, were $\hat{\mu}_A = -0.80$ (0.17), $\hat{\mu}_C = 0.71$ (0.17), and $\hat{\mu}_E = 0.22$ (0.16). The mean preference for an industrial career was fixed to zero for identification purposes. Thus, we conclude that the most preferred career path among these Psychology undergraduates is in Clinical Psychology, followed by Educational, then Industrial, and finally Academia.

Given that the model assumes that $\Sigma = \sigma^2 I$, we also conclude from these results that the preferences for these career areas may be independent. They need not be independent, however, to be consistent with the data. Tsai (2000) showed that there is a set of covariance structures on $\Sigma$ that do not assume that preferences are independent (i.e., where $\Sigma$ is not diagonal) that yield equivalent models to Thurstone's Case V ranking model in the sense of providing the same set of probabilities (2).
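Equations (7) to (10) can be put together in a compact Python sketch that uses only the standard library. The helper names are hypothetical. The bivariate normal probabilities are approximated by a simple midpoint rule, which is a choice made here and is not claimed to be what the author used. The input counts (4, 6, 15, 41, 41, 31) out of 56 are the ones that reproduce the mean preferences reported for the career example.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf, Z.pdf, Z.inv_cdf


def design(n):
    """A_n: one row per pair (i, i'), i < i', +1 in column i and -1 in column i'."""
    rows = []
    for i, j in combinations(range(n), 2):
        r = [0.0] * n
        r[i], r[j] = 1.0, -1.0
        rows.append(r)
    return rows


def mm(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*Y)] for r in X]


def tr(X):
    """Transpose."""
    return [list(c) for c in zip(*X)]


def inv(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    a = [list(r) + [float(i == j) for j in range(n)] for i, r in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [r[n:] for r in a]


def bvn_cdf(h, k, rho, m=400):
    """Phi_2(h, k; rho) as the integral over x < h of phi(x) * Phi((k - rho x)/sqrt(1 - rho^2)).

    Midpoint rule on (-8, h); assumes |rho| < 1 (true here: rho is 0 or +-1/2).
    """
    lo = -8.0
    if h <= lo:
        return 0.0
    w, s = (h - lo) / m, sqrt(1.0 - rho * rho)
    xs = (lo + (t + 0.5) * w for t in range(m))
    return w * sum(Z.pdf(x) * Z.cdf((k - rho * x) / s) for x in xs)


def delta_c_t(d, n):
    """Delta_c' of eq. (10); d holds the diagonal of Delta in lexicographic order."""
    A1, m = design(n - 1), len(d) - n + 1
    return [[d[n - 1 + r] * A1[r][c] / d[c] for c in range(n - 1)]
            + [float(r == s) for s in range(m)] for r in range(m)]


def fit_case_v(p, N, n):
    """Least squares estimates (7), standard errors (8) and T_B (9).

    Every proportion must lie strictly between 0 and 1.
    """
    A, q = design(n), len(p)
    K = [r[:-1] for r in A]                                # mu_n = 0: drop last column
    H = mm(inv(mm(tr(K), K)), tr(K))                       # H = (K'K)^{-1} K'
    mu = [r[0] for r in mm(H, [[Z.inv_cdf(v)] for v in p])]
    tau = [sum(a * b for a, b in zip(r, mu)) for r in K]   # fitted thresholds
    pi, d = [Z.cdf(t) for t in tau], [Z.pdf(t) for t in tau]
    AAt = mm(A, tr(A))                                     # twice the matrix P in (5)
    G = [[0.0] * q for _ in range(q)]                      # Gamma evaluated at mu-hat
    for a in range(q):
        G[a][a] = pi[a] * (1.0 - pi[a])
        for b in range(a + 1, q):
            G[a][b] = G[b][a] = bvn_cdf(tau[a], tau[b], 0.5 * AAt[a][b]) - pi[a] * pi[b]
    HD = [[H[r][c] / d[c] for c in range(q)] for r in range(n - 1)]   # H Delta^{-1}
    V = mm(mm(HD, G), tr(HD))
    se = [sqrt(V[r][r] / N) for r in range(n - 1)]
    Dc = delta_c_t(d, n)
    u = mm(Dc, [[p[a] - pi[a]] for a in range(q)])         # Delta_c' e-hat
    T = N * mm(mm(tr(u), inv(mm(mm(Dc, G), tr(Dc)))), u)[0][0]
    return mu, se, T, q - n + 1


# Career example: pairs {A,C}, {A,E}, {A,I}, {C,E}, {C,I}, {E,I}; N = 56, n = 4.
p = [c / 56 for c in (4, 6, 15, 41, 41, 31)]
mu, se, T, df = fit_case_v(p, N=56, n=4)
```

The point estimates in `mu` depend only on (7) and should agree with the reported mean preferences to two decimals. The standard errors and `T` depend additionally on the numerical integration, so small differences from the reported values are possible. For large n the double loop over pairs dominates the running time.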

The sample size used in this example was rather small. Thus, it is of interest to investigate whether the standard errors and goodness of fit tests obtained using asymptotic theory can be trusted when only small samples are available. To investigate the small sample performance of the procedures described in this paper we performed a simulation study. We generated 1000 replications with sample size N = 50 from a Thurstone's Case V ranking model for n = 10 stimuli. The values used to generate the data were similar to those estimated in the numerical example, μ = (.8, .7, .2, .8, , .2, 0), $\sigma^2 = \tfrac{1}{2}$. The results suggest that adequate parameter estimates, standard errors and goodness of fit tests for this model can be obtained with as few as 50 observations. The relative bias of the parameter estimates ranged from 0% to 4%, and the relative bias of the standard errors ranged from -3% to 3%. Furthermore, with 36 degrees of freedom available for testing, the mean and variance of the $T_B$ statistic across replications were 35.8 and 69.3. We also computed a Kolmogorov-Smirnov one-sample test to investigate the match of the empirical distribution of the $T_B$ statistic to its reference chi-square distribution, obtaining $D_{KS} = 0.80$, which is less than the critical value of 1.35 at the 5% level of significance.

4. Concluding remarks

We have introduced formulae for obtaining standard errors and goodness of fit tests when Thurstone's Case V model for ranking data (Thurstone, 1931) is estimated using the classical least squares procedure for paired comparisons data (Mosteller, 1951a; Torgerson, 1958). Our proposed test statistic is analogous to a statistic proposed by Browne (1984) in the context of covariance structure analysis. The classical least squares estimation procedure is a computationally very attractive procedure to estimate Thurstone's Case V ranking model.
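The data-generating step of a simulation like the one described in Section 3 can be sketched as follows: latent preferences are drawn as independent normals with means μ and variance ½, and each draw is converted to paired comparisons by comparing the latent values. The function name, the seed handling, and the mean vector below are illustrative assumptions, not the values or code used in the paper.

```python
import random
from itertools import combinations
from statistics import NormalDist


def simulate_proportions(mu, N, seed=0):
    """Draw N rankings from the Case V model (sigma^2 = 1/2).

    Returns the sample proportions p, one per pair (i, i') with i < i'
    in lexicographic order; stimulus i is ranked above i' when its
    latent preference is larger.
    """
    rng = random.Random(seed)
    sd = sqrt_half = 0.5 ** 0.5
    pair_list = list(combinations(range(len(mu)), 2))
    counts = [0] * len(pair_list)
    for _ in range(N):
        t = [rng.gauss(m, sd) for m in mu]          # latent preferences
        for k, (i, j) in enumerate(pair_list):
            counts[k] += int(t[i] > t[j])
    return [c / N for c in counts]


mu = [-0.8, 0.7, 0.2, 0.0]      # hypothetical means, last one fixed to 0
p = simulate_proportions(mu, N=20000, seed=1)
# Population means implied by (4) and (6): pi = Phi(mu_i - mu_i').
pi = [NormalDist().cdf(mu[i] - mu[j]) for i, j in combinations(range(len(mu)), 2)]
```

With a large N the simulated proportions should be close to the values implied by (6). Repeating the draw with a small N and passing each `p` to an estimator gives the kind of replication study summarized above.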
Our small simulation study reveals that adequate parameter estimates, standard errors and goodness of fit tests can be obtained for large models with very few observations.

Mosteller (1951b) introduced a goodness of fit test for this estimation procedure that should not be used when the paired comparisons are obtained from ranking patterns. This is because Mosteller's test assumes that the paired comparisons are statistically independent, an assumption that is violated in the case of ranking data. When applied to ranking data, Mosteller's test is overly optimistic. For instance, the value of Mosteller's test for the numerical example presented here is 0.59 on 3 d.f., p = 0.90. Mosteller's test should not be applied to paired comparisons data when the individuals respond to all paired comparisons (i.e., multiple judgment paired comparisons), as again the paired comparisons are not statistically independent. For that matter, the classical estimation procedure described here should not be used to estimate Thurstone's Case V model from multiple judgment paired comparisons data, as in this case $\sigma^2$ is identified, but it is not identified from univariate information alone. To estimate Thurstone's Case V model from multiple judgment paired comparisons data one must at least use univariate and bivariate information on the paired comparisons (Maydeu-Olivares, 2001). Estimation methods for Thurstonian models using univariate and bivariate information are available using Mplus (Muthén & Muthén, 2001). For an overview see Maydeu-Olivares and Böckenholt (2005).

References

Browne, M.W. (1984). Asymptotically distribution free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.

Maydeu-Olivares, A. (1999). Thurstonian modeling of ranking data via mean and covariance structure analysis. Psychometrika, 64, 325-340.

Maydeu-Olivares, A. (2001). Limited information estimation and testing of Thurstonian models for paired comparison data under multiple judgment sampling. Psychometrika, 66, 209-228.

Maydeu-Olivares, A. & Böckenholt, U. (2005). Structural equation modeling of paired comparisons and ranking data. Psychological Methods, 10, 285-304.

Mosteller, F. (1951a). Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16, 3-9.

Mosteller, F. (1951b). Remarks on the method of paired comparisons: III. A test of significance for paired comparisons when equal standard deviations and equal correlations are assumed. Psychometrika, 16, 207-218.

Muthén, L. & Muthén, B. (2001). Mplus. Los Angeles, CA: Muthén & Muthén.

Torgerson, W.S. (1958). Theory and methods of scaling. New York: Wiley.

Thurstone, L.L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.

Thurstone, L.L. (1931). Rank order as a psychological method. Journal of Experimental Psychology, 14, 187-201.

Tsai, R.-C. (2000). Remarks on the identifiability of Thurstonian ranking models: Case V, Case III or neither? Psychometrika, 65, 233-240.