A proposed discrete distribution for the statistical modeling of Likert data

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS047) p.5059

Kidd, Martin
Centre for Statistical Consultation, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa
E-mail: mkidd@sun.ac.za

Laubscher, Nico
InduStat Pro, Stellenbosch, 7600, South Africa
E-mail: fl@industat.co.za

Abstract

When Likert scale data are subjected to statistical analyses, the normal distribution is usually assumed as the underlying distribution. Alternatively, nonparametric statistical techniques are applied. Other techniques like polychoric correlation assume that the Likert scale divides the sample space of the normal distribution into intervals. In this paper, an alternative distribution based on the normal distribution is proposed. The sample space is assumed to be discrete and consists only of the values of the Likert scale. This distribution has two parameters (one for location and one for scale) corresponding to those of its normal counterpart. This distribution (which will be called the Likert distribution) differs from the normal distribution in that its shape depends on both parameters. A numerical procedure for obtaining maximum likelihood estimators for the two parameters is exhibited and some desirable properties of the distribution are discussed. There are theoretical aspects of the distribution that remain to be researched, and the purpose of this paper is to present the initial concept and to test its acceptability among peers. Results from a study on real world Likert scale data indicate that in 67% of goodness-of-fit tests, the Likert distribution provided an acceptable fit at a 5% significance level. A test statistic based on the Likert distribution is proposed for comparing means of two groups, and results from a comprehensive simulation study indicated superior power of this test over the standard t-test for small samples.

1. Introduction

The Likert scale is widely used for measuring latent variables through the use of questionnaires.
It takes on discrete, specified ordinal values, e.g. 1, 2, 3, 4, 5, and in many cases descriptive words like Completely Disagree to Completely Agree accompany such a scale. Statistical analyses of Likert scale data take many forms, from comparing different groups and doing correlation analyses to more complex analyses like factor analysis and structural equation modeling. In most of these cases the data are assumed to come from a normal distribution, or where appropriate nonparametric techniques are applied. Other techniques like tetrachoric and polychoric correlation assume that the Likert scale divides the sample space of the normal distribution into intervals, and the statistical techniques are then derived from this assumption.

In this paper a different distribution based on a discrete sample space defined by the Likert scale is introduced. The basic concepts of the distribution are presented in section 2. Sections 3 and 4 deal with the expected value and maximum likelihood estimators for the parameters. In section 5 goodness-of-fit tests done on real world data are reported to give an indication of the appropriateness of this proposed distribution. A test statistic for comparing the means of two groups is proposed in section 6. A summary and outline of future work are presented in section 7.

2. The Likert Distribution

The sample space of the proposed distribution is a discrete ordinal sample space taking on the values of the Likert scale. For example, for a 5-point Likert scale, the sample space typically consists of the integers 1, 2, 3, 4, 5. Thus the sample space is an ordered set of consecutive integers. What will be referred to as the Likert distribution then assigns probabilities to each of the sample points based on two parameters, $\mu$ and $\sigma$, similar to the parameters of a normal distribution.

The proposed probability mass function for the distribution based on a sample space of contiguous integer-valued points $S = \{k_1, k_1 + 1, \ldots, k_2 - 1, k_2\}$ is defined as:

$$f(x \mid \mu, \sigma) = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{K(\mu, \sigma)}, \qquad x \in S,$$

where $-\infty < \mu < \infty$, $\sigma > 0$, and

$$K(\mu, \sigma) = \sum_{j=k_1}^{k_2} e^{-(j-\mu)^2/(2\sigma^2)}.$$

The expression $K(\mu, \sigma)$ ensures that $f(x \mid \mu, \sigma)$ is a probability function. Some noteworthy properties of the distribution are the following:

1. The larger the difference between $x$ and $\mu$, the smaller the point probability $f(x \mid \mu, \sigma)$.
2. As $k_1 \to -\infty$ and $k_2 \to \infty$, $\sum_{j=k_1}^{k_2} e^{-(j-\mu)^2/(2\sigma^2)} \to \sigma\sqrt{2\pi}$ and thus the distribution tends to the normal distribution. This property was numerically verified, but still requires theoretical proof.

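3. As $\sigma \to \infty$, $\sum_{j=k_1}^{k_2} e^{-(j-\mu)^2/(2\sigma^2)} \to k_2 - k_1 + 1$ and $f(x \mid \mu, \sigma) \to \frac{1}{k_2 - k_1 + 1}$, the uniform distribution.
4. As $\mu \to -\infty$, $f(k_1 \mid \mu, \sigma) \to 1$, and as $\mu \to \infty$, $f(k_2 \mid \mu, \sigma) \to 1$.
5. The shape of the distribution depends on both $\mu$ and $\sigma$. When $\mu$ equals the middle value of the Likert scale, the distribution is symmetric. As $\mu$ moves away from the middle value towards one end of the scale the distribution becomes left skewed, and towards the other end it becomes right skewed. Increasing $\sigma$ flattens out the distribution until it eventually becomes a uniform distribution (see point 3).

3. Expected value of the distribution

The expected value of the distribution is given by:

$$E(x \mid \mu, \sigma) = \sum_{j=k_1}^{k_2} j\, f(j \mid \mu, \sigma) = \frac{1}{K(\mu, \sigma)} \sum_{j=k_1}^{k_2} j\, e^{-(j-\mu)^2/(2\sigma^2)}.$$

It is important to note here that $\mu$ is not the expected value of the distribution. The expected value lies between $k_1$ and $k_2$, whereas $\mu$ can range between $-\infty$ and $\infty$. As $\mu \to -\infty$, $E(x) \to k_1$, and as $\mu \to \infty$, $E(x) \to k_2$ (see point 4 in section 2).

4. Maximum Likelihood Estimation

For a set of realisations of $x$ under the Likert distribution, say $x_1, \ldots, x_n$, let:

$$K = K(\mu, \sigma), \qquad u_i = \frac{x_i - \mu}{\sigma}, \qquad v_j = \frac{j - \mu}{\sigma}, \qquad w_j = e^{-v_j^2/2}.$$

Then the likelihood function is:

$$LF = \prod_{i=1}^{n} \frac{e^{-(x_i-\mu)^2/(2\sigma^2)}}{K(\mu, \sigma)} = K^{-n} \prod_{i=1}^{n} e^{-u_i^2/2}.$$

From this the log likelihood can be written as: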

$$\ln LF = -n \ln K - \frac{1}{2} \sum_{i=1}^{n} u_i^2.$$

To estimate $\mu$ and $\sigma$, the above expression is maximised with respect to $\mu$ and $\sigma$. The derivatives with respect to $\mu$ and $\sigma$ can be written as:

$$\frac{\partial \ln LF}{\partial \mu} = \frac{1}{\sigma} \left[ \sum_{i=1}^{n} u_i - \frac{n}{K} \sum_{j=k_1}^{k_2} v_j w_j \right]$$

and

$$\frac{\partial \ln LF}{\partial \sigma} = \frac{1}{\sigma} \left[ \sum_{i=1}^{n} u_i^2 - \frac{n}{K} \sum_{j=k_1}^{k_2} v_j^2 w_j \right].$$

Numerical algorithms can be used to solve for $\mu$ and $\sigma$ from the above ML equations. The solution will be denoted by $\hat{\mu}$ and $\hat{\sigma}$ respectively. Of course, if $\hat{\mu}$ and $\hat{\sigma}$ are the MLEs of $\mu$ and $\sigma$, then $\hat{E}_L = E(x \mid \hat{\mu}, \hat{\sigma})$ will be the MLE of the expected value. A property empirically observed was that $\hat{E}_L = \frac{1}{n} \sum_{i=1}^{n} x_i$. This means that the sample arithmetic mean equals the MLE of the expected value of the Likert distribution.

5. Goodness-of-fit on actual data

To get an idea of how well the Likert distribution fits actual data, 697 data sets were used, and tests were done to check whether the distribution fits the data. No claim is made that this collection of data sets is a representative sample from the population of all real world data sets, but it does give an indication of the validity of the distribution. The following results emerged:

At a 5% significance level, 33% of the data sets did not support the Likert hypothesis (the null hypothesis was rejected by the goodness-of-fit test). This means that 67% of the data sets did not contradict the Likert distribution hypothesis. For smaller sample sizes (<00) the percentage rejected dropped to 4%. There was a trend that the goodness-of-fit increased for Likert scales with a smaller number of outcomes. For 4-point Likert scale data, only 5% (7% for < 00) of the tests were rejected. For 7-point scale data, the percentage rejected increased to 50% (4% for < 00).

6. Comparing two Likert distribution group means

In order to test for equality of the means of two groups using the Likert distribution as underlying distribution, the following test statistic is proposed. Let $(\hat{\mu}_1, \hat{\sigma}_1)$ and $(\hat{\mu}_2, \hat{\sigma}_2)$ be the maximum likelihood estimates of the Likert parameters obtained from the two random samples, and

$$\bar{x}_1 = E(x \mid \hat{\mu}_1, \hat{\sigma}_1) = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{1i}, \qquad \bar{x}_2 = E(x \mid \hat{\mu}_2, \hat{\sigma}_2) = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{2i}$$

be the Likert expected value MLEs for the two sample sets respectively. The difference of the sample means, $L = \bar{x}_1 - \bar{x}_2$, is proposed as test statistic for the null hypothesis that the samples come from two Likert populations with equal expected values. The distribution of $L$ is determined through simulation by drawing B( 000) pairs of random samples of sizes $n_1$ and $n_2$ from the Likert distribution using parameter sets $(\hat{\mu}_1, \hat{\sigma}_1)$ and $(\hat{\mu}_2, \hat{\sigma}_2)$ respectively. The p-value of the test statistic for the data is then determined from the location of 0 in the simulated empirical distribution.

A comprehensive simulation study was conducted to compare this Likert test with the standard t-test (assuming normality of the data). Various parameters like sample sizes, effect sizes, etc. were randomly varied in this simulation study. Data were simulated from the Likert distribution. Results from this study showed that in the majority of cases the Likert test and the t-test gave the same outcomes (both either rejecting or accepting the null hypothesis), especially for larger sample sizes. The simulation did, however, show that for small samples 0, the Likert test was more inclined to indicate significant differences than the t-test. Figure 1 shows an extract of the simulation results where the Likert test was compared to the t-test and a bootstrap test for the equality of two means. The figure indicates that with increasing effect size, the Likert test had superior power over the other two tests.

[Figure 1: proportion of tests rejected (vertical axis, 0.0 to 1.0) against increasing effect size step number, starting at H0 (horizontal axis), for the t-test, the Likert two groups test and the bootstrap test.]

Figure 1. Results from a simulation study indicating superior power of the Likert two groups test over the t-test and bootstrap test for small samples ( 5 ).

7. Summary and further research

This paper proposes a distribution for analysing Likert scale data based on the normal distribution.
Desirable properties, linking it to the normal distribution, were shown. Some of the properties presented here have been theoretically derived and others have been numerically verified (still to be proven theoretically).
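
The numerical checks mentioned above, and the empirical observation in section 4 that the fitted expected value equals the sample mean, can be reproduced with a short script. What follows is a minimal Python sketch, not the authors' implementation. It assumes the probability mass function $f(x \mid \mu, \sigma) \propto e^{-(x-\mu)^2/(2\sigma^2)}$ on $S = \{k_1, \ldots, k_2\}$. The function names (`likert_pmf`, `likert_mean`, `fit_likert`), the simple compass-search optimiser and the response counts are illustrative assumptions; any numerical optimiser could be substituted.

```python
import math


def log_weights(mu, sigma, k1, k2):
    """Exponents -(j - mu)^2 / (2 sigma^2) for j = k1, ..., k2."""
    return [-0.5 * ((j - mu) / sigma) ** 2 for j in range(k1, k2 + 1)]


def log_k(expo):
    """log K(mu, sigma), computed stably by factoring out the largest term."""
    m = max(expo)
    return m + math.log(sum(math.exp(e - m) for e in expo))


def likert_pmf(mu, sigma, k1=1, k2=5):
    """Probabilities f(k1), ..., f(k2) under the assumed Likert pmf."""
    expo = log_weights(mu, sigma, k1, k2)
    lk = log_k(expo)
    return [math.exp(e - lk) for e in expo]


def likert_mean(mu, sigma, k1=1, k2=5):
    """Expected value E(x | mu, sigma) as in section 3."""
    probs = likert_pmf(mu, sigma, k1, k2)
    return sum(j * p for j, p in zip(range(k1, k2 + 1), probs))


def log_likelihood(data, mu, sigma, k1=1, k2=5):
    """ln LF = sum over observations of (exponent - ln K)."""
    expo = log_weights(mu, sigma, k1, k2)
    lk = log_k(expo)
    return sum(expo[x - k1] - lk for x in data)


def fit_likert(data, k1=1, k2=5, tol=1e-8):
    """Maximise ln LF by compass search over (mu, log sigma).

    Illustrative stand-in for the unspecified numerical algorithm.
    """
    mu, log_sigma, step = sum(data) / len(data), 0.0, 1.0
    best = log_likelihood(data, mu, math.exp(log_sigma), k1, k2)
    while step > tol:
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = log_likelihood(
                data, mu + d_mu, math.exp(log_sigma + d_ls), k1, k2
            )
            if cand > best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, cand
                break
        else:
            step /= 2.0  # no direction improved, so refine the step size
    return mu, math.exp(log_sigma)


# Hypothetical 5-point responses (counts 3, 7, 12, 6, 2), not data from the study.
data = [1] * 3 + [2] * 7 + [3] * 12 + [4] * 6 + [5] * 2
mu_hat, sigma_hat = fit_likert(data)
print("MLEs:", mu_hat, sigma_hat)
print("fitted expected value:", likert_mean(mu_hat, sigma_hat))
print("sample mean:", sum(data) / len(data))
```

For this hypothetical sample the sample mean is 2.9, and the fitted expected value should agree with it up to the optimiser's tolerance. Setting the $\mu$-derivative of the log likelihood to zero gives the same equality analytically, which is consistent with the property reported in section 4. Calling `likert_pmf` with a very large `sigma`, or with `mu` far below or above the scale, illustrates points 3 and 4 of section 2.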

A test statistic for comparing means of two samples from the Likert distribution was proposed, and simulation studies suggested possible advantages over the standard t-test for small samples.

An important extension of this work will be to extend this distribution to the bivariate case. This should then enable one to calculate correlations based on the Likert distribution. Correlations are important in the analysis of multivariate Likert scale data because factor analysis, structural equation modeling (SEM), etc. are all techniques that are based on covariances and correlations.