A proposed discrete distribution for the statistical modeling of Likert data


Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS047) p.5059

A proposed discrete distribution for the statistical modeling of Likert data

Kidd, Martin
Centre for Statistical Consultation, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa
E-mail: mkidd@sun.ac.za

Laubscher, Nico
InduStat Pro, Stellenbosch, 7600, South Africa
E-mail: fl@industat.co.za

Abstract

When Likert scale data are subjected to statistical analyses, the normal distribution is usually assumed as the underlying distribution. Alternatively, nonparametric statistical techniques are applied. Other techniques, like polychoric correlation, assume that the Likert scale divides the sample space of the normal distribution into intervals. In this paper, an alternative distribution based on the normal distribution is proposed. The sample space is assumed to be discrete and consists only of the values of the Likert scale. This distribution has two parameters (one for location and one for scale) corresponding to those of its normal counterpart. This distribution (which will be called the Likert distribution) differs from the normal distribution in that its shape depends on both parameters. A numerical procedure for obtaining maximum likelihood estimators for the two parameters is exhibited and some desirable properties of the distribution are discussed. There are theoretical aspects of the distribution that remain to be researched, and the purpose of this paper is to present the initial concept and to test its acceptability among peers. Results from a study on real world Likert scale data indicate that in 67% of goodness-of-fit tests, the Likert distribution provided an acceptable fit at a 5% significance level. A test statistic based on the Likert distribution is proposed for comparing means of two groups, and results from a comprehensive simulation study indicated superior power of this test over the standard t-test for small samples.

1. Introduction

The Likert scale is widely used for measuring latent variables through the use of questionnaires. It takes on discrete specified ordinal values, e.g. 1, 2, 3, 4, 5, and in many cases descriptive words like "Completely Disagree" to "Completely Agree" accompany such a scale. Statistical analyses of Likert scale data take many forms, from comparing different groups and doing correlation analyses to more complex analyses like factor analysis and structural equations modeling. In most of these cases the data are assumed to come from a normal distribution, or where appropriate nonparametric techniques are applied. Other techniques like tetrachoric and polychoric correlation assume that the Likert scale divides the sample space of the normal distribution into intervals, and then the statistical techniques are derived from this assumption.

In this paper a different distribution based on a discrete sample space defined by the Likert scale is introduced. The basic concepts of the distribution are presented in section 2. Sections 3 and 4 deal with the expected value and maximum likelihood estimators for the parameters. In section 5 goodness-of-fit tests done on real world data are reported to give an indication of the appropriateness of this proposed distribution. A test statistic for comparing the means of two groups is proposed in section 6. A summary and outline of future work are presented in section 7.

2. The Likert Distribution

The sample space of the proposed distribution is a discrete ordinal sample space taking on the values of the Likert scale. For example, for a 5-point Likert scale, the sample space typically consists of the integers 1, 2, 3, 4, 5. Thus the sample space is an ordered set of consecutive integers. What will be referred to as the Likert distribution then assigns probabilities to each of the sample points based on two parameters, $\mu$ and $\sigma$, similar to the parameters of a normal distribution. The proposed probability mass function for the distribution based on a sample space of contiguous integer-valued points $S = \{k_1, k_1 + 1, \ldots, k_2\}$ is defined as:

$$f(x; \mu, \sigma) = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{K(\mu, \sigma)}$$

where $x \in S$, $-\infty < \mu < \infty$, $\sigma > 0$, and

$$K(\mu, \sigma) = \sum_{j=k_1}^{k_2} e^{-(j-\mu)^2/(2\sigma^2)}.$$

The expression $K(\mu, \sigma)$ ensures that $f(x; \mu, \sigma)$ is a probability function. Some noteworthy properties of the distribution are the following:

1. The larger the difference between $x$ and $\mu$, the smaller the point probability $f(x; \mu, \sigma)$.

2. As $k_1 \to -\infty$ and $k_2 \to \infty$, then $\sum_{j=k_1}^{k_2} e^{-(j-\mu)^2/(2\sigma^2)} \to \sigma\sqrt{2\pi}$ and thus the distribution tends to the normal distribution. This property was numerically verified, but still requires theoretical proof.

3. As $\sigma \to \infty$, then $\sum_{j=k_1}^{k_2} e^{-(j-\mu)^2/(2\sigma^2)} \to k_2 - k_1 + 1$ and $f(x; \mu, \sigma) \to \frac{1}{k_2 - k_1 + 1}$, the uniform distribution.

4. As $\mu \to -\infty$, then $f(k_1; \mu, \sigma) \to 1$, and as $\mu \to \infty$, then $f(k_2; \mu, \sigma) \to 1$.

5. The shape of the distribution depends on both $\mu$ and $\sigma$. When $\mu$ equals the middle value of the Likert scale, the distribution is symmetric. As $\mu \to \infty$, the distribution becomes left skewed, and as $\mu \to -\infty$, it becomes right skewed. Increasing $\sigma$ flattens out the distribution until it eventually becomes a uniform distribution (see point 3).

3. Expected value of the distribution

The expected value of the distribution is given by:

$$E(x; \mu, \sigma) = \sum_{j=k_1}^{k_2} j\, f(j; \mu, \sigma) = \frac{1}{K(\mu, \sigma)} \sum_{j=k_1}^{k_2} j\, e^{-(j-\mu)^2/(2\sigma^2)}$$

It is important to note here that $\mu$ is not the expected value of the distribution. The expected value lies between $k_1$ and $k_2$, whereas $\mu$ can range between $-\infty$ and $\infty$. As $\mu \to -\infty$, $E(x) \to k_1$, and as $\mu \to \infty$, $E(x) \to k_2$ (see point 4 in section 2).

4. Maximum Likelihood Estimation

For a set of realisations of $x$ under the Likert distribution, say $x_1, \ldots, x_n$, let:

$$K = K(\mu, \sigma), \quad u_i = \frac{x_i - \mu}{\sigma}, \quad v_j = \frac{j - \mu}{\sigma}, \quad w_j = e^{-v_j^2/2}.$$

Then the likelihood function is:

$$LF = \prod_{i=1}^{n} \frac{e^{-(x_i-\mu)^2/(2\sigma^2)}}{K(\mu, \sigma)} = K^{-n}\, e^{-\frac{1}{2}\sum_{i=1}^{n} u_i^2}$$

From this the log likelihood can be written as:

$$\ln LF = -n \ln K - \frac{1}{2} \sum_{i=1}^{n} u_i^2.$$

To estimate $\mu$ and $\sigma$, the above expression is maximised with respect to $\mu$ and $\sigma$. The derivatives with respect to $\mu$ and $\sigma$ can be written as:

$$\frac{\partial \ln LF}{\partial \mu} = \frac{1}{\sigma}\left[\sum_{i=1}^{n} u_i - \frac{n}{K} \sum_{j=k_1}^{k_2} v_j w_j\right]$$

and

$$\frac{\partial \ln LF}{\partial \sigma} = \frac{1}{\sigma}\left[\sum_{i=1}^{n} u_i^2 - \frac{n}{K} \sum_{j=k_1}^{k_2} v_j^2 w_j\right].$$

Numerical algorithms can be used to solve for $\mu$ and $\sigma$ from the above ML equations. The solutions will be denoted by $\hat{\mu}$ and $\hat{\sigma}$ respectively. Of course, if $\hat{\mu}$ and $\hat{\sigma}$ are the MLEs of $\mu$ and $\sigma$, then $\hat{E}_L = E(x; \hat{\mu}, \hat{\sigma})$ will be the MLE of the expected value. A property empirically observed was that $\hat{E}_L = \frac{1}{n}\sum_{i=1}^{n} x_i$. This means that the sample arithmetic mean equals the MLE of the expected value of the Likert distribution.

5. Goodness-of-fit on actual data

To get an idea of how well the Likert distribution fits actual data, 697 data sets were used, and tests were done to check whether the distribution fits the data. No claim is made that this collection of data sets is a representative sample from the population of all real world data sets, but it does give an indication of the validity of the distribution. The following results emerged:

At a 5% significance level, 33% of the data sets did not support the Likert hypothesis (the null hypothesis was rejected by the goodness-of-fit test). This means that 67% of the data sets did not contradict the Likert distribution hypothesis.

For smaller sample sizes (n < 00) the percentage rejected dropped to 4%.

There was a trend that the goodness-of-fit increased for Likert scales with a smaller number of outcomes. For 4-point Likert scale data, only 5% (7% for n < 00) of the tests were rejected. For 7-point scale data, the percentage rejected increased to 50% (4% for n < 00).

6. Comparing two Likert distribution group means

In order to test for equality of the means of two groups using the Likert as underlying distribution, the following test statistic is proposed. Let $(\hat{\mu}_1, \hat{\sigma}_1)$ and $(\hat{\mu}_2, \hat{\sigma}_2)$ be the maximum likelihood estimates of the Likert parameters obtained from the two random samples, and let

$$\bar{x}_1 = E(x; \hat{\mu}_1, \hat{\sigma}_1) = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}, \qquad \bar{x}_2 = E(x; \hat{\mu}_2, \hat{\sigma}_2) = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$$

be the Likert expected value MLEs for the two sample sets respectively. The difference of the sample means, $L = \bar{x}_1 - \bar{x}_2$, is proposed as test statistic for the null hypothesis that the samples come from two Likert populations with equal expected values. The distribution of $L$ is determined through simulation by drawing $B$ (1000) pairs of random samples of sizes $n_1$ and $n_2$ from the Likert distribution using parameter sets $(\hat{\mu}_1, \hat{\sigma}_1)$ and $(\hat{\mu}_2, \hat{\sigma}_2)$ respectively. The p-value of the test statistic for the data is then determined from the location of 0 in the simulated empirical distribution.

A comprehensive simulation study was conducted to compare this Likert test with the standard t-test (assuming normality of the data). Various parameters like sample sizes, effect sizes etc. were randomly varied in this simulation study. Data were simulated from the Likert distribution. Results from this study showed that in the majority of cases, the Likert test and t-test gave the same outcomes (both either rejecting or accepting the null hypothesis), especially for larger sample sizes. The simulation did however show that for small samples, the Likert test was more inclined to indicate significant differences than the t-test. Figure 1 shows an extract of the simulation results where the Likert test was compared to the t-test and a bootstrap test for the equality of two means. The figure indicates that with increasing effect size, the Likert test had superior power over the other two tests.

[Figure 1: proportion of tests rejected (vertical axis, 0.0 to 1.0) against increasing effect size step number (horizontal axis, starting at H0), with one curve each for the t-test, the Likert two groups test and the bootstrap test.]

Figure 1. Results from a simulation study indicating superior power of the Likert two groups test over the t-test and bootstrap test for small samples.

7. Summary and further research

This paper proposes a distribution for analysing Likert scale data based on the normal distribution.
Desirable properties linking it to the normal distribution were shown. Some of the properties presented here have been theoretically derived and others have been numerically verified (still to be proven theoretically).
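The kind of numerical verification mentioned here can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: it takes the probability mass function to be f(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / K(mu, sigma) on the integers k1, ..., k2, it uses only the standard library, and all function names (`normalising_constant`, `likert_pmf`, `expected_value`, `log_likelihood`, `score`) are hypothetical names chosen for illustration, not code from the paper. Subtracting the largest exponent before exponentiating is an implementation detail added here to avoid underflow.

```python
import math


def normalising_constant(mu, sigma, k1, k2):
    """K(mu, sigma): sum of the Gaussian kernel over j = k1, ..., k2."""
    return sum(math.exp(-((j - mu) ** 2) / (2 * sigma ** 2))
               for j in range(k1, k2 + 1))


def likert_pmf(mu, sigma, k1, k2):
    """Point probabilities f(j; mu, sigma) for j = k1, ..., k2."""
    exponents = [-((j - mu) ** 2) / (2 * sigma ** 2) for j in range(k1, k2 + 1)]
    top = max(exponents)  # subtract the largest exponent to avoid underflow
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(mu, sigma, k1, k2):
    """E(x; mu, sigma) = sum over j of j * f(j; mu, sigma)."""
    pmf = likert_pmf(mu, sigma, k1, k2)
    return sum(j * p for j, p in zip(range(k1, k2 + 1), pmf))


def log_likelihood(xs, mu, sigma, k1, k2):
    """ln LF for observations xs, each assumed to lie in k1, ..., k2."""
    pmf = likert_pmf(mu, sigma, k1, k2)
    return sum(math.log(pmf[x - k1]) for x in xs)


def score(xs, mu, sigma, k1, k2):
    """Derivatives of ln LF with respect to mu and sigma (section 4)."""
    n = len(xs)
    pmf = likert_pmf(mu, sigma, k1, k2)  # pmf[j - k1] equals w_j / K
    v = [(j - mu) / sigma for j in range(k1, k2 + 1)]
    u = [(x - mu) / sigma for x in xs]
    d_mu = (sum(u) - n * sum(vj * pj for vj, pj in zip(v, pmf))) / sigma
    d_sigma = (sum(ui ** 2 for ui in u)
               - n * sum(vj ** 2 * pj for vj, pj in zip(v, pmf))) / sigma
    return d_mu, d_sigma


if __name__ == "__main__":
    k1, k2 = 1, 5
    print("symmetric case (mu = 3):", likert_pmf(3.0, 1.0, k1, k2))
    print("large sigma (property 3):", likert_pmf(3.0, 1e4, k1, k2))
    print("large mu (property 4):", likert_pmf(50.0, 1.0, k1, k2))
    print("wide support (property 2):",
          normalising_constant(0.3, 1.5, -50, 50), 1.5 * math.sqrt(2 * math.pi))
```

Running the script prints the probabilities for a symmetric case, a near-uniform case and a case concentrated on the largest scale value, plus a comparison of K on a wide support with sigma * sqrt(2 * pi). The `score` function can be compared against finite differences of `log_likelihood` as a sanity check on the derivative expressions.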

A test statistic for comparing means of two samples from the Likert distribution was proposed, and simulation studies suggested possible advantages over the standard t-test for small samples. An important extension of this work will be to extend this distribution to the bivariate case. This should then enable one to calculate correlations based on the Likert distribution. Correlations are important in the analysis of multivariate Likert scale data because factor analysis, structural equations modeling (SEM) etc. are all techniques that are based on covariances and correlations.
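The two-group test of section 6 can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. The function names (`likert_pmf`, `fit_likert`, `likert_mean_test`) are hypothetical. The maximum likelihood fit uses damped Newton steps in the reparameterisation t1 = mu / sigma^2, t2 = -1 / (2 sigma^2), which is a technique swapped in here for convenience (the paper only says that numerical algorithms can be used). The two-sided p-value rule, twice the smaller tail proportion around 0, is one possible reading of "the location of 0 in the simulated empirical distribution". The sketch assumes non-degenerate samples for which finite estimates exist.

```python
import math
import random


def likert_pmf(mu, sigma, k1, k2):
    """Point probabilities f(j; mu, sigma) for j = k1, ..., k2."""
    expo = [-((j - mu) ** 2) / (2 * sigma ** 2) for j in range(k1, k2 + 1)]
    top = max(expo)  # subtract the largest exponent to avoid underflow
    w = [math.exp(e - top) for e in expo]
    total = sum(w)
    return [x / total for x in w]


def fit_likert(xs, k1, k2, max_iter=200):
    """Approximate MLEs (mu_hat, sigma_hat) via damped Newton steps.

    Assumption (not from the paper): the search runs over t1 = mu / sigma^2
    and t2 = -1 / (2 sigma^2), in which the log likelihood is concave.
    """
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    support = range(k1, k2 + 1)

    def moments(t1, t2):
        expo = [t1 * j + t2 * j * j for j in support]
        top = max(expo)
        w = [math.exp(e - top) for e in expo]
        tot = sum(w)
        mom = [sum(wj * j ** r for wj, j in zip(w, support)) / tot
               for r in (1, 2, 3, 4)]
        return mom, t1 * m1 + t2 * m2 - top - math.log(tot)  # mean log likelihood

    var0 = max(m2 - m1 * m1, 0.25)  # crude starting values
    t1, t2 = m1 / var0, -0.5 / var0
    mom, ll = moments(t1, t2)
    for _ in range(max_iter):
        e1, e2, e3, e4 = mom
        g1, g2 = m1 - e1, m2 - e2  # gradient: observed minus fitted moments
        if abs(g1) + abs(g2) < 1e-12:
            break
        v11, v12, v22 = e2 - e1 * e1, e3 - e1 * e2, e4 - e2 * e2
        det = v11 * v22 - v12 * v12
        d1 = (v22 * g1 - v12 * g2) / det  # Newton direction
        d2 = (v11 * g2 - v12 * g1) / det
        step, accepted = 1.0, False
        while step > 1e-10 and not accepted:
            n1, n2 = t1 + step * d1, t2 + step * d2
            if n2 < 0:  # keeps sigma real and positive
                new_mom, new_ll = moments(n1, n2)
                accepted = new_ll >= ll - 1e-12
            step /= 2
        if not accepted:
            break
        t1, t2, mom, ll = n1, n2, new_mom, new_ll
    sigma = math.sqrt(-0.5 / t2)
    return t1 * sigma ** 2, sigma


def likert_mean_test(x1, x2, k1, k2, B=1000, seed=0):
    """Return (L, p): the difference of sample means and a simulated p-value."""
    rng = random.Random(seed)
    support = list(range(k1, k2 + 1))
    p1 = likert_pmf(*fit_likert(x1, k1, k2), k1, k2)
    p2 = likert_pmf(*fit_likert(x2, k1, k2), k1, k2)
    L = sum(x1) / len(x1) - sum(x2) / len(x2)
    sims = []
    for _ in range(B):
        s1 = rng.choices(support, weights=p1, k=len(x1))
        s2 = rng.choices(support, weights=p2, k=len(x2))
        sims.append(sum(s1) / len(s1) - sum(s2) / len(s2))
    below = sum(d <= 0 for d in sims) / B  # share of simulated differences <= 0
    above = sum(d >= 0 for d in sims) / B  # share of simulated differences >= 0
    return L, min(1.0, 2 * min(below, above))  # assumed two-sided rule
```

A call such as `likert_mean_test(group_a, group_b, 1, 5)` on two lists of 5-point responses returns the observed difference of means together with the simulated p-value. Fixing `seed` makes the simulation reproducible, and the fitted expected value from `fit_likert` can be compared with the sample mean to check the property reported in section 4.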