Paul Barrett

Similar documents
Paul Barrett

Basic IRT Concepts, Models, and Assumptions

Item Response Theory and Computerized Adaptive Testing

The application and empirical comparison of item parameters of Classical Test Theory and Partial Credit Model of Rasch in performance assessments

Summer School in Applied Psychometric Principles. Peterhouse College, 13th to 17th September 2010

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

What Rasch did: the mathematical underpinnings of the Rasch model. Alex McKee, PhD. 9th Annual UK Rasch User Group Meeting, 20/03/2015

An Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin

UCLA Department of Statistics Papers

Lesson 7: Item response theory models (part 2)

An Overview of Item Response Theory. Michael C. Edwards, PhD

Overview. Multidimensional Item Response Theory. Lecture #12 ICPSR Item Response Theory Workshop. Basics of MIRT Assumptions Models Applications

1. THE IDEA OF MEASUREMENT

Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA

Probability and Statistics

Monte Carlo Simulations for Rasch Model Tests

What's beyond Concerto: An introduction to the R package catR. Session 4: Overview of polytomous IRT models

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE

The Rasch Model, Additive Conjoint Measurement, and New Models of Probabilistic Measurement Theory

Equating Tests Under The Nominal Response Model Frank B. Baker

A PSYCHOPHYSICAL INTERPRETATION OF RASCH'S PSYCHOMETRIC PRINCIPLE OF SPECIFIC OBJECTIVITY

How to Measure the Objectivity of a Test

Statistical and psychometric methods for measurement: Scale development and validation

Latent Trait Reliability

Bayesian Nonparametric Rasch Modeling: Methods and Software

Prentice Hall Mathematics, Geometry 2009 Correlated to: Connecticut Mathematics Curriculum Framework Companion, 2005 (Grades 9-12 Core and Extended)


Logistic Regression: Regression with a Binary Dependent Variable

Application of Item Response Theory Models for Intensive Longitudinal Data

Introduction to Confirmatory Factor Analysis

On the Construction of Adjacent Categories Latent Trait Models from Binary Variables, Motivating Processes and the Interpretation of Parameters

Assessment, analysis and interpretation of Patient Reported Outcomes (PROs)

Studies on the effect of violations of local independence on scale in Rasch models: The Dichotomous Rasch model

ADDITIVITY IN PSYCHOLOGICAL MEASUREMENT. Benjamin D. Wright MESA Psychometric Laboratory The Department of Education The University of Chicago

Classical Test Theory. Basics of Classical Test Theory. Cal State Northridge Psy 320 Andrew Ainsworth, PhD

Comparing IRT with Other Models

Tutorial on Mathematical Induction

Lecture Notes on Certifying Theorem Provers

An Introduction to Mplus and Path Analysis

Experimental Design and Graphical Analysis of Data

A FLOW DIAGRAM FOR CALCULATING LIMITS OF FUNCTIONS (OF SEVERAL VARIABLES).

The roots of computability theory. September 5, 2016

Introduction To Confirmatory Factor Analysis and Item Response Theory

New Developments for Extended Rasch Modeling in R

Pairwise Parameter Estimation in Rasch Models

An Introduction to Path Analysis

Package paramap. R topics documented: September 20, 2017

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

Diagnostics and Transformations Part 2

A Little History Incompleteness The First Theorem The Second Theorem Implications. Gödel's Theorem. Anders O.F. Hendrickson

Bayesian Methods for Testing Axioms of Measurement

Local response dependence and the Rasch factor model

appstats27.notebook April 06, 2017

Model Estimation Example

Chapter 27 Summary Inferences for Regression

The Discriminating Power of Items That Measure More Than One Dimension

Dimensionality Assessment: Additional Methods

Chapter One. The Real Number System

Estimating ability for two samples

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

California Content Standard. Essentials for Algebra (lesson.exercise) of Test Items. Grade 6 Statistics, Data Analysis, & Probability.

Walkthrough for Illustrations. Illustration 1

Slope Fields: Graphing Solutions Without the Solutions

Test Homogeneity The Single-Factor Model. Test Theory Chapter 6 Lecture 9

Learning Causal Direction from Transitions with Continuous and Noisy Variables

BLAST: Target frequencies and information content Dannie Durand

Ch. 16: Correlation and Regression

Chapter 7 Linear Regression

Item Response Theory (IRT) Analysis of Item Sets

What is an Ordinal Latent Trait Model?

A Guide to Proof-Writing

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

MATHEMATICS (MIDDLE GRADES AND EARLY SECONDARY)

Chapter 5: Preferences

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM

Conditional Probability in the Light of Qualitative Belief Change. David Makinson LSE Pontignano November 2009

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

Semiparametric Generalized Linear Models

Fairfield Public Schools

Regression Analysis: Exploring relationships between variables. Stat 251

Introduction to Survey Analysis!

Drug Combination Analysis

Psychometric Issues in Formative Assessment: Measuring Student Learning Throughout the Academic Year Using Interim Assessments

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ability Metric Transformations

Warm-up Using the given data Create a scatterplot Find the regression line

Inferences About the Difference Between Two Means

4. DEDUCING THE MEASUREMENT MODEL

INTRODUCTION TO ANALYSIS OF VARIANCE

USING BAYESIAN TECHNIQUES WITH ITEM RESPONSE THEORY TO ANALYZE MATHEMATICS TESTS. by MARY MAXWELL

Sampling Distributions

Foundations of Mathematics 11 and 12 (2008)

Contents. Acknowledgments. xix

1 Random and systematic errors

MIDLAND ISD ADVANCED PLACEMENT CURRICULUM STANDARDS AP CALCULUS BC

ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES. Dimitar Atanasov

MIDLAND ISD ADVANCED PLACEMENT CURRICULUM STANDARDS AP CALCULUS AB

Linearity in Calibration:

Transcription:

Paul Barrett. Email: p.barrett@liv.ac.uk. http://www.liv.ac.uk/~pbarrett/paulhome.htm. Affiliations: The State Hospital, Carstairs; Dept. of Clinical Psychology, Univ. of Liverpool. Presentation to the BPS Millennium Conference "Beyond Psychometrics", 20th November 1998, with an addendum added 12/10/99.

What is Rasch Scaling.1 A mathematical procedure that attempts to scale responses to individual items, such that the probability of answering an item in a certain way (whether YES/NO or multiple-choice Likert format) is computed solely from the amount of a latent variable that a person is measured to possess and from the difficulty measure for the item.

What is Rasch Scaling.2 Scaling can be defined as: the encoding of empirical observations using numbers to represent attribute/variable magnitudes, given a set of rules or axioms that the proposed measurement must subsequently satisfy.

What is Rasch Scaling.3 A latent variable can be defined as: the particular, inferred construct that we are trying to measure with a set of items. This may be an ability, an attitude, or a personality variable such as Anxiety. A factor from a factor analysis is also what we would generally refer to as a latent variable or attribute.

What is Rasch Scaling.4 The difficulty measure for an item can be defined as: classically, the ratio of the number of respondents scoring an item in the keyed or correct direction over the total number of respondents. In Rasch scaling, it is an index expressing the position of the item on the latent variable scale at which 50% of the respondents on the test would respond in the keyed or correct direction.

What is Rasch Scaling.5 Critical point: Rasch scaling uses the same scale of measurement for expressing both item difficulty and person ability. That is, the same unit of measurement is used to express difficulty and ability.

What is Rasch Scaling.6 [Figure: an Item Characteristic Curve (ICC). y-axis: probability of a correct response to this item (0.0-1.0); x-axis: latent variable measure, theta, in z-scores (-4 to +4).] a = discrimination parameter: the value of the slope of the curve at its midpoint (inflexion point). b = item difficulty parameter: the location of the inflexion point of the curve on the theta axis.

What is Rasch Scaling.7 [Figure: the same ICC, now including a guessing parameter, c = 0.2. Axes as before.] c = guessing probability: the lower asymptote of the ICC. a = discrimination parameter: the value of the slope of the curve at its inflexion point. b = item difficulty parameter: the location of the inflexion point of the curve on the theta axis.

What is Rasch Scaling.8 The one-parameter Rasch model:

$$p_i(\theta) = \frac{1}{1 + e^{-D(\theta - b_i)}}$$

The two-parameter IRT model:

$$p_i(\theta) = \frac{1}{1 + e^{-Da_i(\theta - b_i)}}$$

The three-parameter IRT model:

$$p_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-Da_i(\theta - b_i)}}$$

where $\theta$ = the measure (score) of a person on the latent trait, $b_i$ = the difficulty of item i, $a_i$ = the discrimination of item i, $c_i$ = the guessing probability for item i, and D = a constant used for "normalisation" (here 1.7).
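As a concrete illustration, the three response functions above can be written directly in code. This is a minimal sketch in Python (not part of the original presentation); the parameter names follow the slide, with D = 1.7 as the normalising constant.

```python
import numpy as np

D = 1.7  # normalising constant that brings the logistic close to the normal ogive

def p_rasch(theta, b):
    """One-parameter (Rasch) model: probability of a keyed response."""
    return 1.0 / (1.0 + np.exp(-D * (theta - b)))

def p_2pl(theta, a, b):
    """Two-parameter IRT model: adds an item discrimination a."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def p_3pl(theta, a, b, c):
    """Three-parameter IRT model: adds a lower asymptote (guessing) c."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

# Example: an item of average difficulty (b = 0) answered by a person one SD above the mean
print(p_rasch(1.0, 0.0))          # ~0.85
print(p_3pl(1.0, 1.0, 0.0, 0.2))  # ~0.88
```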

What is Rasch Scaling.9 So, what we are doing is attempting to model the responses to items in a test, given the amount of the latent variable inferred to be present within every individual who provided responses, and the difficulty of each item. But, since we do not know the individuals' latent variable scores or the item difficulties, these have to be estimated jointly. The solution is iterative, requiring a computer to implement the estimation process.
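A minimal sketch of what "estimated jointly, by iteration" can look like in practice: an illustrative joint maximum-likelihood loop written for this note in Python. It is not the algorithm used by RUMM or any particular Rasch package, and all function and variable names are my own.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model probability of a keyed response for every person-item pair."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def joint_ml_rasch(X, n_iter=50):
    """Illustrative joint maximum-likelihood estimation for an n_persons x n_items
    matrix of 0/1 responses.  Persons with zero or perfect scores must be removed
    first, since their ability estimates are not finite."""
    theta = np.zeros(X.shape[0])   # person ability estimates
    b = np.zeros(X.shape[1])       # item difficulty estimates
    for _ in range(n_iter):
        P = rasch_prob(theta, b)
        info = P * (1.0 - P)
        # Newton-Raphson step for abilities, holding item difficulties fixed
        theta += (X.sum(axis=1) - P.sum(axis=1)) / info.sum(axis=1)
        P = rasch_prob(theta, b)
        info = P * (1.0 - P)
        # Newton-Raphson step for difficulties, holding abilities fixed
        b -= (X.sum(axis=0) - P.sum(axis=0)) / info.sum(axis=0)
        b -= b.mean()              # anchor the scale origin (difficulties sum to zero)
    return theta, b

# Tiny demonstration on simulated data
rng = np.random.default_rng(0)
true_theta = rng.normal(0, 1, 200)
true_b = np.linspace(-2, 2, 10)
X = (rng.uniform(size=(200, 10)) < rasch_prob(true_theta, true_b)).astype(int)
keep = (X.sum(axis=1) > 0) & (X.sum(axis=1) < X.shape[1])   # drop zero/perfect scores
theta_hat, b_hat = joint_ml_rasch(X[keep])
print(np.corrcoef(b_hat, true_b)[0, 1])   # recovered difficulties track the true ones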

What is Rasch Scaling.10 How do we create a test score (measure)? By summing the probabilities of keyed or correct responses for each item in the test, using our model parameters of item difficulty and person ability:

$$X_j = \sum_{i=1}^{K} P_i(\theta_j), \qquad P_i(\theta_j) = \frac{1}{1 + e^{-D(\theta_j - b_i)}}$$

where $X_j$ = the test score for person j with ability $\theta_j$, $P_i(\theta_j)$ = the probability of a keyed response to item i, and K = the number of items.
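A worked numeric sketch of this summation (illustrative values only; the item difficulties and person measure below are hypothetical):

```python
import numpy as np

D = 1.7                                      # normalising constant, as above
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])    # hypothetical item difficulties
theta_j = 0.3                                # hypothetical person measure

# Expected test score X_j: the sum of keyed-response probabilities over the K items
p = 1.0 / (1.0 + np.exp(-D * (theta_j - b)))
X_j = p.sum()
print(round(X_j, 2))   # about 2.91 out of a maximum possible score of 5
```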

Why should we prefer Rasch over CTT?.1 Classical Test Theory (CTT): the equation x = t + e provides the essence of the foundational proposition of this theory, where x = the observed test score, t = a hypothetical error-free true score, and e = the random error associated with that true score. Further, items are assumed to be sampled from universes or domains. Estimation of reliability and other parameters may be made using the algebra of linear sums.
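As a toy illustration of the x = t + e decomposition and the "algebra of linear sums" (my own sketch, with made-up variances): reliability under CTT is the ratio of true-score variance to observed-score variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
t = rng.normal(10.0, 2.0, n)   # hypothetical error-free true scores, var(t) = 4
e = rng.normal(0.0, 1.0, n)    # random error uncorrelated with t, var(e) = 1
x = t + e                      # observed scores: x = t + e

# Because t and e are uncorrelated, var(x) = var(t) + var(e) (the algebra of linear sums),
# and the reliability of x is var(t) / var(x), here about 4/5 = 0.8
print(x.var(), t.var() / x.var())
```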

Why should we prefer Rasch over CTT?.2 A probabilistic form of Additive Conjoint Measurement. Conjoint Measurement.1: The function that describes the concatenation relation between two variables and a third can be deduced axiomatically from the measurements made of the outcome (the third variable) produced by combining the values of the two variables. In our case, the items and the amount of latent variable are combined to produce a third variable (the test score).

Why should we prefer Rasch over CTT?.3 Conjoint Measurement.2: It requires that the two variables in the concatenation operation are non-interactive (i.e. values on each variable can be manipulated independently of each other). It enables quantitative structure to be detected via ordinal relations upon a variable. As Cliff (1992) has written, "a certain kind of mild-looking ordinal consistency among three or more variables is necessary and sufficient to define equal-interval scales."
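The "mild-looking ordinal consistency" Cliff refers to can be illustrated with the simplest conjoint axiom, independence (single cancellation): the ordering of the rows of a persons-by-items outcome table must be the same in every column, and vice versa. A toy check written for this note (the table and function names are my own, not from any package):

```python
import numpy as np
from itertools import combinations

def satisfies_independence(M):
    """Single cancellation for a two-factor conjoint table M: if one row dominates
    another in any column it must dominate it in every column, and likewise for columns."""
    for a, b in combinations(range(M.shape[0]), 2):
        d = M[a, :] - M[b, :]
        if (d > 0).any() and (d < 0).any():    # row order flips between columns: violation
            return False
    for x, y in combinations(range(M.shape[1]), 2):
        d = M[:, x] - M[:, y]
        if (d > 0).any() and (d < 0).any():    # column order flips between rows: violation
            return False
    return True

# An additive table (cell = row effect + column effect) necessarily satisfies the axiom
additive = np.add.outer([0.0, 1.0, 2.5], [0.0, 0.4, 1.1])
print(satisfies_independence(additive))        # True
```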

Why should we prefer Rasch over CTT?.4 Why should this matter? Because measures created using the Rasch measurement model also satisfy the constraints of conjoint measurement. This means that creating tests using the Rasch measurement model gives you equal-interval measurement AND additivity of units. Further, Rasch measurement also gives you unidimensional measurement, provided the measurement axioms are met (i.e. the model fits the data).

Why should we prefer Rasch over CTT?.5 Résumé of features associated with the model:
Equal-interval, additive units of measurement.
An explicit ordering of items as a cumulative response scale.
Sample-free calibration of item and person parameters.
Computation of both item and person reliability.
Computation of the location-sensitive standard error of measurement over the range of the test measures.

So far so good. But this is all theory. What happens in practice?

Data: EPQ-N (Neuroticism scale), UK reference sample. Number of items = 23; number of respondents = 4140 (mixed-gender adults); scale alpha = 0.865. First, I take a look at a single item characteristic curve prior to fitting the Rasch model. I am predicting the probability of respondents keying this item in the scored direction, for each possible scale score on the test (0-23). For convenience, I convert the scale scores into standardised z-score values prior to fitting a scaled logistic Rasch function to the item.

[Figure: fitting EPQ-N item #3 ("Does your mood often go up or down?"). Fitted model: Probability = 1/(1 + e^(-1.7 × 1.06778 × (x − 0.2657436))). Least-squares fit = 0.997 (proportion of variance accounted for). x-axis: z-score transformed raw score level (−2 to +2); y-axis: probability of a correct response.]
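The kind of fit reported in that figure can be sketched as follows. This is an illustrative reconstruction in Python (using scipy), not the software used for the slide; the response proportions below are simulated stand-ins for the EPQ-N item #3 data.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaled_logistic(z, a, b):
    """The function fitted in the figure: P = 1 / (1 + e^(-1.7 * a * (z - b)))."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (z - b)))

# Stand-in data: standardised scale scores, and the observed proportion of respondents
# at each score who keyed the item (simulated around the slide's parameter values)
z = np.linspace(-2.0, 2.0, 24)
p_obs = scaled_logistic(z, 1.07, 0.27) + np.random.default_rng(0).normal(0.0, 0.02, z.size)
p_obs = np.clip(p_obs, 0.0, 1.0)

(a_hat, b_hat), _ = curve_fit(scaled_logistic, z, p_obs, p0=[1.0, 0.0])
print(a_hat, b_hat)   # compare 1.06778 and 0.2657436 on the slide

# The least-squares fit index on the slide: proportion of variance accounted for
resid = p_obs - scaled_logistic(z, a_hat, b_hat)
print(1.0 - resid.var() / p_obs.var())
```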

I then fit the Rasch model to the scale of items (using Andrich et al.'s RUMM software package). The data fail to fit the Rasch model (using the chi-square test of model fit) at p < 0.00001 (actually p ≈ 2.5×10⁻¹¹⁶). Apart from 2 items, none fit the model (using standardised chi-square residual tests). However, I note that the Rasch person measures correlate at 0.99 with the conventional raw score.

[Figure: scatterplot of Rasch measures vs raw EPQ-N scale scores (r = 0.99), UK reference sample data, N = 4140, mixed-gender sample. x-axis: Rasch measures (−4 to +4); y-axis: raw scale scores.]

Next, I find that my item difficulties are now quite different from those computed individually, as shown on the previous slide. The plot of Rasch-predicted vs actual proportions of respondents scoring an item in the keyed direction, at each raw scale score (or Rasch measure), exemplifies this discrepancy for item #3.

[Figure: Rasch expected cumulative probabilities vs observed probabilities for item #3. x-axis: Rasch measures (which correlate 0.99 with raw scores); y-axis: probability of a keyed response; observed and predicted curves shown.]

It would appear that, by concentrating solely upon adjusting item difficulty location parameters and latent trait person parameters whilst minimising the discrepancy between predicted and actual raw test scores, the Rasch modelling procedure has indirectly induced considerable item misfit whilst attempting to remain within the constraints required by the axiomatic measurement properties. This is somewhat unexpected and a cause for serious concern, especially as the model (and all items) fit when N = 200 respondents!

This is obviously a problem with the chi-square test being too sensitive to discrepancies between the observed and model-generated proportions of respondents (at each scale score/Rasch measure) getting the item correct. Which leaves us with the problem of just how to assess the fit of items to the model. There are solutions, but somewhat heuristic ones, I'm afraid.
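This sensitivity is easy to demonstrate: hold the discrepancy between observed and model-implied proportions fixed and the Pearson chi-square grows in direct proportion to the number of respondents. A toy illustration (my own, with made-up proportions and five class intervals), contrasting a total N of 200 with one of 4140:

```python
import numpy as np
from scipy.stats import chi2

# Model-implied vs observed proportions of keyed responses in five class intervals
# of the trait (made-up numbers; the discrepancies are only 2-4 percentage points)
p_model = np.array([0.10, 0.30, 0.50, 0.70, 0.90])
p_obs   = p_model + np.array([0.03, -0.04, 0.04, -0.03, 0.02])

for n_per_group in (40, 828):                 # total N = 200 vs N = 4140
    o1 = p_obs * n_per_group                  # observed keyed counts
    e1 = p_model * n_per_group                # expected keyed counts
    o0, e0 = n_per_group - o1, n_per_group - e1
    chisq = np.sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0)
    # df = 5 here (one per interval; no parameters are estimated in this toy example)
    print(n_per_group * 5, round(chisq, 1), chi2.sf(chisq, df=5))
# The identical misfit gives chi-square ~1.3 (p ~0.93) at N = 200,
# but ~27 (p ~0.00005) at N = 4140.
```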

The next exploration looks directly at a major purported benefit of the Rasch model: the creation of equal-interval, additive units of measurement for the latent variable. Here I comprehensively extend an example briefly presented by Fisher (1992), where we use a "bad ruler" (unequal units over a range of measurement) to make ordinal measurements of true equal-interval unit lengths (in cm). This tests the capacity of the Rasch model to uncover the true equal-interval scale that underlies the raw-score measurement.

Here, I present 40 objects for measurement to my bad ruler, which consists of 16 unequal divisions of length; the objects are actually cm units on a real ruler, expressed in terms of my bad-ruler units. Each measurement is in the form of a dichotomy: a 1 is assigned to a bad-ruler unit if my cm measure extends beyond that unit. Where my cm measure is smaller than the remaining bad-ruler units, I assign a 0 to those units. For example:

[Figure: the real ruler and the bad ruler side by side.] A 1 cm measure on the good ruler would generate the following record: 1100000000000000. For 2 cm: 1111000000000000, etc. In this way, we build up 40 records, which are like responses to items on a test.
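A small sketch of this record-building procedure (my own; the 16 cut points below are illustrative stand-ins, not the actual bad-ruler divisions used for the slides):

```python
import numpy as np

rng = np.random.default_rng(42)

# Sixteen unequal divisions of the 'bad ruler', expressed in true centimetres
cuts = np.sort(rng.uniform(0.5, 40.0, 16))

# Forty objects whose true lengths are the integer cm values 1..40
objects_cm = np.arange(1, 41)

# A record gets a 1 for every bad-ruler unit that the object extends beyond, else 0
records = (objects_cm[:, None] > cuts[None, :]).astype(int)

print(records[0])            # the record for the 1 cm object
print(records.sum(axis=1))   # 'raw scores': how many bad-ruler units each object exceeds
```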

Fitting these slightly jittered data (for the Rasch model is a probabilistic one), we have 40 persons and 16 items to be provided with parameters. The rather simple-minded test here is whether the Rasch model will recover the equal-interval cm scale from the ordinal measures made by me. Fisher claimed it would, and my simplistic reasoning with regard to conjoint measurement would seem to suggest it should.

The model fits almost perfectly (chi-square probability ≈ 0.99). All items fit the model (via chi-square). The actual raw scores for each person correlate 0.999 with the expected scores computed from the Rasch modelling; this is no longer any surprise, because the model-fit procedure is minimising this discrepancy. The Rasch item location/difficulty parameters correlate 0.991 with the bad-ruler units. Now this IS interesting.

Rasch "Difficulty" parameters and Ordinal units against actual cm measurement A scale for Rasch measures and the Raw Score units 18 15 12 9 6 3 0-3 -6-9 -12 Item Difficulty Raw Score Units 0 5 10 15 20 25 30 35 40 45 Measurement in actual cm Paul Barrett: BPS Millenium Conference: Beyond Psychometrics November 1998

The Rasch estimates are mirroring my bad-ruler units. If we did not know that my 16 units were (in reality) unequally spaced, we would probably treat them as equal-interval and plot them accordingly, which means that the Rasch difficulty/location parameters would also now be equally spaced.

[Figure: the Rasch difficulty measure for each item "unit" plotted against the ordinal rank of the bad-ruler units (1-16).]

A final test, to confirm my suspicion that the Rasch model is NOT able to address the issue of a fundamental unit of measurement. Here, I map my bad-ruler units onto log_e(cm), then use the units to make measurements as before. The graph on the next slide shows the extent to which my bad ruler is now making curvilinear measurements of a set of extensive, equal-interval cm-unit objects.
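In code terms, the only change from the earlier sketch is that the bad-ruler cut points now live on a log_e(cm) scale before the same dichotomisation rule is applied (again, the cut points are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(42)

# Sixteen unequal divisions, now defined on the log_e(cm) scale
log_cuts = np.sort(rng.uniform(np.log(0.6), np.log(40.0), 16))

objects_cm = np.arange(1, 41)

# Same record-building rule as before, applied after mapping the objects to log(cm)
records = (np.log(objects_cm)[:, None] > log_cuts[None, :]).astype(int)
print(records.sum(axis=1))
```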

[Figure: centimetre "objects" (0-41 cm) plotted against log(cm) (approx. 0.6-3.6), with the 16 ordinal bad-ruler "ticks" imposed on the log(cm) axis.]

The model fits almost perfectly (chi-square probability ≈ 0.97). All but one item fit the model (via chi-square). The Rasch item location/difficulty parameters correlate 0.987 with the bad-ruler units. Once again, the Rasch model is seen to attempt to linearise the 16 items and 40 length measures, and it succeeds well. BUT those item units were mapped onto extensive units of measurement using a logarithmic concatenation of cm to bad-ruler units.

[Figure: Rasch difficulty parameters and ordinal raw-score units plotted against actual cm measurement (0-45 cm), for the log-mapped bad ruler.]

[Figure: the Rasch difficulty measure for each item "unit" plotted against the ordinal rank of the bad-ruler units (1-16), for the log-mapped bad ruler.]

Critical point: Given that the deterministic axioms of Luce and Tukey's (1964) simultaneous conjoint measurement hold for this form of probabilistic model, Rasch scaling is producing equal-interval, additive units of measurement... but of what, exactly? By demonstrating its virtual identity with the raw scores and the ranked (but "equal-interval") ordinal units, I am led to conclude it is producing an equal-interval scaling of the numerals representing the ranked items.

Critical point: Given that my mapped units were equal-interval, it is then no surprise that the Rasch locations and rank-unit locations are so closely related in a linear function. But the key issue is that the real unit of measurement (cm) was never exposed by the model. It is this result that causes me to question the automatic use of the model for psychological science investigations.

Critical point: My use of the Rasch model here seems to be dependent upon some form of inductive logic; that is, I use the model to determine a unit of unknown meaning. But surely science proceeds by first defining a meaningful (in some theoretical sense) unit, then designing measurement to determine whether that unit functions in the manner specified by some theory?

Critical point: Thus, if we are to use the Rasch model productively, it cannot be used as an inductive unit-generating procedure but rather as part of a hypothetico-deductive process of investigation.

So, might we conclude that the Rasch model is of more value to scientific investigation than True-Score Theory? On balance, YES. Given that a quantitative science requires equal-interval, additive units of measurement for its variables, there really is no alternative to the Rasch model for psychological measurement. However, we have also seen that scaling, in the absence of a theory for the fundamental unit of measurement of a latent variable, is not of value except perhaps pragmatically.

Four clear, justifiable, and pragmatic reasons to use the Rasch model, given fit of the model to your data:
An explicit ordering of items as a cumulative response scale, on a common linear metric shared with person measures.
Additive units of measurement.
Computation of both item and person reliability.
Computation of the location-sensitive standard error of measurement/information function over the range of the latent variable measures.

Conclusions.1 ❶ The Rasch model is dangerous for the wrong reasons! Whereas I was thinking of it as a means to assist in the development of fundamental, standard units of measurement for latent variables, this is not possible without first having a model for what these units are and how they should be instantiated within a deductive theoretical framework.

Conclusions.2 ❷ The problem remains with our conception of the constituent properties of latent variables and their proposed units of measurement. ❸ The Rasch model provides more information about any test respondent, and about test items, than does CTT. For pragmatic purposes alone, this is surely of benefit to the applied psychological professions, both test developers and test users.

Conclusions.3 ❹ CERTAIN domains of psychometric tests do have good validity: the recent paper by Schmidt and Hunter (Sept. 1998) in Psychological Bulletin demonstrates this clearly, but also demonstrates the poverty of scientific understanding in this area. Schmidt, F.L. and Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262-274.

Conclusions.4 ❺ If we question the use of the Rasch model on the basis of the stability of the constructs we are measuring (because of environmental or situational factors that may change over time), then we are in fact NOT questioning the Rasch model at all, but the very rules and meaning by which we propose to instantiate our constructs. To spurn the possibility of equal-interval measurement on this basis is quite wrong. Rather, we need to consider the conceptual status of what it is that we think we are measuring.

Addendum.1 - 12/10/99
Following this presentation in November 1998, Ben Wright from the MESA group at Chicago re-analysed my data and concluded that there was insufficient stochasticity (random error) in my observations. In short, my data may have been artificially too clean for the Rasch model to fit well (as it is a form of probabilistic conjoint scaling, not deterministic). I am not happy with the implications of this, although I see exactly the veracity of his argument.
I think our disagreement may lie in the fact that others (like William Fisher Jnr.) see the meaning-measurement unit linkage as a construction that is created after the Rasch measurement is created. For me, this is the wrong way round. You first need to define the meaning, then develop the measurement, based upon the conceptualisation of a meaningful unit. Simply scaling a set of items (say, arithmetic items), then deciding that the equal-interval unit can be known as "arithmets", is acceptable if the only purpose of the measurement is pragmatic, but not acceptable if the aim of this work is to make statements about the magnitude of some psychological attribute/process that underlies an individual's ability to solve arithmetic items.

Addendum.2 - 12/10/99
So, I am now setting up a better data-generation program that gives me greater control over the amount of error I introduce and the amount of ordinality in my measurement ruler. Then, I intend to produce some better tests of my propositions. Further, as David Andrich pointed out, he does not use the chi-square statistic as a measure of model or item fit, but rather uses a mixture of graphical, tabular, and other data to examine the issue of model fit.
However, I note that George Karabatsos (gkarab@lsumc.edu) is working on model fit from the perspective of additive conjoint measurement (ACM). He noted in a recent email: "At face value, this approach to correspond goodness of fit with the ACM axioms may seem convenient and sensible. But the goodness of fit stats are hampered by several issues. As I found out in my dissertation work, and later in several sources (please refer to the bottom of this message), the correspondence between goodness of fit statistics and measurement axioms is far from perfect. For instance, the Rasch model can conclude perfect fit, while the axiom tests reveal significant departures from the measurement model." The rest of the message was almost as bleak for me! (Rasch listserv rasch@acer.edu.au, message dated 27/09/99). Anyway, time for some more detailed exploration into this measurement model.