Investigation into the use of confidence indicators with calibration

WORKSHOP ON FRONTIERS IN BENCHMARKING TECHNIQUES AND THEIR APPLICATION TO OFFICIAL STATISTICS, 7–8 APRIL 2005

Investigation into the use of confidence indicators with calibration

Gerard Keogh and Dave Jennings

Investigation into the Use of Confidence Indicators with Calibration

Paper prepared for: Workshop on Frontiers in Benchmarking Techniques and Their Application to Official Statistics, OECD/EUROSTAT, Luxembourg, 7 & 8 April 2005

Subject Areas: Benchmarking techniques and data quality issues; applications of benchmarking techniques in official statistics.

Authors: Gerard Keogh (gerard.keogh@cso.ie), Dave Jennings (dave.jennings@cso.ie)

Institution: Central Statistics Office (Statistical Methods and Development section)

Abstract: Using a standard input-output table with known marginal totals and independent (and unbalanced) estimates in individual cells as a basis, this paper investigates methods by which the cell estimates can be calibrated to the marginal totals. The paper also investigates the use of confidence indicators for each cell estimate, which allow a user to ensure that the amendments required to calibrate the data are concentrated more in the cells in which s/he has less confidence.

1. Introduction

The techniques of calibration can be extended to areas of statistical non-stochastic estimation. In calibration, a set of input data is adjusted to ensure that one or more data constraints holds. In an estimation situation when, say, a proportion of the input data is known to be accurate and the remainder is of poor quality, the techniques of calibration can be used to assist in the estimation of the poorer quality items (provided, of course, that a set of data constraints applies). What we will look at is a situation where we wish to estimate a set of constrained parameters from an initial set of unconstrained estimates of the parameters. In particular, we will look at supply and use tables, specifically an industry-by-product use table, for which first estimates of the cells are available but the marginal totals of the cell estimates do not agree with the known marginals, which are available from another source. In other words, if Z is the array of initial estimates and T the known marginal totals, we will use Z to estimate an array θ whose marginals agree with T. In doing this we will attempt to take the statistician's confidence in the individual estimates in Z into account. Frequently, the Z estimates will have come from very disparate sources, and the statistician's judgement and experience will often allow him to categorise the estimates into some reliability classes. Firstly, θ is estimated using standard calibration techniques and, subsequently, estimates are provided using the Expectation-Maximisation algorithm.

2. Problem Background and Initial Analysis

For the purposes of the analysis we are using a preliminary estimate of an industry-by-product use table with nine rows and columns. The data are in € million and, for our purposes, represent an aggregation of a much more detailed table with 55 rows and columns. We have an accompanying table of confidence indicators supplied by the statistician in the area. These indicators are in the range 0.1 (least confidence) to 0.9 (greatest confidence). We deliberately excluded an indicator of 1 for absolute certainty, on the basis that if a cell estimate was known to be correct then, for the purposes of estimation, it could simply be removed from the table and added back

later. Finally, we have the given known marginal totals, in € million, with which our final estimates must agree. The data are shown in Tables 1 and 2.

[Table 1: Initial Estimates and Row and Column Control Totals, €m — 9 × 9 table of initial cell estimates with row and column totals and control totals; values not preserved in this transcription]

All the data being used, i.e. the initial estimates and the control totals, relate to the same reference year. In other words, there is an assumption that the initial estimates represent reasonable and unbiased estimates of the true values of their respective cells. If, for instance, some of the initial estimates are derived from cell data for a previous year, then we are assuming that the user has updated them to the current year. This could be done, for example, by the use of suitable estimates of price and volume changes.

[Table 2: User Supplied Confidence Indicators — 9 × 9 table of indicators in the range 0.1 to 0.9; values not preserved in this transcription]

Tabular Analysis

In any tabular analysis it is instructive to glean insight into the structure of the table. A good starting point is to treat the data as if they were counts and the cell entries a multinomial sample. Denoting the observed cell entry in row i and column j by z_ij, the simplest structure for a two-way table is the multiplicative independence model for the rows and columns of the table

z_ij = T_i T_j / T,

and on taking logs this gives the well-known loglinear independence model

log(z_ij) = µ + α_i + β_j, with µ = log(T).

Using this model on the data in Table 1, the deviance (i.e. the lack of fit) is 2,448 and the likelihood ratio statistic, on 59 degrees of freedom, is highly significant. The independence model is therefore not appropriate, and so there is an interaction effect in the table. The interaction observed is of a specific type referred to as mobility. This is usually observed in social mobility tables that relate, say, a father's social status to that of his sons. Table 1 is an input-output table that associates products with industries. This means that the main diagonal can be expected to dominate. In a number of rows/columns this tends to be the case, but there are some large deviations. In any event, the so-called quasi-perfect mobility model in multiplicative form is

ẑ_ij = T_i T_j when i ≠ j; ẑ_ij = z_ij when i = j,

where ẑ_ij is the expected cell estimate under this model. Note that this model states that the estimated diagonal elements are equal to their observed counterparts. To fit the model the diagonal elements are removed, i.e. made structural zeros, and the independence model re-fitted. The deviance is reduced to 2,027 and the likelihood ratio statistic, on 51 degrees of freedom, while still significant at the 5% level, is a considerable improvement, supporting the belief that some form of imperfect mobility explains the structure of the table. A key attribute of any calibration procedure is that the underlying structural relationships are preserved. The methods adopted in this paper attempt to meet this requirement.
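As an illustration, a minimal numpy sketch of this independence check is given below; it is not the authors' code, and it assumes the table is held as a 9 × 9 array z of non-negative cell estimates. The quasi-perfect mobility variant would additionally blank the diagonal (structural zeros) and re-fit the margins iteratively, which this one-pass fit does not do.

```python
import numpy as np

def independence_fit(z):
    """Expected cells under the multiplicative independence model
    z_ij = T_i * T_j / T, treating the table entries as counts."""
    z = np.asarray(z, dtype=float)
    return np.outer(z.sum(axis=1), z.sum(axis=0)) / z.sum()

def deviance(z, fitted):
    """Loglinear deviance (lack of fit), 2 * sum z_ij * log(z_ij / fitted_ij);
    cells with z_ij = 0 contribute nothing to the sum."""
    z = np.asarray(z, dtype=float)
    nz = z > 0
    return 2.0 * np.sum(z[nz] * np.log(z[nz] / fitted[nz]))
```

A deviance of the order quoted above (2,448 against 59 degrees of freedom) signals a strong departure from independence.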

The Problem

Our aim, given these data, is to arrive at final estimates which (a) fulfil the control constraints; (b) are close to the initial estimates, that is, retain the underlying tabular structure; and (c) have the higher confidence cells changing less than the lower confidence ones. This is set out more clearly below.

3. Methodology

The Loss Function

For ease of notation we will represent the initial estimates as a vector Z of length 81 (i.e. the table cells in row-major order). The control totals will be represented as a vector T of length 18 (row totals first, then column totals). The confidence indicators (P) and final estimates (θ) will also be in vector form. Firstly, let R be a matrix such that R′Z gives the marginal totals of Z. R is an 81 × 18 (the number of marginals) matrix, with each column containing 1 in the positions of the cells contributing to its marginal and 0 elsewhere. Then our aim is to find θ such that R′θ = T and L(Z, θ, P) is minimised, where L is a suitable loss function. We looked at four different candidates for L but decided to present results for only two of them in this paper. The four we looked at were:

L1 = (θ − Z)′ diag(P) (θ − Z), the (weighted) squared absolute loss;
L2 = (θ − Z)′ diag(P/Z) (θ − Z), the (weighted) chi-square loss;
L3 = (θ − Z)′ diag(P/Z^2) (θ − Z), the (weighted) squared relative loss;
L4 = Σ_i P_i ((θ_i/Z_i) log(θ_i/Z_i) − θ_i/Z_i + 1), the (weighted) entropy loss.

We decided to go with loss functions L1 and L4. L1 can be useful in situations where additive adjustments are considered more relevant than relative or proportionate ones (e.g. where negative cell values can arise, or where zero initial values are not necessarily to remain unchanged). L4 does not give very different results from L2 and L3, but we chose it because it is the more common one (in its unweighted form) used in calibration. For L1 we computed the solution directly as

θ = Z + diag(1/P) R (R′ diag(1/P) R)^-1 (T − R′Z),

which is the solution to the minimisation of the Lagrangian L1 + λ′(T − R′θ) subject to R′θ = T, with λ being a vector of Lagrange multipliers.
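The following sketch shows how the constraint matrix R and this closed-form L1 solution might be implemented; the function names are hypothetical, and a pseudo-inverse replaces the plain inverse because the 18 marginal constraints are linearly dependent (the row and column totals share the same grand total).

```python
import numpy as np

def marginal_matrix(n_rows, n_cols):
    """Build R (81 x 18 for a 9 x 9 table): column k holds a 1 in the
    positions of the cells contributing to marginal k (row totals first,
    then column totals), so that R.T @ z returns the marginals of z."""
    R = np.zeros((n_rows * n_cols, n_rows + n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            cell = i * n_cols + j          # row-major order, as in the text
            R[cell, i] = 1.0               # contributes to row total i
            R[cell, n_rows + j] = 1.0      # contributes to column total j
    return R

def calibrate_l1(Z, P, T, R):
    """Weighted squared-absolute-loss calibration:
    theta = Z + diag(1/P) R (R' diag(1/P) R)^-1 (T - R'Z).
    pinv stands in for the plain inverse since R' diag(1/P) R is singular."""
    W_inv_R = R / P[:, None]               # diag(1/P) @ R
    A = R.T @ W_inv_R                      # R' diag(1/P) R
    return Z + W_inv_R @ np.linalg.pinv(A) @ (T - R.T @ Z)
```

Here Z and P are length-81 vectors in row-major order and T the length-18 control-total vector; provided the row and column control totals share the same grand total, R.T @ calibrate_l1(Z, P, T, R) reproduces T exactly.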

For L4, the solution to the Lagrangian L4 + λ′(T − R′θ) is

θ = Z # exp(Z # Rλ / P).

We first solved the control equation R′θ − T = 0 for λ and then substituted this into θ = Z # exp(Z # Rλ / P) to get our adjusted cell values. The symbols # and / represent elementwise multiplication and division respectively.

The EM and IPF approach

Returning to the tabular layout, it is assumed for simplicity that the unknown complete data are a multinomial sample of size T falling into I × J cells. The complete data are made up of the manifest (i.e. observed) data z_ij classified according to the I × J cells, and of row and column control totals T_i and T_j that arise from latent data θ_ij whose distribution is unknown. The EM (Expectation-Maximisation) algorithm consists of computing the cell probabilities from the current estimates of the complete data and using these to estimate the distribution of the latent data. In this particular case the EM algorithm is in fact better known as IPF (Iterative Proportional Fitting). Starting with θ_ij^(0) = z_ij, the observed data, IPF for the unknown θ_ij is based on the simple row update rule (with the analogous column rule obtained by replacing i with j)

θ_ij^(t+1) = [1 + (T_i − θ_i.^(t)) / θ_i.^(t)] θ_ij^(t),

applied until convergence is achieved, where θ_i.^(t) = Σ_j θ_ij^(t) is the current row total. This rule simply takes the difference between the control total for row i and the corresponding current row total estimate and uses it to update the current cell estimates. To introduce weighting into this scheme a straightforward Bayesian approach is adopted. Specifically, the weights p_ij are the entries in the confidence matrix. These weights are of course not probabilities, but they are adjusted using the simple IPF rule so that each row and column adds to 1. These adjusted weights are denoted by w_ij. The weighted IPF (row) update rule is given by

θ_ij^(t+1) = [1 + (T_i − θ_i.^(t)) w_ij / Σ_j w_ij θ_ij^(t)] θ_ij^(t),

where the summation is over the columns j; here θ_ij^(t) plays the role of the conditional likelihood, with w_ij being the prior probability of confidence in the corresponding cell. The factor w_ij θ_ij^(t) / Σ_j w_ij θ_ij^(t) is then simply the posterior updating weight.
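A compact numpy sketch of this weighted update, alternating row and column steps, might look as follows; it assumes strictly positive weighted row and column sums and is not the authors' implementation.

```python
import numpy as np

def weighted_ipf(z, w, row_T, col_T, tol=1e-8, max_iter=1000):
    """Weighted IPF: each sweep shares the row discrepancy T_i - theta_i.
    over the cells of row i in proportion to the posterior weights
    w_ij * theta_ij / sum_j w_ij * theta_ij, then does the same by column."""
    theta = np.asarray(z, dtype=float).copy()
    w = np.asarray(w, dtype=float)
    for _ in range(max_iter):
        post = w * theta                                   # row step
        post /= post.sum(axis=1, keepdims=True)
        theta += (row_T - theta.sum(axis=1))[:, None] * post
        post = w * theta                                   # column step
        post /= post.sum(axis=0, keepdims=True)
        theta += (col_T - theta.sum(axis=0))[None, :] * post
        if (np.abs(theta.sum(axis=1) - row_T).max() < tol and
                np.abs(theta.sum(axis=0) - col_T).max() < tol):
            break
    return theta
```

With all w_ij equal, the posterior weights collapse to θ_ij/θ_i. and the rule reduces to ordinary multiplicative IPF.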

The results given in Section 5 relate to the weighted IPF calibration only. Using the above updating scheme fits the saturated model to the data. This model may be too detailed to describe the data and, as mentioned above, the quasi-perfect mobility model may also be of interest. In this situation the update scheme separates the diagonal of the observed data (and confidences) from the remainder, applying the independence model to the off-diagonal elements and the saturated model to the diagonal. The update rule for the off-diagonal elements is therefore

θ_ij = w_ij T_i T_j / (Σ w_ij T_i T_j),

which of course is independent of z_ij. This will generally give slightly biased estimates of the marginals, which is corrected by re-fitting the true marginals with simple IPF under the saturated model. This is referred to as weighted mobility calibration in Section 5.

Finally, the confidence matrix is represented in the weighted IPF model by a prior weight that multiplies the current estimated (i.e. likelihood) proportions. It is therefore possible to replace the confidence matrix by a simulated distribution of imputed weights. In this case the weight matrix is chosen to be Gamma distributed,

w_ij = (p_ij) × Gamma(p_ij).

These Gamma weights then multiply the cell proportions (which have an assumed underlying Poisson distribution), giving in effect Gamma posterior weights. This data augmentation procedure is simulated 100 times to give estimated coefficients of variation (CVs) for the weighted IPF calibration results.
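As a sketch only: the expression w_ij = (p_ij) × Gamma(p_ij) is read below as scaling a Gamma(shape = p_ij) draw by p_ij, which is an assumption rather than the paper's stated construction; the simulation reuses the hypothetical weighted_ipf from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(2005)

def augmentation_cvs(z, p, row_T, col_T, n_sims=100):
    """Data augmentation for CVs: redraw the weight matrix, recalibrate,
    and measure the spread of each calibrated cell over the simulations.
    Assumed reading: w_ij = p_ij * G_ij with G_ij ~ Gamma(shape=p_ij)."""
    draws = []
    for _ in range(n_sims):
        w = p * rng.gamma(shape=p)        # simulated prior weight matrix
        draws.append(weighted_ipf(z, w, row_T, col_T))
    draws = np.stack(draws)
    mean = draws.mean(axis=0)
    sd = draws.std(axis=0, ddof=1)
    return np.where(mean != 0, 100.0 * sd / mean, 0.0)    # CV, in percent
```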

4. Loss Function Results

Additive Calibration

To get a basis from which to assess the effect of using the confidence indicators, we first estimated θ using unweighted loss functions (i.e. putting all P = 1). For the additive (L1) scenario the resulting changes to each cell are shown in Table 3. We show just the actual changes, since relative changes are not relevant in this situation. The total row and column changes are also given, to indicate where in the table change is needed most. In some sense this represents the fairest or most democratic way of assigning the required changes among the various cells. It takes no account of (initial) cell size, nor does it listen to any arguments regarding the relative accuracy or quality of the initial cell estimates.

[Table 3: Additive Calibration with No Confidence Indicators — Changes to Cells, €m; values not preserved in this transcription]

To assist in later comparisons, Table 4 gives a simple analysis of the absolute changes in Table 3 classified by the cells' user-supplied confidence indicator. This backs up the democratic argument as, not surprisingly, there are no significant differences among the (unused) user confidence indicators.

[Table 4: Additive Calibration with No Confidence Indicators — Analysis of Absolute Change by confidence indicator (N Obs, N, Mean, Std Dev, Minimum, Maximum); values not preserved in this transcription]

When we now introduce the user-supplied confidence indicators the situation changes somewhat, as seen in Table 5. The changes now reflect, to a large extent, the level of confidence which the user assigned to each cell. The ideal situation we would like to

see is something like: if P_i > P_j then |θ_i − Z_i| < |θ_j − Z_j|, or if P_i < P_j then |θ_i − Z_i| > |θ_j − Z_j|. In fact this happens for 2,197 of the 2,876 cell comparisons for which P_i did not equal P_j. Given the constraints imposed by the control totals this is probably a reasonable result.

[Table 5: Additive Calibration with Confidence Indicators — Changes to Cells, €m; values not preserved in this transcription]

However, it is more realistic to compare the confidence-influenced results with the earlier ones in which no confidence indicators were used. Table 6 gives an analysis of the difference between the two situations. A positive difference indicates an increase in absolute change from the no-confidence to the confidence-influenced situation. The way to read Table 6 is as follows. The first line relates to those cells which were given a confidence level of 0.1. There were six such cells, and 4.8 was the mean difference between the absolute change required for calibration without confidence indicators and that required for calibration with the confidence indicators. Thus, on average, these cells were changed more when the confidence indicators were used. This is a desired result. At the other end of the confidence levels, 0.9, the opposite happened, in that cells were changed on average by a smaller amount (−0.8).
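The pairwise counts quoted above (e.g. 2,197 of 2,876 comparisons) can in principle be reproduced by a check of the following form, where Z, theta and P are the length-81 vectors defined in Section 3 (the function name is hypothetical):

```python
import numpy as np
from itertools import combinations

def ordering_agreement(Z, theta, P):
    """Count pairs of cells with unequal confidence for which the
    higher-confidence cell received the smaller absolute change."""
    change = np.abs(theta - Z)
    agree = total = 0
    for i, j in combinations(range(len(Z)), 2):
        if P[i] == P[j]:
            continue
        total += 1
        hi, lo = (i, j) if P[i] > P[j] else (j, i)
        agree += int(change[hi] < change[lo])
    return agree, total
```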

[Table 6: Additive Calibration — Difference in Absolute Changes Arising from With and Without Confidence Indicators (Conf, N Obs, N, Mean, Std Dev, Minimum, Maximum); values not preserved in this transcription]

Overall, the mean column in Table 6 indicates that the introduction of the confidence indicators did have the anticipated effect. However, there were some deviations; for example, one cell whose confidence indicator was 0.7 was in fact changed by more (1.3) when the confidence indicators were applied.

Minimum Entropy Calibration

Again, for the L4 scenario we first estimated θ using unweighted loss functions (P = 1), and the resulting percentage changes to each cell are in Table 7. The total row and column percentage changes are also given to indicate where the greatest relative changes must take place.

[Table 7: Min Ent Calibration with No Confidence Indicators — Percentage Changes to Cells, %; values not preserved in this transcription]

This method is effectively minimising the entropy in the table and so is, to a large extent, assigning the necessary changes on a relative basis, with the larger cells being changed more than the smaller ones. As with the additive situation earlier, we give, in Table 8, a simple analysis of the absolute percentage changes in Table 7 classified by the cells' user-supplied confidence indicator. This shows, using the earlier analogy, that the percentage changes in this case are democratic, in that there is no clear relationship between them and the cells' confidence indicators. In Table 8, the cells which had an initial and final value of zero are included as zero percent change.

[Table 8: Min Ent Calibration with No Confidence Indicators — Analysis of Absolute Percentage Change by confidence indicator; values not preserved in this transcription]

When we introduce the confidence indicators the situation changes significantly; see Table 9. The percentage changes follow fairly well, and inversely, the size of the confidence indicator. The wished-for situations are those in which P_i > P_j and |(θ_i − Z_i)/Z_i| ≤ |(θ_j − Z_j)/Z_j|, or P_i < P_j and |(θ_i − Z_i)/Z_i| ≥ |(θ_j − Z_j)/Z_j|. This occurred for 1,737 of the 2,876 cell comparisons for which P_i did not equal P_j. This is not as good a result as for the additive calibration, where 2,197 of the comparisons were in the desired direction.

[Table 9: Min Ent Calibration with Confidence Indicators — Percentage Changes to Cells, %; values not preserved in this transcription]

To get a more valid measure of the effect of the confidence indicators, Table 10 gives an analysis of the difference between the with- and without-confidence-indicator results. A positive difference indicates an increase in absolute percentage change from the no-confidence to the confidence-influenced situation.

[Table 10: Min Ent Calibration — Difference in Absolute % Changes Arising from With and Without Confidence Indicators; values not preserved in this transcription]

Overall the table shows that we got a desired result, with the percentage change decreasing with increasing confidence (except for a minor blip between 0.8 and 0.9). In particular, all twelve cells with confidence indicators of 0.8 and 0.9 showed a decrease in change, except for the zero cells, which did not change. On the other hand, the six cells with confidence 0.1 showed an increase in change. As with the additive situation (see Table 6), there were a few cases where the introduction of the confidence indicators did not have the anticipated effect.

5. EM/IPF Results

Weighted IPF Calibration

The weighted IPF update rule was next applied to give weighted IPF calibration estimates. The resulting percentage changes are computed and displayed in Table 11, along with total row and column percentage changes, given to indicate where in the table change is needed most. It is clear from the table that this weighted IPF procedure gives very large percentage changes that are due in part to small initial values in those cells. In addition, the largest changes occur where the corresponding row and column changes are greatest, indicating that the table structure is preserved.

[Table 11: Weighted IPF Calibration — Percent Changes to Cells, %; values not preserved in this transcription]

Looking at Table 12, the weighted IPF results are compared against their without-weighting counterparts. That is, the analysis shows the percentage improvement, or otherwise, of weighted IPF absolute percent changes against their straightforward counterparts where no weights are applied. It is clear from the figures that where confidence is low, IPF without weighting is more appropriate; but where confidence is high, the weighting method gives better results, or smaller changes. This is in line with expectations, as it indicates that cells with higher confidence values are subject to a positive improvement, and therefore smaller changes, under the weighting scheme.

[Table 12: Weighted IPF Calibration — Improvement Analysis of Absolute Percent Changes over IPF (without weighting), by confidence indicator; values not preserved in this transcription]

Weighted Mobility Calibration

The weighted quasi-perfect mobility update rule is applied to Table 1, with weights derived from the confidence indicators in Table 2. The percentage changes are computed and displayed in Table 13, along with total row and column changes given to indicate where in the table change is needed most. It is clear from the table that this weighted procedure gives very large percentage changes in virtually all cells. The diagonal is reduced, and the size of the off-diagonal elements indicates that this model is not appropriate for the data. This of course is in line with expectations, as the deviance of the model was quite large. Given the poor estimates, there is no value in looking at the smoothness of the confidence indicators.

[Table 13: Weighted Mobility Calibration — Percent Changes to Cells, %; values not preserved in this transcription]

Weighted IPF Calibration CVs

The data augmentation technique outlined in Section 3 was applied to Table 1 and the confidences given in Table 2. The actual CVs produced are given in Table 14, and it is clear there is quite wide variation across the table, but once again most of the larger values are due to small values occurring in the corresponding cells of Table 1. Excluding these, the range for the CVs is between 0 and 50%, with an average of about 33%. Thus on average it may be expected that a calibrated cell value of 1.0 would lie between 0.34 and 1.66 in about 95% of samples of confidence values.

[Table 14: Weighted IPF Calibration — Estimated CVs of Cell Estimates, %; values not preserved in this transcription]

Turning to the distribution of CV by confidence level (Table 15), the mean variation shows a decrease with increasing confidence. This, however, is not linear. In fact the mean level has a high value of 123% at confidence level 0.1, is then fairly constant at about 50% from levels 0.2 through 0.8, and falls to a low value of 12% when confidence is highest at 0.9. This suggests that the confidence levels from 0.1 to 0.9 are not distinctive enough and should be replaced by levels 0.1, 0.5 and 0.9, representing complete uncertainty about the data, some knowledge, and a high level of certainty respectively. These gradations of confidence are of course specific to this scheme. However, the data augmentation imputations reflect how variation in the confidence impacts on the calibration. The approach is therefore quite general, and so the conclusion about the gradations can also be accepted in general.

[Table 15: Weighted IPF Calibration — Distribution of CVs by Confidence Value (N Obs, N, Mean, Std Dev, Minimum, Maximum); values not preserved in this transcription]

6. Conclusion

We have shown that calibration, even in the presence of overlapping and interrelated constraints, can take account of external, user-supplied confidence indicators. However, we have not been able to ensure the ideal situation in which every high-confidence cell is changed by a smaller amount than every low-confidence one. The methods we outlined can be of particular use in the provision of early or preliminary estimates of statistics when (a) only provisional or uncertain estimates are available for intermediate statistics, and (b) more accurate estimates are available for some overall constraining data. In such cases, the user can improve, through calibration, the intermediate estimates and in doing so can also take account of their confidence in the original estimates, based on knowledge or experience. Further investigation could profitably be done in the area of finding ways of getting a more structured relationship between the confidence indicator and the size of the change in each cell. For example, if one cell has twice the confidence of another, then the expected change to the former would be half that to the latter. Other, perhaps more difficult, investigations could be done into methodologies that look at all possible calibrations to a given set of constraints and select the one which best meets the requirement that: if P_i > P_j then |θ_i − Z_i| < |θ_j − Z_j|, or if P_i < P_j then |θ_i − Z_i| > |θ_j − Z_j|. This stochastic analysis could also examine the impact of sampling effects in the given marginal totals, as well as the weights, to arrive at an ideal selection. A Bayesian framework that examines the posterior distribution of the cell

estimates, with, say, Markov Chain Monte Carlo methods, might provide a good setting for further study of weighting in the context of balancing a table to given marginal totals.
