Efficiency Measurement Using Independent Component Analysis And Data Envelopment Analysis


Ling-Jing Kao a, Chi-Jie Lu b, Chih-Chou Chiu a

a Department of Business Management, National Taipei University of Technology, Taiwan, ROC
b Department of Industrial Engineering and Management, Ching Yun University, Taiwan, ROC

(This paper has been published in European Journal of Operational Research, DOI: /j.ejor )

ABSTRACT

Efficiency measurement is an important issue for any firm or organization. It allows organizations to compare their performance with that of their competitors and then develop corresponding plans to improve performance. Various efficiency measurement tools, such as conventional statistical methods and non-parametric methods, have been successfully developed in the literature. Among these tools, the data envelopment analysis (DEA) approach is one of the most widely discussed. However, problems of discrimination between efficient and inefficient decision-making units also exist in the DEA context (Adler and Yazhemsky, 2010). In this paper, a two-stage approach integrating independent component analysis (ICA) and data envelopment analysis (DEA) is proposed to overcome this issue. We suggest using ICA first on the input variables to generate independent components (ICs), then selecting the ICs that represent the independent sources of the input variables, and finally using the selected ICs as the new input variables in the DEA model. A simulated dataset and a hospital dataset provided by the Office of Statistics in Taiwan's Department of Health are used to demonstrate the validity of the proposed two-stage approach. The results show that the proposed

method can not only separate performance differences between the DMUs but also improve the discriminatory capability of the DEA's efficiency measurement.

Keywords: Independent component analysis, Data envelopment analysis, Efficiency measurement

1. Introduction

Efficiency measurement is an important issue for any type of business or organization. It allows businesses or organizations to compare their performance with that of their competitors and develop a corresponding strategy to improve performance. Among the various efficiency measurement tools developed in the literature, such as conventional statistical methods, non-parametric methods, and artificial intelligence methods, it has been affirmed that DEA can effectively measure the relative efficiencies of multiple decision-making units (DMUs) with similar goals and objectives. For example, DEA has been widely applied to different types of businesses and organizations, such as banks (Kao and Liu, 2009; Sahoo and Tone, 2009; Cooper et al., 2008), schools (Hu et al., 2009; Ray and Jeon, 2008; Mancebón and Muñiz, 2008) and hospitals (Hua et al., 2009; Ancarani et al., 2009; Kirigia et al., 2008). However, DEA has two shortcomings. First, when the inputs of the DMUs are strongly correlated, the efficiency estimates obtained from the slack-variable analysis of DEA can be biased. This is because DEA techniques mainly use weighting to calculate the ratio between the inputs and outputs of each DMU. Second, when the model is incorrectly specified or the number of units is too small, DEA may lose its discrimination capability. To solve these problems, Adler and Golany (2001, 2002) suggested using principal component analysis (PCA) to produce uncorrelated linear combinations of the original inputs and outputs. Adler and Yazhemsky (2010) concluded, by comparing discrimination performance in a simulation exercise, that PCA-DEA outperforms PCA-variable reduction. Independent component analysis (ICA) is another solution to the problem of input correlation.
Essentially, ICA is a novel statistical signal processing technique used to extract independent sources from observed multivariate data when no relevant data-mixture mechanism is available (Hyvärinen et al., 2001; Hyvärinen and Oja, 2000). It is a methodology that captures both second- and higher-order statistics and projects the input data onto basis vectors that are as statistically independent as possible (Bartlett et al., 2002; Draper et al., 2003). These characteristics distinguish ICA from PCA, which finds a set of the most representative projection vectors such that the projected samples retain the

most information about the original samples (Turk and Pentland, 1991). The literature has applied ICA to human face recognition on the FERET database (Bartlett et al., 2002; Liu and Wechsler, 1999) and the Olivetti and Yale databases (Yuen and Lai, 2002). The latter study, Liu and Wechsler (1999), and Bartlett et al. (2002) have shown that ICA outperforms PCA, whereas Moghaddam (2002) reports no significant difference between the performances of ICA and PCA. In this paper, we propose a new ICA-DEA approach to efficiency measurement that both addresses the problem of input correlation and improves discrimination capability. The proposed approach consists of two stages. In the first stage, we use ICA to generate independent components (ICs) to ensure statistical independence among the input variables. In the second stage, the estimated ICs containing the key factors that affect efficiency measurement are applied in DEA as the new input variables. To evaluate the performance of the proposed method, a simulated dataset and a hospital dataset provided by the Office of Statistics in Taiwan's Department of Health are used in this study. We also compare the discrimination capability of ICA-DEA with other approaches such as PCA-DEA and variable reduction (VR). The results show that the efficiency analysis performed by the proposed ICA-DEA approach can avoid efficiency misjudgment. Moreover, the optimal levels of input and output for each inefficient hospital can be estimated successfully. The rest of this paper is organized as follows: Sections 2 and 3 give brief introductions to independent component analysis and data envelopment analysis, respectively. In section 4, we develop the proposed two-stage model and compare it with PCA-DEA and VR on a simulated dataset. In section 5, an empirical application is provided: the hospital data from the Office of Statistics in Taiwan's Department of Health are analyzed with the proposed approach.
Concluding remarks are offered in section 6.

2. Independent component analysis

Let X = [x_1, x_2, ..., x_m]^T be a multivariate data matrix of size m x n, m <= n, consisting of observed random variables x_i of size 1 x n, i = 1, 2, ..., m. In the basic ICA model, the matrix X can be modeled as

X = AS = sum_{i=1}^{m} a_i s_i,    (1)

where a_i is the i-th column of the unknown mixing matrix A of size m x m, and s_i is the i-th row of the source matrix S of size m x n. The vectors s_i are unknown latent sources (variables) that cannot be directly observed from the observed variables x_i. The ICA model aims at finding an m x m de-mixing matrix W such that

B = [b_i] = WX = [w_i X],    (2)

where b_i is the i-th row of the matrix B, i = 1, 2, ..., m. The vectors b_i must be as statistically independent as possible, and are called independent components (ICs). The ICs are used to estimate the latent variables s_i. The vector w_i in Eq. (2) is the i-th row of the de-mixing matrix W, i = 1, 2, ..., m; it transforms the observed multivariate matrix X to generate the corresponding IC, i.e., b_i = w_i X, i = 1, 2, ..., m. ICA modeling is formulated as an optimization problem by setting up a measure of independence of the ICs as the objective function and using optimization techniques to solve for the de-mixing matrix W (Bell and Sejnowski, 1995; Sánchez, 2002). In general, the ICs are obtained by multiplying the matrix X by the de-mixing matrix W, which can be determined by an unsupervised learning algorithm that maximizes the statistical independence of the ICs. The statistical independence of the ICs can be measured in terms of their non-Gaussianity (Hyvärinen et al., 2001; Hyvärinen and Oja, 2000). Non-Gaussianity is commonly verified with two statistics: kurtosis and negentropy. The kurtosis of a random variable b, its fourth-order cumulant, is classically defined by

kurt(b) = E(b^4) - 3(E(b^2))^2.    (3)

If b is assumed to have zero mean and unit variance, the right-hand side simplifies to E(b^4) - 3, which shows that kurtosis is simply a normalized version of the fourth moment E(b^4).
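Eq. (3) is easy to check numerically. The sketch below (sample sizes and the chosen distributions are illustrative, not from the paper) standardizes a sample and evaluates the excess kurtosis:

```python
import numpy as np

def excess_kurtosis(b):
    """kurt(b) = E(b^4) - 3(E(b^2))^2 for a standardized sample, Eq. (3)."""
    b = np.asarray(b, dtype=float)
    b = (b - b.mean()) / b.std()                     # zero mean, unit variance
    return np.mean(b ** 4) - 3.0 * np.mean(b ** 2) ** 2

rng = np.random.default_rng(0)
k_gauss = excess_kurtosis(rng.normal(size=100_000))       # near 0 for Gaussian data
k_uniform = excess_kurtosis(rng.uniform(-1, 1, 100_000))  # negative (sub-Gaussian)
k_laplace = excess_kurtosis(rng.laplace(size=100_000))    # positive (super-Gaussian)
```

The uniform and Laplace cases illustrate the two kinds of non-Gaussianity that make an IC "interesting" to the algorithm.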
For a Gaussian b, the fourth moment equals 3(E(b^2))^2. Thus, kurtosis is zero for a Gaussian random variable and non-zero for most non-Gaussian random variables. Unlike kurtosis, negentropy is determined from the information quantity of

(differential) entropy. Entropy is a measure of the average uncertainty in a random variable. The differential entropy H of a random variable b with density p(b) is defined as

H(b) = -integral p(b) log p(b) db.

According to a fundamental result of information theory, a Gaussian variable has the highest entropy among all random variables with equal variance (Hyvärinen and Oja, 2000). To obtain a measure of non-Gaussianity, the negentropy J is defined as

J(b) = H(b_gauss) - H(b),    (4)

where b_gauss is a Gaussian random vector with the same covariance matrix as b. Negentropy is always non-negative and is zero if and only if b has a Gaussian distribution. Since negentropy is very difficult to compute, the following approximation has been proposed (Hyvärinen and Oja, 2000):

J(b) ~ [E{G(b)} - E{G(o)}]^2,    (5)

where o is a Gaussian variable with zero mean and unit variance, and b is a random variable with zero mean and unit variance. G is a nonquadratic function, given by G(b) = -exp(-b^2/2) in this study. The FastICA algorithm proposed by Hyvärinen et al. (2001) is adopted in this paper to solve for the de-mixing matrix W. ICA modeling usually involves two preprocessing steps: centering and whitening (Hyvärinen and Oja, 2000). In the centering step, the matrix X is centered by subtracting the row means, i.e., x_i <- x_i - E(x_i). The zero-mean matrix X is then passed through the whitening matrix Q to remove the second-order statistics of the input matrix, i.e., Z = QX. The whitening matrix Q is the inverse square root of the covariance matrix of the input matrix, i.e., Q = (C_x)^(-1/2), where C_x = E(XX^T) is the covariance matrix of X. The rows of the whitened matrix Z, denoted z_i, are uncorrelated and have unit variance, i.e., C_z = E(ZZ^T) = I. ICA can be viewed as an extension of principal component analysis (PCA) (Hyvärinen and Oja, 2000).
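The centering, whitening, and fixed-point estimation described above can be sketched in a few lines of NumPy. The sketch below is a simplified illustration, not the full FastICA algorithm: it uses the cubic (kurtosis) nonlinearity instead of G(b) = -exp(-b^2/2), deflates one component at a time, and finishes by sorting the ICs by the magnitude of their kurtosis. The sinusoid-plus-noise demo at the bottom mirrors the spirit of Figure 1; all signal choices are illustrative.

```python
import numpy as np

def fastica_sketch(X, n_iter=500, seed=0):
    """Simplified FastICA: centering, whitening Q = C^(-1/2), then a deflation
    fixed-point loop with the cubic (kurtosis) nonlinearity."""
    Xc = X - X.mean(axis=1, keepdims=True)            # centering step
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])    # eigendecomposition of C_x
    Q = E @ np.diag(d ** -0.5) @ E.T                  # whitening matrix C_x^(-1/2)
    Z = Q @ Xc                                        # now E(ZZ^T) = I
    m = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((m, m))
    for i in range(m):                                # extract one IC at a time
        w = rng.normal(size=m)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # kurtosis fixed-point update: w <- E[z (w.z)^3] - 3w
            w = (Z * (w @ Z) ** 3).mean(axis=1) - 3.0 * w
            w -= W[:i].T @ (W[:i] @ w)                # deflate against found rows
            w /= np.linalg.norm(w)
        W[i] = w
    return W @ Z, W @ Q                               # ICs B, de-mixing matrix for X

# mix two sinusoids and a noise source, in the spirit of Figure 1
t = np.linspace(0, 8 * np.pi, 4000)
rng = np.random.default_rng(1)
S = np.vstack([np.sin(t), np.sin(3 * t + 1), rng.uniform(-1, 1, t.size)])
X = rng.normal(size=(3, 3)) @ S                       # X = AS, with A unknown to ICA
B, W_hat = fastica_sketch(X)
order = np.argsort(-np.abs((B ** 4).mean(axis=1) - 3.0))  # sort ICs by |kurtosis|
```

Up to sign and ordering (the two ambiguities discussed below), each row of B should track one of the original sources.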
However, the objective of ICA is different from that of PCA. PCA is a dimensionality-reduction technique that reduces the data dimension by projecting the correlated

variables onto a smaller set of new variables that are uncorrelated and retain most of the original variance. The objective of PCA, however, is only to de-correlate the variables, not to make them independent. PCA can impose independence only up to second-order statistical information while constraining the direction vectors to be orthogonal, whereas ICA has no orthogonality constraint and involves higher-order statistics: it not only de-correlates the data (second-order statistics) but also reduces higher-order statistical dependencies (Lee, 1998). Hence, ICs reveal more useful information from observed data than principal components (PCs) do. Similar differences exist between ICA and principal axis factoring (PAF), a well-known exploratory factor-analysis model: PAF extracts factors based on second-order statistical information, and the factors are assumed to be uncorrelated and Gaussian distributed (Hyvärinen et al., 2001). A simple comparison between PCA and ICA is shown in Figure 1. Figure 1(a) shows three original variables: two different sinusoidal variables (s_1 and s_2) and a random variable (s_3). These original variables are transformed into observed mixed variables (x_1, x_2 and x_3) using the unknown mixing matrix A, i.e., X = AS. When PCA is applied to these three mixed variables, it yields the three principal components shown in Figure 1(c), which differ from the original variables in Figure 1(a). The ICA solution is depicted in Figure 1(d), which clearly reveals that the original variables can be reconstructed by ICA without any knowledge of the original variables or the mixing matrix. The ICA model in Eq. (1) and the simple example in Figure 1 show that the ICA model has two limitations (Hyvärinen and Oja, 2000). The first is that the variances (energies) of the independent components cannot be determined, because both S and A are unknown.
In other words, S and A are unidentifiable unless one of them is fixed. To alleviate this problem, most researchers assume that each IC has zero mean and unit variance in the ICA model (Hyvärinen and Oja, 2000). Even then, the sign remains ambiguous, because an IC could be multiplied by -1 without affecting the model. The other limitation of the ICA model is that the order of the ICs cannot be determined, again because S and A are simultaneously unknown. A number of methods have been suggested to

determine the component order (Cardoso and Souloumiac, 1993; Back and Weigend, 1997; Cheung and Xu, 2001). Hyvärinen (1999) suggested that ICs can be sorted according to their non-Gaussianity; in this paper, the ICs are sorted by their kurtosis values.

[Figure 1. A simple comparison between PCA and ICA: (a) original variables (S); (b) observed variables (X); (c) PCA results (PC 1, PC 2, PC 3); (d) ICA results (IC 1, IC 2, IC 3).]

3. Data envelopment analysis basics

DEA is also known as the efficient-frontier approach (Cooper et al., 2000; Cooper et al., 2004). The term envelopment refers to the idea that inefficient DMUs are located inside an area enveloped by the efficient DMUs. DEA is built on the concept of relative efficiency, defined as the ratio of the weighted sum of outputs to the weighted sum of inputs (Cooper et al., 2004). Solving the DEA model requires that the weights for the inputs and

outputs of each unit are selected to maximize its efficiency under certain constraints. Thus, the mathematical programming form of the CCR model is formulated as follows (Cooper et al., 2004):

Maximize    theta = (u_1 y_1p + u_2 y_2p + ... + u_d y_dp) / (v_1 x_1p + v_2 x_2p + ... + v_m x_mp)

Subject to  (u_1 y_1j + u_2 y_2j + ... + u_d y_dj) / (v_1 x_1j + v_2 x_2j + ... + v_m x_mj) <= 1,  j = 1, ..., n
            v_i > 0 for i = 1, 2, ..., m                                                           (6)
            u_r > 0 for r = 1, 2, ..., d

where x_1j, x_2j, ..., x_mj are the m inputs and y_1j, y_2j, ..., y_dj are the d outputs of unit j; v_1, v_2, ..., v_m are the weights for the inputs; u_1, u_2, ..., u_d are the weights for the outputs; p is the unit designated for an optimization run; and n is the total number of units in the study. Because each decision unit has its own preferences over input and output variables, implementing the model in Eq. (6) derives the values of v_i (i = 1, 2, ..., m) and u_r (r = 1, 2, ..., d) for each decision unit.

4. Research methodology and simulation study

The proposed ICA-DEA approach is illustrated in Figure 2. After deciding on the input variables, we use ICA to convert the observed input data into separate independent signals. These separated signals are then fed into the DEA approach in the second stage of our technique to construct the efficiency measurement model.

[Stage I: observations of inputs -> data conversion by ICA. Stage II: construction of the efficiency measurement model by DEA -> efficiency measurement.]

Figure 2. Research structure

The performance of the proposed ICA-DEA model relative to PCA-DEA and VR is investigated by simulation. To be consistent with other studies, we used a dataset generated by the simulated Cobb-Douglas production function y = x_1^{a_1} x_2^{a_2} (x_3 x_4 x_5 x_6 x_7 x_8)^{a_3} e^{-tau} with a half-normal inefficiency distribution (Adler and Yazhemsky, 2010) for comparison. There are 50 DMUs in total, and 8 inputs are included in the dataset. Following Adler and Yazhemsky (2010), we first generated 10,000 positive observations of the variables x from a normal distribution with mean 10 and variance 1. A single output was then chosen to permit the use of the standard production function to compute the output values. Consequently, over the entire simulated population of 10,000 observations and a subsample size of 50 DMUs, we can report the average percentage over 10,000/50 = 200 samples. In the data-generation process, we assumed no probability mass along the frontier, and a single inefficiency (tau) was simulated for each DMU, drawn independently from a half-normal distribution with mu = 0 and sigma = 1. The simulated data were then used to compare the results of the PCA-DEA model, the VR model, and our ICA-DEA model. To investigate the influence of the percentage of retained information on the number of inputs in the analysis, we followed Adler and Yazhemsky (2010) and reduced the percentage of retained information from 100% to 80% in 4-percent steps. As in Adler and Yazhemsky (2010), we used the percentage of retained information to decide the number of PCs or variables included in the DEA model; in other words, we determined the number of ICs, PCs, or variables to retain such that the percentage of information remaining is greater than or equal to the setup level.
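The retained-information rule described above can be sketched as follows: pick the smallest number of components whose cumulative explained variance reaches the setup level. The data below are synthetic, with column variances chosen so the answer is easy to verify by hand; the 80% threshold mirrors the lowest setup level used in the experiments.

```python
import numpy as np

def n_components_for(X, retain):
    """Smallest number of principal components whose cumulative explained
    variance reaches the retained-information level (e.g. retain=0.80)."""
    Xc = X - X.mean(axis=0)                     # observations in rows
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative explained variance
    return int(np.searchsorted(ratio, retain - 1e-12) + 1)

# synthetic inputs whose variances fall off sharply (100 : 9 : 1)
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3)) * np.array([10.0, 3.0, 1.0])
```

With these variances, the first component alone covers roughly 91% of the variance, so an 80% threshold keeps one component, 95% keeps two, and 99.9% keeps all three.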
Table 1 reports the incorrect efficiency classifications obtained with ICA-DEA, PCA-DEA and VR under different input dimensions. According to Table 1, the level of information reduction has an appreciable influence when the three approaches are used to classify inefficiency. Moreover, the ICA-DEA model gives the lowest incorrect-classification rate, and the classification results of the PCA-DEA model are slightly better than those of VR. These results are

consistent with the conclusion of Lee et al. (2004) that the ICA solution extracts the original source signals to a much greater extent than the PCA solution when the latent variables follow non-Gaussian distributions.

5. Empirical application

5.1 Data

Hospital data (from 557 hospitals in 2005) provided by the Office of Statistics in Taiwan's Department of Health are used in this study to illustrate the proposed ICA-DEA approach and to compare its performance with alternative approaches. These hospitals can be categorized into three classes according to their functional complexity: (1) medical centers and regional hospitals (highest level); (2) district hospitals (medium and low level); and (3) primary clinics (basic level). Hospitals of different sizes usually provide different levels of medical treatment and service. According to the government accreditation definition, a medical center has at least 500 beds and a regional hospital has at least 250 beds. To control for differences in size and make the data sample more homogeneous, we selected the 21 hospitals (i.e., 21 DMUs) with more than 500 beds for analysis. Because the results of efficiency measurement depend on the selection of input and output variables, we chose variables by reviewing the existing literature on hospital efficiency (Hu et al., 2009; Puig-Junoy, 2000; Sahin and Ozcan, 2000; Parkin and Hollingsworth, 1997; Hu and Huang, 2004). In our empirical study, five input and three output variables were chosen to calculate the efficiency of each DMU. The five input variables are four labor-related variables (doctors, nurses, paramedical persons, and administrative staffers) and one capital-related variable (beds). A patient's health improvement is the most commonly used output indicator for a hospital, but measuring the level of improved health is very difficult.
One possible solution is to adopt the immediate products of a hospital, such as its service volume, as a proxy for medical outputs. Therefore, in this empirical study, we selected outpatient visits, emergency visits, and operations as the three output variables. Table 2 gives the definitions

and explanations of the input and output variables. Table 3 provides summary statistics of the input and output variables.

Table 1(a). Incorrect efficiency classification under varying input reduction for the y = x_1^{a_1} x_2^{a_2} (x_3 x_4 x_5 x_6 x_7 x_8)^{a_3} e^{-tau} function (incorrect classification, incorrectly defined inefficient, and incorrectly defined efficient, for ICA-DEA*, PCA-DEA* and VR, by percentage of information retained).
* The DEA model is under the constant returns-to-scale (CRS) case.

Table 1(b). The number of samples containing m inputs as a function of the information retained, for ICA-DEA, PCA-DEA and VR.

Table 3 also shows that the variance of each input and output variable is large; the input and output variables with the largest standard deviations are beds and outpatient visits, respectively. As discussed above, correlation among the input and output variables will bias the DEA estimates. The correlation analysis is reported in Table 4, which shows significant positive correlations among all of the variables. The lowest correlation coefficient is between doctors and emergency visits; the highest, 0.97, is between nurses and doctors. In practice, subsets of the inputs or outputs are almost always correlated. High correlations between variables can distort the distribution of the weights, and dropping some variables from the assessment can reduce the efficiency ratings of some DMUs (Nunamaker, 1985) or even lead to significant changes in the efficiencies (Dyson et al., 2001). Therefore, the observed input data need to be converted into separate independent signals by the ICA approach before conducting DEA.

5.2 Efficiency score computations

After the ICs are determined, the efficiency scores are computed with the input-oriented CCR model, proposed by Charnes, Cooper and Rhodes (1978), in the DEA-PRO software. To demonstrate the validity of the proposed model, the performance of the proposed ICA-DEA method is compared with a single DEA model and the PCA-DEA model. The single DEA model simply applies DEA to the input variables to measure hospital efficiency, without using ICA or PCA as a preprocessing tool. The PCA-DEA method first applies PCA to the input variables to generate principal components (PCs) and then conducts the DEA analysis on the generated PCs.
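For readers without DEA software, the input-oriented CCR score can also be computed directly: the standard Charnes-Cooper transformation turns the ratio model of Eq. (6) into a linear program with the normalization v.x_p = 1. A sketch with SciPy follows; the three-DMU toy data are hypothetical, chosen so that DMU 3 uses exactly twice the inputs of DMU 2 and should score 0.5.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, p, eps=1e-6):
    """Input-oriented CCR efficiency of DMU p via the Charnes-Cooper
    linearization of Eq. (6). X: m x n inputs, Y: d x n outputs
    (columns are DMUs). Returns theta in (0, 1]."""
    m, n = X.shape
    d = Y.shape[0]
    c = np.concatenate([-Y[:, p], np.zeros(m)])           # maximize u.y_p
    A_ub = np.hstack([Y.T, -X.T])                         # u.y_j - v.x_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(d), X[:, p]])[None]   # normalization v.x_p = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(eps, None)] * (d + m))         # weights strictly positive
    return -res.fun

# toy example: 3 DMUs, 2 inputs, 1 output; DMU 3 is a scaled copy of DMU 2
X = np.array([[2.0, 4.0, 8.0],
              [3.0, 2.0, 4.0]])
Y = np.array([[1.0, 1.0, 1.0]])
scores = [ccr_efficiency(X, Y, p) for p in range(3)]
```

The first two DMUs lie on the frontier and score 1, while the dominated third DMU scores 0.5, matching the hand calculation.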

Table 2. Definition and explanation of variables

Inputs:
- Beds: the total number of registered beds within the hospital, including acute, chronic, and special beds.
- Doctors: the total number of physicians who are full-time employees, including dentists and Chinese-medicine doctors.
- Nurses: the total number of nurses employed in the hospital (including midwives).
- Administrative persons: the total number of health service providers employed in the hospital, including pharmacists, dietitians, physiotherapists, occupational therapy technologists, and radiological technologists.
- Administrative staffers: the total number of full-time-equivalent personnel, including social workers, researchers, and non-professionals.

Outputs:
- Outpatient visits: the total number of patients seen in outpatient departments within a year.
- Emergency visits: the total number of patients seen in the emergency room within a year.
- Operations: the total number of inpatient and outpatient surgeries within a year.

Table 3. Summary statistics (mean, standard deviation, minimum, and maximum) of the variables Beds (x_1), Doctors (x_2), Nurses (x_3), Administrative persons (x_4), Administrative staffers (x_5), Outpatient visits (y_1), Emergency visits (y_2), and Operations (y_3).

Table 4. Correlation coefficients between the variables x_1-x_5 and y_1-y_3 (all pairwise coefficients are positive and significant at p < 0.05).

In the single DEA model, the overall efficiency scores of the 21 hospitals (DMUs) are summarized in Table 5. The single DEA model produces a high average efficiency score (0.948) and a small standard deviation. It classifies too many DMUs as efficient and cannot distinguish the performance differences between the DMUs well. In the PCA-DEA method, the component loadings and the variances of the components were first computed for the five input variables. The proportion of the total variance explained by the principal components is additive, with each new component contributing less than the preceding one. The components were then rotated to eliminate medium-range loadings and make their interpretation easier (Johnson and Wichern, 2007). The five rotated principal components in Table 6 are interpreted by examining the component loadings and noting their relationship to the original variables. The first component appears to measure capital and partial labor (nurse) investment, with beds and nurses loading positively. In the second component, doctors and administrative staffers are important; the effect of medical expertise is demonstrated by the positive loading of doctors. According to Table 6, we found that PC 1

can explain most of the variation in the data, and PC 2 explains most of the remainder. Because these two main PCs together explain essentially all of the data variation, they suffice to represent the features of the input data and are used as the new input variables for the DEA model. The results of the PCA-DEA model are also summarized in Table 5: its average efficiency score, standard deviation, and number of efficient DMUs are 0.48, 0.195, and 8, respectively. Compared with the single DEA model, the PCA-DEA method has a lower average efficiency score, fewer efficient DMUs, and a larger standard deviation.

In the proposed ICA-DEA model, we first applied the basic ICA approach to estimate a de-mixing matrix W and five independent components (b_1, b_2, ..., b_5). To select the more meaningful ICs, the statistical independence of the ICs is evaluated by computing their kurtosis values. The estimated de-mixing matrix W and the kurtosis value of each IC are summarized in Table 7. Because an IC with a larger kurtosis value can be considered more important (Hyvärinen, 1999), IC 1, IC 2 and IC 3 (i.e., b_1, b_2, b_3) are regarded as the key factors affecting the efficiency measurement results and are used as the three new input variables for the DEA model. Note that an extracted IC may contain negative values, which violates the semi-positivity assumption of the DEA model (all inputs and outputs are non-negative, and at least one input and one output are positive). To solve this problem, we simply subtract the corresponding minimum value, min(IC_i), from each IC_i.

Table 5. Summary of the results of the single DEA, PCA-DEA and ICA-DEA models (average score, standard deviation, maximum and minimum efficiency scores, number of efficient DMUs, total number of DMUs, and percentage of efficient DMUs).
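The shift that restores semi-positivity before the DEA stage is a one-liner per IC. A minimal sketch (the matrix values are illustrative only):

```python
import numpy as np

def shift_nonnegative(B):
    """Shift each IC (row of B) by its minimum so that every DEA input is >= 0,
    i.e. subtract min(IC_i) from IC_i before feeding the ICs into the DEA model."""
    return B - B.min(axis=1, keepdims=True)

B = np.array([[-1.5, 0.2, 3.0],
              [ 2.0, -0.5, 1.0]])
B_shifted = shift_nonnegative(B)   # each row minimum becomes exactly 0
```

After the shift, every row has minimum zero, so the semi-positivity assumption (non-negative inputs, at least one positive) holds.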

Table 6. Varimax-rotated loadings of the five components PC 1-PC 5 on x_1-x_5, together with the proportion of variance and the cumulative variance explained by each component.

Table 7. The kurtosis values and the de-mixing matrix (W) corresponding to the ICs b_1-b_5.

From Eq. (2), each b_i is obtained by multiplying X by the corresponding row w_i of the de-mixing matrix. Thus, the values in the de-mixing matrix can be used to explain the relationship between each selected b_i and the original input variables X:

b_1 = w_11 x_1 + w_12 x_2 + w_13 x_3 + w_14 x_4 + w_15 x_5    (7)
b_2 = w_21 x_1 + w_22 x_2 + w_23 x_3 + w_24 x_4 + w_25 x_5    (8)
b_3 = w_31 x_1 + w_32 x_2 + w_33 x_3 + w_34 x_4 + w_35 x_5    (9)

From Eq. (7), x_3 and x_5 have a relatively large impact on b_1. This implies that b_1 is mainly affected by nurses (x_3) and administrative staffers (x_5). Thus, b_1

can be used to represent the features of nurses and administrative staffers. As seen from Eqs. (8)-(9), b_2 is mainly influenced by beds (x_1), while b_3 contains more information about doctors (x_2) and administrative persons (x_4). Therefore, b_2 and b_3 can be used to represent the information about beds and the features of doctors and administrative persons, respectively. The efficiency measurement results of the proposed ICA-DEA model are also included in Table 5: the average efficiency score, standard deviation, and number of efficient DMUs of the proposed method are 0.16, 0.54 and 4, respectively. Compared with the single DEA and PCA-DEA models, the proposed method has the lowest average efficiency score, the fewest efficient DMUs, and the highest standard deviation. This indicates that the proposed method is better able to distinguish the performances of the DMUs; in other words, the ICA approach improves the discriminatory capability of the DEA model in performance measurement.

5.3 Slack analysis

DEA provides not only efficiency results but also slack analysis, which offers guidelines for deriving the optimal level of input and output resources for each DMU. As a result, each DMU could set its input and output resources at the optimal level, i.e., the original level minus the inefficiency and slack amounts from the DEA results (Luo and Donthu, 2001). Table 8 reports the slack analysis for DMU 1 under the single DEA model and the ICA-DEA approach. The slack entries of the single DEA model are all positive, implying that, compared with the efficient hospitals, the investment in the various inputs is excessive. The slack analysis suggests that many of the expenditures could have been reduced while maintaining the same outputs (outpatient visits, emergency visits, and operations in this case), thus improving efficiency.
For example, to be as efficient as the efficient hospitals, DMU 1 could maintain the same output levels while cutting 91 Beds, 38 Doctors, 3 Nurses, 5 Administrative persons, and 51 Administrative staffers. Although the slack analysis can investigate the utilization of input and output resources to improve efficiency scores, the results of the single DEA method could be either

underestimated or overestimated when the inputs of the DMUs are correlated. To solve this problem, we replaced the single DEA model's inputs with the transformed inputs produced by the ICA approach before running the DEA analysis. The resulting ICA-DEA slacks are also summarized in Table 8. Because the ICA technique is adopted, the slack entries generated by the ICA-DEA model must be re-transformed in order to understand exactly how the input and output resources should be utilized to improve the efficiency scores. In the re-transformation procedure, we first define the slack entries generated by the ICA-DEA model as delta_b_i. Then, based on Eqs. (7)-(9) and the correlation coefficients rho_ij in Table 4, the re-transformation can be formulated as the following optimization problem:

Minimize    e_1 + e_2 + e_3
Subject to  |delta_b_1 - (w_11 dx_1 + w_12 dx_2 + w_13 dx_3 + w_14 dx_4 + w_15 dx_5)| <= e_1
            |delta_b_2 - (w_21 dx_1 + w_22 dx_2 + w_23 dx_3 + w_24 dx_4 + w_25 dx_5)| <= e_2    (10)
            |delta_b_3 - (w_31 dx_1 + w_32 dx_2 + w_33 dx_3 + w_34 dx_4 + w_35 dx_5)| <= e_3
            x_i = rho_ij x_j,  i = 1, 2, ..., 5; j = 1, 2, ..., 5; j != i
            dx_i integer,  i = 1, 2, ..., 5

where x_i and x_j are the original inputs; rho_ij is the correlation coefficient between the input variables x_i and x_j; delta_b_i denotes the slack entries generated by the ICA-DEA model; and dx_i denotes the re-transformed slack entries of the ICA-DEA model. Because the above optimization problem has a simple linear programming format, solutions can be derived with the simplex method, as in the usual linear programming approach. The third column of Table 8 shows the final slack analysis results of the proposed ICA-DEA method for DMU 1. As with the single DEA model, the slack entries are all positive compared with the efficient hospitals, but the ICA-DEA approach suggests smaller slacks for Beds, Doctors, Nurses, and Administrative staffers, and a larger slack for Administrative persons.
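The core of the re-transformation in Eq. (10) can be sketched as a linear program: the absolute deviations are linearized with the epsilon variables, and the integrality requirement is relaxed here for simplicity. The de-mixing rows W_sel and the IC slacks delta_b below are hypothetical numbers, not the paper's estimates.

```python
import numpy as np
from scipy.optimize import linprog

def retransform_slacks(W_sel, delta_b):
    """Recover input-variable slacks dx from IC slacks delta_b by minimizing
    the sum of absolute residuals |delta_b_k - w_k . dx| (integrality relaxed).
    W_sel: k x m selected de-mixing rows; delta_b: length-k IC slacks."""
    k, m = W_sel.shape
    # decision variables: [dx (m), eps (k)]; objective: minimize sum(eps)
    c = np.concatenate([np.zeros(m), np.ones(k)])
    # linearized |delta_b - W dx| <= eps as two one-sided inequalities
    A_ub = np.vstack([np.hstack([W_sel, -np.eye(k)]),
                      np.hstack([-W_sel, -np.eye(k)])])
    b_ub = np.concatenate([delta_b, -delta_b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (m + k))   # slacks are non-negative
    return res.x[:m]

W_sel = np.array([[0.2, 0.1, 0.6, 0.1, 0.5],
                  [0.7, 0.1, 0.1, 0.1, 0.1],
                  [0.1, 0.5, 0.1, 0.5, 0.1]])    # hypothetical de-mixing rows
delta_b = W_sel @ np.array([10.0, 4.0, 2.0, 1.0, 3.0])  # consistent IC slacks
dx = retransform_slacks(W_sel, delta_b)
```

Because the toy delta_b is constructed from a feasible slack vector, the LP drives all residuals to zero; with three equations in five unknowns the recovered dx need not be unique, only consistent.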

Table 8. Slack analysis of the single DEA method and the ICA-DEA method, using DMU 1 as an example (columns: variable, original real value, slack entry by the single DEA model, slack entry by the ICA-DEA model; rows: Beds (x̃_1), Doctors (x̃_2), Nurses (x̃_3), Administrative persons (x̃_4), Administrative staffers (x̃_5), and the DMU 1 efficiency score).

Conclusions

One of the most important managerial issues for any type of institution is measuring the relative efficiency of its decision-making units. DEA is an efficiency comparison method based on linear programming and has been used in a variety of contexts. In this study, a two-stage ICA-DEA approach is proposed to improve the discriminatory capability of DEA results. The proposed method first applies ICA to the input variables to generate ICs representing the independent sources of the input variables. The important ICs are then identified and selected according to their kurtosis values. The selected ICs, regarded as the key factors affecting efficiency measurement, are used as new input variables in the DEA model. The proposed ICA-DEA approach is illustrated with Taiwanese hospital data from 85 hospitals (DMUs) in 2005. Compared to the single DEA model and the integrated PCA-DEA model, the proposed ICA-DEA method has the lowest average efficiency score, the fewest efficient DMUs, and the highest standard deviation. These results provide evidence that the proposed ICA-DEA approach has a superior ability to differentiate the performance of the DMUs and thus to overcome the discrimination shortcoming of DEA.

References

Adler, N., Golany, B., 2001. Evaluation of deregulated airline networks using data envelopment analysis combined with principal component analysis with an application to Western Europe. European Journal of Operational Research 132 (2).

Adler, N., Golany, B., 2002. Including principal component weights to improve discrimination in data envelopment analysis. Journal of the Operational Research Society 53 (9).
Adler, N., Yazhemsky, E., 2010. Improving discrimination in data envelopment analysis: PCA-DEA or variable reduction. European Journal of Operational Research 202 (1).
Ancarani, A., Di Mauro, C., Giammanco, M.D., 2009. The impact of managerial and organizational aspects on hospital wards' efficiency: Evidence from a case study. European Journal of Operational Research 194 (1).
Back, A., Weigend, A., 1997. A first application of independent component analysis to extracting structure from stock returns. International Journal of Neural Systems 8.
Bartlett, M.S., Movellan, J.R., Sejnowski, T.J., 2002. Face recognition by independent component analysis. IEEE Transactions on Neural Networks 13 (6).
Bell, A.J., Sejnowski, T.J., 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7 (6).
Biørn, E., Hagen, T.P., Iversen, T., Magnussen, J., 2003. The effect of activity-based financing on hospital efficiency: A panel data analysis of DEA efficiency scores. Health Care Management Science 6 (4).
Cardoso, J.F., Souloumiac, A., 1993. Blind beamforming for non-Gaussian signals. IEE Proceedings, Part F: Radar and Signal Processing 140 (6).
Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2 (6).
Cheung, Y.-M., Xu, L., 2001. Independent component ordering in ICA time series analysis. Neurocomputing 41.
Cooper, W.W., Seiford, L.M., Zhu, J., 2004. Handbook on Data Envelopment Analysis. Kluwer Academic, Boston, MA.
Cooper, W.W., Seiford, L.M., Tone, K., 2000. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software. Kluwer Academic, Boston, MA.
Cooper, W.W., Timothy, W.R., Deng, H., Wu, J., Zhang, Z., 2008. Are state-owned banks less efficient? A long- vs. short-run data envelopment analysis of Chinese banks. International Journal of Operational Research 3 (5).
Draper, B., Baek, K., Bartlett, M.S., Beveridge, J.R., 2003. Recognizing faces with PCA and ICA. Computer Vision and Image Understanding 91 (1-2).
Dyson, R., Allen, R., Camanho, A.S., Podinovski, V.V., Sarrico, C.S., Shale, E.A., 2001. Pitfalls and protocols in DEA. European Journal of Operational Research 132 (2).

Hu, J.L., Huang, Y.F., 2004. Technical efficiencies in large hospitals: A managerial perspective. International Journal of Management 21.
Hu, Y., Zhang, Z., Liang, W., 2009. Efficiency of primary schools in Beijing, China: An evaluation by data envelopment analysis. International Journal of Educational Management 23 (1).
Hua, H., Li-xin, S., Xian-li, Z., Sheng-xin, C., 2009. Data envelopment analysis-based evaluation of pharmacy efficiencies of military hospitals. Academic Journal of Second Military Medical University 30 (5).
Hyvärinen, A., 1999. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 10 (3).
Hyvärinen, A., Karhunen, J., Oja, E., 2001. Independent Component Analysis. John Wiley & Sons, New York.
Hyvärinen, A., Oja, E., 2000. Independent component analysis: Algorithms and applications. Neural Networks 13 (4-5).
Johnson, R.A., Wichern, D.W., 2007. Applied Multivariate Statistical Analysis. Prentice Hall, New Jersey.
Kao, C., Liu, S.-T., 2009. Stochastic data envelopment analysis in measuring the efficiency of Taiwan commercial banks. European Journal of Operational Research 196 (1).
Kirigia, J.M., Emrouznejad, A., Cassoma, B., Asbu, E.Z., Barry, S., 2008. A performance assessment method for hospitals: The case of municipal hospitals in Angola. Journal of Medical Systems 32 (6).
Lee, J.M., Yoo, C., Lee, I.B., 2004. Statistical process monitoring with independent component analysis. Journal of Process Control 14 (5).
Lee, T.W., 1998. Independent Component Analysis: Theory and Application. Kluwer Academic Publishers, Boston, MA.
Liu, C., Wechsler, H., 1999. Comparative assessment of independent component analysis (ICA) for face recognition. In: Proceedings of the Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA'99), Washington, DC.
Luo, X., Donthu, N., 2001. Benchmarking advertising efficiency. Journal of Advertising Research 41 (6).
Mancebón, M.J., Muñiz, M.A., 2008. Private versus public high schools in Spain: Disentangling managerial and programme efficiencies. Journal of the Operational Research Society 59 (7).
Moghaddam, B., 2002. Principal manifolds and probabilistic subspaces for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (6).

Nunamaker, T.R., 1985. Using data envelopment analysis to measure the efficiency of non-profit organizations: A critical evaluation. Managerial and Decision Economics 6 (1).
Parkin, D., Hollingsworth, B., 1997. Measuring production efficiency of acute hospitals in Scotland, 1991-94: Validity issues in data envelopment analysis. Applied Economics 29.
Puig-Junoy, J., 2000. Partitioning input cost efficiency into its allocative and technical components: An empirical DEA application to hospitals. Socio-Economic Planning Sciences 34 (3).
Ray, S.C., Jeon, Y., 2008. Reputation and efficiency: A non-parametric assessment of America's top-rated MBA programs. European Journal of Operational Research 189 (1).
Sahin, I., Ozcan, Y.A., 2000. Public sector hospital efficiency for provincial markets in Turkey. Journal of Medical Systems 24 (6).
Sahoo, B.K., Tone, K., 2009. Decomposing capacity utilization in data envelopment analysis: An application to banks in India. European Journal of Operational Research 195 (2).
Sánchez, V.D.A., 2002. Frontiers of research in BSS/ICA. Neurocomputing 49, 7-23.
Turk, M., Pentland, A., 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1).
Yuen, P.C., Lai, J.H., 2002. Face representation using independent component analysis. Pattern Recognition 35 (6).
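The first stage of the two-stage procedure summarized in the conclusions can be outlined in code. The sketch below is illustrative, not the authors' implementation: it runs scikit-learn's FastICA on a hypothetical 85 x 5 input matrix, ranks the resulting ICs by absolute kurtosis, and keeps the top components (three, here, as an assumed cut-off) that would then replace the original inputs in the DEA model.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Hypothetical input matrix: 85 DMUs x 5 correlated hospital inputs
# (Beds, Doctors, Nurses, Administrative persons, Administrative staffers).
base = rng.gamma(shape=5.0, scale=40.0, size=(85, 1))
X = base * rng.uniform(0.5, 1.5, size=(85, 5)) + rng.normal(0.0, 5.0, size=(85, 5))

# Stage 1: extract as many ICs as there are inputs.
ica = FastICA(n_components=5, whiten="unit-variance", random_state=0)
S = ica.fit_transform(X)                 # 85 x 5 matrix of independent components

# Rank ICs by absolute excess kurtosis: the most non-Gaussian components
# are taken to represent the independent sources of the input variables.
k = np.abs(kurtosis(S, axis=0))
order = np.argsort(k)[::-1]
selected = S[:, order[:3]]               # keep, say, the top 3 ICs

# Stage 2 (not shown): the selected ICs replace the original inputs in the
# DEA model; a shift/rescale to positive values may be needed beforehand,
# since DEA inputs are conventionally non-negative.
print(selected.shape)
```

The kurtosis ranking is the selection criterion named in the paper; the simulated data, the 3-IC cut-off, and the positivity adjustment noted in the final comment are assumptions of this sketch.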


More information

Data Envelopment Analysis and its aplications

Data Envelopment Analysis and its aplications Data Envelopment Analysis and its aplications VŠB-Technical University of Ostrava Czech Republic 13. Letní škola aplikované informatiky Bedřichov Content Classic Special - for the example The aim to calculate

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Independent Component Analysis. PhD Seminar Jörgen Ungh

Independent Component Analysis. PhD Seminar Jörgen Ungh Independent Component Analysis PhD Seminar Jörgen Ungh Agenda Background a motivater Independence ICA vs. PCA Gaussian data ICA theory Examples Background & motivation The cocktail party problem Bla bla

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

PRINCIPAL COMPONENT ANALYSIS TO RANKING TECHNICAL EFFICIENCIES THROUGH STOCHASTIC FRONTIER ANALYSIS AND DEA

PRINCIPAL COMPONENT ANALYSIS TO RANKING TECHNICAL EFFICIENCIES THROUGH STOCHASTIC FRONTIER ANALYSIS AND DEA PRINCIPAL COMPONENT ANALYSIS TO RANKING TECHNICAL EFFICIENCIES THROUGH STOCHASTIC FRONTIER ANALYSIS AND DEA Sergio SCIPPACERCOLA Associate Professor, Department of Economics, Management, Institutions University

More information

Determination of Economic Optimal Strategy for Increment of the Electricity Supply Industry in Iran by DEA

Determination of Economic Optimal Strategy for Increment of the Electricity Supply Industry in Iran by DEA International Mathematical Forum, 2, 2007, no. 64, 3181-3189 Determination of Economic Optimal Strategy for Increment of the Electricity Supply Industry in Iran by DEA KH. Azizi Department of Management,

More information

Linear Subspace Models

Linear Subspace Models Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,

More information

The Comparison of Stochastic and Deterministic DEA Models

The Comparison of Stochastic and Deterministic DEA Models The International Scientific Conference INPROFORUM 2015, November 5-6, 2015, České Budějovice, 140-145, ISBN 978-80-7394-536-7. The Comparison of Stochastic and Deterministic DEA Models Michal Houda, Jana

More information

The Allocation of Talent and U.S. Economic Growth

The Allocation of Talent and U.S. Economic Growth The Allocation of Talent and U.S. Economic Growth Chang-Tai Hsieh Erik Hurst Chad Jones Pete Klenow October 2012 Large changes in the occupational distribution... White Men in 1960: 94% of Doctors, 96%

More information

Sensitivity and Stability Radius in Data Envelopment Analysis

Sensitivity and Stability Radius in Data Envelopment Analysis Available online at http://ijim.srbiau.ac.ir Int. J. Industrial Mathematics Vol. 1, No. 3 (2009) 227-234 Sensitivity and Stability Radius in Data Envelopment Analysis A. Gholam Abri a, N. Shoja a, M. Fallah

More information

Research on Feature Extraction Method for Handwritten Chinese Character Recognition Based on Kernel Independent Component Analysis

Research on Feature Extraction Method for Handwritten Chinese Character Recognition Based on Kernel Independent Component Analysis Research Journal of Applied Sciences, Engineering and echnology 6(7): 183-187, 013 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 013 Submitted: November 13, 01 Accepted: January 11,

More information

Independent Component Analysis (ICA) Bhaskar D Rao University of California, San Diego

Independent Component Analysis (ICA) Bhaskar D Rao University of California, San Diego Independent Component Analysis (ICA) Bhaskar D Rao University of California, San Diego Email: brao@ucsdedu References 1 Hyvarinen, A, Karhunen, J, & Oja, E (2004) Independent component analysis (Vol 46)

More information

Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE Artificial Intelligence Grad Project Dr. Debasis Mitra

Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE Artificial Intelligence Grad Project Dr. Debasis Mitra Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE 5290 - Artificial Intelligence Grad Project Dr. Debasis Mitra Group 6 Taher Patanwala Zubin Kadva Factor Analysis (FA) 1. Introduction Factor

More information

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

A Modular NMF Matching Algorithm for Radiation Spectra

A Modular NMF Matching Algorithm for Radiation Spectra A Modular NMF Matching Algorithm for Radiation Spectra Melissa L. Koudelka Sensor Exploitation Applications Sandia National Laboratories mlkoude@sandia.gov Daniel J. Dorsey Systems Technologies Sandia

More information

CPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018

CPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018 CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,

More information

Independent component analysis: algorithms and applications

Independent component analysis: algorithms and applications PERGAMON Neural Networks 13 (2000) 411 430 Invited article Independent component analysis: algorithms and applications A. Hyvärinen, E. Oja* Neural Networks Research Centre, Helsinki University of Technology,

More information

Independent Component Analysis

Independent Component Analysis A Short Introduction to Independent Component Analysis Aapo Hyvärinen Helsinki Institute for Information Technology and Depts of Computer Science and Psychology University of Helsinki Problem of blind

More information

Iterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule

Iterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule Iterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule Clayton Aldern (Clayton_Aldern@brown.edu) Tyler Benster (Tyler_Benster@brown.edu) Carl Olsson (Carl_Olsson@brown.edu)

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

Kazuhiro Fukui, University of Tsukuba

Kazuhiro Fukui, University of Tsukuba Subspace Methods Kazuhiro Fukui, University of Tsukuba Synonyms Multiple similarity method Related Concepts Principal component analysis (PCA) Subspace analysis Dimensionality reduction Definition Subspace

More information

Independent Component Analysis

Independent Component Analysis 1 Independent Component Analysis Background paper: http://www-stat.stanford.edu/ hastie/papers/ica.pdf 2 ICA Problem X = AS where X is a random p-vector representing multivariate input measurements. S

More information

Learning features by contrasting natural images with noise

Learning features by contrasting natural images with noise Learning features by contrasting natural images with noise Michael Gutmann 1 and Aapo Hyvärinen 12 1 Dept. of Computer Science and HIIT, University of Helsinki, P.O. Box 68, FIN-00014 University of Helsinki,

More information

Eigenface-based facial recognition

Eigenface-based facial recognition Eigenface-based facial recognition Dimitri PISSARENKO December 1, 2002 1 General This document is based upon Turk and Pentland (1991b), Turk and Pentland (1991a) and Smith (2002). 2 How does it work? The

More information

Two-Layered Face Detection System using Evolutionary Algorithm

Two-Layered Face Detection System using Evolutionary Algorithm Two-Layered Face Detection System using Evolutionary Algorithm Jun-Su Jang Jong-Hwan Kim Dept. of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST),

More information

Feature Extraction with Weighted Samples Based on Independent Component Analysis

Feature Extraction with Weighted Samples Based on Independent Component Analysis Feature Extraction with Weighted Samples Based on Independent Component Analysis Nojun Kwak Samsung Electronics, Suwon P.O. Box 105, Suwon-Si, Gyeonggi-Do, KOREA 442-742, nojunk@ieee.org, WWW home page:

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Principal Component Analysis Barnabás Póczos Contents Motivation PCA algorithms Applications Some of these slides are taken from Karl Booksh Research

More information