Public Sector Management I: Productivity Analysis
Introduction to Efficiency and Productivity Measurement
Note: The first part of this lecture is based on Antonio Estache (World Bank Institute), Introduction to Efficiency Measurement in Infrastructure Regulation, presentation given at the 3rd Berlin Summer School on Infrastructure; and Nicole Adler (Hebrew University of Jerusalem), Productivity Analysis, presentation given at the Infratrain Spring School 2004 at TU Berlin. The material has been adapted by Astrid Cullmann, Andreas Kappeler and Christian von Hirschhausen.
Agenda
1. Introduction
2. Regulation Concepts
3. Total Factor Productivity (TFP)
4. Data Envelopment Analysis (DEA)
5. Stochastic Frontier Analysis (SFA)
6. Malmquist Indices
7. Average Cost Functions
8. Yardstick Regulation
9. Case Studies
Literature
Efficiency Measurement Using Stochastic Frontiers
Introduction: efficiency measures assume that the production function of the fully efficient firms is known. Two approaches to obtaining it:
- DEA: non-parametric, piecewise-linear technology; linear programming
- SFA: parametric function; econometric estimation methods
Stochastic Frontier Analysis (SFA) (Coelli (1998), 85)
Description: An econometric method that estimates a production frontier of the form $\ln(y_i) = x_i\beta + v_i - u_i$, where $\ln(y_i)$ is the logarithm of the (scalar) output and $x_i$ is a (K+1)-row vector whose first element is 1 and whose remaining elements are the logarithms of the K input quantities used by the i-th firm. $\beta$ is a (K+1)-column vector of unknown parameters.
$v_i$ accounts for measurement error and other random factors, such as the effects of weather, strikes or luck on the value of the output variable, together with the combined effects of input variables left unspecified in the production function. The $v_i$ are assumed to be independent and identically distributed (i.i.d.) random variables with mean zero and constant variance, independent of the $u_i$, which are assumed to be i.i.d. exponential or half-normal random variables.
The model is called the stochastic frontier production function because the output values are bounded above by the stochastic (random) variable $\exp(x_i\beta + v_i)$. The random error $v_i$ can be positive or negative, so the stochastic frontier outputs vary about the deterministic part of the frontier model, $\exp(x_i\beta)$.
A cost frontier (short run or long run) or a distance function can be used instead.
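To make the roles of v and u concrete, the following minimal sketch draws artificial data from the model described above. The parameter values, the uniform distribution of the log-inputs and the function name `simulate_frontier` are illustrative assumptions, not part of the lecture material.

```python
import random

def simulate_frontier(n, beta, sigma_v, sigma_u, seed=0):
    """Draw n observations from ln(y_i) = x_i*beta + v_i - u_i with half-normal u_i."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # x_i = (1, ln of each input); uniform log-inputs are an assumption
        x = [1.0] + [rng.uniform(0.0, 3.0) for _ in beta[1:]]
        v = rng.gauss(0.0, sigma_v)        # symmetric noise, either sign
        u = abs(rng.gauss(0.0, sigma_u))   # one-sided inefficiency, u >= 0
        deterministic = sum(b * xi for b, xi in zip(beta, x))
        rows.append({
            "x": x, "v": v, "u": u,
            "ln_frontier": deterministic + v,   # stochastic frontier output
            "ln_y": deterministic + v - u,      # observed output
        })
    return rows

# Hypothetical parameter values for 60 firms with two inputs
data = simulate_frontier(n=60, beta=[1.0, 0.3, 0.6], sigma_v=0.1, sigma_u=0.3)
```

Every simulated firm lies on or below its own stochastic frontier, while the frontier itself scatters around the deterministic part x_i*beta.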
The Stochastic Frontier Production Function
$$\ln(y_i) = x_i\beta + v_i - u_i$$
- $\ln(y_i)$: logarithm of the output
- $x_i$: logarithms of the input quantities
- $\beta$: unknown parameters to be estimated
- Cobb-Douglas functional form
- $v_i$: random error, independent and identically distributed (i.i.d.) $N(0, \sigma_v^2)$
- $u_i$: independent and identically distributed half-normal random variables; they capture the technical inefficiency of the firm
Technical Efficiency of the i-th firm
$$TE_i = \frac{y_i}{\exp(x_i\beta + v_i)} = \frac{\exp(x_i\beta + v_i - u_i)}{\exp(x_i\beta + v_i)} = \exp(-u_i)$$
- The ratio of the observed output of the i-th firm to the potential output defined by the frontier function
- Takes a value between zero and one
- Indicates the magnitude of the output of the i-th firm relative to the output that could be produced by a fully efficient firm using the same input vector
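The Stochastic Frontier Production Function (figure not reproduced)
Stochastic Frontier Analysis (SFA) (Frontier Economics (2003), A-6)
Distribution of errors of SFA compared to other parametric approaches (figure not reproduced)
Parametric Methods (Pollitt (2001), 6)
Figure (not reproduced): cost X plotted against output Y, with observations A, B, C and the points E and F used to measure the efficiency of firm B. The plotted frontiers are:
- $C_{OLS} = \alpha + f_1(Y)$
- $C_{SFA} = f_2(Y)$
- $C_{MOLS} = (\alpha - ca) + f_1(Y) + r_i$
- $C_{COLS} = (\alpha - ca) + f_1(Y)$
Efficiency of firm B under SFA = EF/BF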
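Maximum-Likelihood Estimation
Parameters can be estimated using: 1) the maximum likelihood method; 2) the COLS method.
Maximum likelihood: derive the log-likelihood function in terms of the two variance parameters $\sigma_s^2$ and $\gamma$.
Assumptions: $u_i \sim |N(0, \sigma^2)|$ and $v_i \sim N(0, \sigma_v^2)$, with $\sigma_s^2 = \sigma^2 + \sigma_v^2$ and $\gamma = \sigma^2/\sigma_s^2$.
$$\ln(L) = -\frac{N}{2}\ln\left(\frac{\pi}{2}\right) - \frac{N}{2}\ln(\sigma_s^2) + \sum_{i=1}^{N}\ln[1 - \Phi(z_i)] - \frac{1}{2\sigma_s^2}\sum_{i=1}^{N}(\ln y_i - x_i\beta)^2$$
where $z_i = \frac{\ln y_i - x_i\beta}{\sigma_s}\sqrt{\frac{\gamma}{1-\gamma}}$ and $\Phi$ is the standard normal distribution function.
The ML estimates of $\beta$, $\sigma_s^2$ and $\gamma$ are obtained by finding the maximum of the log-likelihood function. ML estimates are consistent and asymptotically efficient.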
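The log-likelihood for the half-normal model in the (sigma_s^2, gamma) parameterisation can be evaluated in a few lines. This is a minimal sketch that takes the residuals ln(y_i) - x_i*beta as given; the function names are illustrative and the code is not taken from FRONTIER.

```python
import math

def upper_tail(z):
    # 1 - Phi(z) for the standard normal; erfc keeps precision in the far tail
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def half_normal_loglik(residuals, sigma_s2, gamma):
    """Log-likelihood of the half-normal stochastic frontier model.

    residuals: e_i = ln(y_i) - x_i*beta
    sigma_s2:  sigma_s^2 = sigma^2 + sigma_v^2 (must be > 0)
    gamma:     sigma^2 / sigma_s^2 (must satisfy 0 < gamma < 1)
    """
    n = len(residuals)
    sigma_s = math.sqrt(sigma_s2)
    lam = math.sqrt(gamma / (1.0 - gamma))
    ll = -0.5 * n * math.log(math.pi / 2.0) - 0.5 * n * math.log(sigma_s2)
    for e in residuals:
        z = e / sigma_s * lam
        ll += math.log(upper_tail(z)) - e * e / (2.0 * sigma_s2)
    return ll

# Hypothetical residuals and parameter values
value = half_normal_loglik([-0.3, 0.1, -0.05], sigma_s2=0.2, gamma=0.6)
```

Maximising this function over beta, sigma_s^2 and gamma (with the residuals recomputed for each beta) gives the ML estimates.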
The Three-Step Estimation Method
The program follows a three-step procedure to obtain the maximum likelihood estimates of the parameters of a stochastic frontier production function:
1) Ordinary Least Squares (OLS) estimates of the function are obtained.
2) A two-phase grid search of $\gamma$ is conducted, with the $\beta$ parameters set to their OLS values.
3) The values selected in the grid search are used as starting values in an iterative procedure to obtain the final maximum likelihood estimates.
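Steps 1 and 2 can be sketched as follows for a single regressor. This is an illustration under stated assumptions, not FRONTIER's code: only a single coarse grid (gamma = 0.1, ..., 0.9) is searched, the intercept and variance are adjusted with the moment relations Var(v - u) = sigma_s^2 (pi - 2 gamma)/pi and E[u] = sqrt(2 gamma sigma_s^2 / pi) of the half-normal model, and the data are simulated with made-up parameters.

```python
import math
import random

def upper_tail(z):
    # 1 - Phi(z), computed with erfc so the far tail does not round to zero
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def loglik(resid, sigma_s2, gamma):
    # Half-normal frontier log-likelihood in the (sigma_s^2, gamma) form
    n = len(resid)
    scale = math.sqrt(gamma / (1.0 - gamma)) / math.sqrt(sigma_s2)
    ll = -0.5 * n * (math.log(math.pi / 2.0) + math.log(sigma_s2))
    for e in resid:
        ll += math.log(upper_tail(e * scale)) - e * e / (2.0 * sigma_s2)
    return ll

def ols(x, y):
    # Step 1: regression of ln(y) on a constant and one ln(x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

def grid_search_start(x, y):
    b0, b1 = ols(x, y)
    s2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (len(x) - 2)
    best = None
    for k in range(1, 10):                  # Step 2: gamma = 0.1, ..., 0.9
        gamma = k / 10.0
        sigma_s2 = s2 * math.pi / (math.pi - 2.0 * gamma)          # from Var(v - u)
        beta0 = b0 + math.sqrt(2.0 * gamma * sigma_s2 / math.pi)   # shift up by E[u]
        resid = [b - beta0 - b1 * a for a, b in zip(x, y)]
        ll = loglik(resid, sigma_s2, gamma)
        if best is None or ll > best["loglik"]:
            best = {"beta0_ols": b0, "beta0": beta0, "beta1": b1,
                    "sigma_s2": sigma_s2, "gamma": gamma, "loglik": ll}
    return best   # Step 3 would pass these values to an iterative optimiser

# Simulated data with hypothetical parameters
rng = random.Random(1)
x = [rng.uniform(0.0, 3.0) for _ in range(200)]
y = [1.0 + 0.5 * a + rng.gauss(0.0, 0.1) - abs(rng.gauss(0.0, 0.3)) for a in x]
start = grid_search_start(x, y)
```

The slope stays at its OLS value throughout the search; only the intercept, sigma_s^2 and gamma change, which is why the search is cheap.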
Prediction of Firm-level Technical Efficiencies
The technical efficiency of the i-th firm is defined as $TE_i = \exp(-u_i)$.
Only the difference $v_i - u_i$ can be observed, not $u_i$ itself.
The best predictor for $u_i$ is the conditional expectation of $u_i$ given the value of $v_i - u_i$: $E[u_i \mid v_i - u_i]$.
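For the half-normal case the conditional expectation has a closed form. The sketch below uses the standard expressions of Jondrow et al. (1982) for E[u | e] and of Battese and Coelli (1988) for E[exp(-u) | e], with e = v - u; these formulas are not shown on the slide, and the numerical inputs are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def predict_efficiency(e, sigma_u, sigma_v):
    """Return (E[u | e], E[exp(-u) | e]) for half-normal u and e = v - u."""
    sigma_s2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -e * sigma_u ** 2 / sigma_s2               # location of u given e
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma_s2)  # scale of u given e
    r = mu_star / sigma_star
    u_hat = mu_star + sigma_star * norm_pdf(r) / norm_cdf(r)
    te_hat = (norm_cdf(r - sigma_star) / norm_cdf(r)
              * math.exp(-mu_star + 0.5 * sigma_star ** 2))
    return u_hat, te_hat

# Hypothetical residual and standard deviations
u_hat, te_hat = predict_efficiency(e=-0.2, sigma_u=0.3, sigma_v=0.1)
```

A more negative residual yields a larger predicted u and therefore a lower predicted technical efficiency.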
Tests of Hypotheses
Null hypothesis: there are no technical inefficiency effects in the model.
$H_0: \sigma^2 = 0$ versus $H_1: \sigma^2 > 0$
- One-sided generalised likelihood-ratio test
- The test statistic is calculated in FRONTIER
- The critical value for a test of size $\alpha = 0.05$ is 2.71
SFA Generalized Likelihood Ratio Test (Coelli (1998), 88 sq)
For the frontier model, the null hypothesis that there are no technical inefficiency effects can be examined by testing $H_0: \sigma^2 = 0$ versus $H_1: \sigma^2 > 0$. This hypothesis can be tested using a number of test statistics.
One alternative is the generalized likelihood-ratio test of $H_0: \gamma = 0$ versus $H_1: \gamma > 0$, with $\gamma = \sigma^2/\sigma_s^2$ and $\sigma_s^2 = \sigma^2 + \sigma_v^2$.
The test requires the estimation of the model under both the null and the alternative hypothesis. Under the null hypothesis the model is equivalent to the traditional average response function, without the technical inefficiency effect $u_i$. The test statistic is calculated as
$$LR = -2\{\ln[L(H_0)/L(H_1)]\} = -2\{\ln[L(H_0)] - \ln[L(H_1)]\}$$
where $L(H_0)$ and $L(H_1)$ are the values of the likelihood function under the null and the alternative hypothesis, respectively.
SFA Generalized Likelihood Ratio Test (Coelli (1998), 90 sq)
If $H_0$ is true, this test statistic is usually assumed to be asymptotically distributed as a chi-square random variable with degrees of freedom equal to the number of restrictions involved.
The calculation of the critical value for this one-sided generalized likelihood-ratio test is quite simple. The critical value for a test of size $\alpha$ is $\chi^2_1(2\alpha)$, the value that is exceeded by the $\chi^2_1$ random variable with probability $2\alpha$. Thus the one-sided generalized likelihood-ratio test of size $\alpha$ is: reject $H_0: \gamma = 0$ in favour of $H_1: \gamma > 0$ if LR exceeds $\chi^2_1(2\alpha)$.
The critical value for a test of size $\alpha = 0.05$ is therefore 2.71 rather than 3.84.
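The test can be carried out by hand from the two log-likelihood values reported after estimation. A minimal sketch for the single restriction gamma = 0 follows; it uses the fact that the upper 2*alpha point of a chi-square variable with one degree of freedom equals the squared upper alpha point of the standard normal. The log-likelihood values are made up for illustration.

```python
from statistics import NormalDist

def one_sided_lr_test(loglik_h0, loglik_h1, alpha=0.05):
    """One-sided generalized LR test of H0: gamma = 0 against H1: gamma > 0."""
    lr = -2.0 * (loglik_h0 - loglik_h1)
    # chi-square(1) value exceeded with probability 2*alpha
    critical = NormalDist().inv_cdf(1.0 - alpha) ** 2
    return lr, critical, lr > critical

# Hypothetical log-likelihoods: OLS model (H0) and frontier model (H1)
lr, critical, reject = one_sided_lr_test(loglik_h0=-20.0, loglik_h1=-17.0)
```

With alpha = 0.05 the computed critical value rounds to 2.71, the figure quoted above.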
Additional Topics on Stochastic Frontiers
Assumptions so far and their extensions:
1) Technical inefficiency effects with half-normal distribution: truncated normal distribution
2) Cobb-Douglas functional form: translog functions
3) Production function: cost function, distance functions
4) Cross-sectional data: panel data
Translog Functional Form / Truncated Normal Distribution
$$\ln(Q_i) = \beta_0 + \beta_1\ln(K_i) + \beta_2\ln(L_i) + \beta_3\ln(K_i)^2 + \beta_4\ln(L_i)^2 + \beta_5\ln(K_i)\ln(L_i) + (V_i - U_i)$$
The $U_i$ are non-negative random variables which are assumed to account for technical inefficiency in production and are assumed to be i.i.d. as truncations at zero of the $N(\mu, \sigma_u^2)$ distribution.
Interpretation of Coefficients: β0, β0 β0
Panel Data Models
A number of firms are observed over a number of time periods: PANEL DATA
Advantages: a larger number of degrees of freedom; permits the simultaneous investigation of both technical change and technical efficiency change over time.
Panel data version of the stochastic frontier model:
$$\ln(y_{it}) = x_{it}\beta + v_{it} - u_{it}$$
Panel Data Models 1: Time-varying Inefficiency Model
In FRONTIER 4.1: error components model
3. The Battese and Coelli (1992) specification (Model 1)
Proposes a stochastic frontier production function for panel data with firm effects which are assumed to be distributed as truncated normal random variables and are permitted to vary systematically with time.
$$Y_{it} = x_{it}\beta + (V_{it} - U_{it})$$
Panel Data Models 2: Modelling Inefficiency Effects
- Investigates the determinants of technical inefficiencies
- Inefficiency effects are defined as explicit functions of some firm-specific factors; all parameters are estimated in a single-stage ML procedure
- Model 2: TE effects model
- It is possible to include a time trend
The Battese and Coelli (1995) specification (Model 2)
Conclusions
Problems of stochastic frontiers:
1) The selection of the distributional form for the inefficiency effects may be arbitrary.
2) The production technology must be specified by a particular functional form.
3) The stochastic frontier approach is only well developed for single-output technologies.
Next step: distance functions or indices
SFA Analysis: FRONTIER Version 4.1, Prof. Coelli
Files Needed
1) The executable file FRONT41.EXE
2) A data file
3) An instruction file
4) An output file
Data and instruction files: created by the user prior to execution
Output file: created by FRONTIER during execution
The Program
- The program requires that the data be listed in text files, by observation: firm number, period number, output, inputs, (z-variables)
- Data must be transformed if a functional form other than a linear function is required: Cobb-Douglas and translog functional forms are most often used in stochastic frontier analysis
- The program can receive instructions either from a file or from a terminal. The user is asked whether instructions come from a file or a terminal
The Three-Step Estimation Method
The program follows a three-step procedure to obtain the maximum likelihood estimates of the parameters of a stochastic frontier production function:
1) Ordinary Least Squares (OLS) estimates of the function are obtained.
2) A two-phase grid search of $\gamma$ is conducted, with the $\beta$ parameters set to their OLS values.
3) The values selected in the grid search are used as starting values in an iterative procedure to obtain the final maximum likelihood estimates.
Program Output
Presented in the output file:
1) the OLS estimates,
2) the estimates after the grid search,
3) the final maximum likelihood estimates,
together with approximate standard errors (estimate of the covariance matrix).
Estimates of individual technical or cost efficiencies are calculated.
Mean efficiencies: simply the arithmetic average of the individual efficiencies.
A Few Short Examples
1. A Cobb-Douglas production frontier using cross-sectional data and assuming a half-normal distribution
2. A translog production frontier using cross-sectional data and assuming a truncated normal distribution
3. The Battese and Coelli specification, Model 1
4. The Battese and Coelli specification, Model 2
Examples 3 and 4 are for panel data.
1. Cobb-Douglas Production Frontier / Half-normal Distribution
$$\ln(Q_i) = \beta_0 + \beta_1\ln(K_i) + \beta_2\ln(L_i) + (V_i - U_i)$$
- $K_i$, $L_i$: capital, labor
- $Q_i$: output
- $V_i$: normally distributed
- $U_i$: half-normally distributed
Create the data file: inputs and output have to be logged!
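Preparing the logged data rows can be sketched as below. The column order (firm number, period number, output, inputs) follows the program description; the raw numbers, the six-decimal format and the function name `frontier_rows` are assumptions for illustration.

```python
import math

def frontier_rows(records):
    """Format (firm, period, output, capital, labor) records, given in levels,
    as text rows holding the logged output and inputs."""
    lines = []
    for firm, period, q, k, l in records:
        lines.append("%d %d %.6f %.6f %.6f"
                     % (firm, period, math.log(q), math.log(k), math.log(l)))
    return lines

# Made-up observations in levels for two firms in one period
raw = [(1, 1, 12.0, 9.0, 35.0), (2, 1, 20.0, 4.5, 77.0)]
rows = frontier_rows(raw)
```

Writing `"\n".join(rows)` to a text file would give a data file in the listed-by-observation layout.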
Data File: sfa.dta
1.000000 1.000000 2.547725 2.24240 3.55969
2.000000 1.000000 3.89859 .53536 4.347655
3.000000 1.000000 3.037594 .628260 4.497574
4.000000 1.000000 2.5820 .596353 3.575095
5.000000 1.000000 2.486406 2.65275 3.327838
...
58.00000 1.000000 3.06426 2.23328 4.467332
59.00000 1.000000 3.30049 2.058473 4.099995
60.00000 1.000000 2.646529 .72650 3.78932
Instruction File: sfa.ins
1        1=ERROR COMPONENTS MODEL, 2=TE EFFECTS MODEL
eg.dta   DATA FILE NAME
eg.out   OUTPUT FILE NAME
1        1=PRODUCTION FUNCTION, 2=COST FUNCTION
y        LOGGED DEPENDENT VARIABLE (Y/N)
60       NUMBER OF CROSS-SECTIONS
1        NUMBER OF TIME PERIODS
60       NUMBER OF OBSERVATIONS IN TOTAL
2        NUMBER OF REGRESSOR VARIABLES (Xs)
n        MU (Y/N) [OR DELTA0 (Y/N) IF USING TE EFFECTS MODEL]
n        ETA (Y/N) [OR NUMBER OF TE EFFECTS REGRESSORS (Zs)]
n        STARTING VALUES (Y/N)
         IF YES THEN  BETA0
                      BETA1 TO BETAK
                      SIGMA SQUARED
                      GAMMA
                      MU   [OR DELTA0
                      ETA   DELTA1 TO DELTAP]
         NOTE: IF YOU ARE SUPPLYING STARTING VALUES AND YOU HAVE RESTRICTED MU [OR DELTA0] TO BE ZERO THEN YOU SHOULD NOT SUPPLY A STARTING VALUE FOR THIS PARAMETER.
Exercise 1: Empirical Application
25 German electricity distribution companies
$$\ln(Q_i) = \beta_0 + \beta_1\ln(K_i) + \beta_2\ln(L_i) + (V_i - U_i)$$
1. A Cobb-Douglas production frontier using cross-sectional data and assuming a half-normal distribution
2. A Translog Production Frontier / Truncated Normal Distribution
$$\ln(Q_i) = \beta_0 + \beta_1\ln(K_i) + \beta_2\ln(L_i) + \beta_3\ln(K_i)^2 + \beta_4\ln(L_i)^2 + \beta_5\ln(K_i)\ln(L_i) + (V_i - U_i)$$
The squared and interaction terms have to be generated!
The $U_i$ are non-negative random variables which are assumed to account for technical inefficiency in production and are assumed to be i.i.d. as truncations at zero of the $N(\mu, \sigma_u^2)$ distribution.
$\mu$ is calculated by FRONTIER 4.1.
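Generating the additional translog regressors from the two logged inputs is a one-line transformation per observation. A minimal sketch, with an illustrative function name and made-up input values:

```python
def translog_regressors(ln_k, ln_l):
    """Return the five translog regressors built from ln(K) and ln(L):
    ln K, ln L, (ln K)^2, (ln L)^2 and ln K * ln L."""
    return [ln_k, ln_l, ln_k ** 2, ln_l ** 2, ln_k * ln_l]

# Hypothetical logged inputs of one firm
terms = translog_regressors(2.0, 3.0)
```

The list has five entries, which matches a model with five regressor variables next to the intercept.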
Data File: sfa2.dta
1.000000 1.000000 2.547725 2.24240 3.55969 5.028404 2.66769 7.988
2.000000 1.000000 3.89859 .53536 4.347655 2.357333 8.902 6.67529
3.000000 1.000000 3.037594 .628260 4.497574 2.65230 20.2287 7.32328
...
58.00000 1.000000 3.06426 2.23328 4.467332 4.986860 9.95706 9.97624
59.00000 1.000000 3.30049 2.058473 4.099995 4.23732 6.80996 8.439730
60.00000 1.000000 2.646529 .72650 3.78932 2.980835 4.35752 6.54973
Instruction File: sfa2.ins
1        1=ERROR COMPONENTS MODEL, 2=TE EFFECTS MODEL
eg2.dta  DATA FILE NAME
eg2.out  OUTPUT FILE NAME
1        1=PRODUCTION FUNCTION, 2=COST FUNCTION
y        LOGGED DEPENDENT VARIABLE (Y/N)
60       NUMBER OF CROSS-SECTIONS
1        NUMBER OF TIME PERIODS
60       NUMBER OF OBSERVATIONS IN TOTAL
5        NUMBER OF REGRESSOR VARIABLES (Xs)
y        MU (Y/N) [OR DELTA0 (Y/N) IF USING TE EFFECTS MODEL]
n        ETA (Y/N) [OR NUMBER OF TE EFFECTS REGRESSORS (Zs)]
n        STARTING VALUES (Y/N)
         IF YES THEN  BETA0
                      BETA1 TO BETAK
                      SIGMA SQUARED
                      GAMMA
                      MU   [OR DELTA0
                      ETA   DELTA1 TO DELTAP]
         NOTE: IF YOU ARE SUPPLYING STARTING VALUES AND YOU HAVE RESTRICTED MU [OR DELTA0] TO BE ZERO THEN YOU SHOULD NOT SUPPLY A STARTING VALUE FOR THIS PARAMETER.
Exercise 2: Empirical Application
25 German electricity distribution companies
$$\ln(Q_i) = \beta_0 + \beta_1\ln(K_i) + \beta_2\ln(L_i) + \beta_3\ln(K_i)^2 + \beta_4\ln(L_i)^2 + \beta_5\ln(K_i)\ln(L_i) + (V_i - U_i)$$
2. A translog production frontier using cross-sectional data and assuming a truncated normal distribution
3. The Battese and Coelli (1992) Specification (Model 1)
Proposes a stochastic frontier production function for panel data with firm effects which are assumed to be distributed as truncated normal random variables and are permitted to vary systematically with time.
$$Y_{it} = x_{it}\beta + (V_{it} - U_{it})$$
4. The Battese and Coelli (1995) Specification (Model 2)
A number of empirical studies have estimated stochastic frontiers and predicted firm-level efficiencies using these estimated functions, and then regressed the predicted efficiencies upon firm-specific variables to identify reasons for differences. This two-stage approach is inconsistent in its assumptions regarding the independence of the inefficiency effects in the two estimation stages.
Battese and Coelli propose stochastic frontier models in which the inefficiency effects are expressed as an explicit function of a vector of firm-specific variables and a random error.
Advanced Methods on Parametric Efficiency Measurement
1) Distance Function
2) Summary Panel Data Models
Agenda
1. Introduction
2. Distance Function
3. Panel Data Models for Stochastic Frontier Analysis
Introduction
- Aggregation within the production functions
- Is a multi-output production process within SFA only achievable by aggregating?
- Modelling inefficiency over time within panel data is an important issue
- How should environmental variables which influence the efficiency of the firms be treated within Stochastic Frontier Analysis?
Multi-output Production and Distance Functions
- Single-output production function: Cobb-Douglas and translog
- To accommodate multiple-output situations, specify a multi-output production function: the distance function
- A distance function, d = h(x, y), measures the efficiency wedge for a firm in a multi-input, multi-output production context. It is thus a generalization of the concept of the production frontier.
Multi-input Multi-output Production Technology
The technology set S is defined as
$$S = \{(x, y) : x \text{ can produce } y\}$$
- x = non-negative K×1 input vector
- y = non-negative M×1 output vector
- S is the set of all input-output vectors (x, y) such that x can produce y
Distance functions allow one to describe a multi-input, multi-output production technology without the need to specify a behavioral objective (cost minimization, profit maximization).
One may specify both an input distance function and an output distance function.
Output Distance Function
Maximal proportional expansion of the output vector, given an input vector.
The production technology defined by the set S can equivalently be defined using output sets P(x):
$$P(x) = \{y : x \text{ can produce } y\}$$
Properties: inaction is possible; non-zero output levels cannot be produced from zero levels of inputs; strong disposability of outputs; strong disposability of inputs; P(x) is closed, bounded and convex.
The output distance function is defined on the output set P(x):
$$d_o(x, y) = \min\{\delta : (y/\delta) \in P(x)\}$$
- The distance is equal to unity if y belongs to the frontier of the production possibility set: $d_o(x, y) = 1$
- If y belongs to the production possibility set of x, then $d_o(x, y) \le 1$
Output Distance Function and Production Possibility Set
Figure (not reproduced): output axes $y_1$ and $y_2$, points A, B and C, and the production possibility curve PPC-P(x).
Input Distance Function
Characterizes the production technology by looking at a minimal proportional contraction of the input vector, given an output vector.
Defined on the input set L(y):
$$d_i(x, y) = \max\{\rho : (x/\rho) \in L(y)\}$$
L(y) represents the set of all input vectors x which can produce the output vector y:
$$L(y) = \{x : x \text{ can produce } y\}$$
Distance Function Specification (Coelli (1998), p. 66)
Figure (not reproduced): input axes $x_1$ and $x_2$, points A, B and C, the input set and its isoquant.
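Both distance functions can be illustrated with a deliberately simple technology. The sketch assumes a hypothetical single-output, constant-returns frontier f(x) = sqrt(x1 * x2), so that x can produce y whenever y <= f(x); this technology and the numbers are not from the lecture.

```python
def frontier_output(x1, x2):
    # Hypothetical constant-returns technology: maximum output from (x1, x2)
    return (x1 ** 0.5) * (x2 ** 0.5)

def output_distance(x1, x2, y):
    # min{delta : y/delta can be produced with x}  =>  delta = y / f(x)
    return y / frontier_output(x1, x2)

def input_distance(x1, x2, y):
    # max{rho : x/rho can still produce y}; f is homogeneous of degree one,
    # so f(x/rho) = f(x)/rho and therefore rho = f(x) / y
    return frontier_output(x1, x2) / y

d_o = output_distance(4.0, 9.0, 3.0)
d_i = input_distance(4.0, 9.0, 3.0)
```

For this firm the output could be doubled with the given inputs (d_o = 0.5), or the inputs halved for the given output (d_i = 2.0); the two measures are reciprocals only because the assumed technology has constant returns to scale.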
Estimate Distance Functions Using Econometric Methods
Specify a translog functional form and estimate the unknown parameters of the distance function.
Input distance function in log form:
$$\ln d_i = g(x, y)$$
Impose homogeneity of degree one in inputs:
$$\ln(d_i / x_K) = g(x/x_K, y)$$
We obtain
$$-\ln(x_K) = g(x/x_K, y) - \ln(d_i)$$
This is the function to estimate, by ML or COLS, with FRONTIER 4.1.
Input Distance Function Specification (Coelli (2002), p. 2 sq.)
The original translog form of an input distance function with M outputs and K inputs, and $D_I$ as the distance function value, is given by
$$\ln(D_I) = \alpha_0 + \sum_{m=1}^{M}\gamma_m \ln y_m + \frac{1}{2}\sum_{m=1}^{M}\sum_{n=1}^{M}\gamma_{mn}\ln y_m \ln y_n + \sum_{k=1}^{K}\beta_k \ln x_k + \frac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\beta_{kl}\ln x_k \ln x_l + \sum_{k=1}^{K}\sum_{m=1}^{M}\delta_{km}\ln x_k \ln y_m$$
The restrictions required for homogeneity of degree +1 in inputs are
$$\sum_{k=1}^{K}\beta_k = 1, \quad \sum_{l=1}^{K}\beta_{kl} = 0 \quad \text{and} \quad \sum_{k=1}^{K}\delta_{km} = 0$$
and those for symmetry are $\gamma_{mn} = \gamma_{nm}$ and $\beta_{kl} = \beta_{lk}$.
The level of inefficiency can be estimated from a stochastic frontier production function of the form y = f(x) + v - u, where v is the error term (assumed to be $N(0, \sigma^2)$) and u is the one-sided inefficiency term. The level of efficiency is estimated by exp(-u). Consequently, $\ln d_{0i} = -u_i$.
Input Distance Function Specification (Coelli (2002), p. 2 sq.)
Imposing the homogeneity restrictions (by dividing the whole equation by an arbitrarily chosen input, here $x_K$) results in
$$-\ln x_K = \alpha_0 + \sum_{m=1}^{M}\gamma_m \ln y_m + \frac{1}{2}\sum_{m=1}^{M}\sum_{n=1}^{M}\gamma_{mn}\ln y_m \ln y_n + \sum_{k=1}^{K-1}\beta_k \ln\frac{x_k}{x_K} + \frac{1}{2}\sum_{k=1}^{K-1}\sum_{l=1}^{K-1}\beta_{kl}\ln\frac{x_k}{x_K}\ln\frac{x_l}{x_K} + \sum_{k=1}^{K-1}\sum_{m=1}^{M}\delta_{km}\ln\frac{x_k}{x_K}\ln y_m - \ln(D_I)$$
where $\ln D_I$ can be interpreted as the inefficiency term ($u_i$). Given the stochastic error ($v_i$), this model is formulated in the common SFA form and can be estimated with conventional SFA software.
For estimation purposes, the negative sign on the dependent variable can be ignored. This results in the signs of the estimated coefficients being reversed.
Input Distance Function Estimation (based on Coelli (2002), p. 3 and Bjorndal (2002), p. 8)
For I firms (i = 1, 2, ..., I), this econometric specification, in its normalized form and with $-\ln D_{Ii}$ replaced by the composed error $v_i - u_i$, is expressed by:
$$-\ln x_{Ki} = \alpha_0 + \sum_{m=1}^{M}\gamma_m \ln y_{mi} + \frac{1}{2}\sum_{m=1}^{M}\sum_{n=1}^{M}\gamma_{mn}\ln y_{mi}\ln y_{ni} + \sum_{k=1}^{K-1}\beta_k \ln\frac{x_{ki}}{x_{Ki}} + \frac{1}{2}\sum_{k=1}^{K-1}\sum_{l=1}^{K-1}\beta_{kl}\ln\frac{x_{ki}}{x_{Ki}}\ln\frac{x_{li}}{x_{Ki}} + \sum_{k=1}^{K-1}\sum_{m=1}^{M}\delta_{km}\ln\frac{x_{ki}}{x_{Ki}}\ln y_{mi} + v_i - u_i$$
For estimation the sign of the explained variable is not important. If one uses $\ln x_{Ki}$ rather than $-\ln x_{Ki}$ as the dependent variable, the signs of the estimated coefficients are reversed. However, this is more consistent with the expected signs of conventional production functions (Coelli and Perelman 1996). Further, it provides a convenient means of qualitatively assessing the model.
As for the error components model in SFA, a distribution for $u_i$ has to be assumed. Again, a normal distribution truncated at zero, $u_i \sim |N(\mu, \sigma^2)|$, and a half-normal distribution, $u_i \sim |N(0, \sigma^2)|$, are most common.
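Before the normalized distance function can be passed to SFA software, the dependent variable and the regressors have to be constructed for each firm. The following minimal sketch does this for one observation. The function name and the numbers are illustrative, and the factor 1/2 is attached to the squared terms only, so that under symmetry each cross product appears once with coefficient gamma_mn or beta_kl.

```python
import math

def normalised_translog_terms(x, y):
    """x: K input quantities, y: M output quantities (all positive).
    Returns (-ln x_K, regressors) with inputs normalised by the last input."""
    ln_y = [math.log(v) for v in y]
    ln_r = [math.log(v / x[-1]) for v in x[:-1]]   # ln(x_k / x_K), k < K
    terms = list(ln_y)                             # gamma_m terms
    for m in range(len(ln_y)):                     # gamma_mn terms, m <= n
        for n in range(m, len(ln_y)):
            w = 0.5 if m == n else 1.0
            terms.append(w * ln_y[m] * ln_y[n])
    terms += ln_r                                  # beta_k terms
    for k in range(len(ln_r)):                     # beta_kl terms, k <= l
        for l in range(k, len(ln_r)):
            w = 0.5 if k == l else 1.0
            terms.append(w * ln_r[k] * ln_r[l])
    for r in ln_r:                                 # delta_km terms
        for v in ln_y:
            terms.append(r * v)
    return -math.log(x[-1]), terms

# Hypothetical firm with K = 3 inputs and M = 2 outputs
dep, regs = normalised_translog_terms(x=[2.0, 4.0, 8.0], y=[10.0, 20.0])
```

With K = 3 and M = 2 this yields 14 regressors next to the intercept; the homogeneity restrictions are satisfied by construction because only input ratios enter.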
Panel Data Models
- A number of firms are observed over a number of time periods: PANEL DATA
- Advantages: a larger number of degrees of freedom; permits the simultaneous investigation of both technical change and technical efficiency change over time
Panel data version of the stochastic frontier model:
$$\ln(y_{it}) = x_{it}\beta + v_{it} - u_{it}$$
Panel data allow the unobserved factors affecting the dependent variable to be separated into those that are constant and those that vary over time. Estimation approaches:
1) first differencing
2) fixed effects estimation
3) random effects estimation
Fixed effects or random effects?
Advanced Panel Data Models
2) Fixed effects estimator:
- Uses a transformation to remove the unobserved effect prior to estimation
- Any time-constant explanatory variables are removed along with the unobserved effect
- Within estimator
3) Random effects estimator:
- Attractive when we think that the unobserved effect is uncorrelated with all explanatory variables
- Condition: with good controls in our equation, we believe that any leftover neglected heterogeneity only induces serial correlation in the composite error term
- Estimation of the random effects model by Generalized Least Squares
- Between estimator
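The transformation behind the within estimator subtracts each firm's time average from its observations. A minimal sketch with made-up data for two firms (the function name is illustrative):

```python
def within_transform(panel):
    """panel: dict mapping a firm id to its list of observations over time.
    Returns the same structure with each firm's time mean removed."""
    demeaned = {}
    for firm, values in panel.items():
        mean = sum(values) / len(values)
        demeaned[firm] = [v - mean for v in values]
    return demeaned

# Firm B has a time-constant variable, which the transformation wipes out
result = within_transform({"A": [1.0, 2.0, 3.0], "B": [10.0, 10.0, 10.0]})
```

Applying the same transformation to the dependent variable and to every regressor removes the firm-specific effect, and with it any regressor that does not vary over time.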
Overview of Panel Data in Stochastic Frontier Analysis
- Pitt and Lee (1981): the first to use panel data models within SFA. They considered the panel data random effect as inefficiency; half-normal distribution; time invariance
- Schmidt and Sickles (1984): applied the fixed effects model; overestimation
Extension: including time-variant inefficiency, which means that the technical efficiency level can change systematically over time:
$$u_{it} = \delta(t)\,u_i$$
Time invariance issue: different formulations of $\delta(t)$
Time Invariance Issue
Specify a function that determines how technical inefficiency varies over time:
1) Lee and Schmidt (1993): $\delta(t) = \sum_t \delta_t d_t$, where the $d_t$ are time dummies
2) Kumbhakar (1990): $\delta(t) = [1 + \exp(\delta_1 t + \delta_2 t^2)]^{-1}$
3) Battese and Coelli (1992, 1995): $\delta(t) = \exp[-\eta(t - T)]$. Implemented in FRONTIER; involves only one unknown parameter and is less flexible than the others.
Maximum likelihood estimation is described by Pitt and Lee (1981).
Shortcoming: in all these models the variation of efficiency with time is a deterministic function that is common to all firms.
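The time patterns implied by two of these specifications can be computed directly. The sketch below evaluates the Battese and Coelli (1992) and Kumbhakar (1990) functions; the values chosen for eta, delta_1, delta_2, T and u_i are hypothetical.

```python
import math

def delta_bc92(t, T, eta):
    # Battese and Coelli (1992): equals 1 in the last period t = T
    return math.exp(-eta * (t - T))

def delta_kumbhakar(t, d1, d2):
    # Kumbhakar (1990): bounded between 0 and 1
    return 1.0 / (1.0 + math.exp(d1 * t + d2 * t * t))

# Inefficiency path u_it = delta(t) * u_i of one firm over T = 5 periods
T, eta, u_i = 5, 0.1, 0.4
u_path = [delta_bc92(t, T, eta) * u_i for t in range(1, T + 1)]
```

With eta > 0 the inefficiency of every firm declines towards u_i in the final period; with eta < 0 it increases, and eta = 0 gives the time-invariant model.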
Battese and Coelli 1992
- In FRONTIER 4.1: error components model
- Proposes a stochastic frontier production function for panel data with random effects:
$$Y_{it} = x_{it}\beta + (V_{it} - U_{it}), \quad \delta(t) = \exp[-\eta(t - T)]$$
- Firm effects are assumed to be distributed as truncated normal random variables and are permitted to vary systematically with time
- Maximum likelihood estimation is described by Pitt and Lee (1981)
Battese and Coelli 1995
- In FRONTIER 4.1: technical efficiency effects model
- Investigates additionally the determinants of technical inefficiencies
- Inefficiency effects are defined as explicit functions of some firm-specific factors; all parameters are estimated in a single-stage ML procedure
$$Y_{it} = x_{it}\beta + (V_{it} - U_{it}), \quad \delta(t) = \exp[-\eta(t - T)]$$
- Observable environmental variables $z_i$ are allowed to influence the stochastic component of the production frontier directly
- The inefficiency effects have distributions that vary with z (they are no longer identically distributed):
$$u_i \sim N^{+}(\gamma' z_i, \sigma_u^2)$$
- Generalization of the log-likelihood function derived by Battese and Coelli (1993, 1995)