Using Genetic Algorithms for Maximizing Technical Efficiency in Data Envelopment Analysis

Juan Aparicio (1), Domingo Giménez (2), Martín González (1), José J. López-Espín (1), Jesús T. Pastor (1)
(1) Miguel Hernández University, (2) University of Murcia, Spain
ICCS, Reykjavík, June 3, 2015

Outline
1. Data Envelopment Analysis
2. Valid Solutions
3. Genetic algorithm
4. Hybrid metaheuristics
5. Conclusions and future works

DEA (Data Envelopment Analysis): a non-parametric technique for estimating the level of efficiency of a set of entities, called DMUs (Decision Making Units), all of them operating in the same technological environment.
Each DMU $j$ consumes $m$ inputs, denoted as $(x_{1j}, \ldots, x_{mj})$, to produce $s$ outputs, denoted as $(y_{1j}, \ldots, y_{sj})$.
DEA also provides benchmarking information that shows how inefficiency can be removed.
Objective: to estimate the production frontier and the technical efficiency of each DMU (the distance from each interior DMU to the boundary of the technology).

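Mathematical linear programming model (Aparicio et al., 2007)

\[
\begin{aligned}
\max \quad & \beta_k - \frac{1}{m} \sum_{i=1}^{m} \frac{t^-_{ik}}{x_{ik}} \\
\text{s.t.} \quad
& \beta_k + \frac{1}{s} \sum_{r=1}^{s} \frac{t^+_{rk}}{y_{rk}} = 1 && && \text{(c.1)} \\
& -\beta_k x_{ik} + \sum_{j=1}^{n} \alpha_{jk} x_{ij} + t^-_{ik} = 0 && \forall i && \text{(c.2)} \\
& -\beta_k y_{rk} + \sum_{j=1}^{n} \alpha_{jk} y_{rj} - t^+_{rk} = 0 && \forall r && \text{(c.3)} \\
& -\sum_{i=1}^{m} \nu_{ik} x_{ij} + \sum_{r=1}^{s} \mu_{rk} y_{rj} + d_{jk} = 0 && \forall j && \text{(c.4)} \\
& \nu_{ik} \geq 1 && \forall i && \text{(c.5)} \\
& \mu_{rk} \geq 1 && \forall r && \text{(c.6)} \\
& d_{jk} \leq M b_{jk} && \forall j && \text{(c.7)} \\
& \alpha_{jk} \leq M (1 - b_{jk}) && \forall j && \text{(c.8)} \\
& b_{jk} \in \{0, 1\} && && \text{(c.9)} \\
& \beta_k \geq 0 && && \text{(c.10)} \\
& t^-_{ik} \geq 0 && \forall i && \text{(c.11)} \\
& t^+_{rk} \geq 0 && \forall r && \text{(c.12)} \\
& d_{jk} \geq 0 && \forall j && \text{(c.13)} \\
& \alpha_{jk} \geq 0 && \forall j && \text{(c.14)}
\end{aligned}
\]

It must be solved $n$ times, once for each DMU.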
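As an illustration of what a "valid solution" means for this model, the following minimal sketch checks a candidate for DMU k against part of the constraints. It assumes the signed form of the equalities (c.2: -β x_ik + Σ α_jk x_ij + t^-_ik = 0; c.3: -β y_rk + Σ α_jk y_rj - t^+_rk = 0) and assumes M is large enough that c.7 to c.9 reduce to "b_jk = 0 forces d_jk = 0, b_jk = 1 forces α_jk = 0". The function names, the tolerance and the data layout (`x[i][j]` is input i of DMU j) are hypothetical; c.4 to c.6 are left out for brevity.

```python
def equality_residuals(x, y, k, beta, alpha, t_minus, t_plus):
    """Residuals of c.1, c.2 and c.3 for DMU k (all zero when satisfied)."""
    m, s, n = len(x), len(y), len(alpha)
    c1 = beta + sum(t_plus[r] / y[r][k] for r in range(s)) / s - 1.0
    c2 = [-beta * x[i][k] + sum(alpha[j] * x[i][j] for j in range(n)) + t_minus[i]
          for i in range(m)]
    c3 = [-beta * y[r][k] + sum(alpha[j] * y[r][j] for j in range(n)) - t_plus[r]
          for r in range(s)]
    return c1, c2, c3


def is_valid(x, y, k, beta, alpha, t_minus, t_plus, b, d, tol=1e-9):
    """Check c.1-c.3 and c.7-c.14 (c.4-c.6 are not checked in this sketch)."""
    c1, c2, c3 = equality_residuals(x, y, k, beta, alpha, t_minus, t_plus)
    equalities_ok = all(abs(v) <= tol for v in [c1, *c2, *c3])
    # c.10-c.14: non-negativity of beta, t-, t+, d and alpha.
    signs_ok = beta >= -tol and all(
        v >= -tol for v in [*t_minus, *t_plus, *d, *alpha])
    # c.7-c.9 with a large M: b = 0 switches d off, b = 1 switches alpha off.
    switch_ok = all(
        bj in (0, 1) and (dj <= tol if bj == 0 else aj <= tol)
        for bj, dj, aj in zip(b, d, alpha))
    return equalities_ok and signs_ok and switch_ok


# Toy data: one input, one output, two DMUs; DMU 1 is evaluated against DMU 0.
x = [[2.0, 4.0]]
y = [[2.0, 2.0]]
ok = is_valid(x, y, k=1, beta=1.0, alpha=[1.0, 0.0],
              t_minus=[2.0], t_plus=[0.0], b=[0, 1], d=[0.0, 1.0])
```

In the toy data, DMU 1 uses twice the input of DMU 0 for the same output, so projecting it onto DMU 0 leaves an input slack of 2.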

Approaches to the problem
Problem: a combinatorial NP-hard problem, for which the methods used so far are unsatisfactory. Exact solutions are obtained only for small problem sizes.
Possible solution: metaheuristic algorithms.
The main difficulty in applying metaheuristics is obtaining solutions that satisfy all the constraints. In ICCS 2014, 9 of the 14 constraints were considered. Now all the constraints are considered and a higher percentage of valid solutions is generated, with a Genetic Algorithm.

Representation of solutions
A solution is represented by a vector of real and binary values satisfying the 14 constraints.
Binary part: $b_{0k}, \ldots, b_{jk}$
Real part: $\beta_k$, $\alpha_{0k}, \ldots, \alpha_{jk}$, $t^-_{0k}, \ldots, t^-_{ik}$, $t^+_{0k}, \ldots, t^+_{rk}$
Fitness: value returned by the objective function, $\beta_k - \frac{1}{m} \sum_{i=1}^{m} \frac{t^-_{ik}}{x_{ik}}$.
Heuristics are used to generate valid solutions.
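The representation can be sketched as a small record holding the binary and real parts, together with the fitness function. This is a minimal sketch: the class name `Solution`, its field names and the helper `fitness` are hypothetical, and the fitness is assumed to be β_k minus the mean of the ratios t^-_ik / x_ik.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    b: List[int]            # binary part, one value per DMU j
    beta: float             # beta_k
    alpha: List[float]      # alpha_jk, one value per DMU j
    t_minus: List[float]    # t^-_ik, one value per input i
    t_plus: List[float]     # t^+_rk, one value per output r


def fitness(sol: Solution, x_k: List[float]) -> float:
    """Objective value: beta_k - (1/m) * sum_i t^-_ik / x_ik."""
    m = len(x_k)
    return sol.beta - sum(t / xi for t, xi in zip(sol.t_minus, x_k)) / m


# Toy individual for a DMU with a single input x_1k = 4.
sol = Solution(b=[0, 1], beta=1.0, alpha=[1.0, 0.0], t_minus=[2.0], t_plus=[0.0])
value = fitness(sol, x_k=[4.0])  # 1.0 - 2.0 / 4.0
```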

First heuristic
1. Generate $b_{jk}$ $\forall j$ (c.9). Restriction: the number of $b_{jk}$ equal to 0 must be $> s$ and $< s + m$.
2. Calculate the values of $\alpha_{jk}$ and $d_{jk}$ $\forall j$ by means of a system of equations.
3. $t^+_{rk}$ $\forall r$ and $\beta_k$ are generated to satisfy c.1, with a refinement process:
     Generate $t^+_{rk}$ $\forall r$ randomly between 0 and 1; obtain $\beta_k$ using c.1.
     while $\beta_k \leq 0$ OR $\beta_k \geq 1$ do
       if $\beta_k < 0$ then
         Generate r randomly, and $t^+_{rk} = t^+_{rk} / (2.0 + random(0, 1, 2))$
       else
         Generate r randomly, and $t^+_{rk} = t^+_{rk} \cdot (2.0 + random(0, 1, 2))$
       end if
       Obtain $\beta_k$ using c.1.
     end while
4. $\alpha_{jk}$ $\forall j$ are calculated using c.3, by solving the system of equations.
5. $t^-_{ik}$ are calculated using c.2, by solving the system of equations.
6. Finally, $\nu_{ik}$ $\forall i$ are generated randomly, $\mu_{rk}$ $\forall r$ are obtained by solving system c.4, and the number of $d_{jk}$ equal to 0 is the same as the number of $\alpha$ different from 0.
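The refinement process of step 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the loop is taken to run while β_k is outside the open interval (0, 1), `random(0, 1, 2)` is replaced by a uniform draw in [0, 1), and the iteration cap `max_iter` is added only so that the sketch always terminates. The names `beta_from_c1` and `refine_t_plus` are hypothetical.

```python
import random


def beta_from_c1(t_plus, y_k):
    """Solve c.1 for beta_k: beta_k = 1 - (1/s) * sum_r t^+_rk / y_rk."""
    s = len(y_k)
    return 1.0 - sum(t / yr for t, yr in zip(t_plus, y_k)) / s


def refine_t_plus(y_k, rng, max_iter=10_000):
    """Generate t^+_rk in [0, 1) and rescale them until 0 < beta_k < 1."""
    s = len(y_k)
    t_plus = [rng.random() for _ in range(s)]
    beta = beta_from_c1(t_plus, y_k)
    for _ in range(max_iter):
        if 0.0 < beta < 1.0:
            return beta, t_plus
        r = rng.randrange(s)            # component to rescale
        factor = 2.0 + rng.random()     # assumed stand-in for random(0, 1, 2)
        if beta < 0.0:
            t_plus[r] /= factor         # slacks too large: shrink one of them
        else:
            t_plus[r] *= factor         # otherwise enlarge one of them
        beta = beta_from_c1(t_plus, y_k)
    raise RuntimeError("no beta_k in (0, 1) found within max_iter iterations")


rng = random.Random(0)
beta, t_plus = refine_t_plus([0.1, 0.2], rng)
```

With small outputs such as 0.1 and 0.2, the initial slacks usually give a negative β_k, so the loop mostly divides until c.1 yields a value between 0 and 1.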

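Second heuristic, used to recalculate non-valid solutions after the first heuristic
1. $b_{jk}$ $\forall j$ are generated as in the first heuristic; the values of $\alpha$ are generated randomly.
2. $\alpha_{jk}$ $\forall j$ are modified to satisfy c.1, c.2, c.3, c.11 and c.12:
     for i = 1, ..., m do
       if $x_{ik} < \sum_{j=1}^{n} \alpha_{jk} x_{ij}$ then
         $j_0$ such that $\dfrac{\frac{1}{m} \sum_{i=1}^{m} x_{ij_0}}{\frac{1}{s} \sum_{r=1}^{s} y_{rj_0}} = \max_{j=1,\ldots,n} \left\{ \dfrac{\frac{1}{m} \sum_{i=1}^{m} x_{ij}}{\frac{1}{s} \sum_{r=1}^{s} y_{rj}} \right\}$
         $\alpha_{j_0 k} = \alpha_{j_0 k} \cdot 0.95$
       end if
     end for
     for r = 1, ..., s do
       $j_0$ / ...
       $\alpha_{j_0 k} = \alpha_{j_0 k} \cdot 1.05$
     end for
   $\forall j$, adjust $\alpha_{jk}$ with a similar refinement method. Adjust $\beta_k$ to satisfy c.11 and c.12. Obtain $t^+_{rk}$ $\forall r$ and $t^-_{ik}$ $\forall i$ using c.2 and c.3.
3. A similar refinement is applied so that $\beta_k$ satisfies c.2, c.3, c.11 and c.12.
4. $\nu_{ik}$ $\forall i$, $\mu_{rk}$ $\forall r$ and $d_{jk}$ $\forall j$ are obtained as in the first method.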
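The first loop of step 2 can be sketched as below. The sketch assumes that j_0 is the DMU with the largest ratio of mean input to mean output, which is one reading of the selection rule, and that a single pass over the inputs is made as in the pseudocode. The second loop is not sketched because its selection rule is elided in the slide. The function name `shrink_alpha` and the data layout (`x[i][j]` is input i of DMU j) are hypothetical.

```python
def shrink_alpha(x, y, k, alpha, factor=0.95):
    """One pass over the inputs: whenever the combination sum_j alpha_j * x_ij
    exceeds x_ik, scale alpha_{j0} by `factor` (0.95 in the slide)."""
    m, s, n = len(x), len(y), len(alpha)
    alpha = list(alpha)  # work on a copy

    def input_output_ratio(j):
        mean_input = sum(x[i][j] for i in range(m)) / m
        mean_output = sum(y[r][j] for r in range(s)) / s
        return mean_input / mean_output

    # Assumed selection rule: the DMU with the largest mean input / mean output.
    j0 = max(range(n), key=input_output_ratio)
    for i in range(m):
        if x[i][k] < sum(alpha[j] * x[i][j] for j in range(n)):
            alpha[j0] *= factor
    return alpha


# Toy data: one input, one output, two DMUs; DMU 0 is being evaluated.
x = [[2.0, 4.0]]
y = [[2.0, 2.0]]
shrunk = shrink_alpha(x, y, k=0, alpha=[0.0, 1.0])
unchanged = shrink_alpha(x, y, k=0, alpha=[1.0, 0.0])
```

A single pass does not guarantee that the input constraints hold afterwards, which is consistent with the slide applying a further refinement to α.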

Percentage of valid solutions

Size         9 constraints (ICCS14)          13 constraints (ICAC14)         14 constraints
m   n   s    time (sec)       % val.         time (sec)        % val.        time (sec)       % val.
2   15  1    26.42 ± 51.44    82 ± 35.58     33.21 ± 10.82     72 ± 18.12    0.09 ± 0.02      100 ± 0.00
3   25  2    6.72 ± 16.03     90 ± 30.46     72.89 ± 15.56     24 ± 20.97    0.88 ± 0.68      96 ± 2.85
4   30  2    0.22 ± 0.16      100 ± 0.00     89.84 ± 18.63     16 ± 21.13    0.88 ± 1.74      95 ± 1.49
5   40  3    13.13 ± 20.64    74 ± 43.40     116.39 ± 12.86    1.6 ± 2.49    27.22 ± 42.38    92 ± 9.07
6   60  4    2.01 ± 1.13      35 ± 44.07     117.26 ± 14.15    0.06 ± 0.10   93.46 ± 70.08    53 ± 35.57

A higher percentage of valid solutions is now obtained, with all the constraints considered. Metaheuristics are then applied to improve the solutions.

Genetic algorithm
Initialization: with the heuristics.
End condition: a maximum number of iterations, or a maximum number of iterations without improving the best solution.
Selection: valid solutions are selected for combination. Non-valid solutions are replaced by new valid solutions.
Crossover: an individual has components of six types, and each combination works with one of these types.
1. Only $\beta$ is considered. The mean of $\beta_1$ and $\beta_2$ of the two parents is obtained and randomly perturbed. The values of $t^-_{ik}$ and $t^+_{rk}$ are recalculated so that constraints c.1, c.2 and c.3 are fulfilled.
2. Values of $t^+$, $t^-$, $\nu$, $\mu$ or $d$ are crossed. In each combination only the parameters of one randomly selected type are crossed, with middle point combination.
3. Combination of the previous crossovers. All the parameters are candidates, and one is randomly selected.
Mutation: each individual has a 10% probability of being mutated. One parameter is selected randomly, and new values are randomly generated for it.
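The second crossover and the mutation can be sketched as follows. This is only an illustration under assumptions: "middle point combination" is interpreted as a one-point crossover that cuts the selected vector at its middle, mutation regenerates the selected parameter uniformly in an arbitrary range [low, high], and individuals are plain dictionaries. The names `REAL_TYPES`, `midpoint_crossover` and `mutate` are hypothetical. Offspring produced this way may violate constraints, which is why non-valid individuals are replaced during selection.

```python
import random

# Parameter types that crossover 2 may cross (hypothetical dictionary keys).
REAL_TYPES = ("t_plus", "t_minus", "nu", "mu", "d")


def midpoint_crossover(parent1, parent2, rng):
    """Cross one randomly selected parameter type at the middle of its vector."""
    key = rng.choice(REAL_TYPES)
    child = {name: list(values) for name, values in parent1.items()}
    cut = len(parent1[key]) // 2
    child[key] = parent1[key][:cut] + parent2[key][cut:]
    return child, key


def mutate(individual, rng, rate=0.10, low=0.0, high=1.0):
    """With probability `rate`, regenerate one randomly selected parameter."""
    if rng.random() >= rate:
        return individual
    key = rng.choice(REAL_TYPES)
    mutated = {name: list(values) for name, values in individual.items()}
    mutated[key] = [rng.uniform(low, high) for _ in individual[key]]
    return mutated


rng = random.Random(42)
p1 = {name: [1.0] * 4 for name in REAL_TYPES}
p2 = {name: [2.0] * 4 for name in REAL_TYPES}
child, crossed = midpoint_crossover(p1, p2, rng)
```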

Comparison with CPLEX
[Figure: fitness and time (logarithmic scale) against iterations for m=4, n=30, s=3; series: CPLEX, crossover 1, crossover 2, crossover 3.]
Small problems: the solutions obtained with the GA are close to those obtained with CPLEX.
Large problems: CPLEX is impractical.

Parameterized scheme
  Initialize(S,ParamIni)
  while not EndCondition(S,ParamEnd) do
    SS = Select(S,ParamSel)
    SS1 = Combine(SS,ParamCom)
    SS2 = Improve(SS1,ParamImp)
    S = Include(SS2,ParamInc)
  end while
Different values of the metaheuristic parameters give different metaheuristics and hybridizations.
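The scheme can be sketched as a higher-order function whose building blocks are passed in, so that changing the parameters or the blocks yields a different metaheuristic. This is a minimal sketch: the function name, the `params` dictionary keys and the toy components are hypothetical, and two details are added for convenience that the scheme does not show (an iteration counter passed to the end condition, and the current set S passed to `include`).

```python
import random


def parameterized_metaheuristic(initialize, end_condition, select,
                                combine, improve, include, params):
    """Generic loop: Initialize, then Select/Combine/Improve/Include until the end."""
    S = initialize(params["ini"])
    iteration = 0
    while not end_condition(S, iteration, params["end"]):
        SS = select(S, params["sel"])
        SS1 = combine(SS, params["com"])
        SS2 = improve(SS1, params["imp"])
        S = include(S, SS2, params["inc"])
        iteration += 1
    return S


# Toy instantiation: maximize -(v - 3)^2 over real numbers.
def fit(v):
    return -(v - 3.0) ** 2


rng = random.Random(1)
params = {"ini": 20, "end": 50, "sel": 10, "com": 10, "imp": 0.1, "inc": 20}


def initialize(size):
    return [rng.uniform(-10.0, 10.0) for _ in range(size)]


def end_condition(S, iteration, max_iterations):
    return iteration >= max_iterations


def select(S, count):
    return sorted(S, key=fit, reverse=True)[:count]


def combine(SS, count):
    return [(rng.choice(SS) + rng.choice(SS)) / 2.0 for _ in range(count)]


def improve(SS1, step):
    # Simple local search: keep the best of v - step, v and v + step.
    return [max((v - step, v, v + step), key=fit) for v in SS1]


def include(S, SS2, size):
    # Elitist replacement: keep the `size` best of old and new elements.
    return sorted(S + SS2, key=fit, reverse=True)[:size]


final = parameterized_metaheuristic(initialize, end_condition, select,
                                    combine, improve, include, params)
best = max(final, key=fit)
```

Setting `params["imp"]` to 0 disables the local search and leaves a plain evolutionary loop, while a small population with a strong improvement step behaves more like a multi-start local search; this is the sense in which parameter values select the metaheuristic.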

Metaheuristics in the experiments
In addition, a hyperheuristic that searches for the best combination of metaheuristic parameters.

Comparison of fitness
[Figure: mean fitness by problem size (m=2 s=1 N=50; m=3 s=2 N=30; m=4 s=2 N=28; m=4 s=3 N=20; m=5 s=3 N=20); series: CPLEX, Hyperheuristic, SS, GA, GR.]

Roadmap ICCS 2014
Increase the number of valid solutions with hybrid metaheuristics: combination of local search with distributed metaheuristics.
Analyze the application of other metaheuristics, and of hyperheuristics on top of them.
Include the methods in metaheuristics for the optimization problem with a reduced number of constraints.
Extend the methodology to include the remaining constraints.

Conclusions
Genetic algorithms and hybrid metaheuristics are applied to a mathematical programming model for Data Envelopment Analysis.
The results of previous works are improved: all the constraints are considered, and a larger number of valid solutions is generated.
Small problems: metaheuristics give fitness values close to the optimum, and hyperheuristics can be used to obtain satisfactory hybrid metaheuristics.
Metaheuristics can be applied to large problems, for which huge execution times make exact methods impractical.

Future work
Improvement of the heuristics used to generate valid solutions.
Hybridization of metaheuristics and exact methods.
Improvement of the hyperheuristic.
Parallelism to reduce the high execution time of metaheuristics, and especially of hyperheuristics.