
European Journal of Operational Research (2013), in press. Received 7 March 2012; accepted 26 November 2012.

Interfaces with Other Disciplines

A more efficient algorithm for Convex Nonparametric Least Squares

Chia-Yen Lee (a), Andrew L. Johnson (b), Erick Moreno-Centeno (b), Timo Kuosmanen (c)

(a) Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan City 701, Taiwan
(b) Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77840, USA
(c) School of Business, Aalto University, Helsinki, Finland

Corresponding author: A.L. Johnson. E-mail addresses: cylee@mail.ncku.edu.tw (C.-Y. Lee), ajohnson@tamu.edu (A.L. Johnson), e.moreno@tamu.edu (E. Moreno-Centeno), Timo.Kuosmanen@aalto.fi (T. Kuosmanen).

Keywords: Convex Nonparametric Least Squares; Frontier estimation; Productive efficiency analysis; Model reduction; Computational complexity

Abstract

Convex Nonparametric Least Squares (CNLS) is a nonparametric regression method that does not require a priori specification of the functional form. The CNLS problem is solved by mathematical programming techniques; however, since the CNLS problem size grows quadratically as a function of the number of observations, standard quadratic programming (QP) and Nonlinear Programming (NLP) algorithms are inadequate for handling large samples, and the computational burdens become significant even for relatively small samples. This study proposes a generic algorithm that improves the computational performance in small samples and is able to solve problems that are currently unattainable. A Monte Carlo simulation is performed to evaluate the performance of six variants of the proposed algorithm. These experimental results indicate that the most effective variant can be identified given the sample size and the dimensionality. The computational benefits of the new algorithm are demonstrated by an empirical application that proved insurmountable for the standard QP and NLP algorithms.

1. Introduction

Convex Nonparametric Least Squares (CNLS) is a nonparametric regression method used to estimate monotonic increasing (decreasing) and convex (concave) functions. Hildreth introduced the CNLS concept in his seminal work (1954), while Hanson and Pledger (1976) were the first to prove the statistical consistency of the CNLS estimator in the single regression case. The method has attracted attention primarily from statisticians; see Mammen (1991), Mammen and Thomas-Agnan (1999), and Groeneboom et al. (2001). Statistical properties such as consistency (Lim and Glynn, 2012; Seijo and Sen, 2012) and uniform convergence (Aguilera et al., 2012) have been shown.

Functions of this type commonly arise in economics. For example, Varian (1982, 1984) describes monotonicity and convexity as standard regularity conditions in the microeconomic theory of utility and production functions. Recently, CNLS has attracted considerable interest in the literature of productivity and efficiency analysis (Kuosmanen and Johnson, 2010), and we will focus our discussion on this domain. The two most common ways to estimate a frontier production function are Stochastic Frontier Analysis (SFA) and Data Envelopment Analysis (DEA) (e.g., Fried et al., 2008). The former is a parametric regression method that requires a priori specification of the functional form of the frontier. The latter is a nonparametric mathematical programming approach that avoids the functional form assumption, but also assumes away stochastic noise.
Attractively, CNLS avoids the functional form assumption, building on the same axioms as DEA, but it also takes noise into account. CNLS estimates an average production function. However, CNLS can be used in a two-stage approach called Stochastic semi-Nonparametric Envelopment of Data (StoNED) to combine the main benefits of both DEA and SFA (Kuosmanen and Kortelainen, 2012).

Since Kuosmanen (2008) introduced CNLS to the literature of productive efficiency analysis, several extensions to the methodology (Johnson and Kuosmanen, 2011; Johnson and Kuosmanen, 2012; Mekaroonreung and Johnson, 2012) and empirical applications have been reported in such areas as agriculture (Kuosmanen and Kuosmanen, 2009), power generation (Mekaroonreung and Johnson, 2012), and electricity distribution (Kuosmanen, 2012). However, the computational complexity of CNLS presents a significant barrier for large-sample applications. This study focuses on this barrier and proposes a generic algorithm to reduce the computational time to solve the CNLS problem. [2]

A variety of work has been done on the computational aspects of CNLS. Hildreth (1954) developed an algorithm based on Karush-Kuhn-Tucker (KKT) conditions that can potentially take an infinite number of steps to identify the optimal solution.

[2] The proposed method could potentially benefit a variety of nonparametric methods which impose shape constraints on the underlying function, for example the estimator described in Du et al. (in press). Here we focus on the least squares estimation method CNLS.

Wilhelmsen (1976) and Pshenichny and Danilin (1978) provided algorithms projecting the data points of the dependent variable onto the faces of a polyhedral cone; both algorithms converge in a finite number of iterations. Wu (1982) offered a simpler solution that also converges in a finite number of iterations. Dykstra (1983) proposed an iterative algorithm that is based on the projection onto closed convex cones and minimizes a least squares objective function subject to concave restrictions. Goldman and Ruud (1993) and Ruud (1995) proposed an approach using a dual quadratic programming (QP) problem. They used a large number of parameters to cover all the permissible functions and obtained a smooth multivariate regression using a projection technique and structural restrictions of monotonicity and concavity. Fraser and Massam (1989) presented an algorithm to find the least squares estimate of the mean in a finite number of steps by dividing the cone into subspaces. Meyer (1999) generalized Fraser and Massam's study and extended the algorithm to the case of more constraints than observations. Some related work exists applying bagging and smearing methods to convex optimization (Hannah and Dunson, 2012).

Recently, Kuosmanen (2008) transformed the infinite dimensional CNLS problem into a finite dimensional linearly-constrained quadratic programming problem (QP), [3] which enables one to solve the CNLS problem by using standard QP algorithms and solvers (such as CPLEX, MINOS, MOSEK). However, the number of constraints of the QP problem grows as a quadratic function of the number of observations. Standard QP algorithms are limited by the number of constraints, so the computational burden when using quadratic programming to solve the CNLS problem is challenging even with relatively small sample sizes.

In light of the computational issues, the purpose of this study is to develop a more efficient approach to model the concavity constraints in the QP. For this purpose we use Dantzig et al.'s (1954, 1959) strategy to solve large scale problems by iteratively identifying and adding violated constraints (modern cutting plane methods for integer programming are based on this seminal work). Indeed, we show that Dantzig et al.'s strategy is not only useful when solving NP-hard problems [4] (they used the strategy to solve the traveling salesperson problem), but is also useful when solving large scale instances of problems that are solvable in polynomial time. Specifically, the underlying idea of the proposed generic algorithm is to solve a relaxed CNLS problem containing an initial set of constraints, those that are likely to be binding, and then iteratively add a subset of the violated concavity constraints until a solution that does not violate any constraint is found. In other words, the generic algorithm significantly reduces the computational cost to solve the CNLS problem by solving a sequence of QPs that contain a considerably smaller number of inequalities than the original QP formulation of the CNLS problem. Therefore, this algorithm has practical value especially in large sample applications and simulation-based methods such as bootstrapping or Monte Carlo studies.

The remainder of this paper is organized as follows. Section 2 introduces nonparametric regression, discusses the relationship between the Afriat inequalities and convex functions, and presents the QP formulation of the CNLS problem. Section 3 presents an algorithm to solve CNLS by identifying a set of initial constraints and iteratively adding constraints. Section 4 investigates the performance of the algorithm through Monte Carlo simulations.
Section 5 presents an application that was previously too large to solve using the standard formulation of CNLS and describes the performance of the algorithm, and Section 6 concludes.

[3] Hereafter we will refer to a linearly-constrained quadratic programming problem simply as a quadratic program (QP).
[4] An algorithm is polynomial time if the algorithm's running time is bounded from above by a polynomial expression in the size of the input for the algorithm; for the purposes of this paper the inputs are the number of observations and the number of components of the input vector. NP-hard means non-deterministic polynomial-time hard; in practical terms, most computer scientists believe these problems cannot be solved in polynomial time.

2. CNLS and Afriat's theorem

In this section we will present the quadratic programming formulation of CNLS. An example will illustrate the general result that typically far fewer constraints are needed to solve CNLS than are included in the standard formulation. The function estimated by CNLS is typically not unique; however, the lower concave envelope of the function is. Thus we will present methods to estimate the lower concave envelope of the set of functions that are optimal to the CNLS formulation. We will also review the results of Afriat (1972) in which he proposed methods to impose convexity on the estimation of a production function. Afriat's results are important because they provide insight into identifying binding constraints in the quadratic programming formulation of CNLS.

Consider a production function with shape restrictions that is estimated via CNLS, and specify a regression model

y = f(x) + \varepsilon,

where y is the dependent variable, x is a vector of input variables, and \varepsilon is a random variable satisfying E(\varepsilon | x) = 0. CNLS can be used to estimate any function that belongs to the class of functions, F, satisfying monotonicity and concavity. As an example, in this paper we will primarily focus on the well-known Cobb-Douglas functional form, f(x) = \prod_{m=1}^{M} x_m^{\beta_m}. Note that if \sum_{m=1}^{M} \beta_m \le 1, then the Cobb-Douglas function belongs to the class F. The specification of the Cobb-Douglas function we use in our Monte Carlo simulations for the data generation process is y = \prod_{m=1}^{M} x_m^{0.8/M}. [5] Note, the additive specification of the disturbance term in our regression model does not allow one to estimate the underlying Cobb-Douglas function by applying ordinary least squares (OLS) to the log-transformed regression equation. In this case, the Cobb-Douglas function should be estimated by nonlinear regression. However, the additive version of the Cobb-Douglas regression model has been widely used, including in such seminal work as Just and Pope (1978).

The production function, f(x), could be estimated by assuming a parametric functional form and applying OLS or maximum likelihood methods; however, in parametric functions such as the translog, the regularity conditions (monotonicity and concavity) are typically difficult to impose (Henningsen and Henning, 2009). Recent developments in the nonparametric regression literature now allow the estimation of production functions consistent with monotonicity and concavity, as described below.

2.1. CNLS estimation

Nonparametric regression is a method that does not specify the functional form a priori.
The continuity, monotonicity and concavity constraints are enforced in the least squares estimation method CNLS (Hildreth, 1954; Kuosmanen, 2008):

\min_{\alpha, \beta, \varepsilon} \sum_{i=1}^{n} \varepsilon_i^2
s.t.
y_i = \alpha_i + \beta_i' x_i + \varepsilon_i  for i = 1, ..., n    (1a)
\alpha_i + \beta_i' x_i \le \alpha_h + \beta_h' x_i  for i, h = 1, ..., n and h \ne i    (1b)
\beta_i \ge 0  for i = 1, ..., n,    (1c)

where y_i denotes the output, x_i = (x_{i1}, ..., x_{iM})' is the input vector, and \varepsilon_i is the disturbance term that represents the deviation of firm i from the estimated function.

[5] In the online appendix we consider a constant returns-to-scale version, y = \prod_{m=1}^{M} x_m^{1/M}.
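To make formulation (1) concrete, here is a minimal sketch in Python using cvxpy; this is an assumption on our part, since the paper's experiments used GAMS with the CPLEX solver. It builds all n(n - 1) Afriat inequalities explicitly, which is exactly the quadratic growth in problem size that motivates the algorithm of Section 3.

```python
import cvxpy as cp
import numpy as np

def cnls_full(X, y):
    """Solve formulation (1) with the complete set of n(n-1) concavity constraints."""
    n, m = X.shape
    alpha = cp.Variable(n)               # intercepts, one hyperplane per observation
    beta = cp.Variable((n, m))           # slopes (marginal products of inputs)
    fitted = alpha + cp.sum(cp.multiply(beta, X), axis=1)   # alpha_i + beta_i'x_i
    constraints = [beta >= 0]            # (1c): monotonicity
    for i in range(n):
        for h in range(n):               # (1b): Afriat inequalities, O(n^2) of them
            if h != i:
                constraints.append(alpha[i] + beta[i] @ X[i]
                                   <= alpha[h] + beta[h] @ X[i])
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - fitted)), constraints)
    prob.solve()
    return alpha.value, beta.value
```

Even at n = 100 this sketch builds 9900 inequality constraints, and the solve time grows quickly with n.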

Constraints (1a) define a basic linear hyperplane for each observation and estimate the intercept \alpha_i and slope \beta_i = (\beta_{i1}, ..., \beta_{iM})' parameters characterizing the marginal products of inputs. Constraints (1b) impose concavity using Afriat's inequalities. Finally, constraints (1c) impose monotonicity on the underlying unknown production function.

In general the parameters (\alpha_i, \beta_i') estimated using CNLS in formulation (1) are non-unique; however, the fitted values, \hat{y}_i = \hat{\alpha}_i + \hat{\beta}_i' x_i, are unique (Groeneboom et al., 2001). Thus one can calculate a lower concave envelope for the production function estimated using CNLS. Using the results from problem (1), Kuosmanen and Kortelainen (2012) propose to solve problem (2), for any input vector x of interest, to estimate the lower concave envelope:

\min_{\alpha_{le}, \beta_{le}} \; \alpha_{le} + \beta_{le}' x
s.t.
\alpha_{le} + \beta_{le}' x_i \ge \hat{y}_i  for i = 1, ..., n.    (2)

Here we use the notation \alpha_{le} and \beta_{le}' to indicate that these parameters are re-estimated in (2) to find the lower concave envelope and may be distinct from the parameters estimated in (1). The optimal solution to problem (2) is unique and is also an optimal solution to problem (1). This uniqueness facilitates the analysis of the generic algorithm (described in the following section) and, thus, hereafter we refer to the optimal solution to problem (2) as the optimal solution to the CNLS problem. A code sketch of (2) appears at the end of this subsection.

We note that models (1) and (2) can be combined in a single optimization problem by using a multi-criteria objective function and non-Archimedean weights to make the minimization of squared errors lexicographically more important, but the use of a non-Archimedean has caused considerable debate in the closely related Data Envelopment Analysis (DEA) literature (see, for example, Boyd and Fare, 1984; Charnes and Cooper, 1984). Thus we prefer to maintain the two models, in which model (1) estimates the fitted values \hat{y}_i and model (2) calculates the lower concave envelope. The fitted values are unique (Groeneboom et al., 2001); however, the set of hyperplanes need not be unique. Thus the lower concave envelope is consistent with the minimum extrapolation principle, Banker et al. (1984), and provides a unique identification of the hyperplanes (Kuosmanen and Kortelainen, 2012).

As is clear from formulations (1) and (2), CNLS estimates the unknown production function using n hyperplane segments. However, typically the number of unique hyperplane segments is much lower than n (Kuosmanen and Johnson, 2010), which presents an opportunity to reduce the number of constraints and decrease the time required to solve problem (1). To illustrate this phenomenon and the CNLS estimator, we generated 100 observations of a single-input single-output equation, y = x^{0.8} + v. The observations, x, were randomly sampled from a Uniform[1, 10] distribution and v was drawn from a normal distribution with standard deviation of 0.7. Fig. 1 shows the obtained CNLS estimator. Note that, in this case, the CNLS curve is characterized by seven unique hyperplanes (dashed lines) and the other 93 hyperplanes estimated are redundant (that is, even though we are estimating 100 hyperplanes, 93 of the estimated hyperplanes are identical to one of the 7 hyperplanes that form the lower concave envelope). [6]

[6] CNLS alone is used to estimate an average production function and is the focus of our paper. However, StoNED (an efficiency analysis method) uses CNLS in the first stage (Kuosmanen and Kortelainen, 2012) and the Jondrow decomposition (Jondrow et al., 1982) in the second stage. The Jondrow decomposition assumes homoskedasticity of both noise and inefficiency, thus the frontier production function is simply a parallel shift of the function estimated considering only noise. The shape of the production function is unchanged, and thus does not affect the computational complexity.
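As referenced above, a sketch of problem (2) under our reading of the formulation: for a given input vector x (typically one of the observed inputs), the lower concave envelope value is given by the lowest hyperplane lying on or above all CNLS fitted values. Function and variable names are illustrative, not the authors'.

```python
import cvxpy as cp

def lower_concave_envelope(X, y_hat, x):
    """Evaluate the lower concave envelope (problem (2)) at input vector x."""
    m = X.shape[1]
    a = cp.Variable()                 # alpha_le
    b = cp.Variable(m)                # beta_le
    cons = [a + X @ b >= y_hat]       # hyperplane lies on or above every fitted value
    prob = cp.Problem(cp.Minimize(a + b @ x), cons)
    prob.solve()
    return prob.value
```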
Fig. 1. The CNLS estimate includes only seven unique hyperplanes (dashed lines).

2.2. Afriat's theorem

The method to impose concavity in CNLS is based on Afriat's theorem (Afriat, 1967; Afriat, 1972), which is a fundamental result in microeconomic theory (e.g., Varian, 1982; Varian, 1984). Afriat's theorem can be used for two purposes: (1) nonparametrically testing if a given set of data satisfies the regularity conditions (concavity) implied by economic theory (Varian, 1984); if this is indeed the case, Afriat's numbers (defined in Theorem 1) can be used for constructing inner and outer bounds for the possible functions f that can describe the data. Or alternatively, (2) Afriat's inequalities (defined in Theorem 1) have been used in the context of nonparametric regression to enforce global curvature conditions (Kuosmanen and Kortelainen, 2012). There are many potential applications in areas such as demand analysis, production analysis, and finance. Kuosmanen (2008) transformed the infinite dimensional CNLS problem to the finite dimensional QP problem using Afriat's inequalities defined in Afriat's theorem:

Theorem 1 (Afriat's Theorem [7]). If n is the number of observations and m is the number of inputs, the following conditions are equivalent:

(i) There exists a continuous globally concave function f: R^m -> R that satisfies y_i = f(x_i) in a finite number of points i = 1, ..., n.
(ii) There exist finite coefficients (from now on, Afriat's numbers) \alpha_i, \beta_i = (\beta_{i1}, ..., \beta_{im})' such that y_i = \alpha_i + \beta_i' x_i for i = 1, ..., n, that satisfy the following system of inequalities (henceforth Afriat's inequalities):

\alpha_i + \beta_i' x_i \le \alpha_h + \beta_h' x_i  for i, h = 1, ..., n and h \ne i.

The above statement of Afriat's theorem refers to f as a classic concave production function, but other applications (e.g., utility functions, cost/expenditure functions, distance functions, etc.) are equally possible. The following properties are derived from Afriat's theorem: (1) Instead of concavity, convexity is easily implemented by reversing the sign of the inequalities in condition (ii) above. (2) Strict concavity (convexity) is obtained by using strict inequalities in condition (ii) above. (3) Monotonicity can be imposed independently by inserting further constraints \beta_i \ge 0 \forall i (increasing) or \beta_i \le 0 \forall i (decreasing).

[7] There are alternative equivalent statements of Afriat's theorem; see e.g. Afriat (1972), Varian (1982), or Fostel et al. (2004); however, we follow Kuosmanen (2008).
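As an aside, condition (ii) can be checked directly as a linear feasibility problem. The sketch below is our own construction, not taken from the paper: it returns whether the given data admit finite Afriat numbers, and hence a concave interpolant.

```python
import cvxpy as cp

def admits_concave_function(X, y):
    """Feasibility LP for condition (ii) of Afriat's theorem."""
    n, m = X.shape
    alpha = cp.Variable(n)
    beta = cp.Variable((n, m))
    cons = [alpha + cp.sum(cp.multiply(beta, X), axis=1) == y]   # y_i = alpha_i + beta_i'x_i
    for i in range(n):
        for h in range(n):
            if h != i:                                           # Afriat inequalities
                cons.append(alpha[i] + beta[i] @ X[i] <= alpha[h] + beta[h] @ X[i])
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status == cp.OPTIMAL
```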

In the context of nonparametric regression, n should generally be large to accurately estimate a function in high dimensions; this is often referred to as the curse of dimensionality (Yatchew, 1998). The system of Afriat's inequalities, presented in condition (ii) above, involves n(m + 1) unknown variables and n(n - 1) inequality constraints, where the number of observations, n, is usually much larger than the number of inputs, m. When the data contain a large number of observations, imposing Afriat's inequalities can become computationally demanding. For example, when n = 100, the number of inequalities is 9900.

3. A generic algorithm for CNLS model reduction

This section develops a generic algorithm based on the seminal work of Dantzig et al. (1954, 1959) to address the computational burden of solving CNLS. Specifically, Dantzig et al. proposed the following approach to solving large-scale problems: solve a relaxed model containing only a subset of the constraints, and iteratively add violated constraints to the relaxed model until an optimal solution to the relaxed model is feasible for the original problem.

Recall that, given n observations, CNLS requires n(n - 1) concavity constraints. If n is large, the number of concavity constraints is significant, because the performance of standard QP algorithms is limited by the number of constraints. To address this issue we use Dantzig et al.'s strategy: we start with a set of inequalities that are likely to be satisfied at equality in the optimal solution, and iteratively add violated constraints to the relaxed model until the optimal solution to the relaxed model is feasible for the CNLS problem. Hereafter we define relevant constraints as the set of inequalities from problem (1) that are satisfied at equality by the optimal solution to problem (2). [8]

The generic algorithm iterates between two operations: (A) solving model (1) but including only a subset V of the constraints in (1b), and (B) verifying whether the obtained solution satisfies all of the constraints in (1b); if it does then the algorithm terminates, otherwise V is appended with some of the violated constraints and the process is restarted. Section 3.1 gives two strategies to identify an initial subset of constraints that includes a large proportion of the relevant constraints with a relatively high level of accuracy. Section 3.2 describes three strategies to determine the subset of violated constraints to be added at each iteration.

In order to give an algorithmic description of the generic algorithm, we use the following formulation, which we refer to as the relaxed CNLS problem (RCNLS):

\min_{\alpha, \beta, \varepsilon} \sum_{i=1}^{n} \varepsilon_i^2
s.t.
y_i = \alpha_i + \beta_i' x_i + \varepsilon_i  for i = 1, ..., n    (3a)
\alpha_i + \beta_i' x_i \le \alpha_h + \beta_h' x_i  \forall (i, h) \in V    (3b)
\beta_i \ge 0  for i = 1, ..., n.    (3c)

Here V is a subset of all the observation pairs; thus the concavity constraints (3b) are a subset of all the concavity constraints (1b).

Generic Algorithm
1. Let t = 0 and let V be a subset of the observation pairs.
2. Solve RCNLS to find an initial solution, \alpha^{(0)}, \beta^{(0)}.
3. Do until \alpha^{(t)}, \beta^{(t)} satisfies all concavity constraints (Eqs. (1b)):
3.1. Select a subset of the concavity constraints that \alpha^{(t)}, \beta^{(t)} violates and let V^{(t)} be the corresponding observation pairs.
3.2. Set V = V \cup V^{(t)}.
3.3. Solve RCNLS to obtain solution \alpha^{(t+1)}, \beta^{(t+1)}.
3.4. Set t = t + 1.

Proposition 1. The generic algorithm obtains an optimal solution to CNLS (problem (1)).

Proof. The result follows from the following two observations: (1) For any t, (\alpha^t, \beta^t) is an optimal solution to a relaxation of problem (1). (2) The termination condition for the generic algorithm (step 3) guarantees that, at termination, (\alpha^t, \beta^t) is a feasible solution of problem (1).

[8] As noted in Section 2.2, this solution is also an optimal solution to problem (1), and has the advantage of being unique (thus the relevant constraints are well-defined).
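The loop below sketches the generic algorithm in Python/NumPy. The helpers solve_rcnls (solving RCNLS (3) over the pair set V, e.g. with the cvxpy sketch from Section 2 restricted to V) and select_violated (one of the Section 3.2 rules) are assumed; all names here are ours, not the authors'.

```python
import numpy as np

def generic_algorithm(X, y, initial_pairs, solve_rcnls, select_violated, tol=1e-6):
    V = set(initial_pairs)                              # step 1
    alpha, beta = solve_rcnls(X, y, V)                  # step 2
    while True:                                         # step 3
        own = alpha + np.sum(beta * X, axis=1)          # alpha_i + beta_i'x_i
        other = alpha[None, :] + X @ beta.T             # [i, h] = alpha_h + beta_h'x_i
        viol = own[:, None] - other                     # violation of (1b) for pair (i, h)
        np.fill_diagonal(viol, 0.0)
        if viol.max() <= tol:                           # all (1b) hold: optimal (Prop. 1)
            return alpha, beta
        V |= set(select_violated(viol, tol))            # steps 3.1-3.2
        alpha, beta = solve_rcnls(X, y, V)              # steps 3.3-3.4
```

The numeric tolerance tol is our own addition; in exact arithmetic the termination test is simply that no concavity constraint is violated.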
3.1. Approaches to determine the set of initial constraints

Critical to the generic algorithm's performance is the identification of a set of initial concavity constraints that includes a large proportion of the relevant constraints. This section describes two methods for constructing such a set of initial constraints.

3.1.1. Elementary Afriat approach

For intuition, let us start from the univariate case m = 1. The number of unknowns is only 2n, but we still have n(n - 1) inequality constraints. It is possible to reduce the number of inequalities by sorting the observed data in ascending order according to x. Without loss of generality, assume the data have been sorted as x_1 \le x_2 \le ... \le x_n. In this case, it is easy to show the following.

Elementary Afriat's theorem, univariate case (Hanson and Pledger, 1976). The following conditions are equivalent:

(i) There exists a continuous globally concave function f: R -> R that satisfies y_i = f(x_i) in a finite number of points i = 1, ..., n.
(ii) There exist finite coefficients \alpha_i, \beta_i with y_i = \alpha_i + \beta_i x_i \forall i = 1, ..., n, that satisfy the following system of inequalities (original Afriat's inequalities):

\alpha_i + \beta_i x_i \le \alpha_h + \beta_h x_i  \forall i, h = 1, ..., n and h \ne i.

(iii) There exist finite coefficients \alpha_i, \beta_i with y_i = \alpha_i + \beta_i x_i \forall i = 1, ..., n, that satisfy the following system of inequalities (elementary Afriat's inequalities): [9]

\beta_i \le \beta_{i-1}  \forall i = 2, ..., n
\alpha_i \ge \alpha_{i-1}  \forall i = 2, ..., n.

Condition (ii) involves n(n - 1) constraints, whereas condition (iii) requires only 2(n - 1) constraints. In the case of n = 100, the original conditions require 9900 inequalities, whereas our elementary condition requires only 198 inequalities. Thus, a substantial decrease in the number of inequalities is possible by using the prior ranking of the observed data and the transitivity of inequality relations. Moreover, note that imposing monotonicity in the single input case, condition (iii) requires only a single constraint \beta_n \ge 0, whereas imposing monotonicity in the general case, condition (ii) requires n x m constraints \beta_i \ge 0 \forall i = 1, ..., n.

The elementary Afriat's theorem motivates the following method for generating an initial set of constraints when the production function being estimated has multiple inputs. [10] Arbitrarily pick one of the inputs (say, variable k) and index the observed data in ascending order according to the selected input (i.e., such that x_i^k \le x_{i+1}^k, where the inequality compares only the kth entry of the input vector, but the entire input matrix is sorted). Then, let the initial set of observation pairs in V be {(1, 2), (2, 3), ..., (i - 1, i), (i, i + 1), ..., (n - 1, n)}; a code sketch of this construction follows the footnotes below.

[9] In the case of ties (i.e., when x_i = x_{i-1}), the inequalities should be changed to equalities.
[10] Alternative methods such as kernel based methods or principal component analysis could also be used; however, we suggest the elementary Afriat approach because inputs are often highly correlated. If the input vectors are perfectly correlated, the elementary Afriat constraints would be sufficient to impose concavity.
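As referenced above, a minimal sketch of the elementary Afriat construction, assuming the data rows are re-indexed by the chosen input k (illustrative names):

```python
import numpy as np

def elementary_afriat_order_and_pairs(X, y, k=0):
    """Sort the data by input k and chain consecutive observation pairs."""
    order = np.argsort(X[:, k], kind="stable")        # ascending in the k-th input
    X_sorted, y_sorted = X[order], y[order]           # the entire input matrix is sorted
    pairs = [(i, i + 1) for i in range(len(y) - 1)]   # {(1,2), (2,3), ..., (n-1,n)}
    return X_sorted, y_sorted, pairs
```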

Thus, the explicit formulation of the initial relaxed CNLS problem is as follows:

\min_{\alpha, \beta, \varepsilon} \sum_{i=1}^{n} \varepsilon_i^2
s.t.
y_i = \alpha_i + \beta_i' x_i + \varepsilon_i  \forall i = 1, ..., n
\alpha_i + \beta_i' x_i \le \alpha_{i+1} + \beta_{i+1}' x_i  \forall i = 1, ..., n - 1    (4)
\beta_i \ge 0  \forall i = 1, ..., n.

In the Monte Carlo simulations described in Section 4, we refer to the use of this strategy for determining the set of initial constraints as CNLSr.

3.1.2. Sweet spot approach

The sweet spot approach aims to predict the relevant concavity constraints and uses these as the initial set of constraints. This approach is implemented as follows: for each observation i, include the concavity constraints corresponding to the observations whose distance [11] to observation i is less than a pre-specified threshold value d (distance percentile parameter). The range between the zeroth percentile and the dth percentile is defined as the sweet spot. Empirically, we found that an effective value for d is the 3rd percentile of the distances from all observations to observation i. [12] If both the elementary Afriat approach and the sweet spot approach are applied to identify the initial set of constraints (sweet spot constraints), then we refer to the approach as CNLS+. The remainder of this section motivates CNLS+; a code sketch of the sweet-spot construction follows the footnotes below.

As previously mentioned (and illustrated in Fig. 1), Kuosmanen and Johnson (2010) showed that the number of unique hyperplanes needed to construct a CNLS production function is generally much lower than n. From Eq. (1b), observe that, in the optimal solution of CNLS (problem (1)), the concavity constraints that are satisfied at equality correspond to pairs of observations that share a hyperplane in the CNLS function. Therefore only a small number of the concavity constraints are relevant. Moreover, it is reasonable to assume that observations that are close to each other are more likely to correspond to relevant concavity constraints than those that are far apart.

The following simulation further motivates the idea that the relevant concavity constraints correspond to pairs of nearby observations. We generated 300 observations of a two-input single-output equation, y = x_1^{0.4} x_2^{0.4} + v. The observations, x_1, x_2, were randomly sampled from a Uniform[1, 10] distribution and v was drawn from a normal distribution with standard deviation of 0.7. Then we solved the CNLS problem (since it is not possible to directly solve problem (1) with more than 200 observations, we solved it using one of the algorithms herein proposed; based on Theorem 1 the results are equivalent) and identified the relevant constraints.

[11] This study uses the Euclidean norm measured in the (M+1)-dimensional space of inputs and output to measure the distance between two observations. Alternatively the distance could be measured in an M-dimensional space; however, experimental results indicated this did not have a significant effect on the computational time.
[12] The 3rd percentile worked well in our Monte Carlo simulations. In the empirical example given in Section 5, we test several different percentiles and show that indeed the 3rd percentile works well.
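As referenced above, a sketch of the sweet-spot construction; the percentile computation and names reflect our own reading of the description:

```python
import numpy as np

def sweet_spot_pairs(X, y, d=3.0):
    """For each observation i, pair it with all h whose distance falls in the sweet spot."""
    Z = np.column_stack([X, y])                          # (M+1)-dimensional points
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)   # Euclidean distances
    n = len(y)
    pairs = set()
    for i in range(n):
        others = np.delete(np.arange(n), i)
        cutoff = np.percentile(dist[i, others], d)       # d-th percentile threshold
        for h in others[dist[i, others] <= cutoff]:
            pairs.add((i, int(h)))
    return pairs
```

For CNLS+, the output of this function would be united with the consecutive-pair set from the elementary Afriat sketch above.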
Fig. 2. The concavity constraints corresponding to nearby observations are significantly more likely to be relevant than those corresponding to distant observations. (The figure overlays two histograms for one particular observation: the distances to all paired observations, and the distances for the subset of pairs that correspond to relevant concavity constraints.)

Fig. 2 shows a histogram of the distances between all pairs related to one particular observation (black) and the histogram of the distances between all pairs related to one particular observation that correspond to relevant concavity constraints (white). One can observe that indeed, as previously argued, the concavity constraints corresponding to nearby observations are significantly more likely to be relevant than those corresponding to distant observations.

3.2. Strategies for selecting from the set of violated concavity constraints

We propose three strategies to select the violated (concavity) constraints (VCs) that are added in each iteration of the generic algorithm; a code sketch of all three follows. The first strategy, referred to as one-VC-added CNLS (CNLS-O), is to select the most violated constraint from all concavity constraints (see Table 1 for the definition of most violated). This strategy, in each iteration, adds at most one violated constraint to the set V. The second strategy, referred to as group-VC-added CNLS (CNLS-G), is to select, for each observation i, the most violated constraint among the n - 1 concavity constraints related to observation i. This strategy, in each iteration, adds at most n - 1 violated constraints to the set V. The last strategy, referred to as all-VC-added (CNLS-A), is to select all the violated constraints. This strategy adds at most (n - 1)^2 violated constraints to the set V. Table 1 summarizes these three strategies and provides the formulas to identify and quantify the violated constraints (VCs) to add.
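A sketch of the three rules, operating on the violation matrix viol[i, h] from the generic-algorithm sketch in Section 3 (as before, names are illustrative):

```python
import numpy as np

def select_one(viol, tol):
    """CNLS-O: the single most violated concavity constraint."""
    i, h = np.unravel_index(np.argmax(viol), viol.shape)
    return [(int(i), int(h))] if viol[i, h] > tol else []

def select_group(viol, tol):
    """CNLS-G: for each observation i, its most violated constraint."""
    hs = np.argmax(viol, axis=1)
    return [(i, int(h)) for i, h in enumerate(hs) if viol[i, h] > tol]

def select_all(viol, tol):
    """CNLS-A: every violated concavity constraint."""
    return [(int(i), int(h)) for i, h in np.argwhere(viol > tol)]
```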

Table 1. Strategies for selecting from the set of violated concavity constraints.

Strategy                 Rule to identify the VCs to add                                                                    Upper bound on # of VCs added per iteration
CNLS-O (one-VC-added)    \max_{i,h} \{ (\alpha_i^t + \beta_i^{t\prime} x_i) - (\alpha_h^t + \beta_h^{t\prime} x_i) \} > 0    1
CNLS-G (group-VC-added)  \max_{h} \{ (\alpha_i^t + \beta_i^{t\prime} x_i) - (\alpha_h^t + \beta_h^{t\prime} x_i) \} > 0, \forall i    n - 1
CNLS-A (all-VC-added)    (\alpha_i^t + \beta_i^{t\prime} x_i) - (\alpha_h^t + \beta_h^{t\prime} x_i) > 0, \forall i, h       (n - 1)^2

Table 2. CNLS+-G significantly outperforms the other methods for two-dimensional problems with at least 100 observations. Average run time measured in seconds (standard deviation in parentheses); "-" marks values lost in the source, including the observation counts of the first four rows.

Obs.  CNLS         CNLSr-O        CNLSr-G        CNLSr-A        CNLS+-O      CNLS+-G       CNLS+-A
-     - (0.02)     24.8 (1.81)    4.2 (0.26)     1.4 (0.09)     19.5 (2.22)  3.7 (0.65)    1.1 (0.17)
-     - (0.61)     70.4 (2.69)    9.4 (1.38)     4.0 (0.80)     44.3 (5.72)  6.4 (0.90)    2.6 (0.73)
-     - (40.93)    - (10.04)      83.0 (12.32)   63.9 (13.32)   - (9.40)     24.0 (5.77)   28.6 (11.81)
-     - (600.93)   - (50.69)      - (135.27)     - (66.61)      - (23.69)    30.6 (6.55)   98.0 (40.72)
200   N/A          - (118.36)     1424 (180.32)  - (371.21)     - (37.67)    47.2 (12.12)  - (22.31)
250   N/A          1779 (188.44)  2555 (861.06)  1344 (413.18)  - (50.88)    62.8 (14.20)  - (84.82)
300   N/A          3144 (445.20)  8084 (-)       1833 (585.07)  - (95.83)    93.4 (18.45)  - (204.05)

N/A: system out of memory.

4. Monte Carlo simulations

This section describes four simulation studies analyzing the performance of the variants of the generic algorithm and comparing them to directly solving CNLS (1). These experiments were performed on a personal computer (PC) with an Intel Core i7 CPU 1.60 GHz and 8 GB RAM. The optimization problems were solved in GAMS 23.3 using the CPLEX 12.0 QCP (Quadratically Constrained Program) solver. The six variants analyzed are all the possible combinations between determining the initial constraint set (CNLSr and CNLS+) and selecting the violated constraints to add (CNLS-O, CNLS-G and CNLS-A).

The first simulation study investigates the performance of the algorithms as a function of the number of observations. The second simulation study investigates the performance of the algorithms as a function of the number of inputs. The third simulation compares the algorithms by simultaneously varying the number of inputs and the number of observations. Finally, the fourth simulation study aims to determine the largest problem sizes that can be solved with the variants that were found to be the most effective in the other three studies.

Fig. 3. CNLS+-G has the least running time for problems with at least 100 observations.

The first study assumed a two-input, one-output production function, f(x) = x_1^{0.4} x_2^{0.4}, and the corresponding regression equation, y = x_1^{0.4} x_2^{0.4} + v, where the observations x_1 and x_2 were randomly sampled from a Uniform[1, 10] distribution and v was drawn from a normal distribution with standard deviation of 0.7; a sketch of this data-generating process follows. This study simulates seven scenarios, each with a different number of observations. Each of the rows in Table 2 corresponds to one scenario and gives the average time, in seconds, and standard deviation (shown in parentheses) that each algorithm required to solve the problem.
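For reference, the first study's data-generating process as a sketch (the seeding mechanism is our own choice):

```python
import numpy as np

def generate_first_study_sample(n, seed=0):
    """y = x1^0.4 * x2^0.4 + v, with x ~ Uniform[1, 10] and v ~ N(0, 0.7^2)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(1.0, 10.0, size=(n, 2))
    v = rng.normal(0.0, 0.7, size=n)
    y = X[:, 0] ** 0.4 * X[:, 1] ** 0.4 + v
    return X, y
```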

The averages and standard deviations were obtained by simulating each scenario 10 times. Table 2 shows that CNLS+-G significantly outperforms the other methods for two-dimensional problems with at least 100 observations. An extended version of Table 2, Table 9, is shown in the online appendix. Table 9 shows that: (1) CNLSr-O adds the least number of constraints to construct the CNLS production function but requires the most iterations, which leads to longer computation times; (2) conversely, CNLSr-A requires the least number of iterations, but adds the most constraints; (3) the original CNLS model generates 90,000 constraints for a problem with 300 observations, but actually, on average, in such a model only 1238 constraints are relevant (including 938 concavity constraints). In the two-input case, Figs. 3 and 4 illustrate the average running time and average constraints needed, respectively, for each of the seven strategies (CNLS, CNLSr-O, CNLSr-G, CNLSr-A, CNLS+-O, CNLS+-G and CNLS+-A) while increasing the number of observations.

Fig. 4. CNLS+-G adds almost the same number of constraints as CNLSr-O and CNLS+-O (which add the fewest constraints).

Table 3. CNLS+-G outperforms the other methods for 100 observations in problems with 2-7 inputs. (Columns: # of inputs; average running time in seconds for CNLS, CNLSr-O, CNLSr-G, CNLSr-A, CNLS+-O, CNLS+-G, and CNLS+-A. The numeric entries did not survive extraction.)

The second study assumed an M-input one-output production function, y = \prod_{m=1}^{M} x_m^{0.8/M} + v, where the observations x_m were randomly sampled from a Uniform[1, 10] distribution and v was drawn from a normal distribution with standard deviation of 0.7. This study simulates seven scenarios, each with a different number of inputs. Each of the rows in Table 3 corresponds to one scenario and gives the average time (in seconds) that each variant required to solve the problem. The averages were obtained by simulating each scenario 10 times.

Table 3 shows that CNLS+-G outperforms all other variants. Only in the 8-input scenario is one variant, CNLSr-G, slightly better than CNLS+-G. Recall that, in this scenario, CNLS is estimating an eight-dimensional function; therefore, in the context of nonparametric regression and its curse of dimensionality, 100 observations is, if at all, barely enough to obtain meaningful results. Indeed, from Table 3, one can observe that, for a fixed number of observations and as the number of inputs increases, the performance of CNLSr-G improves with respect to the performance of CNLS+-G. However, we consider that, for practical purposes, this improvement is not relevant because, to obtain a meaningful nonparametric regression curve, the number of observations should grow exponentially as the number of dimensions increases (see Yatchew (1998) on the curse of dimensionality).

Table 10, found in the online appendix, is an extended version of Table 3. From Table 10 one can observe the following interesting phenomenon: for a fixed number of observations, as the number of inputs increases, the number of relevant concavity constraints decreases slightly. Nevertheless, this reduction is minuscule, so, for practical purposes, we conclude that the dimensionality of the problem has little impact on the number of hyperplanes required.

The third study assumed the same production function as the second study. The observations and noise were also sampled from the same distributions used in the second study. In this study the number of inputs is varied from two to eight and the number of observations is one of {25, 50, 100, 200, 300, 400, 500, 600, 700}.
Thus the third study consists of 63 scenarios and, as before, each scenario was simulated 10 times to obtain the average performance of each algorithm. Table 4 gives, for each scenario, the best strategy in terms of average solution time. Note that CNLS and CNLS+-A are the best strategies when the number of observations is small (in fact, too small to be useful in practice), while CNLS+-G and CNLSr-G are the best strategies when the number of observations is large (practical-sized problems). For high dimensional models, when the number of observations ranges from 100 to 400, CNLSr-G dominates CNLS+-G. This is because CNLS+-G takes more iterations to identify the violated concavity constraints than CNLSr-G, even though CNLS+-G uses fewer constraints on average. Also note that the CNLS+-G method's average time to solve a 500-observation formulation is less than that of a 400-observation formulation: a 500-observation formulation adds more concavity constraints initially based on the distance criterion and uses fewer iterations to reach the optimal solution, leading to a shorter run time. In general, CNLS+-G is suggested in large scale or high dimensionality scenarios.

Table 4. In most scenarios CNLS+-G is the best strategy to solve problem (1). Best strategy in average solution time, by number of observations (rows) and number of inputs (columns, 2-8); the original table's accompanying rows of average constraints, iterations, and times are only partially recoverable and are reproduced below the matrix where they survive.

Obs.   2        3        4        5        6        7        8
25     CNLS     CNLS     CNLS     CNLS     CNLS     CNLS     CNLS
50     CNLS+-A  CNLS+-A  CNLS+-A  CNLS+-A  CNLS+-A  CNLS+-A  CNLS+-A
100    CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLSr-G
200    CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLSr-G  CNLSr-G  CNLSr-G
300    CNLS+-G  CNLS+-G  CNLS+-G  CNLSr-G  CNLSr-G  CNLSr-G  CNLSr-G
400    CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLSr-G  CNLSr-G  CNLSr-G
500    CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G
600    CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G
700    CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G  CNLS+-G

Average number of constraints (inputs 2-8): 400 observations: 10,015; 10,106; 10,117; 10,491; 18,095; 14,336; 11,262. 500 observations: 11,232; 11,692; 12,153; 12,689; 12,078; 11,186; 11,639. 600 observations: 14,699; 15,022; 15,339; 15,454; 15,669; 14,948; 14,231. 700 observations: 18,684; 18,948; 19,384; 19,660; 19,934; 19,423; 19,309.

Reviewing the results in Tables 3 and 4, the run times reported are for the entire run of each proposed algorithm variant. Consider the example with 300 observations and 2 inputs, solved using variant CNLSr-G; to investigate the run time of each step of the algorithm, we find that the initial solution, step 2, includes 600 constraints and takes less than 1 second; steps 3.1 and 3.2 are also very fast, taking less than 1 second because they are simple calculations; and step 3.3 presents a significant burden because the run time increases with the number of constraints. The 1st iteration, including 900 constraints, takes 1 second and the 80th iteration, including around 22,000 constraints, takes 14 minutes. These computational times are representative of the general results of our experiments with alternative variants of the proposed algorithm and varying the size of the problem instance in terms of observations and inputs.

The final simulation study aims to determine the limit on the problem sizes (in terms of number of observations) that can be solved within 5 hours with the CNLSr-G and CNLS+-G variants. The results, presented in Table 5, indicate that CNLS+-G can solve problems with a larger number of observations than CNLSr-G.

Table 5. Within a 5-hour limit, CNLS+-G can solve problems with a larger number of observations than CNLSr-G. (Rows: proposed models CNLSr-G and CNLS+-G; columns: number of inputs. The numeric entries did not survive extraction.)

5. Empirical study

This section demonstrates the computational benefits of the generic algorithm, and in particular the benefits of the CNLS+-G variant. This variant was applied to an empirical study of State of Ohio kindergarten through twelfth grade schools for two school years. The dataset in this study is thoroughly described in Johnson and Ruggiero (in press). There are four classes of expenditures per pupil as inputs: administrative (X1), building operation (X2), instructional (X3), and pupil support (X4). The input price of each expenditure is deflated by an index of first-year teacher salaries and is measured on a per student basis. The output is an index of student performance (Y) developed by the State of Ohio. This index aggregates the measures of 30 statewide outcome goals, including standardized tests, in an overall measure of performance. Descriptive statistics of the 604 observations are reported in Table 6.

Table 6. Descriptive statistics, all districts (N = 604): mean and standard deviation of the performance score (Y) and of administrative (X1), building operation (X2), instructional (X3), and pupil support (X4) expenditure per pupil, for each of the two school years. (The numeric entries did not survive extraction.)

Previously an analyst would have had to use other production function estimation methods such as Data Envelopment Analysis (DEA) [13] or Stochastic Frontier Analysis (SFA), with their modeling limitations (Kuosmanen and Kortelainen, 2012), because a CNLS model was computationally infeasible.

[13] A variety of computationally efficient algorithms exist for DEA; for a particularly fast example see Dulá (2011).
To illustrate the computational benefits of the proposed methods, the running times of the proposed models on the two school years are shown in Table 7. The standard CNLS formulation (problem (1)) cannot be directly solved due to out-of-memory errors. The CNLS+-G algorithm performs well in both cases.

Table 7. Running time (seconds) on the educational data set, by year, for CNLS, CNLSr-O, CNLSr-G, CNLSr-A, CNLS+-O, CNLS+-G, and CNLS+-A. (Most numeric entries did not survive extraction; N/A: system out of memory.)

Fig. 5 shows the frontier of the input space spanned by the factors administrative expenditure per pupil (X1), instructional expenditure per pupil (X3), and pupil support expenditure per pupil (X4), given that the performance score and building operation expenditure per pupil are fixed at their averages for the data. The figure illustrates the substitutability among input factors.

Fig. 5. The frontier of the input space by administrative expenditure per pupil (X1), instructional expenditure per pupil (X3), and pupil support expenditure per pupil (X4), given the performance score and building operation expenditure per pupil fixed at their averages.

Recall that in the sweet spot approach, used in CNLS+-G, the initial set of constraints, V, in RCNLS is built as follows: for each observation i, include the concavity constraints corresponding to the observations whose distance to observation i is less than a pre-specified threshold value d (distance percentile parameter). Also recall that the range between the zeroth percentile and the dth percentile is defined as the sweet spot. Throughout the paper, the threshold value in CNLS+-G is set to the 3rd percentile, because this threshold was found to work well in our Monte Carlo simulations. Here we investigate the effects of the distance percentile parameter in the empirical study. For this purpose Table 8 shows a sensitivity analysis using the 3rd, 6th, and 9th percentiles, respectively, on the data for both periods.

Table 8. More than 26% of the relevant constraints are found in the sweet spot. Sensitivity analysis over the distance percentile (3rd, 6th, 9th) for each of the two school years; "-" marks entries lost in the source.

                                                                 Year 1                    Year 2
Distance percentile                                              3rd     6th     9th       3rd     6th     9th
Benchmarks
  Total number of CNLS constraints (A)                           -       -       -         -       -       -
  Number of relevant constraints (B)                             -       -       -         -       -       -
Performance of the CNLS+-G algorithm
  CNLS+-G running time (sec.)                                    -       -       -         -       -       -
  Total number of constraints in CNLS+-G, (C)=(D)+(E)+(F)+(G)    13,285  23,554  33,823    21,785  23,553  34,422
  Number of linear regression constraints (a) (D)                -       -       -         -       -       -
  Number of ordering constraints (b) in V (E)                    -       -       -         -       -       -
  Number of sweet spot constraints (c) in V (F)                  10,872  21,744  32,616    17,587  21,744  32,616
  Number of VCs added (d), (G)                                   -       -       -         -       -       -
Number of relevant constraints
  Total found by CNLS+-G, (H)=(I)+(J)+(K)+(L)=(B)                -       -       -         -       -       -
  Found in the linear regression constraints (I)                 -       -       -         -       -       -
  Found in the ordering constraints (J)                          -       -       -         -       -       -
  Found in the sweet spot (K)                                    -       -       -         -       -       -
  Found in the VCs added (L)                                     -       -       -         -       -       -
Ratios assessing the effectiveness of the CNLS+-G algorithm
  Percentage of constraint reduction (1 - C/A) (%)               -       -       -         -       -       -
  Relevant constraints to CNLS constraints (B/A) (%)             -       -       -         -       -       -
  Relevant constraints to CNLS+-G constraints (B/C) (%)          -       -       -         -       -       -
  Sweet spot constraints that are relevant (K/F) (%)             -       -       -         -       -       -
  Relevant constraints in the sweet spot to all relevant (K/H) (%)  -    -       -         -       -       -
  VCs added that are relevant constraints (L/G) (%)              -       -       -         -       -       N/A

(a) These are shown in (1a). (b) These are the second type of constraints shown in Eq. (4). (c) These are the second type of constraints shown in Eq. (3), excluding the ordering constraints. (d) These are the third type of constraints shown in Eq. (3).

We make the following observations:

1. Typically a higher percentile will result in a longer running time because more constraints are added initially.
2. CNLS+-G reduced the number of constraints by more than 90% using any percentile.
3. The percentage of relevant constraints included in the CNLS formulation is low; in contrast, the percentage of relevant constraints included in CNLS+-G is relatively high. That is, the ratio of relevant constraints to all the constraints included throughout the execution of the algorithm is substantially greater in CNLS+-G than in CNLS.
4. The percentage of sweet spot constraints that are relevant constraints increases as the percentile used to define the sweet spot increases. When the 9th percentile is used, more than 46.9% of all the relevant constraints needed are found within the sweet spot; thus the method for defining the sweet spot is effective at identifying relevant constraints. However, as the percentile increases, the ratio of sweet spot constraints to relevant constraints decreases; thus we get an effect of diminishing returns.

5. Our strategy for selecting violated constraints is reasonably efficient, as a large share of the added violated constraints (VCs) are relevant constraints.
6. The results of our Monte Carlo simulation validate our proposal of using the 3rd percentile because of the significant benefits in terms of running time. This in part can be attributed to the higher percentage of CNLS+-G constraints that are relevant constraints (17.7% and 8.0%).

6. Conclusion

This study proposes a generic algorithm to reduce the time to solve the CNLS problem. This algorithm is necessary because current methods are very slow for small sample sizes and, in our experience, intractable for large sample sizes (>300 observations). The underlying principles of this strategy are: (1) using a distance analysis to determine a set of initial constraints that are likely to be satisfied at equality in the optimal solution, and (2) effectively identifying violated constraints which are iteratively added to the model. A particular variant of the generic algorithm, the CNLS+-G variant, was determined to be the best algorithm by an extensive simulation study. CNLS+-G was successfully applied to a real-life empirical study for which estimating CNLS was previously impossible using CPLEX and reasonable computational power. The distance analysis allows 25-75% of the relevant constraints to be identified initially. Although CNLS+-G requires solving multiple quadratic programming problems, the largest instance that needs to be solved is at least 90% smaller in terms of the number of constraints required compared to the original CNLS formulation.

The generic algorithm to solve the CNLS problem is based on the strategy of Dantzig et al. (1954, 1959) to solve large scale problems by iteratively identifying and adding violated constraints. Most studies that apply Dantzig et al.'s strategy consider NP-hard problems. In contrast, we demonstrate that Dantzig et al.'s strategy is also valuable for solving problems that are theoretically tractable (i.e., in P), but that in practice were previously not solvable due to their large scale.

Appendix A. Supplementary material

Supplementary material associated with this article can be found in the online version.

References

Afriat, S.N., 1967. The construction of a utility function from expenditure data. International Economic Review 8.
Afriat, S., 1972. Efficiency estimation of production functions. International Economic Review 13.
Aguilera, N., Forzani, L., Morin, P., 2012. On uniform consistent estimators for convex regression. Journal of Nonparametric Statistics 23 (4).
Banker, R.D., Charnes, A., Cooper, W.W., 1984. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science 30 (9).
Boyd, G., Fare, R., 1984. Measuring the efficiency of decision making units: a comment. European Journal of Operational Research 15.
Charnes, A., Cooper, W.W., 1984. The non-Archimedean CCR ratio for efficiency analysis: a rejoinder to Boyd and Fare. European Journal of Operational Research 15.
Dantzig, G.B., Fulkerson, D.R., Johnson, S.M., 1954. Solution of a large-scale traveling salesman problem. Operations Research 2.
Dantzig, G.B., Fulkerson, D.R., Johnson, S.M., 1959. On a linear-programming combinatorial approach to the traveling-salesman problem. Operations Research 7.
Du, P., Parmeter, C.F., Racine, J.S., in press. Nonparametric kernel regression with multiple predictors and multiple shape constraints. Statistica Sinica.
Dulá, J.H., 2011. An algorithm for data envelopment analysis. INFORMS Journal on Computing 23 (2).
Dykstra, R.L., 1983. An algorithm for restricted least squares regression. Journal of the American Statistical Association 78 (384).
Fostel, A., Scarf, H.E., Todd, M.J., 2004. Two new proofs of Afriat's theorem. Economic Theory 24 (1).
Fraser, D.A.S., Massam, H., 1989. A mixed primal-dual bases algorithm for regression under inequality constraints: application to concave regression. Scandinavian Journal of Statistics 16.
Fried, H.O., Lovell, C.A.K., Schmidt, S.S., 2008. The Measurement of Productive Efficiency and Productivity Growth. Oxford University Press, New York.
Goldman, S.M., Ruud, P.A., 1993. Nonparametric Multivariate Regression Subject to Constraint. Technical Report, Department of Economics, University of California at Berkeley.
Groeneboom, P., Jongbloed, G., Wellner, J.A., 2001. Estimation of convex functions: characterizations and asymptotic theory. Annals of Statistics 29 (6).
Hannah, L.A., Dunson, D.B., 2012. Ensemble methods for convex regression with applications to geometric programming based circuit design. In: Proceedings of the 29th International Conference on Machine Learning, ICML, vol. 1.
Hanson, D.L., Pledger, G., 1976. Consistency in concave regression. Annals of Statistics 4 (6).
Henningsen, A., Henning, C., 2009. Imposing regional monotonicity on translog stochastic production frontiers with a simple three-step procedure. Journal of Productivity Analysis 32 (3).
Hildreth, C., 1954. Point estimates of ordinates of concave functions. Journal of the American Statistical Association 49 (267).
Johnson, A.L., Kuosmanen, T., 2011. One-stage estimation of the effects of operational conditions and practices on productive performance: asymptotically normal and efficient, root-n consistent StoNEZD method. Journal of Productivity Analysis 36 (2).
Johnson, A.L., Kuosmanen, T., 2012. One-stage and two-stage DEA estimation of the effects of contextual variables. European Journal of Operational Research 220 (2).
Johnson, A.L., Ruggiero, J., in press. Nonparametric measurement of productivity and efficiency in education. Annals of Operations Research.
Jondrow, J., Lovell, C.A.K., Materov, I.S., Schmidt, P., 1982. On the estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics 19 (2-3).
Just, R.E., Pope, R.D., 1978. Stochastic specification of production function and economic implications. Journal of Econometrics 7 (1).
Kuosmanen, T., 2008. Representation theorem for convex nonparametric least squares. Econometrics Journal 11.
Kuosmanen, T., 2012. Stochastic semi-nonparametric frontier estimation of electricity distribution networks: application of the StoNED method in the Finnish regulatory model. Energy Economics 34.
Kuosmanen, T., Johnson, A.L., 2010. Data envelopment analysis as nonparametric least-squares regression. Operations Research 58 (1).
Kuosmanen, T., Kortelainen, M., 2012. Stochastic non-smooth envelopment of data: semi-parametric frontier estimation subject to shape constraints. Journal of Productivity Analysis 38 (1).
Kuosmanen, T., Kuosmanen, N., 2009. Role of benchmark technology in sustainable value analysis: an application to Finnish dairy farms. Agricultural and Food Science 18 (3-4).
Lim, E., Glynn, P.W., 2012. Consistency of multidimensional convex regression. Operations Research 60 (1).
Mammen, E., 1991. Nonparametric regression under qualitative smoothness assumptions. Annals of Statistics 19.
Mammen, E., Thomas-Agnan, C., 1999. Smoothing splines and shape restrictions. Scandinavian Journal of Statistics 26.
Mekaroonreung, M., Johnson, A.L., 2012. Estimating the shadow prices of SO2 and NOx for U.S. coal power plants: a convex nonparametric least squares approach. Energy Economics 34 (3).
Meyer, M.C., 1999. An extension of the mixed primal-dual bases algorithm to the case of more constraints than dimensions. Journal of Statistical Planning and Inference 81.
Pshenichny, B.N., Danilin, M.Y., 1978. Numerical Methods in Extremal Problems. Mir Publishers, Moscow.
Ruud, P.A., 1995. Restricted Least Squares Subject to Monotonicity and Concavity Constraints. Working Paper, University of California at Berkeley.
Seijo, E., Sen, B., 2012. Nonparametric least squares estimation of a multivariate convex regression function. Annals of Statistics 39.
Varian, H., 1982. The nonparametric approach to demand analysis. Econometrica 50.
Varian, H., 1984. The nonparametric approach to production analysis. Econometrica 52.
Wilhelmsen, D.R., 1976. A nearest point algorithm for convex polyhedral cones and applications to positive linear approximation. Mathematics of Computation 30 (133).
Wu, C.F., 1982. Some algorithms for concave and isotonic regression. TIMS Studies in Management Science 19.
Yatchew, A., 1998. Nonparametric regression techniques in economics. Journal of Economic Literature 36 (2).


The Study of Teaching-learning-based Optimization Algorithm Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

ANOMALIES OF THE MAGNITUDE OF THE BIAS OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF THE REGRESSION SLOPE

ANOMALIES OF THE MAGNITUDE OF THE BIAS OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF THE REGRESSION SLOPE P a g e ANOMALIES OF THE MAGNITUDE OF THE BIAS OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF THE REGRESSION SLOPE Darmud O Drscoll ¹, Donald E. Ramrez ² ¹ Head of Department of Mathematcs and Computer Studes

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

6) Derivatives, gradients and Hessian matrices

6) Derivatives, gradients and Hessian matrices 30C00300 Mathematcal Methods for Economsts (6 cr) 6) Dervatves, gradents and Hessan matrces Smon & Blume chapters: 14, 15 Sldes by: Tmo Kuosmanen 1 Outlne Defnton of dervatve functon Dervatve notatons

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

One-sided finite-difference approximations suitable for use with Richardson extrapolation

One-sided finite-difference approximations suitable for use with Richardson extrapolation Journal of Computatonal Physcs 219 (2006) 13 20 Short note One-sded fnte-dfference approxmatons sutable for use wth Rchardson extrapolaton Kumar Rahul, S.N. Bhattacharyya * Department of Mechancal Engneerng,

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton

More information

An Interactive Optimisation Tool for Allocation Problems

An Interactive Optimisation Tool for Allocation Problems An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k)

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k) ISSN 1749-3889 (prnt), 1749-3897 (onlne) Internatonal Journal of Nonlnear Scence Vol.17(2014) No.2,pp.188-192 Modfed Block Jacob-Davdson Method for Solvng Large Sparse Egenproblems Hongy Mao, College of

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria ECOOMETRICS II ECO 40S Unversty of Toronto Department of Economcs Wnter 07 Instructor: Vctor Agurregabra SOLUTIO TO FIAL EXAM Tuesday, Aprl 8, 07 From :00pm-5:00pm 3 hours ISTRUCTIOS: - Ths s a closed-book

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013 ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

T E C O L O T E R E S E A R C H, I N C.

T E C O L O T E R E S E A R C H, I N C. T E C O L O T E R E S E A R C H, I N C. B rdg n g En g neern g a nd Econo mcs S nce 1973 THE MINIMUM-UNBIASED-PERCENTAGE ERROR (MUPE) METHOD IN CER DEVELOPMENT Thrd Jont Annual ISPA/SCEA Internatonal Conference

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Interactive Bi-Level Multi-Objective Integer. Non-linear Programming Problem

Interactive Bi-Level Multi-Objective Integer. Non-linear Programming Problem Appled Mathematcal Scences Vol 5 0 no 65 3 33 Interactve B-Level Mult-Objectve Integer Non-lnear Programmng Problem O E Emam Department of Informaton Systems aculty of Computer Scence and nformaton Helwan

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

Amiri s Supply Chain Model. System Engineering b Department of Mathematics and Statistics c Odette School of Business

Amiri s Supply Chain Model. System Engineering b Department of Mathematics and Statistics c Odette School of Business Amr s Supply Chan Model by S. Ashtab a,, R.J. Caron b E. Selvarajah c a Department of Industral Manufacturng System Engneerng b Department of Mathematcs Statstcs c Odette School of Busness Unversty of

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson

More information

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Convexity preserving interpolation by splines of arbitrary degree

Convexity preserving interpolation by splines of arbitrary degree Computer Scence Journal of Moldova, vol.18, no.1(52), 2010 Convexty preservng nterpolaton by splnes of arbtrary degree Igor Verlan Abstract In the present paper an algorthm of C 2 nterpolaton of dscrete

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

Directional Nonparametric Least Absolute Deviations Method for Estimating the Boundary of a Convex Set

Directional Nonparametric Least Absolute Deviations Method for Estimating the Boundary of a Convex Set Drectonal Nonparametrc Least Absolute Devatons Method for Estmatng the Boundary of a Convex Set Tmo Kuosmanen Sebastán Lozano Kansantalousteteen pävät Jyväskylä 13.-14.2.2008 Background Kuosmanen, T. (2008):

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

CSE 252C: Computer Vision III

CSE 252C: Computer Vision III CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

The Ordinary Least Squares (OLS) Estimator

The Ordinary Least Squares (OLS) Estimator The Ordnary Least Squares (OLS) Estmator 1 Regresson Analyss Regresson Analyss: a statstcal technque for nvestgatng and modelng the relatonshp between varables. Applcatons: Engneerng, the physcal and chemcal

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Real-Time Systems. Multiprocessor scheduling. Multiprocessor scheduling. Multiprocessor scheduling

Real-Time Systems. Multiprocessor scheduling. Multiprocessor scheduling. Multiprocessor scheduling Real-Tme Systems Multprocessor schedulng Specfcaton Implementaton Verfcaton Multprocessor schedulng -- -- Global schedulng How are tasks assgned to processors? Statc assgnment The processor(s) used for

More information