Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbation and Penalty Functions

I-Jeng Wang* and James C. Spall**
The Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723-6099, USA

Abstract — We present a stochastic approximation algorithm based on penalty function methods and a simultaneous perturbation gradient estimate for solving stochastic optimization problems with general inequality constraints. We present a general convergence result that applies to a class of penalty functions including the quadratic penalty function, the augmented Lagrangian, and the absolute penalty function. We also establish an asymptotic normality result for the algorithm with smooth penalty functions under minor assumptions. Numerical results are given to compare the performance of the proposed algorithm with different penalty functions.

I. INTRODUCTION

In this paper, we consider a constrained stochastic optimization problem for which only noisy measurements of the cost function are available. More specifically, we aim to solve the following optimization problem:

    min_{θ∈G} L(θ),    (1)

where L: R^d → R is a real-valued cost function, θ ∈ R^d is the parameter vector, and G ⊆ R^d is the constraint set. We also assume that the gradient of L(·) exists and is denoted by g(·). We assume that there exists a unique solution θ* for the constrained optimization problem defined by (1). We consider the situation where no explicit closed-form expression of the function L is available (or is very complicated even if available), and the only information is noisy measurements of L at specified values of the parameter vector θ. This scenario arises naturally in simulation-based optimization where the cost function L is defined as the expected value of a random cost associated with the stochastic simulation of a complex system. We also assume that significant costs (in terms of time and/or computational cost) are involved in obtaining each measurement (or sample) of L(θ).
These constraints prevent us from estimating the gradient (or Hessian) of L(·) accurately, and hence prohibit the application of effective nonlinear programming techniques for inequality constraints, for example, the sequential quadratic programming methods (see, for example, Section 4.3 of [1]). Throughout the paper we use θ_n to denote the nth estimate of the solution θ*.

This work was supported by the JHU/APL Independent Research and Development Program.
*Phone: 240-228-6204; E-mail: i-jeng.wang@jhuapl.edu.
**Phone: 240-228-4960; E-mail: james.spall@jhuapl.edu.

Several results have been presented for constrained optimization in the stochastic domain. In the area of stochastic approximation (SA), most of the available results are based on the simple idea of projecting the estimate θ_n back to its nearest point in G whenever θ_n lies outside the constraint set G. These projection-based SA algorithms are typically of the following form:

    θ_{n+1} = π_G[θ_n − a_n ĝ_n(θ_n)],    (2)

where π_G: R^d → G is the set projection operator, and ĝ_n(θ_n) is an estimate of the gradient g(θ_n); see, for example, [2], [3], [5], [6]. The main difficulty for this projection approach lies in the implementation (calculation) of the projection operator π_G. Except for simple constraints like interval or linear constraints, the calculation of π_G(θ) for an arbitrary vector θ is a formidable task. Other techniques for dealing with constraints have also been considered: Hiriart-Urruty [7] and Pflug [8] present and analyze an SA algorithm based on the penalty function method for stochastic optimization of a convex function with convex inequality constraints; Kushner and Clark [3] present several SA algorithms based on the Lagrange multiplier method, the penalty function method, and a combination of both. Most of these techniques rely on the Kiefer-Wolfowitz (KW) [4] type of gradient estimate when the gradient of the cost function is not readily available.
Furthermore, the convergence of these SA algorithms based on non-projection techniques generally requires complicated assumptions on the cost function L and the constraint set G. In this paper, we present and study the convergence of a class of algorithms based on the penalty function methods and the simultaneous perturbation (SP) gradient estimate [9]. The advantage of the SP gradient estimate over the KW-type estimate for unconstrained optimization has been demonstrated with the simultaneous perturbation stochastic approximation (SPSA) algorithms. And whenever possible, we present sufficient conditions (as remarks) that can be more easily verified than the much weaker conditions used in our convergence proofs. We focus on general explicit inequality constraints where G is defined by

    G ≜ {θ ∈ R^d : q_j(θ) ≤ 0, j = 1, ..., s},    (3)
where q_j: R^d → R are continuously differentiable real-valued functions. We assume that the analytical expressions of the functions q_j are available. We extend the result presented in [10] to incorporate a larger class of penalty functions based on the augmented Lagrangian method. We also establish the asymptotic normality for the proposed algorithm. Simulation results are presented to illustrate the performance of the technique for stochastic optimization.

II. CONSTRAINED SPSA ALGORITHMS

A. Penalty Functions

The basic idea of the penalty-function approach is to convert the originally constrained optimization problem (1) into an unconstrained one defined by

    min_θ L_r(θ) ≜ L(θ) + rP(θ),    (4)

where P: R^d → R is the penalty function and r is a positive real number normally referred to as the penalty parameter. The penalty functions are defined such that P is an increasing function of the constraint functions q_j; P > 0 if and only if q_j > 0 for any j; P → ∞ as q_j → ∞; and P → l (l ≥ 0) as q_j → −∞. In this paper, we consider a penalty function method based on the augmented Lagrangian function defined by

    L_r(θ, λ) = L(θ) + (1/2r) Σ_{j=1}^{s} { [max{0, λ_j + r q_j(θ)}]^2 − λ_j^2 },    (5)

where λ ∈ R^s can be viewed as an estimate of the Lagrange multiplier vector. The associated penalty function is

    P(θ) = (1/2r^2) Σ_{j=1}^{s} { [max{0, λ_j + r q_j(θ)}]^2 − λ_j^2 }.    (6)

Let {r_n} be a positive and strictly increasing sequence with r_n → ∞ and {λ_n} be a bounded nonnegative sequence in R^s. It can be shown (see, for example, Section 4.2 of [1]) that the minimum of the sequence of functions {L_n}, defined by L_n(θ) ≜ L_{r_n}(θ, λ_n), converges to the solution of the original constrained problem (1). Since the penalized cost function (or the augmented Lagrangian) (5) is a differentiable function of θ, we can apply the standard stochastic approximation technique with the SP gradient estimate for L to minimize {L_n(·)}.
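For concreteness, the penalty (6) can be evaluated directly from the constraint values. The sketch below is a minimal illustration (the helper name and array conventions are ours, not from the paper); with λ = 0 it reduces to the standard quadratic penalty (1/2)Σ_j [max{0, q_j(θ)}]^2:

```python
import numpy as np

def aug_lagrangian_penalty(q_vals, r, lam):
    """Penalty P(theta) from Eq. (6):
    (1/(2 r^2)) * sum_j ( max(0, lam_j + r * q_j)^2 - lam_j^2 ).

    q_vals : array of constraint values q_j(theta)
    r      : penalty parameter (r > 0)
    lam    : nonnegative multiplier estimates lambda_j
    """
    q_vals = np.asarray(q_vals, dtype=float)
    lam = np.asarray(lam, dtype=float)
    terms = np.maximum(0.0, lam + r * q_vals) ** 2 - lam ** 2
    return terms.sum() / (2.0 * r ** 2)
```

Note that for a feasible point (all q_j ≤ 0) and λ = 0, every term vanishes and P(θ) = 0, consistent with the requirement that P penalize only constraint violations.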
In other words, the original problem can be solved with an algorithm of the following form:

    θ_{n+1} = θ_n − a_n ∇̂L_n(θ_n)
            = θ_n − a_n ĝ_n − a_n r_n ∇P(θ_n),

where ĝ_n is the SP estimate of the gradient g(·) at θ_n that we shall specify later. Note that since we assume the constraints are explicitly given, the gradient of the penalty function P(·) is directly used in the algorithm. Note that when λ_n = 0, the penalty function defined by (6) reduces to the standard quadratic penalty function discussed in [10]:

    L_r(θ, 0) = L(θ) + r Σ_{j=1}^{s} [max{0, q_j(θ)}]^2.

Even though the convergence of the proposed algorithm only requires {λ_n} to be bounded (hence we can set λ_n = 0), we can significantly improve the performance of the algorithm with an appropriate choice of the sequence based on concepts from Lagrange multiplier theory. Moreover, it has been shown [1] that, with the standard quadratic penalty function, the penalized cost function L_n = L + r_n P can become ill-conditioned as r_n increases (that is, the condition number of the Hessian matrix of L_n at θ_n* diverges to ∞ with r_n). The use of the general penalty function defined in (6) can prevent this difficulty if {λ_n} is chosen so that it is close to the true Lagrange multipliers. In Section IV, we will present an iterative method based on the method of multipliers (see, for example, [11]) to update λ_n and compare its performance with the standard quadratic penalty function.

B. An SPSA Algorithm for Inequality Constraints

In this section, we present the specific form of the algorithm for solving the constrained stochastic optimization problem. The algorithm we consider is defined by

    θ_{n+1} = θ_n − a_n ĝ_n(θ_n) − a_n r_n ∇P(θ_n),    (7)

where ĝ_n(θ_n) is an estimate of the gradient of L, g(·), at θ_n, {r_n} is an increasing sequence of positive scalars with lim_{n→∞} r_n = ∞, ∇P(θ) is the gradient of P(θ) at θ, and {a_n} is a positive scalar sequence satisfying a_n → 0 and Σ_{n=1}^{∞} a_n = ∞.
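The gain sequences in (7) must balance several requirements (a_n → 0 with Σ a_n = ∞, r_n increasing to ∞, and later a_n r_n → 0). A minimal sketch of one admissible choice, with defaults taken from the values used in the experiments of Section IV (the function name itself is ours):

```python
def gains(n, a=0.1, A=100, alpha=0.602, c=1.0, gamma=0.101, r=10.0, eta=0.1):
    """Illustrative gain sequences for algorithm (7), n = 1, 2, ...

    a_n -> 0 with sum(a_n) = inf, c_n -> 0, r_n strictly increasing to inf,
    and a_n * r_n ~ n^{-0.502} -> 0 as required later by (C.5).
    """
    a_n = a / (n + A) ** alpha   # step size a_n = 0.1 (n + 100)^{-0.602}
    c_n = c / n ** gamma         # perturbation size c_n = n^{-0.101}
    r_n = r * n ** eta           # penalty parameter r_n = 10 n^{0.1}
    return a_n, c_n, r_n
```

These are sketches of one admissible parameterization, not the only one; any exponents satisfying the conditions of Proposition 1 below are acceptable.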
The gradient estimate ĝ_n is obtained from two noisy measurements of the cost function L by

    ĝ_n(θ_n) = [ (L(θ_n + c_n Δ_n) + ε_n^+) − (L(θ_n − c_n Δ_n) + ε_n^−) ] / (2c_n) · (1/Δ_n),    (8)

where Δ_n ∈ R^d is a random perturbation vector, c_n → 0 is a positive sequence, ε_n^+ and ε_n^− are noises in the measurements, and 1/Δ_n denotes the vector [1/Δ_n^1, ..., 1/Δ_n^d]^T. For analysis, we rewrite the algorithm (7) as

    θ_{n+1} = θ_n − a_n g(θ_n) − a_n r_n ∇P(θ_n) + a_n d_n − a_n (ε_n / 2c_n)(1/Δ_n),    (9)

where d_n and ε_n are defined by

    d_n ≜ g(θ_n) − [L(θ_n + c_n Δ_n) − L(θ_n − c_n Δ_n)] / (2c_n) · (1/Δ_n),
    ε_n ≜ ε_n^+ − ε_n^−,

respectively. We establish the convergence of the algorithm (7) and the associated asymptotic normality under appropriate assumptions in the next section.
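The estimate (8) uses only two cost measurements regardless of the dimension d. A minimal sketch follows (the function name and the Bernoulli ±1 draw for Δ_n are our illustrative choices; the ±1 distribution is one common option satisfying the symmetry condition (C.2) below):

```python
import numpy as np

def sp_gradient(noisy_L, theta, c_n, rng):
    """Simultaneous perturbation gradient estimate, Eq. (8).

    noisy_L : callable returning a noisy measurement of L at a point
    theta   : current iterate theta_n (1-D array)
    c_n     : perturbation magnitude (positive, c_n -> 0)
    rng     : numpy random Generator used to draw Delta_n
    """
    d = len(theta)
    # Bernoulli +/-1 perturbation vector
    delta = rng.choice([-1.0, 1.0], size=d)
    y_plus = noisy_L(theta + c_n * delta)
    y_minus = noisy_L(theta - c_n * delta)
    # One scalar difference, divided componentwise by each delta_i
    return (y_plus - y_minus) / (2.0 * c_n) * (1.0 / delta)
```

The key contrast with the KW-type estimate is visible in the code: all d components are formed from the same two measurements y_plus and y_minus, rather than from 2d one-at-a-time perturbations.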
III. CONVERGENCE AND ASYMPTOTIC NORMALITY

A. Convergence Theorem

To establish convergence of the algorithm (7), we need to study the asymptotic behavior of an SA algorithm with a time-varying regression function. In other words, we need to consider the convergence of an SA algorithm of the following form:

    θ_{n+1} = θ_n − a_n f_n(θ_n) + a_n d_n + a_n e_n,    (10)

where {f_n(·)} is a sequence of functions. We state here without proof a version of the convergence theorem given by Spall and Cristion in [13] for an algorithm in the generic form (10).

Theorem 1: Assume the following conditions hold:
(A.1) For each n large enough (≥ N for some N ∈ N), there exists a unique θ_n* such that f_n(θ_n*) = 0. Furthermore, lim_{n→∞} θ_n* = θ*.
(A.2) d_n → 0, and Σ_{k=1}^{n} a_k e_k converges.
(A.3) For some N < ∞, any ρ > 0 and for each n ≥ N, if ‖θ − θ*‖ ≥ ρ, then there exists a δ_n(ρ) > 0 such that (θ − θ*)^T f_n(θ) ≥ δ_n(ρ)‖θ − θ*‖, where δ_n(ρ) satisfies Σ_{n=1}^{∞} a_n δ_n(ρ) = ∞ and d_n δ_n(ρ)^{−1} → 0.
(A.4) For each i = 1, 2, ..., d, and any ρ > 0, if |θ_{ni} − (θ*)_i| > ρ eventually, then either f_{ni}(θ_n) ≥ 0 eventually or f_{ni}(θ_n) < 0 eventually.
(A.5) For any τ > 0 and nonempty S ⊆ {1, 2, ..., d}, there exists a ρ'(τ, S) > τ such that for all θ ∈ {θ ∈ R^d : |(θ − θ*)_i| < τ when i ∉ S, |(θ − θ*)_i| ≥ ρ'(τ, S) when i ∈ S},

    lim sup_{n→∞} | Σ_{i∉S} (θ − θ*)_i f_{ni}(θ) | / | Σ_{i∈S} (θ − θ*)_i f_{ni}(θ) | < 1.

Then the sequence {θ_n} defined by the algorithm (10) converges to θ*.

Based on Theorem 1, we give a convergence result for algorithm (7) by substituting ∇L_n(θ_n) = g(θ_n) + r_n ∇P(θ_n), d_n, and −(ε_n / 2c_n)(1/Δ_n) for f_n(θ_n), d_n, and e_n in (10), respectively. We need the following assumptions:
(C.1) There exists K_1 ∈ N such that for all n ≥ K_1, we have a unique θ_n* ∈ R^d with ∇L_n(θ_n*) = 0.
(C.2) {Δ_ni} are i.i.d. and symmetrically distributed about 0, with |Δ_ni| ≤ α_0 a.s. and E|Δ_ni^{−1}| ≤ α_1.
(C.3) Σ_{k=1}^{n} (a_k ε_k / 2c_k)(1/Δ_k) converges almost surely.
(C.4) If ‖θ − θ*‖ ≥ ρ, then there exists a δ(ρ) > 0 such that
    (i) if θ ∈ G, then (θ − θ*)^T g(θ) ≥ δ(ρ)‖θ − θ*‖ > 0;
    (ii) if θ ∉ G, then at least one of the following two conditions holds:
        • (θ − θ*)^T g(θ) ≥ δ(ρ)‖θ − θ*‖ and (θ − θ*)^T ∇P(θ) ≥ 0;
        • (θ − θ*)^T g(θ) ≥ −M for some constant M > 0, and (θ − θ*)^T ∇P(θ) ≥ δ(ρ)‖θ − θ*‖ > 0.
(C.5) a_n r_n → 0; g(·) and ∇P(·) are Lipschitz. (See comments below.)
(C.6) ∇L_n(·) satisfies condition (A.5).

Theorem 2: Suppose that assumptions (C.1)–(C.6) hold. Then the sequence {θ_n} defined by (7) converges to θ* almost surely.

Proof: We only need to verify the conditions (A.1)–(A.5) in Theorem 1 to show the desired result. Condition (A.1) basically requires the stationary points of the sequence {L_n(·)} to converge to θ*; assumption (C.1) together with existing results on penalty function methods establishes this desired convergence. From the results in [9], [14] and assumptions (C.2)–(C.3), we can show that condition (A.2) holds. Since r_n → ∞, condition (A.3) holds from assumption (C.4). From (9), assumptions (C.1) and (C.5), we have |(θ_{n+1} − θ*)_i| < |(θ_n − θ*)_i| for large n if |θ_{ni} − (θ*)_i| > ρ. Hence for large n, the sequence {θ_{ni}} does not jump over the interval between (θ*)_i and θ_{ni}. Therefore if |θ_{ni} − (θ*)_i| > ρ eventually, then the sequence {f_{ni}(θ_n)} does not change sign eventually; that is, condition (A.4) holds. Assumption (A.5) holds directly from (C.6).

Theorem 2 given above is general in the sense that it does not specify the exact type of penalty function P(·) to adopt. In particular, assumption (C.4) may seem difficult to satisfy. In fact, assumption (C.4) is fairly weak and does address the limitation of the penalty-function-based gradient descent algorithm. For example, suppose that a constraint function q_k(·) has a local minimum at θ' with q_k(θ') > 0. Then for every θ with q_j(θ) ≤ 0, j ≠ k, we have (θ − θ*)^T ∇P(θ) > 0 whenever θ is close enough to θ'. As r_n gets larger, the term r_n ∇P(θ) would dominate the behavior of the algorithm and could result in convergence to θ', a wrong solution. We also like to point out that assumption (C.4) is satisfied if the cost function L and the constraint functions q_j, j = 1, ..., s, are convex and satisfy the Slater condition, that is, the minimum cost function value L(θ*) is finite and there exists a θ ∈ R^d such that q_j(θ) < 0 for all j (this is the case studied in [8]).
Assumption (C.6) ensures that for n sufficiently large each element of g(θ) + r_n ∇P(θ) makes a non-negligible contribution to products of the form (θ − θ*)^T (g(θ) + r_n ∇P(θ)) when (θ − θ*)_i ≠ 0. A sufficient condition for (C.6) is that, for each i, g_i(θ) + r_n ∇_i P(θ) be uniformly bounded both away from 0 and from ∞ when |(θ − θ*)_i| ≥ ρ > 0 for all i. Theorem 2 in the stated form does require that the penalty function P be differentiable. However, it is possible to extend the stated results to the case where P is Lipschitz but not differentiable on a set of points with zero measure, for example, the absolute value penalty function

    P(θ) = max_{j=1,...,s} { max{0, q_j(θ)} }.
In the case where the density functions of the measurement noises (ε_n^+ and ε_n^− in (8)) exist and have infinite support, we can take advantage of the fact that the iterates of the algorithm visit any zero-measure set with zero probability. Assuming that the set D ≜ {θ ∈ R^d : ∇P(θ) does not exist} has Lebesgue measure 0 and the random perturbations Δ_n follow a Bernoulli distribution (P(Δ_n^i = 1) = P(Δ_n^i = −1) = 1/2), we can construct a simple proof to show that P{θ_n ∈ D i.o.} = 0 if P{θ_0 ∈ D} = 0. Therefore, the convergence result in Theorem 2 applies to penalty functions with non-smoothness on a set with measure zero. Hence in any practical application, we can simply ignore this technical difficulty and use

    ∇P(θ) = max{0, q_{J(θ)}(θ)} ∇q_{J(θ)}(θ),

where J(θ) = arg max_{j=1,...,s} q_j(θ) (note that J(θ) is uniquely defined for θ ∉ D). An alternative approach to handle this technical difficulty is to apply the SP gradient estimate directly to the penalized cost L(θ) + rP(θ) and adopt the convergence analysis presented in [15] for nondifferentiable optimization with additional convexity assumptions. Use of non-differentiable penalty functions might allow us to avoid the difficulty of ill-conditioning as r_n → ∞ without using the more complicated penalty function methods such as the augmented Lagrangian method used here. The rationale here is that, based on the theory of exact penalties (see, for example, Section 4.3 of [1]), there exists a constant r̄ = Σ_{j=1}^{s} λ_j* (where λ_j* is the Lagrange multiplier associated with the jth constraint) such that the minimum of L + rP is identical to the solution of the original constrained problem for all r > r̄. This property of the absolute value penalty function allows us to use a constant penalty parameter r > r̄ (instead of r_n → ∞) to avoid the issue of ill-conditioning. However, it is difficult to obtain a good estimate for r̄ in our situation, where the analytical expression of g(·) (the gradient of the cost function L(·)) is not available.
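In code, the practical recipe above amounts to differentiating only the most-violated constraint. The sketch below (the helper name and array conventions are ours) mirrors the expression for ∇P(θ) given in the text:

```python
import numpy as np

def abs_penalty_grad(q_vals, grad_q):
    """Gradient surrogate for the absolute value penalty, at points where it exists.

    q_vals : length-s array of constraint values q_j(theta)
    grad_q : (s, d) array whose jth row is the gradient of q_j at theta
    """
    J = int(np.argmax(q_vals))          # J(theta): index of the largest q_j
    return max(0.0, float(q_vals[J])) * grad_q[J]
```

When no constraint is violated, the factor max{0, q_{J(θ)}} is zero, so the returned vector vanishes and the iterate is driven by the cost gradient estimate ĝ_n alone.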
And it is not clear that the application of exact penalty functions with r_n → ∞ would lead to better performance than the augmented Lagrangian based technique. In Section IV we will also illustrate (via numerical results) the potential poor performance of the algorithm with an arbitrarily chosen large r.

B. Asymptotic Normality

When differentiable penalty functions are used, we can establish the asymptotic normality for the proposed algorithm. In the case where q_j(θ*) < 0 for all j = 1, ..., s (that is, there is no active constraint at θ*), the asymptotic behavior of the algorithm is exactly the same as the unconstrained SPSA algorithm and has been established in [9]. Here we consider the case where at least one of the constraints is active at θ*, that is, the set A ≜ {j = 1, ..., s : q_j(θ*) = 0} is not empty. We establish the asymptotic normality for the algorithm with smooth penalty functions of the form P(θ) = Σ_{j=1}^{s} p_j(q_j(θ)), which includes both the quadratic penalty and augmented Lagrangian functions.

Assume further that E[ε_n | F_n, Δ_n] = 0 a.s., E[ε_n^2 | F_n] → σ^2 a.s., E[(Δ_n^i)^2] → ρ^2, and E[(Δ_n^i)^{−2}] → ξ^2, where F_n is the σ-algebra generated by θ_1, ..., θ_n. Let H(θ) denote the Hessian matrix of L(θ) and

    H_p(θ) = Σ_{j∈A} ∇^2 ( p_j(q_j(θ)) ).

The next proposition establishes the asymptotic normality for the proposed algorithm with the following choice of parameters: a_n = a n^{−α}, c_n = c n^{−γ} and r_n = r n^{η} with a, c, r > 0, β = α − η − 2γ > 0, and 3γ − α/2 + 3η/2 ≥ 0.

Proposition 1: Assume that conditions (C.1)–(C.6) hold.
Let P be orthogonal with P H_p(θ*) P^T = a^{−1} r^{−1} diag(λ_1, ..., λ_d). Then

    n^{β/2} (θ_n − θ*) → N(µ, P M P^T) in distribution as n → ∞,

where M = (1/4) a^2 r^2 c^{−2} σ^2 ρ^2 diag[(2λ_1 − β^+)^{−1}, ..., (2λ_d − β^+)^{−1}] with β^+ = β < 2 min_i λ_i if α = 1 and β^+ = 0 if α < 1, and

    µ = 0                                 if 3γ − α/2 + 3η/2 > 0,
    µ = (a r H_p(θ*) − (1/2)β^+ I)^{−1} T  if 3γ − α/2 + 3η/2 = 0,

where the lth element of T is

    −(1/6) a c^2 ξ^2 [ L_{lll}^{(3)}(θ*) + 3 Σ_{i=1, i≠l}^{d} L_{iil}^{(3)}(θ*) ].

Proof: For large enough n, we have

    E[ĝ_n(θ_n) | θ_n] = H(θ̄_n)(θ_n − θ*) + b_n(θ_n),
    ∇P(θ_n) = H_p(θ̃_n)(θ_n − θ*),

where b_n(θ_n) = E[ĝ_n(θ_n) − g(θ_n) | θ_n]. Rewrite the algorithm into

    θ_{n+1} − θ* = (I − n^{−α+η} Γ_n)(θ_n − θ*) + n^{−(α−η+β)/2} Φ_n V_n + n^{−α+η−β/2} T_n,

where

    Γ_n = a n^{−η} H(θ̄_n) + a r H_p(θ̃_n),
    V_n = n^{−γ} [ĝ_n(θ_n) − E(ĝ_n(θ_n) | θ_n)],
    Φ_n = −aI,
    T_n = −a n^{β/2−η} b_n(θ_n).

Following the techniques used in [9] and the general normality results from [16], we can establish the desired result.

Note that based on the results in Proposition 1, the convergence rate of n^{−1/3} is achieved with α = 1 and γ = 1/6 − η/2 > 0.
IV. NUMERICAL EXPERIMENTS

We test our algorithm on a constrained optimization problem described in [17, p. 352]:

    min_{θ∈G} L(θ) = θ_1^2 + θ_2^2 + 2θ_3^2 + θ_4^2 − 5θ_1 − 5θ_2 − 21θ_3 + 7θ_4

subject to

    q_1(θ) = 2θ_1^2 + θ_2^2 + θ_3^2 + 2θ_1 − θ_2 − θ_4 − 5 ≤ 0,
    q_2(θ) = θ_1^2 + θ_2^2 + θ_3^2 + θ_4^2 + θ_1 − θ_2 + θ_3 − θ_4 − 8 ≤ 0,
    q_3(θ) = θ_1^2 + 2θ_2^2 + θ_3^2 + 2θ_4^2 − θ_1 − θ_4 − 10 ≤ 0.

The minimum cost L(θ*) = −44 under the constraints occurs at θ* = [0, 1, 2, −1]^T, where the constraints q_1(·) ≤ 0 and q_2(·) ≤ 0 are active. The Lagrange multiplier is [λ_1*, λ_2*, λ_3*]^T = [2, 1, 0]^T. The problem had not been solved to satisfactory accuracy with deterministic search methods that operate directly with constraints (as claimed by [17]). Further, we increase the difficulty of the problem by adding i.i.d. zero-mean Gaussian noise to L(θ) and assume that only noisy measurements of the cost function L are available (without gradients). The initial point is chosen at [0, 0, 0, 0]^T, and the standard deviation of the added Gaussian noise is 4.0 (roughly equal to the initial error). We consider three different penalty functions:

• Quadratic penalty function:

    P(θ) = (1/2) Σ_{j=1}^{s} [max{0, q_j(θ)}]^2.    (11)

In this case the gradient of P(·) required in the algorithm is

    ∇P(θ) = Σ_{j=1}^{s} max{0, q_j(θ)} ∇q_j(θ).    (12)

• Augmented Lagrangian:

    P(θ) = (1/2r^2) Σ_{j=1}^{s} { [max{0, λ_j + r q_j(θ)}]^2 − λ_j^2 }.    (13)

In this case, the actual penalty function used will vary over the iterations depending on the specific values selected for r_n and λ_n. The gradient of the penalty function required in the algorithm at the nth iteration is

    ∇P(θ) = (1/r_n) Σ_{j=1}^{s} max{0, λ_{nj} + r_n q_j(θ)} ∇q_j(θ).    (14)

To properly update λ_n, we adopt a variation of the multiplier method [1]:

    λ_{n+1, j} = min{ max{0, λ_{nj} + r_n q_j(θ_n)}, M },    (15)

where λ_{nj} denotes the jth element of the vector λ_n, and M ∈ R^+ is a large constant scalar. Since (15) ensures that {λ_n} is bounded, the convergence of the minimum of {L_n(·)} remains valid. Furthermore, {λ_n} will be close to the true Lagrange multipliers as n → ∞.
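To make the setup concrete, the test problem and the multiplier update (15) can be written out directly. The sketch below (the cap M = 100 is an arbitrary illustrative value, not specified in the text) verifies the stated optimum and active set:

```python
import numpy as np

def L(t):
    # Cost function of the test problem from [17, p. 352]
    return (t[0]**2 + t[1]**2 + 2*t[2]**2 + t[3]**2
            - 5*t[0] - 5*t[1] - 21*t[2] + 7*t[3])

def q(t):
    # Constraint functions; theta is feasible iff all entries are <= 0
    return np.array([
        2*t[0]**2 + t[1]**2 + t[2]**2 + 2*t[0] - t[1] - t[3] - 5,
        t[0]**2 + t[1]**2 + t[2]**2 + t[3]**2 + t[0] - t[1] + t[2] - t[3] - 8,
        t[0]**2 + 2*t[1]**2 + t[2]**2 + 2*t[3]**2 - t[0] - t[3] - 10,
    ])

def update_multipliers(lam, r_n, q_vals, M=100.0):
    # Multiplier update (15): clamp lambda_j + r_n * q_j(theta_n) to [0, M]
    return np.minimum(np.maximum(0.0, lam + r_n * q_vals), M)

theta_star = np.array([0.0, 1.0, 2.0, -1.0])
print(L(theta_star))   # -44.0 : the constrained minimum cost
print(q(theta_star))   # [ 0.  0. -1.] : q_1 and q_2 active, q_3 inactive
```

Evaluating at θ* confirms the active set {1, 2} quoted in the text, and a feasible iterate leaves the multiplier estimates unchanged or shrinks them toward zero under (15).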
Fig. 1. Error to the optimum (‖θ_n − θ*‖) averaged over 100 independent simulations, over 4000 iterations, for four penalty schemes: absolute value penalty with r = 10, quadratic penalty, augmented Lagrangian, and absolute value penalty with r = 3.01.

• Absolute value penalty function:

    P(θ) = max_{j=1,...,s} { max{0, q_j(θ)} }.    (16)

As discussed earlier, we will ignore the technical difficulty that P(·) is not differentiable everywhere. The gradient of P(·), where it exists, is

    ∇P(θ) = max{0, q_{J(θ)}(θ)} ∇q_{J(θ)}(θ),    (17)

where J(θ) = arg max_{j=1,...,s} q_j(θ).

For all the simulations we use the following parameter values: a_n = 0.1(n + 100)^{−0.602} and c_n = n^{−0.101}. These parameters for a_n and c_n are chosen following a practical implementation guideline recommended in [18]. For the augmented Lagrangian method, λ_n is initialized as a zero vector. For the quadratic penalty function and the augmented Lagrangian, we use r_n = 10n^{0.1} for the penalty parameter. For the absolute value penalty function, we consider two possible values for the constant penalty: r_n = r = 3.01 and r_n = r = 10. Note that in our experiments, r̄ = Σ_j λ_j* = 3. Hence the first choice of r at 3.01 is theoretically optimal but not practical, since there is no reliable way to estimate r̄. The second choice of r represents a more typical scenario where an upper bound on r̄ is estimated.

Figure 1 plots the averaged errors (over 100 independent simulations) to the optimum over 4000 iterations of the algorithms. The simulation results in Figure 1 seem to suggest that the proposed algorithm with the quadratic penalty function and the augmented Lagrangian led to comparable performance (the augmented Lagrangian method performed slightly better than the standard quadratic technique). This suggests that a more effective update scheme for λ_n than (15) is needed for the augmented Lagrangian technique. The absolute value penalty function with r = 3.01 (r̄ = 3) has the best performance. However, when an arbitrary upper bound on r̄ is
used (r = 10), the performance is much worse than both the quadratic penalty function and the augmented Lagrangian. This illustrates a key difficulty in effective application of the exact penalty theorem with the absolute penalty function.

V. CONCLUSIONS AND REMARKS

We present a stochastic approximation algorithm based on penalty function methods and a simultaneous perturbation gradient estimate for solving stochastic optimization problems with general inequality constraints. We also present a general convergence result and the associated asymptotic normality for the proposed algorithm. Numerical results are included to demonstrate the performance of the proposed algorithm with the standard quadratic penalty function and a more complicated penalty function based on the augmented Lagrangian method.

In this paper, we consider explicit constraints where the analytical expressions of the constraints are available. It is also possible to apply the same algorithm with appropriate gradient estimates for ∇P(θ) to problems with implicit constraints where constraints can only be measured or estimated with possible errors. The success of this approach would depend on efficient techniques to obtain unbiased gradient estimates of the penalty function. For example, if we can measure or estimate a value of the penalty function P(θ_n) at arbitrary locations with zero-mean error, then the SP gradient estimate can be applied. Of course, in this situation further assumptions on r_n need to be satisfied (in general, we would at least need Σ_{n=1}^{∞} (a_n r_n / c_n)^2 < ∞). However, in a typical application, we most likely can only measure the values of the constraints q_j(θ_n) with zero-mean errors. Additional bias would be present if the standard finite-difference or the SP techniques were applied to estimate ∇P(θ_n) directly in this situation. A novel technique to obtain unbiased estimates of ∇P(θ_n) based on a reasonable number of measurements is required to make the algorithm proposed in this paper feasible in dealing with implicit constraints.

VI. REFERENCES

[1] D. P.
Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1995.
[2] P. Dupuis and H. J. Kushner, "Asymptotic behavior of constrained stochastic approximations via the theory of large deviations," Probability Theory and Related Fields, vol. 75, pp. 223-274, 1987.
[3] H. Kushner and D. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, 1978.
[4] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., vol. 23, pp. 462-466, 1952.
[5] H. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications. Springer-Verlag, 1997.
[6] P. Sadegh, "Constrained optimization via stochastic approximation with a simultaneous perturbation gradient approximation," Automatica, vol. 33, no. 5, pp. 889-892, 1997.
[7] J. Hiriart-Urruty, "Algorithms of penalization type and of dual type for the solution of stochastic optimization problems with stochastic constraints," in Recent Developments in Statistics (J. Barra, ed.), pp. 183-219, North Holland Publishing Company, 1977.
[8] G. C. Pflug, "On the convergence of a penalty-type stochastic optimization procedure," Journal of Information and Optimization Sciences, vol. 2, no. 3, pp. 249-258, 1981.
[9] J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Transactions on Automatic Control, vol. 37, pp. 332-341, March 1992.
[10] I-J. Wang and J. C. Spall, "A constrained simultaneous perturbation stochastic approximation algorithm based on penalty function method," in Proceedings of the 1999 American Control Conference, vol. 1, pp. 393-399, San Diego, CA, June 1999.
[11] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, NY, 1982.
[12] E. K. Chong and S. H. Żak, An Introduction to Optimization. New York, NY: John Wiley and Sons, 1996.
[13] J. C. Spall and J. A. Cristion, "Model-free control of nonlinear stochastic systems with discrete-time measurements," IEEE Transactions on Automatic Control, vol. 43, no.
9, pp. 1178-1200, 1998.
[14] I-J. Wang, Analysis of Stochastic Approximation and Related Algorithms. PhD thesis, School of Electrical and Computer Engineering, Purdue University, August 1996.
[15] Y. He, M. C. Fu, and S. I. Marcus, "Convergence of simultaneous perturbation stochastic approximation for nondifferentiable optimization," IEEE Transactions on Automatic Control, vol. 48, no. 8, pp. 1459-1463, 2003.
[16] V. Fabian, "On asymptotic normality in stochastic approximation," The Annals of Mathematical Statistics, vol. 39, no. 4, pp. 1327-1332, 1968.
[17] H.-P. Schwefel, Evolution and Optimum Seeking. John Wiley and Sons, Inc., 1995.
[18] J. C. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817-823, 1998.