Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

I-Jeng Wang* and James C. Spall**
The Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road, Laurel, MD 20723-6099, USA

Abstract: We present a stochastic approximation algorithm based on penalty function methods and a simultaneous perturbation gradient estimate for solving stochastic optimization problems with general inequality constraints. We present a general convergence result that applies to a class of penalty functions including the quadratic penalty function, the augmented Lagrangian, and the absolute penalty function. We also establish an asymptotic normality result for the algorithm with smooth penalty functions under minor assumptions. Numerical results are given to compare the performance of the proposed algorithm with different penalty functions.

I. INTRODUCTION

In this paper, we consider a constrained stochastic optimization problem for which only noisy measurements of the cost function are available. More specifically, we aim to solve the following optimization problem:

$$\min_{\theta \in G} L(\theta), \qquad (1)$$

where $L \colon \mathbb{R}^d \to \mathbb{R}$ is a real-valued cost function, $\theta \in \mathbb{R}^d$ is the parameter vector, and $G \subset \mathbb{R}^d$ is the constraint set. We also assume that the gradient of $L(\cdot)$ exists and is denoted by $g(\cdot)$. We assume that there exists a unique solution $\theta^*$ for the constrained optimization problem defined by (1). We consider the situation where no explicit closed-form expression of the function $L$ is available (or it is very complicated even if available), and the only information consists of noisy measurements of $L$ at specified values of the parameter vector $\theta$. This scenario arises naturally in simulation-based optimization, where the cost function $L$ is defined as the expected value of a random cost associated with the stochastic simulation of a complex system. We also assume that significant costs (in terms of time and/or computational resources) are involved in obtaining each measurement (or sample) of $L(\theta)$. These constraints prevent us from estimating the gradient (or Hessian) of $L(\cdot)$ accurately, and hence prohibit the application of effective nonlinear programming techniques for inequality constraints, for example, the sequential quadratic programming methods (see, for example, Section 4.3 of [1]). Throughout the paper we use $\theta_n$ to denote the $n$th estimate of the solution $\theta^*$.

(This work was supported by the JHU/APL Independent Research and Development Program. *Phone: 240-228-6204; E-mail: i-jeng.wang@jhuapl.edu. **Phone: 240-228-4960; E-mail: james.spall@jhuapl.edu.)

Several results have been presented for constrained optimization in the stochastic domain. In the area of stochastic approximation (SA), most of the available results are based on the simple idea of projecting the estimate $\theta_n$ back to its nearest point in $G$ whenever $\theta_n$ lies outside the constraint set $G$. These projection-based SA algorithms are typically of the following form:

$$\theta_{n+1} = \pi_G[\theta_n - a_n \hat g_n(\theta_n)], \qquad (2)$$

where $\pi_G \colon \mathbb{R}^d \to G$ is the set projection operator and $\hat g_n(\theta_n)$ is an estimate of the gradient $g(\theta_n)$; see, for example, [2], [3], [5], [6]. The main difficulty of this projection approach lies in the implementation (calculation) of the projection operator $\pi_G$. Except for simple constraints such as interval or linear constraints, calculating $\pi_G(\theta)$ for an arbitrary vector $\theta$ is a formidable task.
Other techniques for dealing with constraints have also been considered: Hiriart-Urruty [7] and Pflug [8] present and analyze SA algorithms based on the penalty function method for stochastic optimization of a convex function with convex inequality constraints; Kushner and Clark [3] present several SA algorithms based on the Lagrange multiplier method, the penalty function method, and a combination of both. Most of these techniques rely on the Kiefer-Wolfowitz (KW) [4] type of gradient estimate when the gradient of the cost function is not readily available. Furthermore, the convergence of these SA algorithms based on non-projection techniques generally requires complicated assumptions on the cost function $L$ and the constraint set $G$. In this paper, we present and study the convergence of a class of algorithms based on the penalty function methods and the simultaneous perturbation (SP) gradient estimate [9]. The advantage of the SP gradient estimate over the KW-type estimate for unconstrained optimization has been demonstrated with the simultaneous perturbation stochastic approximation (SPSA) algorithms. Whenever possible, we present sufficient conditions (as remarks) that can be more easily verified than the much weaker conditions used in our convergence proofs. We focus on general explicit inequality constraints where $G$ is defined by

$$G \triangleq \{\theta \in \mathbb{R}^d : q_j(\theta) \le 0,\ j = 1, \ldots, s\}, \qquad (3)$$

where $q_j \colon \mathbb{R}^d \to \mathbb{R}$ are continuously differentiable real-valued functions. We assume that the analytical expressions of the functions $q_j$ are available. We extend the results presented in [10] to incorporate a larger class of penalty functions based on the augmented Lagrangian method. We also establish the asymptotic normality for the proposed algorithm. Simulation results are presented to illustrate the performance of the technique for stochastic optimization.

II. CONSTRAINED SPSA ALGORITHMS

A. Penalty Functions

The basic idea of the penalty-function approach is to convert the originally constrained optimization problem (1) into an unconstrained one defined by

$$\min_{\theta} L_r(\theta) \triangleq L(\theta) + rP(\theta), \qquad (4)$$

where $P \colon \mathbb{R}^d \to \mathbb{R}$ is the penalty function and $r$ is a positive real number normally referred to as the penalty parameter. The penalty functions are defined such that $P$ is an increasing function of the constraint functions $q_j$; $P > 0$ if and only if $q_j > 0$ for some $j$; $P \to \infty$ as $q_j \to \infty$; and $P \to l$ ($l \le 0$) as $q_j \to -\infty$. In this paper, we consider a penalty function method based on the augmented Lagrangian function defined by

$$L_r(\theta, \lambda) = L(\theta) + \frac{1}{2r} \sum_{j=1}^{s} \Big\{ [\max\{0, \lambda_j + r q_j(\theta)\}]^2 - \lambda_j^2 \Big\}, \qquad (5)$$

where $\lambda \in \mathbb{R}^s$ can be viewed as an estimate of the Lagrange multiplier vector. The associated penalty function is

$$P(\theta) = \frac{1}{2r^2} \sum_{j=1}^{s} \Big\{ [\max\{0, \lambda_j + r q_j(\theta)\}]^2 - \lambda_j^2 \Big\}. \qquad (6)$$

Let $\{r_n\}$ be a positive and strictly increasing sequence with $r_n \to \infty$ and let $\{\lambda_n\}$ be a bounded nonnegative sequence in $\mathbb{R}^s$. It can be shown (see, for example, Section 4.2 of [1]) that the minima of the sequence of functions $\{L_n\}$, defined by $L_n(\theta) \triangleq L_{r_n}(\theta, \lambda_n)$, converge to the solution of the original constrained problem (1). Since the penalized cost function (or the augmented Lagrangian) (5) is a differentiable function of $\theta$, we can apply the standard stochastic approximation technique with the SP gradient estimate for $L$ to minimize $\{L_n(\cdot)\}$. In other words, the original problem can be solved with an algorithm of the following form:

$$\theta_{n+1} = \theta_n - a_n \widehat{\nabla} L_n(\theta_n) = \theta_n - a_n \hat g_n - a_n r_n \nabla P(\theta_n),$$

where $\hat g_n$ is the SP estimate of the gradient $g(\cdot)$ at $\theta_n$ that we shall specify later. Note that since we assume the constraints are explicitly given, the gradient of the penalty function $P(\cdot)$ is used directly in the algorithm. Note also that when $\lambda_n = 0$, the penalty function defined by (6) reduces to the standard quadratic penalty function discussed in [10]:

$$L_r(\theta, 0) = L(\theta) + r \sum_{j=1}^{s} [\max\{0, q_j(\theta)\}]^2.$$

Even though the convergence of the proposed algorithm only requires $\{\lambda_n\}$ to be bounded (hence we can set $\lambda_n = 0$), we can significantly improve the performance of the algorithm with an appropriate choice of the sequence based on concepts from Lagrange multiplier theory. Moreover, it has been shown [1] that, with the standard quadratic penalty function, the penalized cost function $L_n = L + r_n P$ can become ill-conditioned as $r_n$ increases (that is, the condition number of the Hessian matrix of $L_n$ at $\theta_n^*$ diverges to $\infty$ with $r_n$). The use of the general penalty function defined in (6) can prevent this difficulty if $\{\lambda_n\}$ is chosen so that it is close to the true Lagrange multipliers. In Section IV, we will present an iterative method based on the method of multipliers (see, for example, [11]) to update $\lambda_n$ and compare its performance with the standard quadratic penalty function.
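To make (6) concrete, the following minimal Python sketch evaluates the augmented Lagrangian penalty and its gradient for a finite list of constraints. It is our own illustration, not the authors' code; the function name and the callable-list interface are assumptions made for the example.

```python
import numpy as np

def penalty_aug_lagrangian(theta, constraints, grads, r, lam):
    """Augmented Lagrangian penalty of eq. (6) and its gradient.

    constraints -- list of callables q_j(theta) -> float
    grads       -- list of callables returning grad q_j(theta) as an ndarray
    r           -- penalty parameter (r_n > 0)
    lam         -- current multiplier estimates lambda_n (nonnegative floats)
    """
    P = 0.0
    gP = np.zeros_like(theta, dtype=float)
    for q, gq, lj in zip(constraints, grads, lam):
        m = max(0.0, lj + r * q(theta))          # active part of the j-th term
        P += (m * m - lj * lj) / (2.0 * r * r)   # j-th summand of eq. (6)
        gP += (m / r) * gq(theta)                # its gradient in theta
    return P, gP
```

Setting `lam` to all zeros recovers the standard quadratic penalty case, up to the factor-of-$r$ bookkeeping in (6).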
B. A SPSA Algorithm for Inequality Constraints

In this section, we present the specific form of the algorithm for solving the constrained stochastic optimization problem. The algorithm we consider is defined by

$$\theta_{n+1} = \theta_n - a_n \hat g_n(\theta_n) - a_n r_n \nabla P(\theta_n), \qquad (7)$$

where $\hat g_n(\theta_n)$ is an estimate of the gradient of $L$, $g(\cdot)$, at $\theta_n$, $\{r_n\}$ is an increasing sequence of positive scalars with $\lim_{n\to\infty} r_n = \infty$, $\nabla P(\theta)$ is the gradient of $P(\theta)$ at $\theta$, and $\{a_n\}$ is a positive scalar sequence satisfying $a_n \to 0$ and $\sum_{n=1}^{\infty} a_n = \infty$. The gradient estimate $\hat g_n$ is obtained from two noisy measurements of the cost function $L$ by

$$\hat g_n = \frac{\big(L(\theta_n + c_n \Delta_n) + \varepsilon_n^+\big) - \big(L(\theta_n - c_n \Delta_n) + \varepsilon_n^-\big)}{2 c_n}\,\Delta_n^{-1}, \qquad (8)$$

where $\Delta_n \in \mathbb{R}^d$ is a random perturbation vector, $c_n \to 0$ is a positive sequence, $\varepsilon_n^+$ and $\varepsilon_n^-$ are the noises in the measurements, and $\Delta_n^{-1}$ denotes the vector $\big[\tfrac{1}{\Delta_n^1}, \ldots, \tfrac{1}{\Delta_n^d}\big]^T$. For analysis, we rewrite the algorithm (7) as

$$\theta_{n+1} = \theta_n - a_n g(\theta_n) - a_n r_n \nabla P(\theta_n) + a_n d_n - a_n \frac{\varepsilon_n}{2 c_n}\,\Delta_n^{-1}, \qquad (9)$$

where $d_n$ and $\varepsilon_n$ are defined by

$$d_n \triangleq g(\theta_n) - \frac{L(\theta_n + c_n \Delta_n) - L(\theta_n - c_n \Delta_n)}{2 c_n}\,\Delta_n^{-1}, \qquad \varepsilon_n \triangleq \varepsilon_n^+ - \varepsilon_n^-,$$

respectively. We establish the convergence of the algorithm (7) and the associated asymptotic normality under appropriate assumptions in the next section.
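A minimal sketch of one iteration of (7)-(8) in Python, assuming Bernoulli $\pm 1$ perturbations (a standard choice in the SPSA literature, though the analysis above only requires (C.2)) and a user-supplied measurement oracle `noisy_L`; all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sp_gradient(noisy_L, theta, c_n):
    """SP gradient estimate of eq. (8) with Bernoulli +/-1 perturbations."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # perturbation Delta_n
    y_plus = noisy_L(theta + c_n * delta)              # L(theta + c_n Delta_n) + noise
    y_minus = noisy_L(theta - c_n * delta)             # L(theta - c_n Delta_n) + noise
    return (y_plus - y_minus) / (2.0 * c_n) / delta    # componentwise Delta_n^{-1}

def constrained_spsa_step(theta, n, noisy_L, grad_P, a, alpha, c, gamma, r, eta):
    """One update of eq. (7) with power-law gains a_n, c_n, r_n (n >= 1)."""
    a_n = a * n ** (-alpha)
    c_n = c * n ** (-gamma)
    r_n = r * n ** eta
    g_hat = sp_gradient(noisy_L, theta, c_n)
    return theta - a_n * g_hat - a_n * r_n * grad_P(theta)
```

Here `grad_P` would be a closure computing $\nabla P$ for the chosen penalty, for example the `penalty_aug_lagrangian` sketch above with the current $r_n$ and $\lambda_n$ bound in.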

III. CONVERGENCE AND ASYMPTOTIC NORMALITY

A. Convergence Theorem

To establish convergence of the algorithm (7), we need to study the asymptotic behavior of an SA algorithm with a time-varying regression function. In other words, we need to consider the convergence of an SA algorithm of the following form:

$$\theta_{n+1} = \theta_n - a_n f_n(\theta_n) + a_n d_n + a_n e_n, \qquad (10)$$

where $\{f_n(\cdot)\}$ is a sequence of functions. We state here without proof a version of the convergence theorem given by Spall and Cristion in [13] for an algorithm in the generic form (10).

Theorem 1: Assume the following conditions hold:

(A.1) For each $n$ large enough ($\ge N$ for some $N \in \mathbb{N}$), there exists a unique $\theta_n^*$ such that $f_n(\theta_n^*) = 0$. Furthermore, $\lim_{n\to\infty} \theta_n^* = \theta^*$.

(A.2) $d_n \to 0$, and $\sum_{k=1}^{n} a_k e_k$ converges.

(A.3) For some $N < \infty$, any $\rho > 0$ and for each $n \ge N$, if $\|\theta - \theta^*\| \ge \rho$, then there exists a $\delta_n(\rho) > 0$ such that $(\theta - \theta^*)^T f_n(\theta) \ge \delta_n(\rho)\,\|\theta - \theta^*\|$, where $\delta_n(\rho)$ satisfies $\sum_{n=1}^{\infty} a_n \delta_n(\rho) = \infty$ and $d_n \delta_n(\rho)^{-1} \to 0$.

(A.4) For each $i = 1, 2, \ldots, d$, and any $\rho > 0$, if $|\theta_{ni} - (\theta^*)_i| > \rho$ eventually, then either $f_{ni}(\theta_n) \ge 0$ eventually or $f_{ni}(\theta_n) < 0$ eventually.

(A.5) For any $\tau > 0$ and nonempty $S \subset \{1, 2, \ldots, d\}$, there exists a $\rho'(\tau, S) > \tau$ such that for all $\theta \in \{\theta \in \mathbb{R}^d : |(\theta - \theta^*)_i| < \tau \text{ when } i \notin S,\ |(\theta - \theta^*)_i| \ge \rho'(\tau, S) \text{ when } i \in S\}$,

$$\limsup_{n \to \infty} \left| \frac{\sum_{i \notin S} (\theta - \theta^*)_i f_{ni}(\theta)}{\sum_{i \in S} (\theta - \theta^*)_i f_{ni}(\theta)} \right| < 1.$$

Then the sequence $\{\theta_n\}$ defined by the algorithm (10) converges to $\theta^*$.

Based on Theorem 1, we give a convergence result for algorithm (7) by substituting $\nabla L_n(\theta_n) = g(\theta_n) + r_n \nabla P(\theta_n)$, $d_n$, and $\frac{\varepsilon_n}{2 c_n}\Delta_n^{-1}$ for $f_n(\theta_n)$, $d_n$, and $e_n$ in (10), respectively. We need the following assumptions:

(C.1) There exists $K_1 \in \mathbb{N}$ such that for all $n \ge K_1$, we have a unique $\theta_n^* \in \mathbb{R}^d$ with $\nabla L_n(\theta_n^*) = 0$.

(C.2) $\{\Delta_{ni}\}$ are i.i.d. and symmetrically distributed about 0, with $|\Delta_{ni}| \le \alpha_0$ a.s. and $E|\Delta_{ni}^{-1}| \le \alpha_1$.

(C.3) $\sum_{k=1}^{n} \frac{a_k \varepsilon_k}{2 c_k}\Delta_k^{-1}$ converges almost surely.

(C.4) If $\|\theta - \theta^*\| \ge \rho$, then there exists a $\delta(\rho) > 0$ such that
(i) if $\theta \in G$, then $(\theta - \theta^*)^T g(\theta) \ge \delta(\rho)\|\theta - \theta^*\| > 0$;
(ii) if $\theta \notin G$, then at least one of the following two conditions holds:
$(\theta - \theta^*)^T g(\theta) \ge \delta(\rho)\|\theta - \theta^*\|$ and $(\theta - \theta^*)^T \nabla P(\theta) \ge 0$; or
$(\theta - \theta^*)^T g(\theta) \ge -M$ for some constant $M > 0$ and $(\theta - \theta^*)^T \nabla P(\theta) \ge \delta(\rho)\|\theta - \theta^*\| > 0$.

(C.5) $a_n r_n \to 0$; $g(\cdot)$ and $\nabla P(\cdot)$ are Lipschitz. (See comments below.)

(C.6) $\{\nabla L_n(\cdot)\}$ satisfies condition (A.5).

Theorem 2: Suppose that assumptions (C.1)-(C.6) hold. Then the sequence $\{\theta_n\}$ defined by (7) converges to $\theta^*$ almost surely.

Proof: We only need to verify conditions (A.1)-(A.5) in Theorem 1 to show the desired result. Condition (A.1) basically requires that the stationary points of the sequence $\{L_n(\cdot)\}$ converge to $\theta^*$; assumption (C.1), together with existing results on penalty function methods, establishes this desired convergence. From the results in [9], [14] and assumptions (C.2)-(C.3), we can show that condition (A.2) holds. Since $r_n \to \infty$, condition (A.3) holds from assumption (C.4). From (9) and assumptions (C.1) and (C.5), we have $|(\theta_{n+1} - \theta^*)_i| < |(\theta_n - \theta^*)_i|$ for large $n$ if $|\theta_{ni} - (\theta^*)_i| > \rho$. Hence for large $n$ the sequence $\{\theta_{ni}\}$ does not jump over the interval between $(\theta^*)_i$ and $\theta_{ni}$; therefore if $|\theta_{ni} - (\theta^*)_i| > \rho$ eventually, then the sequence $\{f_{ni}(\theta_n)\}$ does not change sign eventually, that is, condition (A.4) holds. Condition (A.5) holds directly from (C.6).

Theorem 2 as given above is general in the sense that it does not specify the exact type of penalty function $P(\cdot)$ to adopt. In particular, assumption (C.4) may seem difficult to satisfy. In fact, assumption (C.4) is fairly weak and does address the inherent limitation of penalty-function-based gradient descent algorithms. For example, suppose that a constraint function $q_k(\cdot)$ has a local minimum at $\theta'$ with $q_k(\theta') > 0$. Then for every $\theta$ with $q_j(\theta) \le 0$, $j \ne k$, we have $(\theta - \theta')^T \nabla P(\theta) > 0$ whenever $\theta$ is close enough to $\theta'$. As $r_n$ gets larger, the term $\nabla P(\theta)$ would dominate the behavior of the algorithm and could result in convergence to $\theta'$, a wrong solution. We would also like to point out that assumption (C.4) is satisfied if the cost function $L$ and the constraint functions $q_j$, $j = 1, \ldots, s$, are convex and satisfy the Slater condition, that is, the minimum cost function value $L(\theta^*)$ is finite and there exists a $\theta \in \mathbb{R}^d$ such that $q_j(\theta) < 0$ for all $j$ (this is the case studied in [8]). Assumption (C.6) ensures that for $n$ sufficiently large each element of $g(\theta) + r_n \nabla P(\theta)$ makes a non-negligible contribution to products of the form $(\theta - \theta^*)^T (g(\theta) + r_n \nabla P(\theta))$ when $(\theta - \theta^*)_i \ne 0$. A sufficient condition for (C.6) is that, for each $i$, $g_i(\theta) + r_n \nabla_i P(\theta)$ be uniformly bounded both away from $0$ and away from $\infty$ when $|(\theta - \theta^*)_i| \ge \rho > 0$ for all $i$.

Theorem 2 in the stated form does require that the penalty function $P$ be differentiable. However, it is possible to extend the stated results to the case where $P$ is Lipschitz but not differentiable on a set of points with zero measure, for example, the absolute value penalty function

$$P(\theta) = \max_{j=1,\ldots,s} \big\{ \max\{0, q_j(\theta)\} \big\}.$$

In the case where the density functions of the measurement noises ($\varepsilon_n^+$ and $\varepsilon_n^-$ in (8)) exist and have infinite support, we can take advantage of the fact that the iterates of the algorithm visit any zero-measure set with zero probability. Assuming that the set $D \triangleq \{\theta \in \mathbb{R}^d : \nabla P(\theta) \text{ does not exist}\}$ has Lebesgue measure 0 and the random perturbations $\Delta_n$ follow a Bernoulli distribution ($P(\Delta_n^i = 1) = P(\Delta_n^i = -1) = \tfrac{1}{2}$), we can construct a simple proof to show that $P\{\theta_n \in D \text{ i.o.}\} = 0$ if $P\{\theta_0 \in D\} = 0$. Therefore, the convergence result in Theorem 2 applies to penalty functions with non-smoothness on a set with measure zero. Hence in any practical application, we can simply ignore this technical difficulty and use

$$\nabla P(\theta) = \max\{0, q_{J(\theta)}(\theta)\}\, \nabla q_{J(\theta)}(\theta),$$

where $J(\theta) = \arg\max_{j=1,\ldots,s} q_j(\theta)$ (note that $J(\theta)$ is uniquely defined for $\theta \notin D$). An alternative approach to handle this technical difficulty is to apply the SP gradient estimate directly to the penalized cost $L(\theta) + rP(\theta)$ and adopt the convergence analysis presented in [15] for nondifferentiable optimization with additional convexity assumptions. Use of non-differentiable penalty functions might allow us to avoid the difficulty of ill-conditioning as $r_n \to \infty$ without using the more complicated penalty function methods such as the augmented Lagrangian method used here. The rationale is that there exists a constant $\bar r = \sum_{j=1}^{s} \lambda_j^*$ ($\lambda_j^*$ is the Lagrange multiplier associated with the $j$th constraint) such that the minimum of $L + rP$ is identical to the solution of the original constrained problem for all $r > \bar r$, based on the theory of exact penalties (see, for example, Section 4.3 of [1]). This property of the absolute value penalty function allows us to use a constant penalty parameter $r > \bar r$ (instead of $r_n \to \infty$) to avoid the issue of ill-conditioning. However, it is difficult to obtain a good estimate of $\bar r$ in our situation, where the analytical expression of $g(\cdot)$ (the gradient of the cost function $L(\cdot)$) is not available. And it is not clear that the application of exact penalty functions with $r_n \to \infty$ would lead to better performance than the augmented Lagrangian based technique. In Section IV we will also illustrate (via numerical results) the potentially poor performance of the algorithm with an arbitrarily chosen large $r$.
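The almost-everywhere gradient above is straightforward to code. The sketch below (our own illustration, reusing the callable-list interface assumed earlier) evaluates it at points where the maximizing constraint index $J(\theta)$ is unique:

```python
import numpy as np

def grad_abs_penalty(theta, constraints, grads):
    """A.e. gradient of the absolute value penalty, as stated in the text."""
    vals = [q(theta) for q in constraints]
    J = int(np.argmax(vals))                      # J(theta) = argmax_j q_j(theta)
    # Vanishes when no constraint is violated (all q_j <= 0).
    return max(0.0, vals[J]) * grads[J](theta)
```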
B. Asymptotic Normality

When differentiable penalty functions are used, we can establish the asymptotic normality of the proposed algorithm. In the case where $q_j(\theta^*) < 0$ for all $j = 1, \ldots, s$ (that is, there is no active constraint at $\theta^*$), the asymptotic behavior of the algorithm is exactly the same as that of the unconstrained SPSA algorithm and has been established in [9]. Here we consider the case where at least one of the constraints is active at $\theta^*$, that is, the set $A \triangleq \{j = 1, \ldots, s : q_j(\theta^*) = 0\}$ is not empty. We establish the asymptotic normality for the algorithm with smooth penalty functions of the form $P(\theta) = \sum_{j=1}^{s} p_j(q_j(\theta))$, which includes both the quadratic penalty and augmented Lagrangian functions. Assume further that $E[\varepsilon_n \mid \mathcal{F}_n, \Delta_n] = 0$ a.s., $E[\varepsilon_n^2 \mid \mathcal{F}_n] \to \sigma^2$ a.s., $E[(\Delta_n^i)^2] \to \rho^2$, and $E[(\Delta_n^i)^{-2}] \to \xi^2$, where $\mathcal{F}_n$ is the $\sigma$-algebra generated by $\theta_1, \ldots, \theta_n$. Let $H(\theta)$ denote the Hessian matrix of $L(\theta)$ and let

$$H_p(\theta) = \sum_{j \in A} \nabla^2 \big(p_j(q_j(\theta))\big).$$

The next proposition establishes the asymptotic normality for the proposed algorithm with the following choice of parameters: $a_n = a n^{-\alpha}$, $c_n = c n^{-\gamma}$ and $r_n = r n^{\eta}$ with $a, c, r > 0$, $\beta = \alpha - \eta - 2\gamma > 0$, and $3\gamma - \tfrac{\alpha}{2} + \tfrac{3\eta}{2} \ge 0$.

Proposition 1: Assume that conditions (C.1)-(C.6) hold. Let $\mathbf{P}$ be orthogonal with $\mathbf{P} H_p(\theta^*) \mathbf{P}^T = a^{-1} r^{-1}\,\mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. Then

$$n^{\beta/2}(\theta_n - \theta^*) \xrightarrow{\ \mathrm{dist}\ } N(\mu,\ \mathbf{P} M \mathbf{P}^T) \quad \text{as } n \to \infty,$$

where $M = \tfrac{1}{4} a^2 r^2 c^{-2} \sigma^2 \rho^2\, \mathrm{diag}\big[(2\lambda_1 - \beta^+)^{-1}, \ldots, (2\lambda_d - \beta^+)^{-1}\big]$ with $\beta^+ = \beta < 2\min_i \lambda_i$ if $\alpha = 1$ and $\beta^+ = 0$ if $\alpha < 1$, and

$$\mu = \begin{cases} 0 & \text{if } 3\gamma - \frac{\alpha}{2} + \frac{3\eta}{2} > 0, \\ \big(a r H_p(\theta^*) - \frac{1}{2}\beta^+ I\big)^{-1} T & \text{if } 3\gamma - \frac{\alpha}{2} + \frac{3\eta}{2} = 0, \end{cases}$$

where the $l$th element of $T$ is

$$-\frac{1}{6}\, a c^2 \xi^2 \Big[ L^{(3)}_{lll}(\theta^*) + 3 \sum_{i=1,\, i \ne l}^{d} L^{(3)}_{iil}(\theta^*) \Big].$$

Proof: For large enough $n$, we have

$$E[\hat g_n(\theta_n) \mid \theta_n] = H(\bar\theta_n)(\theta_n - \theta^*) + b_n(\theta_n), \qquad \nabla P(\theta_n) = H_p(\tilde\theta_n)(\theta_n - \theta^*),$$

where $b_n(\theta_n) = E[\hat g_n(\theta_n) - g(\theta_n) \mid \theta_n]$. Rewrite the algorithm as

$$\theta_{n+1} - \theta^* = (I - n^{-\alpha+\eta}\,\Gamma_n)(\theta_n - \theta^*) + n^{-(\alpha-\eta+\beta)/2}\,\Phi_n V_n + n^{-\alpha+\eta-\beta/2}\,T_n,$$

where

$$\Gamma_n = a n^{-\eta} H(\bar\theta_n) + a r H_p(\tilde\theta_n), \quad V_n = n^{\gamma}\big[\hat g_n(\theta_n) - E(\hat g_n(\theta_n) \mid \theta_n)\big], \quad \Phi_n = -aI, \quad T_n = -a n^{\beta/2-\eta}\, b_n(\theta_n).$$

Following the techniques used in [9] and the general normality results from [16], we can establish the desired result.

Note that, based on the results in Proposition 1, the convergence rate of $n^{-1/3}$ is achieved with $\alpha = 1$ and $\gamma = \tfrac{1}{6} - \tfrac{\eta}{2} > 0$.

IV. NUMERICAL EXPERIMENTS

We test our algorithm on a constrained optimization problem described in [17, p. 352]:

$$\min_{\theta \in G} L(\theta) = \theta_1^2 + \theta_2^2 + 2\theta_3^2 + \theta_4^2 - 5\theta_1 - 5\theta_2 - 21\theta_3 + 7\theta_4$$

subject to

$$q_1(\theta) = 2\theta_1^2 + \theta_2^2 + \theta_3^2 + 2\theta_1 - \theta_2 - \theta_4 - 5 \le 0,$$
$$q_2(\theta) = \theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2 + \theta_1 - \theta_2 + \theta_3 - \theta_4 - 8 \le 0,$$
$$q_3(\theta) = \theta_1^2 + 2\theta_2^2 + \theta_3^2 + 2\theta_4^2 - \theta_1 - \theta_4 - 10 \le 0.$$

The minimum cost $L(\theta^*) = -44$ under the constraints occurs at $\theta^* = [0, 1, 2, -1]^T$, where the constraints $q_1(\cdot) \le 0$ and $q_2(\cdot) \le 0$ are active. The Lagrange multipliers are $[\lambda_1^*, \lambda_2^*, \lambda_3^*]^T = [2, 1, 0]^T$. The problem had not been solved to satisfactory accuracy with deterministic search methods that operate directly with the constraints (as claimed in [17]). Further, we increase the difficulty of the problem by adding i.i.d. zero-mean Gaussian noise to $L(\theta)$ and assume that only noisy measurements of the cost function $L$ are available (without the gradient). The initial point is chosen at $[0, 0, 0, 0]^T$, and the standard deviation of the added Gaussian noise is 4.0 (roughly equal to the initial error). We consider three different penalty functions:

Quadratic penalty function:

$$P(\theta) = \frac{1}{2} \sum_{j=1}^{s} [\max\{0, q_j(\theta)\}]^2. \qquad (11)$$

In this case the gradient of $P(\cdot)$ required in the algorithm is

$$\nabla P(\theta) = \sum_{j=1}^{s} \max\{0, q_j(\theta)\}\, \nabla q_j(\theta). \qquad (12)$$

Augmented Lagrangian:

$$P(\theta) = \frac{1}{2r^2} \sum_{j=1}^{s} \Big\{ [\max\{0, \lambda_j + r q_j(\theta)\}]^2 - \lambda_j^2 \Big\}. \qquad (13)$$

In this case, the actual penalty function used will vary over the iterations depending on the specific values selected for $r_n$ and $\lambda_n$. The gradient of the penalty function required in the algorithm for the $n$th iteration is

$$\nabla P(\theta) = \frac{1}{r_n} \sum_{j=1}^{s} \max\{0, \lambda_{nj} + r_n q_j(\theta)\}\, \nabla q_j(\theta). \qquad (14)$$

To properly update $\lambda_n$, we adopt a variation of the multiplier method [1]:

$$\lambda_{(n+1)j} = \min\big\{\max\{0,\ \lambda_{nj} + r_n q_j(\theta_n)\},\ M\big\}, \qquad (15)$$

where $\lambda_{nj}$ denotes the $j$th element of the vector $\lambda_n$, and $M \in \mathbb{R}^+$ is a large constant scalar. Since (15) ensures that $\{\lambda_n\}$ is bounded, convergence of the minima of $\{L_n(\cdot)\}$ remains valid. Furthermore, $\{\lambda_n\}$ will be close to the true Lagrange multipliers as $n \to \infty$.

Absolute value penalty function:

$$P(\theta) = \max_{j=1,\ldots,s} \big\{ \max\{0, q_j(\theta)\} \big\}. \qquad (16)$$

As discussed earlier, we will ignore the technical difficulty that $P(\cdot)$ is not differentiable everywhere. The gradient of $P(\cdot)$, where it exists, is

$$\nabla P(\theta) = \max\{0, q_{J(\theta)}(\theta)\}\, \nabla q_{J(\theta)}(\theta), \qquad (17)$$

where $J(\theta) = \arg\max_{j=1,\ldots,s} q_j(\theta)$.

For all the simulations we use the following parameter values: $a_n = 0.1(n + 100)^{-0.602}$ and $c_n = n^{-0.101}$. These parameters for $a_n$ and $c_n$ are chosen following the practical implementation guidelines recommended in [18]. For the augmented Lagrangian method, $\lambda_n$ is initialized as a zero vector. For the quadratic penalty function and the augmented Lagrangian, we use $r_n = 10 n^{0.1}$ for the penalty parameter. For the absolute value penalty function, we consider two possible values for the constant penalty: $r_n = r = 3.01$ and $r_n = r = 10$. Note that in our experiments $\bar r = \sum_j \lambda_j^* = 3$; hence the first choice of $r$ at 3.01 is theoretically optimal but not practical, since there is no reliable way to estimate $\bar r$. The second choice of $r$ represents a more typical scenario where an upper bound on $\bar r$ is estimated. Figure 1 plots the error to the optimum (averaged over 100 independent simulations) over 4000 iterations of the algorithms.

[Fig. 1. Error to the optimum, $\|\theta_n - \theta^*\|$, averaged over 100 independent simulations, for the absolute value penalty with $r = 10$, the quadratic penalty, the augmented Lagrangian, and the absolute value penalty with $r = 3.01$.]
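For concreteness, the following self-contained Python sketch (our own illustration, not the authors' code) wires this test problem into the constrained SPSA iteration (7) with the augmented Lagrangian penalty, using the gain sequences quoted above; the multiplier bound `M` is an arbitrary illustrative value since the paper does not report one.

```python
import numpy as np

rng = np.random.default_rng(1)

def L(th):
    # Cost of the test problem from [17, p. 352]
    return (th[0]**2 + th[1]**2 + 2*th[2]**2 + th[3]**2
            - 5*th[0] - 5*th[1] - 21*th[2] + 7*th[3])

def noisy_L(th):
    # Only noisy measurements are available; sigma = 4.0 as in the text
    return L(th) + rng.normal(scale=4.0)

constraints = [
    lambda th: 2*th[0]**2 + th[1]**2 + th[2]**2 + 2*th[0] - th[1] - th[3] - 5,
    lambda th: (th[0]**2 + th[1]**2 + th[2]**2 + th[3]**2
                + th[0] - th[1] + th[2] - th[3] - 8),
    lambda th: th[0]**2 + 2*th[1]**2 + th[2]**2 + 2*th[3]**2 - th[0] - th[3] - 10,
]
grads = [
    lambda th: np.array([4*th[0] + 2, 2*th[1] - 1, 2*th[2], -1.0]),
    lambda th: np.array([2*th[0] + 1, 2*th[1] - 1, 2*th[2] + 1, 2*th[3] - 1]),
    lambda th: np.array([2*th[0] - 1, 4*th[1], 2*th[2], 4*th[3] - 1]),
]

theta = np.zeros(4)   # initial point [0, 0, 0, 0]^T
lam = np.zeros(3)     # lambda_0 = 0 for the augmented Lagrangian
M = 1e4               # illustrative bound for the multiplier update (15)
for n in range(1, 4001):
    a_n = 0.1 * (n + 100) ** -0.602
    c_n = n ** -0.101
    r_n = 10 * n ** 0.1
    delta = rng.choice([-1.0, 1.0], size=4)
    g_hat = (noisy_L(theta + c_n*delta) - noisy_L(theta - c_n*delta)) / (2*c_n) / delta
    m = np.maximum(0.0, lam + r_n * np.array([q(theta) for q in constraints]))
    gP = sum(mj / r_n * g(theta) for mj, g in zip(m, grads))   # eq. (14)
    theta = theta - a_n * g_hat - a_n * r_n * gP               # eq. (7)
    lam = np.minimum(m, M)                                     # eq. (15)

print(theta)  # a typical run drifts toward the optimum [0, 1, 2, -1]
```

Swapping `gP` for the quadratic penalty gradient (12) or the absolute value penalty gradient (17) reproduces the other two variants compared in Figure 1.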
The simulation results in Figure 1 suggest that the proposed algorithm with the quadratic penalty function and with the augmented Lagrangian achieves comparable performance (the augmented Lagrangian method performed only slightly better than the standard quadratic technique). This suggests that a more effective update scheme for $\lambda_n$ than (15) is needed for the augmented Lagrangian technique. The absolute value penalty function with $r = 3.01$ ($\bar r = 3$) has the best performance. However, when an arbitrary upper bound on $\bar r$ is used ($r = 10$), the performance is much worse than with both the quadratic penalty function and the augmented Lagrangian. This illustrates a key difficulty in the effective application of the exact penalty theorem with the absolute penalty function.

V. CONCLUSIONS AND REMARKS

We present a stochastic approximation algorithm based on penalty function methods and a simultaneous perturbation gradient estimate for solving stochastic optimization problems with general inequality constraints. We also present a general convergence result and the associated asymptotic normality for the proposed algorithm. Numerical results are included to demonstrate the performance of the proposed algorithm with the standard quadratic penalty function and a more complicated penalty function based on the augmented Lagrangian method.

In this paper, we consider explicit constraints for which the analytical expressions are available. It is also possible to apply the same algorithm, with appropriate gradient estimates for $\nabla P(\theta)$, to problems with implicit constraints, where the constraints can only be measured or estimated with possible errors. The success of this approach would depend on efficient techniques to obtain unbiased gradient estimates of the penalty function. For example, if we can measure or estimate a value of the penalty function $P(\theta_n)$ at an arbitrary location with zero-mean error, then the SP gradient estimate can be applied. Of course, in this situation further assumptions on $r_n$ need to be satisfied (in general, we would at least need $\sum_{n=1}^{\infty} \big(\frac{a_n r_n}{c_n}\big)^2 < \infty$). However, in a typical application, we most likely can only measure the values of the constraints $q_j(\theta_n)$ with zero-mean errors. Additional bias would be present if the standard finite-difference or the SP technique were applied to estimate $\nabla P(\theta_n)$ directly in this situation. A novel technique to obtain unbiased estimates of $\nabla P(\theta_n)$ based on a reasonable number of measurements is required to make the algorithm proposed in this paper feasible for dealing with implicit constraints.

VI. REFERENCES

[1] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1995.
[2] P. Dupuis and H. J. Kushner, "Asymptotic behavior of constrained stochastic approximations via the theory of large deviations," Probability Theory and Related Fields, vol. 75, pp. 223-274, 1987.
[3] H. Kushner and D. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer-Verlag, 1978.
[4] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Annals of Mathematical Statistics, vol. 23, pp. 462-466, 1952.
[5] H. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications, Springer-Verlag, 1997.
[6] P. Sadegh, "Constrained optimization via stochastic approximation with a simultaneous perturbation gradient approximation," Automatica, vol. 33, no. 5, pp. 889-892, 1997.
[7] J. Hiriart-Urruty, "Algorithms of penalization type and of dual type for the solution of stochastic optimization problems with stochastic constraints," in Recent Developments in Statistics (J. Barra, ed.), pp. 183-219, North Holland Publishing Company, 1977.
[8] G. C. Pflug, "On the convergence of a penalty-type stochastic optimization procedure," Journal of Information and Optimization Sciences, vol. 2, no. 3, pp. 249-258, 1981.
[9] J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Transactions on Automatic Control, vol. 37, pp. 332-341, March 1992.
[10] I-J. Wang and J. C. Spall, "A constrained simultaneous perturbation stochastic approximation algorithm based on penalty function method," in Proceedings of the 1999 American Control Conference, vol. 1, pp. 393-399, San Diego, CA, June 1999.
[11] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, NY, 1982.
[12] E. K. Chong and S. H. Żak, An Introduction to Optimization, John Wiley and Sons, New York, NY, 1996.
[13] J. C. Spall and J. A. Cristion, "Model-free control of nonlinear stochastic systems with discrete-time measurements," IEEE Transactions on Automatic Control, vol. 43, no. 9, pp. 1178-1200, 1998.
[14] I-J. Wang, Analysis of Stochastic Approximation and Related Algorithms, PhD thesis, School of Electrical and Computer Engineering, Purdue University, August 1996.
[15] Y. He, M. C. Fu, and S. I. Marcus, "Convergence of simultaneous perturbation stochastic approximation for nondifferentiable optimization," IEEE Transactions on Automatic Control, vol. 48, no. 8, pp. 1459-1463, 2003.
[16] V. Fabian, "On asymptotic normality in stochastic approximation," The Annals of Mathematical Statistics, vol. 39, no. 4, pp. 1327-1332, 1968.
[17] H.-P. Schwefel, Evolution and Optimum Seeking, John Wiley and Sons, Inc., 1995.
[18] J. C. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817-823, 1998.