Efficient Robbins-Monro Procedure for Binary Data

V. Roshan Joseph
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA
roshan@isye.gatech.edu

SUMMARY

The Robbins-Monro procedure does not perform well in the estimation of extreme quantiles, because the procedure is implemented using asymptotic results, which are not suitable for binary data. Here we propose a modification of the Robbins-Monro procedure and derive the optimal procedure for binary data under some reasonable approximations. The improvement obtained by using the optimal procedure for the estimation of extreme quantiles is substantial.

Some key words: Quantile estimation; Sequential design; Stochastic approximation.

1. INTRODUCTION

In many applications, interest centres on finding the threshold of a variable that causes a certain proportion of successes, or failures, in the output. For example, an explosive designer may be interested in finding the level of shock necessary to make 99.99% of the explosives fire (Neyer, 1994); see Joseph & Wu (2002) for another application. The problem can be formally stated as follows. Let Y be a binary response with probability of success M(x), where x is the variable whose threshold is of interest. The objective is to find the value θ of x for which M(θ) = α, for given α. Usually M(x) is a distribution function obtained by assuming some distribution for a latent variable underlying the binary response, so that θ can be regarded as the α-quantile of this distribution. The function M(x) is unknown to the experimenter, but the experimenter can observe Y at different values of x in order to find θ. The objective is to devise an experimental strategy that estimates θ with as few observations and as much accuracy as possible.

The experiment can be performed by using a sequential design in which the x's are chosen sequentially and are allowed to depend on the data already observed. One sequential design strategy, known as stochastic approximation, is to choose x_1, x_2, ... such that x_n → θ in probability. Robbins & Monro (1951) proposed the procedure

$$x_{n+1} = x_n - a_n(y_n - \alpha), \qquad (1)$$

where y_n is the binary response observed at x_n and {a_n} is a pre-specified sequence of positive constants. Robbins & Monro showed that x_n → θ in probability if

$$\sum_{n=1}^{\infty} a_n = \infty \quad \text{and} \quad \sum_{n=1}^{\infty} a_n^2 < \infty. \qquad (2)$$
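To make recursion (1) concrete, the following is a minimal sketch of the classical procedure for binary responses. The simulated response model, the gain constant c and the starting value are illustrative assumptions introduced here, not part of the original formulation.

```python
import numpy as np
from scipy.stats import norm

def robbins_monro(alpha, x1, gain, draw_response, n_obs, rng):
    """Classical recursion (1): x_{n+1} = x_n - a_n (y_n - alpha)."""
    x = x1
    for n in range(1, n_obs + 1):
        y = draw_response(x, rng)        # binary outcome observed at the current design point
        x = x - gain(n) * (y - alpha)    # stochastic-approximation step
    return x

# Illustrative response model: M is a normal distribution function shifted so that
# its alpha-quantile is theta = 0; in practice the experimenter would not know M.
alpha, theta = 0.25, 0.0
M = lambda x: norm.cdf(norm.ppf(alpha) + (x - theta))
draw = lambda x, rng: rng.binomial(1, M(x))

rng = np.random.default_rng(0)
c = 1.0 / norm.pdf(norm.ppf(alpha))      # a_n = c/n with c guessed as 1/Mdot(theta)
print(robbins_monro(alpha, x1=1.0, gain=lambda n: c / n,
                    draw_response=draw, n_obs=200, rng=rng))
```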

Robbins & Monro recommended the simple but non-optimal choice a_n = c/n for some constant c. Based on the results of Chung (1954), Hodges & Lehmann (1956) and Sacks (1958), the procedure is fully asymptotically efficient with $a_n = \{n\dot M(\theta)\}^{-1}$ under certain conditions, where $\dot M$ denotes the first derivative of M; for practical implementations see, for example, Wetherill & Glazebrook (1986). The optimal choice of a_n in small samples has not been investigated, although in most experiments this is the case of greatest interest. Wetherill (1963) showed through simulation that the Robbins-Monro procedure works quite well for α = 0.5, corresponding to the LD50, but performs very poorly for extreme quantiles, even with a number of modifications. Cochran & Davis (1965) and Young & Easterling (1994) reached similar conclusions based on extensive simulation results.

Although the Robbins-Monro procedure is not restricted to binary data, binary data possess two properties that can be exploited to improve the procedure: the variance of Y is a function of x, given by M(x){1 - M(x)}, and M(x) is a distribution function. In this article we make use of these properties to produce optimal Robbins-Monro procedures for binary data. We also propose a slight modification of the Robbins-Monro procedure to obtain better convergence properties. Several methods for quantile estimation based on binary data have been proposed in the literature; see, for example, Wu (1985), McLeish & Tosh (1990), Kalish (1990), Neyer (1994) and Sitter & Wu (1999), among many others. The objective of this study is to find the optimal sequential procedure within the class of Robbins-Monro type procedures, not to find the best overall method.

2. OPTIMAL ROBBINS-MONRO PROCEDURE

In most applications, the distribution function M(x) is from a location family with location parameter θ. Therefore hereafter we denote M(x) by M(x - θ). Thus M(0) = α ∈ (0, 1) is specified. We also assume that $\dot M(0) > 0$ is known. The experimenter starts the experiment at some value x_1, which is believed to be close to θ based on some prior knowledge. Therefore we may choose a prior distribution for θ with $E(\Theta) = x_1$ and $\mathrm{var}(\Theta) = \tau_1^2 < \infty$, where τ_1 represents the initial uncertainty of θ with respect to x_1. Let $Z_n = x_n - \Theta$. Note that, although x_1 is a fixed quantity, x_2, ..., x_n are random because of their dependence on past data.

Consider a modified Robbins-Monro process given by $x_{n+1} = x_n - a_n(y_n - b_n)$. For binary data, {b_n} is a sequence of constants in (0, 1). They need not be equal to α, but {b_n} is expected to get close to α as n gets larger. We will see that, if we use a b_n different from α, the performance of the Robbins-Monro procedure can be greatly improved. Thus our model is

$$y_n \mid Z_n \sim \mathrm{Ber}\{M(Z_n)\}, \qquad Z_{n+1} = Z_n - a_n(y_n - b_n), \qquad E(Z_1) = 0 \quad \text{and} \quad E(Z_1^2) = \tau_1^2.$$

The objective is to find sequences {a_n} and {b_n} such that Z_n → 0 in probability at the fastest rate. First we investigate the conditions under which the desired convergence can be obtained.

Suppose the sequence {b_n} satisfies the condition

$$\sum_{n=2}^{\infty} a_n\,|b_n - \alpha| \sum_{j=1}^{n-1} a_j < \infty. \qquad (3)$$

Then we have the following convergence result, whose proof closely follows that of Robbins & Monro (1951). Condition (3), together with (2), ensures that b_n converges to α. Moreover, because $\sum_{j=1}^{n-1} a_j$ increases with n, the convergence of b_n to α should be fast enough for (3) to hold. All proofs are given in the Appendix.

Theorem 1. If (2) and (3) hold, then Z_n → 0 in probability as n → ∞.

There are infinitely many sequences {a_n} and {b_n} satisfying the conditions of Theorem 1. We seek the particular sequences that give the best convergence properties. We propose to choose a_n and b_n such that $E(Z_{n+1}^2)$ is a minimum subject to the condition that $E(Z_{n+1}) = 0$. A similar idea of choosing two sequences to minimise the conditional mean squared error in sequential designs was employed by Hu (1997, 1998) in a Bayesian framework. We have that

$$E(Z_{n+1}) = E(Z_n) - a_n[E\{M(Z_n)\} - b_n] = 0.$$

Since the sequences a_1, ..., a_{n-1} and b_1, ..., b_{n-1} are obtained such that $E(Z_2) = \cdots = E(Z_n) = 0$, we obtain

$$b_n = E\{M(Z_n)\}. \qquad (4)$$

Let $\tau_n^2 = E(Z_n^2)$. From (A1), we have

$$\tau_{n+1}^2 = \tau_n^2 - 2a_n E[Z_n\{M(Z_n) - b_n\}] + a_n^2\big([E\{M(Z_n)\} - b_n]^2 + E\{M(Z_n)\}[1 - E\{M(Z_n)\}]\big). \qquad (5)$$

Minimising $\tau_{n+1}^2$ with respect to a_n and using (4), we obtain

$$a_n = \frac{E\{Z_n M(Z_n)\}}{E\{M(Z_n)\}[1 - E\{M(Z_n)\}]}. \qquad (6)$$

Unfortunately, the optimal sequences depend on the function M, which is unknown to the experimenter. Therefore the best we can do is to choose a function G that closely approximates the true function M and derive the sequences from it. The resulting Robbins-Monro procedure is optimal only when M = G and approximately optimal otherwise. It can be considered an efficient procedure as long as the deviation of G from the true function is not severe.

The choice $a_n = \{n\dot M(0)\}^{-1}$ is based on a linear approximation of M around 0, which is not good for a distribution function, particularly in the tail areas. Consider instead the approximation to M(z) given by

$$G(z) = \Phi\{\Phi^{-1}(\alpha) + \beta z\}, \qquad (7)$$

where $\beta = \dot M(0)/\phi\{\Phi^{-1}(\alpha)\}$, Φ is the standard normal distribution function and φ is its density function. Now a_n and b_n can be obtained from (4) and (6) using G(z) instead of M(z). We also need the distribution of Z_n to evaluate the expectations in (4) and (6). It is easy to show that the density function of Z_{n+1} is

$$f_{Z_{n+1}}(z) = \{1 - G(z - a_n b_n)\}\,f_{Z_n}(z - a_n b_n) + G\{z + a_n(1 - b_n)\}\,f_{Z_n}\{z + a_n(1 - b_n)\}.$$

The f_{Z_n}(z) can be computed recursively starting with f_{Z_1}(z). Let $Z_1 \sim N(0, \tau_1^2)$, so that $f_{Z_1}(z) = (1/\tau_1)\,\phi(z/\tau_1)$. Clearly it is difficult to compute the expectations in (4) and (6) with this exact distribution, and therefore we resort to an approximation. It is quite natural to approximate the distribution of Z_n by $N(0, \tau_n^2)$, as the first two moments of the two distributions match exactly.

It turns out that this is a very good approximation, as verified by plotting these functions for various values of α, τ_1 and n. Using this approximation we obtain

$$b_n = E\{G(Z_n)\} = \int \Phi\{\Phi^{-1}(\alpha) + \beta z\}\,\frac{1}{\tau_n}\,\phi(z/\tau_n)\,dz = \Phi\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \beta^2\tau_n^2)^{1/2}}\right\},$$

$$a_n = \frac{E\{Z_n G(Z_n)\}}{E\{G(Z_n)\}[1 - E\{G(Z_n)\}]} = \frac{1}{b_n(1 - b_n)}\int z\,\Phi\{\Phi^{-1}(\alpha) + \beta z\}\,\frac{1}{\tau_n}\,\phi(z/\tau_n)\,dz = \frac{1}{b_n(1 - b_n)}\,\frac{\beta\tau_n^2}{(1 + \beta^2\tau_n^2)^{1/2}}\,\phi\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \beta^2\tau_n^2)^{1/2}}\right\}.$$

Substituting in (5) we obtain $\tau_{n+1}^2 = \tau_n^2 - b_n(1 - b_n)a_n^2$, which shows that there is an improvement in moving from x_n to x_{n+1}. Let $c_n = \beta a_n b_n(1 - b_n)$ and $\nu_n = \beta^2\tau_n^2$. Then the optimal Robbins-Monro procedure can be written as

$$x_{n+1} = x_n - \frac{c_n}{\beta b_n(1 - b_n)}\,(y_n - b_n), \qquad (8)$$

where

$$c_n = \frac{\nu_n}{(1 + \nu_n)^{1/2}}\,\phi\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \nu_n)^{1/2}}\right\}, \qquad b_n = \Phi\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \nu_n)^{1/2}}\right\} \quad \text{and} \quad \nu_{n+1} = \nu_n - \frac{c_n^2}{b_n(1 - b_n)}, \qquad (9)$$

with $\nu_1 = \beta^2\tau_1^2$. Note that c_n and b_n are sequences that can be specified before the experiment. They can be easily computed once $\nu_1 \in (0, \infty)$ is specified. The procedure can work only with a finite value of ν_1, which implies that a noninformative prior for θ cannot be used. Therefore the Bayesian formulation of the problem with a proper prior was crucial in the development of the above procedure. After n experiments the best estimate of θ is x_{n+1}. We can also obtain a (1 - γ) credible interval for θ as $x_{n+1} \pm \Phi^{-1}(\gamma/2)\,\tau_{n+1}$, where $\tau_{n+1} = \nu_{n+1}^{1/2}/\beta$.
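Since c_n and b_n in (9) depend only on α and ν_1, the whole design can be tabulated before any data are collected. Below is a minimal sketch of recursions (8) and (9); the logistic response model used to generate y_n, the choice of ν_1 and the random seed are illustrative assumptions made here, not part of the procedure.

```python
import numpy as np
from scipy.stats import norm

def optimal_rm_sequences(alpha, nu1, n_obs):
    """Pre-compute the design constants c_n, b_n and nu_n from recursion (9)."""
    q = norm.ppf(alpha)
    c, b, nu = np.empty(n_obs), np.empty(n_obs), np.empty(n_obs + 1)
    nu[0] = nu1
    for n in range(n_obs):
        s = np.sqrt(1.0 + nu[n])
        b[n] = norm.cdf(q / s)                    # b_n of (9)
        c[n] = (nu[n] / s) * norm.pdf(q / s)      # c_n of (9)
        nu[n + 1] = nu[n] - c[n] ** 2 / (b[n] * (1.0 - b[n]))
    return c, b, nu

def optimal_rm(alpha, x1, beta, nu1, draw_response, n_obs, rng):
    """Run procedure (8); returns the final estimate and tau = nu^{1/2}/beta."""
    c, b, nu = optimal_rm_sequences(alpha, nu1, n_obs)
    x = x1
    for n in range(n_obs):
        y = draw_response(x, rng)
        x = x - c[n] / (beta * b[n] * (1.0 - b[n])) * (y - b[n])
    return x, np.sqrt(nu[-1]) / beta

# Illustrative use: logistic response whose alpha-quantile is theta = 0.
alpha, theta, tau1 = 0.1, 0.0, 1.0
M = lambda x: 1.0 / (1.0 + (1.0 - alpha) / alpha * np.exp(-(x - theta)))
beta = alpha * (1.0 - alpha) / norm.pdf(norm.ppf(alpha))   # beta = Mdot(0)/phi{Phi^{-1}(alpha)}
rng = np.random.default_rng(1)
x_hat, tau = optimal_rm(alpha, x1=rng.normal(theta, tau1), beta=beta,
                        nu1=(beta * tau1) ** 2,
                        draw_response=lambda x, r: r.binomial(1, M(x)),
                        n_obs=20, rng=rng)
# estimate and an approximate 95% credible interval of the form given above
print(x_hat, (x_hat - 1.96 * tau, x_hat + 1.96 * tau))
```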

Proposition 1. As n → ∞, ν_n → 0, b_n → α and c_n → 0.

By Proposition 1 we have that $\tau_n^2 \to 0$, and therefore Z_n → 0 in probability, if M is equal to the normal distribution function given in (7). We can expect the result to hold even when the true model is not normal. Fortunately this is indeed the case, as stated in the following proposition.

Proposition 2. For the procedure in (8), Z_n → 0 in probability as n → ∞.

It is clear from (9) that b_n always lies between α and 1/2. The consequence of this is important. Even if we are interested in extreme quantiles, for small n the optimal Robbins-Monro procedure operates as though we were interested in a quantile between α and the LD50. The search moves closer to α as n increases, since b_n → α. This is markedly different from the ordinary Robbins-Monro procedure in (1). Wetherill (1963) noted that the Robbins-Monro procedure performs miserably in the estimation of small or large quantiles, mainly because of the unequal up and down movements of the procedure. Since we use b_n instead of α, this imbalance is somewhat mitigated.

Consider an example of Wetherill. Suppose α = 0.25 and the true model is the logistic function $M(z) = \{1 + 3\exp(-z)\}^{-1}$. Suppose that the experiment starts exactly at θ. If we observe y_1 = 1, which happens with probability 0.25, then $x_2 = \theta - \{1/\dot M(0)\}(1 - 0.25) = \theta - 4$. Thus x_2 is lower than θ. The search will move up whenever we observe y = 0. Suppose we observe $y_2 = \cdots = y_n = 0$. Then

$$x_{n+1} = \theta - 4 + \frac{0.25}{0.25 \times 0.75}\left(\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right),$$

which crosses θ from below at n = 31. In other words, it will take at least 30 steps to get back to the true value θ.
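The step count quoted above for the ordinary procedure can be checked with a few lines of arithmetic. The sketch below simply replays the deterministic scenario just described (y_1 = 1 followed by y_n = 0), so no assumptions beyond those of the example are involved.

```python
# Worked example: alpha = 0.25, logistic model with Mdot(0) = 0.25 * 0.75, start at theta,
# observe y_1 = 1 and then y_n = 0 for n >= 2, with the standard gain a_n = 1/{n Mdot(0)}.
alpha, Mdot0, theta = 0.25, 0.25 * 0.75, 0.0
x = theta - (1.0 / Mdot0) * (1.0 - alpha)   # x_2 = theta - 4 after the initial success
n = 1
while x < theta:                            # each subsequent failure moves the design point up
    n += 1
    x += alpha / (n * Mdot0)                # x_{n+1} = x_n - a_n (0 - alpha)
print(n, round(x, 3))                       # n = 31: x_{n+1} first exceeds theta at n = 31
```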

Now consider the optimal Robbins-Monro procedure in (8). Here $\beta = \dot M(0)/\phi\{\Phi^{-1}(0.25)\} = 0.59$. We select τ_1 such that x_2 is exactly the same as in the Robbins-Monro procedure; this value is 4.475. Now

$$x_7 = \theta - 4 + 1.579 + 0.973 + 0.655 + 0.478 + 0.370 = \theta + 0.055.$$

Thus the sequence crosses θ from below in just 5 steps, as opposed to the 30 steps taken by the Robbins-Monro procedure.

Note that we require the value of $\dot M(0)$ to compute β. In practice, it is unlikely that the experimenter will know the exact value of $\dot M(0)$, and the procedure has to be implemented using a guessed value. It is clear from the proof of Proposition 2 that the procedure will work irrespective of the value of β, but convergence can slow down if the guessed value is far from the true value. If a good guess cannot be made for $\dot M(0)$, one can try to estimate it adaptively from the data by fitting a parametric model such as the one in (7). For the Robbins-Monro procedure such an adaptive scheme, under a suitable truncation rule, gives the same asymptotic performance as the original procedure; see Lai & Robbins (1979) and Wu (1985) for more details.

3. SIMULATIONS

We now compare the performance of the optimal Robbins-Monro procedure in (8) with that of the Robbins-Monro procedure in (1) through simulations. For the Robbins-Monro procedure, a_n is taken as $\{n\dot M(0)\}^{-1}$, which is the standard choice. The following six models are selected for the simulation study.

Normal distribution: $M(z) = \Phi\{\Phi^{-1}(\alpha) + z\}$.

Uniform distribution: $M(z) = 0$ for $z < -3\alpha$, $M(z) = \alpha + z/3$ for $-3\alpha \le z \le 3(1 - \alpha)$, and $M(z) = 1$ for $z > 3(1 - \alpha)$.

Logistic distribution: $M(z) = \left(1 + \frac{1-\alpha}{\alpha}\,e^{-z}\right)^{-1}$.

Extreme value distribution: $M(z) = 1 - \exp\{\log(1 - \alpha)\,e^{z}\}$.

Skewed logistic distribution: $M(z) = \left(1 + \frac{1-\alpha^{1/2}}{\alpha^{1/2}}\,e^{-z}\right)^{-2}$.

Cauchy distribution: $M(z) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\{z + \tan(\pi\alpha - \pi/2)\}$.

Let θ = 0. We use 20 observations to estimate θ, so that the best estimate of θ is x_21. We let τ_1 = 1 be the initial uncertainty, which means that, with probability about 95%, θ is within ±2 of the starting value x_1. The starting value is randomly generated from $N(0, \tau_1^2)$ and is used for both procedures. Then, at each of several values of α between 0.1 and 0.9, 1000 simulations were performed. The mean squared error of x_21 is calculated and plotted in Fig. 1. We see that the optimal Robbins-Monro procedure performs uniformly better than the Robbins-Monro procedure, with great improvement for the extreme quantiles. The performance of the two procedures around the LD50 is comparable, which agrees with the findings of Wetherill (1963). Figure 2 shows the estimated density curves of x_21 for different α values with the logistic distribution as the true model. It clearly shows how the performance of the Robbins-Monro procedure deteriorates as we move towards the extreme quantiles. The entire process was repeated with τ_1 = 2, with essentially the same conclusions.

This simulation study clearly demonstrates the superior performance of the optimal Robbins-Monro procedure over the Robbins-Monro procedure. The optimal Robbins-Monro procedure does a good job even in the estimation of extreme quantiles. The procedure appears to be robust against the model assumptions, as it performed well for a wide range of distributions.
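A reduced version of this comparison is straightforward to reproduce. The sketch below runs both procedures under the logistic model only, with 20 observations, N(0, τ_1²) starting values and 1000 replications as described above; it is intended as a template for the study rather than a reproduction of Fig. 1, and the seed and the restriction to a single model are choices made here.

```python
import numpy as np
from scipy.stats import norm

def mse_comparison(alpha, tau1=1.0, n_obs=20, n_rep=1000, seed=0):
    """Mean squared error of x_21 for ordinary and optimal Robbins-Monro; logistic M, theta = 0."""
    rng = np.random.default_rng(seed)
    M = lambda z: 1.0 / (1.0 + (1.0 - alpha) / alpha * np.exp(-z))
    Mdot0 = alpha * (1.0 - alpha)                      # slope of this logistic at theta
    beta, q = Mdot0 / norm.pdf(norm.ppf(alpha)), norm.ppf(alpha)
    err_rm, err_opt = np.empty(n_rep), np.empty(n_rep)
    for r in range(n_rep):
        x1 = rng.normal(0.0, tau1)                     # common starting value for both procedures
        x = x1                                         # ordinary procedure, a_n = 1/{n Mdot(0)}
        for n in range(1, n_obs + 1):
            x -= (rng.binomial(1, M(x)) - alpha) / (n * Mdot0)
        err_rm[r] = x ** 2
        x, nu = x1, (beta * tau1) ** 2                 # optimal procedure, recursions (8) and (9)
        for n in range(n_obs):
            s = np.sqrt(1.0 + nu)
            b, c = norm.cdf(q / s), (nu / s) * norm.pdf(q / s)
            x -= c / (beta * b * (1.0 - b)) * (rng.binomial(1, M(x)) - b)
            nu -= c ** 2 / (b * (1.0 - b))
        err_opt[r] = x ** 2
    return err_rm.mean(), err_opt.mean()

for alpha in (0.1, 0.25, 0.5):
    print(alpha, mse_comparison(alpha))    # the optimal procedure should show smaller MSE for small alpha
```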

ACKNOWLEDGEMENTS

I would like to thank the Editor, two anonymous referees and Professor C. F. Jeff Wu for their valuable comments and suggestions.

APPENDIX

Proofs

Proof of Theorem 1. We have that

$$E(Z_{n+1}^2 \mid Z_n) = Z_n^2 - 2a_n Z_n\{M(Z_n) - b_n\} + a_n^2\,E\{(y_n - b_n)^2 \mid Z_n\}.$$

Let $\tau_n^2 = E(Z_n^2)$, $d_n = E[Z_n\{M(Z_n) - b_n\}]$ and $e_n = E\{(y_n - b_n)^2\}$. Then

$$\tau_{n+1}^2 = \tau_n^2 - 2a_n d_n + a_n^2 e_n, \qquad (A1)$$

which gives

$$\tau_{n+1}^2 = \tau_1^2 - 2\sum_{j=1}^{n} a_j d_j + \sum_{j=1}^{n} a_j^2 e_j. \qquad (A2)$$

Since $b_n \in (0, 1)$ and $y_n = 0$ or 1, we have that $0 < e_n < 1$ for all n. Thus $0 < \sum_{j=1}^{n} a_j^2 e_j < \sum_{j=1}^{n} a_j^2$. Therefore, by condition (2), the positive-term series $\sum_{j=1}^{n} a_j^2 e_j$ converges. We have that

$$\sum_{j=1}^{n} a_j d_j = \sum_{j=1}^{n} a_j E[Z_j\{M(Z_j) - b_j\}] = \sum_{j=1}^{n} a_j E[Z_j\{M(Z_j) - \alpha\}] + \sum_{j=1}^{n} a_j(\alpha - b_j)E(Z_j).$$

Since M is a distribution function and $\dot M(0) > 0$, there exist δ and δ̄ such that $0 < \delta \le \{M(z) - \alpha\}/z \le \bar\delta < \infty$ for all z. This implies that $E[Z_j\{M(Z_j) - \alpha\}] \ge \delta\,E(Z_j^2) = \delta\tau_j^2$. Thus

$$\sum_{j=1}^{n} a_j d_j \ge \delta\sum_{j=1}^{n} a_j\tau_j^2 + \sum_{j=2}^{n} a_j(\alpha - b_j)E(Z_j).$$

Consider the magnitude of the second term on the right-hand side:

$$\left|\sum_{j=2}^{n} a_j(\alpha - b_j)E(Z_j)\right| \le \sum_{j=2}^{n} a_j\,|\alpha - b_j|\,|E(Z_j)| < \sum_{j=2}^{n} a_j\,|\alpha - b_j|\sum_{i=1}^{j-1} a_i,$$

because $Z_n = Z_1 - \sum_{j=1}^{n-1} a_j(y_j - b_j)$ implies $|E(Z_n)| < \sum_{j=1}^{n-1} a_j$. Also, since $\tau_{n+1}^2 \ge 0$ for all n, from (A2) we obtain $\sum_{j=1}^{n} a_j d_j \le (\tau_1^2 + \sum_{j=1}^{n} a_j^2 e_j)/2$. Thus

$$\delta\sum_{j=1}^{n} a_j\tau_j^2 \le \sum_{j=1}^{n} a_j d_j - \sum_{j=2}^{n} a_j(\alpha - b_j)E(Z_j) < \frac{1}{2}\left(\tau_1^2 + \sum_{j=1}^{n} a_j^2 e_j\right) + \sum_{j=2}^{n} a_j\,|\alpha - b_j|\sum_{i=1}^{j-1} a_i.$$

The right-hand side converges by condition (3), and therefore the positive-term series $\sum_{j=1}^{n} a_j\tau_j^2$ converges. Now

$$\sum_{j=1}^{n} |a_j d_j| \le \sum_{j=1}^{n} a_j E[Z_j\{M(Z_j) - \alpha\}] + \sum_{j=2}^{n} a_j\,|\alpha - b_j|\,|E(Z_j)| < \bar\delta\sum_{j=1}^{n} a_j\tau_j^2 + \sum_{j=2}^{n} a_j\,|\alpha - b_j|\sum_{i=1}^{j-1} a_i.$$

Since the right-hand side converges, the series $\sum_{j=1}^{n} a_j d_j$ converges absolutely and hence converges. Thus, from (A2), $\lim_{n\to\infty}\tau_n^2$ exists and is equal to $\tau_1^2 - 2\sum_{j=1}^{\infty} a_j d_j + \sum_{j=1}^{\infty} a_j^2 e_j$. However, since $0 \le \sum_{n=1}^{\infty} a_n\tau_n^2 < \infty$ and $\sum_{n=1}^{\infty} a_n = \infty$, this limit must be 0, and therefore Z_n → 0 in probability.

Proof of Proposition 1. From (9), we have

$$\nu_{n+1} = \nu_n - \frac{\nu_n^2}{1 + \nu_n}\,I\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \nu_n)^{1/2}}\right\}, \qquad (A3)$$

where $I(u) = \phi^2(u)/[\Phi(u)\{1 - \Phi(u)\}]$ is the Fisher information of u in binary data with probability of success equal to Φ(u). It is well known that $0 < I(u) \le 2/\pi$. Thus $\nu_{n+1} < \nu_n$ and $\nu_{n+1} \ge \nu_n(1 - 2/\pi) \ge \nu_1(1 - 2/\pi)^n > 0$ for all n. Hence the sequence {ν_n} converges. Let $h(\nu_n)$ denote the right-hand side of (A3). Then $\lim_{n\to\infty}\nu_n$ satisfies the equation $\nu = h(\nu)$, for which 0 is the unique solution. Thus, from (9), we obtain b_n → α and c_n → 0.

Proof of Proposition 2. Let $n^* = 1/I\{\Phi^{-1}(\alpha)\}$ and $\nu^* = 1/I^2\{\Phi^{-1}(\alpha)\}$. We have that $\dot h(0) = 1$. Since $\dot h(\nu)$ is a continuous function of ν, there exists a $\tilde\nu \in (0, \nu^*]$ such that $\dot h(\nu) > 0$ for all $\nu \le \tilde\nu$. Also, since $\nu_n \to 0$, there exists an $n'$ such that $\nu_n < \tilde\nu$ for all $n \ge n'$. Let $\tilde n = \lceil\max\{n', n^*, \nu^*/\tilde\nu\}\rceil$ and $\bar\nu = \tilde n\,\tilde\nu$, where $\lceil x\rceil$ is the smallest integer greater than or equal to x. Thus $\nu_{\tilde n} < \tilde\nu = \bar\nu/\tilde n$. Suppose that $\nu_n \le \bar\nu/n$ for some $n \ge \tilde n$. Since $\bar\nu \ge \nu^* > 1$, we have for $n \ge \tilde n$ that

$$I\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \bar\nu/n)^{1/2}}\right\} \ge I\{\Phi^{-1}(\alpha)\} = \frac{n^* + \nu^*}{(n^* + 1)\nu^*} \ge \frac{n + \nu^*}{(n + 1)\nu^*} \ge \frac{n + \bar\nu}{(n + 1)\bar\nu}.$$

Since $\nu_n \le \bar\nu/n \le \bar\nu/\tilde n = \tilde\nu$ and $h(\nu)$ is increasing in ν for all $\nu \le \tilde\nu$, from (A3) we obtain

$$\nu_{n+1} \le \frac{\bar\nu}{n}\left[1 - \frac{\bar\nu}{n + \bar\nu}\,I\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \bar\nu/n)^{1/2}}\right\}\right] \le \frac{\bar\nu}{n + 1}.$$

Then by mathematical induction $\nu_n \le \bar\nu/n$ for all $n \ge \tilde n$.

Let $\underline\nu = \min\{1, \tilde n\,\nu_{\tilde n}\}$. Then $\nu_{\tilde n} \ge \underline\nu/\tilde n$. Suppose $\nu_n \ge \underline\nu/n$ for some $n \ge \tilde n$. Since $\underline\nu \le 1$, we have that $I\{\Phi^{-1}(\alpha)/(1 + \underline\nu/n)^{1/2}\} < 1 \le (n + \underline\nu)/\{(n + 1)\underline\nu\}$.

Since $\underline\nu/n \le \nu_n \le \bar\nu/n \le \tilde\nu$ and $h(\nu)$ is increasing in ν for all $\nu \le \tilde\nu$, from (A3) we obtain

$$\nu_{n+1} \ge \frac{\underline\nu}{n}\left[1 - \frac{\underline\nu}{n + \underline\nu}\,I\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \underline\nu/n)^{1/2}}\right\}\right] \ge \frac{\underline\nu}{n + 1}.$$

Then by mathematical induction $\nu_n \ge \underline\nu/n$ for all $n \ge \tilde n$.

We have for all $n \ge \tilde n$ that

$$a_n = \frac{\nu_n\,\phi\{\Phi^{-1}(\alpha)/(1 + \nu_n)^{1/2}\}}{\beta(1 + \nu_n)^{1/2}\,\Phi\{\Phi^{-1}(\alpha)/(1 + \nu_n)^{1/2}\}[1 - \Phi\{\Phi^{-1}(\alpha)/(1 + \nu_n)^{1/2}\}]} \le \frac{\bar\nu/\sqrt{2\pi}}{\beta(n^2 + n\bar\nu)^{1/2}\,\alpha(1 - \alpha)} \le \frac{\bar\nu}{\sqrt{2\pi}\,\alpha(1 - \alpha)\beta n}$$

and

$$a_n \ge \frac{\underline\nu\,\phi\{\Phi^{-1}(\alpha)\}}{\beta(n^2 + n\underline\nu)^{1/2}\,\tfrac{1}{2}(1 - \tfrac{1}{2})} \ge \frac{4\underline\nu\,\phi\{\Phi^{-1}(\alpha)\}}{\beta(n + 1)}.$$

Thus {a_n} satisfies (2). Also $\sum_{j=1}^{n-1} a_j = O(\log n)$ as n → ∞. Using a Taylor series expansion we obtain

$$b_n = \Phi\left\{\frac{\Phi^{-1}(\alpha)}{(1 + \nu_n)^{1/2}}\right\} = \alpha - \frac{\nu_n}{2}\,\Phi^{-1}(\alpha)\,\phi\{\Phi^{-1}(\alpha)\} + O(\nu_n^2).$$

Thus $b_n - \alpha = O(1/n)$, which implies that $a_n|b_n - \alpha|\sum_{j=1}^{n-1} a_j = O(\log n/n^2)$, and therefore (3) holds. Thus, by Theorem 1, Z_n → 0 in probability.

REFERENCES

Chung, K. L. (1954). On a stochastic approximation method. Ann. Math. Statist. 25, 463-83.

Cochran, W. G. & Davis, M. (1965). The Robbins-Monro method for estimating the median lethal dose. J. R. Statist. Soc. B 27, 28-44.

Hodges, J. L. & Lehmann, E. L. (1956). Two approximations to the Robbins-Monro process. Proc. Third Berkeley Symp. 1, Ed. J. Neyman, 39-55. Berkeley, CA: University of California Press.

Hu, I. (1997). Strong consistency in stochastic regression models via posterior covariance matrices. Biometrika 84, 744-9.

Hu, I. (1998). On sequential designs in non-linear problems. Biometrika 85, 496-503.

Joseph, V. R. & Wu, C. F. J. (2002). Operating window experiments: a novel approach to quality improvement. J. Qual. Technol. 34, 345-54.

Kalish, L. A. (1990). Efficient design for estimation of median lethal dose and quantal dose-response curves. Biometrics 46, 737-48.

Lai, T. L. & Robbins, H. (1979). Adaptive design and stochastic approximation. Ann. Statist. 7, 1196-221.

McLeish, D. L. & Tosh, D. (1990). Sequential designs in bioassay. Biometrics 46, 103-16.

Neyer, B. T. (1994). D-optimality-based sensitivity test. Technometrics 36, 61-70.

Robbins, H. & Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22, 400-7.

Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures. Ann. Math. Statist. 29, 373-405.

Sitter, R. R. & Wu, C. F. J. (1999). Two-stage design of quantal response studies. Biometrics 55, 396-402.

Wetherill, G. B. (1963). Sequential estimation of quantal response curves (with Discussion). J. R. Statist. Soc. B 25, 1-48.

Wetherill, G. B. & Glazebrook, K. D. (1986). Sequential Methods in Statistics. London: Chapman and Hall.

Wu, C. F. J. (1985). Efficient sequential designs with binary data. J. Am. Statist. Assoc. 80, 974-84.

Young, L. J. & Easterling, R. G. (1994). Estimation of extreme quantiles based on sensitivity tests: a comparative study. Technometrics 36, 48-60.

Figure 1: Simulation study. Mean squared error of the estimator of θ against α, for the six models, using Robbins-Monro (dashed) and optimal Robbins-Monro (solid). Panels: (a) normal, (b) uniform, (c) logistic, (d) extreme value, (e) skewed logistic, (f) Cauchy distributions.

Figure 2: Simulation study. Density curves of the estimates of θ for different α, with M the logistic distribution and θ = 0, using Robbins-Monro (dashed) and optimal Robbins-Monro (solid). (a) α = 0.5, (b) α = 0.35, (c) α = 0.2, (d) α = 0.1.