ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS FUNCTIONS

Michael Baron

Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain asymmetric-type optimal decision rules from standard and commonly used Bayesian procedures. Important properties of minimum risk estimators are established. In particular, we discuss their sensitivity to the magnitude of asymmetry of the loss function.

1 Introduction

Consider a statistical decision problem where underestimating the parameter is, say, cheaper or less harmful than overestimating it by the same amount. Problems of this nature appear in actuarial science, marketing, the pharmaceutical industry, quality control, change-point analysis, estimation of a child's IQ, estimation of water levels for dam construction, and other situations ([1], [2], [3],

AMS 1991 subject classifications: 62A15, 62C10.
Key words and phrases: loss function, prior distribution, minimum risk estimator, fixed point, range of posterior expectation.

section 4.4, [6], section 3.3, [8], [9], [10], [11]). For example, an underestimated premium leads to some financial loss, whereas an overestimated premium may lead to the loss of a contract. For the sake of simplicity, it is still common in practical applications to use standard Bayesian decision rules such as the posterior mean or median, even in asymmetric situations. However, these procedures do not reflect the difference in losses. It is recognized that some type of correction should be introduced to take account of the asymmetry.

In this paper we introduce a wide class of asymmetric loss functions (1). The corresponding asymmetric-type minimum risk decision rules can then be obtained as fixed points of standard estimators (Theorem 1). A simple alternative is to use the linear or quadratic approximations (14) and (15) for situations when over- and underestimation costs are different but comparable. In this case, a standard Bayes estimator should be corrected by a term proportional to the posterior mean absolute deviation.

Suppose that a continuous or discrete parameter $\theta \in \Theta$ is to be estimated. Here $\Theta$ is either a connected subset of $\mathbb{R}$ or a possibly infinite collection of points $\{\dots < \theta_0 < \theta_1 < \dots\}$. Let $\pi$ be a probability measure on $\Theta$, which may be realized as the prior distribution, or as the posterior after a sample $x = \{x_1, \dots, x_n\}$ has been observed. As an initial setting, we choose an arbitrary measure $\pi$ on $\Theta$ and a loss function $L(\theta, \delta)$, where $\delta$ is a decision rule. The loss $L$ may be one of the standard symmetric loss functions; however, symmetry is not required for the subsequent discussion.

Introduce an asymmetric loss function of the form

$$W(\theta, \delta) = K_1 L(\theta, \delta)\, I\{\delta \le \theta\} + K_2 L(\theta, \delta)\, I\{\delta > \theta\} \qquad (1)$$

for some positive $K_1$ and $K_2$. It generalizes the linear loss proposed in [3], section 4.4.2, for estimating a child's IQ. Another popular asymmetric loss function is the linex loss ([10], [11]). Unlike the linex, (1) defines a wide family of loss functions, due to the arbitrary choice of $L(\theta, \delta)$, in which the costs of over- and underestimation are comparable. Without loss of generality we can study only the case $K_1 \le K_2$, because the other situation obtains by the reparameterization $\theta \mapsto -\theta$. Also we can assume that $K_1 = 1$. Then the loss function $W$ has the form

$$W(\theta, \delta) = \begin{cases} L(\theta, \delta) & \text{if } \delta \le \theta, \\ (1+\gamma)\, L(\theta, \delta) & \text{if } \delta > \theta, \end{cases} \qquad (2)$$

for some $\gamma \ge 0$, which denotes the additional relative cost of overestimation. Let $\delta^*$ be the minimum risk estimator of $\theta$,

$$\delta^* = \delta^*(\gamma) = \arg\min_{\delta} \int W(\theta, \delta)\, d\pi(\theta).$$

If overestimation is costly, one will tend to underestimate. A way to express this is to consider a family of weighted measures $\{\pi_t\}$ obtained from $\pi$ by increasing the probabilities of small values of $\theta$,

$$d\pi_t(\theta) = \begin{cases} (1+\gamma)\, d\pi(\theta) \Big/ \Bigl(1 + \gamma \int_{\{\theta < t\}} d\pi(\theta)\Bigr) & \text{if } \theta < t, \\ d\pi(\theta) \Big/ \Bigl(1 + \gamma \int_{\{\theta < t\}} d\pi(\theta)\Bigr) & \text{if } \theta \ge t. \end{cases} \qquad (3)$$

For every $t$ let

$$\tilde\delta(t) = \arg\min_{\delta} \int L(\theta, \delta)\, d\pi_t(\theta).$$
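The reweighted measure (3) and the estimator $\tilde\delta(t)$ are easy to discretize. Below is a minimal numerical sketch (our own illustration, not from the paper; the grid, the value of $\gamma$, and all function names are assumptions), using squared error loss and a uniform $\pi$ on $[0, 1]$:

```python
import numpy as np

# Sketch of (3): inflate the mass of pi below t by (1 + gamma), renormalize,
# and minimize the L-risk under pi_t by brute force over a grid of decisions.
gamma = 1.0                                 # additional relative cost of overestimation
theta = np.linspace(0.0, 1.0, 2001)         # grid over Theta
pi = np.full(theta.size, 1.0 / theta.size)  # uniform probability measure

def pi_t(t):
    """Reweighted measure from (3): probabilities below t are inflated."""
    w = np.where(theta < t, 1.0 + gamma, 1.0) * pi
    return w / w.sum()

def delta_tilde(t, loss=lambda th, d: (th - d) ** 2):
    """argmin_d of the expected loss under pi_t, by grid search."""
    w = pi_t(t)
    risks = [(loss(theta, d) * w).sum() for d in theta]
    return theta[int(np.argmin(risks))]

# Overweighting small theta pulls the estimator below the plain mean 0.5:
print(delta_tilde(0.5))
```

Since the loss here is squared error, `delta_tilde(0.5)` is just the mean of $\pi_{0.5}$, which for this uniform example equals $5/12 \approx 0.417$.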

In practice, $\pi$ is usually a posterior distribution $\pi(\theta \mid x)$ under some prior $\pi(\theta)$. One defines an altered prior $\pi_t(\theta)$ similarly to (3), by assigning higher probabilities to smaller values of $\theta$. Then $\pi_t$ in (3) is the corresponding posterior. In the next section we show that $\delta^*$ is the local and global minimum, and the fixed point, of $\tilde\delta(t)$. The rest of the paper gives simple methods of obtaining $\delta^*(\gamma)$ in practice and establishes robustness properties of $\delta^*$ with respect to the choice of $\gamma$. Popular examples are discussed.

2 The fixed point of $\tilde\delta(t)$

We assume that $L(\theta, \delta)$ is a strictly convex nonnegative function of $\delta$ for any $\theta$, with $L(\theta, \theta) = 0$. Consider the risks $r(\delta)$ and $r_t(\delta)$ associated with $(W, \pi)$ and $(L, \pi_t)$ respectively. One has $\tilde\delta(t) = \arg\min_\delta r_t(\delta)$ for any $t$, and $\delta^* = \arg\min_\delta r(\delta)$. Then $\tilde\delta(t)$ also minimizes

$$\psi_t(\delta) = C(\gamma, t)\, r_t(\delta) = \int L(\theta, \delta)\, d\pi(\theta) + \gamma \int_{\{\theta < t\}} L(\theta, \delta)\, d\pi(\theta),$$

and $\delta^*$ minimizes $\psi(\delta) = \psi_\delta(\delta)$. Under our assumptions $\psi_t(\delta)$ and $\psi(\delta)$ are strictly convex functions of $\delta$. The following lemma establishes an important property of $\tilde\delta(t)$.

Lemma 1. The decision rule $\tilde\delta(t)$ is a non-increasing function on $(-\infty, T)$ and a non-decreasing function on $(T, +\infty)$, where $T = \inf\{s : \tilde\delta(s) \le s\} = \sup\{s : \tilde\delta(s) > s\}$.
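Lemma 1 predicts a characteristic V-shape for $t \mapsto \tilde\delta(t)$. A quick numerical check (our own sketch, with an assumed standard normal $\pi$ and squared error loss, for which $\tilde\delta(t)$ is simply the $\pi_t$-mean):

```python
import numpy as np

# Verify numerically that delta_tilde(t) is non-increasing up to its minimum
# and non-decreasing afterwards, as Lemma 1 states.
gamma = 2.0
theta = np.linspace(-4.0, 4.0, 8001)
pi = np.exp(-0.5 * theta ** 2)
pi /= pi.sum()                          # discretized standard normal measure

def delta_tilde(t):
    """pi_t-mean: the minimum risk point under squared error loss."""
    w = np.where(theta < t, 1.0 + gamma, 1.0) * pi
    w /= w.sum()
    return (theta * w).sum()

ts = np.linspace(-3.0, 3.0, 121)
vals = np.array([delta_tilde(t) for t in ts])
k = int(vals.argmin())
print(np.all(np.diff(vals[: k + 1]) <= 1e-12),
      np.all(np.diff(vals[k:]) >= -1e-12))   # True True
```

Both checks come out `True`; by Theorem 1 of the paper, the minimum of this curve is the estimator $\delta^*$ itself.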

Proof: For any $s \le t$ one has

$$\psi_t(\delta) = \psi_s(\delta) + \gamma \int_{\{s \le \theta < t\}} L(\theta, \delta)\, d\pi(\theta). \qquad (4)$$

Hence, by convexity, the minimum of $\psi_t(\delta)$ is attained between the points of minima of $\psi_s(\delta)$ and $\gamma \int_{\{s \le \theta < t\}} L(\theta, \delta)\, d\pi(\theta)$. In other words, since the latter attains its minimum at some point between $s$ and $t$,

$$\min\{s, \tilde\delta(s)\} \le \tilde\delta(t) \le \max\{t, \tilde\delta(s)\}. \qquad (5)$$

If $\tilde\delta(s) \le s$, then (5) results in $\tilde\delta(s) \le \tilde\delta(t) \le t$, from which $\{s : \tilde\delta(s) \le s\}$ is a right half-line and $\tilde\delta(t)$ is non-decreasing on it. Let $T = \inf\{s : \tilde\delta(s) \le s\}$. Then $\tilde\delta(t) > t$ for all $t < T$, and for any $s < t$, (5) implies $\tilde\delta(t) \le \tilde\delta(s)$. Thus $\tilde\delta(t)$ is non-increasing on $(-\infty, T)$. $\Box$

We now handle the discrete and continuous cases separately. First, let $\pi$ be a discrete distribution on $\Theta = \{\theta_k\}$, where the $\theta_k$ are enumerated in increasing order. Then it follows from (4) that $\psi_t(\delta) = \psi_{\theta_k}(\delta)$ for $t \in (\theta_{k-1}, \theta_k]$, and therefore $\tilde\delta(t) = \tilde\delta(\theta_k)$. In particular, $\psi(\delta) = \psi_\delta(\delta) = \psi_{\theta_k}(\delta)$ when $\theta_{k-1} < \delta \le \theta_k$.

We show that $\psi(\delta)$ attains a local minimum at $\delta = T$. Then, by convexity of $\psi$, $\delta^* = T$. Let $\theta_{k-1} < T \le \theta_k$ for some $k$. Then

$$T = \sup\{s : \tilde\delta(s) > s\} = \sup\{s : \tilde\delta(\theta_k) > s\} = \tilde\delta(\theta_k), \qquad (6)$$

and hence $\psi_{\theta_k}(\delta)$ has a local minimum at $T$. If $T < \theta_k$, then $\psi(\delta)$ also attains a local minimum at $\delta = T$, because both functions coincide in some neighborhood of $T$. Otherwise, if $T = \theta_k$ for some $k$, then

$$T = \inf\{s : \tilde\delta(s) \le s\} = \inf\{s : \tilde\delta(\theta_{k+1}) \le s\} = \tilde\delta(\theta_{k+1}).$$

Thus, from (6), $\tilde\delta(\theta_k) = \tilde\delta(\theta_{k+1})$, and $T$ is a point of minimum for both functions $\psi_{\theta_k}$ and $\psi_{\theta_{k+1}}$. Since $\psi = \psi_{\theta_k}$ to the left of $T$ and $\psi = \psi_{\theta_{k+1}}$ to the right of it, and $\psi$ is continuous, it follows that $T$ is a point of minimum for $\psi(\delta)$.

Formula (6) implies that $\delta^*$ solves the equation

$$\tilde\delta(t) = t. \qquad (7)$$

Moreover, $\delta^*$ is the unique root of (7). Indeed, suppose that $\tilde\delta(t) = t$ for some $t \ne T$. By the definition of $T$, the only possibility is $t > T$. Let $\theta_m < t \le \theta_{m+1}$, and take any $s \in (\max\{T, \theta_m\},\, t)$. Then $\tilde\delta(s) = \tilde\delta(t) = t > s$, which contradicts $s > T$.

The fact that $\delta^*$ is the unique fixed point of $\tilde\delta(t)$ suggests a computational method for the numerical evaluation of $\delta^*$. We also note that from (5) with $s = T$ one has $T \le \tilde\delta(t)$ for any $t > T$. The same inequality obtains for $t < T$, because $\tilde\delta(T - \varepsilon) = T$ for some $\varepsilon > 0$, and by Lemma 1, $\tilde\delta(s) \ge \tilde\delta(T - \varepsilon)$ for any $s < T$. Hence, $\delta^* = \min_t \tilde\delta(t)$.

Now turn to the case when $\pi$ is a continuous distribution on $\Theta$ having a density $\pi(\theta)$. Then, from (4), one has $\psi_s \to \psi_t$ pointwise as $s \to t$, and the following statement holds.

Lemma 2. If $\pi$ is absolutely continuous with respect to the Lebesgue measure, then $\tilde\delta(t)$ is a continuous function of $t$.

The proof of Lemma 2 is given in the Appendix. By continuity, it follows again from Lemma 1 that $\tilde\delta(T) = T = \min_t \tilde\delta(t)$. It remains to show that $\delta^* = T$. Since $\psi_t(\delta)$ increases in $t$ pointwise, for any $t > T$

$$\psi(t) = \psi_t(t) > \psi_t(\tilde\delta(t)) \ge \psi_T(\tilde\delta(t)) \ge \psi_T(T) = \psi(T).$$

Hence, $\delta^* > T$ is impossible. Suppose that $\delta^* < T$. Choose $\delta^* < t < T$ and a sequence $t_j \downarrow t$, on which the density $\pi$ is bounded by some $M$. Then, since $\delta^* = \arg\min \psi$ and $\psi$ is convex, one has $\psi(t_j) > \psi(t)$, and

$$\psi(t_j) - \psi_t(t_j) > \psi_t(t) - \psi_t(t_j). \qquad (8)$$

Consider both sides of (8). From (4),

$$\psi(t_j) - \psi_t(t_j) = \gamma \int_t^{t_j} L(\theta, t_j)\, \pi(\theta)\, d\theta \le \gamma M L(t, t_j)(t_j - t) = o(t_j - t)$$

as $j \to \infty$, because $L(t, t_j) \to 0$. However, by convexity of $\psi_t$,

$$\psi_t(t) - \psi_t(t_j) \ge m (t_j - t), \quad \text{where} \quad m = \frac{\psi_t(t) - \psi_t(\tilde\delta(t))}{\tilde\delta(t) - t} > 0.$$

Now we have a contradiction with (8). Thus we have proved the following theorem.

Theorem 1. The minimum risk estimator $\delta^*$ of $\theta$ under the loss function $W$ and the prior $\pi$ is the unique root of the equation $\tilde\delta(t) = t$. Also, it equals $\min_t \tilde\delta(t)$.

As an illustration, we consider the problem of estimating the binomial parameter $\theta$ when one variable $X$ from the binomial$(5, \theta)$ distribution is observed and $\theta$ has a uniform prior distribution. For $\gamma \in \{1, 6\}$ and all possible values of $X$, the graphs of $\tilde\delta(t)$ are depicted in Figure 1. Their shapes are in accordance with Lemma 1. The dotted line represents the graph $\tilde\delta(t) = t$. By Theorem 1, $\delta^*(\gamma)$ coincides with the intersection points for different values of $X$. Obviously, higher values of $\gamma$ imply a higher penalty for overestimation, which leads to lower values of $\tilde\delta(t)$ and $\delta^*$.
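The fixed-point characterization in Theorem 1 suggests a direct numerical scheme. The following sketch (our own code; the grid size and the bisection scheme are arbitrary choices) reproduces the binomial illustration: with squared error loss, $\tilde\delta(t)$ is the $\pi_t$-mean of the Beta$(x+1,\, 6-x)$ posterior, and $\delta^*$ is the unique crossing of $\tilde\delta(t)$ with the diagonal:

```python
import numpy as np

# Fixed point of delta_tilde for X ~ binomial(5, theta), uniform prior on theta,
# squared error loss: the posterior is Beta(x+1, 6-x), handled here on a grid.
theta = np.linspace(1e-6, 1.0 - 1e-6, 20001)

def delta_tilde(t, x, gamma):
    post = theta ** x * (1.0 - theta) ** (5 - x)   # Beta(x+1, 6-x) kernel
    post /= post.sum()
    below = theta < t
    num = (theta * post).sum() + gamma * (theta * post)[below].sum()
    den = 1.0 + gamma * post[below].sum()
    return num / den

def fixed_point(x, gamma, iters=50):
    """Bisection on the sign of delta_tilde(t) - t (unique root by Theorem 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delta_tilde(mid, x, gamma) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# gamma = 0 recovers the posterior mean (x + 1)/7; larger gamma lowers delta*.
for g in (0.0, 1.0, 6.0):
    print(g, round(fixed_point(3, g), 4))
```

For $\gamma = 0$ the scheme returns the posterior mean $4/7$, and the computed roots decrease in $\gamma$, matching the behavior seen in Figure 1.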

Figure 1. Behavior of $\tilde\delta(t)$ and $\delta^*$ in the binomial case: two panels, $\gamma = 1$ (left) and $\gamma = 6$ (right), each showing $\tilde\delta(t)$ against $t$ for $X = 0, 1, \dots, 5$.

2.1 The range of decision rules

Methods of Bayesian global posterior robustness suggest giving an interval of decision rules corresponding to a family of priors, rather than specifying one optimal procedure for one given prior ([4], [5]). One considers the interval

$$D = \Bigl[\, \inf_{\pi \in \Gamma} \delta^*,\ \sup_{\pi \in \Gamma} \delta^* \,\Bigr]$$

of the minimum risk decision rules corresponding to all priors of a class $\Gamma$. In asymmetric problems with a fixed $\gamma$, one considers the family of prior distributions $\Gamma = \{\pi_t\}$ for all values of $t$, as defined in (3). A direct application of Lemma 1 and Theorem 1 gives the form of $D$,

$$D = [\,\delta^*(\gamma),\ \delta^*(0)\,], \qquad (9)$$

whereas Lemma 4 below bounds the length of $D$ in the case of the squared-error loss.

Further, one can consider an extended family of priors $\Gamma = \{\pi_{t,\gamma}\}$ for all $t \in \Theta$ and $\gamma \in [\gamma_1, \gamma_2]$, $0 \le \gamma_1 < \gamma_2$. Then the range of the posterior expectation is

$$D = [\,\delta^*(\gamma_2),\ \delta^*(0)\,], \qquad (10)$$

independent of $\gamma_1$. If a sample from $f(x \mid \theta)$ is available, and the $\delta^*$ are Bayes rules under the prior distributions $\pi \in \Gamma$, their ranges are still given by (9) and (10).

3 Absolute error loss

In order to have a unique minimizer, we required that $L(\theta, \delta)$ be a strictly convex function of $\delta$. However, if $L(\theta, \delta) = |\theta - \delta|$, all the minimum risk estimators with respect to the loss function $W$ are still solutions of (7). Indeed, if $L$ is the absolute error loss, then $\tilde\delta(t)$ is any median of $\pi_t$, and (7) is equivalent to a system of two inequalities,

$$\int_{-\infty}^{t} d\pi_t(\theta) = \frac{(1+\gamma) \int_{-\infty}^{t} d\pi(\theta)}{1 + \gamma \int_{-\infty}^{t} d\pi(\theta)} \ge \frac{1}{2}$$

and a similar inequality for $\int_{t}^{\infty} d\pi_t(\theta)$. Any $1/(\gamma+2)$-quantile of $\pi$ solves this system. According to [3], section 4.4, the Bayes rule has the same form.

4 Squared error loss and weighted losses

When $L(\theta, \delta) = (\theta - \delta)^2$, it is easy to check that

$$\tilde\delta(t) = E_{\pi_t}(\theta) = \frac{\mu + \gamma\, E_\pi(\theta\, I_{\theta < t})}{1 + \gamma\, P_\pi\{\theta < t\}}. \qquad (11)$$
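The quantile characterization of Section 3 is easy to confirm by simulation. A short sketch (ours, not from the paper; the Gamma sample merely stands in for an arbitrary $\pi$): the minimizer of the asymmetric absolute-error risk should match the $1/(\gamma+2)$-quantile.

```python
import numpy as np

# Empirical check: under W with L = |theta - d|, the minimum risk point
# is a 1/(gamma + 2)-quantile of pi (here pi is an empirical Gamma sample).
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=1.0, size=200_000)
gamma = 3.0

def risk(d):
    under = np.maximum(sample - d, 0.0)        # cost when d <= theta
    over = np.maximum(d - sample, 0.0)         # cost scaled by (1 + gamma) when d > theta
    return under.mean() + (1.0 + gamma) * over.mean()

grid = np.linspace(0.0, 10.0, 4001)
minimizer = grid[int(np.argmin([risk(d) for d in grid]))]
quantile = float(np.quantile(sample, 1.0 / (gamma + 2.0)))
print(abs(minimizer - quantile) < 0.01)        # → True
```

With $\gamma = 3$ the target is the $0.2$-quantile, and the brute-force minimizer agrees with it up to the grid resolution.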

Here and later, $I_A = 1$ if $A$ holds and $I_A = 0$ otherwise, and $\mu = E_\pi(\theta)$. Note that $E_\pi(\theta\, I_{\theta < t}) = E_\pi(\theta \mid \theta < t)\, P_\pi\{\theta < t\}$. Hence, $\tilde\delta(t)$ is a convex linear combination of $\mu = \tilde\delta(-\infty)$ and $E_\pi(\theta \mid \theta < t)$. Then, according to (11) and Theorem 1, $\delta^*$ solves the equation

$$E_\pi(\theta - t) + \gamma\, E_\pi\bigl[(\theta - t)\, I_{\theta < t}\bigr] = 0, \qquad (12)$$

which is equivalent to

$$E_\pi\bigl[(\theta - t)_+\bigr] = (1+\gamma)\, E_\pi\bigl[(\theta - t)_-\bigr],$$

where $a_+ = \max\{a, 0\}$ and $a_- = \max\{-a, 0\}$. In general, for any loss function of the form $L(\theta, \delta) = |\theta - \delta|^p$, $p > 1$, $\delta^*$ solves the equation

$$E_\pi\bigl[(\theta - t)_+^{\,p-1}\bigr] = (1+\gamma)\, E_\pi\bigl[(\theta - t)_-^{\,p-1}\bigr].$$

The case $p = 1$ yields the $1/(\gamma+2)$-quantile mentioned in the previous section.

Generalization to the case of weighted squared error loss functions $L(\theta, \delta) = w(\theta)(\theta - \delta)^2$ is straightforward. The weights $w(\theta)$ usually allow larger errors for larger true values of the parameter. The problem is equivalent to the decision problem with the unweighted loss if the distribution $\pi_t(d\theta)$ is replaced by

$$\pi'_t(d\theta) = \begin{cases} \dfrac{(1+\gamma)\, w(\theta)\, \pi(d\theta)}{E_\pi w(\theta) + \gamma \int_{\{\theta < t\}} w(\theta)\, d\pi(\theta)} & \text{if } \theta < t, \\[2ex] \dfrac{w(\theta)\, \pi(d\theta)}{E_\pi w(\theta) + \gamma \int_{\{\theta < t\}} w(\theta)\, d\pi(\theta)} & \text{if } \theta \ge t. \end{cases}$$

Any weighted loss function can be considered similarly, as long as the weight function is $\pi$-integrable.
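The weighted-loss reduction can also be checked numerically. A small sketch (ours; the weight function $w(\theta) = 1 + \theta^2$, the uniform grid, and the cutoff $t$ are arbitrary choices): the minimizer of the weighted asymmetric risk at a fixed $t$ equals the mean of the reweighted measure $\pi'_t$.

```python
import numpy as np

# Our numerical check of the weighted-loss reduction: for L = w(theta)(theta-d)^2,
# the minimizer of the risk under pi_t equals the mean of pi'_t.
gamma, t = 1.5, 0.6
theta = np.linspace(0.0, 1.0, 5001)
pi = np.full(theta.size, 1.0 / theta.size)     # uniform measure on [0, 1]
w = 1.0 + theta ** 2                           # an arbitrary positive weight

asym = np.where(theta < t, 1.0 + gamma, 1.0)   # asymmetry factor from (3)

# Direct grid minimization of  E_{pi_t} [ w(theta) (theta - d)^2 ]  over d.
risk = lambda d: (asym * w * (theta - d) ** 2 * pi).sum()
d_grid = np.linspace(0.0, 1.0, 5001)
direct = d_grid[int(np.argmin([risk(d) for d in d_grid]))]

# Mean of pi'_t: pi reweighted by both the asymmetry factor and the weight w.
wp = asym * w * pi
mean_pi_prime = (theta * wp).sum() / wp.sum()
print(abs(direct - mean_pi_prime) < 1e-3)      # → True
```

This is just the familiar fact that a weighted squared-error risk is minimized by the weighted mean; the reduction folds the weight $w$ into the measure.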

4.1 Sensitivity with respect to $\gamma$, and correction of standard decision rules

When the choice of $\gamma$, the relative additional cost of overestimation, is not obvious, it is natural to investigate the robustness of $\delta^*$ with respect to different $\gamma$. The following results concern the rate of change of $\delta^* = \delta^*(\gamma)$ for $\gamma \to 0$ and $\gamma \to +\infty$.

Lemma 3. Let $L(\theta, \delta)$ be the squared error loss. Then

$$\left. \frac{d\,\delta^*(\gamma)}{d\gamma} \right|_{\gamma=0} = \mathrm{Cov}(\theta, I_{\theta < \mu}) = -\frac{1}{2}\, E_\pi|\theta - \mu|. \qquad (13)$$

Lemma 3 shows that $|\delta^*(\gamma) - \delta^*(0)|$ is of linear order in $\gamma$ as $\gamma \to 0$. As $\gamma \downarrow 0$, one has

$$\delta^*(\gamma) = \mu - \frac{\gamma}{2}\, E_\pi|\theta - \mu| + O(\gamma^2). \qquad (14)$$

In applications, this determines a simple strategy for a decision maker. Whenever overestimation is slightly more costly than underestimation, the standard Bayes estimator $\delta^*(0) = \mu$ is to be corrected by a term proportional to the mean absolute deviation of $\theta$. Suppose, for example, that the loss caused by overestimating the parameter is 10% higher than the loss caused by underestimating it by the same amount. Then the corrected decision rule

$$\delta^*(0.1) \approx \mu - 0.05\, E_\pi|\theta - \mu|$$

should be used instead of the standard posterior mean $\mu = E_\pi(\theta)$. Differentiating once more, one obtains

$$\left. \frac{d^2\, \delta^*(\gamma)}{d\gamma^2} \right|_{\gamma=0} = E_\pi|\theta - \mu|\; P_\pi\{\theta < \mu\}.$$
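The linear correction (14) is cheap to apply and easy to validate against the exact estimator. A sketch (our own; the normal$(1,1)$ measure, the grid, and the bisection scheme are assumptions), with the exact $\delta^*$ obtained from the moment equation (12):

```python
import numpy as np

# Compare the small-gamma correction  mu - (gamma/2) E|theta - mu|  from (14)
# with the exact delta*, found as the root of the moment equation (12).
theta = np.linspace(-6.0, 6.0, 40001)
pi = np.exp(-0.5 * (theta - 1.0) ** 2)
pi /= pi.sum()                              # a normal(1, 1) measure, say
gamma = 0.1

mu = (theta * pi).sum()
mad = (np.abs(theta - mu) * pi).sum()       # mean absolute deviation of theta
corrected = mu - 0.5 * gamma * mad          # linear correction (14)

def lhs(t):
    """Left side of (12): E(theta - t) + gamma E[(theta - t) 1{theta < t}]."""
    below = theta < t
    return ((theta - t) * pi).sum() + gamma * ((theta - t) * pi)[below].sum()

# lhs is strictly decreasing in t, so bisection finds its unique zero delta*.
lo, hi = -6.0, 6.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if lhs(mid) > 0:
        lo = mid
    else:
        hi = mid
exact = 0.5 * (lo + hi)
print(corrected, exact)   # agree to O(gamma^2)
```

Both values sit slightly below the posterior mean $\mu$, and their difference is of order $\gamma^2$, as (14) promises.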

This provides a more accurate correction of the standard decision rule,

$$\delta^*(\gamma) = \mu - \frac{\gamma}{2}\, E_\pi|\theta - \mu| \left(1 - \gamma\, P_\pi\{\theta < \mu\}\right) + O(\gamma^3), \quad \text{as } \gamma \to 0. \qquad (15)$$

The next result bounds all possible deviations of $\delta^*$.

Lemma 4. One has

$$|\delta^*(\gamma) - \delta^*(0)| \le \sigma(\theta)\, \frac{\gamma}{2\sqrt{\gamma + 1}}, \qquad (16)$$

where $\sigma(\theta)$ denotes the standard deviation of $\theta$ under the distribution $\pi$.

A simple calculation shows that if $\Theta$ consists of only two points, then $\rho(t) = -1$ everywhere between them, where $\rho(t)$ is the correlation coefficient between $\theta$ and $I_{\theta < t}$ used in the proof of Lemma 4. In this case, equality holds in (16). According to Lemma 4, when $\Theta$ is unbounded, $\delta^*(\gamma)$ may deviate from $\delta^*(0)$ at the rate $O(\sqrt{\gamma})$ as $\gamma \to +\infty$.

4.2 Examples

Bayesian estimation of the normal mean. Let $X_1, \dots, X_n$ be a sample from the normal distribution with $E X = \theta$ and $\mathrm{Var}(X) = \sigma^2$. Suppose that $\sigma^2$ is known, and $\theta$, the parameter of interest, has a prior distribution $\pi(\theta)$ which is normal$(\mu, \tau^2)$. The corresponding posterior $\pi = \pi(\theta \mid x)$ is normal$(\mu_x, \sigma_x^2)$, where

$$\mu_x = \frac{n \bar X / \sigma^2 + \mu / \tau^2}{n/\sigma^2 + 1/\tau^2} \quad \text{and} \quad \sigma_x^2 = \frac{1}{n/\sigma^2 + 1/\tau^2}$$

(e.g. [3], [7]). Then $z = (\theta - \mu_x)/\sigma_x$ follows the standard normal distribution under $\pi(\theta \mid x)$. Let $\varphi(u)$ and $\Phi(u)$ denote the standard normal pdf and cdf,

respectively. Note that $E\, z I_{z < u} = -\varphi(u)$. From (11),

$$\tilde\delta(t) = \frac{\mu_x + \gamma\, E(\mu_x + \sigma_x z)\, I_{\mu_x + \sigma_x z < t}}{1 + \gamma\, P\{\mu_x + \sigma_x z < t\}} = \mu_x - \sigma_x\, \frac{\gamma \varphi(u)}{1 + \gamma \Phi(u)},$$

where $u = (t - \mu_x)/\sigma_x$. Then the fixed-point equation (7) is equivalent to

$$\gamma \varphi(u) + \gamma u \Phi(u) + u = 0. \qquad (17)$$

Hence, the minimum asymmetric risk decision rule is given by

$$\delta^*(\gamma) = \mu_x + \sigma_x\, u(\gamma), \qquad (18)$$

where $u(\gamma)$ is a solution of (17). According to (18), $d\delta^*/d\gamma = \sigma_x u'(\gamma)$, which is independent of $X_1, \dots, X_n$. In particular,

$$\left. \frac{d\,\delta^*(\gamma)}{d\gamma} \right|_{\gamma=0} = -\frac{\sigma_x}{\sqrt{2\pi}} = -\frac{1}{\sqrt{2\pi\,(n/\sigma^2 + 1/\tau^2)}},$$

and the same formula can be obtained by Lemma 3. By (15),

$$\delta^*(\gamma) \approx \mu_x - \sigma_x\, \frac{\gamma(1 - \gamma/2)}{\sqrt{2\pi}}$$

for small $\gamma$, and the correction term is the same regardless of the observed sample.

Asymmetric least squares. In the no-intercept linear regression model $y_i = \beta x_i + \varepsilon_i$, let $\hat\beta$ be selected to minimize

$$\psi(b) = \sum (y_i - x_i b)^2 + \gamma \sum (y_i - x_i b)^2\, I_{y_i < x_i b} = \sum x_i^2 (r_i - b)^2 + \gamma \sum x_i^2 (r_i - b)^2\, I_{r_i < b}, \qquad (19)$$

with $r_i = y_i / x_i$. The additional weights assigned to squared residuals reflect the higher cost of overestimating $\beta$ and overpredicting $y$. Suppose that the ratios $r_i$ are enumerated in increasing order, $r_1 \le \dots \le r_n$. For every $k = 1, \dots, n$, consider

$$\psi_k(b) = \sum_{i=1}^{n} x_i^2 (r_i - b)^2 + \gamma \sum_{i=1}^{k} x_i^2 (r_i - b)^2.$$

This function is minimized at

$$\hat\beta(k) = \frac{\sum_{1}^{n} x_i^2 r_i + \gamma \sum_{1}^{k} x_i^2 r_i}{\sum_{1}^{n} x_i^2 + \gamma \sum_{1}^{k} x_i^2} = \frac{\int r\, d\nu(r)}{\int d\nu(r)} = E_\nu(r),$$

where the distribution $\nu$ puts masses

$$\begin{cases} C x_i^2 & \text{at } r_i, \ i > k, \\ C(1+\gamma)\, x_i^2 & \text{at } r_i, \ i \le k. \end{cases}$$

Thus $\hat\beta(k)$ has the Bayes-type form of $\tilde\delta$, and according to Theorem 1, the asymmetric least squares solution $\hat\beta = \delta^*$ can be found as the fixed point of $\hat\beta(k)$. Clearly, the case $\gamma = 0$ gives the OLS solution.

Change-point estimation. In a classical change-point problem, a sample of independent variables $x = (X_1, \dots, X_n)$ is such that the distribution of $X_j$ depends on whether $j < \theta$ or $j \ge \theta$. One needs to estimate the change-point parameter $\theta$. Since a change-point has to be detected "as soon as possible", suppose that overestimation leads to a higher loss than underestimation. Further, assume a (continuous) uniform prior distribution of $\theta$ over the interval $\Theta = [0, n]$. If the pre-change and the post-change densities or probability mass functions $f$ and $g$ are mutually absolutely continuous, the standard

Bayes rule under the squared-error loss has the form

$$\delta = \delta^*(0) = E(\theta \mid x) = \frac{\sum_{k=1}^{n} k\, \Lambda(k)}{\sum_{k=1}^{n} \Lambda(k)}, \qquad (20)$$

where $\Lambda(k) = \dfrac{f(x_1) \cdots f(x_k)}{g(x_1) \cdots g(x_k)}$ ([8]). Correction for an asymmetric loss is straightforward. According to (15), one obtains the optimal decision rule under the loss function (2),

$$\delta^*(\gamma) = \delta - \frac{\gamma}{2}\, E\bigl(|\theta - \delta| \,\big|\, x\bigr) \left(1 - \gamma\, P\{\theta < \delta \mid x\}\right) + O(\gamma^3)$$

as $\gamma \to 0$, where $\delta$ is given by (20); both posterior moments reduce to explicit ratios of sums of the likelihood ratios $\Lambda(k)$, involving the integer part $[\delta]$ and the fractional part $\{\delta\}$ of $\delta$ (the derivation is omitted).

5 Appendix

Proof of Lemma 2. Suppose that there exist $\varepsilon > 0$ and a monotone sequence $t_j \to t$ such that $|\tilde\delta(t_j) - \tilde\delta(t)| > \varepsilon$ for all $j$. Let

$$\Delta = \min\bigl\{ \psi_t(\tilde\delta(t) - \varepsilon),\ \psi_t(\tilde\delta(t) + \varepsilon) \bigr\} - \psi_t(\tilde\delta(t)) > 0.$$

If $t_j \uparrow t$, then $\psi_{t_j}(\delta) \uparrow \psi_t(\delta)$, and $\psi_{t_j}(\tilde\delta(t) \pm \varepsilon) > \psi_t(\tilde\delta(t) \pm \varepsilon) - \Delta/2$ for sufficiently large $j$. Then, since $\psi_{t_j}(\tilde\delta(t)) \le \psi_t(\tilde\delta(t)) < \psi_t(\tilde\delta(t) \pm \varepsilon) - \Delta/2$, it follows from the convexity of $\psi_{t_j}$ that it has a minimum on $(\tilde\delta(t) - \varepsilon,\ \tilde\delta(t) + \varepsilon)$, which contradicts $|\tilde\delta(t_j) - \tilde\delta(t)| > \varepsilon$.

If $t_j \downarrow t$, then $\psi_{t_j}(\tilde\delta(t)) \ge \psi_{t_j}(\tilde\delta(t_j)) \ge \psi_t(\tilde\delta(t_j)) > \psi_t(\tilde\delta(t)) + \Delta$, and we have a contradiction: $\psi_{t_j}(\tilde\delta(t)) \not\to \psi_t(\tilde\delta(t))$.

Hence, $\tilde\delta(t)$ is continuous. $\Box$

Proof of Lemma 3. According to (12),

$$\int (\theta - \delta^*)\, d\pi(\theta) + \gamma \int_{\{\theta < \delta^*\}} (\theta - \delta^*)\, d\pi(\theta) = 0.$$

Differentiating implicitly, one obtains

$$\frac{d\,\delta^*(\gamma)}{d\gamma} = \frac{E_\pi (\theta - \delta^*)\, I_{\theta < \delta^*}}{1 + \gamma\, P_\pi\{\theta < \delta^*\}}.$$

Notice that $\gamma = 0$ corresponds to the symmetric squared loss $W \equiv L$, hence $\delta^*(0) = \mu$. Also,

$$E_\pi(\theta - \mu)\, I_{\theta < \mu} = E_\pi(\theta\, I_{\theta < \mu}) - \mu\, E_\pi I_{\theta < \mu} = \mathrm{Cov}(\theta, I_{\theta < \mu}), \qquad (21)$$

and the first equality in (13) follows. Next, observe that

$$E(\theta - \mu)\, I_{\theta > \mu} + E(\theta - \mu)\, I_{\theta < \mu} = E(\theta - \mu) = 0 \qquad (22)$$

and

$$E(\theta - \mu)\, I_{\theta > \mu} - E(\theta - \mu)\, I_{\theta < \mu} = E|\theta - \mu|. \qquad (23)$$

Subtracting (23) from (22), one completes the proof. $\Box$

Proof of Lemma 4. For any $t$, let $\rho(t)$ denote the correlation coefficient between $\theta$ and $I_{\theta < t}$ under $\pi$. Then, similarly to (21), one obtains from (11)

$$\tilde\delta(t) - \delta^*(0) = \frac{\gamma\, \mathrm{Cov}(\theta, I_{\theta < t})}{1 + \gamma\, P_\pi\{\theta < t\}} = \rho(t)\, \sigma(\theta)\, \frac{\gamma \sqrt{P_\pi\{\theta < t\}\,\bigl(1 - P_\pi\{\theta < t\}\bigr)}}{1 + \gamma\, P_\pi\{\theta < t\}}.$$

Clearly, $\tilde\delta(t) \le \delta^*(0)$. Then, by Theorem 1, $\delta^*(\gamma)$ maximizes $|\tilde\delta(t) - \delta^*(0)|$ over all $t$, and

$$|\delta^*(\gamma) - \delta^*(0)| \le \sigma(\theta) \max_{0 \le p \le 1} \frac{\gamma \sqrt{p(1-p)}}{1 + \gamma p} = \sigma(\theta)\, \frac{\gamma}{2\sqrt{\gamma + 1}}. \qquad \Box$$

Acknowledgment. The author is grateful to Professors A. L. Rukhin, R. J. Serfling, and K. Ostaszewski for their valuable comments on earlier versions of this paper.

References

[1] J. Aitchison and I. R. Dunsmore. Statistical Prediction Analysis. Cambridge University Press, London, 1975.

[2] M. Basseville and I. V. Nikiforov. Detection of Abrupt Changes: Theory and Application. PTR Prentice-Hall, Inc., 1993.

[3] J. O. Berger. Statistical Decision Theory. Springer-Verlag, New York, 1985.

[4] J. O. Berger. Robust Bayesian analysis: sensitivity to the prior. J. Statist. Plann. Inference, 25:303-328, 1990.

[5] L. DeRobertis and J. A. Hartigan. Bayesian inference using intervals of measures. Ann. Statist., 9:235-244, 1981.

[6] R. V. Hogg and S. A. Klugman. Loss Distributions. Wiley, New York, 1984.

[7] E. L. Lehmann. Theory of Point Estimation. Wiley, New York, 1983.

[8] A. L. Rukhin. Change-point estimation under asymmetric loss. Statistics & Decisions, 15:141-163, 1995.

[9] J. Shao and S.-C. Chow. Constructing release targets for drug products: a Bayesian decision theory approach. Appl. Statist., 40:381-390, 1991.

[10] H. R. Varian. A Bayesian approach to real estate assessment. In S. E. Fienberg and A. Zellner, editors, Studies in Bayesian Econometrics and Statistics, pages 195-208. North-Holland, Amsterdam, 1975.

[11] A. Zellner. Bayesian estimation and prediction using asymmetric loss functions. J. Amer. Statist. Assoc., 81:446-451, 1986.

Michael Baron
Programs in Mathematical Sciences
University of Texas at Dallas
Richardson, TX 75083-0688