OPTIMIZATION UNDER UNCERTAINTY A unified framework (Draft)


OPTIMIZATION UNDER UNCERTAINTY: A unified framework (Draft)

Warren B. Powell

September 15, 2016

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) , fax (978) , or on the web at . Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) , fax (201) .

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at , outside the U.S. at or fax . Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:
Optimization Under Uncertainty: A unified framework

Printed in the United States of America


CONTENTS

1 Decisions and Uncertainty
   Some sample problems
   Problem types
   State variables
   Types of decisions
   Types of uncertainty
   Models of system dynamics
   Objectives
   Staging of information and decisions
   Formulating a stochastic optimization problem
   A deterministic inventory problem
   The transition to a stochastic formulation
   Choosing inventory policies
   A generic formulation
   Solution strategies
   Designing policies for sequential decision problems
   Policy search
   Policies based on lookahead approximations
   Mixing and matching
   Pulling it all together
   Sequencing problem classes
   Bridging to statistics
   From deterministic to stochastic optimization
   Pedagogy
   Bibliographic notes

2 Canonical Problems
   The basic stochastic optimization problem
   Final reward formulation
   Cumulative reward
   Inventory/storage problems
   The multiarmed bandit problem
   Decision trees
   Online computation
   Two-stage stochastic programming
   Chance constrained problems
   Shortest paths
   A deterministic shortest path problem
   A stochastic shortest path problem
   Optimal stopping
   Markov decision processes
   Reinforcement learning
   Optimal control
   Multi-stage stochastic programming
   The nomadic trucker
   Statistics and machine learning
   A simple modeling framework for dynamic programs
   Bibliographic notes

3 Approximation Strategies
   Lookup tables with frequentist updating
   Lookup tables with Bayesian updating
   The updating equations for independent beliefs
   Updating for correlated beliefs
   Lookup tables and aggregation
   Hierarchical aggregation
   Computing bias and variance
   Modeling aggregation
   Combining multiple levels of aggregation
   Linear parametric models
   Linear regression review
   Regression variations
   Sparse additive models and Lasso
   Nonlinear parametric models
   Maximum likelihood estimation
   Sampled nonlinear models
   Neural networks - I
   Nonparametric models
   K-nearest neighbor
   Kernel regression
   Local polynomial regression
   Neural networks - II
   Support vector machines
   Indexed functions, tree structures and clustering
   Approximations and the curse of dimensionality
   Why does it work?**
   Correlations in hierarchical estimation
   Proof of Proposition
   Bibliographic notes
   Problems

4 Introduction to stochastic optimization
   An overview of learning problems
   Deterministic methods
   A stochastic shortest path problem
   A newsvendor problem with known distribution
   Chance constrained optimization
   Optimal control
   Discrete Markov decision processes
   Remarks
   Sampled models
   Formulating a sampled model
   Benders decomposition for a sampled convex problem
   Convergence
   Decomposition strategies
   Creating a sampled model
   Adaptive learning algorithms
   Modeling adaptive learning problems
   Objective functions for learning
   Designing policies
   Closing remarks
   Bibliographic notes

5 Derivative-Based Stochastic Search
   Some sample applications
   Stochastic gradient methods
   A stochastic gradient algorithm - asymptotic analysis
   A note on notation
   A finite horizon formulation
   Stepsizes
   Finite differences
   Derivatives of simulations
   Transient problems
   Recursive Benders decomposition
   The basic algorithm
   Benders with regularization
   Properties of algorithms
   Theoretical properties
   Empirical issues
   Why does it work?**
   Some probabilistic preliminaries
   An older proof
   A more modern proof
   Bibliographic notes

6 Adaptive estimation and stepsize policies
   Deterministic stepsize recipes
   Properties for convergence
   Stochastic stepsizes
   The case for stochastic stepsizes
   Convergence conditions
   Recipes for stochastic stepsizes
   Experimental notes
   Optimal stepsizes for nonstationary time series
   Optimal stepsizes for stationary data
   Optimal stepsizes for nonstationary data - I
   Optimal stepsizes for nonstationary data - II
   Optimal stepsizes for approximate value iteration
   Convergence
   Guidelines for choosing stepsize formulas
   Bibliographic notes
   Problems

7 Derivative-Free Stochastic Search
   Examples
   Lookup table belief models
   Frequentist belief model
   Bayesian belief model
   Frequentist or Bayesian?
   Objective functions for learning policies
   Policies
   Policy function approximations
   Cost function approximations
   Policies based on value function approximations
   Single period lookahead policies
   Multiperiod lookahead policies
   Hybrid policies
   Properties of policies
   Learning and the multiarmed bandit problem
   Gittins indices for learning with cumulative rewards
   Foundations
   Basic theory of Gittins indices
   Gittins indices for normally distributed rewards
   Comments
   Value of information policies
   The belief model
   The knowledge gradient for offline (final reward) learning
   Knowledge gradient for correlated beliefs
   Linear belief models
   Nonlinear belief models
   Concavity of information
   The knowledge gradient for online (cumulative reward) learning
   Some properties of the knowledge gradient
   Tuning policies
   The effect of units
   MOLTE - Optimal learning testing system
   Stochastic optimization with exogenous state information
   Learning on continuous decisions
   Physical states vs. belief states
   Bibliographic notes
   Problems

8 Physical state applications
   Deterministic problems
   A learning exercise: the nomadic trucker
   The shortest path problem
   The discrete budgeting problem
   The continuous budgeting problem
   Stochastic problems
   Decision trees
   A stochastic shortest path problem
   The gambling problem
   Asset valuation
   The asset acquisition problem - I
   The asset acquisition problem - II
   The lagged asset acquisition problem
   The batch replenishment problem
   The transformer replacement problem
   The dynamic assignment problem
   Information acquisition problems
   An information-collecting shortest path problem
   Bibliographic notes
   Problems

9 Modeling dynamic programs
   Notational style
   Modeling time
   The states of our system
   Defining the state variable
   The three states of our system
   The initial state S
   The post-decision state variable
   Partially observable states*
   Latent variables
   Forecasts and the transition kernel
   Flat vs. factored state representations*
   A shortest path illustration
   Modeling decisions
   Decisions, actions, and controls
   Making decisions
   The exogenous information process
   Basic notation for information processes
   Outcomes and scenarios
   Lagged information processes
   Models of information processes
   Supervisory processes*
   Policies in the information process*
   The transition function
   A general model
   Model-free dynamic programming
   The resource transition function
   Exogenous transitions
   The objective function
   The contribution function
   Random contributions
   The value of a policy
   Finding the best policy
   Risk-based and robust objective functions
   Illustration: An energy storage model
   Base models and lookahead models*
   Advanced probabilistic modeling concepts**
   A measure-theoretic view of information
   Conditional expectations for sequential decision problems**
   Bibliographic notes
   Problems

10 Modeling uncertainty
   Types of uncertainty
   Observational errors
   Prognostic uncertainty
   Experimental noise
   Transitional uncertainty
   Inferential (or diagnostic) uncertainty
   Model uncertainty
   Systematic exogenous uncertainty
   Control uncertainty
   Algorithmic instability
   Discussion
   Creating random processes
   Sample paths
   State/action dependent processes
   Types of distributions
   Monte Carlo simulation
   Generating uniform [0, 1] random variables
   Uniform and normal random variables
   Generating random variables from inverse cumulative distributions
   Inverse cumulative from quantile distributions
   Distributions with uncertain parameters
   Sampling vs. sampled models
   Iterative sampling: A stochastic gradient algorithm
   Static sampling: Solving a sampled model
   Sampled representation with Bayesian updating
   Efficient sampling
   Variance reduction methods
   Importance sampling
   Quantization methods
   Sampling in high dimensions
   Adversarial sampling
   Bibliographic notes

11 Policies
   Classes of policies
   Policy function approximations
   Cost function approximations
   Value function approximations
   Lookahead policies
   Hybrid strategies
   Randomized policies
   Illustration: An energy storage model revisited
   Policy function approximation
   Cost function approximation
   Value function approximation
   Deterministic lookahead
   Hybrid lookahead-cost function approximation
   Experimental testing
   How to choose a policy?
   Bibliographic notes
   Problems

12 Policy function approximations and policy search
   Classes of policy function approximations
   Boltzmann policies for discrete actions
   Affine policies
   Constraints
   Monotone policies
   Lookup table policies
   Nonparametric policies
   Locally parametric policies
   Policy search
   Online vs. offline learning
   Derivative-based
   Derivative-free - active search
   Derivative-free - passive search
   Introduction to policy search
   Derivative-free optimization
   The ranking and selection problem
   The frequentist approach
   The Bayesian view
   Bayesian updating with correlated beliefs
   Some exploration policies
   The knowledge gradient algorithm for discrete alternatives
   The basic idea
   Computation
   Simulation optimization
   An indifference zone algorithm
   Optimal computing budget allocation
   Online vs. offline objectives
   Closing remarks
   Bibliographic notes

13 Cost function approximations
   Optimal myopic policies
   Bibliographic notes

14 Discrete Markov decision processes
   The optimality equations
   Bellman's equations
   Computing the transition matrix
   Random contributions
   Bellman's equation using operator notation*
   Finite horizon problems
   Infinite horizon problems
   Value iteration
   A Gauss-Seidel variation
   Relative value iteration
   Bounds and rates of convergence
   Policy iteration
   Hybrid value-policy iteration
   Average reward dynamic programming
   The linear programming method for dynamic programs
   Monotone policies*
   The model
   Submodularity and other stories
   From submodularity to monotonicity
   Why does it work?**
   The optimality equations
   Convergence of value iteration
   Monotonicity of value iteration
   Bounding the error from value iteration
   Randomized policies
   Optimality of monotone policies
   Bibliographic notes
   Problems

15 Dynamic programs with special structure
   Monotone dynamic programming
   Special cases with analytical solutions
   Linear-quadratic regulation
   Bibliographic notes

16 Backward approximate dynamic programming
   Numerical approximation methods
   Linear models using sampled states
   Low rank approximations
   Bibliographic notes

17 Forward ADP I: The value of a policy
   Sampling the value of a policy
   Direct policy evaluation for finite horizon problems
   Policy evaluation for infinite horizon problems
   Temporal difference updates
   TD(λ)
   TD(0) and approximate value iteration
   TD learning for infinite horizon problems
   Stochastic approximation methods
   Recursive least squares for linear models
   Recursive least squares for stationary data
   Recursive least squares for nonstationary data
   Recursive estimation using multiple observations
   Recursive time-series estimation*
   Bellman's equation using a linear model
   A matrix-based derivation*
   A simulation-based implementation
   Least squares temporal differences (LSTD)
   Least squares policy evaluation (LSPE)
   Analysis of TD(0), LSTD and LSPE using a single state
   Recursive least squares and TD(0)
   LSPE
   LSTD
   Discussion
   Gradient-based methods for approximate value iteration*
   Least squares temporal differencing with kernel regression*
   Value function approximations based on Bayesian learning*
   Minimizing bias
   Lookup tables with correlated beliefs
   Parametric models
   Creating the prior
   Learning algorithms and stepsizes
   Least squares temporal differences
   Least squares policy evaluation
   Recursive least squares
   Bounding 1/n convergence for approximate value iteration
   Discussion
   Why does it work?*
   Derivation of the recursive estimation equations
   The Sherman-Morrison updating formula
   Bibliographic notes
   Problems

18 Forward ADP II: Policy optimization
   Overview of algorithmic strategies
   Approximate value iteration and Q-learning using lookup tables
   Value iteration using a pre-decision state variable
   On-policy, off-policy and the exploration-exploitation problem
   Q-learning
   Value iteration using a post-decision state variable
   Value iteration using a backward pass
   Statistical bias in the max operator
   A stochastic shortest path problem
   Approximate value iteration using linear models
   Illustrations using regression models
   A geometric view of basis functions*
   Approximate policy iteration
   Finite horizon problems using lookup tables
   Finite horizon problems using basis functions
   LSTD for infinite horizon problems using basis functions
   The actor-critic paradigm
   Policy gradient methods
   The linear programming method using basis functions
   Approximate policy iteration using kernel regression*
   Finite horizon approximations for steady-state applications
   Bibliographic notes

19 Forward ADP III: Convex functions
   Piecewise linear approximations for scalar functions
   Multiple convex dimensions
   Benders decomposition
   Benders with exogenous state variable
   High-dimensional applications
   Values versus marginal values
   Linear approximations
   Solving a resource allocation problem using piecewise linear functions
   Regression methods
   Value iteration for multidimensional decision vectors
   Cutting planes for multidimensional functions
   Convexity with exogenous information state
   Why does it work?**
   The projection operation
   Bibliographic notes
   Problems

20 Lookahead policies
   Optimal policies using lookahead
   Strategies for approximating the lookahead model
   Deterministic lookahead
   A shortest path illustration
   Decision trees
   Stochastic lookahead models with discrete actions
   Lookahead using backward dynamic programming
   Monte Carlo tree search
   Stochastic lookahead with vector decisions
   Sparse sampling tree search
   Roll-out heuristics
   Rolling horizon procedures
   Discussion
   Bibliographic notes

21 Risk and Robustness
   Bibliographic notes

PART I - INTRODUCTION

We begin our journey by providing an overview of the diversity of optimization problems under uncertainty. These have been introduced over the years by a number of different communities, motivated by different applications. Many of these problems have motivated entire fields of research under names such as dynamic programming, stochastic programming, optimal control, stochastic search, ranking and selection, and multiarmed bandit problems.

CHAPTER 1

DECISIONS AND UNCERTAINTY

There are few problems that offer the richness and diversity of making decisions in the presence of uncertainty. While often presented under impenetrable names such as stochastic programming, stochastic control, and Markov decision processes, decision making under uncertainty is a universal experience, something every human has had to manage since our first experiments trying new foods when we were two years old. Some samples of everyday problems where we have to manage uncertainty include:

Personal decisions - These might include how much to withdraw from an ATM, finding the best path to a new job, and deciding what time to leave to make an important appointment.

Health decisions - Examples include joining a health club, getting annual checkups, having that mole checked, using dental floss, and scheduling a colonoscopy.

Investment decisions - What mutual fund should you use? How should you allocate your investments? How much should you put away for retirement? Should you rent or purchase a house?

Diet decisions - Protein or carbohydrates? Diet cola vs. orange juice? Eggs or cereal?

Decisions under uncertainty span virtually every major field. Samples include:

Business - What products to sell, with what features? Which supplies should you use? What price should you charge? How should we manage our fleet of delivery vehicles? Which menu attracts the most customers?

Internet - What ads to display to maximize ad-clicks? Which movies attract the most attention? When/how should mass notices be sent?

Engineering - How to design devices from aerosol cans to an electric vehicle, bridges to transportation systems, transistors to computers?

Materials science - What combination of temperatures, pressures and concentrations should we use to create a material with the highest strength?

Medical research - What molecular configuration will produce the drug which kills the most cancer cells? What set of steps are required to produce single-walled nanotubes?

Economics - What interest rate should the Federal Reserve charge? What levels of market liquidity should be provided? What guidelines should be imposed on investment banks?

Needless to say, listing every possible type of decision is an impossible task. However, we would argue that in the context of a particular problem, listing the decisions is easier than identifying all the sources of uncertainty.

The diversity of problems is so broad that one might ask: can all these problems be covered in a single book? There are two reasons for pursuing such an ambitious goal:

A universal formulation - The entire range of problems suggested above can be modeled, with perhaps a few adjustments, using a single formulation. This formulation is quite general, and makes it possible to model a truly diverse range of problems with a high level of realism.

Cross fertilization - Ideas developed from one problem class or discipline can be used to help solve problems traditionally associated with different areas. There has been a historical pattern for each community to pick up the modeling styles and solution approaches used in the different books captured in figure 1.1.
These fields include:

Decision analysis - This community generally works with discrete actions, possibly discrete random outcomes, but often features complex utility functions and the handling of risk. Problems are relatively small.

Stochastic search (derivative based) - This field is centered on the basic problem min_x E F(x, W), where x is continuous (scalar or vector), W is a random variable, and where the expectation typically cannot be computed. However, we assume we can compute gradients ∇_x F(x, W) for a known W.

Ranking and selection (derivative free) - This field is also focused on min_x E F(x, W), but now we assume that x can take on one of a finite set of outcomes {x_1, x_2, ..., x_M}.

Simulation-optimization - This community evolved from within the setting of discrete event simulations, where we need to use a simulator (such as one of a manufacturing system) to compare different designs. The field of simulation optimization started

with the ranking and selection problem, but has evolved to span a wider range of problems.

Online computation - This field describes methods where decisions are made by simply reacting to information as it comes in, without considering the impact of decisions now on the future. This field was originally motivated by mobile applications where energy and computing capacity were limited.

Optimal control - The roots of this community are in engineering, focusing on the control of aircraft, spacecraft, and robots, but the field has expanded to economics and computer science. The original problems were written in continuous time with continuous controls, but models are often written in discrete time (typically with discrete controls), since this is how computation is done. Problems are typically deterministic, possibly with uncertain parameters, and possibly with additive noise in the transition function, but these methods have been widely adopted, especially in finance, where problems are purely stochastic.

Robust optimization - This is an extension of stochastic search with roots in engineering, where the goal is to find the best design x (of a building, an aircraft, a car) that works under the worst instance of a random variable W (which could be wind, temperature, crash conditions). Instead of min_x E F(x, W), robust optimization problems seek to solve min_x max_w F(x, w). For example, we might want to design a wing to handle the worst possible wind stresses.

Optimal stopping - This is an important problem in finance, where we have to study when to stop and sell (or buy) an asset. It also arises in engineering when we have to decide when to stop and repair a piece of machinery. The problem is to find a time τ to sell or repair, where τ can only depend on the information available at that time. The problem is popular within the applied probability community.

Markov decision processes - This community evolved primarily within applied probability, and describes a system that takes on a discrete state s, and transitions to s' when we take (discrete) action a with (known) probability p(s'|s, a).

Reinforcement learning - This field started by modeling animal behavior seeking to solve a problem (such as finding a path through a maze), where experiences (in the form of successes or failures) were captured by estimating the value of a state-action pair.

Approximate dynamic programming - Several communities have developed ways of overcoming the curse of dimensionality inherent in the tools of discrete Markov decision processes by using simulation-based methods to solve dynamic programs. There are very close parallels between the fields of approximate dynamic programming (also known as adaptive dynamic programming) and reinforcement learning, although the motivating problems tend to be somewhat different, as are the research styles.

Stochastic programming - This community evolved from math programming with the desire to insert random variables into linear programs. The classical problem is the two-stage problem where you pick x_0 (e.g. how many Christmas trees to plant), after which we learn random information W_1, and then we make a second decision x_1 (e.g. shipping Christmas trees to customers).

Figure 1.1 A sampling of major books representing different fields in stochastic optimization.

Sequential kriging - This community evolved within geosciences, where we need to learn the largest value of a continuous function f(x) through expensive experiments (originally field experiments). The vector x was originally a two-dimensional point in space, where f(x) might be the amount of oil or natural gas yielded by drilling a hole at x.

Multiarmed bandit problems - The roots of this problem come from applied probability, where the goal is to identify a discrete alternative (known as an arm in this community) that yields the highest reward, where we do not know the value of each arm. We learn the value through repeated experimentation, accumulating rewards as we progress.

This is hardly a comprehensive list of stochastic optimization problems, but it provides a sense of the different communities that have been drawn into the arena of making decisions under uncertainty. Some of these communities have their roots in deterministic problems (optimal control, stochastic programming), while others have their roots in communities such as applied probability and simulation.

A byproduct of this confluence of communities is a variety of notational systems and mathematical styles. This diversity of languages disguises common approaches developed in different communities. More important than this process of re-invention of methods is the potential for cross-fertilization of ideas across communities. Indeed, as of this writing there persists a fair amount of competition between communities, where proponents of one methodological approach will insist that their approach is better than another.

This book is not intended to replace the much more thorough treatments in these books. Rather, our goal is to provide a unified framework that offers a more comprehensive perspective of these fields. We have found that a single problem can be reasonably approached by techniques from multiple fields such as dynamic programming (operations research), model predictive control (control theory) and policy search (computer science),

where any one of these methods may work best, depending on the specific characteristics of the data. At the same time, powerful hybrid strategies can be created by combining the tools from different fields.

1.1 SOME SAMPLE PROBLEMS

A few sample problems provide a hint into the major classes of decision problems that involve uncertainty.

The newsvendor problem

The newsvendor problem is one of the most widely used examples of stochastic optimization problems. Imagine that you have to decide on a quantity x of newspapers to purchase at a unit price c that will be placed in the bin the next day. You then sell up to a random demand D, charging a price p. Your profit F(x, D) is given by

F(x, D) = p min{x, D} - cx.

We do not know the demand D when we decide x, so we have to find x that maximizes the expected profit, given by

max_x E{p min{x, D} - cx}.   (1.1)

The newsvendor problem comes in many variations, which explains its continued popularity after decades of study. One of the most important variations depends on whether the distribution of D is known (which allows us to solve (1.1) analytically) or unknown (or not easily computable). Prices and costs may be random (we may be purchasing energy from the power grid at highly random prices, storing it in a battery to be used later). Newsvendor problems can be formulated in two ways.

Offline (final reward) formulation - The most classical formulation, stated in (1.1), is one where we find a single solution x which, when implemented, solves (1.1). In this setting, we are allowed to search for the best solution without worrying about how we arrived at x, which means we are only interested in the final reward (that is, the quality of the solution after we have finished our search). We can further divide this problem into two formulations:

The asymptotic formulation - This is the formulation in (1.1) - we are looking for a single, deterministic solution x to solve (1.1).
The finite time formulation - Imagine that we are allowed N samples of the function F(x, D) to find a solution x^N, which will depend on both how we have gone about finding our solution, as well as the noisy observations we made along the way.

Online (cumulative reward) formulation - Now imagine that we do not know the distribution of the demand D, but rather have to experiment by choosing x and then observing D, after which we can figure out how much money we made that day. The problem is that we have to maximize profits over all the days while we are learning the best solution, which means we have to live with the profits (or losses) earned along the way.
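When the demand distribution is unknown but we can draw samples, the offline formulation above is often approached by optimizing over a sampled model. A minimal sketch follows; the prices, the demand distribution, and the search range are illustrative choices, not values from the text:

```python
import random

# Sampled-model version of the newsvendor problem (1.1): choose the
# order quantity x that maximizes the average profit over N sampled
# demands, rather than the (uncomputable) expectation.

def profit(x, d, p=5.0, c=3.0):
    """F(x, D) = p * min(x, D) - c * x."""
    return p * min(x, d) - c * x

random.seed(1)
demands = [random.randint(50, 150) for _ in range(1000)]  # samples of D

# Exhaustive search over candidate order quantities.
best_x = max(range(0, 201),
             key=lambda x: sum(profit(x, d) for d in demands) / len(demands))

# With p = 5 and c = 3 the classical critical fractile is (p - c)/p = 0.4,
# so best_x should land near the 40th percentile of the sampled demand.
print(best_x)
```

In the online formulation we would instead have to commit to an order quantity each day while learning, so the profits earned (or lost) during the search would count toward the objective.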

A stochastic shortest path problem

Imagine that we are trying to find the best path over an urban network. As a result of congestion, travel times on each link (i, j) joining nodes i and j may be random. We may assume that we know the cost from i to j as soon as we arrive at node i. We may assume the distribution of the cost c_ij is known, or unknown. In addition, the distribution may be stationary (it does not change with time) or nonstationary (reflecting either predictable congestion patterns, or random accidents).

Optimizing medical decisions

During a routine medical exam, a physician realizes that the patient has high blood sugar. Courses of treatment can include diet and exercise, or a popular drug called metformin that is known to reduce blood sugar. Information collected as part of the medical history can be used to guide this decision, since not every patient can handle metformin. The doctor will have to learn how this patient responds to a particular course of treatment, which is information that he can use to help guide his treatment of not only this patient, but others with similar characteristics.

Pricing an electricity contract

A utility has been asked to price a contract to sell electricity over a 5-year horizon (which is quite long). The utility can exploit 5-year contracts on fuels (coal, oil, natural gas), which provide a forecast (by financial markets) of the price of the fuels in the future. Fuel prices can be translated to the cost of producing electricity from different generators, each of which has a heat rate that translates energy input (typically measured in millions of BTUs) into electricity output (measured in megawatt-hours). We can predict the price of electricity by finding the intersection between the supply curve, constructed by sorting generators from lowest to highest cost, and the projected demand.
Five years from now, there is uncertainty in both the prices of different fuels, as well as the demand for electricity (known as the load in the power community).

Inverted pendulum problem

This is one of the most classical problems in engineering control. Imagine you have a vertical rod hinged at the bottom on a cart that can move left and right. The rod tends to fall in whatever direction it is leaning, but this can be countered by moving the cart in the same direction to push the rod back up. The challenge is to control the cart in a way that the rod remains upright, ideally minimizing the energy to maintain the rod in its vertical position. These problems are typically low-dimensional (this problem has a one- or two-dimensional controller, depending on how the dynamics are modeled), and deterministic, although uncertainty can be introduced in the transition (for example to reflect wind) or in the implementation of the control.

Managing blood inventories

There are eight blood types (A, B, AB, O, each of which can be either positive or negative). Blood can be stored for up to six weeks, and it may be frozen so that it can be held for longer periods of time. Each blood type has different substitution options (see figure 1.2). For example, anyone can accept O-negative blood (known as the universal donor), while A-positive blood can only be used for people with A-positive or AB-positive blood (AB-positive is known as the universal recipient). As a result of different substitution options, it is not necessarily the case that you want to use, say, A-positive blood for an A-positive patient, who can also be handled with O-negative, O-positive, or A-negative as well as A-positive blood.

Figure 1.2 The different substitution possibilities between donated blood and patient types (from Cant (2006)).

Hospitals (or the Red Cross) have to periodically run blood drives, which produce an uncertain response. At the same time, demand for blood comes from a mixture of routine, scheduled surgeries and bursts from large accidents, storms and domestic violence. If the problem is modeled in weekly time increments, blood may have an age from 0 to 5 weeks. These six values, times the eight blood types, times two (frozen or not), give us 96 values for the blood attribute. There are hundreds of possible assignments of these blood types to the different patient types.

These problems are but a tiny sample of the wide range of problems we may encounter that combine decisions and uncertainty. These applications illustrate both offline (design first) and online (decisions have to be made and experienced over time) settings. Decisions may be scalars, vectors (the blood), or categorical (the medical decisions). Uncertainty can be introduced in different forms. And finally, there are different types of objectives, including a desire to do well on average (the newsvendor problem is typically repeated many times), as well as to handle risk (of a power outage or a blood shortage).

1.2 PROBLEM TYPES

Although we revisit this later in more detail, it is useful to have a sense of the different problem classes. We provide this by running down the different dimensions of a sequential decision problem, identifying the varieties of each of the dimensions. We wait until chapter 9 before providing a much more comprehensive presentation of how to model this rich problem class.

State variables

The state variable S_t captures the information available at time t (we may also use S^n to describe the state at iteration n) to make a decision. State variables come in three general flavors:

Physical state R_t - This might be inventory, the location of a vehicle on a network, or the amount of money invested in a stock, where R_t may be a scalar (money in the bank), a low-dimensional vector (the inventory of different blood types) or a very high-dimensional vector (the number of different types of aircraft described by a vector of attributes). Physical states restrict the decisions we can make in some way. For example, we cannot sell more shares of a stock than we own. The location on a network determines the decisions we can make. In the vast majority of problems, decisions affect the physical state, directly or indirectly.

Informational state I_t - This includes information that affects the behavior of a problem, such as the temperature (which influences evaporation from a reservoir), economic variables (which influence stock prices and interest rates), or medical factors (e.g. whether someone needing a knee replacement is also a smoker). An informational state variable is any relevant piece of information that we know perfectly, and that we have not classified as a physical state.

Knowledge state K_t - The knowledge state describes any variable or parameter that we do not know perfectly, and instead describe with a probability distribution. These problems are studied under different names, such as multiarmed bandit problems (or, more broadly, optimal learning problems) and partially observable Markov decision processes (POMDPs).

When we return to modeling in chapter 9, we will see that state variables come in a variety of styles, as we can mix and match physical, informational and knowledge state variables in different combinations.
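One way to picture how the three flavors combine is as fields of a single state object. A minimal sketch follows; the particular fields (a blood inventory, a temperature, a belief about demand) are hypothetical examples chosen to mirror the discussion above:

```python
# Sketch: a state S_t bundling a physical state R_t, an informational state
# I_t, and a knowledge state K_t. All example fields are hypothetical.
from dataclasses import dataclass

@dataclass
class State:
    # Physical state R_t: resources the decision directly moves around.
    blood_inventory: dict       # e.g. {"A+": 12, "O-": 5} units on hand
    # Informational state I_t: known perfectly, but not a resource.
    temperature: float          # influences evaporation, demand, ...
    # Knowledge state K_t: known only up to a probability distribution,
    # here a (mean, variance) belief about an unknown demand parameter.
    demand_belief: tuple = (100.0, 25.0)

S_t = State(blood_inventory={"A+": 12, "O-": 5}, temperature=21.5)
print(S_t.demand_belief)  # (100.0, 25.0)
```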
In addition, we can see different behaviors depending on the relationship between the state variable now and in the past.

Types of decisions

Decisions come in many different styles, and this has produced a variety of notational systems. The most common canonical notational systems for decisions are:

Discrete action a - This notation is typically used when a is discrete (binary, integer, categorical). It is widely used in computer science, which inherited the notation from the Markov decision process community in operations research.

Continuous control u - In the controls community, u is typically a low-dimensional continuous vector (say, 1-10 dimensions).

General vectors x - In operations research, x is typically a vector of continuous or discrete (integer) variables, where it is not unusual to solve problems with tens or hundreds of thousands of variables (dimensions).

Regardless of the community, decisions may come in many forms. We will use our default notation of x, where X is the set of possible values of x.

Binary - X = {0, 1}. Binary choices arise frequently in finance (hold or sell an asset) and in internet applications, where x = 0 means run the current website while x = 1 means run the redesigned website (this is known as A/B testing).

Discrete - X = {1, 2, ..., M}.

Subset - x may be a vector (0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1) indicating, for example, the starting lineup of a basketball team.

Scalar continuous - X = (a, b) for some b > a. This is typically written X = R.

Continuous vector - x = (x_1, ..., x_n) might be an n-dimensional vector, where we may write X = R^n.

Discrete vector - x can be a vector of binary elements, or a vector of integers (0, 1, 2, ...).

Categorical - x may be a category such as a type of drug, a choice of employee described by a long list of attributes, or a choice of a movie (also described by a long list of attributes). If we let a_1, a_2, ..., a_K be the different attributes of a choice, we see that the number of possible categories can be extremely large.

The nature of the decision variable, along with the properties of the objective function, can have a major impact on the design of a solution strategy.

Types of uncertainty

When first starting out, it is natural to begin with a simple problem such as the newsvendor problem (with random demand), a stochastic shortest path problem with random link costs, or perhaps the process of selling an asset where the price process is random. Randomness can arise in the objective function (costs or prices), the constraints (such as the supply of or demand for blood), or the transition (the evolution of the price of a stock, the amount of wind energy being generated, or the uncertain evaporation of water from a reservoir). In more complex problems, uncertainty can arise in a variety of different ways. Some examples are:

Parameter and model uncertainty - We have a model describing how a system should behave, but its behavior is determined by a set of parameters θ that are not known perfectly. We may use (noisy) observations to estimate these parameters, but they are at best known up to a probability distribution. We may also have uncertainty about the fundamental structure of the model.
We may adopt a Bayesian setting and put a prior on either the structure of the model or the parameters governing the behavior of the model.

Forecasting errors - We may be trying to forecast demands, prices, wind, and customer arrivals.

Observational errors - We often cannot know the true state of the system exactly. This might arise when we cannot perfectly estimate the location and speed of a robot, or when we have errors in our ability to monitor inventory or the actual amount of energy in a car battery.

Transitional uncertainty - We may be able to deterministically calculate the change in the level of a reservoir, but rainfall and seepage introduce noise into this transition.

Control uncertainty - The decisions we make may not be implemented precisely as we wish.

Goal uncertainty - People may have difficulty articulating their objectives, which might involve balancing priorities between different objectives.

There are different ways to express uncertainty. Some of these include:

Uninformative prior - This means we have no idea what the value is (we do not even know the order of magnitude). We can express this by letting the variance of the distribution of our belief go to infinity.

Unknown distribution - There are ways of modeling stochastic processes by assuming that observations come from an exogenous, and completely unknown, distribution.

Exponential family - This includes many of our most familiar distributions, such as the normal, exponential, log-normal, gamma, chi-squared, beta, Bernoulli, Poisson, and geometric.

Uniform distributions - Including both discrete and continuous.

Heavy-tailed distributions - Such as the Cauchy distribution (which has infinite variance), as well as spikes.

Mixture distributions - For example, jump diffusion models, which use a mixture of a low-variance and a high-variance normal distribution.

Bursts - These arise in nonstationary processes where a set of events (spread of a disease, trending of new topics, a flurry of sales of a low-volume product) occur together.

Rare events - Events that may occur, but rarely. These are not well described by a probability distribution.

A more detailed presentation of the different sources of uncertainty is given in a later chapter.

Models of system dynamics

Almost all of the problems we consider in this volume can be modeled sequentially, starting with some state which is then modified by a decision and then new information, which then leads to a new state. However, there are different ways that we can compute, measure or observe these transitions. These include:

Model-based - Here we assume we can model the transition as a system of equations.
For example, an inventory problem might include the equation R_{t+1} = R_t + x_t + R̂_{t+1}, where R_t is our current inventory (say, water in a reservoir or product on a store shelf), x_t may represent purchases or (for water) releases, and R̂_{t+1} represents random, exogenous changes (rainfall or leakage, sales or returns).

Model-free - This describes situations where we can observe a state S_t, then take an action a_t, but after that all we can do is observe the next state S_{t+1}. This might arise whenever we have a complex process such as global climate change, the dynamics of a complex production plant, a complex computer simulation, or the behavior of a human being.

Both of these are very important problem classes, and there are specialized modeling and algorithmic strategies designed for each of them.

Objectives

There are many ways to evaluate the performance of a system, which may involve a single metric, or multiple goals or objectives. Different classes of metrics include:

Costs or contributions - These include financial metrics that we are minimizing (costs) or maximizing (profits, revenue, contributions).

Performance metrics - Here we would include non-financial metrics such as the strength of a material, cancer cells killed, or post-operative recovery from a surgical procedure.

Faults - We may wish to maximize a performance metric (or minimize cost), but we have to monitor service failures or flaws in a product.

Time - We may be trying to minimize the time required to complete a task.

We also need to consider how costs (or rewards/contributions) are being accumulated:

Cumulative rewards - We may accumulate rewards (or costs) as we progress, as would happen in many online applications where we have to actually experience the process (e.g. testing different prices and observing revenues, trying out different paths through a network, or making medical decisions for patients).

Final rewards - We may be able to run a series of tests (in a lab, or using a computer simulation) looking to identify the best design, and then evaluate performance based on how well the design works, without regard to poor results that arise while searching for the best design.

We next have to consider the different ways of evaluating these metrics in the presence of uncertainty. These include:

Expectations - This is the most widely used strategy, which involves averaging across the different outcomes.

Risk measures - This includes a variety of metrics that capture the variability of our performance metric.
Examples include:

Variance of the metric.

A quantile (e.g. maximizing the 10th percentile of profits).

Probability of being above or below some quantile.

Expected value given that it is above or below some threshold.

Worst case - Often associated with robust optimization, we may wish to focus on the worst possible outcome, as might arise in the design of an engineering part where we want the lowest-cost design that can handle the worst case. This is technically a form of risk measure, but robust optimization is a distinct field.
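Several of these risk measures can be computed directly from a sample of simulated profits. A quick sketch, using a synthetic normal profit sample purely for illustration:

```python
# Sketch: a few risk measures computed from a sample of simulated profits.
# The sample below is synthetic, generated only to illustrate the formulas.
import random
import statistics

random.seed(42)
profits = [random.gauss(100.0, 30.0) for _ in range(10_000)]

mean = statistics.mean(profits)               # expectation
var = statistics.pvariance(profits)           # variance of the metric
q10 = statistics.quantiles(profits, n=10)[0]  # 10th percentile of profits
p_below = sum(p < q10 for p in profits) / len(profits)  # P(profit < q10)
# Expected value given we fall below the quantile (a conditional tail mean):
tail_mean = statistics.mean([p for p in profits if p <= q10])

print(f"mean={mean:.1f}  10th pct={q10:.1f}  tail mean={tail_mean:.1f}")
```

Each quantity summarizes the same sample differently: the tail mean is more pessimistic than the quantile, which in turn sits below the expectation.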

We also need to consider the characteristics of the function being used to evaluate our system. These include:

Computational cost - We may be able to compute performance in fractions of a second (e.g. analytical functions), seconds to minutes (complex analytical functions, small simulations), hours (more complex simulations, some laboratory experiments, observing the response to prices on the internet), days (very complex simulations, more complex laboratory experiments), or weeks to months (very complex laboratory experiments, observing market response to business initiatives).

Noise level - Function evaluations may be fairly reliable (e.g. variations less than 20 percent of the average), medium to high noise, or binomial (success/failure).

Analytical behavior - Is the function differentiable? Convex (if minimizing) or concave (if maximizing)? Unimodal? Monotone? Smooth?

Staging of information and decisions

It is useful to distinguish problem classes in terms of the staging of information and decisions. Below we list major problem classes, and describe each in terms of the sequencing of decisions and information.

Offline static stochastic optimization - Decision-information

Offline learning (final reward) - Decision-information-decision-information-...-decision-information

Online learning (cumulative reward) - Decision-information-decision-information-...-decision-information

Contextual bandits (cumulative reward) - Information-decision-information-decision-information-...

Two-stage stochastic programming (vector decisions) - Decision-information-decision

Multistage stochastic programming (vector decisions) - Decision-information-decision-information-...-decision-information

Stochastic control (continuous decisions) - Decision-information-decision-information-...-decision-information

Finite horizon Markov decision processes (discrete actions, cumulative reward) - Decision-information-decision-information-...-decision-information

Infinite horizon Markov decision processes (discrete actions, cumulative reward) - Decision-information-decision-information-...

Entire fields have evolved around each of these problem classes. Classification can be tricky because it depends in part on how the problem is solved. For example, the stochastic optimization problem max_x E F(x, W) is often described as static stochastic optimization because you choose x, observe W, and then stop. Yet this problem is often solved using an iterative online learning algorithm where we successively choose x^n and then observe W^{n+1}, in the hope that x^n will converge to the optimal solution. Under the right conditions, both will yield the same optimal solution to the same objective function, and yet the styles and resulting analyses of the solution approaches are quite different.
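The iterative approach to max_x E F(x, W) can be made concrete with the newsvendor profit F(x, W) = p min(x, W) - c x: we repeatedly choose an order quantity x^n, observe a demand W^{n+1}, and take a stochastic gradient step. The sketch below is one such algorithm; the prices, the exponential demand distribution, and the step-size rule are all illustrative choices, not prescriptions from this chapter:

```python
# Sketch: solving max_x E F(x, W) for the newsvendor profit
# F(x, W) = p*min(x, W) - c*x with a stochastic gradient (online) method.
import random

random.seed(0)
p, c = 10.0, 4.0                         # sales price and purchase cost

def demand():
    return random.expovariate(1 / 50.0)  # W ~ Exponential with mean 50

x = 10.0                                 # initial order quantity x^0
for n in range(1, 20_001):
    W = demand()                         # observe W^{n+1} after choosing x^n
    grad = (p - c) if x < W else -c      # stochastic (sub)gradient of F at x
    x = max(0.0, x + (20.0 / n) * grad)  # step size alpha_n = 20/n

# The optimum satisfies P(W > x*) = c/p, so here x* = 50*ln(p/c) ≈ 45.8.
print(round(x, 1))
```

Sample average approximation would instead draw all the W's first and solve one static problem; under the right conditions both recover the same x*, illustrating why the offline/online boundary depends on the solution method as much as on the problem.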


More information

Information Choice in Macroeconomics and Finance.

Information Choice in Macroeconomics and Finance. Information Choice in Macroeconomics and Finance. Laura Veldkamp New York University, Stern School of Business, CEPR and NBER Spring 2009 1 Veldkamp What information consumes is rather obvious: It consumes

More information

16.4 Multiattribute Utility Functions

16.4 Multiattribute Utility Functions 285 Normalized utilities The scale of utilities reaches from the best possible prize u to the worst possible catastrophe u Normalized utilities use a scale with u = 0 and u = 1 Utilities of intermediate

More information

Decision-making, inference, and learning theory. ECE 830 & CS 761, Spring 2016

Decision-making, inference, and learning theory. ECE 830 & CS 761, Spring 2016 Decision-making, inference, and learning theory ECE 830 & CS 761, Spring 2016 1 / 22 What do we have here? Given measurements or observations of some physical process, we ask the simple question what do

More information

(MATH 1203, 1204, 1204R)

(MATH 1203, 1204, 1204R) College Algebra (MATH 1203, 1204, 1204R) Departmental Review Problems For all questions that ask for an approximate answer, round to two decimal places (unless otherwise specified). The most closely related

More information

TRANSPORT PHENOMENA FOR CHEMICAL REACTOR DESIGN

TRANSPORT PHENOMENA FOR CHEMICAL REACTOR DESIGN TRANSPORT PHENOMENA FOR CHEMICAL REACTOR DESIGN Laurence A. Belfiore Department of Chemical Engineering Colorado State University Fort Collins, CO A JOHN WILEY & SONS, INC., PUBLICATION TRANSPORT PHENOMENA

More information

MODELING DYNAMIC PROGRAMS

MODELING DYNAMIC PROGRAMS CHAPTER 5 MODELING DYNAMIC PROGRAMS Perhaps one of the most important skills to develop in approximate dynamic programming is the ability to write down a model of the problem. Everyone who wants to solve

More information

LEGAL DISCLAIMER. APG Coin (APG) White Paper (hereinafter 'the White Paper', 'the Document') is presented for informational purposes only

LEGAL DISCLAIMER. APG Coin (APG) White Paper (hereinafter 'the White Paper', 'the Document') is presented for informational purposes only LEGAL DISCLAIMER THIS DOCUMENT DOES NOT GIVE PERSONAL LEGAL OR FINANCIAL ADVICE. YOU ARE STRONGLY ENCOURAGED TO SEEK YOUR OWN PROFESSIONAL LEGAL AND FINANCIAL ADVICE. APG Coin (APG) White Paper (hereinafter

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

56:171 Operations Research Final Exam December 12, 1994

56:171 Operations Research Final Exam December 12, 1994 56:171 Operations Research Final Exam December 12, 1994 Write your name on the first page, and initial the other pages. The response "NOTA " = "None of the above" Answer both parts A & B, and five sections

More information

SpringerBriefs in Statistics

SpringerBriefs in Statistics SpringerBriefs in Statistics For further volumes: http://www.springer.com/series/8921 Jeff Grover Strategic Economic Decision-Making Using Bayesian Belief Networks to Solve Complex Problems Jeff Grover

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks

More information

Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework

Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework Warren B. Powell, Hugo P. Simao and Belgacem Bouzaiene-Ayari Department of Operations Research and Financial Engineering

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014 Warwick Business School Forecasting System Summary Ana Galvao, Anthony Garratt and James Mitchell November, 21 The main objective of the Warwick Business School Forecasting System is to provide competitive

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Clearing the Jungle of Stochastic Optimization

Clearing the Jungle of Stochastic Optimization Clearing the Jungle of Stochastic Optimization Warren B. Powell Department of Operations Research and Financial Engineering Princeton University Prepared for Informs TutORials, 2014. DRAFT April 7, 2014

More information

IE598 Big Data Optimization Introduction

IE598 Big Data Optimization Introduction IE598 Big Data Optimization Introduction Instructor: Niao He Jan 17, 2018 1 A little about me Assistant Professor, ISE & CSL UIUC, 2016 Ph.D. in Operations Research, M.S. in Computational Sci. & Eng. Georgia

More information

Week 1 Quantitative Analysis of Financial Markets Distributions A

Week 1 Quantitative Analysis of Financial Markets Distributions A Week 1 Quantitative Analysis of Financial Markets Distributions A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

Basic Concepts of Probability. Section 3.1 Basic Concepts of Probability. Probability Experiments. Chapter 3 Probability

Basic Concepts of Probability. Section 3.1 Basic Concepts of Probability. Probability Experiments. Chapter 3 Probability Chapter 3 Probability 3.1 Basic Concepts of Probability 3.2 Conditional Probability and the Multiplication Rule 3.3 The Addition Rule 3.4 Additional Topics in Probability and Counting Section 3.1 Basic

More information

INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

STRESS IN ASME PRESSURE VESSELS, BOILERS, AND NUCLEAR COMPONENTS

STRESS IN ASME PRESSURE VESSELS, BOILERS, AND NUCLEAR COMPONENTS STRESS IN ASME PRESSURE VESSELS, BOILERS, AND NUCLEAR COMPONENTS Wiley-ASME Press Series List Stress in ASME Pressure Vessels, Boilers, and Nuclear Jawad October 2017 Components Robust Adaptive Control

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1

More information

RISK AND RELIABILITY IN OPTIMIZATION UNDER UNCERTAINTY

RISK AND RELIABILITY IN OPTIMIZATION UNDER UNCERTAINTY RISK AND RELIABILITY IN OPTIMIZATION UNDER UNCERTAINTY Terry Rockafellar University of Washington, Seattle AMSI Optimise Melbourne, Australia 18 Jun 2018 Decisions in the Face of Uncertain Outcomes = especially

More information

Antti Salonen PPU Le 2: Forecasting 1

Antti Salonen PPU Le 2: Forecasting 1 - 2017 1 Forecasting Forecasts are critical inputs to business plans, annual plans, and budgets Finance, human resources, marketing, operations, and supply chain managers need forecasts to plan: output

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

1 [15 points] Frequent Itemsets Generation With Map-Reduce

1 [15 points] Frequent Itemsets Generation With Map-Reduce Data Mining Learning from Large Data Sets Final Exam Date: 15 August 2013 Time limit: 120 minutes Number of pages: 11 Maximum score: 100 points You can use the back of the pages if you run out of space.

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

PPU411 Antti Salonen. Forecasting. Forecasting PPU Forecasts are critical inputs to business plans, annual plans, and budgets

PPU411 Antti Salonen. Forecasting. Forecasting PPU Forecasts are critical inputs to business plans, annual plans, and budgets - 2017 1 Forecasting Forecasts are critical inputs to business plans, annual plans, and budgets Finance, human resources, marketing, operations, and supply chain managers need forecasts to plan: output

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

Chapter 6 Continuous Probability Distributions

Chapter 6 Continuous Probability Distributions Math 3 Chapter 6 Continuous Probability Distributions The observations generated by different statistical experiments have the same general type of behavior. The followings are the probability distributions

More information

Antti Salonen KPP Le 3: Forecasting KPP227

Antti Salonen KPP Le 3: Forecasting KPP227 - 2015 1 Forecasting Forecasts are critical inputs to business plans, annual plans, and budgets Finance, human resources, marketing, operations, and supply chain managers need forecasts to plan: output

More information

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt.

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt. SINGAPORE SHANGHAI Vol TAIPEI - Interdisciplinary Mathematical Sciences 19 Kernel-based Approximation Methods using MATLAB Gregory Fasshauer Illinois Institute of Technology, USA Michael McCourt University

More information

Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation

Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation CS234: RL Emma Brunskill Winter 2018 Material builds on structure from David SIlver s Lecture 4: Model-Free

More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:

More information

Demand Forecasting. for. Microsoft Dynamics 365 for Operations. User Guide. Release 7.1. April 2018

Demand Forecasting. for. Microsoft Dynamics 365 for Operations. User Guide. Release 7.1. April 2018 Demand Forecasting for Microsoft Dynamics 365 for Operations User Guide Release 7.1 April 2018 2018 Farsight Solutions Limited All Rights Reserved. Portions copyright Business Forecast Systems, Inc. This

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

CS599 Lecture 1 Introduction To RL

CS599 Lecture 1 Introduction To RL CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming

More information

Machine Learning Overview

Machine Learning Overview Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 2. Types of Information Processing Problems Solved 1. Regression

More information

Chapter 7 Forecasting Demand

Chapter 7 Forecasting Demand Chapter 7 Forecasting Demand Aims of the Chapter After reading this chapter you should be able to do the following: discuss the role of forecasting in inventory management; review different approaches

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information