Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes


Dmitri Dolgov and Edmund Durfee
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
(This work was supported, in part, by a grant from Honeywell Labs.)

Abstract

In multi-agent MDPs, it is generally necessary to consider the joint state space of all agents, making the size of the problem and the solution exponential in the number of agents. However, often the interactions between the agents are only local, which suggests a more compact problem representation. We consider a subclass of multi-agent MDPs with local interactions where dependencies between agents are asymmetric, meaning that agents can affect others in a unidirectional manner. This asymmetry, which often occurs in domains with authority-driven relationships between agents, allows us to make better use of the locality of agents' interactions. We present and analyze a graphical model of such problems and show that, for some classes of problems, it can be exploited to yield significant (sometimes exponential) savings in problem and solution size, as well as in the computational efficiency of solution algorithms.

1. Introduction

Markov decision processes [9] are widely used for devising optimal control policies for agents in stochastic environments. Moreover, MDPs are also being applied to multi-agent domains [1, 10, 11]. However, a weak spot of traditional MDPs that subjects them to the curse of dimensionality and presents significant computational challenges is the flat state-space model, which enumerates all states the agent can be in. This is especially significant for multi-agent MDPs, where, in general, it is necessary to consider the joint state and action spaces of all agents. Fortunately, there is often a significant amount of structure to MDPs, which can be exploited to devise more compact problem and solution representations, as well as efficient solution methods that take advantage of such representations. For example, a number of factored representations have been proposed [2, 3, 5] that model the state space as being factored into state variables, and use dynamic Bayesian network representations of the transition function to exploit the locality of the relationships between variables.

We focus on multi-agent MDPs and on a particular form of problem structure that is due to the locality of interactions between agents. Let us note, however, that we analyze the structure and complexity of optimal solutions only, and the claims do not apply to approximate methods that exploit problem structure (e.g., [5]). Central to our problem representation are dependency graphs that describe the relationships between agents. The idea is very similar to other graphical models, e.g., graphical games [6], coordination graphs [5], and multi-agent influence diagrams [7], where graphs are used to more compactly represent the interactions between agents to avoid the exponential explosion in problem size. Similarly, our representation of a multi-agent MDP is exponential only in the degree of the dependency graph, and can be exponentially smaller than the size of the flat MDP defined on the joint state and action spaces of all agents. We focus on asymmetric dependency graphs, where the influences that agents exert on each other do not have to be mutual. Such interactions are characteristic of domains with authority-based relationships between agents, i.e., low-authority agents have no control over higher-authority ones.
Given the compact representation of multi-agent MDPs, an important question is whether the compactness of problem representation can be maintained in the solutions, and if so, whether it can be exploited to devise more efficient solution methods. We analyze the effects of optimization criteria and shape of dependency graphs on the structure of optimal policies, and for problems where the compactness can be maintained in the solution, we present algorithms that make use of the graphical representation.

2. Preliminaries

In this section, we briefly review some background and introduce our compact representation of multi-agent MDPs.

2.1. Markov Decision Processes

A single-agent MDP can be defined as a tuple ⟨S, A, P, R⟩, where S = {i} and A = {a} are finite sets of states and actions, P : S × A × S → [0, 1] defines the transition function (the probability that the agent goes to state j if it executes action a in state i is P(i, a, j)), and R : S → ℝ defines the rewards (the agent gets a reward of R(i) for visiting state i). (Often, rewards are said to also depend on actions and future states; for simplicity, we define rewards as a function of the current state only, but our model can be generalized to the more general case.)

A solution to an MDP is a policy, defined as a procedure for selecting an action. It is known [9] that, for such MDPs, there always exist policies that are uniformly optimal (optimal for all initial conditions), stationary (time-independent), deterministic (always select the same action for a given state), and Markov (history-independent); such policies (π) can be described as mappings of states to actions: π : S → A.

Let us now consider a multi-agent environment with a set of n agents M = {m} (|M| = n), each of whom has its own set of states S_m = {i_m} and actions A_m = {a_m}. The most straightforward, and also the most general, way to extend the concept of a single-agent MDP to the fully observable multi-agent case is to assume that all agents affect the transitions and rewards of all other agents. Under these conditions, a multi-agent MDP can be defined simply as one large MDP ⟨S_M, A_M, P_M, R_M⟩, where the joint state space S_M is the cross product of the state spaces of all agents, S_M = S_1 × ... × S_n, and the joint action space is the cross product of the action spaces of all agents, A_M = A_1 × ... × A_n. The transition and reward functions are defined on the joint state and action spaces of all agents in the standard way: P_M : S_M × A_M × S_M → [0, 1] and R_M : S_M → ℝ.

This representation, which we refer to as flat, is the most general one, in that, by considering the joint state and action spaces, it allows for arbitrary interactions between agents. The trouble is that the problem (and solution) size grows exponentially with the number of agents. Thus, very quickly it becomes impossible to even write down the problem, let alone solve it. Let us note that, if the state space of each agent is defined on a set of world features, there can be some overlap in features between the agents, in which case the joint state space would be smaller than the cross product of the state spaces of all agents, and would grow as a slower exponent. For simplicity, we ignore the possibility of overlapping features, but our results are directly applicable to that case as well.

2.2. Graphical Multi-Agent MDPs

In many multi-agent domains, the interactions between agents are only local, meaning that the rewards and transitions of an agent are not directly influenced by all other agents, but rather only by a small subset of them. To exploit the sparseness in agents' interactions, we propose a compact representation that is analogous to the Bayesian network representation of joint probability distributions over several random variables. Given its similarity to other graphical models, we label our representation a graphical multi-agent MDP (graphical MMDP).

[Figure 1. Agent Dependency Graphs: panels (a), (b), (c).]

Central to the definition of a graphical MMDP is the notion of a dependency graph (Figure 1), which shows how agents affect each other. The graph has a vertex for every agent in the multi-agent MDP. There is a directed edge from vertex k to vertex m if agent k has an influence on agent m.
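As a concrete data structure, such a dependency graph is just a small directed graph over agent indices. The following Python sketch is our own illustration (the class name, fields, and example edges are hypothetical, not taken from the paper's Figure 1); the refinement into transition-related and reward-related edges is described next.

```python
from collections import defaultdict

class DependencyGraph:
    """Directed graph over agents: an edge k -> m means agent k influences agent m."""

    def __init__(self, agents):
        self.agents = set(agents)
        self.children = defaultdict(set)  # k -> agents influenced by k
        self.parents = defaultdict(set)   # m -> agents that influence m

    def add_edge(self, k, m):
        """Record that agent k has an influence on agent m."""
        self.children[k].add(m)
        self.parents[m].add(k)

# Hypothetical 5-agent example (edges chosen for illustration only).
g = DependencyGraph(agents=[1, 2, 3, 4, 5])
g.add_edge(1, 5)   # agent 1 influences agent 5, but not vice versa (asymmetric)
g.add_edge(4, 5)
g.add_edge(5, 4)   # mutual influence between agents 4 and 5 is also allowed
print(g.parents[5])  # {1, 4}
```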
The concept is very similar to coordination graphs [5], but we distinguish between two ways agents can influence each other: (1) an agent can affect another agent's transitions, in which case we use a solid arrow to depict this relationship in the dependency graph, and (2) an agent can affect another agent's rewards, in which case we use a dashed arrow in the dependency graph.

To simplify the following discussion of graphical multi-agent MDPs, we also introduce some additional concepts and notation pertaining to the structure of the dependency graph. For every agent m ∈ M, let us label all agents that directly affect m's transitions as N_m⁻(P) (the parents of m with respect to the transition function P), and all agents whose transitions are directly affected by m as N_m⁺(P) (the children of m with respect to the transition function P). Similarly, we use N_m⁻(R) to refer to agents that directly affect m's rewards, and N_m⁺(R) to refer to agents whose rewards are directly affected by m. Thus, in the graph shown in Figure 1a, N_m⁻(P) = {1, 4}, N_m⁻(R) = {1, 2}, N_m⁺(P) = {3}, and N_m⁺(R) = {4}. We use the terms transition-related and reward-related parents and children to distinguish between the two categories. Sometimes, it will also be helpful to talk about the union of the transition-related and reward-related parents or children, in which case we use N_m⁻ = N_m⁻(P) ∪ N_m⁻(R) and N_m⁺ = N_m⁺(P) ∪ N_m⁺(R). Furthermore, let us label the set of all ancestors of m (all agents from which m is reachable) with respect to transition-related and reward-related dependencies as O_m⁻(P) and O_m⁻(R), respectively. Similarly, let us label the descendants of m (all agents reachable from m) as O_m⁺(P) and O_m⁺(R).

A graphical MMDP with a set of agents M is defined as follows. Associated with each agent m ∈ M is a tuple

⟨S_m, A_m, P_m, R_m⟩, where the state space S_m and the action space A_m are defined exactly as before, but the transition and reward functions are defined as follows:

    P_m : S_{N_m⁻(P)} × S_m × A_m × S_m → [0, 1],
    R_m : S_{N_m⁻(R)} × S_m → ℝ,                                              (1)

where S_{N_m⁻(P)} and S_{N_m⁻(R)} are the joint state spaces of the transition-related and reward-related parents of m, respectively. In other words, the transition function of agent m specifies a probability distribution over its next states S_m as a function of its own current state S_m, the current states of its parents S_{N_m⁻(P)}, and its own action A_m. That is, P_m(i_{N_m⁻(P)}, i_m, a_m, j_m) is the probability that agent m goes to state j_m if it executes action a_m when its current state is i_m and the states of its transition-related parents are i_{N_m⁻(P)}. The reward function is defined analogously on the current states of the agent itself and of its reward-related parents.

Also notice that we allow cycles in the agent dependency graph, and moreover the same agent can both influence and be influenced by some other agent (e.g., agents 4 and m in Figure 1a). We also allow for asymmetric influences between agents, i.e., it could be the case that one agent affects the other, but not vice versa (e.g., agent m, in Figure 1a, is influenced by agent 1, but the opposite is not true). This is often the case in domains where the relationships between agents are authority-based. It turns out that the existence of such asymmetry has important implications for the compactness of the solution and the complexity of the solution algorithms. We return to a discussion of the consequences of this asymmetry in the following sections.

It is important to note that, in this representation, each transition and reward function only specifies the rewards and transition probabilities of agent m, and contains no information about the rewards and transitions of other agents. This implies that the reward and next state of agent m are conditionally independent of the rewards and next states of other agents, given the current action of m and the states of m and its parents N_m⁻. Therefore, this model does not allow for correlations between the rewards or the next states of different agents. For example, we cannot model a situation where two agents are trying to go through the same door and whether one agent makes it depends on whether the other one does; we can only represent, for each agent, the probability that it makes it, independently of the other. This limitation of the model can be overcome by lumping together groups of agents that are correlated in such ways into a single agent, as in the flat multi-agent MDP formulation. In fact, we could have allowed for such dependencies in our model, but it would have complicated the presentation. Instead, we assume that all such correlations have already been dealt with, and the resulting problem only consists of agents (perhaps composite ones) whose states and rewards have this conditional independence property.

It is easy to see that the size of a problem represented in this fashion is exponential in the maximum number of parents of any agent but, unlike the flat model, does not depend on the total number of agents. Therefore, for problems where agents have a small number of parents, the space savings can be significant. In particular, if the number of parents of any agent is bounded by a constant, the savings are exponential (in terms of the number of agents).
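To make the data layout concrete, the following Python sketch shows one way the per-agent factored model of a graphical MMDP could be stored. It is our own illustration rather than the authors' implementation: the class, field names, and dictionary-based encoding are assumptions. The point it demonstrates is that each agent's tables range only over its own states and actions and its parents' states, so their size is exponential in the number of parents, not in the total number of agents.

```python
class AgentModel:
    """Local model of one agent m in a graphical MMDP (illustrative encoding).

    transition[(i_parents, i_m, a_m)] maps a next state j_m to
    P_m(i_parents, i_m, a_m, j_m), where i_parents is a tuple of the current
    states of m's transition-related parents N_m^-(P).
    reward[(i_parents, i_m)] stores R_m(i_parents, i_m) over N_m^-(R).
    """

    def __init__(self, states, actions, trans_parents, reward_parents):
        self.states = states                  # S_m
        self.actions = actions                # A_m
        self.trans_parents = trans_parents    # N_m^-(P): list of agent ids
        self.reward_parents = reward_parents  # N_m^-(R): list of agent ids
        self.transition = {}                  # filled in by the modeler
        self.reward = {}

    def num_transition_entries(self, num_states_of):
        """|S_{N_m^-(P)}| * |S_m| * |A_m| * |S_m|: grows with the number of
        parents only, never with the total number of agents."""
        joint_parent_states = 1
        for k in self.trans_parents:
            joint_parent_states *= num_states_of[k]
        return joint_parent_states * len(self.states) ** 2 * len(self.actions)
```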
3. Properties of Graphical Multi-Agent MDPs

Now that we have a compact representation of multi-agent MDPs, two important questions arise. First, can we compactly represent the solutions to these problems? And second, if so, can we exploit the compact representations of the problems and the solutions to improve the efficiency of the solution algorithms? Positive answers to these questions would be important indications of the value of our graphical problem representation. However, before we attempt to answer these questions and get into a more detailed analysis of the related issues, let us lay down some groundwork that will simplify the following discussion.

First of all, let us note that a graphical multi-agent MDP is just a compact representation, and any graphical MMDP can easily be converted to a flat multi-agent MDP, analogously to how a compact Bayesian network can be converted to a joint probability distribution. Therefore, all properties of solutions to flat multi-agent MDPs (e.g., stationarity, history-independence, etc.) also hold for equivalent problems that are formulated as graphical MMDPs. Thus, the following simple observation about the form of policies in graphical MMDPs holds.

Observation 1. For a graphical MMDP ⟨S, A, P, R, M⟩ with an optimization criterion for which optimal policies are Markov, stationary, and deterministic, such policies can be represented as π_m : S_X → A_m, where S_X is the cross product of the state spaces of some subset of all agents (X ⊆ M). (From now on, we implicitly assume that optimal policies are Markov, stationary, and deterministic.)

Clearly, this observation does not say much about the compactness of policies, since it allows X = M, which corresponds to a solution where an agent has to consider the states of all other agents when deciding on an action. If that were always the case, using this compact graphical representation for the problem would not (by itself) be beneficial, because the solution would not retain the compactness and would be exponential in the number of agents. However, as it turns out, for some problems X can be significantly smaller than M. Thus, we are interested in determining, for every agent m, the minimal set of agents whose states m's policy has to depend on:

Definition 1. In a graphical MMDP, a set of agents X_m is a minimal domain of an optimal policy π_m : S_{X_m} → A_m of agent m iff, for any set of agents Y and any policy π'_m : S_Y → A_m, the following implications hold:

    Y ⊉ X_m  ⟹  U(π'_m) < U(π_m),
    Y ⊇ X_m  ⟹  U(π'_m) ≤ U(π_m),

where U(π) is the payoff that is being maximized.

Essentially, this definition allows us to talk about the sets of agents whose joint state space is necessary and sufficient for determining the optimal actions of agent m. From now on, whenever we use the notation π_m : S_{X_m} → A_m, we implicitly assume that X_m is the minimal domain of π_m.

3.1. Assumptions

As mentioned earlier, one of the main goals of the following sections will be to characterize the minimal domains of agents' policies under various conditions. Let us make a few observations and assumptions about properties of minimal domains that allow us to avoid some non-interesting, degenerate special cases and to focus on the hardest cases in our analysis. These assumptions do not limit the general complexity results that follow, as the latter only require that there exist some problems for which the assumptions hold. In the rest of the paper, we implicitly assume that they hold.

Central to our future discussion will be an analysis of which random variables (rewards, states, etc.) depend on which others. It will be very useful to talk about the conditional independence of future values of some variables, given the current values of others.

Definition 2. We say that a random variable X is Markov on the joint state space S_Y of some set of agents Y if, given the current values of all states in S_Y, the future values of X are independent of any past information. If that property does not hold, we say that X is non-Markov on S_Y.

Assumption 1. For a minimal domain X_m of agent m's optimal policy, and a set of agents Y, the following hold:

1. X_m is unique;
2. m ∈ X_m;
3. l ∈ X_m ⟹ S_l is Markov on S_{X_m};
4. S_m is Markov on S_Y ⟹ Y ⊇ X_m.

The first assumption allows us to avoid some special cases with sets of agents with highly correlated states, where equivalent policies can be constructed as functions of either of the sets. The second assumption implies that an optimal policy of every agent depends on its own state. The third assumption says that the state space of any agent l that is in the minimal domain of m must be Markov on the state space of the minimal domain. Since the state space of agent l is in the minimal domain of m, it must influence m's rewards in a non-trivial manner; thus, if S_l were non-Markov on S_{X_m}, agent m should be able to expand the domain of its policy to make S_l Markov, since that, in general, would increase m's payoff. The fourth assumption says that the agent's state is Markov only on supersets of its minimal domain, because the agent would want to increase the domain of its policy just enough to make its state Markov. These assumptions are slightly redundant (e.g., 4 could be deduced from weaker conditions), but we use this form for brevity.

3.2. Transitivity

Using the results of the previous sections, we can now formulate an important claim that will significantly simplify the analysis that follows.

Proposition 1. Consider two agents m, l ∈ M, where the optimal policies of m and l have minimal domains X_m and X_l, respectively (π_m : S_{X_m} → A_m, π_l : S_{X_l} → A_l). Then, under Assumption 1, the following holds:

    l ∈ X_m  ⟹  X_l ⊆ X_m.

Proof: We show this by contradiction. Consider an agent from l's minimal domain, k ∈ X_l, and assume (contradicting the statement of the proposition) that l ∈ X_m but k ∉ X_m. Consider the set of agents that consists of the union of the two minimal domains X_m and X_l, but with agent k removed: Y = X_m ∪ (X_l \ {k}).
Then, since Y ⊉ X_l, Assumption 1.4 implies that S_l is non-Markov on S_Y; since Y ⊇ X_m, S_l is then also non-Markov on S_{X_m}. Thus, Assumption 1.3 implies l ∉ X_m, which contradicts our earlier assumption.

Essentially, this proposition says that minimal domains have a certain transitive property: if agent m needs to base its action choices on the state of agent l, then, in general, m also needs to base its actions on the states of all agents in the minimal domain of l. As such, this proposition will help us establish lower bounds on policy sizes.

In the rest of the paper, we analyze some classes of problems to see how large the minimal domains are, under various conditions and assumptions, and for domains where the minimal domains are not prohibitively large, we outline solution algorithms that exploit the graphical structure. In what follows, we focus on two common scenarios: one where the agents work as a team and aim to maximize the social welfare of the group (the sum of individual payoffs), and one where each agent maximizes its own payoff.
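Before turning to those two cases, note that the transitive property suggests a simple way to compute a candidate for each agent's minimal domain: start from the agents the policy must certainly consider and repeatedly close the sets under Proposition 1. The Python sketch below is our own illustration of that closure computation (the function name and seed sets are assumptions; which agents seed each domain depends on the optimization criterion, as the next two sections discuss).

```python
def close_domains(seed_domains):
    """Close candidate policy domains under the transitivity of Proposition 1.

    seed_domains: dict mapping each agent m to the initial set of agents whose
    states m's policy must consider (e.g., m itself plus its immediate parents).
    Returns domains closed under: l in X_m  =>  X_l is a subset of X_m.
    """
    domains = {m: set(seed) for m, seed in seed_domains.items()}
    changed = True
    while changed:
        changed = False
        for m, X_m in domains.items():
            for l in list(X_m):
                missing = domains.get(l, set()) - X_m
                if missing:
                    X_m |= missing
                    changed = True
    return domains

# Hypothetical chain 1 -> 2 -> 3, each agent seeded with itself and its parent:
print(close_domains({1: {1}, 2: {2, 1}, 3: {3, 2}}))
# {1: {1}, 2: {1, 2}, 3: {1, 2, 3}}
```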

4. Maximizing Social Welfare

The following proposition characterizes the structure of optimal solutions to graphical multi-agent MDPs under the social welfare optimization criterion, and as such serves as an indication of whether the compactness of this particular representation can be exploited to devise an efficient solution algorithm for such problems. We demonstrate that, in general, when the social welfare of the group is considered, the optimal actions of each agent depend on the states of all other agents (unless the dependency graph is disconnected). Let us note that this case, where all agents are maximizing the same objective function, is equivalent to a single-agent factored MDP, and our results for this case are analogous to the well-known fact that the value function of a single-agent factored MDP does not, in general, retain the structure of the problem [8].

Proposition 2. For a graphical MMDP with a connected (ignoring edge directionality) dependency graph, under the optimization criterion that maximizes the social welfare of all agents, an optimal policy π_m of agent m, in general, depends on the states of all other agents, i.e., π_m : S_M → A_m.

Proof (Sketch): Agent m must, at the minimum, base its action decisions on the states of its immediate (both transition- and reward-related) parents and children. Indeed, agent m should worry about the states of its transition-related parents, N_m⁻(P), because their states affect the one-step transition probabilities of m, which certainly have a bearing on m's payoff. Agent m should also include in the domain of its policy the states of its reward-related parents, N_m⁻(R), because they affect m's immediate rewards, and agent m might need to act so as to synchronize its state with the states of its parents. Similarly, since the agent cares about the social welfare of all agents, it will need to consider the effect that its actions have on the states and rewards of its immediate children, and must thus base its policy on the states of its immediate children N_m⁺(P) and N_m⁺(R) to potentially set them up to get higher rewards.

Having established that the minimal domain of each agent must include the immediate children and parents of the agent, we can use the transitivity property from the previous section to extend this result. Although Proposition 1 only holds under the conditions of Assumption 1, for our purpose of determining the complexity of policies in general, it is sufficient that there exist problems for which Assumption 1 holds. It follows from Proposition 1 that the minimal domain of agent m must include all parents and children of m's parents and children, and so forth. For a connected dependency graph, this expands the minimal domain of each agent to all other agents in M.

The above result should not be too surprising, as it makes clear, intuitive sense. Indeed, let us consider a simple example that has the flavor of a commonly occurring production scenario. Suppose that there is a set of agents that can either cooperate to generate a certain product, yielding a very high reward, or concentrate on some local tasks that do not require cooperation, but which have a lower social payoff. Also, suppose that the interactions between the agents are only local; for example, the agents are operating an assembly line, where each agent receives the product from the previous agent, modifies it, and passes it on to the next agent. Let us now suppose that each agent has a certain probability of breaking down, and if that happens to at least one of the agents, the assembly line fails.
In such an example, the optimal policy for the agents would be to participate in the assembly-line production until one of them fails, at which point all agents should switch to working on their local tasks (perhaps processing items already in the pipeline). Clearly, in that example, the policy of each agent is a function of the states of all other agents.

The take-home message of the above is that, when the agents care about the social welfare of the group, even when the interactions between the agents are only local, the agents' policies depend on the joint state space of all agents. The reason for this is that a state change of one agent might lead all other agents to want to immediately modify their behavior. Therefore, our particular type of compact graphical representation (by itself and without additional restrictions) cannot be used to compactly represent the solutions.

5. Maximizing Own Welfare

In this section, we analyze problems where each of the agents maximizes its own payoff. Under this assumption, unlike in the discouraging scenario of the previous section, the complexity of agents' policies is slightly less frightening. The following result characterizes the size of the minimal domain of optimal policies for problems where each agent maximizes its own utility.

Proposition 3. For a graphical MMDP with an optimization criterion where each agent maximizes its own reward, the minimal domain of m's policy consists of m itself and all of its transition- and reward-related ancestors: X_m = E_m, where we define E_m = {m} ∪ O_m⁻(P) ∪ O_m⁻(R).

Proof (Sketch): To show the correctness of the proposition, we need to prove (1) that the minimal domain must include at least m itself and its ancestors (X_m ⊇ E_m), and (2) that X_m does not include any other agents (X_m ⊆ E_m).

We can show (1) by once again applying the transitivity property. Clearly, an agent's policy should be a function of the states of the agent's reward-related and transition-related parents, because they affect the one-step transition probabilities and rewards of the agent. Then, by Proposition 1, the minimal domain of the agent's policy must also include all of its ancestors.

We establish (2) as follows. We assume that it holds for all ancestors of m, and show that it must then hold for m. We then expand the statement to all agents by induction.

Let us fix the policies π_k of all agents k except m. Consider the tuple ⟨S_{E_m}, A_m, P_{E_m}, R_{E_m}⟩, where P_{E_m} and R_{E_m} are defined as follows:

    P_{E_m}(i_{E_m}, a_m, j_{E_m}) = P_m(i_{N_m⁻(P)}, i_m, a_m, j_m) ∏_{k ∈ O_m⁻} P_k(i_{N_k⁻(P)}, i_k, π_k(i_{E_k}), j_k),
    R_{E_m}(i_{E_m}) = R_m(i_{N_m⁻(R)}, i_m).                                  (2)

The above constitutes a fully observable MDP on S_{E_m} and A_m with transition function P_{E_m} and reward function R_{E_m}. Let us label this decision process MDP_1. By the properties of fully observable MDPs, there exists an optimal stationary deterministic solution π_m^1 of the form π_m^1 : S_{E_m} → A_m.

Also consider the following MDP on an augmented state space that includes the joint state space of all the agents (and not just m's ancestors): MDP_2 = ⟨S_M, A_m, P_M, R_M⟩, where P_M and R_M are defined as follows:

    P_M(i_M, a_m, j_M) = P_m(i_{N_m⁻(P)}, i_m, a_m, j_m) ∏_{k ∈ O_m⁻} P_k(i_{N_k⁻(P)}, i_k, π_k(i_{E_k}), j_k) ∏_{k ∈ M \ E_m} P_k(i_{N_k⁻(P)}, i_k, π_k(i_M), j_k),
    R_M(i_M) = R_m(i_{N_m⁻(R)}, i_m).                                          (3)

Basically, we have now constructed two fully observable MDPs: MDP_1, defined on S_{E_m}, and MDP_2, defined on S_M, where MDP_1 is essentially a projection of MDP_2 onto S_{E_m}. We need to show that no solution to MDP_2 can have a higher value than the optimal solution to MDP_1. (The proof does not rely on the actual type of optimization criterion used by each agent, and holds for any criterion that is a function only of the agents' trajectories.)

Let us refer to the optimal solution to MDP_1 as π_m^1. Suppose there exists a solution π_m^2 to MDP_2 that has a higher value than π_m^1. The policy π_m^2 defines some stochastic trajectory for the system over the state space S_M. Let us label the distribution over the state space at time t as ρ(i_M, t). It can be shown that, under our assumptions, we can always construct a non-stationary policy π_m^1(t) : S_{E_m} → A_m for MDP_1 that yields the same distribution over the state space S_{E_m} as the one produced by π_m^2. Thus, there exists a non-stationary solution to MDP_1 that has a higher payoff than π_m^1, which is a contradiction, since we assumed that π_m^1 was optimal for MDP_1.

We have therefore shown that, given that the policies of all ancestors of m depend only on their own states and the states of their ancestors, there always exists a policy that maps the state space of m and its ancestors (S_{E_m}) to m's actions (A_m) and that is at least as good as any policy that maps the joint space of all agents (S_M) to m's actions. Then, by using induction, we can expand this statement to all agents (for acyclic graphs we use the root nodes as the base case, and for cyclic graphs we use agents that do not have any ancestors that are not simultaneously their descendants).

The point of the above proposition is that, in situations where each agent maximizes its own utility, the optimal actions of each agent do not have to depend on the states of all other agents, but rather only on its own state and the states of its ancestors. In contrast to the conclusions of Section 4, this result is more encouraging. For example, for dependency graphs that are trees (typical of authority-driven organizational structures), the number of ancestors of any agent is bounded by the depth of the tree, which, for balanced trees, is logarithmic in the number of agents. Therefore, if each agent maximizes its own welfare, the size of its policy will be exponential in the depth of the tree, but only linear in the number of agents.

5.1. Acyclic Dependency Graphs

Thus far, we have shown that problems where agents optimize their own welfare can allow for more compact policy representations.
We now describe an algorithm that exploits the compactness of the problem representation to more efficiently solve such policy-optimization problems for domains with acyclic dependency graphs. It is a distributed algorithm in which the agents exchange information, and each one solves its own policy-optimization problem. The algorithm is very straightforward and works as follows. First, the root nodes of the graph (the ones with no parents) compute their optimal policies, which are simply mappings of their own states to their own actions. Once a root agent computes a policy that maximizes its welfare, it sends the policy to all of its children. Each child m waits to receive the policies π_k of all of its ancestors, then forms an MDP on the state space of itself and its ancestors as in (eq. 2). It then solves this MDP ⟨S_{E_m}, A_m, P_{E_m}, R_{E_m}⟩ to produce a policy π_m : S_{E_m} → A_m, at which point it sends its policy and the policies of its ancestors to its children. The process repeats until all agents have computed their optimal policies. Essentially, this algorithm performs, in a distributed manner, a topological sort of the dependency graph, and computes a policy for every agent.
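The following Python sketch illustrates this procedure in a centralized, sequential form (processing agents in topological order rather than by message passing). It is our own illustration, not the authors' implementation: solve_local_mdp stands in for any standard MDP solver (e.g., value iteration) applied to the MDP of (eq. 2), and build_local_mdp for the construction of that MDP from the agent's model and its ancestors' fixed policies; both are hypothetical callables supplied by the user.

```python
from graphlib import TopologicalSorter

def solve_acyclic_graphical_mmdp(agents, parents, build_local_mdp, solve_local_mdp):
    """Compute a policy for every agent in an acyclic dependency graph.

    agents:  iterable of agent ids
    parents: dict mapping agent m to the set of m's (transition- and
             reward-related) parents N_m^-
    build_local_mdp(m, ancestor_policies): constructs the MDP of (eq. 2) on the
             state space of m and its ancestors, given their fixed policies
    solve_local_mdp(mdp): returns an optimal policy pi_m : S_{E_m} -> A_m
    """
    policies = {}   # pi_m for every agent processed so far
    ancestors = {}  # E_m without m: all agents whose policies m needs

    # Process agents so that every agent is handled after all of its parents;
    # the distributed version achieves the same order via message passing.
    order = TopologicalSorter({m: set(parents.get(m, ())) for m in agents}).static_order()
    for m in order:
        ancestors[m] = set()
        for p in parents.get(m, ()):
            ancestors[m] |= {p} | ancestors[p]
        ancestor_policies = {k: policies[k] for k in ancestors[m]}
        policies[m] = solve_local_mdp(build_local_mdp(m, ancestor_policies))
    return policies
```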

5.2. Cyclic Dependency Graphs

We now turn our attention to the case of dependency graphs with cycles. Note that the complexity result of Proposition 3 still applies, because no assumptions about the cyclic or acyclic nature of dependency graphs were made in the statement or proof of the proposition. Thus, the minimal domain of an agent's policy is still the set of its ancestors.

The problem is, however, that the solution algorithm of the previous section is inappropriate for cyclic graphs, because it will deadlock on agents that are part of a cycle: these agents will be waiting to receive policies from each other. Indeed, when self-interested agents mutually affect each other, it is not clear how they should go about constructing their policies. Moreover, in general, for such agents there might not even exist a set of stationary deterministic policies that are in equilibrium; i.e., since the agents mutually affect each other, the best responses of agents to each other's policies might not be in equilibrium. A careful analysis of this case falls in the realm of Markov games and is beyond the scope of this paper. However, if we assume that there exists an equilibrium in stationary deterministic policies, and that the agents in a cycle have some black-box way of constructing their policies, we can formulate an algorithm for computing optimal policies by modifying the algorithm from the previous section as follows. The agents begin by finding the largest cycle they are a part of; then, after receiving policies from those of their parents who are not also their descendants, the agents proceed to devise an optimal joint policy for their cycle, which they then pass to their children.

6. Additive Rewards

In our earlier analysis, the reward function R_m of an agent could depend in an arbitrary way on the current states of the agent and its parents (eq. 1). In fact, this is why agents, in general, needed to synchronize their states with the states of their parents (and children, in the social welfare case), which, in turn, was why the effects of reward-related dependencies propagated just as the transition-related ones did. In this section, we consider a subclass of reward functions whose effects remain local. Namely, we focus on additively separable reward functions:

    R_m(i_{N_m⁻(R)}, i_m) = r_{mm}(i_m) + Σ_{k ∈ N_m⁻(R)} r_{km}(i_k),         (4)

where r_{km} is a function (r_{km} : S_k → ℝ) that specifies the contribution of agent k to m's reward. In order for all of our following results to hold, these functions have to be subject to the following condition:

    r_{km}(i_k) = l_{km}(r_{kk}(i_k)),                                         (5)

where l_{km} is a positive linear function (l_{km}(x) = αx + β, α > 0, β ≥ 0). This condition implies that agents' preferences over each other's states are positively (and linearly) correlated, i.e., when an agent increases its local reward, its contribution to the rewards of its reward-related children also increases linearly.

Furthermore, the results of this section are only valid under certain assumptions about the optimization criteria the agents use. Let us say that if an agent receives a history of rewards H(r) = {r(t)} = {r(0), r(1), ...}, its payoff is U(H(r)) = U(r(0), r(1), ...). Then, in order for our results to hold, U has to be linear additive:

    U(H(r_1 + r_2)) = U(H(r_1)) + U(H(r_2)).                                   (6)

Notice that this assumption holds for the commonly used risk-neutral MDP optimization criteria, such as expected total reward, expected total discounted reward, and average per-step reward, and is, therefore, not greatly limiting.
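As a concrete, entirely hypothetical illustration of conditions (4) and (5), suppose agent 2 is a reward-related parent of agent 1. The Python sketch below builds agent 1's reward from a local term r_11 and a contribution r_21 that is a positive linear transformation of agent 2's own local reward r_22, as (eq. 5) requires; the state labels and numeric values are ours, chosen only to make the conditions concrete.

```python
# Hypothetical local reward tables (states are just labels; values are made up).
r_11 = {"idle": 0.0, "work": 2.0}      # agent 1's own contribution r_11(i_1)
r_22 = {"slow": 1.0, "fast": 3.0}      # agent 2's own contribution r_22(i_2)

# Condition (5): agent 2's contribution to agent 1 is a positive linear
# function of agent 2's local reward, r_21 = l_21(r_22) with alpha > 0, beta >= 0.
alpha, beta = 0.5, 1.0
def r_21(i_2):
    return alpha * r_22[i_2] + beta

# Additively-separable reward of agent 1, as in (eq. 4):
def R_1(i_2, i_1):
    return r_11[i_1] + r_21(i_2)

print(R_1("fast", "work"))  # 2.0 + (0.5 * 3.0 + 1.0) = 4.5
```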
In the rest of this section, for simplicity, we focus on problems with two agents; more specifically, on two interesting special cases, shown in Figures 1b and 1c. However, the results can be generalized to problems with multiple agents and arbitrary dependency graphs. First of all, let us note that both of these problems have cyclic dependency graphs. Therefore, if the reward functions of the agents were not additively separable, then, per our earlier results of Section 5, there would be no guarantee that there exists an equilibrium in stationary deterministic policies. However, as we show below, our assumption about the additivity of the reward functions changes that and ensures that an equilibrium always exists.

Let us consider the case in Figure 1b. Clearly, the policy of neither agent affects the transition function of the other. Thus, given our assumptions about the additivity of rewards and utility functions, it is easy to see that the problem of maximizing the payoff is separable for each agent. For example, for agent 1 we have:

    max_{π_1, π_2} U_1(H(R_1)) = max_{π_1, π_2} U_1(H(r_{11} + r_{21})) = max_{π_1} U_1(H(r_{11})) + max_{π_2} U_1(H(r_{21})).    (7)

Thus, regardless of what policy agent 2 chooses, agent 1 should adopt a policy that maximizes the first term in (eq. 7). In game-theoretic terms, each of the agents has a (weakly) dominant strategy, and will adopt that strategy regardless of what the other agent does. This is what guarantees the above-mentioned equilibrium. Also notice that this result does not rely on reward linearity (eq. 5) and holds for any additively separable (eq. 4) reward functions.

Now that we have demonstrated that, for each agent, it suffices to optimize a function of only that agent's own states and actions, it is clear that each agent can construct its optimal policy independently. Indeed, each agent has to solve a standard MDP on its own state and action space with a slightly modified reward function, R_m(i_m) = r_{mm}(i_m), which differs from the original reward function (eq. 4) in that it ignores the contribution of m's parents to its reward.

Let us now analyze the case in Figure 1c, where the state of agent 1 affects the transition probabilities of agent 2, and the state of agent 2 affects the rewards of agent 1. Again,

without the assumption that rewards are additive, this cycle would have caused the policies of both agents to depend on the cross product of their state spaces, S_1 × S_2, and, furthermore, the existence of equilibria in stationary deterministic policies between self-interested agents would not be guaranteed. However, when rewards are additive, the problem is simpler. Indeed, due to our additivity assumptions, we can write the optimization problems of the two agents as:

    max_{π_1, π_2} U_1(...) = max_{π_1} U_1(H(r_{11})) + max_{π_1, π_2} U_1(H(r_{21})),
    max_{π_1, π_2} U_2(...) = max_{π_1} U_2(H(r_{12})) + max_{π_1, π_2} U_2(H(r_{22})).                                         (8)

Notice that here the problems are no longer separable (as in the previous case), so neither agent is guaranteed to have a dominant strategy. However, if we make use of the assumption that the rewards are positively and linearly correlated (eq. 5), we can show that there always exists an equilibrium in stationary deterministic policies. This is due to the fact that a positive linear transformation of the reward function does not change the optimal policy (we show this for discounted MDPs, but the statement holds more generally):

Observation 2. Consider two MDPs, Λ = ⟨S, A, R, P⟩ and Λ' = ⟨S, A, R', P⟩, where R'(s) = αR(s) + β, α > 0 and β ≥ 0. Then, a policy π is optimal for Λ under the total expected discounted reward optimization criterion iff it is optimal for Λ'.

Proof (Sketch): It is easy to see that the linear transformation R'(i) = αR(i) + β of the reward function leads to a linear transformation of the Q function: Q'(i, a) = αQ(i, a) + β(1 − γ)⁻¹, where γ is the discount factor. Indeed, the multiplicative factor α just changes the scale of all rewards, and the additive factor β simply produces an extra discounted sequence of rewards that sums to β(1 − γ)⁻¹ over an infinite horizon. Then, since the optimal policy is π(i) = arg max_a Q'(i, a) = arg max_a [αQ(i, a) + β(1 − γ)⁻¹] = arg max_a Q(i, a), a policy π is optimal for Λ iff it is optimal for Λ'.

Observation 2 implies that, for any policy π_1, a policy π_2 that maximizes the second term of U_1 in (eq. 8) will simultaneously be maximizing (given π_1) the second term of U_2 in (eq. 8). In other words, given any π_1, both agents will agree on the choice of π_2. Therefore, agent 1 can find the pair π_1, π_2 that maximizes its payoff U_1 and adopt that π_1. Then, agent 2 will adopt the corresponding π_2, since deviating from it cannot increase its utility.

To sum up, when rewards are additively separable (eq. 4) and satisfy (eq. 5), for the purposes of determining the minimal domain of agents' policies (in two-agent problems), we can ignore reward-related edges in dependency graphs. Furthermore, for graphs where there are no cycles with transition-related edges, the agents can formulate their optimal policies via algorithms similar to the one described in Section 5.1, and these policies will be in equilibrium.

7. Conclusions

We have analyzed the use of a particular compact, graphical representation for a class of multi-agent MDPs with local, asymmetric influences between agents. We have shown that, generally, because the effects of these influences propagate with time, the compactness of the representation is not fully preserved in the solution. We have shown this for multi-agent problems with the social welfare optimization criterion, which are equivalent to single-agent problems, and for which similar results are known. We have also analyzed problems with self-interested agents, and have shown the complexity of solutions to be less prohibitive in some cases (acyclic dependency graphs).
We have demonstrated that, under further restrictions on agents' effects on each other (positive-linear, additively-separable rewards), locality is preserved to a greater extent: equilibrium sets of stationary deterministic policies for self-interested agents always exist, even in some classes of problems with reward-related cyclic relationships between agents. Our future work will combine the graphical representation of multi-agent MDPs with other forms of problem factorization, including constrained multi-agent MDPs [4].

References

[1] C. Boutilier. Sequential optimality and coordination in multiagent systems. In Proceedings of IJCAI-99, 1999.
[2] C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1-94, 1999.
[3] C. Boutilier, R. Dearden, and M. Goldszmidt. Stochastic dynamic programming with factored representations. Artificial Intelligence, 121(1-2):49-107, 2000.
[4] D. A. Dolgov and E. H. Durfee. Optimal resource allocation and policy formulation in loosely-coupled Markov decision processes. In Proceedings of ICAPS-04, 2004. To appear.
[5] C. Guestrin, D. Koller, R. Parr, and S. Venkataraman. Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19, 2003.
[6] M. Kearns, M. L. Littman, and S. Singh. Graphical models for game theory. In Proceedings of UAI-01, 2001.
[7] D. Koller and B. Milch. Multi-agent influence diagrams for representing and solving games. In Proceedings of IJCAI-01, 2001.
[8] D. Koller and R. Parr. Computing factored value functions for policies in structured MDPs. In Proceedings of IJCAI-99, 1999.
[9] M. L. Puterman. Markov Decision Processes. John Wiley & Sons, New York, 1994.
[10] D. Pynadath and M. Tambe. Multiagent teamwork: Analyzing the optimality and complexity of key theories and models. In Proceedings of AAMAS-02, 2002.
[11] S. Singh and D. Cohn. How to dynamically merge Markov decision processes. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems 10. The MIT Press, 1998.


More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

Equilibria on the Day-Ahead Electricity Market

Equilibria on the Day-Ahead Electricity Market Equilibria on the Day-Ahead Electricity Market Margarida Carvalho INESC Porto, Portugal Faculdade de Ciências, Universidade do Porto, Portugal argarida.carvalho@dcc.fc.up.pt João Pedro Pedroso INESC Porto,

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

4 = (0.02) 3 13, = 0.25 because = 25. Simi-

4 = (0.02) 3 13, = 0.25 because = 25. Simi- Theore. Let b and be integers greater than. If = (. a a 2 a i ) b,then for any t N, in base (b + t), the fraction has the digital representation = (. a a 2 a i ) b+t, where a i = a i + tk i with k i =

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Finite fields. and we ve used it in various examples and homework problems. In these notes I will introduce more finite fields

Finite fields. and we ve used it in various examples and homework problems. In these notes I will introduce more finite fields Finite fields I talked in class about the field with two eleents F 2 = {, } and we ve used it in various eaples and hoework probles. In these notes I will introduce ore finite fields F p = {,,...,p } for

More information

MULTIPLAYER ROCK-PAPER-SCISSORS

MULTIPLAYER ROCK-PAPER-SCISSORS MULTIPLAYER ROCK-PAPER-SCISSORS CHARLOTTE ATEN Contents 1. Introduction 1 2. RPS Magas 3 3. Ites as a Function of Players and Vice Versa 5 4. Algebraic Properties of RPS Magas 6 References 6 1. Introduction

More information

Principles of Optimal Control Spring 2008

Principles of Optimal Control Spring 2008 MIT OpenCourseWare http://ocw.it.edu 16.323 Principles of Optial Control Spring 2008 For inforation about citing these aterials or our Ters of Use, visit: http://ocw.it.edu/ters. 16.323 Lecture 10 Singular

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance

Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance Decision-Theoretic Approach to Maxiizing Observation of Multiple Targets in Multi-Caera Surveillance Prabhu Natarajan, Trong Nghia Hoang, Kian Hsiang Low, and Mohan Kankanhalli Departent of Coputer Science,

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

I. Understand get a conceptual grasp of the problem

I. Understand get a conceptual grasp of the problem MASSACHUSETTS INSTITUTE OF TECHNOLOGY Departent o Physics Physics 81T Fall Ter 4 Class Proble 1: Solution Proble 1 A car is driving at a constant but unknown velocity,, on a straightaway A otorcycle is

More information

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS

EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS EMPIRICAL COMPLEXITY ANALYSIS OF A MILP-APPROACH FOR OPTIMIZATION OF HYBRID SYSTEMS Jochen Till, Sebastian Engell, Sebastian Panek, and Olaf Stursberg Process Control Lab (CT-AST), University of Dortund,

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

Error Exponents in Asynchronous Communication

Error Exponents in Asynchronous Communication IEEE International Syposiu on Inforation Theory Proceedings Error Exponents in Asynchronous Counication Da Wang EECS Dept., MIT Cabridge, MA, USA Eail: dawang@it.edu Venkat Chandar Lincoln Laboratory,

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies

Approximation in Stochastic Scheduling: The Power of LP-Based Priority Policies Approxiation in Stochastic Scheduling: The Power of -Based Priority Policies Rolf Möhring, Andreas Schulz, Marc Uetz Setting (A P p stoch, r E( w and (B P p stoch E( w We will assue that the processing

More information

Force and dynamics with a spring, analytic approach

Force and dynamics with a spring, analytic approach Force and dynaics with a spring, analytic approach It ay strie you as strange that the first force we will discuss will be that of a spring. It is not one of the four Universal forces and we don t use

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng EE59 Spring Parallel LSI AD Algoriths Lecture I interconnect odeling ethods Zhuo Feng. Z. Feng MTU EE59 So far we ve considered only tie doain analyses We ll soon see that it is soeties preferable to odel

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

1 Identical Parallel Machines

1 Identical Parallel Machines FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Outperforming the Competition in Multi-Unit Sealed Bid Auctions

Outperforming the Competition in Multi-Unit Sealed Bid Auctions Outperforing the Copetition in Multi-Unit Sealed Bid Auctions ABSTRACT Ioannis A. Vetsikas School of Electronics and Coputer Science University of Southapton Southapton SO17 1BJ, UK iv@ecs.soton.ac.uk

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

On the Existence of Pure Nash Equilibria in Weighted Congestion Games

On the Existence of Pure Nash Equilibria in Weighted Congestion Games MATHEMATICS OF OPERATIONS RESEARCH Vol. 37, No. 3, August 2012, pp. 419 436 ISSN 0364-765X (print) ISSN 1526-5471 (online) http://dx.doi.org/10.1287/oor.1120.0543 2012 INFORMS On the Existence of Pure

More information

12 Towards hydrodynamic equations J Nonlinear Dynamics II: Continuum Systems Lecture 12 Spring 2015

12 Towards hydrodynamic equations J Nonlinear Dynamics II: Continuum Systems Lecture 12 Spring 2015 18.354J Nonlinear Dynaics II: Continuu Systes Lecture 12 Spring 2015 12 Towards hydrodynaic equations The previous classes focussed on the continuu description of static (tie-independent) elastic systes.

More information

arxiv: v3 [cs.sy] 29 Dec 2014

arxiv: v3 [cs.sy] 29 Dec 2014 A QUADRATIC LOWER BOUND FOR THE CONVERGENCE RATE IN THE ONE-DIMENSIONAL HEGSELMANN-KRAUSE BOUNDED CONFIDENCE DYNAMICS EDVIN WEDIN AND PETER HEGARTY Abstract. Let f k n) be the axiu nuber of tie steps taken

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information

Multicollision Attacks on Some Generalized Sequential Hash Functions

Multicollision Attacks on Some Generalized Sequential Hash Functions Multicollision Attacks on Soe Generalized Sequential Hash Functions M. Nandi David R. Cheriton School of Coputer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada 2nandi@uwaterloo.ca D.

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex and Policy-Iteration Methods are Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010; revised Noveber 30, 2010 Abstract We prove that the classic

More information

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XV - Modeling of Discrete Event Systems - Stéphane Lafortune

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XV - Modeling of Discrete Event Systems - Stéphane Lafortune CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XV - Modeling of Discrete Event Systes - Stéphane Lafortune MODELING OF DISCRETE EVENT SYSTEMS Stéphane Lafortune The University of Michigan, USA Keywords:

More information

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact

Physically Based Modeling CS Notes Spring 1997 Particle Collision and Contact Physically Based Modeling CS 15-863 Notes Spring 1997 Particle Collision and Contact 1 Collisions with Springs Suppose we wanted to ipleent a particle siulator with a floor : a solid horizontal plane which

More information