arxiv: v5 [cs.lg] 28 Mar 2017

Size: px
Start display at page:

Download "arxiv: v5 [cs.lg] 28 Mar 2017"

Transcription

1 Equilibrium Propagation: Briging the Gap Between Energy-Base Moels an Backpropagation Benjamin Scellier an Yoshua Bengio * Université e Montréal, Montreal Institute for Learning Algorithms March 3, 217 arxiv: v5 [cs.lg] 28 Mar 217 Abstract We introuce Equilibrium Propagation, a learning framework for energy-base moels. It involves only one kin of neural computation, performe in both the first phase when the preiction is mae an the secon phase of training after the target or preiction error is reveale. Although this algorithm computes the graient of an objective function just like Backpropagation, it oes not nee a special computation or circuit for the secon phase, where errors are implicitly propagate. Equilibrium Propagation shares similarities with Contrastive Hebbian Learning an Contrastive Divergence while solving the theoretical issues of both algorithms: our algorithm computes the graient of a well efine objective function. Because the objective function is efine in terms of local perturbations, the secon phase of Equilibrium Propagation correspons to only nuging the preiction fixe point, or stationary istribution towars a configuration that reuces preiction error. In the case of a recurrent multi-layer supervise network, the output units are slightly nuge towars their target in the secon phase, an the perturbation introuce at the output layer propagates backwar in the hien layers. We show that the signal back-propagate uring this secon phase correspons to the propagation of error erivatives an encoes the graient of the objective function, when the synaptic upate correspons to a stanar form of spike-timing epenent plasticity. This work makes it more plausible that a mechanism similar to Backpropagation coul be implemente by brains, since leaky integrator neural computation performs both inference an error back-propagation in our moel. The only local ifference between the two phases is whether synaptic changes are allowe or not. We also show experimentally that multi-layer recurrently connecte networks with 1, 2 an 3 hien layers can be traine by Equilibrium Propagation on the permutation-invariant MNIST task. 1 Introuction The Backpropagation algorithm to train neural networks is consiere to be biologically implausible. Among other reasons, one major reason is that Backpropagation requires a special computational circuit an a special kin of computation in the secon phase of training. Here we introuce a new learning framework calle Equilibrium Propagation, which requires only one computational circuit an one type of computation for both phases of training. Just like Backpropagation applies to any ifferentiable computational graph an not just a regular multi-layer neural network, Equilibrium Propagation applies to a whole class of energy base moels the prototype of which is the continuous Hopfiel moel. In section 2, we revisit the continuous Hopfiel moel Hopfiel, 1984 an introuce Equilibrium Propagation as a new framework to train it. The moel is riven by an energy function whose minima correspon to preferre states of the moel. At preiction time, inputs are clampe an the network relaxes to a fixe point, corresponing to a local minimum of the energy function. The preiction is then rea out on the output units. This correspons to the first phase of the algorithm. In the secon phase of the training framework, when the target values for output units are observe, the outputs are nuge towars their targets an the network relaxes to a new but nearby fixe point which correspons to slightly smaller preiction error. The learning rule, which is prove to perform graient escent on the square error, is a kin of contrastive Hebbian learning rule in which we learn make more probable the secon-phase fixe point by reucing its energy an unlearn make less probable the first-phase fixe point by increasing its energy. However, our learning rule is not the usual contrastive Hebbian learning rule an it also iffers from Boltzmann machine learning rules, as iscusse in sections 4.1 an 4.2. During the secon phase, the perturbation cause at the outputs propagates across hien layers in the network. Because the propagation goes from outputs backwar in the network, it is better thought of as a back-propagation. It is *Y.B. is also a Senior Fellow of CIFAR 1

2 shown by Bengio an Fischer 215; Bengio et al. 217 that the early change of neural activities in the secon phase correspons to the propagation of error erivatives with respect to neural activities. Our contribution in this paper is to go beyon the early change of neural activities an to show that the secon phase also implements the back-propagation of error erivatives with respect to the synaptic weights, an that this upate correspons to a form of spike-timing epenent plasticity, using the results of Bengio et al In section 3, we present the general formulation of Equilibrium Propagation: a new machine learning framework for energy-base moels. This framework applies to a whole class of energy base moels, which is not limite to the continuous Hopfiel moel but encompasses arbitrary ynamics whose fixe points or stationary istributions correspon to minima of an energy function. In section 4, we compare our algorithm to the existing learning algorithms for energy-base moels. The recurrent back-propagation algorithm introuce by Pinea 1987; Almeia 1987 optimizes the same objective function as ours but it involves a ifferent kin of neural computation in the secon phase of training, which is not satisfying from a biological perspective. The contrastive Hebbian learning rule for continuous Hopfiel nets escribe by Movellan 199 suffers from theoretical problems: learning may eteriorate when the free phase an clampe phase lan in ifferent moes of the energy function. The Contrastive Divergence algorithm Hinton, 22 has theoretical issues too: the CD 1 upate rule may cycle inefinitely Sutskever an Tieleman, 21. The equivalence of back-propagation an contrastive Hebbian learning was shown by Xie an Seung 23 but at the cost of extra assumptions: their moel requires infinitesimal feeback weights an exponentially growing learning rates for remote layers. Equilibrium Propagation solves all these theoretical issues at once. Our algorithm computes the graient of a soun objective function that correspons to local perturbations. It can be realize with leaky integrator neural computation which performs both inference in the first phase an back-propagation of error erivatives in the secon phase. Furthermore, we o not nee extra hypotheses such as those require by Xie an Seung 23. Note that algorithms relate to ours base on infinitesimal perturbations at the outputs were also propose by O Reilly 1996; Hertz et al Finally, we show experimentally in section 5 that our moel is trainable. We train recurrent neural networks with 1, 2 an 3 hien layers on MNIST an we achieve.% training error. The generalization error lies between 2% an 3% epening on the architecture. The coe for the moel is available 1 for replicating an extening the experiments. 2 The Continuous Hopfiel Moel Revisite: Equilibrium Propagation as a More Biologically Plausible Backpropagation In this section we revisit the continuous Hopfiel moel Hopfiel, 1984 an introuce Equilibrium Propagation, a novel learning algorithm to train it. Equilibrium Propagation is similar in spirit to Backpropagation in the sense that it involves the propagation of a signal from output units to input units backwar in the network, uring the secon phase of training. Unlike Backpropagation, Equilibrium Propagation requires only one kin of neural computations for both phases of training, making it more biologically plausible than Backpropagation. However, several points still nee to be eluciate from a biological perspective. Perhaps the most important of them is that the moel escribe in this section requires symmetric weights, a question iscusse at the en of this paper. 2.1 A Kin of Hopfiel Energy Previous work Hinton an Sejnowski, 1986; Friston an Stephan, 27; Berkes et al., 211 has hypothesize that, given a state of sensory information, neurons are collectively performing inference: they are moving towars configurations that better explain the observe sensory ata. We can think of the neurons configuration as an explanation or interpretation for the observe sensory ata. In the energy-base moel presente here, that means that the units of the network graually move towars lower energy configurations that are more probable, given the sensory input an accoring to the current "moel of the worl" associate with the parameters of the moel. We enote by u the set of units of the network 2, an by θ = W, b the set of free parameters to be learne, which inclues the synaptic weights W ij an the neuron biases b i. The units are continuous-value an woul correspon to average voltage potential across time, spikes, an possibly neurons in the same minicolumn. Finally, ρ is a nonlinear activation function such that ρu i represents the firing rate of unit i. We consier the following energy function E, a kin of Hopfiel energy, first stuie by Bengio an Fischer 215; Bengio et al. 215a,b, 217: Eu := 1 u 2 i 1 W ijρu iρu j 2 2 i i j i For reasons of convenience, we use the same symbol u to mean both the set of units an the value of those units b iρu i. 1 2

3 Note that the network is recurrently connecte with symmetric connections, that is W ij = W ji. The algorithm presente here is applicable to any architecture so long as connections are symmetric, even a fully connecte network. However, to make the connection to backpropagation more obvious, we will consier more specifically a layere architecture with no skip-layer connections an no lateral connections within a layer Figure 1. In the supervise setting stuie here, the units of the network are split in three sets: the inputs x which are always clampe just like in other moels such as the conitional Boltzmann machine, the hien units h which may themselves be split in several layers an the output units y. We use the notation y for the targets, which shoul not be confuse with the outputs y. The set of all units in the network is u = {x, h, y}. As usual in the supervise learning scenario, the output units y aim to replicate their targets y. The iscrepancy between the output units y an the targets y is measure by the quaratic cost function C := 1 2 y y 2. 2 The cost function C also acts as an external potential energy for the output units, which can rive them towars their target. A novelty in our work, with respect to previously stuie energy-base moels, is that we introuce the total energy function F, which takes the form F := E + βc, 3 where β is a real-value scalar that controls whether the output y is pushe towars the target y or not, an by how much. We call β the influence parameter or clamping factor. The total energy F is the sum of two potential energies: the internal potential E that moels the interactions within the network, an the external potential βc that moels how the targets influence the output units. Contrary to Boltzmann Machines where the visible units are either free or fully clampe, here the real-value parameter β allows the output units to be weakly clampe. 2.2 The Neuronal Dynamics Figure 1: The input units x are always clampe. The state variable s inclues the hien units h an output units y. The targets are enote by y. The network is recurrently connecte with symmetric connections. Left. Equilibrium Propagation applies to any architecture, even a fully connecte network. Right. The connection with Backpropagation is more obvious when the network has a layere architecture. 3

4 We enote the state variable of the network by s = {h, y} which oes not inclue the input units x since they are always clampe. We assume that the time evolution of the state variable s is governe by the graient ynamics s t = F. 4 Unlike more conventional artificial neural networks, the moel stuie here is a continuous-time ynamical system escribe by the ifferential equation of motion Eq. 4. The total energy of the system ecreases as time progresses θ, β, x an y being fixe since F t = F s t = s 2 t. 5 The energy stops ecreasing when the network has reache a fixe point: F t = s t = F =. 6 The ifferential equation of motion Eq. 4 can be seen as a sum of two forces that act on the temporal erivative of s: s t = E β C. 7 The internal force inuce by the internal potential the Hopfiel energy, Eq. 1 on the i-th unit is E = ρ s i W ijρu j + b i s i, 8 i j i while the external force inuce by the external potential Eq. 2 on h i an y i is respectively β C h i = an β C y i = βy i y i. 9 The form of Eq. 8 is reminiscent of a leaky integrator neuron moel, in which neurons are seen as performing leaky temporal integration of their past inputs. Note that the hypothesis of symmetric connections W ij = W ji was use to erive Eq. 8. As iscusse in Bengio an Fischer 215, the factor ρ s i woul suggest that when a neuron is saturate firing at the maximal or minimal rate so that ρ s i, its state is not sensitive to external inputs, while the leaky term rives it out of the saturation regime, towars its resting value s i =. The form of Eq. 9 suggests that when β =, the output units are not sensitive to the outsie worl y. In this case we say that the network is in the free phase or first phase. On the contrary, when β >, the external force rives the output unit y i towars the target y i. In this case, we say that the network is in the weakly clampe phase or secon phase. Finally, a more likely ynamics woul inclue some form of noise. The notion of fixe point is then replace by that of stationary istribution. In Appenix C, we present a stochastic framework that naturally extens the analysis presente here. 2.3 Free Phase, Weakly Clampe Phase an Backpropagation of Errors In the first phase of training, the inputs are clampe an β = the output units are free. We call this phase the free phase an the state towars which the network converges is the free fixe point u. The preiction is rea out on the output units y at the fixe point. In the secon phase which we call weakly clampe phase, the influence parameter β is change to a small positive value β > an the novel term βc ae to the energy function Eq. 3 inuces a new external force that acts on the output units Eq. 9. This force moels the observation of y: it nuges the output units from their free fixe point value in the irection of their target. Since this force only acts on the output units, the hien units are initially at equilibrium at the beginning of the weakly clampe phase, but the perturbation cause at the output units will propagate in the hien units as time progresses. When the architecture is a multi-layere net Figure 1. Right, the perturbation at the output layer propagates backwars across the hien layers of the network. This propagation is thus better thought of as a backpropagation. The net eventually settles to a new fixe point corresponing to the new positive value of β which we call weakly clampe fixe point an enote by u β. Remarkably, the perturbation that is back-propagate uring the secon phase correspons to the propagation of error erivatives. It was first shown by Bengio an Fischer 215 that, starting from the free fixe point, the early changes of neural activities uring the weakly clampe phase cause by the output units moving towars their target 4

5 approximate some kin of error erivatives with respect to the layers activities. They consiere a regular multi-layer neural network with no skip-layer connections an no lateral connections within a layer. In this paper, we show that the weakly clampe phase also implements the back-propagation of error erivatives with respect to the synaptic weights. In the limit β, the upate rule W ij 1 ρ u β i ρ u β j ρ u i ρ u j 1 β gives rise to stochastic graient escent on J := 1 2 y y 2, where y is the state of the output units at the free fixe point. We will state an prove this theorem in a more general setting in section 3. In particular, this result hols for any architecture an not just a layere architecture Figure 1 like the one consiere by Bengio an Fischer 215. The learning rule Eq. 1 is a kin of contrastive Hebbian learning rule, somewhat similar to the one stuie by Movellan 199 an the Boltzmann machine learning rule. The ifferences with these algorithms will be iscusse in section 4. We call our learning algorithm Equilibrium Propagation. In this algorithm, leaky integrator neural computation as escribe in section 2.2, performs both inference in the free phase an error back-propagation in the weakly clampe phase. 2.4 Connection to Spike-Timing Depenent Plasticity Spike-Timing Depenent Plasticity STDP is believe to be a prominent form of synaptic change in neurons Markram an Sakmann, 1995; Gerstner et al., 1996, an see Markram et al. 212 for a review. The STDP observations relate the expecte change in synaptic weights to the timing ifference between postsynaptic spikes an presynaptic spikes. This is the result of experimental observations in biological neurons, but its role as part of a learning algorithm remains a topic where more exploration is neee. Here is an attempt in this irection. Experimental results by Bengio et al. 215a show that if the temporal erivative of the synaptic weight W ij satisfies W ij t ρu i uj t, 11 then one recovers the experimental observations by Bi an Poo 21 in biological neurons. Xie an Seung 2 have stuie the learning rule W ij ρu ρuj i. 12 t t Note that the two rules Eq. 11 an Eq. 12 are the same up to a factor ρ u j. An avantage of Eq. 12 is that it leas to a more natural view of the upate rule in the case of tie weights stuie here W ij = W ji. Inee, the upate shoul take into account the pressures from both the i to j an j to i synapses, so that the total upate uner constraint is W ij t ρu i ρuj t + ρu j ρui t = ρuiρuj. 13 t By integrating Eq. 13 on the path from the free fixe point u to the weakly clampe fixe point u β uring the secon phase, we get Wij W ij t = t t ρuiρujt = ρu iρu j = ρ u β i ρ u β j ρ u i ρ u j, 14 which is the same learning rule as Eq. 1 up to a factor 1/β. Therefore the upate rule Eq. 1 can be interprete as a continuous-time integration of Eq. 12, in the case of symmetric weights, on the path from u to u β uring the secon phase. We propose two possible interpretations for the synaptic plasticity in our moel. First view. In the first phase, a anti-hebbian upate occurs at the free fixe point W ij ρ u i ρ u j. In the secon phase, a Hebbian upate occurs at the weakly-clampe fixe point W ij +ρ u β i ρ u β j. Secon view. In the first phase, no synaptic upate occurs: W ij =. In the secon phase, when the neurons state move from the free fixe point u to the weakly-clampe fixe point u β, the synaptic weights follow the "tie version" of the continuous-time upate rule W ij ρui ρu j + ρu t t t j ρu i. t 5

6 3 A Machine Learning Framework for Energy Base Moels In this section we generalize the setting presente in section 2. We lay own the basis for a new machine learning framework for energy-base moels, in which Equilibrium Propagation plays a role analog to Backpropagation in computational graphs to compute the graient of an objective function. Just like the Multi Layer Perceptron is the prototype of computational graphs in which Backpropagation is applicable, the continuous Hopfiel moel presente in section 2 appears to be the prototype of moels which can be traine with Equilibrium Propagation. In our new machine learning framework, the central object is the total energy function F : all quantities of interest fixe points, cost function, objective function, graient formula can be efine or formulate irectly in terms of F. Besies, in our framework, the preiction or fixe point is efine implicitly in terms of the ata point an the parameters of the moel, rather than explicitly like in a computational graph. This implicit efinition makes applications on igital harware such as GPUs less practical as it nees long inference phases involving numerical optimization of the energy function. But we expect that this framework coul be very efficient if implemente by analog circuits, as suggeste by Hertz et al The framework presente in this section is eterministic, but a natural extension to the stochastic case is presente in Appenix C. 3.1 Training Objective In this section, we present the general framework while making sure to be consistent with the notations an terminology introuce in section 2. We enote by s the state variable of the network, v the state of the external worl i.e. the ata point being processe, an θ the set of free parameters to be learne. The variables s, v an θ are real-value vectors. The state variable s spontaneously moves towars low-energy configurations of an energy function Eθ, v, s. Besies that, a cost function Cθ, v, s measures how goo is a state is. The goal is to make low-energy configurations have low cost value. For fixe θ an v, we enote by s θ,v a local minimum of E, also calle fixe point, which correspons to the preiction from the moel: s θ,v arg min Eθ, v, s. 15 s Here we use the notation arg min to refer to the set of local minima. The objective function that we want to optimize is Jθ, v := C θ, v, s θ,v. 16 Note the istinction between the cost function C an the objective function J: the cost function is efine for any state s, whereas the objective function is the cost associate to the fixe point s θ,v. Now that the objective function has been introuce, we efine the training objective for a single ata point v as fin arg min θ Jθ, v. 17 A formula to compute the graient of J will be given in section 3.3 Theorem 1. Equivalently, the training objective can be reformulate as the following constraine optimization problem: fin subject to arg min Cθ, v, s 18 θ,s E θ, v, s =, 19 where the constraint E θ, v, s = is the fixe point conition. For completeness, a solution to this constraine optimization problem is given in Appenix B as well. Of course, both formulations of the training objective lea to the same graient upate on θ. Note that, since the cost Cθ, v, s may epen on θ, it can inclue a regularization term of the form λ Ω θ, where Ω θ is a L 1 or L 2 norm penalty for example. In section 2 we ha s = {h, y} for the state variable, v = {x, y} for the state of the outsie worl, θ = W, b for the set of learne parameters, an the energy function E an cost function C were of the form E θ, v, s = E θ, x, h, y an C θ, v, s = C y, y. 3.2 Total Energy Function Following section 2, we introuce the total energy function F θ, v, β, s := Eθ, v, s + β Cθ, v, s, 2 6

7 where β is a real-value scalar calle influence parameter. Then we exten the notion of fixe point for any value of β. The fixe point or energy minimum, enote by s β θ,v, is characterize by F θ, v, β, s β θ,v = 21 an 2 F θ, v, β, s β 2 θ,v is a symmetric positive efinite matrix. Uner mil regularity conitions on F, the implicit function theorem ensures that, for fixe v, the funtion θ, β s β θ,v is ifferentiable. 3.3 The Learning Algorithm: Equilibrium Propagation Theorem 1 Deterministic version. The graient of the objective function with respect to θ is given by the formula J 1 F θ, v = lim θ, v, β, s β θ,v F θ, v,, s θ,v, 22 β β or equivalently J C θ, v = θ, v, s 1 E θ,v + lim θ, v, s β θ,v E θ, v, s θ,v. 23 β β Theorem 1 will be prove in Appenix A. Note that the parameter β in Theorem 1 nee not be positive We only nee β. Using the terminology introuce in section 2, we call s θ,v the free fixe point, an s β θ,v the nuge fixe point or weakly-clampe fixe point in the case β >. Moreover, we call a free phase resp. nuge phase, or weakly-clampe phase a proceure that yiels a free fixe point resp. nuge fixe point, or weakly-clampe fixe point by minimizing the energy function F with respect to s, for β = resp. β. Theorem 1 suggests the following two-phase training proceure. Given a ata point v: 1. Run a free phase until the system settles to a free fixe point s θ,v an collect F θ, v,, s θ,v. 2. Run a nuge phase for some β such that β is "small", until the system settles to a nuge fixe point s β θ,v an collect F θ, v, β, s β θ,v. 3. Upate the parameter θ accoring to θ 1 β F θ, v, β, s β θ,v F θ, v,, s θ,v. 24 Consier the case β >. Starting from the free fixe point s θ,v which correspons to the preiction, a small change of the parameter β from the value β = to a value β > causes slight moifications in the interactions in the network. This small perturbation makes the network settle to a nearby weakly-clampe fixe point s β θ,v. Simultaneously, a kin of contrastive upate rule for θ is happening, in which the energy of the free fixe point is increase an the energy of the weakly-clampe fixe point is ecrease. We call this learning algorithm Equilibrium Propagation. Note that in the setting introuce in section 2.1 the total energy function Eq. 3 is such that F W ij = ρu iρu j, in agreement with Eq. 1. In the weakly clampe phase, the novel term 1 2 β y y 2 ae to the energy E with β > slightly attracts the output state y to the target y. Clearly, the weakly clampe fixe point is better than the free fixe point in terms of preiction error. The following proposition generalizes this property to the general setting. Proposition 2 Deterministic version. The erivative of the function β C θ, v, s β θ,v at β = is non-positive. Proposition 2 will also be prove in Appenix A. This proposition shows that, unless the free fixe point s θ,v is alreay optimal in terms of cost value, for β > small enough, the weakly-clampe fixe point s β θ,v achieves lower cost value than the free fixe point. Thus, a small perturbation ue to a small increment of β woul nuge the system towars a state that reuces the cost value. This property shes light on the upate rule Theorem 1, which can be seen as a kin of contrastive learning rule somehow similar to the Boltzmann machine learning rule where we learn make more probable the slightly better state s β θ,v by reucing its energy an unlearn make less probable the slightly worse state s θ,v by increasing its energy. However, our learning rule is ifferent from the Boltzmann machine learning rule an the contrastive Hebbian learning rule. The ifferences between these algorithms will be iscusse in section

8 Figure 2: Comparison between the traitional framework for Deep Learning an our framework. Left. In the traitional framework, the state of the network f θ v an the objective function Jθ, v are explicit functions of θ an v an are compute analytically. The graient of the objective function is also compute analytically thanks to the Backpropagation algorithm a.k.a automatic ifferentiation. Right. In our framework, the free fixe point s θ,v is an implicit function of θ an v an is compute numerically. The nuge fixe point s β θ,v an the graient of the objective function are also compute numerically, following our learning algorithm: Equilibrium Propagation. 3.4 Another View of the Framework In sections 3.1 an 3.2 as well as in section 2 we first efine the energy function E an the cost function C, an then we introuce the total energy F := E + βc. Here we propose an alternative view of the framework, where we reverse the orer in which things are efine. Given a total energy function F which moels all interactions within the network as well as the action of the external worl on the network, we can efine all quantities of interest in terms of F. Inee, we can efine the energy function E an the cost function C as Eθ, v, s := F θ, v,, s an Cθ, v, s := F θ, v,, s, 26 where F an F are evaluate with the argument β set to. Obviously the fixe points s θ,v an s β θ,v are irectly efine in terms of F, an so is the objective function Jθ, v := C θ, v, sθ,v. The learning algorithm Theorem 1 is also formulate in terms of F. 3 From this perspective, F contains all the information about the moel an can be seen as the central object of the framework. For instance, the cost C represents the marginal variation of the total energy F ue to a change of β. 3 The proof presente in Appenix A will show that E, C an F nee not satisfy Eq. 2 but only Eq

9 As a comparison, in the traitional framework for Deep Learning, a moel is represente by a ifferentiable computational graph in which each noe is efine as a function of its parents. The set of functions that efine the noes fully specifies the moel. The last noe of the computational graph represents the cost to be optimize, while the other noes represent the state of the layers of the network, as well as other intermeiate computations. In the framework for machine learning propose here the framework suite for Equilibrium Propagation, the analog of the set of functions that efine the noes in the computational graph is the total energy function F. 3.5 Backpropagation Vs Equilibrium Propagation In the traitional framework for Deep Learning Figure 2, left, each noe in the computational graph is an explicit ifferentiable function of its parents. The state of the network ŝ = f θ v an the objective function Jθ, v = C θ, v, f θ v are compute analytically, as functions of θ an v, in the forwar pass. The Backpropagation algorithm a.k.a automatic ifferentiation enables to compute the error erivatives analytically too, in the backwar pass. Therefore, the state of the network ŝ = f θ v forwar pass an the graient of the objective function J θ, v backwar pass can be compute efficiently an exactly. 4 In the framework for machine learning that we propose here Figure 2, right, the free fixe point ŝ = s θ,v is an implicit function of θ an v, characterize by E θ, v, s θ,v =. The free fixe point is compute numerically, in the free phase first phase. Similarly the nuge fixe point s β θ,v is an implicit function of θ, v an β, an is compute numerically in the nuge phase secon phase. Equilibrium Propagation estimates for the particular value of β chosen in the secon phase the graient of the objective function J θ, v base on these two fixe points. The requirement for numerical optimization in the first an secon phases make computations inefficient an approximate. The experiments in section 5 will show that the free phase is fairly long when performe with a iscrete-time computer simulation. However, we expect that the full potential of the propose framework coul be exploite on analog harware instea of igital harware, as suggeste by Hertz et al Relate Work In section 2.3 we have iscusse the relationship between Equilibrium Propagation an Backpropagation. In the weakly clampe phase, the change of the influence parameter β creates a perturbation at the output layer which propagates backwars in the hien layers. The error erivatives an the graient of the objective function are encoe by this perturbation. In this section we iscuss the connection between our work an other algorihms, starting with Contrastive Hebbian Learning. Equilibrium Propagation offers a new perspective on the relationship between Backpropagation in feeforwar nets an Contrastive Hebbian Learning in Hopfiel nets an Boltzmann machines Table 1. Backprop Equilibrium Prop Contrastive Hebbian Learning Almeia-Pineia First Phase Forwar Pass Free Phase Free Phase or Negative Phase Free Phase Secon Phase Backwar Pass Weakly Clampe Phase Clampe Phase or Positive Phase Recurrent Backprop Table 1: Corresponence of the phases for ifferent learning algorithms: Back-propagation, Equilibrium Propagation our algorithm, Contrastive Hebbian Learning an Boltzmann Machine Learning an Almeia-Pineia s Recurrent Back-Propagation 4.1 Link to Contrastive Hebbian Learning Despite the similarity between our learning rule an the Contrastive Hebbian Learning rule CHL for the continuous Hopfiel moel, there are important ifferences. First, recall that our learning rule is 1 W ij lim ρ u β i ρ u β j ρ u i ρ u j, 27 β β where u is the free fixe point an u β is the weakly clampe fixe point. The Contrastive Hebbian Learning rule is W ij ρ u i ρ u j ρ u i ρ u j, 28 4 Here we are not consiering numerical stability issues ue to the encoing of real numbers with finite precision. 9

10 where u is the fully clampe fixe point i.e. fixe point with fully clampe outputs. We choose the notation u for the fully clampe fixe point because it correspons to β + with the notations of our moel. Inee Eq. 9 shows that in the limit β +, the output unit y i moves infinitely fast towars y i, so y i is immeiately clampe to y i an is no longer sensitive to the internal force Eq. 8. Another way to see it is by consiering Eq. 3: as β +, the only value of y that gives finite energy is y. The objective functions that these two algorithms optimize also iffer. Recalling the form of the Hopfiel energy Eq. 1 an the cost function Eq. 2, Equilibrium Propagation computes the graient of J = 1 2 y y 2, 29 where y is the output state at the free phase fixe point u, while CHL computes the graient of J CHL = E u E u. 3 The objective function for CHL has theoretical problems: it may take negative values if the clampe phase an free phase stabilize in ifferent moes of the energy function, in which case the weight upate is inconsistent an learning usually eteriorates, as pointe out by Movellan 199. Our objective function oes not suffer from this problem, because it is efine in terms of local perturbations, an the implicit function theorem guarantees that the weakly clampe fixe point will be close to the free fixe point thus in the same moe of the energy function. We can also reformulate the learning rules an objective functions of these algorithms using the notations of the general setting section 3. For Equilibrium Propagation we have 1 θ lim β β F θ, v, β, s β θ,v F θ, v,, s θ,v an Jθ, v = F θ, v,, s θ,v. 31 As for Contrastive Hebbian Learning, one has F θ θ, v,, s F θ,v θ, v,, s θ,v an J CHLθ, v = F θ, v,, s θ,v F θ, v,, s θ,v, 32 where β = an β = are the values of β corresponing to free an fully clampe outputs respectively. Our learning algorithm is also more flexible because we are free to choose the cost function C as well as the energy funtion E, whereas the contrastive function that CHL optimizes is fully etermine by the energy function E. 4.2 Link to Boltzmann Machine Learning Again, the log-likelihoo that the Boltzmann machine optimizes is etermine by the Hopfiel energy E, whereas we have the freeom to choose the cost function in the framework for Equilibrium Propagation. As iscusse in Section 2.3, the secon phase of Equilibrium Propagation going from the free fixe point to the weakly clampe fixe point can be seen as a brief backpropagation phase with weakly clampe target outputs. By contrast, in the positive phase of the Boltzmann machine, the target is fully clampe, so the correct version of the Boltzmann machine learning rule requires two separate an inepenent phases Markov chains, making an analogy with backprop less obvious. Our algorithm is also similar in spirit to the CD algorithm Contrastive Divergence for Boltzmann machines. In our moel, we start from a free fixe point which requires a long relaxation in the free phase an then we run a short weakly clampe phase. In the CD algorithm, one starts from a positive equilibrium sample with the visible units clampe which requires a long positive phase Markov chain in the case of a general Boltzmann machine an then one runs a short negative phase. But there is an important ifference: our algorithm computes the correct graient of our objective function in the limit β, whereas the CD algorithm computes a biase estimator of the graient of the log-likelihoo. The CD 1 upate rule is provably not the graient of any objective function an may cycle inefinitely in some pathological cases Sutskever an Tieleman, 21. Finally, in the supervise setting presente in Section 2, a more subtle ifference with the Boltzmann machine is that the output state y in our moel is best thought of as being part of the latent state variable s. If we were to make an analogy with the Boltzmann machine, the visible units of the Boltzmann machine woul be v = {x, y}, while the hien units woul be s = {h, y}. In the Boltzmann machine, the state of the external worl is inferre irectly on the visible units because it is a probabilistic generative moel that maximizes the log-likelyhoo of the ata, whereas in our moel we make the choice to integrate in s special latent variables y that aim to match the target y. 1

11 4.3 Link to Recurrent Back-Propagation Directly connecte to our moel is the work by Pinea 1987; Almeia 1987 on recurrent back-propagation. They consier the same objective function as ours, but formulate the problem as a constraine optimization problem. In Appenix B we erive another proof for the learning rule Theorem 1 with the Lagrangian formalism for constraine optimization problems. The beginning of this proof is in essence the same as the one propose by Pinea 1987; Almeia 1987, but there is a major ifference when it comes to solving Eq. 75 for the costate variable λ. The metho propose by Pinea 1987; Almeia 1987 is to use Eq. 75 to compute λ by a fixe point iteration in a linearize form of the recurrent network. The computation of λ correspons to their secon phase, which they call recurrent back-propagation. However, this secon phase oes not follow the same kin of ynamics as the first phase the free phase because it uses a linearization of the neural activation rather than the fully non-linear activation. 5 From a biological plausibility point of view, having to use a ifferent kin of harware an computation for the two phases is not satisfying. By contrast, like the continuous Hopfiel net an the Boltzmann machine, our moel involves only one kin of neural computations for both phases. 4.4 The Moel by Xie & Seung Previous work on the back-propagation interpretation of contrastive Hebbian learning was one by Xie an Seung 23. The moel by Xie an Seung 23 is a moifie version of the Hopfiel moel. They consier the case of a layere MLP-like network, but their moel can be extene to a more general connectivity, as shown here. In essence, using the notations of our moel section 2, the energy function that they consier is E X&S u := 1 γ i u 2 i γ j W ijρu iρu j 2 i<j i i γ i b iρu i. 33 The ifference with Eq. 1 is that they introuce a parameter γ, assume to be small, that scales the strength of the connections. Their upate rule is the contrastive Hebbian learning rule which, for this particular energy function, takes the form EX&S W ij u E X&S u = γ j ρ u i ρ u j ρ u i ρ u j 34 W ij W ij for every pair of inices i, j such that i < j. Here u an u are the fully clampe fixe point an free fixe point respectively. Xie an Seung 23 show that in the regime γ this contrastive Hebbian learning rule is equivalent to back-propagation. At the free fixe point u, one has E X&S i u = for every unit s 6 i, which yiels, after iviing by γ i an rearranging the terms s i = ρ s i W ijρ u j + γ j i W ijρ u j + bi. 35 j<i In the limit γ, one gets s i ρ s i j<i Wijρu j + b i, so that the network almost behaves like a feeforwar net in this regime. As a comparison, recall that in our moel section 2 the energy function is the learning rule is 1 W ij lim β β j>i Eu := 1 u 2 i W ijρu iρu j 2 i i<j i E W ij u β b iρu i, 36 E u 1 = lim ρ u β i ρ u β j ρ u i ρ u j, 37 W ij β β an at the free fixe point, we have E i u = for every unit s i, which gives s i = ρ s i W ijρ u j + bi. 38 j i Here are the main ifferences between our moel an theirs. In our moel, the feeforwar an feeback connections are both strong. In their moel, the feeback weights are tiny compare to the feeforwar weights, which makes the 5 Reccurent Back-propagation correspons to Back-propagation Through Time BPTT when the network converges an remains at the fixe point for a large number of time steps. 6 Recall that in our notations, the state variable s oes not inclue the clampe inputs x, whereas u inclues x. 11

12 recurrent computations look almost feeforwar. In our secon phase, the outputs are weakly clampe. In their secon phase, they are fully clampe. The theory of our moel requires a unique learning rate for the weights, while in their moel the upate rule for W ij with i < j is scale by a factor γ j see Eq. 34. Since γ is small, the learning rates for the weights vary on many orers of magnitue in their moel. Intuitively, these multiple learning rates are require to compensate for the small feeback weights. 5 Implementation of the Moel an Experimental Results In this section, we provie experimental evience that our moel escribe in section 2 is trainable, by testing it on the classification task of MNIST igits LeCun an Cortes, The MNIST ataset of hanwritten igits consists of 6, training examples an 1, test examples. Each example x in the ataset is a gray-scale image of 28 by 28 pixels an comes with a label y {, 1,..., 9}. We use the same notation y for the one-hot encoing of the target, which is a 1-imensional vector. Recall that our moel is a recurrently connecte neural network with symmetric connections. Here, we train multilayere networks with 1, 2 an 3 hien layers, with no skip-layer connections an no lateral connections within layers. Although we believe that analog harware woul be more suite for our moel, here we propose an implementation on igital harware a GPU. We achieve.% training error. The generalization error lies between 2% an 3% epening on the architecture Figure 3. For each training example x, y in the ataset, training procees as follows: 1. Clamp x. 2. Run the free phase until the hien an output units settle to the free fixe point, an collect ρ u i ρ u j for every pair of units i, j. 3. Run the weakly clampe phase with a "small" β > until the hien an output units settle to the weakly clampe fixe point, an collect ρ u β i ρ u β j. 4. Upate each synapse W ij accoring to W ij 1 β ρ u β i ρ u β j ρ u i ρ u j. 39 The preiction is mae at the free fixe point u at the en of the first phase relaxation. The preicte value y pre is the inex of the output unit whose activation is maximal among the 1 output units: y pre := arg max i y i. 4 Note that no constraint is impose on the activations of the units of the output layer in our moel, unlike more traitional neural networks where a softmax output layer is use to constrain them to sum up to 1. Recall that the objective function that we minimize is the square of the ifference between our preiction an the one-hot encoing of the target value: J = 1 2 y y Finite Differences Implementation of the ifferential equation of motion. First we clamp x. Then the obvious way to implement Eq. 4 is to iscretize time into short time lapses of uration ɛ an to upate each hien an output unit s i accoring to s i s i ɛ F i θ, v, β, s. 42 This is simply one step of graient escent on the total energy F, with step size ɛ. For our experiments, we choose the har sigmoi activation function ρs i = s i 1, where enotes the max an the min. For this choice of ρ, since ρ s i = for s i <, it follows from Eq. 8 an Eq. 9 that if h i < then F h i θ, v, β, s = h i >. This force prevents the hien unit h i from going in the range of negative values. The same is true for the output units. Similarly, s i cannot reach values above 1. As a consequence s i must remain in the omain s i 1. Therefore, rather than the stanar graient escent Eq. 42, we will use a slightly ifferent upate rule for the state variable s: s i s i ɛ F θ, v, β, s i 12

13 Figure 3: Training an valiation error for neural networks with 1 hien layer of 5 units top left, 2 hien layers of 5 units top right, an 3 hien layers of 5 units bottom. The training error eventually ecreases to.% in all three cases. This little implementation etail turns out to be very important: if the i-th hien unit was in some state h i <, then Eq. 42 woul give the upate rule h i 1 ɛh i, which woul imply again h i < at the next time step assuming ɛ < 1. As a consequence h i woul remain in the negative range forever. Choice of the step size ɛ. We fin experimentally that the choice of ɛ has little influence as long as < ɛ < 1. What matters more is the total uration of the relaxation t = n iter ɛ where n iter is the number of iterations. In our experiments we choose ɛ =.5 to keep n iter = t/ɛ as small as possible so as to avoi extra unnecessary computations. Duration of the free phase relaxation. We fin experimentally that the number of iterations require in the free phase to reach the free fixe point is large an grows fast as the number of layers increases Table 2, which consierably slows own training. More experimental an theoretical investigation woul be neee to analyze the number of iterations require, but we leave that for future work. Duration of the weakly clampe phase. During the weakly clampe phase, we observe that the relaxation to the weakly clampe fixe point is not necessary. We only nee to initiate the movement of the units, an for that we use the following heuristic. Notice that the time constant of the integration process in the leaky integrator equation Eq. 8 is τ = 1. This time constant represents the time neee for a signal to propagate from a layer to the next one with "significant amplitue". So the time neee for the error signals to back-propagate in the network is Nτ = N, where N is the number of layers hiens an output of the network. Thus, we choose to perform N/ɛ iterations with step size ɛ = Implementation Details an Experimental Results To tackle the problem of the long free phase relaxation an spee-up the simulations, we use persistent particles for the latent variables to re-use the previous fixe point configuration for a particular example as a starting point for the next free phase relaxation on that example. This means that for each training example in the ataset, we store the state of the hien layers at the en of the free phase, an we use this to initialize the state of the network at the next epoch. This metho is similar in spirit to the PCD algorithm Persistent Contrastive Divergence for sampling from other energy-base moels like the Boltzmann machine Tieleman, 28. We fin that it helps regularize the network if we choose the sign of β at ranom in the secon phase. Note that the 13

14 weight upates remain consistent thanks to the factor 1/β in the upate rule W ij 1 ρ β Inee, the left-erivative an the right-erivative of the function β ρ u β i ρ u β j u β i ρ u β j ρ u i ρ u j. at the point β = coincie. Although the theory presente in this paper requires a unique learning rate for all synaptic weights, in our experiments we nee to choose ifferent learning rates for the weight matrices of ifferent layers to make the algorithm work. We o not have a clear explanation for this fact yet, but we believe that this is ue to the finite precision with which we approach the fixe points. Inee, the theory requires to be exactly at the fixe points, but in practice we minimize the energy function by numerical optimization, using Eq. 43. The precision with which we approach the fixe points epens on hyperparameters such as the step size ɛ an the number of iterations n iter. Let us enote by h, h 1,, h N the layers of the network where h = x an h N = y an by W k the weight matrix between the layers h k 1 an h k. We choose the learning rate α k for W k so that the quantities W k W k for k = 1,, N are approximately the same in average over training examples, where W k represents the weight change of W k after seeing a minibatch. The hyperparameters chosen for each moel are shown in Table 2 an the results are shown in Figure 3. We initialize the weights accoring to the Glorot-Bengio initialization Glorot an Bengio, 21. For efficiency of the experiments, we use minibatches of 2 training examples. Architecture Iterations Iterations ɛ β α 1 α 2 α 3 α 4 first phase secon phase Table 2: Hyperparameters. The learning rate ɛ is use for iterative inference Eq. 43. β is the value of the clamping factor in the secon phase. α k is the learning rate for upating the parameters in layer k. 6 Discussion, Looking Forwar From a biological perspective, a troubling issue in the Hopfiel moel is the requirement of symmetric weights between the units. Note that the units in our moel nee not correspon exactly to actual neurons in the brain it coul be groups of neurons in a cortical microcircuit, for example. It remains to be shown how a form of symmetry coul arise from the learning proceure itself for example from autoencoer-like unsupervise learning or if a ifferent formulation coul eliminate the symmetry requirement. Encouraging cues come from the observation that enoising autoencoers without tie weights often en up learning symmetric weights Vincent et al., 21. Another encouraging piece of evience, also linke to autoencoers, is the theoretical result from Arora et al. 215, showing that the symmetric solution minimizes the autoencoer reconstruction error between two successive layers of rectifying ReLU units, suggesting that symmetry may arise as the result of an aitional objective function making successive layers form an autoencoer. Also, Lillicrap et al. 214 show that the backpropagation algorithm for feeforwar nets also works when the feeback weights are ranom, an that in this case the feeforwar weight ten to align with the feeback weights. Another practical issue is that we woul like to reuce the negative impact of a lengthy relaxation to a fixe point, especially in the free phase. A possibility is explore by Bengio et al. 216 an was initially iscusse by Salakhutinov an Hinton 29 in the context of a stack of RBMs: by making each layer a goo autoencoer, it is possible to make this iterative inference converge quickly after an initial feeforwar phase, because the feeback paths agree with the states alreay compute in the feeforwar phase. Regaring synaptic plasticity, the propose upate formula can be contraste with theoretical synaptic learning rules which are base on the Hebbian prouct of pre- an postsynaptic activity, such as the BCM rule Bienenstock et al., 1982; Intrator an Cooper, The upate propose here is particular in that it involves the temporal erivative of the postsynaptic activity, rather than the actual level of postsynaptic activity. Whereas our work focuses on a rate moel of neurons, see Felman 212 for an overview of synaptic plasticity that goes beyon spike timing an firing rate, incluing synaptic cooperativity nearby synapses on the same enritic subtree an epolarization ue to multiple consecutive pairings or spatial integration across nearby locations on the enrite, as well as the effect of the synapse s istance to the soma. In aition, it woul be interesting to stuy upate rules which epen on the statistics of triplets or quaruplets of spikes timings, as in Froemke an Dan 22; Gjorgjievaa et al These effects are not consiere here but future work shoul consier them. Another question is that of time-varying input. Although this work makes back-propagation more plausible for the case of a static input, the brain is a recurrent network with time-varying inputs, an back-propagation through time seems even less plausible than static back-propagation. An encouraging irection is that propose by Ollivier et al. 215; 14

arxiv: v4 [cs.lg] 26 Sep 2016

arxiv: v4 [cs.lg] 26 Sep 2016 Equilibrium Propagation: Briging the Gap Between Energy-Base Moels an Backpropagation Benjamin Scellier an Yoshua Bengio Université e Montréal, Montreal Institute for Learning Algorithms September 27,

More information

Admin BACKPROPAGATION. Neural network. Neural network 11/3/16. Assignment 7. Assignment 8 Goals today. David Kauchak CS158 Fall 2016

Admin BACKPROPAGATION. Neural network. Neural network 11/3/16. Assignment 7. Assignment 8 Goals today. David Kauchak CS158 Fall 2016 Amin Assignment 7 Assignment 8 Goals toay BACKPROPAGATION Davi Kauchak CS58 Fall 206 Neural network Neural network inputs inputs some inputs are provie/ entere Iniviual perceptrons/ neurons Neural network

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13) Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013 Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

Euler equations for multiple integrals

Euler equations for multiple integrals Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................

More information

II. First variation of functionals

II. First variation of functionals II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

A Review of Multiple Try MCMC algorithms for Signal Processing

A Review of Multiple Try MCMC algorithms for Signal Processing A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications

More information

Markov Chains in Continuous Time

Markov Chains in Continuous Time Chapter 23 Markov Chains in Continuous Time Previously we looke at Markov chains, where the transitions betweenstatesoccurreatspecifietime- steps. That it, we mae time (a continuous variable) avance in

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

Situation awareness of power system based on static voltage security region

Situation awareness of power system based on static voltage security region The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

Switching Time Optimization in Discretized Hybrid Dynamical Systems

Switching Time Optimization in Discretized Hybrid Dynamical Systems Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set

More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

05 The Continuum Limit and the Wave Equation

05 The Continuum Limit and the Wave Equation Utah State University DigitalCommons@USU Founations of Wave Phenomena Physics, Department of 1-1-2004 05 The Continuum Limit an the Wave Equation Charles G. Torre Department of Physics, Utah State University,

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Neural Networks. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Neural Networks. Tobias Scheffer Universität Potsam Institut für Informatik Lehrstuhl Maschinelles Lernen Neural Networks Tobias Scheffer Overview Neural information processing. Fee-forwar networks. Training fee-forwar networks, back

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Calculus and optimization

Calculus and optimization Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function

More information

Credit Assignment: Beyond Backpropagation

Credit Assignment: Beyond Backpropagation Credit Assignment: Beyond Backpropagation Yoshua Bengio 11 December 2016 AutoDiff NIPS 2016 Workshop oo b s res P IT g, M e n i arn nlin Le ain o p ee em : D will r G PLU ters p cha k t, u o is Deep Learning

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Chapter 2 Lagrangian Modeling

Chapter 2 Lagrangian Modeling Chapter 2 Lagrangian Moeling The basic laws of physics are use to moel every system whether it is electrical, mechanical, hyraulic, or any other energy omain. In mechanics, Newton s laws of motion provie

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

Dynamics of Cortical Columns Self-Organization of Receptive Fields

Dynamics of Cortical Columns Self-Organization of Receptive Fields Preprint, LNCS 3696, 31 37, ICANN 2005 Dynamics of Cortical Columns Self-Organization of Receptive Fiels Jörg Lücke 1,2 an Jan D. Bouecke 1 1 Institut für Neuroinformatik, Ruhr-Universität, 44780 Bochum,

More information

Quantum mechanical approaches to the virial

Quantum mechanical approaches to the virial Quantum mechanical approaches to the virial S.LeBohec Department of Physics an Astronomy, University of Utah, Salt Lae City, UT 84112, USA Date: June 30 th 2015 In this note, we approach the virial from

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

A simple model for the small-strain behaviour of soils

A simple model for the small-strain behaviour of soils A simple moel for the small-strain behaviour of soils José Jorge Naer Department of Structural an Geotechnical ngineering, Polytechnic School, University of São Paulo 05508-900, São Paulo, Brazil, e-mail:

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

A. Incorrect! The letter t does not appear in the expression of the given integral

A. Incorrect! The letter t does not appear in the expression of the given integral AP Physics C - Problem Drill 1: The Funamental Theorem of Calculus Question No. 1 of 1 Instruction: (1) Rea the problem statement an answer choices carefully () Work the problems on paper as neee (3) Question

More information

ON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM

ON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM ON THE OPTIMALITY SYSTEM FOR A D EULER FLOW PROBLEM Eugene M. Cliff Matthias Heinkenschloss y Ajit R. Shenoy z Interisciplinary Center for Applie Mathematics Virginia Tech Blacksburg, Virginia 46 Abstract

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

arxiv: v3 [cs.lg] 3 Dec 2017

arxiv: v3 [cs.lg] 3 Dec 2017 Context-Aware Generative Aversarial Privacy Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, an Ram Rajagopal arxiv:1710.09549v3 [cs.lg] 3 Dec 2017 Abstract Preserving the utility of publishe atasets

More information

Role of parameters in the stochastic dynamics of a stick-slip oscillator

Role of parameters in the stochastic dynamics of a stick-slip oscillator Proceeing Series of the Brazilian Society of Applie an Computational Mathematics, v. 6, n. 1, 218. Trabalho apresentao no XXXVII CNMAC, S.J. os Campos - SP, 217. Proceeing Series of the Brazilian Society

More information

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE

TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

Linear Regression with Limited Observation

Linear Regression with Limited Observation Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear

More information

6. Friction and viscosity in gasses

6. Friction and viscosity in gasses IR2 6. Friction an viscosity in gasses 6.1 Introuction Similar to fluis, also for laminar flowing gases Newtons s friction law hols true (see experiment IR1). Using Newton s law the viscosity of air uner

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

An Iterative Incremental Learning Algorithm for Complex-Valued Hopfield Associative Memory

An Iterative Incremental Learning Algorithm for Complex-Valued Hopfield Associative Memory An Iterative Incremental Learning Algorithm for Complex-Value Hopfiel Associative Memory aoki Masuyama (B) an Chu Kiong Loo Faculty of Computer Science an Information Technology, University of Malaya,

More information

State observers and recursive filters in classical feedback control theory

State observers and recursive filters in classical feedback control theory State observers an recursive filters in classical feeback control theory State-feeback control example: secon-orer system Consier the riven secon-orer system q q q u x q x q x x x x Here u coul represent

More information

Harmonic Modelling of Thyristor Bridges using a Simplified Time Domain Method

Harmonic Modelling of Thyristor Bridges using a Simplified Time Domain Method 1 Harmonic Moelling of Thyristor Briges using a Simplifie Time Domain Metho P. W. Lehn, Senior Member IEEE, an G. Ebner Abstract The paper presents time omain methos for harmonic analysis of a 6-pulse

More information

Perturbation Analysis and Optimization of Stochastic Flow Networks

Perturbation Analysis and Optimization of Stochastic Flow Networks IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. XX, NO. Y, MMM 2004 1 Perturbation Analysis an Optimization of Stochastic Flow Networks Gang Sun, Christos G. Cassanras, Yorai Wari, Christos G. Panayiotou,

More information

arxiv: v1 [physics.class-ph] 20 Dec 2017

arxiv: v1 [physics.class-ph] 20 Dec 2017 arxiv:1712.07328v1 [physics.class-ph] 20 Dec 2017 Demystifying the constancy of the Ermakov-Lewis invariant for a time epenent oscillator T. Pamanabhan IUCAA, Post Bag 4, Ganeshkhin, Pune - 411 007, Inia.

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

Homework 2 Solutions EM, Mixture Models, PCA, Dualitys

Homework 2 Solutions EM, Mixture Models, PCA, Dualitys Homewor Solutions EM, Mixture Moels, PCA, Dualitys CMU 0-75: Machine Learning Fall 05 http://www.cs.cmu.eu/~bapoczos/classes/ml075_05fall/ OUT: Oct 5, 05 DUE: Oct 9, 05, 0:0 AM An EM algorithm for a Mixture

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

Dot trajectories in the superposition of random screens: analysis and synthesis

Dot trajectories in the superposition of random screens: analysis and synthesis 1472 J. Opt. Soc. Am. A/ Vol. 21, No. 8/ August 2004 Isaac Amiror Dot trajectories in the superposition of ranom screens: analysis an synthesis Isaac Amiror Laboratoire e Systèmes Périphériques, Ecole

More information

arxiv: v1 [cs.ds] 31 May 2017

arxiv: v1 [cs.ds] 31 May 2017 Succinct Partial Sums an Fenwick Trees Philip Bille, Aners Roy Christiansen, Nicola Prezza, an Freerik Rye Skjoljensen arxiv:1705.10987v1 [cs.ds] 31 May 2017 Technical University of Denmark, DTU Compute,

More information

there is no special reason why the value of y should be fixed at y = 0.3. Any y such that

there is no special reason why the value of y should be fixed at y = 0.3. Any y such that 25. More on bivariate functions: partial erivatives integrals Although we sai that the graph of photosynthesis versus temperature in Lecture 16 is like a hill, in the real worl hills are three-imensional

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

arxiv: v2 [math.pr] 27 Nov 2018

arxiv: v2 [math.pr] 27 Nov 2018 Range an spee of rotor wals on trees arxiv:15.57v [math.pr] 7 Nov 1 Wilfrie Huss an Ecaterina Sava-Huss November, 1 Abstract We prove a law of large numbers for the range of rotor wals with ranom initial

More information

How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:

More information

Generalization of the persistent random walk to dimensions greater than 1

Generalization of the persistent random walk to dimensions greater than 1 PHYSICAL REVIEW E VOLUME 58, NUMBER 6 DECEMBER 1998 Generalization of the persistent ranom walk to imensions greater than 1 Marián Boguñá, Josep M. Porrà, an Jaume Masoliver Departament e Física Fonamental,

More information

u t v t v t c a u t b a v t u t v t b a

u t v t v t c a u t b a v t u t v t b a Nonlinear Dynamical Systems In orer to iscuss nonlinear ynamical systems, we must first consier linear ynamical systems. Linear ynamical systems are just systems of linear equations like we have been stuying

More information

HOMEOSTASIS AND LEARNING THROUGH SPIKE-TIMING DEPENDENT PLASTICITY

HOMEOSTASIS AND LEARNING THROUGH SPIKE-TIMING DEPENDENT PLASTICITY HOMEOSTASIS AND LEARNING THROUGH SPIKE-TIMING DEPENDENT PLASTICITY L.F. Abbott (1) an Wulfram Gerstner (2) (1) Volen Center for Complex Systems an Department of Biology Braneis University, Waltham MA 2454-911

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

Comparative Approaches of Calculation of the Back Water Curves in a Trapezoidal Channel with Weak Slope

Comparative Approaches of Calculation of the Back Water Curves in a Trapezoidal Channel with Weak Slope Proceeings of the Worl Congress on Engineering Vol WCE, July 6-8,, Lonon, U.K. Comparative Approaches of Calculation of the Back Water Curves in a Trapezoial Channel with Weak Slope Fourar Ali, Chiremsel

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

arxiv: v2 [cs.ds] 11 May 2016

arxiv: v2 [cs.ds] 11 May 2016 Optimizing Star-Convex Functions Jasper C.H. Lee Paul Valiant arxiv:5.04466v2 [cs.ds] May 206 Department of Computer Science Brown University {jasperchlee,paul_valiant}@brown.eu May 3, 206 Abstract We

More information

TMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments

TMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments Problem F U L W D g m 3 2 s 2 0 0 0 0 2 kg 0 0 0 0 0 0 Table : Dimension matrix TMA 495 Matematisk moellering Exam Tuesay December 6, 2008 09:00 3:00 Problems an solution with aitional comments The necessary

More information

Text S1: Simulation models and detailed method for early warning signal calculation

Text S1: Simulation models and detailed method for early warning signal calculation 1 Text S1: Simulation moels an etaile metho for early warning signal calculation Steven J. Lae, Thilo Gross Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresen, Germany

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10 Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the

More information

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012 CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration

More information

ELEC3114 Control Systems 1

ELEC3114 Control Systems 1 ELEC34 Control Systems Linear Systems - Moelling - Some Issues Session 2, 2007 Introuction Linear systems may be represente in a number of ifferent ways. Figure shows the relationship between various representations.

More information