Lending Direction to Neural Networks. Richard S. Zemel, Christopher K. I. Williams, Michael C. Mozer.


Lending Direction to Neural Networks

Richard S. Zemel, CNL, The Salk Institute, North Torrey Pines Rd., La Jolla, CA
Christopher K. I. Williams, Department of Computer Science, University of Toronto, Toronto, ONT M5S 1A4, Canada
Michael C. Mozer, Department of Computer Science & Institute of Cognitive Science, University of Colorado, Boulder, CO

Abstract

We present a general formulation for a network of stochastic directional units. This formulation is an extension of the Boltzmann machine in which the units are not binary, but take on values on a cyclic range, between 0 and 2π radians. This measure is appropriate to many domains, representing cyclic or angular values, e.g., wind direction, days of the week, phases of the moon. The state of each unit in a Directional-Unit Boltzmann Machine (DUBM) is described by a complex variable, where the phase component specifies a direction; the weights are also complex variables. We associate a quadratic energy function, and corresponding probability, with each DUBM configuration. The conditional distribution of a unit's stochastic state is a circular version of the Gaussian probability distribution, known as the von Mises distribution. In a mean-field approximation to a stochastic DUBM, the phase component of a unit's state represents its mean direction, and the magnitude component specifies the degree of certainty associated with this direction. This combination of a value and a certainty provides additional representational power in a unit. We present a proof that the settling dynamics for a mean-field DUBM cause convergence to a free energy minimum. Finally, we describe a learning algorithm and simulations that demonstrate a mean-field DUBM's ability to learn interesting mappings.

To appear in: Neural Networks

1 Introduction

Many kinds of information can naturally be represented in terms of angular, or directional, variables. A circular range forms a suitable representation for explicitly directional information, such as wind direction, as well as for information where the underlying range is periodic, such as days of the week or months of the year. In computer vision, tangent fields and optic flow fields are represented as fields of oriented line segments, each of which can be described by a magnitude and direction. Directions can also be used to represent a set of symbolic labels, e.g., object label A at 0, and object label B at π/2 radians. We discuss below some advantages of representing symbolic labels with directional units. These and many other phenomena can be usefully encoded using a directional representation: a polar coordinate representation of complex values in which the phase parameter indicates a direction between 0 and 2π radians.

We have devised a general formulation of networks of stochastic directional units. This formulation is a novel generalization of a Boltzmann machine (Ackley, Hinton and Sejnowski, 1985) in which the units are not binary, but instead take on directional values between 0 and 2π radians. This paper presents the theoretical basis of a directional-unit Boltzmann machine (DUBM), including an energy function and underlying probabilistic model. We also present a simple derivation of a deterministic version of a DUBM and a convergence proof for its dynamics. Finally, we describe a generalization of the Boltzmann machine learning algorithm, and demonstrate that a DUBM is able to learn interesting mappings.

2 Stochastic Directional Unit Boltzmann Machines

2.1 The DUBM Energy Function

We formulate a network composed of directional units, a DUBM, in a manner analogous to a Boltzmann machine with binary-state units. In the Hopfield spin-system (Hopfield, 1982) and in a Boltzmann machine, the units are binary, and the energy is defined to be the sum of the pairwise products of the units' states and their connecting weights. This energy can be written in quadratic form: E(s) = -1/2 s'Cs, where the weight matrix C is a real symmetric matrix, and s is the vector of the units' binary states in a particular global configuration.

In a DUBM each stochastic directional unit takes on values on the unit circle. We associate with unit j a random variable Z_j; a particular state of unit j is described by a complex number with magnitude one and direction, or phase, θ_j: z_j = e^{iθ_j}. The weights of a DUBM also take on complex values. The weight from unit k to unit j is w_jk = b_jk e^{iφ_jk}. The activation function of a unit is a generalization of the dot product to the complex domain: the net input x_j = z · w_j* = Σ_k z_k w_jk*, where z is the vector of the units' complex states in a particular global configuration, and the asterisk indicates the complex conjugate operation. Figure 1 shows an example of this multiplication operation. We constrain the weight matrix W to be Hermitian, W = W*', where the diagonal elements of the matrix are zero. Note that if the components are real, then W' = W, which is a real symmetric matrix.

Figure 1: This figure shows the effect of using complex weights. Unit 1 is in state z_1 = e^{i0}, and the weight from unit 1 to 2 is w_21 = 2e^{-iπ/2}. Multiplying the state of unit 1 by the complex conjugate of this weight doubles the magnitude and rotates the phase by π/2 radians: x_2 = z_1 w_21* = 2e^{iπ/2}.

Thus, the Hermitian form is a natural generalization of weight symmetry to the complex domain. This definition of W leads to a Hermitian quadratic form that generalizes the real quadratic form of the Hopfield energy function:

E(z) = -1/2 z'Wz* = -1/2 Σ_{j,k} z_j z_k* w_jk    (1)

Noest (1988) independently proposed this energy function. It is similar to that used in Fradkin, Huberman, and Shenker's (1978) generalization of the XY model of statistical mechanics to allow arbitrary weight phases φ_jk, and to coupled oscillator models, e.g., Baldi and Meir (1990). We discuss below the relationship between DUBMs and these models.

2.2 The Circular Normal Distribution

We can define a probability distribution over the possible states of a stochastic network by using the Boltzmann factor. In a binary-unit Boltzmann machine, a binary random variable S_j describes the state of unit j, and it has the relative probability distribution:

p(S_j = 1) / p(S_j = 0) = e^{-βE(S_j=1)} / e^{-βE(S_j=0)} = e^{βx_j}

where x_j = Σ_k w_jk s_k is the net input to unit j, and β = 1/T. The probability that a unit is on is thus described by the familiar logistic function: p(S_j = 1) = (1 + e^{-βx_j})^{-1} = σ(βx_j).

We use a similar method to define a probability distribution over the states of directional units; it turns out that this distribution is well-known for directional data. In a DUBM, we can describe the energy as a function of the state of a particular unit j:

E(Z_j = z_j) = -1/2 [Σ_k z_j z_k* w_jk + Σ_k z_k z_j* w_kj] = -1/2 [z_j x_j* + (z_j x_j*)*]    (2)

where

x_j = Σ_k z_k w_jk*    (3)

We can consider x_j as the net input to unit j; it captures the relevant information in the local field of the unit. We denote the magnitude and phase of x_j as a_j and ψ_j respectively, such that

x_j = a_j e^{iψ_j}    (4)

Using Equations 2, 3, and 4, we find that E(Z_j = z_j) = -a_j cos(θ_j - ψ_j). Applying the Boltzmann factor, the probability that unit j is in a particular state is proportional to:

p(Z_j = z_j) ∝ e^{-βE(Z_j = z_j)} = e^{βa_j cos(θ_j - ψ_j)}    (5)

This probability distribution for a unit's state corresponds to a distribution known as the von Mises, or circular normal, distribution (Mardia, 1972), which in many ways acts on a circular range as the Gaussian distribution acts on a linear (i.e., non-circular) range.[1] Two parameters completely characterize this distribution: a mean direction μ ∈ (0, 2π] and a concentration parameter m > 0 that behaves like the reciprocal of the variance of a Gaussian distribution on a linear random variable. The probability density function of a circular normal random variable Z is:

p(θ; μ, m) = [1 / (2π I_0(m))] e^{m cos(θ - μ)}    (6)

The normalization factor I_0(m) is the modified Bessel function of the first kind and order zero.[2] From Equation 5, we see that if a unit adopts states according to its contribution to the system energy, it will be a circular normal variable with mean direction ψ_j and concentration parameter m_j = βa_j. These parameters are directly determined by the net input to the unit.

Figure 2 shows a circular normal density function for Z_j, the state of unit j. This figure also shows the expected value of its stochastic state, which we define as:

y_j = <Z_j> = r_j e^{iθ_j}    (7)

[1] The circular normal resembles the Gaussian in that (a) the maximum likelihood estimate of the mean direction of a random variable x is the sample mean if and only if x is a circular normal random variable; and (b) the circular normal distribution is the maximum entropy distribution if a mean direction and circular variance are fixed. Some of the other important properties of the Gaussian, such as the central limit theorem, apply to another circular distribution, the wrapped normal.

[2] An integral representation of this function is I_0(m) = (1/2π) ∫_0^{2π} e^{m cos φ} dφ. It can be computed by numerical routines.
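As a minimal sketch (the helper names are mine, not the paper's), the density of Equation 6 can be evaluated with I_0 computed from the integral representation in footnote 2, so no special-function library is needed:

```python
import numpy as np

# Circular normal (von Mises) density of Eq. 6:
#   p(theta; mu, m) = exp(m * cos(theta - mu)) / (2*pi*I0(m)).
# I0 is evaluated by the midpoint rule applied to footnote 2's integral.

def bessel_i0(m, n=4096):
    # I0(m) = (1/2pi) * integral_0^{2pi} exp(m*cos(phi)) dphi
    phi = (np.arange(n) + 0.5) * (2 * np.pi / n)
    return np.exp(m * np.cos(phi)).mean()

def von_mises_pdf(theta, mu, m):
    return np.exp(m * np.cos(theta - mu)) / (2 * np.pi * bessel_i0(m))

# Sanity checks: the density integrates to one over a full cycle, and
# its mode sits at the mean direction mu.
theta = (np.arange(4096) + 0.5) * (2 * np.pi / 4096)
p = von_mises_pdf(theta, mu=np.pi / 4, m=3.0)
mass = p.sum() * (2 * np.pi / 4096)
peak = theta[np.argmax(p)]
```

The midpoint rule is spectrally accurate here because the integrand is smooth and periodic, which is why a plain grid sum suffices in place of a Bessel routine.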

Figure 2: This figure shows a circular normal density function laid over a unit circle. The dots along the circle represent samples of the circular normal random variable Z_j. The expected direction of Z_j, θ_j, is π/4; its resultant length is r_j, which increases with m_j. If m_j is large, then most of the samples of Z_j will be concentrated on a small arc about θ_j, and r_j → 1. As m_j decreases, the sample spread increases, and r_j → 0.

where θ_j, the phase of y_j, is the mean direction and r_j, the magnitude of y_j, is the resultant length. For a circular normal random variable, its expected direction θ_j is simply the mean direction of the random variable:

θ_j = ψ_j    (8)

The calculation of r_j, the resultant length, is more involved. From Mardia (1972) we find that[3]:

r_j = I_1(m_j) / I_0(m_j)    (9)

From the figure we can see that when most of the samples of Z_j are concentrated on a small arc about the mean, r_j will approach length one. This corresponds to a large concentration parameter (m_j = βa_j), which means that in Equation 6, small deviations from the mean direction will produce much smaller exponents than that of the mean, and the probability distribution will be concentrated about the mean. Conversely, for small m_j, the distribution approaches the uniform distribution on the circle, and the resultant length falls toward zero. For a uniform distribution, r_j = 0. A plot of r_j against m_j is shown in Figure 3.

Note that the concentration parameter for a unit's circular normal density function is the product of a_j, the magnitude of its input, and β, the reciprocal of the system temperature.

[3] An integral representation of the modified Bessel function of the first kind and order k is I_k(m) = (1/π) ∫_0^π e^{m cos φ} cos(kφ) dφ. Note that I_1(m) = dI_0(m)/dm.
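The mapping of Equation 9 can be sketched numerically, again using only the integral representations from the footnotes (the function names are mine):

```python
import numpy as np

# Resultant length r = I1(m)/I0(m) of Eq. 9, with both Bessel functions
# evaluated by the midpoint rule applied to footnote 3's integral.

def bessel_i(k, m, n=4096):
    # I_k(m) = (1/2pi) * integral_0^{2pi} exp(m*cos(phi)) * cos(k*phi) dphi
    phi = (np.arange(n) + 0.5) * (2 * np.pi / n)
    return (np.exp(m * np.cos(phi)) * np.cos(k * phi)).mean()

def resultant_length(m):
    return bessel_i(1, m) / bessel_i(0, m)

# r falls toward 0 as the distribution approaches uniform, and rises
# toward 1 as the samples concentrate about the mean direction.
r_small = resultant_length(0.01)
r_large = resultant_length(20.0)
```

This reproduces the monotonic curve of Figure 3: near-zero resultant length for a near-uniform distribution and a value approaching one for a tightly concentrated one.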

Higher temperatures will thus have the effect of making this distribution more uniform, just as they do in a binary-unit Boltzmann machine.

Figure 3: This figure shows the function defined by Equation 9 that maps m_j, the concentration parameter associated with the circular normal distribution of unit j, to r_j, its resultant length.

3 Emergent Properties of a DUBM

3.1 Evidence accumulation and representing certainty

A network of directional units as defined above contains two important emergent properties. The first property is that the magnitude of the net input to unit j describes the extent to which its various inputs "agree". Intuitively, one can think of each component z_k w_jk* of the sum that comprises x_j as predicting a phase for unit j. When the phases of these components are equal, the magnitude of x_j, a_j, is maximized. If these phase predictions are far apart, then they will act to cancel each other out, and produce a small a_j. Given x_j, we can compute the expected value of the output of unit j (Equation 11). The expected direction of the unit roughly represents the weighted average of the phase predictions, while the resultant length, a number between zero and one, is a monotonic function of a_j and hence describes the agreement between the various predictions. Figure 4 illustrates this property with two examples showing the computation of the input and expected output of a unit in a simple 3-unit DUBM.

The key idea here is that the resultant length directly describes the degree of certainty in the expected direction of unit j. Thus, a DUBM naturally incorporates a representation of the system's confidence in a value. This ability to combine several sources of evidence, and not

only represent a value but also describe the certainty of that value, is an important property that may be useful in a variety of domains.

3.2 Rotation invariance

The second emergent property is that the DUBM energy is globally rotation-invariant: E is unaffected when the same rotation is applied to all units' states in the network. This can easily be shown from Equation 1. Rotating a unit state z_j by some phase ρ is equivalent to multiplying z_j by e^{iρ}. If we apply this to each unit, then each e^{iρ} will be cancelled out in the z_j z_k* term in Equation 1. For each DUBM configuration, there is thus an equivalence class of configurations which have the same energy.

In a similar way, we find that the magnitude of x_j is rotation-invariant. That is, when we obtain a new input x'_j by translating the phases of all of the other units by some phase ρ, the magnitude a_j is unaffected:

x'_j = Σ_k b_jk e^{i(θ_k + ρ - φ_jk)} = a_j e^{i(ψ_j + ρ)} = x_j e^{iρ}

This property underlies one of the key advantages of the representation: both the magnitude of a unit's state and the system energy depend on the relative rather than absolute phases of the units.

4 Deterministic Directional Unit Boltzmann Machines

4.1 Approximating the stochastic system

Calculating properties of a large stochastic system is time-consuming: a simulation of such a system must be allowed to run for a sufficient length of time to build up statistics of the probability distribution across the possible configurations. Just as in deterministic binary-unit Boltzmann machines (Peterson and Anderson, 1987; Hinton, 1989), we can greatly reduce the computational time required if we invoke the mean-field approximation, which states that once the system has reached equilibrium, the stochastic variables can be approximated by their mean values. In this approximation, the variables are treated as independent, and the system probability distribution is simply the product of the probability distributions for the individual units.[4]

Gislen, Peterson, and Soderberg (1992) originally proposed a mean-field theory for networks of directional (or "rotor") units, but only considered the case of real-valued weights. They derived the mean-field consistency equations by using the saddle-point method to approximate the partition function of the system, ∫ e^{-βE(ξ)} dξ, where ξ indexes possible configurations of the system. Our approach provides an alternative, perhaps more intuitive derivation, due to the use of the circular normal distribution.

[4] The tradeoff for the increase in computational speed of the mean-field approximation is a decrease in the number of states that can be represented, and a loss of information concerning the correlations between units' states.
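The rotation-invariance property of Section 3.2 is easy to check numerically. The following sketch (my own check, not from the paper) builds a random Hermitian weight matrix and verifies that a global phase rotation leaves both the energy of Equation 1 and every net-input magnitude a_j unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Random Hermitian weight matrix with zero self-weights.
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
W = (A + A.conj().T) / 2
np.fill_diagonal(W, 0)

def energy(z):
    # E(z) = -1/2 * sum_{j,k} z_j z_k^* w_jk  (Eq. 1; real for Hermitian W)
    return float(np.real(-0.5 * z @ (W @ np.conj(z))))

def net_input(z):
    # x_j = sum_k z_k w_jk^*  (Eq. 3)
    return np.conj(W) @ z

z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
z_rot = z * np.exp(1j * 1.234)       # rotate every unit's phase by rho

e0, e1 = energy(z), energy(z_rot)
a0, a1 = np.abs(net_input(z)), np.abs(net_input(z_rot))
```

Because the e^{iρ} factors cancel in every z_j z_k* term, e0 equals e1 and a0 equals a1 for any rotation ρ.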

[Figure 4, top diagram: w_31 = (5/2)e^{iπ/4}, w_32 = 2e^{i0}; z_1 = e^{i3π/4}, z_2 = e^{iπ/2}; x_3 = (5/2)e^{iπ/2} + 2e^{iπ/2} = (9/2)e^{iπ/2}; y_3 = <Z_3> = [I_1(9/2)/I_0(9/2)] e^{iπ/2} ≈ 0.9 e^{iπ/2}. Bottom diagram: z_1 = e^{-iπ/4}, z_2 = e^{iπ/2}; x_3 = (5/2)e^{-iπ/2} + 2e^{iπ/2} = (1/2)e^{-iπ/2}; y_3 = <Z_3> = [I_1(1/2)/I_0(1/2)] e^{-iπ/2} ≈ 0.25 e^{-iπ/2}.]

Figure 4: These two diagrams graphically depict the computation of the input and expected output of a unit. Unit 3 is connected to units 1 and 2. Its input, x_3, is computed as described in Equation 3, while its expected value, y_3, can be computed directly from its input, as described in Equations 8 and 9. In the top diagram, the inputs from units 1 and 2 "agree", i.e., their contributions to x_3 have the same phase. The second diagram shows the effect of rotating the state of unit 1 by π: the phases of the two components of x_3 no longer agree, resulting in a smaller magnitude, and hence a smaller resultant length.
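The two cases of Figure 4 can be reproduced in a few lines (a sketch using the figure's weights and states):

```python
import numpy as np

# Unit 3 receives input from units 1 and 2 through
# w_31 = (5/2)e^{i pi/4} and w_32 = 2e^{i0}.
w31 = 2.5 * np.exp(1j * np.pi / 4)
w32 = 2.0 * np.exp(1j * 0)

def x3(z1, z2):
    # x_3 = z_1 w_31^* + z_2 w_32^*  (Eq. 3)
    return z1 * np.conj(w31) + z2 * np.conj(w32)

# Top diagram: both contributions predict phase pi/2, so they reinforce.
x_agree = x3(np.exp(1j * 3 * np.pi / 4), np.exp(1j * np.pi / 2))
# Bottom diagram: unit 1 rotated by pi, so the predictions nearly cancel.
x_disagree = x3(np.exp(-1j * np.pi / 4), np.exp(1j * np.pi / 2))
print(abs(x_agree), abs(x_disagree))  # 4.5 vs 0.5
```

The large magnitude in the first case and the small magnitude in the second feed Equation 9, giving unit 3 a confident versus a nearly uniform output distribution, exactly the evidence-accumulation behavior described above.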

We can directly describe these mean values based on the circular normal interpretation. We still denote the net input to a unit j as x_j:

x_j = Σ_k y_k w_jk* = a_j e^{iψ_j}    (10)

Once equilibrium has been reached, the state of unit j is y_j, the expected value of Z_j given the mean-field approximation. We have already shown that Z_j is a circular normal random variable with mean direction ψ_j and concentration parameter βa_j; thus from Equations 8 and 9:

y_j = r_j e^{iθ_j} = [I_1(βa_j) / I_0(βa_j)] e^{iψ_j}    (11)

In the stochastic as well as the deterministic system, units evolve to minimize the free energy, F = <E> - TH. We calculate H, the entropy of the system, using the circular normal distribution and the mean-field approximation. The entropy of a circular normal random variable Z is H(Z) = -mr + log(2π I_0(m)). Since under the mean-field approximation the units are considered independent, the entropy of the mean-field DUBM is simply the sum of the entropies of the individual units' states. The mean-field free energy is:

F^MF = -Σ_{j<k} r_j r_k b_jk cos(θ_j - θ_k + φ_jk) - T Σ_j [-βa_j r_j + log(2π I_0(βa_j))]    (12)

We can derive mean-field consistency equations for x_j and y_j by minimizing F^MF with respect to each variable independently, i.e., by finding where the partial derivatives of F^MF with respect to each of these variables equal zero. The resulting equations match the mean-field equations (Equations 10 and 11) derived directly from the circular normal probability density function. They also match the special case derived by Gislen et al. for real-valued weights.

4.2 Running a deterministic DUBM

We have implemented a DUBM using the mean-field approximation. The goal is to solve for a consistent set of x and y values; there are several possible methods to settle the network to this state. One is to perform asynchronous updates of Equations 10 and 11. We have selected an alternative method, performing synchronous updates of a discrete-time approximation of a set of differential equations based on the net input to each unit j. We update the x_j variables using the following differential equation:

dx_j/dt = -x_j + Σ_k y_k w_jk*    (13)

which has Equation 10 as its steady-state solution. In order to adapt x_j, we actually adapt its real and imaginary components: x_j^R = a_j cos ψ_j and x_j^I = a_j sin ψ_j. The steady-state solutions for these two quantities are equivalent to, and

follow directly from, Equation 10, and their update equations follow directly from Equation 13:

dx_j^R/dt = -x_j^R + Σ_k Re(y_k w_jk*) = -x_j^R + Σ_k r_k b_jk cos(θ_k - φ_jk)

dx_j^I/dt = -x_j^I + Σ_k Im(y_k w_jk*) = -x_j^I + Σ_k r_k b_jk sin(θ_k - φ_jk)

We use synchronous updates of these differential equations. At each step, we update x_j^R and x_j^I:

x_j^R(t) = (1 - ε) x_j^R(t-1) + ε Σ_k r_k(t-1) b_jk cos(θ_k(t-1) - φ_jk)

x_j^I(t) = (1 - ε) x_j^I(t-1) + ε Σ_k r_k(t-1) b_jk sin(θ_k(t-1) - φ_jk)

where 0 < ε < 1 is a variable step-size parameter that is selected to encourage the system to approach a steady-state solution while avoiding oscillations. We then compute the magnitude and phase of the input to unit j: a_j(t) = sqrt[(x_j^R(t))^2 + (x_j^I(t))^2] and ψ_j(t) = arctan[x_j^I(t) / x_j^R(t)]. These are then used to directly compute the state r_j(t) and θ_j(t) of each unit j (see Equation 11). In the simulations, we typically set ε = 0.2. We also use simulated annealing to help find good minima of the free energy F. This is implemented using an annealing schedule which exponentially lowers the temperature until T = 1.

Just as for the Hopfield binary-state network, it can be shown that the free energy F^MF always decreases during the dynamical evolution described in Equation 13. The equilibrium solutions are free energy minima. The Appendix contains a convergence proof for F^MF, in the style of Hopfield (1984).

5 Learning in DUBMs

The units in a DUBM can be arranged in a variety of architectures. The appropriate method for determining weight values for the network depends on the particular class of network architecture. In a standard autoassociative network, containing a single set of interconnected units as in a Hopfield network, the weights can be set directly from the training patterns. If hidden units are required to perform a task, then an algorithm for learning the weights is required.

We have explored two learning algorithms. The first is a generalization of recurrent back-propagation (Pineda, 1987; Almeida, 1987) to a network of directional units. The complex representations in a DUBM complicate this algorithm, as it requires several matrix inversions. The second algorithm generalizes the Boltzmann machine training algorithm (Ackley, Hinton and Sejnowski, 1985; Peterson and Anderson, 1987) to these networks. This algorithm is considerably simpler; we thus focus on it here, and results obtained from it are described in the following section.
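The synchronous settling procedure of Section 4.2 can be sketched as follows (a minimal two-unit example of my own, at T = 1, with the initial resultant lengths assumed to be one and I_1/I_0 taken from the footnotes' integral representations):

```python
import numpy as np

def bessel_i(k, m, n=4096):
    # I_k(m) = (1/2pi) * integral_0^{2pi} exp(m*cos(phi)) * cos(k*phi) dphi
    phi = (np.arange(n) + 0.5) * (2 * np.pi / n)
    return (np.exp(m * np.cos(phi)) * np.cos(k * phi)).mean()

def settle(W, theta0, steps=60, eps=0.2):
    y = np.exp(1j * theta0)              # assumed start: unit resultant lengths
    x = np.conj(W) @ y                   # x_j = sum_k y_k w_jk^*  (Eq. 10)
    for _ in range(steps):
        x = (1 - eps) * x + eps * (np.conj(W) @ y)   # discrete-time Eq. 13
        r = np.array([bessel_i(1, a) / bessel_i(0, a) for a in np.abs(x)])
        y = r * np.exp(1j * np.angle(x))             # Eq. 11
    return y

# Two units coupled by w_12 = 3 e^{i pi/3}: the free energy term
# -r1 r2 b12 cos(theta1 - theta2 + phi12) is lowest when
# theta_1 - theta_2 = -pi/3, whatever the global rotation.
W = np.array([[0, 3 * np.exp(1j * np.pi / 3)],
              [3 * np.exp(-1j * np.pi / 3), 0]])
y = settle(W, np.array([0.3, 2.0]))
rel_phase = np.angle(y[0] * np.conj(y[1]))
```

Only the relative phase is pinned down; the rotation-invariance property leaves the global phase wherever the initial conditions put it.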

We assume that the DUBM contains two sets of units, visible and hidden, and concentrate on a particular type of architecture, in which the visible units are split into two sets, input and output. The network is trained using a set of patterns, each specifying values for the input units and target values for each of the output units.

We can regard the target output values as describing a desired set of probabilities. If we take the output units as representing a probability distribution across a set of directional values, then an appropriate error measure is the relative entropy of these probability distributions. This is the asymmetric divergence, or Kullback information measure, used in binary-unit Boltzmann machines. The derivation of the learning rule for a DUBM parallels that for the standard Boltzmann machine. Let ω and α index possible configurations of the output and input units, respectively. The goal is to make the network learn associations from α to ω such that the network's distribution P_{ω|α} resembles the desired distribution R_{ω|α} for each α. If the possible inputs α occur with probability p_α, then the error measure, G, is:

G = ∫∫ p_α R_{ω|α} log(R_{ω|α} / P_{ω|α}) dω dα
  = G_0 - ∫∫ p_α R_{ω|α} log P_{ω|α} dω dα

In the stochastic network, P_{ω|α} = e^{-βF_{ωα}} / e^{-βF_α}, where F_{ωα} is the minimum free energy when the input units are in configuration α and the output units are clamped to configuration ω, and F_α is the minimum free energy when only the input units are clamped. If we assume that the training set specifies the occurrence probabilities p_α, as well as the probabilities R_{ω|α} for the ω and α, then G simplifies to the following expression:

G = G_0 + β <F_{ωα} - F_α>

Thus, the objective reduces to minimizing the difference between F_{ωα} and F_α, averaged over the training set. For the mean-field DUBM, we use this same objective, where the free energies are now the mean-field free energies, as described in Section 4.1 (see Equation 12). We differentiate this objective function with respect to the magnitude and direction of the complex-valued weights to determine the weight update rule. The partial derivative of G with respect to a weight is the difference between the partials of the two mean-field free energies, F^MF_{ωα} and F^MF_α, with respect to the weight. On a given training case, the required derivatives are given by:

∂F^MF/∂b_jk = -r_j r_k cos(θ_j - θ_k + φ_jk)

∂F^MF/∂φ_jk = r_j r_k b_jk sin(θ_j - θ_k + φ_jk)

Note that the derivatives only involve the explicit derivatives of F^MF with respect to the weight parameters, because we have settled to equilibrium (Hinton, 1989; Baldi and Pineda, 1991). The learning algorithm uses these gradients to find weight values that will minimize the objective G over a training set.

6 Experimental Results

The focus of this paper has been on the theoretical ideas underlying a generalization of a Boltzmann machine to directional units. We present below some illustrative examples to show that an adaptive network of directional units can be used in a range of paradigms, including associative memory, input/output mappings, and pattern completion.

6.1 Simple autoassociative DUBMs

The first set of experiments considers a simple autoassociative DUBM, which contains no hidden units, and in which the units are fully connected. As in a standard Hopfield network, the weights are set directly from the training patterns; they equal the superposition of the outer products of the patterns:

w_jk = (1/N) Σ_{p=1}^{N} y_j^p (y_k^p)*    (14)

where p indexes the N training patterns. We have experimented with a number of different networks. Figure 5 shows a network containing 30 units that has stored 3 patterns. The weights shown are generated as described above; the figure also shows two examples of the network settling from a corrupted version of one of the stored patterns to a state near the pattern. These patterns thus form stable attractors, as the network can perform pattern completion and clean-up from noisy inputs. The rotation-invariance property of the energy function allows any rotated version of a training pattern to also act as an attractor.

We have run several experiments with simple autoassociative DUBMs. The empirical results parallel those for binary-unit autoassociative networks. Using an error criterion that measures the distance from an equilibrium state to the closest stored pattern, we find that the mean error rapidly increases as the number of stored patterns increases. For example, the performance of the network shown in Figure 5 rapidly degrades for more than 4 orthogonal patterns; the patterns themselves no longer act as fixed points, and many random initial states end in states far from any stored pattern. In addition, more orthogonal patterns can be stored than random patterns. See Noest (1988) for an analysis of the capacity of an autoassociative DUBM with sparse and asymmetric connections.

6.2 Learning input/output mappings

We have also used the mean-field DUBM learning algorithm to learn the weights in networks containing hidden units. Due to the presence of the hidden units, the network must settle via the synchronous updating procedure described in Section 4.2 on each training pattern, first with the input clamped, and then with both the input and target output clamped.

We have experimented with a task that is well-suited to a directional representation. There is a single-jointed robot arm, anchored at a point, as shown in Figure 6. The input consists of two angles: the angle between the first arm segment and the positive x-axis (θ),

[Figure 5 panels: Patterns; Example 1 (iterations 0, 10, 21); Example 2 (iterations 0, 10, 30); Weights.]

Figure 5: This figure shows the patterns and weights stored in an autoassociative network of 30 directional units. The area of each circle is proportional to the magnitude of the complex value, and the orientation of the notch indicates its phase. The weights are the outer product of the 3 patterns shown at the top. Each row shows the incoming weights to a particular unit. Note that the self-weights (along the diagonal) have zero magnitude, and the symmetry of the diagram about the diagonal reflects the Hermitian property of the weights. On the right are two examples of the network settling to a fixed point. The first example ends up near the first of the three stored patterns after 21 iterations (modulo a global rotation). The second example is considerably corrupted; it requires 30 iterations to approach the second stored pattern.
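The weight-setting rule of Equation 14, used for the network in Figure 5, can be sketched as (the helper name is mine):

```python
import numpy as np

# Autoassociative storage: superimpose the complex outer products of the
# N stored patterns (Eq. 14) and zero the self-weights.

def store_patterns(patterns):
    # patterns: (N, n) array of unit-magnitude complex states
    N = patterns.shape[0]
    w = patterns.T @ np.conj(patterns) / N   # w_jk = (1/N) sum_p y_j^p (y_k^p)*
    np.fill_diagonal(w, 0)
    return w

rng = np.random.default_rng(0)
n_units, n_patterns = 30, 3
patterns = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_patterns, n_units)))
w = store_patterns(patterns)
```

By construction the stored matrix is Hermitian with zero diagonal, matching the structure visible in the weight panel of Figure 5.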

[Figure 6: the arm, with segments A and B.]

Figure 6: A graphical display of a training case for the robot arm problem. The arm is anchored on the horizontal axis, and the lengths of its two segments, A and B, are fixed. The two angles, θ and φ, are given as input for each training case, and the target output for the net is the angle α.

and the angle between the two arm segments (φ). The two segments each have a fixed length, A and B; these are not explicitly given to the network. The output is the angle between the line connecting the two ends of the arm and the x-axis (α). This target angle is related in a complex, non-linear way to the input angles: the network must learn to approximate the following trigonometric relationship:

α = arctan[ (A sin θ - B sin(θ + φ)) / (A cos θ - B cos(θ + φ)) ]

With 500 training cases, a DUBM with 2 input units and 8 hidden units was able to learn the task so that it could accurately estimate α for novel patterns. The learning required 200 iterations of the conjugate gradient training algorithm. On each of 100 testing patterns, the resultant length of the output unit exceeded 0.85, and the mean error on the angle was less than 0.05 radians. The network can also learn the task with as few as 5 hidden units, with a concomitant decrease in learning speed. The compact nature of this network shows that the directional units form a natural, efficient representation for this problem.

6.3 Complex pattern completion

Our earlier work describes a large-scale DUBM that attacks a difficult problem in computer vision: image segmentation. In MAGIC (Mozer, Zemel and Behrmann, 1992; Mozer et al., 1992), directional values are used to represent alternative labels that can be assigned to image features. The goal of MAGIC is to learn to assign appropriate object labels to a set of image features (e.g., edge segments) based on a set of examples. The idea is that the features of a given object should have consistent phases, with each object taking on its own

phase. The units in the network are arranged into two layers, feature and hidden, and the computation proceeds by randomly initializing the phases of the units in the feature layer, and settling on a labeling through a relaxation procedure. The units in the hidden layer learn to detect spatially local configurations of the image features that are labeled in a consistent manner across the training examples. MAGIC successfully learns to segment novel scenes consisting of overlapping geometric objects.

The emergent DUBM properties described in Section 3 are essential to MAGIC's ability to perform this task. The evidence accumulation property allows units to act as feature detectors, expressing constraints in their complex-valued weight patterns. The complex weights are necessary in MAGIC, as the weights encode statistical regularities in the relationships between image features, e.g., that two features typically belong to the same object (i.e., have similar phase values) or to different objects (i.e., are out of phase). The fact that a unit's resultant length reflects the certainty in a phase label allows the system to decide which phase labels to use when updating labels of neighboring features: the initially random phases are ignored, while confident labels are propagated. Finally, the rotation-invariance property allows the system to assign labels to features in a manner consistent with the relationships described in the weights, where it is the relative rather than absolute phases of the units that are important.

7 Discussion

We have presented a general formulation for networks of directional units, called directional-unit Boltzmann machines, or DUBMs. This formulation is based on an underlying probabilistic model, and we have developed and experimented with a mean-field version of the stochastic system. A deterministic DUBM provides a natural representation for both computed values (the phase of a unit) and the certainty associated with these values (the magnitude of the unit's state). This formulation provides a well-founded activation function for directional units, based on the mean-field approximation, as well as a convergence proof for the system dynamics. In addition, this type of network has many potential applications: it is well-suited to data that can naturally be represented as directions, such as the vector fields that arise in optic flow or tangent field measurements in computer vision, as well as to information that can take advantage of the cyclic nature of the directional representation, and to tasks where relative rather than absolute information is important.

Our work on directional units fits in a long tradition of work on phase-based representations. The well-known XY models of statistical mechanics, as well as coupled oscillator models, describe the dynamics of a population of oscillators in which each oscillator is described by a single parameter, its phase. Several researchers have recently studied such systems of coupled oscillators (Kammen, Holmes and Koch, 1989; Baldi and Meir, 1990), and have applied these methods to model experimental findings concerning oscillations and phase-locking of cortical neurons in cats (Gray et al., 1989; Eckhorn et al., 1988). This work relates to the suggestion (von der Malsburg, 1981) that synchronized oscillations provide a biologically plausible mechanism of labeling features through temporal correlations among

neural signals. Other related work, mentioned earlier, includes that of Gislen, Peterson, and Soderberg (1992), which describes a mean-field analysis for a system of directional units and considers the general case where units assume multi-dimensional directional states on a hypersphere. Also, Noest (1988) proposed an energy function for phase-based units and analyzed the capacity of an autoassociative network that minimizes this energy function.

Several important distinctions exist between our work and these other models. First, just as a binary-unit Boltzmann machine extends the representational power of the Hopfield network by adding hidden units, our formulation extends the power of autoassociative coupled-oscillator models. A DUBM can contain layers of hidden units, which enables the network to solve tasks that require a representation of higher-order relations between elements of input patterns, such as the robot-arm task described above.[5] In addition, our formulation allows the weights to be complex-valued, which is important for encoding desired phase relationships between two directional units. This is a crucial property for systems looking for patterns of phase relationships, as was discussed in Section 6.3. One could conceivably encode phase relationships with scalar weights, but this would require additional connections between units, as well as an additional term in the energy function. Finally, our work introduces the probabilistic interpretation of networks of directional units. We introduce the notion that the density function across directions for a given unit can be described by the circular normal, and show that the mean-field equations for such a unit correspond to the expected value of this distribution. This approach greatly simplifies the derivation of the deterministic system dynamics.

Our work demonstrates that networks composed of directional units can learn interesting mappings. MAGIC, as described in (Mozer et al., 1992), is a large-scale DUBM that learns to segment novel images
composed of a wide range of overlapping geometric objects.

One extension of the current model is appropriate for tasks that involve learning a mapping from a set of input patterns to a set of output patterns. We can consider a different type of architecture for these tasks, in which the directional units are used in a strictly feedforward manner; such a network can then be trained by a modified form of the back-propagation training algorithm. In this case, the convergence properties of a DUBM, as well as the correspondence between state probabilities and energies, no longer apply. However, this approach has some advantages over previous methods of training networks of complex units, e.g., (Benvenuto and Piazza, 1992; Widrow, McCool and Ball, 1975). Several interesting properties of DUBMs, such as the rotation invariance and the interpretation of the magnitude of a unit as capturing the agreement of its inputs, are embodied in the activation function of the network and are hence retained.

We are currently exploring an important extension to the model which will expand its representational capabilities. Radford Neal (personal communication, 1992) has suggested a method for combining binary and directional units. This expanded representation may be

[5] Note that deterministic DUBMs share the problem of deterministic binary-unit Boltzmann machines, in that their performance drops off in deep networks (Galland, 1992). This is largely due to the fact that the mean-field approximation breaks down as multiple layers introduce increasing correlations among the units.

useful in domains with directional data that is not present everywhere. For example, it can be directly applied to the object-labeling problem explored in MAGIC. The binary aspect of the unit can describe whether a particular image feature is present or absent. This may enable the system to handle various complications, particularly labeling across gaps along the contour of an object. Finally, we are applying a DUBM network to the interesting and challenging problem of time-series prediction of wind directions.

8 Acknowledgements

The authors thank Geoffrey Hinton for his generous support and guidance. We thank Radford Neal for many valuable discussions that significantly influenced this work. We also thank Peter Dayan, Conrad Galland, Sue Becker, Steve Nowlan, and the other members of the Connectionist Research Group at the University of Toronto for helpful comments regarding this work. This research was supported by a grant from the Information Technology Research Centre of Ontario to Geoffrey Hinton, and NSF Presidential Young Investigator award IRI- and grant 90-21 from the James S. McDonnell Foundation to MM.

References

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9:147-169.

Almeida, L. B. (1987). A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In Proceedings, First International Conference on Neural Networks, volume 2, pages 609-618, San Diego, CA. IEEE.

Baldi, P. and Meir, R. (1990). Computing with arrays of coupled oscillators: An application to preattentive texture discrimination. Neural Computation, 2(4):458-471.

Baldi, P. and Pineda, F. (1991). Contrastive learning and neural oscillations. Neural Computation, 3(4):526-533.

Benvenuto, N. and Piazza, F. (1992). On the complex backpropagation algorithm. IEEE Transactions on Signal Processing, 40(4):967-969.

Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., and Reitboeck, H. J. (1988). Coherent oscillations: A mechanism of feature linking in the visual
cortex. Biological Cybernetics, 60:121-130.

Fradkin, E., Huberman, B. A., and Shenker, S. H. (1978). Gauge symmetries in random magnetic systems. Physical Review B, 18(9):4789-4814.

Galland, C. C. (1992). Self-organization using Deterministic Boltzmann Machine learning. PhD thesis, Physics Department of the University of Toronto.

Gislen, L., Peterson, C., and Soderberg, B. (1992). Rotor neurons: Basic formalism and dynamics. Neural Computation, 4(5):737-745.

Gray, C. M., Koenig, P., Engel, A. K., and Singer, W. (1989). Oscillatory responses in cat visual cortex exhibiting inter-columnar synchronization which reflects global properties. Nature (London), 338:334-337.

Hinton, G. E. (1989). Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Computation, 1(2):143-150.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79:2554-2558.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81:3088-3092.

Kammen, D. M., Holmes, P. J., and Koch, C. (1989). Cortical oscillations in neuronal networks: Feed-back versus local coupling. In Models of Brain Function, pages 273-284. Cambridge University Press, Cambridge.

Mardia, K. V. (1972). Statistics of Directional Data. Academic Press, London.

Mozer, M. C., Zemel, R. S., and Behrmann, M. (1992). Learning to segment images using dynamic feature binding. In Advances in Neural Information Processing Systems 4, pages 436-443. Morgan Kaufmann, San Mateo, CA.

Mozer, M. C., Zemel, R. S., Behrmann, M., and Williams, C. K. I. (1992). Learning to segment images using dynamic feature binding. Neural Computation, 4(5):650-665.

Noest, A. J. (1988). Phasor neural networks. In Neural Information Processing Systems, pages 584-591, New York. AIP.

Peterson, C. and Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1:995-1019.

Pineda, F. J. (1987). Generalization of back propagation to recurrent and higher order neural networks. In Proceedings of IEEE Conference on Neural Information Processing Systems, Denver, Colorado. IEEE.

von der Malsburg, C. (1981). The correlation theory of brain function. Technical Report 81-2, Department of Neurobiology, Max Planck
Institute for Biophysical Chemistry, Goettingen.

Widrow, B., McCool, J., and Ball, M. (1975). The complex LMS algorithm. Proceedings of the IEEE, pages 719-720.

A Appendix

To show that F^{MF} always decreases, we first differentiate E^{MF} with respect to time:

E^{MF} = -\frac{1}{2} \sum_{j,k} y_j^* y_k w_{jk}

\frac{dE^{MF}}{dt} = -\frac{1}{2} \sum_{j,k} \left( \frac{dy_j^*}{dt} y_k w_{jk} + y_j^* \frac{dy_k}{dt} w_{jk} \right)
= -\frac{1}{2} \sum_j \left[ \frac{dy_j^*}{dt} \left( \sum_k y_k w_{jk} \right) + \frac{dy_j}{dt} \left( \sum_k y_k w_{jk} \right)^* \right]    (15)

We then differentiate H^{MF} with respect to time:

H^{MF} = \sum_j \left[ -a_j r_j + \log(2\pi I_0(a_j)) \right]

\frac{dH^{MF}}{dt} = \sum_j \left[ -\frac{da_j}{dt} r_j - a_j \frac{dr_j}{dt} + \frac{I_1(a_j)}{I_0(a_j)} \frac{da_j}{dt} \right] = -\sum_j a_j \frac{dr_j}{dt}    (16)

where we use the fact that \frac{dI_0(m)}{dm} = I_1(m), along with the definition r_j = \frac{I_1(a_j)}{I_0(a_j)}.

Now consider (recalling that x = a e^{i\alpha} and y = r e^{i\alpha}):

x^* \frac{dy}{dt} = a e^{-i\alpha} \left( \frac{dr}{dt} e^{i\alpha} + i r e^{i\alpha} \frac{d\alpha}{dt} \right) = a \frac{dr}{dt} + i a r \frac{d\alpha}{dt}

so that \frac{1}{2}\left( x^* \frac{dy}{dt} + x \frac{dy^*}{dt} \right) = a \frac{dr}{dt}. This last step follows from the fact that if z = f + ig, then z + z^* = 2f. Substituting back into Equation 16, we find that:

\frac{dH^{MF}}{dt} = -\frac{1}{2} \sum_j \left[ x_j^* \frac{dy_j}{dt} + x_j \frac{dy_j^*}{dt} \right]    (17)

Combining Equations 15 and 17, and using the definition \frac{dx_j}{dt} = -x_j + \sum_k y_k w_{jk} (see Equation 13):

\frac{dF^{MF}}{dt} = \frac{dE^{MF}}{dt} - T \frac{dH^{MF}}{dt}
= -\frac{1}{2} \sum_j \left[ \frac{dy_j^*}{dt} \left( \left(\sum_k y_k w_{jk}\right) - x_j \right) + \frac{dy_j}{dt} \left( \left(\sum_k y_k w_{jk}\right) - x_j \right)^* \right]
= -\frac{1}{2} \sum_j \left[ \frac{dx_j}{dt} \frac{dy_j^*}{dt} + \frac{dx_j^*}{dt} \frac{dy_j}{dt} \right]

We can expand these products as follows:

\frac{dx}{dt}\frac{dy^*}{dt} = \left( \frac{da}{dt} e^{i\alpha} + i a e^{i\alpha} \frac{d\alpha}{dt} \right) \left( \frac{dr}{dt} e^{-i\alpha} - i r e^{-i\alpha} \frac{d\alpha}{dt} \right)
= \frac{da}{dt}\frac{dr}{dt} + a r \left(\frac{d\alpha}{dt}\right)^2 + i \left( a \frac{dr}{dt}\frac{d\alpha}{dt} - r \frac{da}{dt}\frac{d\alpha}{dt} \right)

and \frac{dx^*}{dt}\frac{dy}{dt} is its complex conjugate, so the imaginary parts cancel in the sum; we have also used the fact that, since r depends solely on a, \frac{dr}{dt} = \frac{dr}{da}\frac{da}{dt}. Finally,

\frac{dF^{MF}}{dt} = -\frac{1}{2} \sum_j \left[ \frac{dx_j}{dt}\frac{dy_j^*}{dt} + \frac{dx_j^*}{dt}\frac{dy_j}{dt} \right] = -\sum_j \left[ \frac{dr_j}{da_j}\left(\frac{da_j}{dt}\right)^2 + a_j r_j \left(\frac{d\alpha_j}{dt}\right)^2 \right] \le 0

The final step in this derivation follows because r_j is a monotonically increasing function of a_j (see Figure 3), and because a_j and r_j are always non-negative.
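As an informal numerical check of the mean-field equations and the convergence property derived above, the following sketch (our own variable names, not the paper's; T = 1, a simple Euler discretization of dx_j/dt = -x_j + sum_k y_k w_jk, with SciPy's modified Bessel functions I_0 and I_1) builds a small random Hermitian complex weight matrix, computes y_j = r_j e^{i alpha_j} with r_j = I_1(a_j)/I_0(a_j), and verifies both the rotation invariance of the energy and that the free energy ends lower than it starts:

```python
import numpy as np
from scipy.special import i0, i1  # modified Bessel functions I_0, I_1

def mean_field_state(x):
    """y_j = r_j exp(i alpha_j): resultant length r_j = I1(a_j)/I0(a_j),
    phase alpha_j taken from the net input x_j (temperature T = 1)."""
    a = np.abs(x)
    return (i1(a) / i0(a)) * np.exp(1j * np.angle(x))

def energy(y, W):
    """E = -1/2 sum_jk y_j^* w_jk y_k (real-valued, since W is Hermitian)."""
    return -0.5 * np.real(np.conj(y) @ W @ y)

def free_energy(x, W):
    """F = E - H, with H = sum_j [-a_j r_j + log(2 pi I0(a_j))]."""
    a = np.abs(x)
    r = i1(a) / i0(a)
    H = np.sum(-a * r + np.log(2 * np.pi * i0(a)))
    return energy(mean_field_state(x), W) - H

rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
W = (A + A.conj().T) / n                 # Hermitian weights, scaled for stable dynamics
x = rng.normal(size=n) + 1j * rng.normal(size=n)   # random initial net inputs

# Rotation invariance: a global phase shift of all states leaves E unchanged.
y = mean_field_state(x)
assert np.isclose(energy(y, W), energy(y * np.exp(0.7j), W))

# Euler steps of dx_j/dt = -x_j + sum_k y_k w_jk; F should settle downward.
F = [free_energy(x, W)]
for _ in range(300):
    x = x + 0.05 * (-x + W @ mean_field_state(x))
    F.append(free_energy(x, W))
assert F[-1] < F[0]
```

Note that the resultant length I_1(a)/I_0(a) stays strictly below 1, so a unit's magnitude always encodes a degree of certainty rather than saturating; the strict monotone decrease of F holds for the continuous dynamics, and the discrete Euler steps only approximate it, which is why the sketch checks the overall descent rather than every step.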


More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Numerous alications in statistics, articularly in the fitting of linear models. Notation and conventions: Elements of a matrix A are denoted by a ij, where i indexes the rows and

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

LIMITATIONS OF RECEPTRON. XOR Problem The failure of the perceptron to successfully simple problem such as XOR (Minsky and Papert).

LIMITATIONS OF RECEPTRON. XOR Problem The failure of the perceptron to successfully simple problem such as XOR (Minsky and Papert). LIMITATIONS OF RECEPTRON XOR Problem The failure of the ercetron to successfully simle roblem such as XOR (Minsky and Paert). x y z x y z 0 0 0 0 0 0 Fig. 4. The exclusive-or logic symbol and function

More information

Ordered and disordered dynamics in random networks

Ordered and disordered dynamics in random networks EUROPHYSICS LETTERS 5 March 998 Eurohys. Lett., 4 (6),. 599-604 (998) Ordered and disordered dynamics in random networks L. Glass ( )andc. Hill 2,3 Deartment of Physiology, McGill University 3655 Drummond

More information

PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH. Luoluo Liu, Trac D. Tran, and Sang Peter Chin

PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH. Luoluo Liu, Trac D. Tran, and Sang Peter Chin PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH Luoluo Liu, Trac D. Tran, and Sang Peter Chin Det. of Electrical and Comuter Engineering, Johns Hokins Univ., Baltimore, MD 21218, USA {lliu69,trac,schin11}@jhu.edu

More information

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands A Secial Case Solution to the Persective -Point Problem William J. Wolfe California State University Channel Islands william.wolfe@csuci.edu Abstract In this aer we address a secial case of the ersective

More information

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &

More information

Machine Learning: Homework 4

Machine Learning: Homework 4 10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,

More information

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming

Maximum Entropy and the Stress Distribution in Soft Disk Packings Above Jamming Maximum Entroy and the Stress Distribution in Soft Disk Packings Above Jamming Yegang Wu and S. Teitel Deartment of Physics and Astronomy, University of ochester, ochester, New York 467, USA (Dated: August

More information

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete

More information

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES

VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES Journal of Sound and Vibration (998) 22(5), 78 85 VIBRATION ANALYSIS OF BEAMS WITH MULTIPLE CONSTRAINED LAYER DAMPING PATCHES Acoustics and Dynamics Laboratory, Deartment of Mechanical Engineering, The

More information

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation 6.3 Modeling and Estimation of Full-Chi Leaage Current Considering Within-Die Correlation Khaled R. eloue, Navid Azizi, Farid N. Najm Deartment of ECE, University of Toronto,Toronto, Ontario, Canada {haled,nazizi,najm}@eecg.utoronto.ca

More information

One step ahead prediction using Fuzzy Boolean Neural Networks 1

One step ahead prediction using Fuzzy Boolean Neural Networks 1 One ste ahead rediction using Fuzzy Boolean eural etworks 1 José A. B. Tomé IESC-ID, IST Rua Alves Redol, 9 1000 Lisboa jose.tome@inesc-id.t João Paulo Carvalho IESC-ID, IST Rua Alves Redol, 9 1000 Lisboa

More information

Outline. EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Simple Error Detection Coding

Outline. EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Simple Error Detection Coding Outline EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Error detection using arity Hamming code for error detection/correction Linear Feedback Shift

More information

Recursive Estimation of the Preisach Density function for a Smart Actuator

Recursive Estimation of the Preisach Density function for a Smart Actuator Recursive Estimation of the Preisach Density function for a Smart Actuator Ram V. Iyer Deartment of Mathematics and Statistics, Texas Tech University, Lubbock, TX 7949-142. ABSTRACT The Preisach oerator

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm Gabriel Noriega, José Restreo, Víctor Guzmán, Maribel Giménez and José Aller Universidad Simón Bolívar Valle de Sartenejas,

More information

The non-stochastic multi-armed bandit problem

The non-stochastic multi-armed bandit problem Submitted for journal ublication. The non-stochastic multi-armed bandit roblem Peter Auer Institute for Theoretical Comuter Science Graz University of Technology A-8010 Graz (Austria) auer@igi.tu-graz.ac.at

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

An Algorithm of Two-Phase Learning for Eleman Neural Network to Avoid the Local Minima Problem

An Algorithm of Two-Phase Learning for Eleman Neural Network to Avoid the Local Minima Problem IJCSNS International Journal of Comuter Science and Networ Security, VOL.7 No.4, Aril 2007 1 An Algorithm of Two-Phase Learning for Eleman Neural Networ to Avoid the Local Minima Problem Zhiqiang Zhang,

More information

Time Domain Calculation of Vortex Induced Vibration of Long-Span Bridges by Using a Reduced-order Modeling Technique

Time Domain Calculation of Vortex Induced Vibration of Long-Span Bridges by Using a Reduced-order Modeling Technique 2017 2nd International Conference on Industrial Aerodynamics (ICIA 2017) ISBN: 978-1-60595-481-3 Time Domain Calculation of Vortex Induced Vibration of Long-San Bridges by Using a Reduced-order Modeling

More information

Phase locking of two independent degenerate. coherent anti-stokes Raman scattering processes

Phase locking of two independent degenerate. coherent anti-stokes Raman scattering processes Phase locking of two indeendent degenerate coherent anti-tokes Raman scattering rocesses QUN Zhang* Hefei National Laboratory for Physical ciences at the Microscale and Deartment of Chemical Physics, University

More information

On the capacity of the general trapdoor channel with feedback

On the capacity of the general trapdoor channel with feedback On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:

More information

arxiv: v1 [cond-mat.stat-mech] 1 Dec 2017

arxiv: v1 [cond-mat.stat-mech] 1 Dec 2017 Suscetibility Proagation by Using Diagonal Consistency Muneki Yasuda Graduate School of Science Engineering, Yamagata University Kazuyuki Tanaka Graduate School of Information Sciences, Tohoku University

More information

Estimating Time-Series Models

Estimating Time-Series Models Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary

More information

PHYS 301 HOMEWORK #9-- SOLUTIONS

PHYS 301 HOMEWORK #9-- SOLUTIONS PHYS 0 HOMEWORK #9-- SOLUTIONS. We are asked to use Dirichlet' s theorem to determine the value of f (x) as defined below at x = 0, ± /, ± f(x) = 0, - < x

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

Developing A Deterioration Probabilistic Model for Rail Wear

Developing A Deterioration Probabilistic Model for Rail Wear International Journal of Traffic and Transortation Engineering 2012, 1(2): 13-18 DOI: 10.5923/j.ijtte.20120102.02 Develoing A Deterioration Probabilistic Model for Rail Wear Jabbar-Ali Zakeri *, Shahrbanoo

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Haiing Lu, K.N. Plataniotis and A.N. Venetsanooulos The Edward S. Rogers Sr. Deartment of

More information

x and y suer from two tyes of additive noise [], [3] Uncertainties e x, e y, where the only rior knowledge is their boundedness and zero mean Gaussian

x and y suer from two tyes of additive noise [], [3] Uncertainties e x, e y, where the only rior knowledge is their boundedness and zero mean Gaussian A New Estimator for Mixed Stochastic and Set Theoretic Uncertainty Models Alied to Mobile Robot Localization Uwe D. Hanebeck Joachim Horn Institute of Automatic Control Engineering Siemens AG, Cororate

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

Finding recurrent sources in sequences

Finding recurrent sources in sequences Finding recurrent sources in sequences Aristides Gionis Deartment of Comuter Science Stanford University Stanford, CA, 94305, USA gionis@cs.stanford.edu Heikki Mannila HIIT Basic Research Unit Deartment

More information

KEY ISSUES IN THE ANALYSIS OF PILES IN LIQUEFYING SOILS

KEY ISSUES IN THE ANALYSIS OF PILES IN LIQUEFYING SOILS 4 th International Conference on Earthquake Geotechnical Engineering June 2-28, 27 KEY ISSUES IN THE ANALYSIS OF PILES IN LIQUEFYING SOILS Misko CUBRINOVSKI 1, Hayden BOWEN 1 ABSTRACT Two methods for analysis

More information

MAKING WALD TESTS WORK FOR. Juan J. Dolado CEMFI. Casado del Alisal, Madrid. and. Helmut Lutkepohl. Humboldt Universitat zu Berlin

MAKING WALD TESTS WORK FOR. Juan J. Dolado CEMFI. Casado del Alisal, Madrid. and. Helmut Lutkepohl. Humboldt Universitat zu Berlin November 3, 1994 MAKING WALD TESTS WORK FOR COINTEGRATED VAR SYSTEMS Juan J. Dolado CEMFI Casado del Alisal, 5 28014 Madrid and Helmut Lutkeohl Humboldt Universitat zu Berlin Sandauer Strasse 1 10178 Berlin,

More information