Cascaded redundancy reduction

Network: Comput. Neural Syst. 9 (1998) 73–84. Printed in the UK

Cascaded redundancy reduction

Virginia R de Sa† and Geoffrey E Hinton

Department of Computer Science, University of Toronto, Toronto, Ontario, M5S 1A4, Canada

Received 13 September 1997

Abstract. We describe a method for incrementally constructing a hierarchical generative model of an ensemble of binary data vectors. The model is composed of stochastic, binary, logistic units. Hidden units are added to the model one at a time with the goal of minimizing the information required to describe the data vectors using the model. In addition to the top-down generative weights that define the model, there are bottom-up recognition weights that determine the binary states of the hidden units given a data vector. Even though the stochastic generative model can produce each data vector in many ways, the recognition model is forced to pick just one of these ways. The recognition model therefore underestimates the ability of the generative model to predict the data, but this underestimation greatly simplifies the process of searching for the generative and recognition weights of a new hidden unit.

1. Introduction

Unsupervised learning algorithms attempt to extract statistical structure from an ensemble of input vectors without the help of an additional teaching signal. One way of defining what it means to extract statistical structure is to appeal to a stochastic generative model that produces data vectors from hidden stochastic variables. If a relatively simple generative model is likely to have produced the observed data vectors, then we can view the underlying states that the model uses to generate a particular data vector as an economical representation of that vector.

In this paper we consider the problem of fitting a generative model that is composed of multiple layers of stochastic binary units. The bottom layer consists of visible units which correspond to the individual components of a binary data vector. The model generates data by starting at the top layer and working downwards. At the top level, each unit turns on randomly and independently with a probability that depends only on its generative bias. At each subsequent level, a unit, $i$, receives top-down input that depends on the already decided binary states, $s_j$, of units in the layer above and the generative weights, $g_{j,i}$. This top-down input is combined with the unit's generative bias, $g_{0,i}$,‡ to produce the unit's total generative input, $x_i$:

$$x_i = g_{0,i} + \sum_{j>i} s_j g_{j,i}. \qquad (1)$$

† Present address: Sloan Center for Theoretical Neurobiology, Department of Physiology Box 0444, 513 Parnassus Ave, San Francisco, CA, USA.
‡ The bias is equivalent to a generative weight from a bias unit that is always on. For notational convenience we define this node as unit 0.
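To make the generative pass concrete, the following sketch (our illustration, not code from the paper) samples one binary vector using equation (1) and the logistic function of equation (2) below; the network size and all weight values are arbitrary placeholder assumptions.

```python
# A minimal sketch (our illustration, not the paper's code) of one top-down
# generative pass through a model of stochastic binary logistic units.
# Units are indexed so that j > i means unit j is above unit i; g[j, i] is
# the generative weight from j to i and g0[i] is unit i's generative bias.
# All weight values below are arbitrary placeholders.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate(g0, g, rng):
    """Sample one binary vector, deciding units from the top down."""
    n = len(g0)
    s = np.zeros(n, dtype=int)
    for i in reversed(range(n)):                 # highest (top) unit first
        x = g0[i] + s[i + 1:] @ g[i + 1:, i]     # equation (1)
        s[i] = rng.random() < sigmoid(x)         # equation (2), sampled
    return s

rng = np.random.default_rng(0)
n_units = 8                                      # e.g. 3 hidden above 5 visible
g0 = rng.normal(size=n_units)
g = np.tril(rng.normal(size=(n_units, n_units)), k=-1)  # only g[j, i], j > i
print(generate(g0, g, rng))                      # the low indices are the data
```

Because each unit is sampled, repeated calls with the same weights produce different vectors, which is the many-to-one generation described next.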

The total generative input is then put through a logistic function to determine the probability, $p_i$, of the unit being turned on:

$$p_i = \sigma(x_i) = \frac{1}{1 + e^{-x_i}}. \qquad (2)$$

Because the units are stochastic, each top-down generative pass through the model will typically produce a different data vector and each data vector can typically be produced in many different ways.

If we start with a multilayer model with a fixed architecture and we want to learn the generative weights and biases, the obvious objective function is the log likelihood of the observed data under the generative model:

$$\sum_d \log p(d \mid \theta) = \sum_d \log \left( \sum_\alpha p(\alpha \mid \theta)\, p(d \mid \alpha, \theta) \right) \qquad (3)$$

where $d$ is an index over data vectors, $\alpha$ is an index over all possible assignments of binary states to the hidden units, and $\theta$ is the generative weights and biases. If units receive top-down generative connections from many units in the layer above, maximizing this objective function using either the EM algorithm or gradient ascent is an intractable problem. Both of these algorithms need to compute the posterior distribution over exponentially many hidden state vectors, $\alpha$, given a data vector, $d$, and the generative parameters, $\theta$. Instead of computing the posterior distribution exactly, it can be approximated using Gibbs sampling (Neal 1992) or mean-field methods (Jaakkola et al 1996), though neither of these approaches seems to have much biological plausibility. A Helmholtz machine (Dayan et al 1995) uses a separate set of recognition connections to compute an approximation to the posterior distribution. It can be shown that the use of an approximation still allows EM to maximize a lower bound on the log probability of the data, with the tightness of the bound depending on the quality of the approximation (Neal and Hinton 1998). For one version of the Helmholtz machine (Hinton et al 1995), there is a very simple wake–sleep learning algorithm that uses only locally available information.

An extreme version of the idea behind Helmholtz machines is for the recognition connections to approximate the posterior distribution using a single vector of hidden states. Although this provides a much worse bound than the other approximations, it greatly simplifies the search for the recognition and generative weights because there is no need to integrate across the many alternative ways of producing the data. The hope is that the computational convenience will compensate for the looser bound. A secondary aim is to see how much is gained by allowing the recognition connections of a Helmholtz machine to produce a whole distribution rather than a single hidden state vector. In this form, the algorithm is similar to that of Redlich (1993). The biggest difference is in the search strategy and the construction in this case of a stochastic generative model.

Given a data vector, $d$, the recognition connections produce a binary state vector, $s^d$, over the hidden units. Using $s^d$ we get an upper bound on the negative log probability of the data:†

$$-\log p(d \mid \theta) = -\log \sum_\alpha p(\alpha \mid \theta)\, p(d \mid \alpha, \theta) \le -\log p(s^d \mid \theta)\, p(d \mid s^d, \theta). \qquad (4)$$

† This is essentially the same debate as Viterbi versus Baum–Welch in the fitting of hidden Markov models, except that Baum–Welch computes the true posterior distribution rather than just an approximation to it.
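On a model small enough to enumerate every hidden configuration, the relationship between equations (3) and (4) can be verified directly. The sketch below (ours; the one-hidden-layer model, its weights and the recognition choice $s$ are all arbitrary assumptions) computes the exact negative log probability by brute force and checks that the single-state expression never falls below it.

```python
# Illustrative sketch only: compare the exact negative log probability of
# equation (3), computed by enumerating every hidden state vector alpha,
# against the single-state upper bound of equation (4). The small
# one-hidden-layer model and all its weights are arbitrary assumptions.
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_bernoulli(s, p):
    return np.sum(s * np.log(p) + (1 - s) * np.log(1 - p))

rng = np.random.default_rng(1)
n_hid, n_vis = 3, 4
g0_h = rng.normal(size=n_hid)            # hidden biases (single hidden layer)
g0_v = rng.normal(size=n_vis)            # visible biases
g = rng.normal(size=(n_hid, n_vis))      # generative weights, hidden -> visible

d = rng.integers(0, 2, size=n_vis)       # one binary data vector

# Exact: -log p(d) = -log sum_alpha p(alpha) p(d | alpha)   -- equation (3)
log_terms = []
for alpha in itertools.product([0, 1], repeat=n_hid):
    a = np.array(alpha)
    log_p_alpha = log_bernoulli(a, sigmoid(g0_h))
    log_p_d = log_bernoulli(d, sigmoid(g0_v + a @ g))
    log_terms.append(log_p_alpha + log_p_d)
exact_nll = -np.logaddexp.reduce(log_terms)

# Bound: any single state s gives -log p(s) p(d|s) >= exact_nll  -- equation (4)
s = np.array([1, 0, 1])                  # an arbitrary recognition choice
bound = -(log_bernoulli(s, sigmoid(g0_h)) + log_bernoulli(d, sigmoid(g0_v + s @ g)))
print(exact_nll, bound, bound >= exact_nll)   # the bound is never smaller
```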

This bound and its derivatives can be computed in a time that is linear in the number of connections, so it is much more efficient to perform gradient descent in the upper bound than in the true negative log probability of the data.

2. A minimum description length (MDL) interpretation of the objective function

There is an alternative interpretation of the upper bound which is mathematically equivalent, but conceptually different. We think in terms of a communication game in which a sender must communicate each data vector to a receiver. Rather than sending the individual components of each data vector separately, the sender first uses the recognition connections to produce a hierarchical representation of the data vector. Then she sends this representation starting at the top level and working downwards. The idea is that if the sender has found a good model for representing the data, it should take fewer bits to send the data using the representations prescribed by the model than to send the raw data. In the full MDL framework, we must also include the one-time cost of communicating the model itself. In this paper we ignore the model cost, but it could be used as a criterion for deciding when to stop enlarging the model (only add the new unit when the improvement in coding cost is more than the cost of communicating the new unit; this is dependent on the coding strategy for the weights).

According to Shannon's coding theorem, if a sender and a receiver have agreed upon a probability distribution under which a discrete event has probability $p$, the event can be communicated using $-\log_2 p$ bits. This is an asymptotic result that involves using block coding techniques, but the details are irrelevant here. On average, it is best for the sender and receiver to agree on the correct probability distribution, but this is not required. They can use any distribution they like. For communicating the binary state, $s_i^d$, of unit $i$ that is produced by the recognition connections, the sender and receiver use the Bernoulli distribution $(p_i, 1-p_i)$ that is obtained by applying the generative model to the states in the layer above (see equation (2)). This distribution is available to the receiver because the states of the units in the layer above have already been communicated. So it requires $-s_i^d \log_2 p_i - (1-s_i^d)\log_2(1-p_i)$ bits to communicate $s_i^d$. The cost of communicating a data vector is simply the sum of this quantity over all units, including the visible units that represent the data. So the description length is

$$C(d) = \sum_i \left[ -s_i^d \log_2 p_i - (1-s_i^d)\log_2(1-p_i) \right] = -\log_2 p(s^d \mid \theta)\, p(d \mid s^d, \theta). \qquad (5)$$

3. Why a hidden unit helps

Consider the cost of describing the data using no hidden units. The only adjustable parameters in the model are the generative biases of the visible units. The cost in equation (5) is minimized by setting these biases so that $p_i = \sigma(g_{0,i})$ is equal to the fraction of the data vectors in which unit $i$ is on. We call this the base-rate model. The base-rate model makes good use of the individual frequencies with which visible units come on, but it ignores all correlations.
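As a concrete instance of the base-rate model, this sketch (ours; the random data are a placeholder) sets each visible bias to the logit of the unit's observed frequency and evaluates the description length of equation (5) in bits.

```python
# Illustrative sketch only: the base-rate model sets each visible bias so
# that sigma(g_0i) equals the fraction of data vectors in which unit i is
# on; the description length of equation (5) is then the summed Bernoulli
# code lengths in bits. The random data below are a placeholder assumption.
import numpy as np

rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=(1000, 16))       # binary data vectors

freq = data.mean(axis=0).clip(1e-6, 1 - 1e-6)    # fraction on, per unit
g0 = np.log(freq / (1 - freq))                   # logit: sigma(g0) = freq
p = 1.0 / (1.0 + np.exp(-g0))                    # recovers freq

# C(d) = sum_i -s_i log2 p_i - (1 - s_i) log2 (1 - p_i)   -- equation (5)
cost = -(data * np.log2(p) + (1 - data) * np.log2(1 - p)).sum(axis=1)
print("average base-rate coding cost: %.2f bits" % cost.mean())
```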

Now, suppose we have a single hidden unit, $k$. If a subset of the visible units tend to come on together, this redundancy can be captured by setting the recognition weights, $r_{i,k}$, of the hidden unit so that it comes on in these circumstances, and setting its generative weights, $g_{k,i}$, so as to increase $p_i$ for all the units in the subset. By modifying the generative biases of the visible units it is also possible to reduce $p_i$ when the hidden unit is off. In effect, the hidden unit allows the base rates to be dependent on which data vector is being described. With just one hidden unit we obtain a mixture of two base-rate models.

After adding one hidden unit, we could split the data into two disjoint subsets and apply the same algorithm again to each subset separately. This would create a tree structure and would be a natural extension of decision tree algorithms like CART (Breiman et al 1984) to the task of probability density estimation. Unfortunately, splitting the data in this way is usually a bad strategy when there are multiple different regularities in the data. Suppose, for example, that the components of the data vector can be split into two subsets. Within each subset there are weak correlations, but between subsets the components are independent. The first split in a tree can use all of the data to detect the redundancy within one of the subsets, but at the next level down the other redundancy must be discovered separately in each half of the tree using only half the data.

To avoid this problem, we consider all of the data when adding each new hidden unit, but we take into account the work that is already being done by the previous hidden units in creating appropriate top-down probabilities for describing the states of lower units. We also allow new hidden units to receive recognition connections from existing hidden units and to send generative connections to them. In adding a new hidden unit, $k$, we greedily attempt to minimize the cost of describing the data using the new unit and all the previous ones. This cost includes the cost of describing the state of $k$ and the cost of describing the state of every pre-existing unit, $i$, using the new generative probabilities created by combining the new bias for $i$ with the generative weight from $k$ and the pre-existing generative weights from units above $i$:

$$C = \sum_d \left( -s_k^d \log \sigma(g_{0,k}) - (1-s_k^d) \log[1-\sigma(g_{0,k})] \right) + \sum_d \sum_i \left( -s_i^d \log[\sigma(x_i^d)] - (1-s_i^d) \log[1-\sigma(x_i^d)] \right) \qquad (6)$$

where $x_i^d$ is the generative input to unit $i$:

$$x_i^d = g_{0,i} + s_k^d g_{k,i} + \sum_{k>j>i} s_j^d g_{j,i}.$$

The term $\sum_{k>j>i} s_j^d g_{j,i}$ is unaffected by adding unit $k$, so it can be stored for each data vector in the training set, thus eliminating much computation. To emphasize this and to simplify subsequent equations we define

$$G^d_{k>j>i} \equiv \sum_{k>j>i} s_j^d g_{j,i}. \qquad (7)$$

In searching for the best generative and recognition weights for $k$, we allow the generative biases of all earlier units to be modified, but not the other generative weights or the recognition weights or biases of other units.
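The sketch below (our illustration; all shapes and values are placeholder assumptions) evaluates the cost of equation (6) for one candidate unit, reusing a cached $G^d_{k>j>i}$ from equation (7) so that considering the candidate only touches the new bias, the new unit's code cost and the generative weights from $k$.

```python
# Illustrative sketch only: the cost of equation (6) for one candidate unit
# k, using a cached G[d, i] = sum_{k>j>i} s_j g_ji per equation (7). All
# shapes and values are assumptions; s_k and s_i come from the (fixed)
# recognition pass.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_cost(s, p):
    return -(s * np.log(p) + (1 - s) * np.log(1 - p))

rng = np.random.default_rng(3)
n_data, n_below = 500, 20
G = rng.normal(size=(n_data, n_below))            # cached top-down input, eq. (7)
s_i = rng.integers(0, 2, size=(n_data, n_below))  # states of earlier units
s_k = rng.integers(0, 2, size=n_data)             # binary state of new unit k
g0_k = 0.1                                        # new unit's generative bias
g0 = rng.normal(size=n_below) * 0.1               # re-fitted earlier biases
g_k = rng.normal(size=n_below) * 0.1              # generative weights from k

x = G + g0 + np.outer(s_k, g_k)                   # total generative input
C = bernoulli_cost(s_k, sigmoid(g0_k)).sum() + bernoulli_cost(s_i, sigmoid(x)).sum()
print("candidate cost:", C)
```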

4. Creating a smooth search space

Once a new hidden unit has been added to the network, it behaves deterministically. Its recognition weights cause it to be either on or off for each data vector. However, the search for a suitable set of recognition weights is easier if the weights have a smooth, differentiable effect on the unit's behaviour. This can be achieved by using a stochastic unit whose probability of being on is a smooth monotonic function of its recognition weights.

Each data vector determines a single state for all previous hidden units, but for the new unit we consider both possible states and compute the expected value of the cost function in equation (6) given this stochastic behaviour. To make the cost function represent the negative log probability of the data vector allowing for both states of the new hidden unit, it is necessary to subtract the entropy of the state of the new hidden unit, as explained in Hinton and Zemel (1994). If this entropy term is omitted, the expected cost is always minimized by scaling up all of the unit's recognition weights so that it behaves deterministically.

While searching for the best recognition weights we want the hidden unit to behave stochastically in order to smooth the search space, but at the conclusion of the search we want the hidden unit to behave deterministically. This can be achieved by using a temperature parameter which scales both the entropy term and the softness of the logistic function used in recognition (but not the one used in generation). During the search the temperature is reduced from 1 to 0 in small steps. At a given temperature, the cost function to be minimized is

$$\begin{aligned} C = \sum_d \Big( & q_k^d \sum_{i<k} \big[ -s_i^d \log\big(\sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i})\big) - (1-s_i^d) \log\big(1 - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i})\big) \big] \\ & + (1-q_k^d) \sum_{i<k} \big[ -s_i^d \log\big(\sigma(G^d_{k>j>i} + g_{0,i})\big) - (1-s_i^d) \log\big(1 - \sigma(G^d_{k>j>i} + g_{0,i})\big) \big] \\ & + \big[ -q_k^d \log \sigma(g_{0,k}) - (1-q_k^d) \log\big(1 - \sigma(g_{0,k})\big) \big] \\ & + T \big[ q_k^d \log q_k^d + (1-q_k^d) \log(1-q_k^d) \big] \Big) \end{aligned} \qquad (8)$$

where $q_k^d$ is the recognition probability of unit $k$ for data vector $d$. The first line of equation (8) is the cost of coding all the other units given that unit $k$ is on, weighted by the probability that $k$ is on; the second line is the cost if $k$ is off, weighted by the corresponding probability. The third line is the expected cost of coding the state of unit $k$ given its generative bias, and the last line is the (negative) entropy of unit $k$, weighted by $T$.
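To see how the temperature trades smoothness for determinism, the following sketch (ours; every weight is a placeholder assumption) evaluates the per-pattern terms of equation (8) on a grid of $q_k$ values. At $T = 1$ the entropy term keeps the optimal $q_k$ away from 0 and 1; as $T$ shrinks, the optimum is pushed to an extreme and the unit becomes effectively deterministic.

```python
# Illustrative sketch only: the expected cost of equation (8) for one data
# vector, with the entropy term weighted by a temperature T. As T -> 0 the
# optimal q_k is driven to 0 or 1. All numerical values are assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bern(s, p):                      # Bernoulli code length (nats)
    return -(s * np.log(p) + (1 - s) * np.log(1 - p))

def expected_cost(q_k, s_i, G, g0, g_k, g0_k, T):
    on  = bern(s_i, sigmoid(G + g0 + g_k)).sum()   # cost of lower units, k on
    off = bern(s_i, sigmoid(G + g0)).sum()         # cost of lower units, k off
    code_k = bern(q_k, sigmoid(g0_k))              # expected cost of coding k
    neg_entropy = q_k * np.log(q_k) + (1 - q_k) * np.log(1 - q_k)
    return q_k * on + (1 - q_k) * off + code_k + T * neg_entropy

rng = np.random.default_rng(4)
n = 10
s_i = rng.integers(0, 2, size=n)
G, g0, g_k = rng.normal(size=n), rng.normal(size=n) * 0.1, rng.normal(size=n) * 0.1
for T in (1.0, 0.5, 0.1):
    qs = np.linspace(0.01, 0.99, 99)
    best_q = qs[np.argmin([expected_cost(q, s_i, G, g0, g_k, 0.0, T) for q in qs])]
    print("T=%.1f  best q_k=%.2f" % (T, best_q))
```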

[Figure 1 shows the CRR network, with generative connections ($g_{0,k}$, $g_{0,j}$, $g_{k,i}$, $g_{0,i}$, $G^d_{k>j>i}$) drawn alongside recognition connections ($r_{0,k}$, $r_{i,k}$, $r_{0,j}$) between units with activities $q_k^d$, $s_j^d$ and $s_i^d$.]

Figure 1. The CRR network. The binary activity of unit $i$, resulting from application of pattern $d$ to the network, is given by $s_i^d$. The unit currently being considered ($k$) has analogue activity given by $q_k^d$. The recognition weight from unit $i$ to unit $k$ is given by $r_{i,k}$ and the generative weight from $k$ to $i$ by $g_{k,i}$. The bias unit is defined to be unit 0. For a given input pattern $d$ we represent the summed generative input from all added units $j$ to input unit $i$ by $G^d_{k>j>i}$.

5. The standard CRR learning procedure

We have explored several variants of the cascaded redundancy reduction (CRR) learning procedure. To simplify the discussion we first describe the standard procedure. The outermost loop of the procedure consists of adding hidden units one at a time in a cascaded fashion (shown in figure 1) as in Fahlman and Lebiere (1990). Once a unit has been added, its recognition weights and bias are never changed.

In searching for the best generative and recognition weights for the new unit, CRR decreases the temperature from 1 to 0 in steps, and at each temperature it performs an alternating optimization that closely resembles the EM algorithm. Holding the recognition weights fixed, an iterative search is performed for the optimal generative weights and biases (an M-step). Then, holding the generative weights and biases fixed, the recognition weights and bias are iteratively improved (a partial E-step). The overall algorithm has the form:

    Set the initial generative biases for visible units
    Repeat adding hidden units until stopping criterion is met
        Create a new unit, k, with random recognition weights and recognition bias
        Set T = 1
        Repeat until T = 0
            Repeat N times
                Do Newton searches for new generative biases of all previous units, j
                Do Newton searches for generative weights from new unit, k
                Do a conjugate gradient search for recognition weights and bias of k
            Decrease T using a temperature schedule
        Add the new hidden unit to the network
        Set the generative bias of the new hidden unit
        Use conjugate gradient to back-fit ALL generative weights and biases
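The listing above translates almost line for line into code. The skeleton below is our paraphrase, with every inner search reduced to a stub so that only the control structure remains; the temperature schedule, the value of $N$ and the stopping criterion are assumptions, since the standard procedure does not fix them.

```python
# Illustrative sketch only: the control structure of the standard CRR
# procedure, with the inner searches stubbed out so the skeleton runs.
# The schedule, n_inner (N) and max_units are assumed values.
import numpy as np

rng = np.random.default_rng(5)

def newton_bias_search(model):   return model   # stub for the bias searches
def newton_weight_search(model): return model   # stub for the g_{k,i} searches
def conjgrad_recognition(model): return model   # stub for the r_{.,k} search
def conjgrad_backfit(model):     return model   # stub for back-fitting

def crr_train(data, max_units=5, n_inner=3,
              schedule=(1.0, 0.5, 0.25, 0.1, 0.0)):
    model = {"units": 0}
    for _ in range(max_units):                   # add hidden units one at a time
        # create a new unit k with random recognition weights and bias
        model["new_rec_w"] = rng.normal(size=data.shape[1])
        for T in schedule:                       # anneal T from 1 to 0
            for _ in range(n_inner):             # the 'Repeat N times' loop
                model = newton_bias_search(model)
                model = newton_weight_search(model)
                model = conjgrad_recognition(model)
        model["units"] += 1                      # add the unit, set its bias
        model = conjgrad_backfit(model)          # back-fit ALL generative weights
    return model

print(crr_train(np.zeros((10, 8)))["units"])
```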

5.1. Updating the recognition weights

The update rule for the recognition weights can be obtained by taking the partial derivative of the cost with respect to each modifiable recognition weight. This gives

$$\frac{\partial C}{\partial r_{j,k}} = \frac{1}{T} \sum_d \frac{\partial C}{\partial q_k^d}\, q_k^d (1-q_k^d)\, s_j^d \qquad (9)$$

where

$$\frac{\partial C}{\partial q_k^d} = \sum_{i<k} \left( -s_i^d \log \frac{\sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i})}{\sigma(G^d_{k>j>i} + g_{0,i})} - (1-s_i^d) \log \frac{1 - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i})}{1 - \sigma(G^d_{k>j>i} + g_{0,i})} \right) - \log \frac{\sigma(g_{0,k})}{1 - \sigma(g_{0,k})} + T \log \frac{q_k^d}{1 - q_k^d}. \qquad (10)$$

A conjugate gradient search technique is used to determine successive steps in the search.

5.2. Updating the generative weights

The generative weights could also be updated using a gradient method. However, we can do better by taking advantage of the independence between the individual weights and the monotonicity of the derivative. First note that the partial derivative with respect to one of the generative weights depends only on that generative weight and the bias weight to the same unit:

$$\frac{\partial C}{\partial g_{k,i}} = -\sum_d q_k^d \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i}) \right). \qquad (11)$$

Similarly, for the partial derivative with respect to one of the bias weights:

$$\frac{\partial C}{\partial g_{0,i}} = -\sum_d \left[ q_k^d \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i}) \right) + (1-q_k^d) \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i}) \right) \right]. \qquad (12)$$

This gives two equations (with two unknowns). However, we will show below that finding the local extrema involves solving equations of only one variable. Setting the derivatives equal to zero in equations (11) and (12) gives

$$0 = \sum_d q_k^d \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i}) \right)$$

$$0 = \sum_d \left[ q_k^d \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i}) \right) + (1-q_k^d) \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i}) \right) \right]$$

and combining gives

$$0 = \sum_d (1-q_k^d) \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i}) \right)$$

which we can solve for $g_{0,i}$. Unfortunately, this equation cannot be solved directly, but as it is a monotonic function it could be solved using a binary search technique. We chose, however, to use Newton's method. Letting

$$A = \sum_d (1-q_k^d) \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i}) \right)$$

we solve for $g_{0,i}$ using

$$g_{0,i}(t+1) = g_{0,i}(t) + \operatorname{sign}(A) \min\left( 2,\ \left| \frac{A(g_{0,i}(t))}{\partial A/\partial g_{0,i} - \epsilon} \right| \right) \qquad (13)$$

where $\epsilon$ is a positive safety factor used to avoid instability where the second derivative (the first derivative of $A$) vanishes. This, together with the constraint that the maximum step size has magnitude 2, gives a robust implementation of Newton's algorithm. Note that due to the monotonicity of $A$, $\partial A/\partial g_{0,i}$ is always non-positive.
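A direct transcription of this safeguarded Newton update (our sketch, with placeholder data) solves $A(g_{0,i}) = 0$ as in equation (13); the derivative of $A$ is always non-positive, and the $\epsilon$ term together with the step cap of 2 keeps the iteration stable.

```python
# Illustrative sketch only: the safeguarded Newton update of equation (13)
# for a single bias g_0i. A(g) decreases monotonically in g, so dA/dg is
# non-positive; eps keeps the step finite where it vanishes, and the step
# magnitude is capped at 2. All numbers below are assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(6)
n_data = 200
q_k = rng.random(n_data)                   # recognition probabilities of k
s_i = rng.integers(0, 2, size=n_data)      # states of unit i
G = rng.normal(size=n_data)                # cached input from units above i

def A(g0):                                 # sum_d (1-q)(s_i - sigma(G + g0))
    return np.sum((1 - q_k) * (s_i - sigmoid(G + g0)))

def dA(g0):                                # derivative; always non-positive
    p = sigmoid(G + g0)
    return -np.sum((1 - q_k) * p * (1 - p))

g0, eps = 0.0, 1e-4
for t in range(20):                        # equation (13)
    a = A(g0)
    step = np.sign(a) * min(2.0, abs(a / (dA(g0) - eps)))
    g0 += step
print("solved g_0i = %.4f, residual A = %.2e" % (g0, A(g0)))
```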

Once we have solved for $g_{0,i}$ in equation (13), we can use it to find $g_{k,i}$ in a similar manner:

$$B = \sum_d q_k^d \left( s_i^d - \sigma(G^d_{k>j>i} + g_{0,i} + g_{k,i}) \right)$$

$$g_{k,i}(t+1) = g_{k,i}(t) + \operatorname{sign}(B) \min\left( 2,\ \left| \frac{B(g_{0,i}, g_{k,i}(t))}{\partial B/\partial g_{k,i} - \epsilon} \right| \right). \qquad (14)$$

5.3. Back-fitting the generative weights

Greedy algorithms have the disadvantage that a step taken early may lead to subsequent poor performance. Ideally, we would like to go back and modify earlier connections with added hindsight. Unfortunately, to do this for all connections would defeat the efficiency advantages of a greedy algorithm. Changing the recognition weights to a lower unit could invalidate the recognition weights to higher units, as they depend on activity propagated from lower units. This would also invalidate the generative weights, as they depend on the activated recognition pattern $s^d$. The generative weights and biases, however, can be updated for little cost and without invalidating the recognition connections. The update rule is determined from the appropriate gradient:

$$\frac{\partial C}{\partial g_{j,i}} = -\sum_d s_j^d \left( s_i^d - \sigma\Big( g_{0,i} + \sum_{k>i} s_k^d g_{k,i} \Big) \right) \qquad (15)$$

$$\frac{\partial C}{\partial g_{0,i}} = -\sum_d \left( s_i^d - \sigma\Big( g_{0,i} + \sum_{k>i} s_k^d g_{k,i} \Big) \right). \qquad (16)$$

The conjugate gradient algorithm is also used to update these weights. A few conjugate gradient steps of back-fitting are performed after the addition of each new unit.

6. Algorithm modifications

6.1. Adding a unit more carefully

One way to avoid some of the worst local minima while adding units is to consider a pool of units for each new unit, as in cascade-correlation (Fahlman and Lebiere 1990). These units can be initialized with different random weights and trained independently in parallel. After training, the performance of each candidate unit can be assessed and the best unit added. We found that for a fixed number of units added, this did improve the performance but, as discussed below, for better performance with a fixed amount of time, serial consideration of candidate units was better. Only one candidate unit was trained at any time, but if it did not lead to a lower coding cost it was not included. To avoid overfitting, the coding costs were evaluated on a separate validation set, rather than on the data used to train the weights. This evaluation was performed before the back-fitting stage.
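The accept/reject logic of section 6.1 is easy to isolate. In the sketch below (ours), train_candidate and coding_cost are stand-ins for the weight searches and for equation (5); the point is only that a trained candidate is added exactly when it lowers the coding cost on the held-out validation set.

```python
# Illustrative sketch only: serial consideration of candidate units. Train
# one candidate at a time and keep it only if it lowers the validation
# coding cost. The stand-in functions and numbers are assumptions, not the
# paper's code.
import numpy as np

rng = np.random.default_rng(7)

def train_candidate(model):
    # stand-in: a trained candidate changes the cost by a random amount
    return {"cost_delta": rng.normal(loc=-0.5)}

def coding_cost(model, candidate, val_data):
    return model["val_cost"] + candidate["cost_delta"]

model = {"val_cost": 60.0, "units": 0}
val_data = None                              # placeholder validation set
for attempt in range(20):
    cand = train_candidate(model)
    new_cost = coding_cost(model, cand, val_data)
    if new_cost < model["val_cost"]:         # accept only if validation improves
        model = {"val_cost": new_cost, "units": model["units"] + 1}
        # (back-fitting of the generative weights would happen here)
print("units added: %d, final validation cost: %.1f bits"
      % (model["units"], model["val_cost"]))
```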

6.2. Mini-batches

All the iterative updates are done using batch algorithms. For the data-set we used, and most suitable data-sets, the size of the set is very large and the patterns within it are quite redundant. This suggests that an on-line algorithm would be more efficient. To maintain the parameter-free advantages of the conjugate gradient batch algorithm while increasing the learning speed, we investigated the effect of using mini-batches. For each considered unit, 20% of the patterns, balanced for class content, were used for the recognition weight searches and the generative Newton searches. Using this strategy, we achieved a rapid initial decrease in coding cost. At this stage it became more efficient to increase the size of the mini-batches instead of increasing the number of minimizing steps pursued for each unit. The batch size trades off speed for appropriate descent direction. Early in the search, an exact estimate of the gradient is not important for significant improvement. When the search has evolved to a reasonable solution, further progress depends on progressively more accurate estimates of the gradient of the desired function. This suggests that an adaptively growing batch size, which monitors the percentage of rejected units and increases the batch size when it gets too large, would be a good strategy.
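A growing, class-balanced mini-batch scheme of this kind can be sketched as follows (our illustration; the data, the labels and the particular schedule, which mirrors the one reported in section 7, are assumptions).

```python
# Illustrative sketch only: a gradually increasing batch-size schedule with
# class-balanced subsampling (10% for 20 iterations, 20% for 20, 50% for 10,
# 100% for 30, as in section 7). The data and digit labels are placeholders.
import numpy as np

rng = np.random.default_rng(8)
data = rng.integers(0, 2, size=(6000, 64))
labels = rng.integers(0, 10, size=6000)      # digit classes, for balancing

schedule = [(0.10, 20), (0.20, 20), (0.50, 10), (1.00, 30)]

def balanced_minibatch(frac):
    """Sample frac of the patterns, balanced for class content."""
    idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        take = max(1, int(round(frac * len(members))))
        idx.append(rng.choice(members, size=take, replace=False))
    return data[np.concatenate(idx)]

for frac, n_iters in schedule:
    for _ in range(n_iters):
        batch = balanced_minibatch(frac)
        # ... one optimization step on `batch` would go here ...
    print("fraction %.0f%% -> batch of %d patterns" % (100 * frac, len(batch)))
```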

7. Results

We tested the algorithm using a data-set previously used to test similar algorithms (Hinton et al 1995; Frey et al 1996). This data-set consists of normalized digits quantized into 8 × 8 binary images from the US Postal Service Office of Advanced Technology. As in Frey et al (1996), the data were divided into 6000 training examples, 2000 validation examples and 5000 examples for testing. The validation data were used to evaluate whether a unit should be added and also, as they are already loaded, for back-fitting the generative weights after the addition of a unit.

Training curves are shown in figures 2 and 3 for training with the full (training) data-set, the 20% mini-batches, as well as a run with a gradually increasing batch size. The batches were 10% for the first 20 iterations, 20% for the next 20 iterations, 50% for the next 10 iterations, and 100% for the last 30 iterations. The curves plot the CRR-calculated coding cost (an upper bound on the true coding cost) on the validation set.

[Figure: three training curves labelled 'full batches', '1/5 batches' and 'adaptive batches'; axes: Coding Cost (bits) versus Time (seconds).]

Figure 2. Coding cost versus time. The x-axis gives the number of seconds when run on a 200 MHz R4400 chip with a 4 MB secondary cache. Each plotted point represents the addition of one unit. The plot for the mini-batches was run for longer but added no units after 5500 s. The y-axis gives the average coding cost over the data-set (in bits) for the network at that stage.

[Figure: three curves labelled 'full batches', '1/5 batches' and 'adaptive batches'; axes: Coding Cost (bits) versus Number of Hidden Units.]

Figure 3. Coding cost versus number of added units. This graph plots the same information as in figure 2, but with respect to the number of units in the current network. This is not just a simply rescaled version of figure 2, as the consideration of each unit did not always result in a unit being added to the network.

As mentioned, figure 2 shows that the mini-batches allow faster learning (particularly at the beginning) but are limited in their asymptotic accuracy. Increasing the batch size throughout learning allows fast initial learning and good late learning. Figure 3 shows that when using the validation set to decide about adding a new unit, the resulting efficiencies of the networks are similar for all the training methods.

While the objective function of our network was specifically designed, for simplicity, to optimize the coding cost using a single hidden state per data pattern, the resulting generative model is a stochastic model capable of coding each pattern with a distribution over hidden states, just like those constructed using Gibbs sampling, mean-field methods, and Helmholtz machines. We can therefore estimate the actual cost of coding the data using the true posterior distribution of our trained networks. Testing the network trained with the increasing batch size, we found that the CRR-estimated coding cost (using $s^d$ as the sole posterior state) significantly overestimated the true coding cost of the network, particularly as the size of the network grew. Figure 4 shows the estimated coding cost using the CRR (single hidden state) cost on the validation set (as shown in figure 2) reproduced beside the coding cost on the test set calculated using Gibbs sampling of the full posterior distribution of hidden states. Note that the Gibbs-sampled estimate of the test error gives a smaller coding cost (for all but the smallest network) and that the difference increases with the size of the network. The Gibbs-sampled coding cost, on the test data, for the network with 30 hidden units was 38.6 bits, compared with the CRR validation coding cost of 40.8 bits.

For comparison, a Helmholtz machine trained using the wake–sleep algorithm with 72 hidden units (in a layered network architecture) achieved a test-set coding cost of 39.1 bits in 3120 s (Frey et al 1996). This coding cost was achieved using the recognition distribution learned by the algorithm. For networks of this size, it was not possible to compute the true log-likelihood of the data under the generative network learned by the wake–sleep algorithm.

However, Brendan Frey (1997) has found that in a variety of cases in which the networks are small enough to compute the log-likelihood of the data exactly, the coding cost given by the recognition distribution was very close to the true log-likelihood under the generative model. Frey et al (1996) reported the total training time as 7200 s, which included training several architectures from which they picked the best.

[Figure: two curves labelled 'CRR validation error (upper bound)' and 'Gibbs sampled test error (close approximation)'; axes: Coding Cost (bits) versus Time (seconds).]

Figure 4. Coding cost versus time. This graph shows the effect of considering the whole posterior distribution over hidden states when calculating the coding cost. The whole posterior distribution is approximated using prolonged Gibbs sampling.

8. Discussion

The central idea of this training algorithm is to avoid the computational cost of computing the posterior distribution over the hidden states by using a single-state approximation. This gives computational simplicity at the expense of increased coding cost. We found that even though the network was trained to optimize coding cost using a single hidden state, it had a reduced cost when the full posterior distribution over hidden states was considered. Considered in this way, the algorithm produced networks with similar coding cost to those created using the wake–sleep algorithm. The advantage of the CRR method is that the size of the network does not have to be pre-determined. Also, it is hoped that because the networks are biased towards performing well with single-state posteriors, they might lead to simpler, more understandable representations. On the particular problem we tried, our networks of 30 units achieved a lower coding cost than the 72 hidden unit network trained with the wake–sleep algorithm.

Acknowledgments

We thank Brendan Frey for providing us with the code for the Gibbs sampling calculations and the performance results for the stochastic Helmholtz machine. This research was funded by the Institute for Robotics and Intelligent Systems, the Information Technology Research Center, and NSERC. GH is the Nesbitt Burns Fellow of the Canadian Institute for Advanced Research.

References

Breiman L, Friedman J H, Olshen R A and Stone C J 1984 Classification and Regression Trees (Belmont, CA: Wadsworth)

Dayan P, Hinton G E, Neal R M and Zemel R S 1995 The Helmholtz machine Neural Comput. 7 889–904

Fahlman S E and Lebiere C 1990 The Cascade-Correlation Learning Architecture Advances in Neural Information Processing Systems 2 ed D S Touretzky (San Mateo, CA: Morgan Kaufmann)

Frey B J 1997 personal communication

Frey B J, Hinton G E and Dayan P 1996 Does the wake–sleep algorithm produce good density estimators? Advances in Neural Information Processing Systems 8 ed D S Touretzky, M C Mozer and M E Hasselmo (Cambridge, MA: MIT Press)

Hinton G E, Dayan P, Frey B J and Neal R M 1995 The wake–sleep algorithm for unsupervised neural networks Science 268 1158–61

Hinton G E and Zemel R S 1994 Autoencoders, minimum description length and Helmholtz free energy Advances in Neural Information Processing Systems 6 ed J D Cowan, G Tesauro and J Alspector (San Mateo, CA: Morgan Kaufmann) pp 3–10

Jaakkola T, Saul L K and Jordan M I 1996 Fast learning by bounding likelihoods in sigmoid type belief networks Advances in Neural Information Processing Systems 8 ed D S Touretzky, M C Mozer and M E Hasselmo (Cambridge, MA: MIT Press)

Neal R M 1992 Connectionist learning of belief networks Artificial Intelligence 56 71–113

Neal R M and Hinton G E 1998 A new view of the EM algorithm that justifies incremental and other variants Learning in Graphical Models ed M I Jordan (Dordrecht: Kluwer) to be published (currently available from ftp://ftp.cs.utoronto.ca/pub/radford/em.ps.Z)

Redlich A N 1993 Redundancy reduction as a strategy for unsupervised learning Neural Comput. 5 289–304


More information

Role of parameters in the stochastic dynamics of a stick-slip oscillator

Role of parameters in the stochastic dynamics of a stick-slip oscillator Proceeing Series of the Brazilian Society of Applie an Computational Mathematics, v. 6, n. 1, 218. Trabalho apresentao no XXXVII CNMAC, S.J. os Campos - SP, 217. Proceeing Series of the Brazilian Society

More information

The Press-Schechter mass function

The Press-Schechter mass function The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for

More information

Neural Network Training By Gradient Descent Algorithms: Application on the Solar Cell

Neural Network Training By Gradient Descent Algorithms: Application on the Solar Cell ISSN: 319-8753 Neural Networ Training By Graient Descent Algorithms: Application on the Solar Cell Fayrouz Dhichi*, Benyounes Ouarfi Department of Electrical Engineering, EEA&TI laboratory, Faculty of

More information

Lecture 2: Correlated Topic Model

Lecture 2: Correlated Topic Model Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables

More information

Research Article When Inflation Causes No Increase in Claim Amounts

Research Article When Inflation Causes No Increase in Claim Amounts Probability an Statistics Volume 2009, Article ID 943926, 10 pages oi:10.1155/2009/943926 Research Article When Inflation Causes No Increase in Claim Amounts Vytaras Brazauskas, 1 Bruce L. Jones, 2 an

More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

arxiv: v3 [cs.lg] 3 Dec 2017

arxiv: v3 [cs.lg] 3 Dec 2017 Context-Aware Generative Aversarial Privacy Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, an Ram Rajagopal arxiv:1710.09549v3 [cs.lg] 3 Dec 2017 Abstract Preserving the utility of publishe atasets

More information

VI. Linking and Equating: Getting from A to B Unleashing the full power of Rasch models means identifying, perhaps conceiving an important aspect,

VI. Linking and Equating: Getting from A to B Unleashing the full power of Rasch models means identifying, perhaps conceiving an important aspect, VI. Linking an Equating: Getting from A to B Unleashing the full power of Rasch moels means ientifying, perhaps conceiving an important aspect, efining a useful construct, an calibrating a pool of relevant

More information

Robust Bounds for Classification via Selective Sampling

Robust Bounds for Classification via Selective Sampling Nicolò Cesa-Bianchi DSI, Università egli Stui i Milano, Italy Clauio Gentile DICOM, Università ell Insubria, Varese, Italy Francesco Orabona Iiap, Martigny, Switzerlan cesa-bianchi@siunimiit clauiogentile@uninsubriait

More information

Sensors & Transducers 2015 by IFSA Publishing, S. L.

Sensors & Transducers 2015 by IFSA Publishing, S. L. Sensors & Transucers, Vol. 184, Issue 1, January 15, pp. 53-59 Sensors & Transucers 15 by IFSA Publishing, S. L. http://www.sensorsportal.com Non-invasive an Locally Resolve Measurement of Soun Velocity

More information

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France

APPROXIMATE SOLUTION FOR TRANSIENT HEAT TRANSFER IN STATIC TURBULENT HE II. B. Baudouy. CEA/Saclay, DSM/DAPNIA/STCM Gif-sur-Yvette Cedex, France APPROXIMAE SOLUION FOR RANSIEN HEA RANSFER IN SAIC URBULEN HE II B. Bauouy CEA/Saclay, DSM/DAPNIA/SCM 91191 Gif-sur-Yvette Ceex, France ABSRAC Analytical solution in one imension of the heat iffusion equation

More information

CONTROL CHARTS FOR VARIABLES

CONTROL CHARTS FOR VARIABLES UNIT CONTOL CHATS FO VAIABLES Structure.1 Introuction Objectives. Control Chart Technique.3 Control Charts for Variables.4 Control Chart for Mean(-Chart).5 ange Chart (-Chart).6 Stanar Deviation Chart

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information