Inter-domain Gaussian Processes for Sparse Inference using Inducing Features


Miguel Lázaro-Gredilla and Aníbal R. Figueiras-Vidal
Dep. Signal Processing & Communications
Universidad Carlos III de Madrid, SPAIN
{miguel,arfv}@tsc.uc3m.es

Abstract

We present a general inference framework for inter-domain Gaussian Processes (GPs) and focus on its usefulness to build sparse GP models. The state-of-the-art sparse GP model introduced by Snelson and Ghahramani in [1] relies on finding a small, representative pseudo data set of m elements (from the same domain as the n available data elements) which is able to explain existing data well, and then uses it to perform inference. This reduces inference and model selection computation time from O(n^3) to O(m^2 n), where m ≪ n. Inter-domain GPs can be used to find a (possibly more compact) representative set of features lying in a different domain, at the same computational cost. Being able to specify a different domain for the representative features allows prior knowledge about relevant characteristics of the data to be incorporated, and decouples the functional form of the covariance function from that of the basis functions. We will show how previously existing models fit into this framework and will use it to develop two new sparse GP models. Tests on large, representative regression data sets suggest that significant improvement can be achieved, while retaining computational efficiency.

1 Introduction and previous work

Over the past decade there has been a growing interest in the application of Gaussian Processes (GPs) to machine learning tasks. GPs are probabilistic non-parametric Bayesian models that combine a number of attractive characteristics: they achieve state-of-the-art performance on supervised learning tasks, provide probabilistic predictions, have a simple and well-founded model selection scheme, present no overfitting (since parameters are integrated out), etc.

Unfortunately, the direct application of GPs to regression problems (with which we will be concerned here) is limited because their training time is O(n^3). To overcome this limitation, several sparse approximations have been proposed [2, 3, 4, 5, 6]. In most of them, sparsity is achieved by projecting all available data onto a smaller subset of size m ≪ n (the active set), which is selected according to some specific criterion. This reduces computation time to O(m^2 n). However, active set selection interferes with hyperparameter learning, due to its non-smooth nature (see [1, 3]).

These proposals have been superseded by the Sparse Pseudo-inputs GP (SPGP) model, introduced in [1]. In this model, the constraint that the samples of the active set (which are called pseudo-inputs) must be selected among the training data is relaxed, allowing them to lie anywhere in the input space. This allows both pseudo-inputs and hyperparameters to be selected in a joint continuous optimisation and increases flexibility, resulting in much superior performance.

In this work we introduce Inter-Domain GPs (IDGPs) as a general tool to perform inference across domains. This removes the constraint that the pseudo-inputs must remain within the same domain as the input data. The added flexibility results in increased performance and allows prior knowledge to be encoded about other domains where data can be represented more compactly.

2 Review of GPs for regression

We will briefly state here the main definitions and results for regression with GPs. See [7] for a comprehensive review.

Assume we are given a training set with n samples D ≡ {x_j, y_j}_{j=1}^n, where each D-dimensional input x_j is associated with a scalar output y_j. The regression task goal is, given a new input x_*, to predict the corresponding output y_* based on D. The GP regression model assumes that the outputs can be expressed as some noiseless latent function plus independent noise, y = f(x) + ε, and then sets a zero-mean¹ GP prior on f(x), with covariance k(x, x'), and a zero-mean Gaussian prior on ε, with variance σ² (the noise power hyperparameter). The covariance function encodes prior knowledge about the smoothness of f(x). The most common choice for it is the Automatic Relevance Determination Squared Exponential (ARD SE):

  k(x, x') = σ₀² exp( −(1/2) Σ_{d=1}^{D} (x_d − x'_d)² / ℓ_d² ),   (1)

with hyperparameters σ₀² (the latent function power) and {ℓ_d}_{d=1}^{D} (the length-scales, defining how rapidly the covariance decays along each dimension). It is referred to as ARD SE because, when coupled with a model selection method, non-informative input dimensions can be removed automatically by growing the corresponding length-scale.

The set of hyperparameters that define the GP is θ = {σ², σ₀², {ℓ_d}_{d=1}^{D}}. We will omit the dependence on θ for the sake of clarity.

If we evaluate the latent function at X = {x_j}_{j=1}^n, we obtain a set of latent variables following a joint Gaussian distribution p(f|X) = N(f|0, K_ff), where [K_ff]_ij = k(x_i, x_j). Using this model it is possible to express the joint distribution of training and test cases and then condition on the observed outputs to obtain the predictive distribution for any test case

  p_GP(y_*|x_*, D) = N( y_* | k_{*f}(K_ff + σ²I_n)^{-1} y,  σ² + k_{**} − k_{*f}(K_ff + σ²I_n)^{-1} k_{f*} ),   (2)

where y = [y_1, ..., y_n]^T, k_{f*} = [k(x_1, x_*), ..., k(x_n, x_*)]^T, k_{*f} = k_{f*}^T, and k_{**} = k(x_*, x_*). I_n is used to denote the identity matrix of size n. The O(n³) cost of these equations arises from the inversion of the n × n covariance matrix. Predictive distributions for additional test cases take O(n²) time each. These costs make standard GPs impractical for large data sets.

To select hyperparameters θ, Type-II Maximum Likelihood (ML-II) is commonly used. This amounts to selecting the hyperparameters that correspond to a (possibly local) maximum of the log-marginal likelihood, also called log-evidence.

¹ We follow the common approach of subtracting the sample mean from the outputs and then assume a zero-mean model.
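As a concrete illustration of (1) and (2), the sketch below (not part of the original paper; a minimal NumPy version with hypothetical variable names) builds the ARD SE covariance and evaluates the predictive mean and variance for a single test input, at the full O(n³) cost.

```python
import numpy as np

def ard_se(X1, X2, sigma0_sq, ell):
    """ARD SE covariance (1): sigma0^2 * exp(-0.5 * sum_d (x_d - x'_d)^2 / ell_d^2)."""
    diff = X1[:, None, :] - X2[None, :, :]                    # shape (n1, n2, D)
    return sigma0_sq * np.exp(-0.5 * np.sum((diff / ell) ** 2, axis=-1))

def gp_predict(X, y, x_star, sigma_sq, sigma0_sq, ell):
    """Predictive mean and variance of (2) for one test input; O(n^3)."""
    n = X.shape[0]
    Kff = ard_se(X, X, sigma0_sq, ell)
    kf_star = ard_se(X, x_star[None, :], sigma0_sq, ell).ravel()
    # Cholesky of (Kff + sigma^2 I) for a stable solve instead of an explicit inverse.
    L = np.linalg.cholesky(Kff + sigma_sq * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, kf_star)
    mean = kf_star @ alpha
    var = sigma_sq + sigma0_sq - v @ v                        # k(x_*, x_*) = sigma0^2
    return mean, var

# Toy usage on synthetic 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
print(gp_predict(X, y, np.array([0.5]), sigma_sq=0.01, sigma0_sq=1.0, ell=np.array([1.0])))
```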

3 Inter-domain GPs

In this section we will introduce Inter-Domain GPs (IDGPs) and show how they can be used as a framework for computationally efficient inference. Then we will use this framework to express two previous relevant models and to develop two new ones.

3.1 Definition

Consider a real-valued GP f(x) with x ∈ R^D and some deterministic real function g(x, z), with z ∈ R^H. We define the following transformation:

  u(z) = ∫_{R^D} f(x) g(x, z) dx.   (3)

There are many examples of transformations that take this form, the Fourier transform being one of the best known. We will discuss possible choices for g(x, z) in Section 3.3; for the moment we will deal with the general form. Since u(z) is obtained by a linear transformation of the GP f(x), it is also a GP. This new GP may lie in a different domain, of possibly different dimension. The transformation is not invertible in general, and its properties are defined by g(x, z).

IDGPs arise when we jointly consider f(x) and u(z) as a single, extended GP. The mean and covariance function of this extended GP are overloaded to accept arguments from both the input and transformed domains and treat them accordingly. We refer to each version of an overloaded function as an instance, which will accept a different type of arguments. If the distribution of the original GP is f(x) ~ GP(m(x), k(x, x')), then it is possible to compute the remaining instances that define the distribution of the extended GP over both domains. The transformed-domain instance of the mean is

  m(z) = E[u(z)] = ∫_{R^D} E[f(x)] g(x, z) dx = ∫_{R^D} m(x) g(x, z) dx.

The inter-domain and transformed-domain instances of the covariance function are:

  k(x, z') = E[f(x) u(z')] = E[ f(x) ∫_{R^D} f(x') g(x', z') dx' ] = ∫_{R^D} k(x, x') g(x', z') dx'   (4)

  k(z, z') = E[u(z) u(z')] = E[ ∫_{R^D} f(x) g(x, z) dx ∫_{R^D} f(x') g(x', z') dx' ]
           = ∫_{R^D} ∫_{R^D} k(x, x') g(x, z) g(x', z') dx dx'.   (5)

The mean m(·) and covariance function k(·, ·) are therefore defined both by the values and the domains of their arguments. This can be seen as if each argument had an additional domain indicator used to select the instance. Apart from that, they define a regular GP, and all standard properties hold. In particular k(a, b) = k(b, a). This approach is related to [8], but here the latent space is defined as a transformation of the input space, and not the other way around. This allows us to pre-specify the desired input-domain covariance. The transformation is also more general: any g(x, z) can be used.

We can sample an IDGP at n input-domain points f = [f_1, f_2, ..., f_n]^T (with f_j = f(x_j)) and m transformed-domain points u = [u_1, u_2, ..., u_m]^T (with u_i = u(z_i)). With the usual assumption of f(x) being a zero-mean GP and defining Z = {z_i}_{i=1}^m, the joint distribution of these samples is:

  p( [f; u] | X, Z ) = N( [f; u] | 0, [ K_ff  K_fu ; K_fu^T  K_uu ] ),   (6)

with [K_ff]_pq = k(x_p, x_q), [K_fu]_pq = k(x_p, z_q), [K_uu]_pq = k(z_p, z_q), which allows us to perform inference across domains. We will only be concerned with one input domain and one transformed domain, but IDGPs can be defined for any number of domains.

3.2 Sparse regression using inducing features

In the standard regression setting, we are asked to perform inference about the latent function f(x) from a data set D lying in the input domain. Using IDGPs, we can use data from any domain to perform inference in the input domain. Some latent functions might be better defined by a set of data lying in some transformed space rather than in the input space. This idea is used for sparse inference.

Following [1] we introduce a pseudo data set, but here we place it in the transformed domain: {Z, u}. The following derivation is analogous to that of [1]. We will refer to Z as the inducing features and to u as the inducing variables. The key approximation leading to sparsity is to set m ≪ n and assume that f(x) is well described by the pseudo data set, so that any two samples (either from the training or test set) f_p and f_q with p ≠ q will be independent given x_p, x_q and the pseudo data set. With this simplifying assumption², the prior over f can be factorised as a product of marginals:

  p(f|X, Z, u) ≈ Π_{j=1}^n p(f_j|x_j, Z, u).   (7)

² Alternatively, (7) can be obtained by proposing a generic factorised form for the approximate conditional p(f|X, Z, u) ≈ q(f|X, Z, u) = Π_{j=1}^n q_j(f_j|x_j, Z, u) and then choosing the set of functions {q_j(·)}_{j=1}^n so as to minimise the Kullback-Leibler (KL) divergence from the exact joint prior, KL( p(f|X, Z, u) p(u|Z) || q(f|X, Z, u) p(u|Z) ), as noted in [9].
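The instances (4) and (5) can be sanity-checked numerically for any candidate feature extraction function. The sketch below is not from the paper; it is a 1-D illustration with hypothetical parameter names, using a Gaussian g (a choice that reappears in Section 3.3.2) and simple grid quadrature in place of the integrals over R^D.

```python
import numpy as np

ell = 0.8                                                       # input-domain length-scale
k_se = lambda x, xp: np.exp(-0.5 * (x - xp) ** 2 / ell ** 2)    # 1-D SE kernel, sigma0^2 = 1

def g(x, mu, w):
    """Gaussian feature extraction function g(x, z), with z = (mu, w)."""
    return np.exp(-0.5 * (x - mu) ** 2 / w ** 2) / np.sqrt(2 * np.pi * w ** 2)

# Quadrature grid standing in for the integrals over R in (4) and (5).
t = np.linspace(-12.0, 12.0, 2001)
dt = t[1] - t[0]

def k_x_z(x, mu, w):
    """Inter-domain instance (4): integral of k(x, x') g(x', z') over x'."""
    return np.sum(k_se(x, t) * g(t, mu, w)) * dt

def k_z_z(mu1, w1, mu2, w2):
    """Transformed-domain instance (5): double integral of k(x, x') g(x, z) g(x', z')."""
    K = k_se(t[:, None], t[None, :])
    return g(t, mu1, w1) @ K @ g(t, mu2, w2) * dt ** 2

# Covariance between f(0.3) and a feature centred at -1.0, and between two features.
print(k_x_z(0.3, -1.0, 0.5), k_z_z(-1.0, 0.5, 1.0, 0.5))
```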

Marginals are in turn obtained from (6): p(f_j|x_j, Z, u) = N(f_j | k_j^T K_uu^{-1} u, λ_j), where k_j^T is the j-th row of K_fu and λ_j is the j-th element of the diagonal of the matrix Λ_f = diag(K_ff − K_fu K_uu^{-1} K_uf). The operator diag(·) sets all off-diagonal elements to zero, so that Λ_f is a diagonal matrix.

Since p(u|Z) is readily available and also Gaussian, the inducing variables can be integrated out from (7), yielding a new, approximate prior over f(x):

  p(f|X, Z) = ∫ p(f, u|X, Z) du ≈ ∫ [ Π_{j=1}^n p(f_j|x_j, Z, u) ] p(u|Z) du = N(f|0, K_fu K_uu^{-1} K_uf + Λ_f).

Using this approximate prior, the posterior distribution for a test case is:

  p_IDGP(y_*|x_*, D, Z) = N( y_* | k_{*u}^T Q^{-1} K_fu^T Λ_y^{-1} y,  σ² + k_{**} + k_{*u}^T (Q^{-1} − K_uu^{-1}) k_{*u} ),   (8)

where we have defined Q = K_uu + K_fu^T Λ_y^{-1} K_fu and Λ_y = Λ_f + σ² I_n. The distribution (2) is approximated by (8) with the information available in the pseudo data set. After O(m²n) time precomputations, predictive means and variances can be computed in O(m) and O(m²) time per test case, respectively. This model is, in general, non-stationary, even when it is approximating a stationary input-domain covariance, and can be interpreted as a degenerate GP plus heteroscedastic white noise.

The log-marginal likelihood (or log-evidence) of the model, explicitly including the conditioning on kernel hyperparameters θ, can be expressed as

  log p(y|X, Z, θ) = −(1/2) [ y^T Λ_y^{-1} y − y^T Λ_y^{-1} K_fu Q^{-1} K_fu^T Λ_y^{-1} y + log( |Q| |Λ_y| / |K_uu| ) + n log(2π) ],

which is also computable in O(m²n) time.

Model selection will be performed by jointly optimising the evidence with respect to the hyperparameters and the inducing features. If analytical derivatives of the covariance function are available, conjugate gradient optimisation can be used with O(m²n) cost per step.
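A minimal sketch of the predictive equations (8) is given below; it is not from the paper, the helper names are made up, and a production version would use Cholesky factorisations rather than explicit inverses. It assumes the covariance blocks K_uu and K_fu and the diagonal of K_ff have already been built from the chosen instances.

```python
import numpy as np

def idgp_fit(Kuu, Kfu, kff_diag, y, sigma_sq, jitter=1e-6):
    """Precomputations for the sparse posterior (8); O(m^2 n) time.

    Kuu: (m, m) transformed-domain covariance, Kfu: (n, m) inter-domain covariance,
    kff_diag: (n,) diagonal of the input-domain covariance at the training inputs.
    """
    Kuu = Kuu + jitter * np.eye(Kuu.shape[0])
    Kuu_inv_Kuf = np.linalg.solve(Kuu, Kfu.T)                  # (m, n)
    lam_f = kff_diag - np.sum(Kfu.T * Kuu_inv_Kuf, axis=0)     # diag(Kff - Kfu Kuu^{-1} Kuf)
    lam_y = lam_f + sigma_sq                                   # Lambda_y = Lambda_f + sigma^2 I_n
    Q = Kuu + (Kfu.T / lam_y) @ Kfu                            # Q = Kuu + Kfu^T Lambda_y^{-1} Kfu
    beta = np.linalg.solve(Q, Kfu.T @ (y / lam_y))             # Q^{-1} Kfu^T Lambda_y^{-1} y
    A = np.linalg.inv(Q) - np.linalg.inv(Kuu)                  # Q^{-1} - Kuu^{-1}
    return beta, A

def idgp_predict(k_u_star, k_star_star, beta, A, sigma_sq):
    """Predictive mean and variance of (8): O(m) and O(m^2) per test case."""
    mean = k_u_star @ beta
    var = sigma_sq + k_star_star + k_u_star @ A @ k_u_star
    return mean, var
```

Note that with g(x, z) = δ(x − z) (Section 3.3.1 below), K_fu and K_uu are simply the input-domain kernel evaluated between training inputs and pseudo-inputs and among pseudo-inputs, so the same two functions also cover the SPGP/FITC special case.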
3.3 On the choice of g(x, z)

The feature extraction function g(x, z) defines the transformed domain in which the pseudo data set lies. According to (3), the inducing variables can be seen as projections of the target function f(x) onto the feature extraction function over the whole input space. Therefore, each of them summarises information about the behaviour of f(x) everywhere. The inducing features Z define the concrete set of functions onto which the target function will be projected. It is desirable that this set captures the most significant characteristics of the function. This can be achieved either by using prior knowledge about the data to select {g(x, z_i)}_{i=1}^m, or by using a very general family of functions and letting model selection automatically choose the appropriate set.

Another way to choose g(x, z) relies on the form of the posterior. The posterior mean of a GP is often thought of as a linear combination of basis functions. For full GPs and other approximations such as [1, 2, 3, 4, 5, 6], the basis functions must have the form of the input-domain covariance function. When using IDGPs, the basis functions have the form of the inter-domain instance of the covariance function, and can therefore be adjusted by choosing g(x, z), independently of the input-domain covariance function.

If two feature extraction functions g(·, ·) and h(·, ·) are related by g(x, z) = h(x, z) r(z) for some function r(·), then both yield the same sparse GP model. This property can be used to simplify the expressions of the instances of the covariance function.

In this work we use the same functional form for every feature, i.e. our function set is {g(x, z_i)}_{i=1}^m, but it is also possible to use sets with different functional forms for each inducing feature, i.e. {g_i(x, z_i)}_{i=1}^m, where each z_i may even have a different size (dimension). In the sections below we will discuss different possible choices for g(x, z).

3.3.1 Relation with Sparse GPs using pseudo-inputs

The sparse GP using pseudo-inputs (SPGP) was introduced in [1] and was later renamed the Fully Independent Training Conditional (FITC) model to fit in the systematic framework of [10].

Since the sparse model introduced in Section 3.2 also uses a fully independent training conditional, we will stick to the first name to avoid possible confusion.

The IDGP innovation with respect to SPGP consists in letting the pseudo data set lie in a different domain. If we set g_SPGP(x, z) ≡ δ(x − z), where δ(·) is a Dirac delta, we force the pseudo data set to lie in the input domain. Thus there is no longer a transformed space and the original SPGP model is retrieved. In this setting, the inducing features of the IDGP play the role of SPGP's pseudo-inputs.

3.3.2 Relation with Sparse Multiscale GPs

Sparse Multiscale GPs (SMGPs) are presented in [11]. Seeking to generalise the SPGP model with ARD SE covariance function, they propose to use a different set of length-scales for each basis function. The resulting model presents a defective variance that is healed by adding heteroscedastic white noise. SMGPs, including the variance improvement, can be derived in a principled way as IDGPs:

  g_SMGP(x, z) ∝ Π_{d=1}^{D} [ 1 / sqrt(2π(c_d² − ℓ_d²)) ] exp( −(x_d − µ_d)² / (2(c_d² − ℓ_d²)) )   with z = [µ^T, c^T]^T   (9)

  k_SMGP(x, z') = exp( −Σ_{d=1}^{D} (x_d − µ'_d)² / (2 c'_d²) ) Π_{d=1}^{D} sqrt( ℓ_d² / c'_d² )   (10)

  k_SMGP(z, z') = exp( −Σ_{d=1}^{D} (µ_d − µ'_d)² / (2(c_d² + c'_d² − ℓ_d²)) ) Π_{d=1}^{D} sqrt( ℓ_d² / (c_d² + c'_d² − ℓ_d²) ).   (11)

With this approximation, each basis function has its own centre µ = [µ_1, µ_2, ..., µ_D]^T and its own length-scales c = [c_1, c_2, ..., c_D]^T, whereas the global length-scales {ℓ_d}_{d=1}^{D} are shared by all inducing features. Equations (10) and (11) are derived from (4) and (5) using (1) and (9). The integrals defining k_SMGP(·, ·) converge if and only if c_d² ≥ ℓ_d² ∀d, which suggests that other values, even if permitted in [11], should be avoided for the model to remain well defined.

3.3.3 Frequency Inducing Features GP

If the target function can be described more compactly in the frequency domain than in the input domain, it can be advantageous to let the pseudo data set lie in the former domain. We will pursue that possibility for the case where the input-domain covariance is the ARD SE. We will call the resulting sparse model Frequency Inducing Features GP (FIFGP).

Directly applying the Fourier transform is not possible because the target function is not square integrable (it has constant power σ₀² everywhere, so (5) does not converge). We will work around this by windowing the target function in the region of interest. It is possible to use a square window, but this results in the covariance being defined in terms of the complex error function, which is very slow to evaluate. Instead, we will use a Gaussian window³. Since multiplying by a Gaussian in the input domain is equivalent to convolving with a Gaussian in the frequency domain, we will be working with a blurred version of the frequency space. This model is defined by:

  g_FIF(x, z) ∝ Π_{d=1}^{D} [ 1 / sqrt(2π c_d²) ] exp( −x_d² / (2 c_d²) ) cos( ω_0 + Σ_{d=1}^{D} x_d ω_d )   with z = ω   (12)

  k_FIF(x, z') = exp( −Σ_{d=1}^{D} (x_d² + c_d² ℓ_d² ω'_d²) / (2(c_d² + ℓ_d²)) ) cos( ω'_0 + Σ_{d=1}^{D} (c_d² ω'_d x_d) / (c_d² + ℓ_d²) ) Π_{d=1}^{D} sqrt( ℓ_d² / (c_d² + ℓ_d²) )   (13)

  k_FIF(z, z') = (1/2) exp( −Σ_{d=1}^{D} c_d² ℓ_d² (ω_d² + ω'_d²) / (2(2c_d² + ℓ_d²)) )
                 × [ exp( −Σ_{d=1}^{D} c_d⁴ (ω_d − ω'_d)² / (2(2c_d² + ℓ_d²)) ) cos(ω_0 − ω'_0)
                   + exp( −Σ_{d=1}^{D} c_d⁴ (ω_d + ω'_d)² / (2(2c_d² + ℓ_d²)) ) cos(ω_0 + ω'_0) ]
                 × Π_{d=1}^{D} sqrt( ℓ_d² / (2c_d² + ℓ_d²) ).   (14)

³ A mixture of m Gaussians could also be used as window without increasing the complexity order.
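As a small illustration of how these closed forms are used, the sketch below (not from the paper; the names are made up and σ₀² is taken as 1, as in (13)) evaluates the inter-domain instance (13) for one input and one frequency feature. This is what would fill one entry of K_fu when building the FIFGP model.

```python
import numpy as np

def k_fif_x_z(x, omega0, omega, c, ell):
    """Inter-domain instance (13) between f(x) and a frequency feature u(z); sigma0^2 = 1.

    x, omega, c, ell: length-D arrays; omega0: scalar phase.
    """
    denom = c ** 2 + ell ** 2
    amp = (np.exp(-0.5 * np.sum((x ** 2 + (c * ell * omega) ** 2) / denom))
           * np.prod(np.sqrt(ell ** 2 / denom)))
    phase = omega0 + np.sum(c ** 2 * omega * x / denom)
    return amp * np.cos(phase)

# Toy usage: one 2-D input against one inducing frequency feature (hypothetical values).
x = np.array([0.3, -1.2])
print(k_fif_x_z(x, omega0=0.1, omega=np.array([1.5, 0.7]),
                c=np.array([2.0, 2.0]), ell=np.array([0.8, 1.1])))
```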

The inducing features are ω = [ω_0, ω_1, ..., ω_D]^T, where ω_0 is the phase and the remaining components are the frequencies along each dimension. In this model, both the global length-scales {ℓ_d}_{d=1}^{D} and the window length-scales {c_d}_{d=1}^{D} are shared, thus c = c'. Instances (13) and (14) are induced by (12) using (4) and (5).

3.3.4 Time-Frequency Inducing Features GP

Instead of using a single window to select the region of interest, it is possible to use a different window for each feature. We will use windows of the same size but different centres. The resulting model combines SPGP and FIFGP, so we will call it Time-Frequency Inducing Features GP (TFIFGP). It is defined by g_TFIF(x, z) ≡ g_FIF(x − µ, ω), with z = [µ^T, ω^T]^T. The implied inter-domain and transformed-domain instances of the covariance function are:

  k_TFIF(x, z') = k_FIF(x − µ', ω'),   k_TFIF(z, z') = k_FIF(z, z') exp( −Σ_{d=1}^{D} (µ_d − µ'_d)² / (2(2c_d² + ℓ_d²)) ).

FIFGP is trivially obtained by setting every centre to zero ({µ_i = 0}_{i=1}^m), whereas SPGP is obtained by setting the window length-scales c, the frequencies and the phases {ω_i}_{i=1}^m to zero. If the window length-scales were individually adjusted, SMGP would be obtained.

While TFIFGP has the modelling power of both SPGP and FIFGP, it might perform worse in practice due to it having roughly twice as many hyperparameters, thus making the optimisation problem harder. The same problem also exists in SMGP. A possible workaround is to initialise the hyperparameters using a simpler model, as done in [11] for SMGP, though we will not do this here.

4 Experiments

In this section we will compare the proposed approximations FIFGP and TFIFGP with the current state of the art, SPGP, on some large data sets, for the same number of inducing features/inputs and, therefore, roughly equal computational cost. Additionally, we provide results using a full GP, which is expected to provide top performance (though requiring an impractically big amount of computation). In all cases, the (input-domain) covariance function is the ARD SE (1).

We use four large data sets: Kin-40k, Pumadyn-32nm⁴ (describing the dynamics of a robot arm, used with SPGP in [1]), Elevators and Pole Telecomm⁵ (related to the control of the elevators of an F16 aircraft and to a telecommunications problem, and used in [12, 13, 14]). Input dimensions that remained constant throughout the training set were removed. Input data were additionally centred for use with FIFGP (the remaining methods are translation invariant). Pole Telecomm outputs actually take discrete values in the 0–100 range, in multiples of 10. This was taken into account by using the corresponding quantization noise variance (10²/12) as a lower bound for the noise hyperparameter⁶.

Hyperparameters are initialised as follows: σ₀² = (1/n) Σ_{j=1}^n y_j², σ² = σ₀²/4, and {ℓ_d}_{d=1}^{D} to one half of the range spanned by the training data along each dimension. For SPGP, pseudo-inputs are initialised to a random subset of the training data; for FIFGP, the window size c is initialised to the standard deviation of the input data, frequencies are randomly chosen from a zero-mean Gaussian distribution with variance ℓ_d^{-2}, and phases are obtained from a uniform distribution in [0, 2π). TFIFGP uses the same initialisation as FIFGP, with window centres set to zero. Final values are selected by evidence maximisation.

Denoting the output average over the training set by ȳ and the predictive mean and variance for test sample y_{*l} by µ_{*l} and σ_{*l}² respectively, we define the following quality measures: Normalised Mean Square Error (NMSE) ⟨(y_{*l} − µ_{*l})²⟩ / ⟨(y_{*l} − ȳ)²⟩ and Mean Negative Log-Probability (MNLP) (1/2) ⟨ (y_{*l} − µ_{*l})² / σ_{*l}² + log σ_{*l}² + log 2π ⟩, where ⟨·⟩ averages over the test set.
⁴ Kin-40k: 8 input dimensions, 10000/30000 samples for train/test. Pumadyn-32nm: 32 input dimensions, 7168/1024 samples for train/test, using exactly the same preprocessing and train/test splits as [1, 3]. Note that their error measure is actually one half of the Normalised Mean Square Error defined here.
⁵ Pole Telecomm: 26 non-constant input dimensions, 10000/5000 samples for train/test. Elevators: 17 non-constant input dimensions, 8752/7847 samples for train/test. Both have been downloaded from ltorgo/regression/datasets.html
⁶ If unconstrained, similar plots are obtained; in particular, no overfitting is observed.
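The two quality measures defined above map directly onto code; the short sketch below (not from the paper) computes them from arrays of test targets and per-test predictive means and variances.

```python
import numpy as np

def nmse(y_test, mu_pred, y_train_mean):
    """Normalised Mean Square Error: <(y - mu)^2> / <(y - y_bar)^2> over the test set."""
    return np.mean((y_test - mu_pred) ** 2) / np.mean((y_test - y_train_mean) ** 2)

def mnlp(y_test, mu_pred, var_pred):
    """Mean Negative Log-Probability of the test outputs under the Gaussian predictions."""
    return 0.5 * np.mean((y_test - mu_pred) ** 2 / var_pred
                         + np.log(var_pred) + np.log(2 * np.pi))
```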

For Kin-40k (Fig. 1, top), all three sparse methods perform similarly, though for high sparseness (the most useful case) FIFGP and TFIFGP are slightly superior. In Pumadyn-32nm (Fig. 1, bottom), only 4 out of the 32 input dimensions are relevant to the regression task, so it can be used as an ARD capabilities test. We follow [1] and use a full GP on a small subset of the training data (1024 data points) to obtain the initial length-scales. This allows better minima to be found during optimisation. Though all methods are able to properly find a good solution, FIFGP and especially TFIFGP are better in the sparser regime. Roughly the same considerations can be made about Pole Telecomm and Elevators (Fig. 2), but in these data sets the superiority of FIFGP and TFIFGP is more dramatic.

Though not shown here, we have additionally tested these models on smaller, overfitting-prone data sets, and have found no noticeable overfitting even using m > n, despite the relatively high number of parameters being adjusted. This is in line with the results and discussion of [1].

[Figure 1: Performance of the compared methods on Kin-40k and Pumadyn-32nm. Panels: (a) Kin-40k NMSE (log-log plot), (b) Kin-40k MNLP (semilog plot), (c) Pumadyn-32nm NMSE (log-log plot), (d) Pumadyn-32nm MNLP (semilog plot). The full GP baseline is trained on 10000 (Kin-40k) and 7168 (Pumadyn-32nm) data points.]

5 Conclusions and extensions

In this work we have introduced IDGPs, which are able to combine representations of a GP in different domains, and have used them to extend SPGP to handle inducing features lying in a different domain. This provides a general framework for sparse models, which are defined by a feature extraction function. Using this framework, SMGPs can be reinterpreted as fully principled models using a transformed space of local features, without any need for post-hoc variance improvements. Furthermore, it is possible to develop new sparse models of practical use, such as the proposed FIFGP and TFIFGP, which are able to outperform the state of the art on some large data sets, especially for high sparsity regimes.

[Figure 2: Performance of the compared methods on Elevators and Pole Telecomm. Panels: (a) Elevators NMSE (log-log plot), (b) Elevators MNLP (semilog plot), (c) Pole Telecomm NMSE (log-log plot), (d) Pole Telecomm MNLP (semilog plot). The full GP baseline is trained on 8752 (Elevators) and 10000 (Pole Telecomm) data points.]

Choosing a transformed space for the inducing features makes it possible to use domains where the target function can be expressed more compactly, or where the evidence (which is a function of the features) is easier to optimise. This added flexibility translates into a decoupling of the functional form of the input-domain covariance and the set of basis functions used to express the posterior mean.

IDGPs approximate full GPs optimally in the KL sense noted in Section 3.2, for a given set of inducing features. Using ML-II to select the inducing features means that models providing a good fit to data are given preference over models that might approximate the full GP more closely. This, though rarely, might lead to harmful overfitting. To more faithfully approximate the full GP and avoid overfitting altogether, our proposal can be combined with the variational approach from [15], in which the inducing features would be regarded as variational parameters. This would result in more constrained models, which would be closer to the full GP but might show reduced performance.

We have explored the case of regression with Gaussian noise, which is analytically tractable, but it is straightforward to apply the same model to other tasks such as robust regression or classification, using approximate inference (see [16]). Also, IDGPs as a general tool can be used for other purposes, such as modelling noise in the frequency domain, aggregating data from different domains, or even imposing constraints on the target function.

Acknowledgments

We would like to thank the anonymous referees for helpful comments and suggestions. This work has been partly supported by the Spanish government under grant TEC.../TEC and by the Madrid Community under grant S-505/TIC/...

References

[1] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18. MIT Press, 2006.
[2] A. J. Smola and P. Bartlett. Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[3] M. Seeger, C. K. I. Williams, and N. D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In Proceedings of the 9th International Workshop on AI Stats, 2003.
[4] V. Tresp. A Bayesian committee machine. Neural Computation, 12:2719–2741, 2000.
[5] L. Csató and M. Opper. Sparse online Gaussian processes. Neural Computation, 14(3):641–669, 2002.
[6] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[7] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, 2006.
[8] M. Alvarez and N. D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems 21, pages 57–64, 2009.
[9] E. Snelson. Flexible and efficient Gaussian process models for machine learning. PhD thesis, University of Cambridge, 2007.
[10] J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.
[11] C. Walder, K. I. Kim, and B. Schölkopf. Sparse multiscale Gaussian process regression. In 25th International Conference on Machine Learning. ACM Press, New York, 2008.
[12] G. Potgieter and A. P. Engelbrecht. Evolving model trees for mining data sets with continuous-valued classes. Expert Systems with Applications, 35:1513–1532, 2008.
[13] L. Torgo and J. Pinto da Costa. Clustered partial linear regression. In Proceedings of the 11th European Conference on Machine Learning. Springer, 2000.
[14] G. Potgieter and A. P. Engelbrecht. Pairwise classification as an ensemble technique. In Proceedings of the 13th European Conference on Machine Learning. Springer-Verlag, 2002.
[15] M. K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the 12th International Workshop on AI Stats, 2009.
[16] A. Naish-Guzman and S. Holden. The generalized FITC approximation. In Advances in Neural Information Processing Systems 20. MIT Press, 2008.
