Inter-domain Gaussian Processes for Sparse Inference using Inducing Features
Miguel Lázaro-Gredilla and Aníbal R. Figueiras-Vidal
Dep. Signal Processing & Communications, Universidad Carlos III de Madrid, SPAIN
{miguel,arfv}@tsc.uc3m.es

Abstract

We present a general inference framework for inter-domain Gaussian Processes (GPs) and focus on its usefulness to build sparse GP models. The state-of-the-art sparse GP model introduced by Snelson and Ghahramani in [1] relies on finding a small, representative pseudo data set of m elements (from the same domain as the n available data elements) which is able to explain existing data well, and then uses it to perform inference. This reduces inference and model selection computation time from O(n³) to O(m²n), where m ≪ n. Inter-domain GPs can be used to find a (possibly more compact) representative set of features lying in a different domain, at the same computational cost. Being able to specify a different domain for the representative features allows us to incorporate prior knowledge about relevant characteristics of the data and detaches the functional form of the covariance function from that of the basis functions. We will show how previously existing models fit into this framework and will use it to develop two new sparse GP models. Tests on large, representative regression data sets suggest that significant improvement can be achieved, while retaining computational efficiency.

1 Introduction and previous work

Over the past decade there has been a growing interest in the application of Gaussian Processes (GPs) to machine learning tasks. GPs are probabilistic non-parametric Bayesian models that combine a number of attractive characteristics: They achieve state-of-the-art performance on supervised learning tasks, provide probabilistic predictions, have a simple and well-founded model selection scheme, present no overfitting (since parameters are integrated out), etc.
Unfortunately, the direct application of GPs to regression problems (with which we will be concerned here) is limited due to their training time being O(n³). To overcome this limitation, several sparse approximations have been proposed [2, 3, 4, 5, 6]. In most of them, sparsity is achieved by projecting all available data onto a smaller subset of size m ≪ n (the active set), which is selected according to some specific criterion. This reduces computation time to O(m²n). However, active set selection interferes with hyperparameter learning, due to its non-smooth nature (see [1, 3]).

These proposals have been superseded by the Sparse Pseudo-inputs GP (SPGP) model, introduced in [1]. In this model, the constraint that the samples of the active set (which are called pseudo-inputs) must be selected among the training data is relaxed, allowing them to lie anywhere in the input space. This allows both pseudo-inputs and hyperparameters to be selected in a joint continuous optimisation and increases flexibility, resulting in much superior performance.

In this work we introduce Inter-Domain GPs (IDGPs) as a general tool to perform inference across domains. This makes it possible to remove the constraint that the pseudo-inputs must remain within the same domain as the input data. This added flexibility results in increased performance and allows prior knowledge to be encoded about other domains where data can be represented more compactly.
2 Review of GPs for regression

We will briefly state here the main definitions and results for regression with GPs. See [7] for a comprehensive review. Assume we are given a training set with n samples D ≡ {x_j, y_j}_{j=1}^n, where each D-dimensional input x_j is associated to a scalar output y_j. The regression task goal is, given a new input x_*, to predict the corresponding output y_* based on D.

The GP regression model assumes that the outputs can be expressed as some noiseless latent function plus independent noise, y = f(x) + ε, and then sets a zero-mean GP prior on f(x), with covariance k(x, x'), and a zero-mean Gaussian prior on ε, with variance σ² (the noise power hyperparameter). The covariance function encodes prior knowledge about the smoothness of f(x). The most common choice for it is the Automatic Relevance Determination Squared Exponential (ARD SE):

  k(x, x') = σ₀² exp( −(1/2) Σ_{d=1}^D (x_d − x'_d)² / l_d² ),    (1)

with hyperparameters σ₀² (the latent function power) and {l_d}_{d=1}^D (the length-scales, defining how rapidly the covariance decays along each dimension). It is referred to as ARD SE because, when coupled with a model selection method, non-informative input dimensions can be removed automatically by growing the corresponding length-scale. The set of hyperparameters that define the GP are θ = {σ², σ₀², {l_d}_{d=1}^D}. We will omit the dependence on θ for the sake of clarity.

If we evaluate the latent function at X = {x_j}_{j=1}^n, we obtain a set of latent variables following a joint Gaussian distribution p(f | X) = N(f | 0, K_ff), where [K_ff]_ij = k(x_i, x_j). Using this model it is possible to express the joint distribution of training and test cases and then condition on the observed outputs to obtain the predictive distribution for any test case

  p_GP(y_* | x_*, D) = N(y_* | k_f*ᵀ (K_ff + σ² I_n)⁻¹ y, σ² + k_** − k_f*ᵀ (K_ff + σ² I_n)⁻¹ k_f*),    (2)

where y = [y_1, ..., y_n]ᵀ, k_f* = [k(x_1, x_*), ..., k(x_n, x_*)]ᵀ, and k_** = k(x_*, x_*). I_n is used to denote the identity matrix of size n.
The O(n³) cost of these equations arises from the inversion of the n × n covariance matrix. Predictive distributions for additional test cases take O(n²) time each. These costs make standard GPs impractical for large data sets. To select hyperparameters θ, Type-II Maximum Likelihood (ML-II) is commonly used. This amounts to selecting the hyperparameters that correspond to a (possibly local) maximum of the log-marginal likelihood, also called the log-evidence.

3 Inter-domain GPs

In this section we will introduce Inter-Domain GPs (IDGPs) and show how they can be used as a framework for computationally efficient inference. Then we will use this framework to express two previous relevant models and develop two new ones.

3.1 Definition

Consider a real-valued GP f(x) with x ∈ R^D and some deterministic real function g(x, z), with z ∈ R^H. We define the following transformation:

  u(z) = ∫_{R^D} f(x) g(x, z) dx.    (3)

There are many examples of transformations that take on this form, the Fourier transform being one of the best known. We will discuss possible choices for g(x, z) in Section 3.3; for the moment we will deal with the general form. (Footnote: We follow the common approach of subtracting the sample mean from the outputs and then assume a zero-mean model.) Since u(z) is obtained by a linear transformation of GP f(x),
it is also a GP. This new GP may lie in a different domain of possibly different dimension. This transformation is not invertible in general, its properties being defined by g(x, z). IDGPs arise when we jointly consider f(x) and u(z) as a single, extended GP. The mean and covariance function of this extended GP are overloaded to accept arguments from both the input and transformed domains and treat them accordingly. We refer to each version of an overloaded function as an instance, which will accept a different type of arguments. If the distribution of the original GP is f(x) ~ GP(m(x), k(x, x')), then it is possible to compute the remaining instances that define the distribution of the extended GP over both domains. The transformed-domain instance of the mean is

  m(z) = E[u(z)] = ∫_{R^D} E[f(x)] g(x, z) dx = ∫_{R^D} m(x) g(x, z) dx.

The inter-domain and transformed-domain instances of the covariance function are:

  k(x, z') = E[f(x) u(z')] = E[ f(x) ∫_{R^D} f(x') g(x', z') dx' ] = ∫_{R^D} k(x, x') g(x', z') dx'    (4)

  k(z, z') = E[u(z) u(z')] = E[ ∫_{R^D} f(x) g(x, z) dx ∫_{R^D} f(x') g(x', z') dx' ]
           = ∫_{R^D} ∫_{R^D} k(x, x') g(x, z) g(x', z') dx dx'.    (5)

Mean m(·) and covariance function k(·, ·) are therefore defined both by the values and the domains of their arguments. This can be seen as if each argument had an additional domain indicator used to select the instance. Apart from that, they define a regular GP, and all standard properties hold. In particular, k(a, b) = k(b, a). This approach is related to [8], but here the latent space is defined as a transformation of the input space, and not the other way around. This allows the desired input-domain covariance to be pre-specified. The transformation is also more general: any g(x, z) can be used.

We can sample an IDGP at n input-domain points f = [f_1, f_2, ..., f_n]ᵀ (with f_j = f(x_j)) and m transformed-domain points u = [u_1, u_2, ..., u_m]ᵀ (with u_i = u(z_i)).
With the usual assumption of f(x) being a zero-mean GP and defining Z ≡ {z_i}_{i=1}^m, the joint distribution of these samples is:

  p([f; u] | X, Z) = N( [f; u] | 0, [K_ff, K_fu; K_fuᵀ, K_uu] ),    (6)

with [K_ff]_pq = k(x_p, x_q), [K_fu]_pq = k(x_p, z_q), [K_uu]_pq = k(z_p, z_q), which allows inference to be performed across domains. We will only be concerned with one input domain and one transformed domain, but IDGPs can be defined for any number of domains.

3.2 Sparse regression using inducing features

In the standard regression setting, we are asked to perform inference about the latent function f(x) from a data set D lying in the input domain. Using IDGPs, we can use data from any domain to perform inference in the input domain. Some latent functions might be better defined by a set of data lying in some transformed space rather than in the input space. This idea is used for sparse inference.

Following [1] we introduce a pseudo data set, but here we place it in the transformed domain: D' = {Z, u}. The following derivation is analogous to that of [1]. We will refer to Z as the inducing features and u as the inducing variables. The key approximation leading to sparsity is to set m ≪ n and assume that f(x) is well-described by the pseudo data set D', so that any two samples (either from the training or test set) f_p and f_q with p ≠ q will be independent given x_p, x_q and D'. With this simplifying assumption (Footnote 2), the prior over f can be factorised as a product of marginals:

  p(f | X, Z, u) ≈ ∏_{j=1}^n p(f_j | x_j, Z, u).    (7)

(Footnote 2: Alternatively, (7) can be obtained by proposing a generic factorised form for the approximate conditional p(f | X, Z, u) ≈ q(f | X, Z, u) = ∏_{j=1}^n q_j(f_j | x_j, Z, u) and then choosing the set of functions {q_j(·)}_{j=1}^n so as to minimise the Kullback-Leibler (KL) divergence from the exact joint prior, KL(p(f | X, Z, u) p(u | Z) || q(f | X, Z, u) p(u | Z)), as noted in [9].)
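The construction of Section 3.1 can be checked numerically. The sketch below (our own illustration, not from the paper) approximates the integrals (4) and (5) on a 1-D quadrature grid for an arbitrary feature extraction function — here a Gaussian window, a hypothetical choice, not one of the named models introduced later — and assembles the joint covariance of (6), which must come out (numerically) positive semi-definite because (f, u) are jointly Gaussian.

```python
import numpy as np

# Numerically realise Eqs. (4)-(6) in 1-D:
#   k(x, z)  = ∫ k(x, x') g(x', z) dx'
#   k(z, z') = ∫∫ k(x, x') g(x, z) g(x', z') dx dx'
# approximated on a dense grid with simple quadrature weights.

l = 1.0
k = lambda a, b: np.exp(-0.5 * (a - b) ** 2 / l ** 2)              # SE covariance, sigma0^2 = 1
g = lambda x, z: np.exp(-0.5 * (x - z) ** 2) / np.sqrt(2 * np.pi)  # illustrative g(x, z)

grid = np.linspace(-8, 8, 801)
w = np.gradient(grid)                   # quadrature weights (integrand ~0 at the ends)
X = np.array([-1.0, 0.0, 1.0])          # input-domain sample points
Z = np.array([-0.5, 0.5])               # transformed-domain (inducing) points

Kxx = k(X[:, None], X[None, :])
G = g(grid[:, None], Z[None, :]) * w[:, None]       # (grid, m), weights folded in
Kxu = k(X[:, None], grid[None, :]) @ G              # Eq. (4)
Kuu = G.T @ k(grid[:, None], grid[None, :]) @ G     # Eq. (5)

# Joint covariance of (f, u), Eq. (6): symmetric and positive semi-definite
K_joint = np.block([[Kxx, Kxu], [Kxu.T, Kuu]])
eigs = np.linalg.eigvalsh(K_joint)
```

The same recipe works for any g(x, z) for which the integrals converge; only the lambda changes.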
Marginals are in turn obtained from (6): p(f_j | x_j, Z, u) = N(f_j | k_jᵀ K_uu⁻¹ u, λ_j), where k_jᵀ is the j-th row of K_fu and λ_j is the j-th element of the diagonal of matrix Λ_f = diag(K_ff − K_fu K_uu⁻¹ K_uf). Operator diag(·) sets all off-diagonal elements to zero, so that Λ_f is a diagonal matrix. Since p(u | Z) is readily available and also Gaussian, the inducing variables can be integrated out from (7), yielding a new, approximate prior over f(x):

  p(f | X, Z) = ∫ p(f, u | X, Z) du ≈ ∫ ∏_{j=1}^n p(f_j | x_j, Z, u) p(u | Z) du = N(f | 0, K_fu K_uu⁻¹ K_uf + Λ_f).

Using this approximate prior, the posterior distribution for a test case is:

  p_IDGP(y_* | x_*, D, Z) = N(y_* | k_u*ᵀ Q⁻¹ K_fuᵀ Λ_y⁻¹ y, σ² + k_** + k_u*ᵀ (Q⁻¹ − K_uu⁻¹) k_u*),    (8)

where we have defined Q = K_uu + K_fuᵀ Λ_y⁻¹ K_fu and Λ_y = Λ_f + σ² I_n. The distribution (2) is approximated by (8) with the information available in the pseudo data set. After O(m²n) time precomputations, predictive means and variances can be computed in O(m) and O(m²) time per test case, respectively. This model is, in general, non-stationary, even when it is approximating a stationary input-domain covariance, and can be interpreted as a degenerate GP plus heteroscedastic white noise.

The log-marginal likelihood (or log-evidence) of the model, explicitly including the conditioning on kernel hyperparameters θ, can be expressed as

  log p(y | X, Z, θ) = −(1/2) [ yᵀ Λ_y⁻¹ y − yᵀ Λ_y⁻¹ K_fu Q⁻¹ K_fuᵀ Λ_y⁻¹ y + log(|Q| |Λ_y| / |K_uu|) + n log(2π) ],

which is also computable in O(m²n) time. Model selection will be performed by jointly optimising the evidence with respect to the hyperparameters and the inducing features. If analytical derivatives of the covariance function are available, conjugate gradient optimisation can be used with O(m²n) cost per step.

3.3 On the choice of g(x, z)

The feature extraction function g(x, z) defines the transformed domain in which the pseudo data set lies. According to (3), the inducing variables can be seen as projections of the target function f(x) on the feature extraction function over the whole input space.
Therefore, each of them summarises information about the behaviour of f(x) everywhere. The inducing features Z define the concrete set of functions over which the target function will be projected. It is desirable that this set captures the most significant characteristics of the function. This can be achieved either by using prior knowledge about the data to select {g(x, z_i)}_{i=1}^m, or by using a very general family of functions and letting model selection automatically choose the appropriate set.

Another way to choose g(x, z) relies on the form of the posterior. The posterior mean of a GP is often thought of as a linear combination of basis functions. For full GPs and other approximations such as [1, 2, 3, 4, 5, 6], basis functions must have the form of the input-domain covariance function. When using IDGPs, basis functions have the form of the inter-domain instance of the covariance function, and can therefore be adjusted by choosing g(x, z), independently of the input-domain covariance function. If two feature extraction functions g(·, ·) and h(·, ·) can be related by g(x, z) = h(x, z) r(z) for any function r(·), then both yield the same sparse GP model. This property can be used to simplify the expressions of the instances of the covariance function.

In this work we use the same functional form for every feature, i.e. our function set is {g(x, z_i)}_{i=1}^m, but it is also possible to use sets with different functional forms for each inducing feature, i.e. {g_i(x, z_i)}_{i=1}^m, where each z_i may even have a different size (dimension). In the sections below we will discuss different possible choices for g(x, z).

3.3.1 Relation with Sparse GPs using pseudo-inputs

The sparse GP using pseudo-inputs (SPGP) was introduced in [1] and was later renamed to the Fully Independent Training Conditional (FITC) model to fit in the systematic framework of [10]. Since
the sparse model introduced in Section 3.2 also uses a fully independent training conditional, we will stick to the first name to avoid possible confusion. The IDGP innovation with respect to SPGP consists in letting the pseudo data set lie in a different domain. If we set g_SPGP(x, z) ≡ δ(x − z), where δ(·) is a Dirac delta, we force the pseudo data set to lie in the input domain. Thus there is no longer a transformed space and the original SPGP model is retrieved. In this setting, the inducing features of IDGP play the role of SPGP's pseudo-inputs.

3.3.2 Relation with Sparse Multiscale GPs

Sparse Multiscale GPs (SMGPs) are presented in [11]. Seeking to generalise the SPGP model with ARD SE covariance function, they propose to use a different set of length-scales for each basis function. The resulting model presents a defective variance that is healed by adding heteroscedastic white noise. SMGPs, including the variance improvement, can be derived in a principled way as IDGPs:

  g_SMGP(x, z) ∝ ∏_{d=1}^D exp( −(x_d − μ_d)² / (2(c_d² − l_d²)) ) / √(2π(c_d² − l_d²)),  with z = [μᵀ, cᵀ]ᵀ    (9)

  k_SMGP(x, z') = exp( −Σ_{d=1}^D (x_d − μ'_d)² / (2c'_d²) ) ∏_{d=1}^D √( l_d² / c'_d² )    (10)

  k_SMGP(z, z') = exp( −Σ_{d=1}^D (μ_d − μ'_d)² / (2(c_d² + c'_d² − l_d²)) ) ∏_{d=1}^D √( l_d² / (c_d² + c'_d² − l_d²) ).    (11)

With this approximation, each basis function has its own centre μ = [μ_1, μ_2, ..., μ_D]ᵀ and its own length-scales c = [c_1, c_2, ..., c_D]ᵀ, whereas the global length-scales {l_d}_{d=1}^D are shared by all inducing features. Equations (10) and (11) are derived from (4) and (5) using (1) and (9). The integrals defining k_SMGP(·, ·) converge if and only if c_d² ≥ l_d², ∀d, which suggests that other values, even if permitted in [11], should be avoided for the model to remain well defined.

3.3.3 Frequency Inducing Features GP

If the target function can be described more compactly in the frequency domain than in the input domain, it can be advantageous to let the pseudo data set lie in the former domain. We will pursue that possibility for the case where the input-domain covariance is the ARD SE. We will call the resulting sparse model the Frequency Inducing Features GP (FIFGP).
Directly applying the Fourier transform is not possible because the target function is not square integrable (it has constant power σ₀² everywhere, so (5) does not converge). We will work around this by windowing the target function in the region of interest. It is possible to use a square window, but this results in the covariance being defined in terms of the complex error function, which is very slow to evaluate. Instead, we will use a Gaussian window (Footnote 3). Since multiplying by a Gaussian in the input domain is equivalent to convolving with a Gaussian in the frequency domain, we will be working with a blurred version of the frequency space. This model is defined by:

  g_FIF(x, z) ∝ [ ∏_{d=1}^D 1/√(2πc_d²) ] exp( −Σ_{d=1}^D x_d²/(2c_d²) ) cos( ω_0 + Σ_{d=1}^D ω_d x_d ),  with z = ω    (12)

  k_FIF(x, z') = exp( −Σ_{d=1}^D (x_d² + c_d² l_d² ω'_d²)/(2(c_d² + l_d²)) ) cos( ω'_0 + Σ_{d=1}^D c_d² ω'_d x_d/(c_d² + l_d²) ) ∏_{d=1}^D √( l_d²/(c_d² + l_d²) )    (13)

  k_FIF(z, z') = (1/2) exp( −Σ_{d=1}^D c_d² l_d² (ω_d² + ω'_d²)/(2(2c_d² + l_d²)) )
    × [ exp( −Σ_{d=1}^D c_d⁴ (ω_d − ω'_d)²/(2(2c_d² + l_d²)) ) cos(ω_0 − ω'_0)
      + exp( −Σ_{d=1}^D c_d⁴ (ω_d + ω'_d)²/(2(2c_d² + l_d²)) ) cos(ω_0 + ω'_0) ] ∏_{d=1}^D √( l_d²/(2c_d² + l_d²) ).    (14)

(Footnote 3: A mixture of m Gaussians could also be used as window without increasing the complexity order.)
The inducing features are ω = [ω_0, ω_1, ..., ω_D]ᵀ, where ω_0 is the phase and the remaining components are frequencies along each dimension. In this model, both the global length-scales {l_d}_{d=1}^D and the window length-scales {c_d}_{d=1}^D are shared, thus c'_d = c_d. Instances (13) and (14) are induced by (12) using (4) and (5).

3.3.4 Time-Frequency Inducing Features GP

Instead of using a single window to select the region of interest, it is possible to use a different window for each feature. We will use windows of the same size but different centres. The resulting model combines SPGP and FIFGP, so we will call it the Time-Frequency Inducing Features GP (TFIFGP). It is defined by g_TFIF(x, z) ≡ g_FIF(x − μ, ω), with z = [μᵀ, ωᵀ]ᵀ. The implied inter-domain and transformed-domain instances of the covariance function are:

  k_TFIF(x, z') = k_FIF(x − μ', ω'),   k_TFIF(z, z') = k_FIF(z, z') exp( −Σ_{d=1}^D (μ_d − μ'_d)²/(2(2c_d² + l_d²)) ).

FIFGP is trivially obtained by setting every centre to zero ({μ_i = 0}_{i=1}^m), whereas SPGP is obtained by setting the window length-scales c, frequencies and phases {ω_i}_{i=1}^m to zero. If the window length-scales were individually adjusted, SMGP would be obtained. While TFIFGP has the modelling power of both FIFGP and SPGP, it might perform worse in practice due to it having roughly twice as many hyperparameters, thus making the optimisation problem harder. The same problem also exists in SMGP. A possible workaround is to initialise the hyperparameters using a simpler model, as done in [11] for SMGP, though we will not do this here.

4 Experiments

In this section we will compare the proposed approximations FIFGP and TFIFGP with the current state of the art, SPGP, on some large data sets, for the same number of inducing features/inputs and therefore roughly equal computational cost. Additionally, we provide results using a full GP, which is expected to provide top performance (though requiring an impractically big amount of computation). In all cases, the (input-domain) covariance function is the ARD SE (1).
We use four large data sets: Kin-40k, Pumadyn-32nm (Footnote 4) (describing the dynamics of a robot arm, used with SPGP in [1]), Elevators and Pole Telecomm (Footnote 5) (related to the control of the elevators of an F16 aircraft and a telecommunications problem, and used in [12, 13, 14]). Input dimensions that remained constant throughout the training set were removed. Input data was additionally centred for use with FIFGP (the remaining methods are translation invariant). Pole Telecomm outputs actually take discrete values in the 0-100 range, in multiples of 10. This was taken into account by using the corresponding quantization noise variance (10²/12) as a lower bound for the noise hyperparameter (Footnote 6).

Hyperparameters are initialised as follows: σ₀² = (1/n) Σ_{j=1}^n y_j², σ² = σ₀²/4, and {l_d}_{d=1}^D to one half of the range spanned by the training data along each dimension. For SPGP, pseudo-inputs are initialised to a random subset of the training data; for FIFGP, the window size c is initialised to the standard deviation of the input data, frequencies are randomly chosen from a zero-mean Gaussian distribution of variance l_d⁻², and phases are obtained from a uniform distribution in [0, 2π). TFIFGP uses the same initialisation as FIFGP, with window centres set to zero. Final values are selected by evidence maximisation.

Denoting the output average over the training set as ȳ and the predictive mean and variance for test sample y_l* as μ_l* and σ_l*² respectively, we define the following quality measures: Normalized Mean Square Error (NMSE) ⟨(y_l* − μ_l*)²⟩ / ⟨(y_l* − ȳ)²⟩ and Mean Negative Log-Probability (MNLP) (1/2)⟨(y_l* − μ_l*)²/σ_l*² + log σ_l*² + log 2π⟩, where ⟨·⟩ averages over the test set.

(Footnote 4: Kin-40k: 8 input dimensions, 10000/30000 samples for train/test. Pumadyn-32nm: 32 input dimensions, 7168/1024 samples for train/test, using exactly the same preprocessing and train/test splits as [1, 3]. Note that their error measure is actually one half of the Normalized Mean Square Error defined here.)
(Footnote 5: Pole Telecomm: 26 non-constant input dimensions, 10000/5000 samples for train/test. Elevators: 17 non-constant input dimensions, 8752/7847 samples for train/test. Both have been downloaded from ltorgo/regression/datasets.html)

(Footnote 6: If unconstrained, similar plots are obtained; in particular, no overfitting is observed.)
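The two quality measures defined above translate directly into code. This small sketch (ours, with hypothetical toy numbers) computes NMSE and MNLP from test targets, predictive means and predictive variances:

```python
import numpy as np

def nmse(y, mu, ybar):
    # Normalized Mean Square Error: <(y - mu)^2> / <(y - ybar)^2>,
    # with ybar the training-output average
    return np.mean((y - mu) ** 2) / np.mean((y - ybar) ** 2)

def mnlp(y, mu, s2):
    # Mean Negative Log-Probability of the Gaussian predictive distribution:
    # (1/2) <(y - mu)^2 / s2 + log s2 + log 2*pi>
    return 0.5 * np.mean((y - mu) ** 2 / s2 + np.log(s2) + np.log(2 * np.pi))

# Toy values for illustration
y = np.array([0.0, 1.0, 2.0])
mu = np.array([0.1, 0.9, 2.2])
s2 = np.array([0.25, 0.25, 0.25])
print(round(nmse(y, mu, ybar=1.0), 4))  # → 0.03
```

NMSE is scale-free (0 for perfect predictions, 1 for predicting the training mean), while MNLP additionally penalises over- and under-confident predictive variances.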
For Kin-40k (Fig. 1, top), all three sparse methods perform similarly, though for high sparseness (the most useful case) FIFGP and TFIFGP are slightly superior. In Pumadyn-32nm (Fig. 1, bottom), only 4 out of the 32 input dimensions are relevant to the regression task, so it can be used as an ARD capabilities test. We follow [1] and use a full GP on a small subset of the training data (1024 data points) to obtain the initial length-scales. This allows better minima to be found during optimisation. Though all methods are able to properly find a good solution, FIFGP and especially TFIFGP are better in the sparser regime. Roughly the same considerations can be made about Pole Telecomm and Elevators (Fig. 2), but in these data sets the superiority of FIFGP and TFIFGP is more dramatic.

Though not shown here, we have additionally tested these models on smaller, overfitting-prone data sets, and have found no noticeable overfitting even using m > n, despite the relatively high number of parameters being adjusted. This is in line with the results and discussion of [1].

Figure 1: Performance of the compared methods on Kin-40k and Pumadyn-32nm. (a) Kin-40k NMSE (log-log plot); (b) Kin-40k MNLP (semilog plot); (c) Pumadyn-32nm NMSE (log-log plot); (d) Pumadyn-32nm MNLP (semilog plot). Full GP results use 10000 (Kin-40k) and 7168 (Pumadyn-32nm) data points.

5 Conclusions and extensions

In this work we have introduced IDGPs, which are able to combine representations of a GP in different domains, and have used them to extend SPGP to handle inducing features lying in a different domain. This provides a general framework for sparse models, which are defined by a feature extraction function. Using this framework, SMGPs can be reinterpreted as fully principled models using a transformed space of local features, without any need for post-hoc variance improvements.
Furthermore, it is possible to develop new sparse models of practical use, such as the proposed FIFGP and TFIFGP, which are able to outperform the state-of-the-art SPGP on some large data sets, especially in high-sparsity regimes.
Figure 2: Performance of the compared methods on Elevators and Pole Telecomm. (a) Elevators NMSE (log-log plot); (b) Elevators MNLP (semilog plot); (c) Pole Telecomm NMSE (log-log plot); (d) Pole Telecomm MNLP (semilog plot). Full GP results use 8752 (Elevators) and 10000 (Pole Telecomm) data points.

Choosing a transformed space for the inducing features makes it possible to use domains where the target function can be expressed more compactly, or where the evidence (which is a function of the features) is easier to optimise. This added flexibility translates into a detaching of the functional form of the input-domain covariance from the set of basis functions used to express the posterior mean.

IDGPs approximate full GPs optimally in the KL sense noted in Section 3.2, for a given set of inducing features. Using ML-II to select the inducing features means that models providing a good fit to the data are given preference over models that might approximate the full GP more closely. This, though rarely, might lead to harmful overfitting. To more faithfully approximate the full GP and avoid overfitting altogether, our proposal can be combined with the variational approach from [15], in which the inducing features would be regarded as variational parameters. This would result in more constrained models, which would be closer to the full GP but might show reduced performance.

We have explored the case of regression with Gaussian noise, which is analytically tractable, but it is straightforward to apply the same model to other tasks such as robust regression or classification, using approximate inference (see [16]). Also, IDGPs as a general tool can be used for other purposes, such as modelling noise in the frequency domain, aggregating data from different domains, or even imposing constraints on the target function.
Acknowledgments

We would like to thank the anonymous referees for helpful comments and suggestions. This work has been partly supported by the Spanish government under grant TEC /TEC, and by the Madrid Community under grant S-505/TIC/
References

[1] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18. MIT Press.
[2] A. J. Smola and P. Bartlett. Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13. MIT Press.
[3] M. Seeger, C. K. I. Williams, and N. D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In Proceedings of the 9th International Workshop on AI Stats.
[4] V. Tresp. A Bayesian committee machine. Neural Computation, 12.
[5] L. Csató and M. Opper. Sparse online Gaussian processes. Neural Computation, 14(3):641-669.
[6] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13. MIT Press.
[7] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press.
[8] M. Alvarez and N. D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems 21, pages 57-64.
[9] E. Snelson. Flexible and efficient Gaussian process models for machine learning. PhD thesis, University of Cambridge.
[10] J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6.
[11] C. Walder, K. I. Kim, and B. Schölkopf. Sparse multiscale Gaussian process regression. In 25th International Conference on Machine Learning. ACM Press, New York.
[12] G. Potgieter and A. P. Engelbrecht. Evolving model trees for mining data sets with continuous-valued classes. Expert Systems with Applications, 35:1513-1532.
[13] L. Torgo and J. Pinto da Costa. Clustered partial linear regression. In Proceedings of the 11th European Conference on Machine Learning. Springer.
[14] G. Potgieter and A. P. Engelbrecht. Pairwise classification as an ensemble technique. In Proceedings of the 13th European Conference on Machine Learning. Springer-Verlag.
[15] M. K. Titsias.
Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the 12th International Workshop on AI Stats.
[16] A. Naish-Guzman and S. Holden. The generalized FITC approximation. In Advances in Neural Information Processing Systems 20. MIT Press.
More informationSurvey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013
Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing
More informationTopic 7: Convergence of Random Variables
Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information
More informationThe derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)
Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)
More informationWEIGHTING A RESAMPLED PARTICLE IN SEQUENTIAL MONTE CARLO. L. Martino, V. Elvira, F. Louzada
WEIGHTIG A RESAMPLED PARTICLE I SEQUETIAL MOTE CARLO L. Martino, V. Elvira, F. Louzaa Dep. of Signal Theory an Communic., Universia Carlos III e Mari, Leganés (Spain). Institute of Mathematical Sciences
More informationLeaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes
Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,
More informationConservation Laws. Chapter Conservation of Energy
20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action
More informationVariational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression Michalis K. Titsias Department of Informatics Athens University of Economics and Business mtitsias@aueb.gr Miguel Lázaro-Gredilla
More informationMultiple-step Time Series Forecasting with Sparse Gaussian Processes
Multiple-step Time Series Forecasting with Sparse Gaussian Processes Perry Groot ab Peter Lucas a Paul van den Bosch b a Radboud University, Model-Based Systems Development, Heyendaalseweg 135, 6525 AJ
More informationIntroduction to Machine Learning
How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression
More information7.1 Support Vector Machine
67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to
More informationSchrödinger s equation.
Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of
More informationThe total derivative. Chapter Lagrangian and Eulerian approaches
Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function
More informationInverse Theory Course: LTU Kiruna. Day 1
Inverse Theory Course: LTU Kiruna. Day Hugh Pumphrey March 6, 0 Preamble These are the notes for the course Inverse Theory to be taught at LuleåTekniska Universitet, Kiruna in February 00. They are not
More informationThe Principle of Least Action
Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of
More informationSTATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING
STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING Mark A. Kon Department of Mathematics an Statistics Boston University Boston, MA 02215 email: mkon@bu.eu Anrzej Przybyszewski
More informationNon-Linear Bayesian CBRN Source Term Estimation
Non-Linear Bayesian CBRN Source Term Estimation Peter Robins Hazar Assessment, Simulation an Preiction Group Dstl Porton Down, UK. probins@stl.gov.uk Paul Thomas Hazar Assessment, Simulation an Preiction
More information'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21
Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting
More informationCascaded redundancy reduction
Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More informationQuantum mechanical approaches to the virial
Quantum mechanical approaches to the virial S.LeBohec Department of Physics an Astronomy, University of Utah, Salt Lae City, UT 84112, USA Date: June 30 th 2015 In this note, we approach the virial from
More informationSparse Spectral Sampling Gaussian Processes
Sparse Spectral Sampling Gaussian Processes Miguel Lázaro-Gredilla Department of Signal Processing & Communications Universidad Carlos III de Madrid, Spain miguel@tsc.uc3m.es Joaquin Quiñonero-Candela
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationState observers and recursive filters in classical feedback control theory
State observers an recursive filters in classical feeback control theory State-feeback control example: secon-orer system Consier the riven secon-orer system q q q u x q x q x x x x Here u coul represent
More informationTutorial on Maximum Likelyhood Estimation: Parametric Density Estimation
Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing
More informationBalancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling
Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January
More informationEVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION OF UNIVARIATE TAYLOR SERIES
MATHEMATICS OF COMPUTATION Volume 69, Number 231, Pages 1117 1130 S 0025-5718(00)01120-0 Article electronically publishe on February 17, 2000 EVALUATING HIGHER DERIVATIVE TENSORS BY FORWARD PROPAGATION
More informationLinear Regression with Limited Observation
Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear
More informationKNN Particle Filters for Dynamic Hybrid Bayesian Networks
KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030
More informationSwitched Latent Force Models for Movement Segmentation
Switche Latent Force Moels for Movement Segmentation Mauricio A. Álvarez, Jan Peters, Bernhar Schölkopf, Neil D. Lawrence 3,4 School of Computer Science, University of Manchester, Manchester, UK M3 9PL
More informationEstimating Causal Direction and Confounding Of Two Discrete Variables
Estimating Causal Direction an Confouning Of Two Discrete Variables This inspire further work on the so calle aitive noise moels. Hoyer et al. (2009) extene Shimizu s ientifiaarxiv:1611.01504v1 [stat.ml]
More informationθ x = f ( x,t) could be written as
9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)
More informationNecessary and Sufficient Conditions for Sketched Subspace Clustering
Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This
More informationCalculus of Variations
16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t
More informationImproving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers
International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize
More informationBayesian Estimation of the Entropy of the Multivariate Gaussian
Bayesian Estimation of the Entropy of the Multivariate Gaussian Santosh Srivastava Fre Hutchinson Cancer Research Center Seattle, WA 989, USA Email: ssrivast@fhcrc.org Maya R. Gupta Department of Electrical
More informationEstimation of the Maximum Domination Value in Multi-Dimensional Data Sets
Proceeings of the 4th East-European Conference on Avances in Databases an Information Systems ADBIS) 200 Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets Eleftherios Tiakas, Apostolos.
More informationMonte Carlo Methods with Reduced Error
Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for
More informationRobust Bounds for Classification via Selective Sampling
Nicolò Cesa-Bianchi DSI, Università egli Stui i Milano, Italy Clauio Gentile DICOM, Università ell Insubria, Varese, Italy Francesco Orabona Iiap, Martigny, Switzerlan cesa-bianchi@siunimiit clauiogentile@uninsubriait
More informationEntanglement is not very useful for estimating multiple phases
PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The
More informationHyperbolic Moment Equations Using Quadrature-Based Projection Methods
Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the
More informationA. Exclusive KL View of the MLE
A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function
More informationWiener Deconvolution: Theoretical Basis
Wiener Deconvolution: Theoretical Basis The Wiener Deconvolution is a technique use to obtain the phase-velocity ispersion curve an the attenuation coefficients, by a two-stations metho, from two pre-processe
More informationOptimization of Geometries by Energy Minimization
Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.
More informationHomework 2 EM, Mixture Models, PCA, Dualitys
Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The
More informationensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y
Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay
More informationLagrangian and Hamiltonian Mechanics
Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical
More informationLocal and global sparse Gaussian process approximations
Local and global sparse Gaussian process approximations Edward Snelson Gatsby Computational euroscience Unit University College London, UK snelson@gatsby.ucl.ac.uk Zoubin Ghahramani Department of Engineering
More informationPredictive Control of a Laboratory Time Delay Process Experiment
Print ISSN:3 6; Online ISSN: 367-5357 DOI:0478/itc-03-0005 Preictive Control of a aboratory ime Delay Process Experiment S Enev Key Wors: Moel preictive control; time elay process; experimental results
More informationDiagonalization of Matrices Dr. E. Jacobs
Diagonalization of Matrices Dr. E. Jacobs One of the very interesting lessons in this course is how certain algebraic techniques can be use to solve ifferential equations. The purpose of these notes is
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More information. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.
S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial
More informationSome vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10
Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the
More informationJointly continuous distributions and the multivariate Normal
Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability
More informationIntroduction to variational calculus: Lecture notes 1
October 10, 2006 Introuction to variational calculus: Lecture notes 1 Ewin Langmann Mathematical Physics, KTH Physics, AlbaNova, SE-106 91 Stockholm, Sween Abstract I give an informal summary of variational
More informationRelative Entropy and Score Function: New Information Estimation Relationships through Arbitrary Additive Perturbation
Relative Entropy an Score Function: New Information Estimation Relationships through Arbitrary Aitive Perturbation Dongning Guo Department of Electrical Engineering & Computer Science Northwestern University
More informationIntroduction to the Vlasov-Poisson system
Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its
More informationStable and compact finite difference schemes
Center for Turbulence Research Annual Research Briefs 2006 2 Stable an compact finite ifference schemes By K. Mattsson, M. Svär AND M. Shoeybi. Motivation an objectives Compact secon erivatives have long
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationA New Minimum Description Length
A New Minimum Description Length Soosan Beheshti, Munther A. Dahleh Laboratory for Information an Decision Systems Massachusetts Institute of Technology soosan@mit.eu,ahleh@lis.mit.eu Abstract The minimum
More informationA simple model for the small-strain behaviour of soils
A simple moel for the small-strain behaviour of soils José Jorge Naer Department of Structural an Geotechnical ngineering, Polytechnic School, University of São Paulo 05508-900, São Paulo, Brazil, e-mail:
More informationThe Exact Form and General Integrating Factors
7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily
More informationLecture 2 Lagrangian formulation of classical mechanics Mechanics
Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,
More informationThis module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics
This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.
More informationProof of SPNs as Mixture of Trees
A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationA Modification of the Jarque-Bera Test. for Normality
Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam
More informationTEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS. Yannick DEVILLE
TEMPORAL AND TIME-FREQUENCY CORRELATION-BASED BLIND SOURCE SEPARATION METHODS Yannick DEVILLE Université Paul Sabatier Laboratoire Acoustique, Métrologie, Instrumentation Bât. 3RB2, 8 Route e Narbonne,
More informationThermal conductivity of graded composites: Numerical simulations and an effective medium approximation
JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University
More informationModelling and simulation of dependence structures in nonlife insurance with Bernstein copulas
Moelling an simulation of epenence structures in nonlife insurance with Bernstein copulas Prof. Dr. Dietmar Pfeifer Dept. of Mathematics, University of Olenburg an AON Benfiel, Hamburg Dr. Doreen Straßburger
More informationMulti-View Clustering via Canonical Correlation Analysis
Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in
More informationASYMMETRIC TWO-OUTPUT QUANTUM PROCESSOR IN ANY DIMENSION
ASYMMETRIC TWO-OUTPUT QUANTUM PROCESSOR IN ANY IMENSION IULIA GHIU,, GUNNAR BJÖRK Centre for Avance Quantum Physics, University of Bucharest, P.O. Box MG-, R-0775, Bucharest Mgurele, Romania School of
More informationMath 1B, lecture 8: Integration by parts
Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores
More informationSINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
More informationTopic Modeling: Beyond Bag-of-Words
Hanna M. Wallach Cavenish Laboratory, University of Cambrige, Cambrige CB3 0HE, UK hmw26@cam.ac.u Abstract Some moels of textual corpora employ text generation methos involving n-gram statistics, while
More informationThe canonical controllers and regular interconnection
Systems & Control Letters ( www.elsevier.com/locate/sysconle The canonical controllers an regular interconnection A.A. Julius a,, J.C. Willems b, M.N. Belur c, H.L. Trentelman a Department of Applie Mathematics,
More informationd dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1
Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of
More informationEquilibrium in Queues Under Unknown Service Times and Service Value
University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University
More information