Appendix: Bayesian Information Criterion (BIC), Compositional Kernel Search Algorithm, Base Kernels, Matrix Identities, Proof of Proposition 1


Hyunjik Kim and Yee Whye Teh

A  Bayesian Information Criterion (BIC)

The BIC is a model selection criterion that is the marginal likelihood with a model complexity penalty:

    BIC = log p(y | θ̂) − (1/2) p log(N)

for observations y, number of observations N, maximum likelihood estimate (MLE) of the model hyperparameters θ̂, and number of hyperparameters p. It is derived as an approximation to the log model evidence log p(y).

B  Compositional Kernel Search Algorithm

Algorithm 2: Compositional Kernel Search Algorithm
Input: data x_1, ..., x_n ∈ R^D, y_1, ..., y_n ∈ R, base kernel set B
Output: k, the resulting kernel
For each base kernel on each dimension, fit a GP to the data (i.e. optimise the hyperparameters by ML-II) and set k to be the kernel with the highest BIC.
for depth = 1:T (either fix T or repeat until the BIC no longer increases) do
    Fit a GP to the following kernels and set k to be the one with the highest BIC:
    (1) All kernels of the form k + B′, where B′ is any base kernel on any dimension
    (2) All kernels of the form k × B′, where B′ is any base kernel on any dimension
    (3) All kernels where a base kernel in k is replaced by another base kernel

C  Base Kernels

    LIN(x, x′) = σ² (x − l)(x′ − l)
    SE(x, x′) = σ² exp( −(x − x′)² / (2l²) )
    PER(x, x′) = σ² exp( −2 sin²(π(x − x′)/p) / l² )

D  Matrix Identities

Lemma 1 (Woodbury's Matrix Inversion Lemma).

    (A + UBV)⁻¹ = A⁻¹ − A⁻¹U(B⁻¹ + VA⁻¹U)⁻¹VA⁻¹

So setting A = σ²I (Nyström) or σ²I + diag(K − K̂) (FIC) or σ²I + blockdiag(K − K̂) (PIC), U = Φᵀ = Vᵀ, B = I, we get:

    (A + ΦᵀΦ)⁻¹ = A⁻¹ − A⁻¹Φᵀ(I + ΦA⁻¹Φᵀ)⁻¹ΦA⁻¹

Lemma 2 (Sylvester's Determinant Theorem).

    det(I + AB) = det(I + BA)    for A ∈ R^{n×m}, B ∈ R^{m×n}

Hence:

    det(σ²I + ΦᵀΦ) = (σ²)ⁿ det(I + σ⁻²ΦᵀΦ) = (σ²)ⁿ det(I + σ⁻²ΦΦᵀ) = (σ²)^{n−m} det(σ²I + ΦΦᵀ)

E  Proof of Proposition 1

Proof. If PCG converges, the upper bound for NIP is exact. We showed in Section 4.1 that this convergence happens in only a few iterations. Moreover, Cortes et al. [7] show that the lower bound for NIP can be rather loose in general. So it suffices to prove that the upper bound for NLD is tighter than the lower bound for NLD. Let (λ_i)_{i=1}^N and (λ̂_i)_{i=1}^N be the ordered eigenvalues of K + σ²I and K̂ + σ²I respectively. Since K − K̂ is positive semi-definite (e.g. [3]), we have λ_i ≥ λ̂_i ≥ 2σ² for all i (using the assumption in the proposition). Now the slack in the upper bound is:

    −(1/2) log det(K̂ + σ²I) − ( −(1/2) log det(K + σ²I) ) = (1/2) Σ_{i=1}^N (log λ_i − log λ̂_i)

Hence the slack in the lower bound is:

    −(1/2) log det(K + σ²I) − [ −(1/2) log det(K̂ + σ²I) − (1/(2σ²)) Tr(K − K̂) ]
        = −(1/2) Σ_{i=1}^N (log λ_i − log λ̂_i) + (1/(2σ²)) Σ_{i=1}^N (λ_i − λ̂_i)

Now by concavity and monotonicity of log, and since λ̂_i ≥ 2σ², we have:

    log λ_i − log λ̂_i ≤ (λ_i − λ̂_i)/λ̂_i ≤ (1/(2σ²))(λ_i − λ̂_i)

hence

    (1/2)(log λ_i − log λ̂_i) ≤ (1/(2σ²))(λ_i − λ̂_i) − (1/2)(log λ_i − log λ̂_i).

Summing over i shows that the slack in the upper bound is at most the slack in the lower bound, which completes the proof.
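As a concrete illustration of the base kernels in Appendix C and the BIC of Appendix A, the following NumPy sketch (ours, not code from the paper; function and argument names are illustrative) evaluates LIN, SE and PER on one-dimensional inputs and computes the BIC from a fitted log marginal likelihood.

    import numpy as np

    def lin(x, xp, sigma2=1.0, l=0.0):
        # LIN(x, x') = sigma^2 (x - l)(x' - l)
        return sigma2 * np.outer(x - l, xp - l)

    def se(x, xp, sigma2=1.0, l=1.0):
        # SE(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2))
        d = x[:, None] - xp[None, :]
        return sigma2 * np.exp(-d**2 / (2.0 * l**2))

    def per(x, xp, sigma2=1.0, l=1.0, p=1.0):
        # PER(x, x') = sigma^2 exp(-2 sin^2(pi (x - x') / p) / l^2)
        d = x[:, None] - xp[None, :]
        return sigma2 * np.exp(-2.0 * np.sin(np.pi * d / p)**2 / l**2)

    def bic(log_lik_at_optimum, num_hypers, num_obs):
        # BIC = log p(y | theta_hat) - (1/2) p log N (higher is better, as in Appendix A)
        return log_lik_at_optimum - 0.5 * num_hypers * np.log(num_obs)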

F  Convergence of hyperparameters from optimising the lower bound to the optimal hyperparameters

Note from Figure 7 that the hyperparameters found by optimising the lower bound converge to the hyperparameters found by the GP when optimising the marginal likelihood, giving empirical evidence for the second claim in Section 3.3.

G  Parallelising SKC

Note that SKC can be parallelised across the random hyperparameter initialisations, and also across the kernels at each depth when computing the BIC intervals. In fact, SKC is even more parallelisable with the kernel buffer: say at a certain depth we have two kernels remaining to be optimised and evaluated before we can move on to the next depth. If the buffer size is 5, say, then we can in fact move on to the next depth and grow the kernel search tree on the top 3 kernels of the buffer, without having to wait for the 2 remaining kernel evaluations to complete. This saves a lot of computation time that would otherwise be wasted by idle cores waiting for all kernel evaluations to finish before moving on to the next depth of the kernel search tree.

H  Optimisation

Since we wish to use the learned kernels for interpretation, it is important that the hyperparameters lie in a sensible region after optimisation. In other words, we wish to regularise the hyperparameters during optimisation. For example, we want the SE kernel to learn a globally smooth function with local variation. When naively optimising the lower bound, the length scale and the signal variance sometimes become very small, so that the SE kernel explains all the variation in the signal and ends up connecting the dots. We wish to avoid this type of behaviour. This can be achieved by placing priors on the hyperparameters and optimising the energy (the log prior added to the log marginal likelihood) instead, as well as by using sensible initialisations. Looking at Figure 8, we see that using a strong prior with a sensible random initialisation (see Appendix I for details) gives a sensible, smoothly varying function, whereas in the three other cases the length scale and signal variance shrink to small values, causing the GP to overfit to the data. Note that the weak prior is the default prior used in the GPstuff software [39].

Careful initialisation of the hyperparameters and inducing points is also very important, and can have a strong influence on the resulting optima. It is sensible to have the optimised hyperparameters of the parent kernel in the search tree be inherited and used to initialise the hyperparameters of the child. The new hyperparameters of the child must be initialised with random restarts, where the variance is small enough to ensure that they lie in a sensible region, but large enough to explore a good portion of this region. As for the inducing points, we want to spread them out to capture both local and global structure. Trying both K-means and a random subset of the training data, we find that they give similar results, and so we resort to a random subset. We also have the option of learning the inducing points. However, this is considerably more costly and shows little improvement over fixing them, as we show in Section 4. Hence we do not learn the inducing points, but fix them to a given randomly chosen set. Consequently, for SKC we use maximum a posteriori (MAP) estimates instead of the MLE of the hyperparameters to calculate the BIC, since the priors have a noticeable effect on the optimisation. This is justified [23] and has been used for example in [11], where they argue that using the MLE to estimate the BIC for Gaussian mixture models can fail due to singularities and degeneracies.
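To make the energy optimisation of Appendix H concrete, here is a minimal sketch (our own illustration, not the paper's implementation, and not using GPstuff) that fits the hyperparameters of a one-dimensional SE-plus-noise GP by maximising the log marginal likelihood plus a log prior with SciPy; the Gaussian priors on the log hyperparameters and the toy data are placeholders rather than the priors of Appendix I.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve
    from scipy.optimize import minimize

    def neg_energy(log_hypers, x, y):
        # log_hypers = (log lengthscale, log signal variance, log noise variance)
        log_l, log_sf2, log_sn2 = log_hypers
        l, sf2, sn2 = np.exp(log_l), np.exp(log_sf2), np.exp(log_sn2)
        d = x[:, None] - x[None, :]
        K = sf2 * np.exp(-d**2 / (2.0 * l**2)) + sn2 * np.eye(len(x))
        c, low = cho_factor(K)
        alpha = cho_solve((c, low), y)
        log_ml = (-0.5 * y @ alpha
                  - np.sum(np.log(np.diag(c)))           # -1/2 log det K
                  - 0.5 * len(x) * np.log(2.0 * np.pi))
        # placeholder Gaussian priors on the log hyperparameters (regularisation)
        log_prior = -0.5 * np.sum(log_hypers**2)
        return -(log_ml + log_prior)                      # negative "energy"

    # toy data and a random initialisation of the log hyperparameters
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 5.0, 50)
    y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(50)
    res = minimize(neg_energy, x0=0.5 * rng.standard_normal(3), args=(x, y))
    map_hypers = np.exp(res.x)  # MAP estimates, used in place of the MLE for the BIC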
I  Hyperparameter initialisation and priors

Throughout, Z ∼ N(0, 1), and TN(σ², I) denotes a Gaussian with mean 0 and variance σ² truncated to the interval I and then renormalised.

Signal noise: σ² = 0.1 exp(Z/2); prior p(log σ²) = N(0, 0.2).

LIN: σ² = exp(V) where V ∼ TN(1, [·, ·]); l = exp(Z/2). Priors: p(log σ²) = log-uniform, p(log l) = log-uniform.

SE: l = exp(Z/2), σ² = 0.1 exp(Z/2). Priors: p(log l) = N(0, 0.1), p(log σ²) = log-uniform.

PER: p_min = log(10 (max(x) − min(x)) / N) (the shortest possible period is 10 time steps) and p_max = log((max(x) − min(x)) / 5) (the longest possible period is a fifth of the range of the data set). l = exp(Z/2); p = exp(p_min + W) or exp(p_max + U), each with probability 1/2; σ² = 0.1 exp(Z/2), where W ∼ TN(0.5, [·, ·]) and U ∼ TN(0.5, [·, ·]). Priors: p(log l) = t(μ = 0, σ² = 1, ν = 4); p(log p) = LN(p_min − 0.5, 0.25) or LN(p_max − 2, 0.5), each with probability 1/2; p(log σ²) = log-uniform, where LN(μ, σ²) is a log-Gaussian and t(μ, σ², ν) is the Student's t-distribution.
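A sketch of this kind of random initialisation in Python follows; since the truncation intervals and some constants above could not be recovered from this transcription, the truncation step (here a simple reflection onto the positive or negative half-line) and the exact scales are placeholders.

    import numpy as np

    rng = np.random.default_rng()

    def init_se():
        # SE: l = exp(Z/2), sigma^2 = 0.1 exp(Z/2) with Z ~ N(0, 1)
        return {"lengthscale": np.exp(rng.standard_normal() / 2.0),
                "signal_var": 0.1 * np.exp(rng.standard_normal() / 2.0)}

    def init_per(x, n):
        # shortest period ~ 10 time steps, longest ~ a fifth of the data range
        span = x.max() - x.min()
        p_min, p_max = np.log(10.0 * span / n), np.log(span / 5.0)
        # W, U: truncated Gaussians with variance 0.5; the intervals are placeholders
        w = abs(rng.standard_normal() * np.sqrt(0.5))
        u = -abs(rng.standard_normal() * np.sqrt(0.5))
        period = np.exp(p_min + w) if rng.random() < 0.5 else np.exp(p_max + u)
        return {"lengthscale": np.exp(rng.standard_normal() / 2.0),
                "period": period,
                "signal_var": 0.1 * np.exp(rng.standard_normal() / 2.0)}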

Figure 7: Log marginal likelihood and hyperparameter values after optimising the lower bound with the ARD kernel on a subset of the Power plant data, for different values of m, compared against the GP values obtained by optimising the true log marginal likelihood. Error bars show the mean ± 1 standard deviation over 10 random iterations.

Figure 8: GP predictions on the solar data set with the SE kernel for different priors and initialisations: strong prior with random initialisation, weak prior with random initialisation, strong prior with small initialisation, and weak prior with small initialisation.

J  Computation times

See Table 2. For all three data sets, the GP optimisation time is much greater than the sum of the Var GP optimisation time and the upper bound (NLD + NIP) evaluation time for m ≤ 8. Hence the savings in computation time for SKC are significant even for these small data sets. Note that we show the lower bound optimisation time against the upper bound evaluation time, rather than the evaluation times for both, since this is what happens in SKC: the lower bound has to be optimised for each kernel, whereas the upper bound only has to be evaluated once.

K  Mauna and Solar plots and hyperparameter values found by SKC

Solar  The solar data has 26 cycles over 285 years, which gives a periodicity of around 11 years. Using SKC with m = 4, we find the kernel SE × PER × SE × SE × PER. The period hyperparameter found in PER agrees with this periodicity to 3 s.f., hence SKC finds the periodicity with only 4 inducing points. The SE term converts the global periodicity into a local periodicity, with the extent of the locality governed by the length scale parameter of SE, equal to 45. This is fairly large, but smaller than the range of the domain (285 years), indicating that the periodicity spans a long time but is not quite global. This is most likely due to the stretch of static solar irradiance in the data, which adds a bit of noise to the periodicities.

Mauna  The annual periodicity in the data and the linear trend with positive slope are clear. We compare against the slope obtained by linear regression. SKC with m = 4 gives the kernel SE + PER + LIN. The period hyperparameter in PER takes the value 1, hence SKC successfully finds the right periodicity. The offset l and magnitude σ² parameters of LIN allow us to calculate the slope via the formula

    σ² (x − l)ᵀ [σ² (x − l)(x − l)ᵀ + σ_n² I]⁻¹ y

where σ_n² is the noise variance of the learned likelihood. This formula is obtained from the posterior mean of the GP, which is linear in the inputs for the linear kernel. This value amounts to 1.515, hence the slope found by SKC agrees with the linear regression slope to 3 s.f.
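The slope calculation for the Mauna data can be written directly from the posterior mean of a GP with a linear kernel; the following NumPy sketch (ours, with illustrative names) implements the formula quoted above.

    import numpy as np

    def lin_gp_slope(x, y, sigma2, l, noise_var):
        # Posterior mean of a GP with kernel LIN(x, x') = sigma2 (x - l)(x' - l)
        # is linear in the test input, with slope
        #   sigma2 (x - l)^T [sigma2 (x - l)(x - l)^T + noise_var I]^{-1} y
        xl = x - l
        K = sigma2 * np.outer(xl, xl) + noise_var * np.eye(len(x))
        return sigma2 * xl @ np.linalg.solve(K, y)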

Table 2: Mean and standard deviation of computation times (in seconds) for full GP optimisation, Var GP optimisation (with and without learning the inducing points), and the NLD and NIP (PCG with the PIC preconditioner) upper bounds, over 10 random iterations on the Solar, Mauna and Concrete data sets, for m = 1, 2, 4, 8, 16, 32.

L  Optimising the upper bound

If the upper bound is tighter and more robust with respect to the choice of inducing points, why don't we optimise the upper bound to find the hyperparameters? If this were possible, then we could maximise it to get an upper bound on the marginal likelihood with optimal hyperparameters. In fact this holds for any analytic upper bound whose value and gradients can be evaluated cheaply. Hence for any m, we can find an interval that contains the true optimised marginal likelihood. So if this interval is dominated by the interval of another kernel, we can discard that kernel, and there is no need to evaluate the bounds for bigger values of m. Now we wish to use values of m such that we can choose the right kernel (or kernels) at each depth of the search tree with minimal computation. This gives rise to an exploitation-exploration trade-off, whereby we want to balance raising m, to get tight intervals that allow us to discard unsuitable kernels whose intervals fall strictly below those of other kernels, against quickly moving on to the next depth of the search tree to search for finer structure in the data. The search algorithm is highly parallelisable, and thus we may raise m simultaneously for all candidate kernels. At deeper levels of the search tree, there may be too many candidates for simultaneous computation, in which case we may select the ones with the highest upper bounds to get tighter intervals. Such attempts are listed below. Note the two inequalities for the NLD and NIP terms:

    −(1/2) log det(K̂ + σ²I) − (1/(2σ²)) Tr(K − K̂) ≤ −(1/2) log det(K + σ²I) ≤ −(1/2) log det(K̂ + σ²I)    (5)

    −(1/2) yᵀ(K̂ + σ²I)⁻¹y ≤ −(1/2) yᵀ(K + σ²I)⁻¹y ≤ −(1/2) yᵀ(K̂ + (σ² + Tr(K − K̂))I)⁻¹y    (6)

where the first two inequalities come from [3], the third inequality is a direct consequence of K − K̂ being positive semi-definite, and the last inequality is from Michalis Titsias' lecture slides.
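To make the intervals above concrete, the following NumPy sketch (ours, not the paper's implementation) evaluates the NLD and NIP bounds of (5) and (6) with a Nyström-style approximation K̂ = K_nm K_mm⁻¹ K_mn built from inducing inputs z; the SE kernel, the jitter and the dense linear algebra (rather than the O(Nm²) identities of Appendix D) are illustrative choices.

    import numpy as np

    def se_kernel(a, b, l=1.0, sf2=1.0):
        d = a[:, None] - b[None, :]
        return sf2 * np.exp(-d**2 / (2.0 * l**2))

    def nld_nip_bounds(x, y, z, sigma2, l=1.0, sf2=1.0, jitter=1e-8):
        # z: inducing inputs; K_hat = K_nm K_mm^{-1} K_mn (Nystrom-style), so K - K_hat is PSD
        K = se_kernel(x, x, l, sf2)
        Kmm = se_kernel(z, z, l, sf2) + jitter * np.eye(len(z))
        Knm = se_kernel(x, z, l, sf2)
        K_hat = Knm @ np.linalg.solve(Kmm, Knm.T)
        n = len(x)
        A_hat = K_hat + sigma2 * np.eye(n)
        nld_ub = -0.5 * np.linalg.slogdet(A_hat)[1]                      # RHS of (5)
        nld_lb = nld_ub - 0.5 * np.trace(K - K_hat) / sigma2             # LHS of (5)
        nip_lb = -0.5 * y @ np.linalg.solve(A_hat, y)                    # LHS of (6)
        t = np.trace(K - K_hat)
        nip_ub = -0.5 * y @ np.linalg.solve(K_hat + (sigma2 + t) * np.eye(n), y)  # RHS of (6)
        return nld_lb, nld_ub, nip_lb, nip_ub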

Also from (4), we have that

    −(1/2) log det(K̂ + σ²I) + (1/2) αᵀ(K + σ²I)α − αᵀy

is an upper bound for all α ∈ R^N. Thus one idea for obtaining a cheap upper bound on the optimised marginal likelihood was to solve the following maximin optimisation problem:

    max_θ min_{α ∈ R^N} −(1/2) log det(K̂ + σ²I) + (1/2) αᵀ(K + σ²I)α − αᵀy

One way to solve this cheaply would be by coordinate descent, where one maximises with respect to θ with α fixed, then minimises with respect to α with θ fixed. However, σ tends to blow up in practice. This is because, for fixed α, the expression is O(−log σ² + σ²), hence maximising with respect to σ pushes it towards infinity.

An alternative is to sum the two upper bounds above to get the upper bound

    −(1/2) log det(K̂ + σ²I) − (1/2) yᵀ(K̂ + (σ² + Tr(K − K̂))I)⁻¹y.

However, we found that maximising this bound gives quite a loose upper bound unless m = O(N). Hence this upper bound is not very useful.

M  Random Fourier Features

Figure 9: Plots of the small time series data sets: (a) Solar (solar irradiance against year) and (b) Mauna (CO2 level against year).

Random Fourier Features (RFF) (a.k.a. Random Kitchen Sinks) were introduced by [26] as a low-rank approximation to the kernel matrix. The method uses the following theorem

Theorem 3 (Bochner's Theorem [29]). A stationary kernel k(d) is positive definite if and only if k(d) is the Fourier transform of a non-negative measure.

to give an unbiased low-rank approximation to the Gram matrix, K = E[ΦᵀΦ] with Φ ∈ R^{m×N}. A bigger m lowers the variance of the estimate. Using this approximation, one can compute determinants and inverses in O(Nm²) time. In the context of kernel composition in Section 2, RFFs have the nice property that samples from the spectral density of a sum or product of kernels can easily be obtained as mixtures or sums of samples for the individual kernels (see Appendix N). We use this later to give a memory-efficient upper bound on the log marginal likelihood in Appendix P.

N  Random Features for Sums and Products of Kernels

For RFF, the kernel can be approximated by the inner product of random features given by samples from its spectral density, in a Monte Carlo approximation, as follows:

    k(x − y) = ∫_{R^D} e^{ivᵀ(x−y)} dP(v) = ∫_{R^D} p(v) e^{ivᵀ(x−y)} dv
             = E_{p(v)}[e^{ivᵀx}(e^{ivᵀy})*] = E_{p(v)}[Re(e^{ivᵀx}(e^{ivᵀy})*)]
             ≈ (1/m) Σ_{k=1}^m Re(e^{iv_kᵀx}(e^{iv_kᵀy})*)

and, equivalently, k(x − y) = E_{b,v}[φ(x)ᵀφ(y)] ≈ φ(x)ᵀφ(y), where

    φ(x) = √(2/m) (cos(v_1ᵀx + b_1), ..., cos(v_mᵀx + b_m))ᵀ

with spectral frequencies v_k i.i.d. samples from p(v) and b_k i.i.d. samples from U[0, 2π].
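A minimal sketch of RFF for the one-dimensional SE kernel under the conventions above (Φ ∈ R^{m×N}, K ≈ ΦᵀΦ); the code is our illustration, with the spectral frequencies drawn from the SE kernel's spectral density N(0, 1/l²).

    import numpy as np

    def rff_features(x, m, l=1.0, sf2=1.0, rng=None):
        # Phi in R^{m x N} with K approx Phi^T Phi for the 1-D SE kernel
        rng = np.random.default_rng() if rng is None else rng
        v = rng.standard_normal(m) / l         # spectral frequencies, p(v) = N(0, 1/l^2)
        b = rng.uniform(0.0, 2.0 * np.pi, m)   # phases, U[0, 2*pi]
        return np.sqrt(2.0 * sf2 / m) * np.cos(np.outer(v, x) + b[:, None])

    # check: Phi^T Phi approaches the exact SE Gram matrix as m grows
    x = np.linspace(0.0, 3.0, 20)
    Phi = rff_features(x, m=5000, l=0.7, sf2=1.3)
    K_exact = 1.3 * np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.7**2))
    print(np.max(np.abs(Phi.T @ Phi - K_exact)))  # small for large m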

Let k_1, k_2 be two stationary kernels, with respective spectral densities p_1, p_2, so that k_1(d) = a_1 p̂_1(d) and k_2(d) = a_2 p̂_2(d), where p̂(d) := ∫_{R^D} p(v) e^{ivᵀd} dv. We use this convention for the Fourier transform. Note that a_i = k_i(0). Then

    (k_1 + k_2)(d) = a_1 ∫ p_1(v) e^{ivᵀd} dv + a_2 ∫ p_2(v) e^{ivᵀd} dv = (a_1 + a_2) p̂_+(d)

where p_+(v) = a_1/(a_1 + a_2) p_1(v) + a_2/(a_1 + a_2) p_2(v), a mixture of p_1 and p_2. So to generate RFF for k_1 + k_2, generate v ∼ p_+ by generating v ∼ p_1 w.p. a_1/(a_1 + a_2) and v ∼ p_2 w.p. a_2/(a_1 + a_2).

Now for the product, suppose

    (k_1 k_2)(d) = a_1 a_2 p̂_1(d) p̂_2(d) = a_1 a_2 p̂_×(d)

Then p_× is the inverse Fourier transform of p̂_1 p̂_2, which is the convolution (p_1 ∗ p_2)(v) := ∫_{R^D} p_1(z) p_2(v − z) dz. So to generate RFF for k_1 k_2, generate v ∼ p_× by generating v_1 ∼ p_1, v_2 ∼ p_2 and setting v = v_1 + v_2.

This is not applicable to non-stationary kernels, such as the linear kernel. We deal with this problem as follows. Suppose φ_1, φ_2 are random features such that k_1(x, x′) = φ_1(x)ᵀφ_1(x′) and k_2(x, x′) = φ_2(x)ᵀφ_2(x′), with φ_i : R^D → R^m. It is straightforward to verify that

    (k_1 + k_2)(x, x′) = φ_+(x)ᵀφ_+(x′)
    (k_1 k_2)(x, x′) = φ_×(x)ᵀφ_×(x′)

where φ_+(·) = (φ_1(·)ᵀ, φ_2(·)ᵀ)ᵀ and φ_×(·) = φ_1(·) ⊗ φ_2(·). However, we do not want the number of features to grow as we add or multiply kernels, since it would grow exponentially. We want to keep it at m features. So we subsample m entries from φ_+ (or φ_×) and scale by a factor of √2 (√m for φ_×), which still gives us unbiased estimates of the kernel, provided that each term of the inner product φ_+(x)ᵀφ_+(x′) (or φ_×(x)ᵀφ_×(x′)) is an unbiased estimate of (k_1 + k_2)(x, x′) (or (k_1 k_2)(x, x′)). This is how we generate random features for linear kernels combined with other stationary kernels, using the features φ(x) = (σ/√m)(x − l, ..., x − l).

O  Spectral Density for PER

From [35], we have that the spectral density of the PER kernel is

    Σ_{n=−∞}^{∞} ( I_n(l⁻²) / exp(l⁻²) ) δ(v − 2πn/p)

where I_n is the modified Bessel function of the first kind.

P  An upper bound to NLD using Random Fourier Features

Note that the function f(X) = −log det(X) is convex on the set of positive definite matrices [6]. Hence by Jensen's inequality we have, for ΦᵀΦ an unbiased estimate of K:

    −(1/2) log det(K + σ²I) = (1/2) f(K + σ²I) = (1/2) f(E[ΦᵀΦ + σ²I]) ≤ (1/2) E[f(ΦᵀΦ + σ²I)]

Hence −(1/2) log det(ΦᵀΦ + σ²I) is a stochastic upper bound to NLD that can be calculated in O(Nm²). An example of such an unbiased estimator Φ is given by RFF.
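Combining the Jensen upper bound of Appendix P with the determinant identity of Appendix D gives an O(Nm²) evaluation; a NumPy sketch (ours) for features Φ ∈ R^{m×N}, such as those produced by the RFF sketch above:

    import numpy as np

    def nld_upper_bound_rff(Phi, sigma2):
        # -0.5 log det(Phi^T Phi + sigma^2 I_N), computed via Appendix D:
        # det(sigma^2 I_N + Phi^T Phi) = (sigma^2)^(N - m) det(sigma^2 I_m + Phi Phi^T)
        m, n = Phi.shape
        small = sigma2 * np.eye(m) + Phi @ Phi.T
        logdet = (n - m) * np.log(sigma2) + np.linalg.slogdet(small)[1]
        return -0.5 * logdet  # stochastic upper bound on NLD by Jensen's inequality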

Q  Further Plots

Figure 10: Same as 1a and 1b but for the Mauna Loa data. Panels: (a) Mauna, fixed inducing points; (b) Mauna, learned inducing points. Each panel plots the negative log ML / energy of the full GP, the variational lower bound, the Nyström and RFF NLD upper bounds, and the NIP upper bounds computed with CG and with PCG using Nyström, FIC and PIC preconditioners.

Figure 11: Same as 1a and 1b but for the Concrete data. Panels: (a) Concrete, fixed inducing points; (b) Concrete, learned inducing points, showing the same quantities as Figure 10.

Figure 12: Kernel search tree for SKC on the solar data up to depth 2, showing the upper and lower bounds for different numbers of inducing points. The candidate kernels shown include SE, PER, LIN at depth 1 and (SE)*SE, (SE)*PER, (SE)+SE, (PER)+SE, (PER)+PER, (PER)*PER, (PER)+LIN, (LIN)+LIN, (LIN)+SE, (PER)*LIN, (LIN)*LIN, (SE)*LIN at depth 2.

Figure 13: Same as Figure 12 but for the Mauna data.

Figure 14: Same as Figure 12 but for the Concrete data, and only up to depth 1.
