STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING

Mark A. Kon
Department of Mathematics and Statistics, Boston University, Boston, MA

Andrzej Przybyszewski
McGill University, Montreal, Canada, and University of Massachusetts, Worcester, MA

Leszek Plaskota
Department of Mathematics, Warsaw University, Warsaw, Poland

ABSTRACT

We show that maximum a posteriori (MAP) statistical methods can be used in nonparametric machine learning problems in the same way as in their current applications to parametric statistical problems, and give some examples of applications. This MAPN (MAP for nonparametric machine learning) paradigm can also reproduce, much more transparently, the same results as regularization methods in machine learning, spline algorithms in continuous complexity theory, and Bayesian minimum risk methods.

KEY WORDS

machine learning, Bayesian statistics

1 Introduction

Machine learning and artificial neural network theory often deal with the problem of learning an input-output (i-o) function from examples, i.e., from partial information ([1], [2], [3], [4], [5], [6], [7]). Given an unknown i-o function $f(x)$, along with examples $Nf = (f(x_1), \dots, f(x_n))$, the goal is to learn $f$. In this paper we describe an extension of standard maximum a posteriori (MAP) methods in parametric statistics to this (nonparametric) machine learning problem.

Consider the problem ([6], [7]) of recovering $f$ from a hypothesis space $F$ of possible functions using information $Nf$, or more general information $Nf = (L_1 f, \dots, L_n f)$, with $L_i$ general functionals on $f$ (e.g., Fourier coefficients). This problem occurs in machine learning, statistical learning theory ([6], [7]), information-based complexity, nonparametric Bayesian statistics ([8], [9]), optimal recovery [10], and data mining [11]. Since we will assume little about the reader's knowledge of this area, we will include some basic examples and definitions.

To give an example of this type of nonparametric machine learning, we might be seeking a function $f(x)$ [12] which represents a relationship between inputs and outputs in a chemical mixture. Suppose we are building a control system in which homeostatic parameters, including temperature, humidity, and amounts of chemical components of an industrial mixture, can be controlled as input variables. Suppose we want to control the output variable $y$, which is the ratio of strength to brittleness of the plastic produced from the mixture. We want to build a machine which has the above input variables $x = (x_1, x_2, \dots, x_n)$ and whose output predicts the correct ratio $y$. The machine will use experimental data points $y = f(x)$ to learn from previous runs of the equipment. We may already have a prior model for $f$ based on simple assumptions on the relationships of the variables. We then want to combine this prior information with that from the several runs we have made of our experiment.

Learning an unknown i-o function $f$ from a high dimensional hypothesis space $F$ is a nonparametric statistical problem: inference from the data $Nf$ is made from a very large set of possibilities. We will show that standard parametric MAP estimation algorithms can be extended directly to a MAP for nonparametric machine learning (MAPN). The method presented here is simple and intuitively appealing in spite of the high dimensionality of the problems, and its estimates under standard hypotheses coincide with those obtained by other methods, e.g., optimal recovery [10], information-based complexity [3], and statistical learning theory [7], as will be demonstrated below.
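
To make the setup concrete, here is a minimal sketch (in Python, with invented names; none of this is from the paper) of "standard information": the learner sees only the vector $Nf$ of point evaluations, never $f$ itself.

```python
import numpy as np

# Standard information: N maps the unknown i-o function f to its values
# at the sample points x_1, ..., x_n; the learner sees only Nf = y.
def N(f, sample_points):
    return np.array([f(x) for x in sample_points])

f_true = lambda x: np.sin(2 * np.pi * x)   # the "unknown" target, for simulation only
x_pts = np.linspace(0.0, 1.0, 9)           # design points x_1, ..., x_n
y = N(f_true, x_pts)                       # the data available to the learner
```
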
MAPN is a Bayesian learning algorithm which assumes prior knowledge given by a probability distribution $\mu$ for the unknown function $f \in F$, representing information about $f$ (we assume here that $F$ is a normed linear space). An example of Bayesian prior knowledge of the type mentioned above would be the stipulation that the probability distribution on $F$ is a Gaussian measure centered at a given function $f_0$. Examples of common a priori measures on a hypothesis space $F$ include Gaussian and elliptically contoured measures ([3], [13], [8]). The most important new element in this extension of MAP to hypothesis spaces of functions is a proof [14] that it is possible to define density functions $\rho(f)$ corresponding to measures $\mu$ in a way analogous to how this is done in finite-dimensional estimation (see below).

The MAPN algorithm, as does MAP, will then use the density function $\rho(f)$ corresponding to this measure [14], and maximize it over the set $N^{-1}(z)$ of functions $f \in F$ consistent with the data $z$, yielding the MAPN estimate (or do the same with an assumption of some measurement error; see below).

The key issue in the development of the MAPN algorithm is that, as in MAP, we require existence of a density function for $\mu$. As is well known, such a density can exist for measures on finite-dimensional $F$, i.e., when the number of parameters to be estimated is finite. In many data mining applications, however, as in the above example, an entire function $f$ (i.e., an infinite number of parameters) must be extrapolated from data, and the corresponding infinite-dimensional parameter space $F$ presents an obstacle to extending MAP, since measures in infinite dimension do not admit density functions in the standard sense. This is because the density function $\rho(f)$ is the derivative (with respect to Lebesgue measure) of the a priori probability measure $\mu$, and Lebesgue measure fails to exist in infinite dimension. Thus it has up to now been natural to assume that a probability density for $f$ cannot be defined or maximized if $F$ is a function space.

The method for defining a density function for a prior measure $\mu$ on an infinite-dimensional $F$ in fact can be accomplished in a way which is interestingly analogous to the finite-dimensional case. In the latter situation, $\rho(f) = d\mu(f)/df$, with $df$ Lebesgue measure (which exists only in finite dimension). We show that it is possible to construct densities for such measures also in infinite-dimensional spaces $F$. More generally, we can show such densities can even exist for finitely additive measures, e.g., the isonormal Gaussian measure on $F$ with covariance operator $C = \frac{1}{2} I$.

Under finite-dimensional Bayesian inference with prior density $\rho(f)$, the MAP estimate $\hat f$ is the maximizer of $\rho(f)$, subject to the data $z$:

$$\hat f = \arg\max_{f \in N^{-1}(z)} \rho(f).$$

Likelihood functions have some significant advantages, such as ease of use, ease of maximization, and ease of conditioning when further information (e.g., data $Nf = y$) becomes available.

As mentioned above, the lack of a density $\rho(f)$ in infinite-dimensional hypothesis spaces $F$ stems from the lack of a Lebesgue measure. However, Lebesgue measure is required only to define sets of the same size at different locations (whose probabilities are then compared). This can be accomplished in other ways in infinite-dimensional hypothesis spaces, as we show here.

We remark that the infinite-dimensional nature of MAPN should be viewed as summarizing an inductive limit of finite-dimensional approximation methods. The extent to which the algorithm is valid is determined by the validity of the same method for finite-dimensional approximations. Here such approximations would entail approximating the space $F$ of allowed i-o functions $f$ by a finite-dimensional space, say consisting of grid approximations of $f$. The validity of the MAPN procedure in infinite dimension states that there is a valid inductive limit of MAP algorithms for finite-dimensional approximations of the desired $f$.
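
As a concrete (and entirely illustrative) instance of such a finite-dimensional approximation, the sketch below carries out MAP for a Gaussian prior on the grid values of $f$, maximized subject to exact data $Nf = z$; the closed form $\hat f = C N^T (N C N^T)^{-1} z$ is the standard finite-dimensional result, while the covariance and observation pattern are invented for the example.

```python
import numpy as np

# MAP on a grid approximation of F: prior f ~ N(0, C) on the grid values,
# maximized subject to the exact-data constraint f[obs_idx] = z.
grid = np.linspace(0.0, 1.0, 50)
C = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / 0.02)  # illustrative smooth covariance

obs_idx = np.array([5, 20, 35, 48])          # grid points where f is observed
z = np.array([0.1, 0.9, -0.3, 0.4])          # the data

Nmat = np.zeros((len(obs_idx), len(grid)))   # N as a selection matrix
Nmat[np.arange(len(obs_idx)), obs_idx] = 1.0

# Maximizing the Gaussian density over N^{-1}(z) gives the closed form
# f_hat = C N^T (N C N^T)^{-1} z (the minimum-||C^{-1/2} f|| interpolant).
f_hat = C @ Nmat.T @ np.linalg.solve(Nmat @ C @ Nmat.T, z)
```
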
2 Invariant measures and density functions

Let $\mu$ be a probability measure on a normed linear space $F$. For estimation of an unknown $f \in F$ we will first consider how transformations of $\mu$ affect estimators. Consider the invariance properties of the measure $\mu$ under a transformation $T: F \to F$. If $F$ is finite-dimensional, we can define $\mu$ to be invariant with respect to $T$ in two ways:

1. It can be measure-invariant with respect to $T$, so that $\mu(T^{-1}A) = \mu(A)$ for all measurable $A$.

2. It can be density-invariant with respect to $T$, so that (at least if $F$ is finite-dimensional) $\rho(Tf) = \rho(f)$ for all $f$, where $\rho(f) = d\mu(f)/df$ is the density of $\mu$, with $df$ Lebesgue measure.

In a Bayesian setting, let $\mu$ denote the a priori distribution for the unknown $f$, and assume we have data $Nf = y$ regarding $f$. Then the expected-error-minimizing estimator of $f$ is $f_B = E(f \mid Nf = y)$ [3]. If $\mu$ is measure-invariant with respect to $T$, then $T(f_B) = (Tf)_B$, i.e., the transform of the estimator is the estimator of the transform. Thus average-case estimation ([3], [15]) is invariant under $T$. However, this is not true for the MAP estimate. If $\mu$ is density-invariant under $T$, then the MAP estimate is preserved under $T$, i.e.,

$$\arg\max_f \rho(T^{-1} f) = T\left(\arg\max_f \rho(f)\right).$$

Note that in finite dimension, if $\mu$ is density-invariant with respect to every $T$ which preserves a norm $\|Ax\|$, then $\mu$ is elliptically contoured with respect to $A$; essentially this means $\mu$ is a superposition of Gaussians whose covariance operators are scalar multiples of a single covariance (Traub et al., 1987).

3 Definitions and basic results

Let $F$ be finite-dimensional, let $\mu$ represent prior knowledge about $f \in F$, and define the density $\rho(f) = d\mu/df$ (with $df$ Lebesgue measure). Then we can also define $\rho$ (up to a multiplicative constant) by

$$\frac{\rho(f)}{\rho(g)} = \lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(f))}{\mu(B_\epsilon(g))}, \qquad (1)$$

where $B_\epsilon(f)$ denotes the $\epsilon$-ball centered at $f$. The first (derivative) definition does not extend to infinite dimension, but the second one, on the right of (1), does. More generally, for cylinder (finitely additive) probability measures $\mu$ on $F$, we can define a density by

$$\frac{\rho(f)}{\rho(g)} = \lim_{R(N) \to \infty,\ \epsilon \to 0} \frac{\mu(B_{\epsilon,N}(f))}{\mu(B_{\epsilon,N}(g))},$$

where $B_{\epsilon,N}(f)$ denotes the $\epsilon$-cylinder at $f$ of codimension $n$:

$$B_{\epsilon,N}(f) = \{f' : \|N(f - f')\|^2 \le \epsilon\},$$

and $N$ has finite rank, with $n = R(N) = \mathrm{rank}(N)$ (with some technical conditions on the sequence $N$ of finite-rank operators).

Theorem (Kon, 2004): If $\mu_1$ and $\mu_2$ are two outer regular measures on $F$, and if the derivative $\frac{d\mu_2}{d\mu_1}(f)$ exists and is finite $\mu_1$-a.e., then

$$r(f) = \lim_{\epsilon \to 0} \frac{\mu_2(B_\epsilon(f))}{\mu_1(B_\epsilon(f))}$$

exists, and $r(f) = \frac{d\mu_2}{d\mu_1}(f)$ almost everywhere.

Corollary: If $\mu$ has a density $\rho(f)$ and the derivative $\frac{d\mu(x - g)}{d\mu(x)}$ exists for all $g$, then

$$\frac{\rho(f_1)}{\rho(f_2)} = \left.\frac{d\mu(f - f_2 + f_1)}{d\mu(f)}\right|_{f = f_2}.$$

For example, if $\mu$ is a Gaussian measure with positive covariance $C$, then $\rho(f) = e^{-\frac{1}{2}\|Af\|^2}$, where $C = A^{-2}$. By the above, the case $C = I$ (a cylinder measure only) is included as well.

4 Density viewpoint: density functions as a priori information

As mentioned earlier, it is sometimes attractive to define a measure representing prior knowledge about $f$ which is only finitely additive, e.g., a cylinder measure, to properly reflect a priori information. As an example, consider a Gaussian $\mu$ with covariance operator $C = I$. It is known that in infinite dimension this is not a probability measure (i.e., it is not normalizable as a countably additive measure). Nevertheless, it has a density as a cylinder measure, as indicated above. This density

$$\rho(f) = e^{-\frac{1}{2}\|f\|^2} \qquad (2)$$

can reflect a priori knowledge about $f$ in a precise way. Specifically, $\rho(f)$ reflects relative preferences for different choices of $f$. For example, the density $\rho(f) \equiv 1$ represents an a priori Lebesgue measure for $f$, i.e., indicating no preference, on a space of any size. However, densities $\rho(f)$ as a priori information can incorporate other types of information, e.g., partial information about a probability distribution, or even non-countably-additive probabilities, e.g., cylinder measures (see, e.g., [16]).

The use of a density which does not correspond to a measure can be illustrated with a simple example of inference with a priori information, which is in fact analogous to the situation in an infinite-dimensional function space. Consider an unknown integer $n$. Suppose we only have the a priori knowledge that

$$P(n \text{ is even}) = 2\, P(n \text{ is odd}).$$

Such information cannot be formulated in a single probability distribution on the integers, since such a distribution would have to be uniform on the evens and on the odds, and no such countably additive distribution exists. A likelihood function is needed to incorporate this information, i.e.,

$$\rho(n) = \begin{cases} 2 & \text{if } n \text{ even} \\ 1 & \text{if } n \text{ odd.} \end{cases}$$

Likelihood functions play a similar role in an infinite-dimensional space $F$, reflecting partial knowledge about $f$ in a noncompact space where there is no uniform probability distribution. As an example in infinite dimension, suppose we know only that, given two choices $f_1$ and $f_2$ of $f$, the ratio of their probabilities is

$$\frac{e^{-\frac{1}{2}\|f_1\|^2}}{e^{-\frac{1}{2}\|f_2\|^2}};$$

it then makes sense to define a likelihood function $e^{-\frac{1}{2}\|f\|^2}$, whether or not this corresponds to an actual measure. Thus likelihood functions are a more natural way than measures of incorporating a priori information in infinite-dimensional Bayesian settings.
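
A small numerical sketch of this ratio viewpoint (with an invented grid and invented test functions, and $A = I$): even though $e^{-\frac{1}{2}\|f\|^2}$ is not the density of a countably additive measure on $F$, its ratio between two discretized candidates is perfectly well defined.

```python
import numpy as np

# Relative plausibility of two candidates under rho(f) = exp(-||f||^2 / 2),
# with the L^2 norm approximated by a Riemann sum on a grid.
x = np.linspace(-0.5, 0.5, 101)
dx = x[1] - x[0]

def log_rho(f_vals):
    return -0.5 * np.sum(f_vals ** 2) * dx

f1 = np.sin(np.pi * x)        # a moderate candidate
f2 = 3.0 * np.sin(np.pi * x)  # a larger candidate
ratio = np.exp(log_rho(f1) - log_rho(f2))  # rho(f1)/rho(f2) > 1: f1 preferred a priori
```
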
This can also be seen in finite- but high-dimensional situations, in which a naive regularization approach to incorporating a priori information regarding an unknown $f$ might run as follows. Common methods for inference in statistical learning theory ([6], [7], [17]) involve regularization. Thus in addition to data $y = Nf$, a priori information might be: $\|Af\|$ should be small for a fixed linear $A$, where, e.g., $A$ is a derivative operator, in which case $\|Af\|$ is a Sobolev norm. For simplicity, assume $A = I$ is the identity and the norm is Euclidean. A naive approach might be to assume a prior distribution on the random variable $R = \|f\|$, say

$$\rho_R(R) = \frac{2}{\sqrt{\pi}} e^{-R^2} \quad (R > 0). \qquad (3)$$

Note this marginal for $R$ corresponds to an $n$-dimensional distribution of the form

$$\rho_f(f) = C_1 \frac{1}{\|f\|^{n-1}} e^{-\|f\|^2}, \qquad (4)$$

if we assume $\rho_f(f)$ to be radially symmetric in $f$. This likelihood function is singular at the origin and clearly vanishes as $n \to \infty$ (so that it has no infinite-dimensional limit), seemingly ruling out likelihood methods in infinite dimension. Compare this to the present likelihood function approach: we start with the likelihood function $\rho_f(f) = C_2 e^{-\|Af\|^2}$ (above, $A = I$), which directly expresses the intuition that a function $f_1$ is preferable to a function $f_2$ by a likelihood factor $e^{-\|Af_1\|^2}/e^{-\|Af_2\|^2}$. An added feature is that this likelihood function is the density of a measure on $F$ if and only if $A^{-2}$ is trace class [3]. The point here is that working with standard probability measures can be misleading, while expressing a priori information in likelihood functions clarifies a priori assumptions in practical situations. In the infinite-dimensional case, if we assume an a priori likelihood function, e.g., $\rho(f) = e^{-\|Af\|^2}$, even if $A = I$ (so that $\rho$ is the density of an isonormal cylinder Gaussian measure), we understand the connection of $\rho(f)$ (and hence the underlying probabilities) with a priori knowledge about likelihoods (see below).

5 Applications

MAPN in Bayesian estimation: As indicated above, in infinite-dimensional Bayesian inference the MAP estimator can be as useful as in parametric statistics. An unknown $f \in F$ with a Bayesian prior distribution $\mu$ on $F$ now has a likelihood function, as in parametric statistics.

Gaussian prior: For example, consider an infinite-dimensional Gaussian prior measure $\mu$ on a Hilbert space $F$ with covariance $C$. Assume without loss of generality that $C$ has dense range. Defining $A = \sqrt{C^{-1}}$, $\mu$ can be shown to have density $\rho(f) = e^{-\frac{1}{2}\|Af\|^2}$ [14]. Suppose we are given standard information $y = Nf = (f(x_1), \dots, f(x_n))$. By the above, the conditional density $\rho(f \mid y)$ is the restriction of $\rho(f)$, so the MAPN estimate of $f$ is

$$\hat f = \arg\max_{Nf = y} e^{-\frac{1}{2}\|Af\|^2} = \arg\min_{Nf = y} \|Af\|.$$

Note that this corresponds to the spline estimate of $f$ [3], as well as the regularization theory estimate of $f$ for exact (i.e., error-free) information [17], and Bayesian minimum average error estimates based on Gaussian priors [15].

Gaussian prior with noisy information: For inexact information, assume a random independent error $\epsilon$ with density $\rho_\epsilon$. The information model is

$$y = Nf + \epsilon.$$

Then the MAPN estimate is

$$\hat f = \arg\max_f \rho(f \mid y), \quad \text{where} \quad \rho(f \mid y) = \frac{\rho_y(y \mid f)\, \rho(f)}{\rho_y(y)}.$$

If we further assume that the density in $\mathbb{R}^n$ of $\epsilon$ is Gaussian, i.e., $\rho_\epsilon(\epsilon) = C_3 e^{-\|B\epsilon\|^2}$, with $B$ linear and $C_3$ a constant, we conclude

$$\rho(f \mid y) = \frac{C_4\, e^{-\|B(Nf - y)\|^2}\, e^{-\|Af\|^2}}{\rho_y(y)} = C_5\, e^{-\|B(Nf - y)\|^2}\, e^{-\|Af\|^2},$$

where $C_5$ can depend on $y$. MAPN yields $\hat f$ as a maximizer of $e^{-\|B(Nf - y)\|^2 - \|Af\|^2}$, so

$$\hat f = \arg\min_f \|Af\|^2 + \|B(Nf - y)\|^2.$$

This negative log-likelihood can be minimized, e.g., using Lagrange multipliers. Note this is exactly the spline solution of the same problem, and also the minimum of the regularization functional $\|Af\|^2 + \|B(Nf - y)\|^2$ appearing in regularization methods for solving $Nf = y$ ([7], [17]). Note also that the same procedure can be used for estimates based on elliptically contoured measures, which is more difficult to do using minimum square error methods. If likelihood functions are used, operator methods for maximizing them which work in finite dimension also typically work in infinite dimension, with matrices replaced by operators (e.g., the covariance of a Gaussian becomes an operator).
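
The noisy-information estimate is easy to sketch on a grid; the following illustration (all operators and data invented) takes $A$ to be a first-difference smoothness operator and $B$ a scalar noise precision, and solves the normal equations $(A^T A + N^T B^T B N) f = N^T B^T B\, y$ of the minimization above.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
grid = np.linspace(0.0, 1.0, m)
h = grid[1] - grid[0]

A = (np.eye(m, k=1) - np.eye(m))[:-1] / h        # first-difference "derivative" operator
obs_idx = rng.choice(m, size=30, replace=False)  # observed grid points
Nmat = np.zeros((len(obs_idx), m))
Nmat[np.arange(len(obs_idx)), obs_idx] = 1.0
y = np.sin(2 * np.pi * grid[obs_idx]) + 0.1 * rng.normal(size=len(obs_idx))
B = np.eye(len(obs_idx)) / 0.1                   # noise precision (sigma = 0.1)

# f_hat = argmin ||A f||^2 + ||B (N f - y)||^2, via the normal equations
lhs = A.T @ A + Nmat.T @ B.T @ B @ Nmat
f_hat = np.linalg.solve(lhs, Nmat.T @ B.T @ B @ y)
```
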
Example: In the example of the chemical mixture mentioned earlier, suppose that we have an input vector $x = (x_1, x_2, \dots, x_{20})$ representing an input mixture, with the 20 parameters representing temperature and mixing strength, along with 18 numbers representing proportions of chemical components (in moles/kg). The measured output $y$ represents the measured ratio of strength to brittleness for the plastic product of the mixture. In this example we assume $n = 400$ runs of the experiment (a relatively small number given the size of the space $F$ of possible i-o functions). For simplicity we assume that all variables $x_i$ have been re-scaled so that their experimental range is the interval $[-1/2, 1/2]$.

Thus the a priori space $F$ of possible i-o functions will be all (square integrable) functions on the input space $X = [-1/2, 1/2]^{20}$. As a first approximation we assume that the target $f \in F$ is smooth, and thus we take the prior probability distribution of $f$ to be a Gaussian, favoring functions which are smooth and (for regularity purposes) not large. In addition, it may be felt (from previous experience) that the input variables $x_1, \dots, x_5$ are associated with sharper variations in $y$ than the other 15 variables, and we expect the variation of $f$ in these directions to be more sensitive to data than to our a priori assumptions of smoothness. Finally, we will want the a priori smoothness and size assumptions to have more weight in regions where there are fewer data, in a precise parametric way. To this end our a priori desideratum will initially require "smallness" of the regularization term

$$\|f(x)\|_B^2 \equiv \|\rho(x)\,[1 + (a \cdot D)]\, f(x)\|^2,$$

where $a = (0.1, 0.1, \dots, 0.1, 0.2, 0.2, \dots, 0.2) \in \mathbb{R}^{20}$ has its first 5 components equal to 0.1 (reflecting a greater desired dependence of $f$ on the variations in the first five variables) and its last 15 components equal to 0.2 (reflecting a smaller dependence on the last 15 variables); these parameters can be adjusted. We have also defined

$$D = \left(\frac{\partial^{15}}{\partial x_1^{15}}, \frac{\partial^{15}}{\partial x_2^{15}}, \dots, \frac{\partial^{15}}{\partial x_{20}^{15}}\right);$$

the high order of this operator is necessary since its domain must consist of continuous functions on $\mathbb{R}^{20}$, i.e., functions which have more than 10 derivatives. Note that the norm $\|\cdot\|$ above is given by $\|f\|^2 = \int_{\mathbb{R}^{20}} f(x)^2\, dx$, and the associated inner product is $\langle f, g \rangle = \int_{\mathbb{R}^{20}} f(x) g(x)\, dx$.

The function $\rho(x)$ above is chosen to reflect the density $\delta(x)$ of the sample points $\{x_k\}_{k=1}^n$ at the location $x$. Note that if $\delta(x)$ is large, then we wish our estimate to depend more on data and less on a priori assumptions, so we want $\rho(x)$ to be small there; the opposite is desired when $\delta(x)$ is small. As an initial approximation we choose $\rho(x) = (1 + \delta(x))^{-1}$, with the local linear density $\delta(x)$ defined by

$$\delta(x)^m = \sum_{k=1}^n e^{-(x - x_k)^2},$$

where here the dimension $m = 20$.

We will need to define a domain with proper boundary conditions for the square of the above regularization operator in order to obtain a function class $F$ with a unique Gaussian distribution. The simplest boundary conditions are Dirichlet B.C., i.e., the requirement that $f$ vanish on the boundary. Since this is not a natural requirement for our function class, we extend the domain space $F$ to be square integrable functions on $[-1, 1]^{20}$, in order to impose such conditions sufficiently far away from the "region of interest" (the valid domain of variation of the inputs, $[-1/2, 1/2]^{20} \subset [-1, 1]^{20}$). Thus $F$ consists of square integrable functions in 20 variables, each in the interval $[-1, 1]$. We define the operator

$$a \cdot D = \left(a_1 \frac{\partial^{15}}{\partial x_1^{15}}, \dots, a_{20} \frac{\partial^{15}}{\partial x_{20}^{15}}\right).$$

The Dirichlet operator $B$ defined by $\langle f, Bf \rangle = \|\rho(x)[1 + (a \cdot D)] f(x)\|^2$ is given by

$$Bf = (\rho(x)[1 + (a \cdot D)])^* (\rho(x)[1 + (a \cdot D)]) f = \rho(x)[1 + (a \cdot D)][1 - (a \cdot D)]\rho(x) f(x) = \rho(x)(1 - (a \cdot D)^2)\rho(x) f(x),$$

with the domain of $B$ restricted to $f$ satisfying $f(x) = 0$ on the boundary, i.e., when any $|x_i| = 1$. This operator has a trace class inverse $B^{-1} = C$, which is the covariance operator of a Gaussian measure $P$ on $F$. We define $P$ to be the associated Gaussian distribution on the space $F$ of functions $f: [-1, 1]^{20} \to \mathbb{R}$. Note that our data are restricted to the subset $[-1/2, 1/2]^{20} \subset [-1, 1]^{20}$. The measure $P$ is our a priori Gaussian measure on $F$.
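
A sketch of the data-density weight just described (the kernel and exponent follow the formulas above; the sample points are simulated here, so the numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20                                      # input dimension
X = rng.uniform(-0.5, 0.5, size=(400, m))   # the n = 400 experimental runs x_k

def delta(x):
    # local "linear" density: delta(x)^m = sum_k exp(-||x - x_k||^2)
    return np.sum(np.exp(-np.sum((X - x) ** 2, axis=1))) ** (1.0 / m)

def rho(x):
    # small where data are dense (trust the data), near 1 where data are
    # sparse (trust the prior)
    return 1.0 / (1.0 + delta(x))

w = rho(np.zeros(m))   # weight at the center of the input cube
```
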
By the above analysis, given an unknown $f \in F$ and information $y = Nf = (f(x_1), \dots, f(x_n))$, the MAPN estimator of $f$ is

$$\hat f = \arg\inf_{Nf = y} \|\rho(x)[1 + (a \cdot D)] f(x)\|^2 = \arg\inf_{Nf = y} \langle f, Bf \rangle = \arg\inf_{Nf = y} \langle f, \rho(x)(1 - (a \cdot D)^2)\rho(x) f \rangle.$$

Note that ([18], [7], [12]) the above minimizer is

$$\hat f(x) = \sum_{k=1}^n c_k G(x, x_k), \qquad (5)$$

where $G(x, x')$ is the Green function for the differential operator $B$ defined above. Thus $G$ is the kernel of the covariance operator $B^{-1} = C$, which can be computed separately. Note that

$$B^{-1} f = \rho(x)^{-1} (1 - (a \cdot D)^2)^{-1} \rho(x)^{-1} f.$$

Thus

$$G(x, x') = \rho(x)^{-1}\, G_0(x, x')\, \rho(x')^{-1},$$

where $G_0(x, x')$ is the kernel of

$$(1 - (a \cdot D)^2)^{-1} = \left(1 - \sum_{i=1}^{20} a_i^2 \frac{\partial^{30}}{\partial x_i^{30}}\right)^{-1} \qquad (6)$$

on $F$, with vanishing boundary conditions. Since the boundaries of the domain $[-1, 1]^{20}$ of $F$ are relatively far from the region of interest $[-0.5, 0.5]^{20}$ in which the data lie, we approximate the Green kernel $G_0(x, x')$, whose boundary conditions vanish on the boundary of $[-1, 1]^{20}$, by the kernel $\tilde G(x, x')$ of $(1 - (a \cdot D)^2)^{-1}$ on $\mathbb{R}^{20}$, with boundary conditions at $\infty$. This has the form

$$\tilde G(x, x') = \tilde G(x - x') = \mathcal{F}^{-1}\left[\left(1 - \sum_{i=1}^{20} a_i^2 (i \xi_i)^{30}\right)^{-1}\right](x - x'), \qquad (7)$$

where $\mathcal{F}$ denotes the Fourier transform and $\xi_i$ is the variable dual to $x_i$. Thus our approximation of $\hat f$ is given by (5) (see, e.g., [12]), with $c = (c_i)_{i=1}^n = G^{-1} (Nf)^T$, where $G$ is the matrix with $G_{ij} = \tilde G(x_i - x_j)$ and $T$ denotes transpose. Explicitly,

$$\hat f(x) = \sum_{k=1}^n c_k G(x - x_k) \approx \sum_{k=1}^n \left[G^{-1}(Nf)^T\right]_k \tilde G(x - x_k),$$

where $\tilde G(x - x_k)$ is computable from (7).
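
The closed form (5) is straightforward to sketch numerically; in the following illustration a Gaussian radial kernel stands in for the true Green kernel $\tilde G$ of (7) (which would require the inverse Fourier transform there), and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(50, 20))   # data sites x_1, ..., x_n
y = np.sin(X.sum(axis=1))                   # the information Nf

def G_kernel(u, v):
    # placeholder translation-invariant kernel standing in for (7)
    return np.exp(-np.sum((u - v) ** 2))

Gram = np.array([[G_kernel(xi, xj) for xj in X] for xi in X])  # G_ij = G(x_i - x_j)
c = np.linalg.solve(Gram + 1e-10 * np.eye(len(X)), y)          # c = G^{-1} (Nf)^T

def f_hat(x):
    # f_hat(x) = sum_k c_k G(x - x_k), as in (5)
    return sum(ck * G_kernel(x, xk) for ck, xk in zip(c, X))
```
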

6 Conclusions

We have shown that it is possible to extend MAP estimates to high and infinite dimensional approximations. In applications we have constructed an approximation to an unknown function $f$ from pointwise information $Nf = (f(x_1), \dots, f(x_n))$ by incorporating prior information regarding smoothness of $f$, and adapting the solution in a way which depends on the local density of the data points $x_i$.

References

[1] T. Mitchell, Machine Learning (New York: McGraw-Hill, 1997).

[2] J. Traub and H. Wozniakowski, A General Theory of Optimal Algorithms (New York: Academic Press, 1980).

[3] J. Traub, G. Wasilkowski, and H. Wozniakowski, Information-Based Complexity (Boston: Academic Press, 1988).

[4] T. Poggio and S. Smale, The mathematics of learning: dealing with data, Notices of the AMS, 50, 2003.

[5] T. Poggio and C. Shelton, Machine learning, machine vision, and the brain, The AI Magazine, 20, 1999. URL: citeseer.nj.nec.com/poggio99machine.htm

[6] V. Vapnik, The Nature of Statistical Learning Theory (New York: Springer, 1995).

[7] V. Vapnik, Statistical Learning Theory (New York: Wiley, 1998).

[8] G. Wahba, Generalization and regularization in nonlinear learning systems, in M. Arbib (Ed.), Handbook of Brain Theory and Neural Networks (Cambridge: MIT Press, 1995).

[9] G. Wahba, Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV, in B. Schoelkopf, C. Burges & A. Smola (Eds.), Advances in Kernel Methods: Support Vector Learning (Cambridge: MIT Press, 1999).

[10] C. Micchelli and T. Rivlin, Lectures on optimal recovery, Lecture Notes in Mathematics 1129 (Berlin: Springer-Verlag, 1985).

[11] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction (Berlin: Springer-Verlag, 2001).

[12] M. Kon and L. Plaskota, Information complexity of neural networks, Neural Networks, 13, 2000.

[13] J. Traub and H. Wozniakowski, Breaking intractability, Scientific American, 270, 1994.

[14] M. Kon, Density functions for machine learning and optimal recovery, preprint, 2004.

[15] G. Wahba, Bayesian confidence intervals for the cross-validated smoothing spline, Journal of the Royal Statistical Society B, 45, 1983.

[16] N. Friedman and J. Halpern, Plausibility measures and default reasoning, in Thirteenth National Conf. on Artificial Intelligence (New York: AAAI, 1996).

[17] T. Evgeniou, M. Pontil, and T. Poggio, Regularization networks and support vector machines, Advances in Computational Mathematics, 13, 2000.

[18] C. A. Micchelli and M. Buhmann, On radial basis approximation on periodic grids, Math. Proc. Camb. Phil. Soc., 112, 1992.

[19] C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constructive Approximation, 2, 1986.
