A. Exclusive KL View of the MLE


Let us assume a change-of-variable model $p_Z(z)$ on the random variable $Z \in \mathbb{R}^m$, such as the one used in Dinh et al. (2017): $z_0 \sim p_0(z_0)$ and $z = \psi(z_0)$, where $\psi$ is an invertible function and density evaluation of $z_0 \in \mathbb{R}^m$ is tractable under $p_0$. The resulting density function can be written

$$p_Z(z) = p_0(z_0)\left|\frac{\partial \psi(z_0)}{\partial z_0}\right|^{-1}$$

The maximum likelihood principle requires us to minimize the following metric:

$$\begin{aligned}
D_{\mathrm{KL}}\big(p_{\mathrm{data}}(z)\,\|\,p_Z(z)\big)
&= \mathbb{E}_{p_{\mathrm{data}}}\left[\log p_{\mathrm{data}}(z) - \log p_Z(z)\right] \\
&= \mathbb{E}_{p_{\mathrm{data}}}\left[\log p_{\mathrm{data}}(z) - \log p_0(z_0) - \log\left|\frac{\partial \psi^{-1}(z)}{\partial z}\right|\right] \\
&= \mathbb{E}_{p_{\mathrm{data}}}\left[\log\left(p_{\mathrm{data}}(z)\left|\frac{\partial \psi^{-1}(z)}{\partial z}\right|^{-1}\right) - \log p_0(z_0)\right]
\end{aligned}$$

which coincides with the exclusive KL divergence; to see this, we take $X = Z$, $Y = Z_0$, $f = \psi^{-1}$, $p_X = p_{\mathrm{data}}$, and $p_{\mathrm{target}} = p_0$:

$$= \mathbb{E}_{p_X}\left[\log\left(p_X(x)\left|\frac{\partial f(x)}{\partial x}\right|^{-1}\right) - \log p_{\mathrm{target}}(f(x))\right] = D_{\mathrm{KL}}\big(p_Y(y)\,\|\,p_{\mathrm{target}}(y)\big)$$

This means we want to transform the empirical distribution $p_{\mathrm{data}}$, or $p_X$, to fit the target density (the usually unstructured, base distribution $p_0$), as explained in Section 2.

B. Monotonicity of NAF

Here we show that using strictly positive weights and strictly monotonically increasing activation functions is sufficient to ensure strict monotonicity of NAF. For brevity, we write monotonic, or monotonicity, to refer to the strictly monotonically increasing behavior of a function. Also, note that a continuous function is strictly monotonically increasing exactly when its derivative is greater than 0.

Proof. (Proposition 1) Suppose we have an MLP with $L+1$ layers $h_0, h_1, h_2, \dots, h_L$, with $x = h_0$ and $y = h_L$, where $x$ and $y$ are scalar, and $h_{l,j}$ denotes the $j$-th node of the $l$-th layer. For $1 \le l \le L$, we have

$$p_{l,j} = w_{l,j}^{\top} h_{l-1} + b_{l,j} \qquad (15)$$
$$h_{l,j} = A_l(p_{l,j}) \qquad (16)$$

for some monotonic activation function $A_l$, positive weight vector $w_{l,j}$, and bias $b_{l,j}$. Differentiating this yields

$$\frac{\partial h_{l,j}}{\partial h_{l-1,k}} = A_l'(p_{l,j})\, w_{l,j,k}, \qquad (17)$$

which is greater than 0, since $A_l$ is monotonic and hence has positive derivative, and $w_{l,j,k}$ is positive by construction. Thus any unit of layer $l$ is monotonic with respect to any unit of layer $l-1$, for $1 \le l \le L$.

Now, suppose we have $J$ monotonic functions $f_j$ of the input $x$. Then the weighted sum $\sum_{j=1}^{J} u_j f_j$ of these functions, with $u_j > 0$, is also monotonic with respect to $x$, since

$$\frac{\partial}{\partial x} \sum_{j=1}^{J} u_j f_j = \sum_{j=1}^{J} u_j \frac{\partial f_j}{\partial x} > 0 \qquad (18)$$

Finally, we use induction to show that all $h_{l,j}$, including $y$, are monotonic with respect to $x$.

1. The base case $l = 1$ is given by Equation 17.
2. Suppose the inductive hypothesis holds: $h_{l,j}$ is monotonic with respect to $x$ for all $j$ of layer $l$. Then by Equation 18, $h_{l+1,k}$ is also monotonic with respect to $x$ for all $k$.

Thus by mathematical induction, monotonicity of $h_{l,j}$ holds for all $l$ and $j$.
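To make Proposition 1 concrete, here is a minimal NumPy sketch (ours, not the paper's code): an MLP whose weights are forced positive by exponentiating free parameters, with a strictly increasing activation (softplus); the layer sizes and initialization are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes; positivity is enforced by exponentiating free parameters.
sizes = [1, 8, 8, 1]
weights = [np.exp(rng.normal(size=(m, n))) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

def monotone_mlp(x):
    """Scalar-in, scalar-out MLP with strictly positive weights and a strictly
    increasing activation (softplus), as in Proposition 1."""
    h = np.atleast_1d(float(x))
    for W, b in zip(weights, biases):
        p = W @ h + b                   # Equation 15
        h = np.logaddexp(0.0, p)        # Equation 16 with A_l = softplus
    return h[0]

xs = np.linspace(-3.0, 3.0, 200)
ys = np.array([monotone_mlp(x) for x in xs])
assert np.all(np.diff(ys) > 0)          # output is strictly increasing in x
```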

C. Log Determinant of Jacobian

As we mentioned at the end of Section 3.1, to compute the log-determinant of the Jacobian as part of the objective function, we need to handle numerical stability. We first derive the Jacobian of DDSF (note that DSF is a special case of DDSF), and then summarize the numerically stable operations that were utilized in this work.

C.1. Jacobian of DDSF

Again defining $x = h^0$ and $y = h^L$, the Jacobian of each DDSF transformation can be written as a sequence of dot products due to the chain rule:

$$\nabla_x y = \left[\nabla_{h^{L-1}} h^{L}\right]\left[\nabla_{h^{L-2}} h^{L-1}\right]\cdots\left[\nabla_{h^{0}} h^{1}\right] \qquad (19)$$

For notational convenience, we define a few more intermediate variables. For each layer of DDSF, we have

$$C^{l+1} = a^{l+1} \odot \big(u^{l+1} h^{l}\big) + b^{l+1}, \qquad D^{l+1} = w^{l+1}\, \sigma(C^{l+1}), \qquad h^{l+1} = \sigma^{-1}(D^{l+1})$$

The gradient can be expanded as

$$\nabla_{h^l} h^{l+1} = \Big( \big[\nabla_{D} \sigma^{-1}(D^{l+1})\big]_{[:,\bullet]} \odot \big[\nabla_{\sigma(C)} D^{l+1}\big] \odot \big[\nabla_{C} \sigma(C^{l+1})\big]_{[\bullet,:]} \odot \big[\nabla_{u h^l} C^{l+1}\big]_{[\bullet,:]} \odot \big[\nabla_{h^l} \big(u^{l+1} h^l\big)\big]_{[\bullet,:,:]} \Big) \bullet 1_{[:,:,\bullet]}$$

where the bullet in the subscript indicates that the dimension is broadcast, $\odot$ denotes element-wise multiplication, and $\bullet 1$ denotes summation over the last dimension after the element-wise product:

$$\nabla_{h^l} h^{l+1} = \left( \left[\frac{1}{D^{l+1}(1-D^{l+1})}\right]_{[:,\bullet]} \odot\, w^{l+1} \odot \big[\sigma(C^{l+1})(1-\sigma(C^{l+1}))\big]_{[\bullet,:]} \odot \big[a^{l+1}\big]_{[\bullet,:]} \odot \big[u^{l+1}\big]_{[\bullet,:,:]} \right) \bullet 1_{[:,:,\bullet]} \qquad (20)$$

C.2. Numerically Stable Operations

Since the Jacobian of each DDSF transformation is a chain of dot products (Equation 19), with some nested multiplicative operations (Equation 20), we calculate everything in log-scale, where multiplication is replaced with addition, to avoid an unstable gradient signal.

C.2.1. LOG ACTIVATION

To ensure the summing-to-one and positivity constraints on $u$ and $w$, we let the autoregressive conditioner output their pre-activations (denoted $u'$ and $w'$ below) and apply softmax to them. We do the same for $a$ by having the conditioner output its pre-activation $a'$ and applying softplus to ensure positivity. In Equation 20, we have

$$\begin{aligned}
\log w &= \operatorname{logsoftmax}(w') \\
\log u &= \operatorname{logsoftmax}(u') \\
\log \sigma(C) &= \operatorname{logsigmoid}(C) \\
\log\big(1 - \sigma(C)\big) &= \operatorname{logsigmoid}(-C)
\end{aligned}$$

where

$$\begin{aligned}
\operatorname{logsoftmax}(x) &= x - \operatorname{logsumexp}(x) \\
\operatorname{logsigmoid}(x) &= -\operatorname{softplus}(-x) \\
\operatorname{logsumexp}(x) &= \log\Big(\sum_i \exp(x_i - x_*)\Big) + x_* \\
\operatorname{softplus}(x) &= \log(1 + \exp(x)) + \delta
\end{aligned}$$

where $x_* = \max_i \{x_i\}$ and $\delta$ is a small positive constant.

C.2.2. LOGARITHMIC DOT PRODUCT

In both Equations 19 and 20, we encounter matrix/tensor products, which are achieved by summing over one dimension after doing element-wise multiplication. Let $\widetilde{M}_1 = \log M_1$ and $\widetilde{M}_2 = \log M_2$ be $d_0 \times d_1$ and $d_1 \times d_2$, respectively. The logarithmic matrix dot product can be written as:

$$\widetilde{M}_1 \star \widetilde{M}_2 = \operatorname{logsumexp}_{\dim=1}\Big( \big[\widetilde{M}_1\big]_{[:,:,\bullet]} + \big[\widetilde{M}_2\big]_{[\bullet,:,:]} \Big)$$

where the subscript of logsumexp indicates the dimension (with index starting from 0) over which the elements are summed up. Note that $\widetilde{M}_1 \star \widetilde{M}_2 = \log(M_1 M_2)$.
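The operations above translate directly into code. Below is a small NumPy sketch of the stable log-space primitives and the logarithmic dot product, checked against the direct computation; this is an illustration with our own function names, not the paper's implementation.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Shift by the max for numerical stability: log(sum(exp(x))).
    x_max = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(x_max, axis=axis) + np.log(np.sum(np.exp(x - x_max), axis=axis))

def log_softmax(x, axis=-1):
    # logsoftmax(x) = x - logsumexp(x); exponentiating gives a simplex vector.
    return x - np.expand_dims(logsumexp(x, axis=axis), axis)

def log_sigmoid(x):
    # log sigma(x) = -softplus(-x); np.logaddexp(0, -x) = log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def log_matmul(logA, logB):
    # Logarithmic dot product: log(A @ B) from log A (d0 x d1) and log B (d1 x d2),
    # summing over the shared dimension (dim=1) in log-space.
    return logsumexp(logA[:, :, None] + logB[None, :, :], axis=1)

# Quick checks against the direct computations.
rng = np.random.default_rng(0)
A, B = rng.random((3, 4)) + 1e-3, rng.random((4, 5)) + 1e-3
assert np.allclose(log_matmul(np.log(A), np.log(B)), np.log(A @ B))
assert np.isclose(np.exp(log_softmax(np.array([1.0, 2.0, 3.0]))).sum(), 1.0)
```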

D. Scalability and Parameter Sharing

As discussed in Section 3.3, a multi-layer NAF such as DDSF requires the autoregressive conditioner $c$ to output many pseudo-parameters, on the order of $O(Ld^2)$, where $L$ is the number of layers of the transformer network $\tau$, and $d$ is the average number of hidden units per layer. In practice, we reduce the number of outputs (and thus the computation and memory requirements) of DDSF by instead endowing $\tau$ with some learned non-conditional statistical parameters. Specifically, we decompose $w'$ and $u'$ (the pre-activations of $\tau$'s weights, see Section C.2.1) into pseudo-parameters and statistical parameters. Take $u'$ for example:

$$u^{l+1} = \operatorname{softmax}_{\dim=1}\Big( v^{l+1} + \big[\eta^{l+1}\big]_{[\bullet,:]} \Big)$$

where $v^{l+1}$ is a $d_{l+1} \times d_l$ matrix of statistical parameters, and $\eta$ is output by $c$. See Figure 9 for a depiction.

Figure 9. Factorized weight of DDSF. The $v_{i,j}$ are learned parameters of the model; only the pseudo-parameters $\eta$ and $a$ are output by the conditioner. The activation function $f$ is softmax, so adding $\eta$ yields an element-wise rescaling of the inputs to this layer of the transformer by $\exp(\eta)$.

The linear transformation before applying the sigmoid resembles conditional weight normalization (CWN; Krueger et al., 2017). While CWN rescales the weight vectors normalized to have unit $L_2$ norm, here we rescale the weight vector normalized by softmax such that it sums to one and is positive. We call this conditional normalized weight exponentiation. This reduces the number of pseudo-parameters to $O(Ld)$.

E. Identity Flow Initialization

In many cases, initializing the transformation to have a minimal effect is believed to help with training, as it can be thought of as a warm start with a simpler distribution. For instance, for variational inference, when we initialize the normalizing flow to be an identity flow, the approximate posterior is at least as good as the input distribution (usually a fully factorized Gaussian) before the transformation. To this end, for DSF and DDSF, we initialize the pseudo-weights $a$ to be close to 1 and the pseudo-biases $b$ to be close to 0. This is achieved by initializing the conditioner (whose outputs are the pseudo-parameters) to have small weights and the appropriate output biases. Specifically, we initialize the output biases of the last layer of our MADE (Germain et al., 2015) conditioner to be zero, and add $\operatorname{softplus}^{-1}(1)$ to the outputs that correspond to $a$ before applying the softplus activation function. We initialize all of the conditioner's weights by sampling from $\mathrm{Unif}(-0.001, 0.001)$. We note that there might be better ways to initialize the weights to account for the different numbers of active incoming units.
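The scheme above is easy to sketch. The following NumPy snippet is a minimal illustration under our own assumptions (a generic linear output layer standing in for the MADE conditioner; `softplus_inv` and the function names are ours), not the paper's code.

```python
import numpy as np

def softplus_inv(y):
    # Inverse of softplus: log(exp(y) - 1), so softplus(softplus_inv(1.0)) == 1.0.
    return np.log(np.expm1(y))

def init_conditioner_output_layer(n_out, n_in, idx_a, rng):
    """Initialize the conditioner's last layer so the flow starts near identity:
    small uniform weights, zero biases, and a bias offset of softplus_inv(1)
    on the outputs that become the pseudo-weights a (so a ~ 1 after softplus)."""
    W = rng.uniform(-0.001, 0.001, size=(n_out, n_in))
    b = np.zeros(n_out)
    b[idx_a] += softplus_inv(1.0)       # pre-softplus offset for the a-outputs
    return W, b

rng = np.random.default_rng(0)
W, b = init_conditioner_output_layer(n_out=3, n_in=64, idx_a=np.array([0]), rng=rng)
a = np.log1p(np.exp(b[0]))              # softplus of the a-bias
assert abs(a - 1.0) < 1e-6              # a starts at 1 and b at 0: near-identity flow
```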

F. Lemmas: Uniform Convergence of DSF

We want to show the convergence result of Equation 11. To this end, we first show that DSF can be used to universally approximate any strictly monotonic function. This is the case where $x_{1:t-1}$ are fixed, which means $C(x_{1:t-1})$ are simply constants. We demonstrate it using the following two lemmas.

Lemma 1. (Step functions universally approximate monotonic functions) Define

$$\widetilde{S}_n(x) = \sum_{j=1}^{n} w_j\, s(x - b_j)$$

where $s(z)$ is defined as a step function that is 1 when $z \ge 0$ and 0 otherwise. For any continuous, strictly monotonically increasing $S : [r_0, r_1] \to [0, 1]$ with $S(r_0) = 0$, $S(r_1) = 1$ and $r_0, r_1 \in \mathbb{R}$, and given any $\epsilon > 0$, there exists a positive integer $n$ and real constants $w_j$ and $b_j$ for $j = 1, \dots, n$, where $\sum_{j=1}^n w_j = 1$, $w_j > 0$ and $b_j \in [r_0, r_1]$ for all $j$, such that $|\widetilde{S}_n(x) - S(x)| < \epsilon$ for all $x \in [r_0, r_1]$.

Proof. (Lemma 1) For brevity, we write $s_j(x) = s(x - b_j)$. For any $\epsilon > 0$, we choose $n = \lceil 1/\epsilon \rceil$ and divide the range $(0, 1)$ into $n+1$ evenly spaced intervals: $(0, y_1), (y_1, y_2), \dots, (y_n, 1)$. For each $y_j$, there is a corresponding inverse value (since $S$ is strictly monotonic), $x_j = S^{-1}(y_j)$. We want to set $\widetilde{S}_n(x_j) = y_j$ for $1 \le j \le n-1$ and $\widetilde{S}_n(x_n) = 1$. To do so, we set the bias terms $b_j$ to be $x_j$. Then we just need to solve a system of $n$ linear equations $\sum_{j'=1}^{n} w_{j'}\, s_{j'}(x_j) = t_j$, where $t_j = y_j$ for $1 \le j < n$, $t_0 = 0$ and $t_n = 1$. We can express this system of equations in matrix form as $\mathbf{S}\mathbf{w} = \mathbf{t}$, where

$$\mathbf{S}_{j,j'} = s_{j'}(x_j) = \delta_{x_j \ge b_{j'}} = \delta_{j \ge j'}, \qquad \mathbf{w}_j = w_j, \qquad \mathbf{t}_j = t_j$$

where $\delta_{\omega \ge \eta} = 1$ whenever $\omega \ge \eta$ and $\delta_{\omega \ge \eta} = 0$ otherwise. Then we have $\mathbf{w} = \mathbf{S}^{-1}\mathbf{t}$. Note that $\mathbf{S}$ is a lower triangular matrix, and its inverse takes the form of a Jordan matrix: $(\mathbf{S}^{-1})_{i,i} = 1$ and $(\mathbf{S}^{-1})_{i+1,i} = -1$. Additionally, $t_j - t_{j-1} = \frac{1}{n+1}$ for $j = 1, \dots, n-1$, and is equal to $\frac{2}{n+1}$ for $j = n$. We then have $\widetilde{S}_n(x) = \mathbf{t}^\top \mathbf{S}^{-\top} \mathbf{s}(x)$, where $\mathbf{s}(x)_j = s_j(x)$; thus

$$\begin{aligned}
\left|\widetilde{S}_n(x) - S(x)\right|
&= \left|\sum_{j=1}^{n} s_j(x)\,(t_j - t_{j-1}) - S(x)\right| \\
&= \left|\sum_{j=1}^{n-1} \frac{s_j(x)}{n+1} + \frac{2\,s_n(x)}{n+1} - S(x)\right| \\
&= \left|\frac{C_{x_{1:n-1}}(x) + 2\,\delta_{x \ge x_n}}{n+1} - S(x)\right| \le \frac{1}{n+1} < \epsilon \qquad (21)
\end{aligned}$$

where $C_v(z) = \sum_k \delta_{z \ge v_k}$ is the count of elements in a vector $v$ that $z$ is no smaller than. Note that the additional constraint that $\mathbf{w}$ lies on an $(n-1)$-dimensional simplex is always satisfied, because

$$\sum_j w_j = \sum_j (t_j - t_{j-1}) = t_n - t_0 = 1$$

See Figure 10 for a visual illustration of the proof.

Figure 10. Visualization of how sigmoidal functions can universally approximate a monotonic function in $[0, 1]$. The red dotted curve is the target monotonic function $S(x)$; the blue solid curve is the intermediate superposition of step functions $\widetilde{S}_n(x)$ with the parameters chosen in the proof. In this case, $n$ is 6 and $\|\widetilde{S}_n - S\|_\infty \le \frac{1}{7}$. The green dashed curve is the resulting superposition of sigmoids $S_n(x)$, which intercepts each step of $\widetilde{S}_n$ with the additionally chosen $\tau$.

Using this result, we now turn to the case of using sigmoid functions instead of step functions.

Lemma 2. (Superimposed sigmoids universally approximate monotonic functions) Define

$$S_n(x) = \sum_{j=1}^{n} w_j\, \sigma\!\left(\frac{x - b_j}{\tau_j}\right)$$

With the same constraints and definitions as in Lemma 1, given any $\epsilon > 0$, there exists a positive integer $n$ and real constants $w_j$, $\tau_j$ and $b_j$ for $j = 1, \dots, n$, where additionally the $\tau_j$ are bounded and positive, such that $|S_n(x) - S(x)| < \epsilon$ for all $x \in [r_0, r_1]$.

Proof. (Lemma 2) Let $\epsilon_1 = \frac{1}{3}\epsilon$ and $\epsilon_2 = \frac{2}{3}\epsilon$. We know that for this $\epsilon_1$, there exists an $n$ such that $\|\widetilde{S}_n - S\|_\infty < \epsilon_1$. We choose the same $w_j, b_j$ for $j = 1, \dots, n$ as the ones used in the proof of Lemma 1, and let $\tau_1, \dots, \tau_n$ all take the same value, denoted by $\tau$.

Take $\kappa = \min_{j \ne j'} |b_j - b_{j'}|$ and $\tau = \frac{\kappa}{\sigma^{-1}(1 - \epsilon_0)}$ for some $\epsilon_0 > 0$. Take $\Gamma$ to be a lower triangular matrix with values of 0.5 on the diagonal and 1 below the diagonal. Then

$$\begin{aligned}
\max_{j = 1,\dots,n} \left| S_n(b_j) - (\Gamma \mathbf{w})_j \right|
&= \max_j \left| \sum_{j'} w_{j'}\, \sigma\!\left(\frac{b_j - b_{j'}}{\tau}\right) - \sum_{j'} w_{j'}\, \Gamma_{jj'} \right| \\
&= \max_j \left| \sum_{j'} \left( \sigma\!\left(\frac{b_j - b_{j'}}{\tau}\right) - \Gamma_{jj'} \right) w_{j'} \right| \\
&< \max_j \sum_{j'} w_{j'}\, \epsilon_0 = \epsilon_0
\end{aligned}$$

The inequality is due to

$$\sigma\!\left(\frac{b_j - b_{j'}}{\tau}\right) = \sigma\!\left( \sigma^{-1}(1 - \epsilon_0)\, \frac{b_j - b_{j'}}{\min_{k \ne k'} |b_k - b_{k'}|} \right)
\begin{cases}
= 0.5 & \text{if } j = j' \\
\ge 1 - \epsilon_0 & \text{if } j > j' \\
\le \epsilon_0 & \text{if } j < j'
\end{cases}$$

Since the product $\Gamma \mathbf{w}$ represents the half-step points of $\widetilde{S}_n$ at the $x = b_j$'s, the result above entails $|S_n(x) - \widetilde{S}_n(x)| < \epsilon_2 = 2\epsilon_1$ for all $x$. To see this, we choose $\epsilon_0 = \frac{1}{2(n+1)}$. Then $S_n$ intercepts all segments of $\widetilde{S}_n$ except at the ends. We choose $\epsilon_2 = 2\epsilon_1$ since the last step of $\widetilde{S}_n$ is of size $\frac{2}{n+1}$, and thus the bound also holds true in the vicinity of where $S_n$ intercepts the last step of $\widetilde{S}_n$. Hence,

$$|S_n(x) - S(x)| \le |S_n(x) - \widetilde{S}_n(x)| + |\widetilde{S}_n(x) - S(x)| < \epsilon_1 + \epsilon_2 = \epsilon$$

Now we show that the pre-logit DSF (Equation 11) can universally approximate monotonic functions. We do this by showing that the well-known universal function approximation properties of neural networks (Cybenko, 1989) allow us to produce parameters which are sufficiently close to those required by Lemma 2.
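The constructions in Lemmas 1 and 2 can be checked numerically. The sketch below uses an illustrative target $S(x) = x^2$ on $[0, 1]$ (our choice, not from the paper) and follows the choices of $n$, $b_j$, $w_j$, $\epsilon_0$ and $\tau$ made in the proofs.

```python
import numpy as np

def fit_step_mixture(S_inv, n):
    # Knots and weights from the proof of Lemma 1.
    y = np.arange(1, n + 1) / (n + 1.0)         # y_1 .. y_n
    b = S_inv(y)                                # b_j = x_j = S^{-1}(y_j)
    t = np.concatenate([y[:-1], [1.0]])         # t_j = y_j for j < n, t_n = 1
    w = np.diff(np.concatenate([[0.0], t]))     # w_j = t_j - t_{j-1}, sums to 1
    return b, w

S = lambda x: x**2                              # strictly increasing, S(0)=0, S(1)=1
eps = 0.1
n = int(np.ceil(1.0 / eps))
b, w = fit_step_mixture(np.sqrt, n)

xs = np.linspace(0.0, 1.0, 2001)
step = (xs[:, None] >= b[None, :]).astype(float) @ w   # Lemma 1 superposition
assert np.max(np.abs(step - S(xs))) < eps

# Lemma 2: replace steps with sharp sigmoids sharing one temperature tau.
eps0 = 1.0 / (2 * (n + 1))
kappa = np.min(np.diff(np.sort(b)))
tau = kappa / np.log((1 - eps0) / eps0)         # sigma^{-1}(1 - eps0) = log((1-eps0)/eps0)
sig = (1.0 / (1.0 + np.exp(-(xs[:, None] - b[None, :]) / tau))) @ w
print(np.max(np.abs(sig - S(xs))))              # stays within the Lemma 2 bound
```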

Lemma 3. Let $x_{1:m} \in [r_0, r_1]^m$ where $r_0, r_1 \in \mathbb{R}$. Given any $\epsilon > 0$ and any multivariate, continuously differentiable function¹¹ $S(x_{1:m})_t = S_t(x_t, x_{1:t-1})$ for $t \in [1, m]$ that is strictly monotonic with respect to the first argument when the second argument is fixed, where the boundary values are $S_t(r_0, x_{1:t-1}) = 0$ and $S_t(r_1, x_{1:t-1}) = 1$ for all $x_{1:t-1}$ and $t$, there exists a multivariate function $\widehat{S}$ such that $\|\widehat{S}(x_{1:m}) - S(x_{1:m})\|_\infty < \epsilon$ for all $x_{1:m}$, of the following form:

$$\widehat{S}(x_{1:m})_t = \widehat{S}_t\big(x_t, C_t(x_{1:t-1})\big) = \sum_{j=1}^{n} w_{tj}(x_{1:t-1})\; \sigma\!\left( \frac{x_t - b_{tj}(x_{1:t-1})}{\tau_{tj}(x_{1:t-1})} \right)$$

where $t \in [1, m]$, and $C_t = \{w_{tj}, b_{tj}, \tau_{tj}\}_{j=1}^{n}$ are functions of $x_{1:t-1}$ parameterized by neural networks, with $\tau_{tj}$ bounded and positive, $b_{tj} \in [r_0, r_1]$, $\sum_{j=1}^{n} w_{tj} = 1$, and $w_{tj} > 0$.

¹¹ $S : [r_0, r_1]^m \to [0, 1]^m$ is a multivariate-multivariable function, where $S_t$ is its $t$-th component, which is a univariate-multivariable function, written as $S_t(\cdot, \cdot) : [r_0, r_1] \times [r_0, r_1]^{t-1} \to [0, 1]$.

Proof. (Lemma 3) First we deal with the univariate case for any $t$, and drop the subscript $t$ of the functions. We write $S_n$ and $C_k$ to denote the sequences of univariate functions. We want to show that for any $\epsilon > 0$, there exist (1) a sequence of functions $S_n(x_t, C_k(x_{1:t-1}))$ of the given form, and (2) large enough $N$ and $K$, such that when $n \ge N$ and $k \ge K$, $|S_n(x_t, C_k(x_{1:t-1})) - S(x_t, x_{1:t-1})| \le \epsilon$ for all $x_{1:t} \in [r_0, r_1]^t$.

The idea is first to show that we can find a sequence of parameters, $C_n(x_{1:t-1})$, that yields a good approximation of the target function, $S(x_t, x_{1:t-1})$. We then show that these parameters can be arbitrarily well approximated by the outputs of a neural network, $C_k(x_{1:t-1})$, which in turn yields a good approximation of $S$.

From Lemma 2, we know that such a sequence $C_n(x_{1:t-1})$ exists, and furthermore that we can, for any $\epsilon$ (and independently of $S$ and $x_{1:t-1}$), choose an $N$ large enough so that

$$\left| S_n\big(x_t, C_n(x_{1:t-1})\big) - S(x_t, x_{1:t-1}) \right| < \frac{\epsilon}{2} \qquad (22)$$

To see that we can further approximate a given $C_n(x_{1:t-1})$ well by $C_k(x_{1:t-1})$,¹² we apply the classic result of Cybenko (1989), which states that a multilayer perceptron can approximate any continuous function on a compact subset of $\mathbb{R}^m$. Note that the specification of $C_n(x_{1:t-1}) = \{w_{tj}, b_{tj}, \tau_{tj}\}_{j=1}^{n}$ in Lemma 2 depends on the quantiles of $S_t(x_t, \cdot)$ as a function of $x_{1:t-1}$; since the quantiles are continuous functions of $x_{1:t-1}$, so is $C_n(x_{1:t-1})$, and the theorem applies.

¹² Note that $C_n$ is a chosen function (we are not assuming its parameterization, i.e. it is not necessarily a neural network) that we seek to approximate using $C_k$, which is the output of a neural network.

Now, $S$ has bounded derivative with respect to $C$, and is thus uniformly continuous, so long as $\tau$ is greater than some positive constant, which is always the case for any fixed $C_n$ and can thus be guaranteed for $C_k$ as well, for large enough $k$. Uniform continuity allows us to translate the convergence of $C_k \to C_n$ into convergence of $S_n(x_t, C_k(x_{1:t-1})) \to S_n(x_t, C_n(x_{1:t-1}))$: for any $\epsilon$, there exists a $\delta > 0$ such that

$$\left\| C_k(x_{1:t-1}) - C_n(x_{1:t-1}) \right\| < \delta \implies \left| S_n\big(x_t, C_k(x_{1:t-1})\big) - S_n\big(x_t, C_n(x_{1:t-1})\big) \right| < \frac{\epsilon}{2} \qquad (23)$$

Combining this with Equation 22, we have, for all $x_t$ and $x_{1:t-1}$, and for all $n \ge N$ and $k \ge K$:

$$\begin{aligned}
\left| S_n\big(x_t, C_k(x_{1:t-1})\big) - S(x_t, x_{1:t-1}) \right|
&\le \left| S_n\big(x_t, C_k(x_{1:t-1})\big) - S_n\big(x_t, C_n(x_{1:t-1})\big) \right| \\
&\quad + \left| S_n\big(x_t, C_n(x_{1:t-1})\big) - S(x_t, x_{1:t-1}) \right| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\end{aligned}$$

Having proved the univariate case, we add back the subscript $t$ to denote the index of the function. From the above, we know that given any $\epsilon > 0$, for each $t$ there exist $N_t$ and a sequence of univariate functions $S_{n,t}$ such that for all $n \ge N_t$, $|S_{n,t} - S_t| < \epsilon$ for all $x_{1:t}$. Choosing $N = \max_{t \in [1,m]} N_t$, we have that there exists a sequence of multivariate functions $S_n$ in the given form such that for all $n \ge N$, $\|S_n - S\|_\infty < \epsilon$ for all $x_{1:m}$.
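The form in Lemma 3 is straightforward to implement. Below is a minimal NumPy sketch of one pre-logit DSF component with the constraints on $(w, b, \tau)$ enforced by squashing unconstrained conditioner outputs; the single affine map standing in for the conditioner, the constraint mappings for $b$ and $\tau$, and all names are our illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dsf_component(x_t, context, params, n=8, r0=0.0, r1=1.0):
    """Pre-logit DSF component S_t(x_t, C_t(x_{1:t-1})) in the form of Lemma 3.
    `params` is a stand-in conditioner (one affine map, not a MADE): it maps
    the context x_{1:t-1} to 3n unconstrained outputs, then squashes them."""
    W, c = params
    raw = W @ context + c
    w = softmax(raw[:n])                             # w_tj > 0, sum to 1
    b = r0 + (r1 - r0) / (1 + np.exp(-raw[n:2*n]))   # b_tj in [r0, r1]
    tau = np.logaddexp(0.0, raw[2*n:]) + 1e-3        # tau_tj bounded away from 0
    return w @ (1.0 / (1.0 + np.exp(-(x_t - b) / tau)))

n, t = 8, 4
params = (0.01 * rng.normal(size=(3 * n, t - 1)), rng.normal(size=3 * n))
y = dsf_component(0.3, rng.random(t - 1), params, n=n)
assert 0.0 < y < 1.0        # convex combination of sigmoids stays in (0, 1)
```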
G. Proof of Universal Approximation of DSF

Lemma 4. Let $X \in \mathcal{X}$ be a random variable, with $\mathcal{X} \subseteq \mathbb{R}^m$ and $\mathcal{Y} \subseteq \mathbb{R}^m$. Given any function $J : \mathcal{X} \to \mathcal{Y}$ and a sequence of functions $J_n$ that converges pointwise to $J$, the sequence of random variables induced by the transformations, $Y_n \doteq J_n(X)$, converges in distribution to $Y \doteq J(X)$.

Proof. (Lemma 4) Let $h$ be any bounded, continuous function on $\mathbb{R}^m$, so that $h \circ J_n$ converges pointwise to $h \circ J$ by continuity of $h$. Since $h$ is bounded, by the dominated convergence theorem, $\mathbb{E}[h(Y_n)] = \mathbb{E}[h(J_n(X))]$ converges to $\mathbb{E}[h(J(X))] = \mathbb{E}[h(Y)]$. As this result holds for any bounded continuous function $h$, by the Portmanteau lemma, we have $Y_n \xrightarrow{d} Y$.
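Lemma 4 is easy to illustrate empirically. The sketch below picks an arbitrary limit transformation and a pointwise-convergent sequence (both our choices, purely for illustration) and shows the gap between empirical CDFs shrinking as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)                 # X ~ N(0, 1)

J = lambda z: z**3                           # limit transformation (illustrative)
J_n = lambda z, n: z**3 + np.sin(n * z) / n  # converges pointwise to J as n grows

# Compare empirical CDFs of Y_n = J_n(X) and Y = J(X) on a fixed grid.
grid = np.linspace(-3.0, 3.0, 61)
ecdf = lambda s: (s[:, None] <= grid[None, :]).mean(axis=0)
for n in (1, 10, 100):
    gap = np.abs(ecdf(J_n(x, n)) - ecdf(J(x))).max()
    print(n, gap)                            # gap shrinks with n, per Lemma 4
```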

Proof. (Proposition 2) Given an arbitrary ordering, let $F$ be the CDFs of $Y$, defined as $F_t(y_t, x_{1:t-1}) = \Pr(Y_t \le y_t \mid x_{1:t-1})$. According to Theorem 1 of Hyvärinen & Pajunen (1999), $F(Y)$ is uniformly distributed in the cube $[0, 1]^m$. $F$ has an upper triangular Jacobian matrix, whose diagonal entries are conditional densities, which are positive by assumption. Let $G$ be a multivariate and multivariable function where $G_t$ is the inverse of the CDF of $Y_t$: $G_t(F_t(y_t, x_{1:t-1}), x_{1:t-1}) = y_t$. According to Lemma 3, there exists a sequence $(S_n)_{n \ge 1}$ of functions of the given form that converges uniformly to $\sigma \circ G$. Since uniform convergence implies pointwise convergence, $G_n = \sigma^{-1} \circ S_n$ converges pointwise to $G$, by continuity of $\sigma^{-1}$. Since $G_n$ converges pointwise to $G$ and $G(X) = Y$, by Lemma 4 we have $Y_n \xrightarrow{d} Y$.

Proof. (Proposition 3) Given an arbitrary ordering, let $H$ be the CDFs of $X$:

$$\begin{aligned}
y_1 &\doteq H_1(x_1) = F_1(x_1) = \Pr(X_1 \le x_1) \\
y_t &\doteq H_t(x_t, x_{1:t-1}) = F_t\big(x_t, \{H_{t'}(x_{t'}, x_{1:t'-1})\}_{t'=1}^{t-1}\big) = \Pr(X_t \le x_t \mid y_{1:t-1}) \quad \text{for } 2 \le t \le m
\end{aligned}$$

Due to Hyvärinen & Pajunen (1999), $y_1, \dots, y_m$ are independently and uniformly distributed in $(0, 1)^m$. According to Lemma 3, there exists a sequence $(S_n)_{n \ge 1}$ of functions of the given form that converges uniformly to $H$. Since $H_n = S_n$ converges pointwise to $H$ and $H(X) = Y$, by Lemma 4 we have $Y_n \xrightarrow{d} Y$.

Proof. (Theorem 1) Given an arbitrary ordering, let $H$ be the CDFs of $X$, defined the same way as in the proof of Proposition 3, and let $G$ be the inverse of the CDFs of $Y$, defined the same way as in the proof of Proposition 2. Due to Hyvärinen & Pajunen (1999), $H(X)$ is uniformly distributed in $(0, 1)^m$, so $G(H(X)) = Y$. Since $H_t(x_t, x_{1:t-1})$ is monotonic with respect to $x_t$ given $x_{1:t-1}$, and $G_t(H_t, H_{1:t-1})$ is monotonic with respect to $H_t$ given $H_{1:t-1}$, the composition $G_t \circ H$ is also monotonic with respect to $x_t$ given $x_{1:t-1}$, as

$$\frac{\partial\, G_t(H_t, H_{1:t-1})}{\partial x_t} = \frac{\partial\, G_t(H_t, H_{1:t-1})}{\partial H_t} \cdot \frac{\partial\, H_t(x_t, x_{1:t-1})}{\partial x_t}$$

is always positive. According to Lemma 3, there exists a sequence $(S_n)_{n \ge 1}$ of functions of the given form that converges uniformly to $\sigma \circ G \circ H$. Since uniform convergence implies pointwise convergence, $K_n = \sigma^{-1} \circ S_n$ converges pointwise to $G \circ H$, by continuity of $\sigma^{-1}$. Since $K_n$ converges pointwise to $G \circ H$ and $G(H(X)) = Y$, by Lemma 4 we have $Y_n \xrightarrow{d} Y$.

H. Experimental Details

For the experiment of amortized variational inference, we implement the Variational Autoencoder (Kingma & Welling, 2014). Specifically, we follow the architecture used in Kingma et al. (2016): the encoder has three layers with [16, 32, 32] feature maps. We use resnet blocks (He et al., 2016) with 3×3 convolution filters and a stride of 2 to downsize the feature maps. The convolution layers are followed by a fully connected layer of size 450 that serves as a context for the flow layers, which transform the noise sample drawn from a standard normal distribution of dimension 32. The decoder is symmetrical with the encoder, with the strided convolution replaced by a combination of bilinear upsampling and a regular resnet convolution to double the feature map size. We use the ELU activation function (Clevert et al., 2016) and weight normalization (Salimans & Kingma, 2016) in the encoder and decoder. In terms of optimization, Adam (Kingma & Ba, 2014) is used with the learning rate fine-tuned for each inference setting, and Polyak averaging (Polyak & Juditsky, 1992) is used for evaluation, with the decay coefficient $\alpha$ standing for the proportion of the past kept at each time step. We also consider a variant of Adam known as Amsgrad (Reddi et al., 2018) as a hyperparameter. For the vanilla VAE, we simply apply a dot product with the context vector to output the mean and the pre-softplus standard deviation, and transform each dimension of the noise vector independently. We call this a linear flow.
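As a concrete (and hypothetical) reading of the linear flow just described, the sketch below produces a mean and a pre-softplus standard deviation from the context via dot products and transforms each noise dimension independently; the weight matrices, scaling constants, and function names are all our illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_flow(eps, context, W_mu, W_s):
    """Sketch of the 'linear flow': dot products with the context vector give a
    mean and a pre-softplus standard deviation; each noise dimension is then
    transformed independently by an element-wise affine map."""
    mu = W_mu @ context                        # mean, one entry per latent dimension
    sigma = np.logaddexp(0.0, W_s @ context)   # softplus keeps sigma positive
    return mu + sigma * eps

dim, ctx = 32, 450                             # latent and context sizes from the text
eps = rng.normal(size=dim)                     # noise from a standard normal
context = 0.01 * rng.normal(size=ctx)
z = linear_flow(eps, context,
                0.01 * rng.normal(size=(dim, ctx)),
                0.01 * rng.normal(size=(dim, ctx)))
assert z.shape == (dim,)
```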
For IAF-affine and IAF-DSF, we employ MADE (Germain et al., 2015) as the conditioner $c(x_{1:t-1})$, and we apply a dot product on the context vector to output a scale vector and a bias vector that conditionally rescale and shift the preactivation of each layer of the MADE (see the sketch below). Each MADE has one hidden layer with 1920 hidden units. The IAF experiments all start with a linear flow layer, followed by IAF-affine or IAF-DSF transformations. For DSF, we choose $d = 16$.

For the experiment of density estimation with MAF, we followed the implementation of Papamakarios et al. (2017). Specifically, for each dataset we experimented with both 5 and 10 flow layers, followed by one linear flow layer. Table 3 specifies the number of hidden layers and the number of hidden units per hidden layer for MADE.

Table 3. Architecture specification of MADE in the MAF experiment: number of hidden layers and number of hidden units per hidden layer, for each of the POWER, GAS, HEPMASS, MINIBOONE, and BSDS300 datasets.
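The conditioning mechanism for the MADE layers can be sketched as follows. This is a hypothetical illustration under our own assumptions: a single unmasked layer stands in for a real MADE layer (autoregressive masks are omitted), and all names and sizes other than the 1920 hidden units and the context size quoted above are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioned_preactivation(h, context, W_h, W_scale, W_bias):
    """Condition a (stand-in) MADE layer on the context vector: dot products
    with the context give a scale and a bias that rescale and shift the
    layer's preactivation before the nonlinearity."""
    pre = W_h @ h                  # layer preactivation (masked in a real MADE)
    scale = W_scale @ context      # conditional scale vector
    bias = W_bias @ context        # conditional bias vector
    return scale * pre + bias

hidden, ctx, n_in = 1920, 450, 64  # hidden and context sizes quoted in the text
out = conditioned_preactivation(
    rng.normal(size=n_in),
    0.01 * rng.normal(size=ctx),
    0.01 * rng.normal(size=(hidden, n_in)),
    0.01 * rng.normal(size=(hidden, ctx)),
    0.01 * rng.normal(size=(hidden, ctx)),
)
assert out.shape == (hidden,)
```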
