Penalized Exponential Series Estimation of Copula Densities


Ximing Wu

Abstract

The exponential series density estimator is well suited to copula density estimation: it is strictly positive, explicitly defined on a bounded support, and largely mitigates the boundary bias problem. However, the selection of basis functions is challenging and can cause numerical difficulties, especially for high dimensional density estimation. To avoid the issues associated with basis function selection, we adopt the strategy of regularization: we employ a relatively large basis and penalize the roughness of the resulting model, which leads to a penalized maximum likelihood estimator. To further reduce the computational cost, we propose an approximate likelihood cross validation method for the selection of the smoothing parameter. Our extensive Monte Carlo simulations demonstrate the effectiveness of the proposed estimator for copula density estimation.

Department of Agricultural Economics, Texas A&M University. xwu@tamu.edu. I gratefully acknowledge the Supercomputing Facility of Texas A&M University, where all computations in this study were performed.

1 Introduction

This paper proposes a penalized maximum likelihood estimator for copula densities via the exponential series estimator for multivariate densities introduced in Wu (2010). Consider a d-dimensional random variable x with joint distribution function F. In his seminal paper, Sklar (1959) shows that, via a change of variable argument, the joint distribution can be written as

$$F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)), \qquad (1)$$

where $F_j(x_j)$, $j = 1, \ldots, d$, is the marginal distribution of the jth element of x. The function $C(\cdot)$, the so-called copula function, completely summarizes the dependence structure among the elements of x. When the margins are continuous, the copula function is unique. Thus a multivariate distribution can be completely described by its copula and its univariate marginal distributions.

Suppose F is differentiable with density function f. Taking derivatives of both sides of (1) yields

$$f(x_1, \ldots, x_d) = f_1(x_1) \cdots f_d(x_d)\, c(F_1(x_1), \ldots, F_d(x_d)), \qquad (2)$$

where $f_j(\cdot)$ is the marginal density of $x_j$, $j = 1, \ldots, d$, and $c(\cdot)$ is the copula density function, which is itself a density function defined on the unit cube $[0,1]^d$. For a detailed treatment of the mathematical properties of copulas, see Nelsen (2010).

The copula is a useful device for two reasons. First, it provides a way of studying scale-free dependence structure. By writing a joint density function as the product of the marginal densities and the copula density, one can separate the influence of the marginal densities from that of the dependence structure. The dependence structure captured by the copula is scale free and invariant to monotone transformations. In fact, many well known measures of dependence, including Kendall's τ and Spearman's ρ, can be calculated from the copula function alone. Second, the copula is a starting point for constructing families of multivariate distributions. It allows us to divide multivariate density estimation into two parts: univariate density estimation of the margins and estimation of the copula.
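As a concrete illustration of the factorization (2), the following sketch evaluates a bivariate Gaussian density as the product of its standard normal margins and the Gaussian copula density. It assumes SciPy is available; the correlation value and the evaluation point are arbitrary choices made for the example, not quantities from the paper.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.5                                # example dependence parameter
u, v = 0.3, 0.7                          # a point in the unit square
x, y = norm.ppf(u), norm.ppf(v)          # quantile transforms x_j = F_j^{-1}(u_j)

# Gaussian copula density: c(u, v) = phi_rho(x, y) / (phi(x) * phi(y))
cov = [[1.0, rho], [rho, 1.0]]
c_uv = multivariate_normal.pdf([x, y], mean=[0.0, 0.0], cov=cov) \
       / (norm.pdf(x) * norm.pdf(y))

# Equation (2): marginal densities times the copula density recover the joint
f_joint = norm.pdf(x) * norm.pdf(y) * c_uv
assert np.isclose(f_joint,
                  multivariate_normal.pdf([x, y], mean=[0.0, 0.0], cov=cov))
```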

Like ordinary density functions, copula densities can be estimated by parametric or nonparametric methods. The commonly used parametric copulas usually contain one or two parameters and thus may not be adequate to describe complicated relations among random variables. In addition, simple copulas sometimes place restrictions on the dependence structure among variables. For example, the popular Gaussian copula assumes zero tail dependence among random variables and is therefore not suitable for the study of financial assets that tend to move together under extreme market conditions. A second limitation of the parametric approach is that many parametric copulas are only defined for bivariate variables, and extensions to higher dimensional cases are not available.

Alternatively, one can estimate copula densities using nonparametric methods. The kernel density estimator (KDE) is a popular smoother for density estimation. It is known that the KDE suffers from boundary bias, which is particularly severe when the derivatives of a density do not vanish at the boundaries. Unfortunately this poses a considerable difficulty for copula density estimation, because copula density functions often have nonzero derivatives at boundaries and corners. For example, the return distributions of the US and UK stock markets tend to move together, especially under extreme market conditions, resulting in spikes in their copula density function at the two ends of the diagonal of the unit square.¹

Like kernel estimation, series estimation is a commonly used nonparametric approach. For density estimation, orthogonal series estimation, or generalized Fourier estimation, is often employed. The series estimator has the advantage of automatic adaptiveness in the sense that the degree of the series, when selected in an optimal manner, can adapt to the unknown degree of smoothness of the underlying distribution to attain the optimal convergence rate. In contrast, for kernel estimation one may need a higher (than second) order kernel to attain the optimal convergence rate.² However, series density estimators share with higher order kernel estimators the problem that they may produce negative density estimates.

¹ Charpentier et al. (2007) discuss several remedies to mitigate the boundary bias of the KDE along the line of boundary kernel estimators.
² For instance, higher order kernels are required to obtain a faster-than-$n^{-2/5}$ convergence rate for univariate kernel density estimation.

Wu (2010) proposes an exponential series estimator (ESE) for multivariate density estimation. This method is particularly advantageous for copula density estimation as it is strictly positive, explicitly defined on a bounded support, and largely mitigates the boundary bias problem. Numerical evidence in Wu (2010) and Chui and Wu (2009) demonstrates the effectiveness of this method for copula densities. However, the selection of basis functions for the multivariate ESE is challenging and can cause severe numerical difficulties. In this study, we adopt a regularization approach: we employ a relatively large set of basis functions and penalize the roughness of the resulting model to balance the goodness-of-fit against the simplicity of the model. This approach leads to a penalized maximum likelihood estimator for copula densities. To further reduce the computational cost, we suggest an approximate likelihood cross validation method for smoothing parameter selection. Our Monte Carlo simulations show that the proposed estimator outperforms the conventional kernel density estimator, sometimes by substantial margins.

The rest of the paper is organized as follows. Section 2 provides a brief background on the exponential series estimator, discussing its information theoretic origin, large sample properties, extension to multivariate settings, and smoothing parameter selection. Section 3 proposes the penalized exponential series estimator and presents an approximate likelihood cross validation method for smoothing parameter selection. Section 4 reports our Monte Carlo simulations. Some concluding remarks are offered in the last section.

2 Exponential Series Estimator of Copula Density Functions

Wu (2010) proposes a multivariate exponential series estimator and shows that it is particularly useful for copula density estimation. In this section, we briefly discuss the exponential series density estimator. We first introduce the idea of the maximum entropy density, upon which the exponential series estimator is based. We then present the exponential series estimator and discuss its smoothing parameter selection and some practical difficulties in multivariate cases.

2.1 Maximum Entropy Density

One strategy to obtain strictly positive density estimates using the series method is to model the log density via a series estimator. This idea is not new; earlier studies on the approximation of log densities using polynomials include Neyman (1937) and Good (1963). Transforming the polynomial estimate of the log density back to its original scale results in a density estimator in the exponential family. Thus approximating log densities by series estimators amounts to estimating densities by sequences of canonical exponential families. Maximum likelihood estimation (MLE) provides efficient estimates of these exponential families. Crain (1974) establishes the existence and consistency of the MLE in this case.

This method of density estimation arises naturally from the principle of maximum entropy. The information entropy, the central concept of information theory, of a univariate continuous random variable with density f is defined as

$$W(f) = -\int f(x) \log f(x)\,dx.$$

Suppose that for a random variable x with an unknown density function $f_0$, one knows only some of its moments. There may exist an infinite number of distributions satisfying these moment conditions. Jaynes (1957) proposes constructing a unique density estimate based on the moment conditions as follows:

$$\max_f \; W(f)$$

subject to the integration-to-unity and side moment conditions:

$$\int f(x)\,dx = 1, \qquad \int \phi_k(x) f(x)\,dx = \mu_k, \quad k = 1, \ldots, K,$$

where the $\phi_k$'s are real-valued, linearly independent functions defined on the support of x.

The solution, obtained by an application of the calculus of variations, takes the form

$$f(x; c) = \exp\Big(\sum_{k=1}^{K} c_k \phi_k(x) - c_0\Big) = \exp(c'\phi(x) - c_0), \qquad (3)$$

where $\phi = (\phi_1, \ldots, \phi_K)'$ and $c = (c_1, \ldots, c_K)'$ are the Lagrange multipliers for the moment conditions. The normalization factor $c_0 = \log\{\int \exp(c'\phi(x))\,dx\} < \infty$ ensures the integration-to-unity condition. Among all distributions satisfying the given moment conditions, the maximum entropy density is the closest to the uniform distribution defined on the support of x. Many distributions can be characterized as maximum entropy densities. For example, the normal distribution is obtained by setting $\phi_1(x) = x$ and $\phi_2(x) = x^2$ for $x \in \mathbb{R}$, and the Beta distribution by $\phi_1(x) = \ln(x)$ and $\phi_2(x) = \ln(1-x)$ for $x \in (0, 1)$.

In practice, the population moments are often unknown and therefore replaced by their sample counterparts. Given an iid random sample $X_1, \ldots, X_n$, the maximum entropy density is estimated by the MLE based on the sample moments $\hat\phi = (\hat\phi_1, \ldots, \hat\phi_K)'$, where $\hat\phi_k = \frac{1}{n}\sum_{i=1}^{n} \phi_k(X_i)$, $k = 1, \ldots, K$. The log likelihood function is given by

$$L = \frac{1}{n}\sum_{i=1}^{n}\Big[c'\phi(X_i) - \log\Big\{\int \exp(c'\phi(x))\,dx\Big\}\Big] = c'\hat\phi - \log\Big\{\int \exp(c'\phi(x))\,dx\Big\}.$$

Denote the MLE solution by $f(\cdot\,; \hat c)$. Thanks to the canonical exponential form of the maximum entropy density, $\hat\phi$ are the sufficient statistics of $f(\cdot\,; \hat c)$. We therefore call $\hat\phi$ the characterizing moments of the maximum entropy density.

The coefficients of a maximum entropy density generally cannot be obtained analytically and thus must be solved for numerically. Zellner and Highfield (1988) and Wu (2003) discuss the numerical calculation of maximum entropy densities. Define

$$g(x) = c'\phi(x), \qquad \mu_g(h) = \frac{\int h(x)\exp(g(x))\,dx}{\int \exp(g(x))\,dx}. \qquad (4)$$

The score function and the (negative) Hessian matrix of the MLE are then given by

$$S = \hat\phi - \mu_{\hat g}(\phi), \qquad H = \mu_{\hat g}(\phi\phi') - \mu_{\hat g}(\phi)\mu_{\hat g}(\phi'),$$

where $\hat g(x) = \hat c'\phi(x)$. One can then use Newton's method, updating $\hat c \leftarrow \hat c + H^{-1}S$, to solve for $\hat c$ iteratively. The uniqueness of the solution is ensured by the positive-definiteness of H. Therefore, for a maximum entropy density there exists a unique correspondence between its characterizing moments $\hat\phi$ and its coefficients $\hat c$.
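To make the iteration concrete, here is a minimal sketch of the Newton solver implied by the score and Hessian above, assuming a univariate orthonormal cosine basis on [0, 1] and Gauss-Legendre quadrature for the integrals; the sample, the basis size, and the quadrature order are illustrative choices, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.beta(2.0, 5.0, size=500)     # hypothetical sample on [0, 1]
K = 4                                # number of characterizing moments

# Gauss-Legendre nodes mapped from [-1, 1] to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(50)
t, w = (nodes + 1) / 2, weights / 2

def basis(x):
    # sqrt(2) cos(pi k x), k = 1..K: orthonormal and mean zero on [0, 1],
    # so the identification condition "integral of g equals zero" holds
    k = np.arange(1, K + 1)
    return np.sqrt(2) * np.cos(np.pi * np.outer(x, k))   # shape (len(x), K)

phi_hat = basis(X).mean(axis=0)      # characterizing moments phi-hat
Phi_t = basis(t)                     # basis evaluated at quadrature nodes
c = np.zeros(K)

for _ in range(50):
    dens = np.exp(Phi_t @ c)                      # unnormalized exp(g)
    Z = w @ dens                                  # normalizing constant
    mu = (w * dens) @ Phi_t / Z                   # mu_g(phi)
    H = (Phi_t.T * (w * dens)) @ Phi_t / Z - np.outer(mu, mu)
    step = np.linalg.solve(H, phi_hat - mu)       # Newton step H^{-1} S
    c += step
    if np.max(np.abs(step)) < 1e-10:
        break
```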

Thus throughout the text we maintain that the usual zero order term $\phi_0(x) = 1$ is excluded from the basis functions for g.

2.2 Exponential Series Density Estimator

The maximum entropy density is a useful approach for constructing a density estimate from a set of moment conditions, and it enjoys an appealing information theoretic interpretation. On the other hand, like the usual parametric models, this density estimator generally is not consistent unless the underlying distribution happens to belong to the canonical exponential family with characterizing moments given by φ. To obtain consistent density estimates, in principle one can let the number of characterizing moments increase with the sample size at a proper rate, which effectively transforms the maximum entropy method into a nonparametric estimator. To stress the nonparametric nature of this estimator, we call a maximum entropy density whose number of characterizing moments increases with the sample size an exponential series estimator (ESE).

Moving into the realm of nonparametric estimation inevitably brings new problems. The paramount issue is the determination of the degree of smoothing, which is discussed at length below. Another issue that warrants caution is identification. To ensure a one-to-one correspondence between $f(x)$ and $\exp(g(x))/\int \exp(g(x))\,dx$, we need to impose certain restrictions. Two commonly used identification conditions are $g(x_0) = 0$ for some fixed $x_0$ and $\int g(x)\,dx = 0$. When we use orthogonal series as the basis functions, the second condition is satisfied automatically.

Let x be a random variable defined on [0, 1] with density $f_0$ and let $\hat f$ be an ESE approximation to $f_0$. Without loss of generality, let $\phi = (\phi_1, \ldots, \phi_K)'$ be a series of orthonormal basis functions with respect to the Lebesgue measure on [0, 1]. One can measure the discrepancy between $f_0$ and $\hat f$ by the Kullback-Leibler Information Criterion (KLIC, also known as the relative entropy or cross entropy), defined as $D(f_0 \| \hat f) = \int f_0(x) \ln(f_0(x)/\hat f(x))\,dx$.³ In an important development, Barron and Sheu (1991) establish that the sequence of $\hat f$'s converges to $f_0$ in terms of the KLIC. In particular, suppose $\int \{(\partial^r/\partial x^r) \log f_0(x)\}^2\,dx < \infty$; then

$$D(f_0 \| \hat f) = O_p\big(1/K^{2r} + K/n\big),$$

with $K \to \infty$ and $K^3/n \to 0$ for the power series, and $K^2/n \to 0$ for the trigonometric series and the splines.

³ The KLIC is a pseudo-metric in the sense that $D(f \| g) = 0$ if and only if $f = g$ almost everywhere; however, it is asymmetric and does not satisfy the triangle inequality.

Wu (2010) extends the ESE to multivariate densities, using the tensor product of univariate orthogonal basis functions to construct multivariate orthogonal basis functions. Let x be a d-dimensional random variable defined on $[0,1]^d$ with density $f_0$. A multivariate ESE for $f_0$ is then constructed as

$$f(x) = \frac{\exp\Big(\sum_{k_1=0}^{K_1} \cdots \sum_{k_d=0}^{K_d} c_{k_1 \cdots k_d}\, \phi_{k_1}(x_1) \cdots \phi_{k_d}(x_d)\Big)}{\int \exp\Big(\sum_{k_1=0}^{K_1} \cdots \sum_{k_d=0}^{K_d} c_{k_1 \cdots k_d}\, \phi_{k_1}(x_1) \cdots \phi_{k_d}(x_d)\Big)\,dx_1 \cdots dx_d},$$

with the convention $\phi_0 \equiv 1$ and the all-zero-index (constant) term excluded. Under the assumption that $\int \{\partial^{r}/(\partial x_1^{r_1} \cdots \partial x_d^{r_d}) \ln f_0(x)\}^2\,dx < \infty$, where $r = \sum_{j=1}^d r_j$, he shows that the ESE estimates converge to $f_0$ at rate $O_p\big(\sum_{j=1}^d K_j^{-2r_j} + \frac{1}{n}\prod_{j=1}^d K_j\big)$ in terms of the KLIC. Convergence rates in other metrics are also established, and extensive Monte Carlo simulations demonstrate the effectiveness of the ESE for multivariate density estimation.

Like the orthogonal series density estimator, the ESE enjoys automatic adaptiveness to the unknown smoothness of the underlying distribution. At the same time, it is strictly positive and therefore avoids the negative density estimates that might occur with orthogonal series estimators and higher order kernel estimators. In addition, Wu (2010) suggests that the ESE is an appealing estimator for the copula density because it is explicitly defined on a bounded support and is therefore less sensitive to the boundary bias problem. Chui and Wu (2009) provide further Monte Carlo evidence on this.
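A minimal sketch of the tensor-product construction above, reusing the cosine basis from the earlier sketch; the helper names are hypothetical, and the multi-indices are truncated by total size $|k| \le K$ to match the candidate set used in Section 2.3 below.

```python
import itertools
import numpy as np

def phi1(k, x):
    """k-th univariate orthonormal cosine basis function on [0, 1]."""
    return np.sqrt(2) * np.cos(np.pi * k * x)

def tensor_basis(x, K):
    """Evaluate all tensor products phi_{k_1}(x_1) ... phi_{k_d}(x_d) with
    1 <= |k| <= K at a point x of dimension d; zero orders are allowed
    (phi_0 = 1) so that pure marginal terms are included, while the
    constant (all indices zero) is excluded for identification."""
    d = len(x)
    cols = []
    for k in itertools.product(range(K + 1), repeat=d):
        if 1 <= sum(k) <= K:
            vals = [phi1(kj, xj) if kj > 0 else 1.0 for kj, xj in zip(k, x)]
            cols.append(np.prod(vals))
    return np.array(cols)

# For d = 2 and K = 4 this yields binom(K + d, d) - 1 = 14 basis functions,
# matching the count M_K derived in Section 2.3.
assert tensor_basis(np.array([0.3, 0.7]), 4).size == 14
```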

2.3 Selection of Basis Functions

In this subsection we discuss the selection of basis functions for the ESE, with a focus on the multivariate case. It is well known that the choice of smoothing parameter is often the most crucial ingredient of a nonparametric estimation. Kernel density estimates can vary substantially with the bandwidth. Similarly, the numerical performance of orthogonal series density estimation hinges on the degree of the basis functions; for example, a high order power series may oscillate wildly and produce negative density estimates. The ESE, which can be viewed as a series estimator raised to the exponent, is no exception. When a higher-than-desirable number of characterizing moments is used in the estimation, the density estimates may exhibit spurious bumps and spikes. In addition, a large number of characterizing moments increases not only the computational cost but also the probability that the Hessian matrix used in the Newton updating approaches (near) singularity. Therefore, a judicious choice of basis functions is called for.⁴

The natural connection between the maximum entropy density and the MLE facilitates adopting an information criterion for model specification. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are two commonly used information criteria that strive for a balance between the goodness-of-fit and the simplicity of a statistical model. In the paradigm of nonparametric estimation, where an estimator approximates an unknown underlying process, the AIC is considered optimal in the minimax sense.⁵

⁴ The selection of basis functions is particularly important for the ESE compared with generalized Fourier series density estimation. For the latter, the coefficient of each basis function $\phi_k$ is given by $\int \phi_k(x) f_0(x)\,dx$, which can be conveniently estimated by its sample counterpart. Therefore, although the selection of basis functions affects its performance, no numerical difficulties are involved in generalized Fourier series estimation. In contrast, the coefficients of the ESE are obtained through an inverse problem that involves all basis functions through Newton's updating; an overlarge basis may render the Hessian matrix near singular and consequently cause numerical difficulties.

⁵ The BIC is consistent if the set of candidates contains the true model. However, in nonparametric estimation the true model is generally assumed unknown, and the goal is to arrive at an increasingly better approximation to the underlying model rather than to identify the true model.

On the other hand, from a penalized MLE point of view, the difference between the AIC and the BIC resides in their penalties on roughness, or the number of parameters. Let L be the log likelihood and K the number of parameters in a model, which reflects the complexity of the model and is to be penalized. Both criteria can be written in the form $L - \lambda K$, where the second term is the roughness penalty and λ determines its strength. For the AIC and the BIC, λ takes the value 1 and $\frac{1}{2}\ln n$ respectively.

Cross validation (CV) provides an alternative method for selecting smoothing parameters. Let $L_{-i}$ denote the log likelihood of the ith observation evaluated at a model estimated with the entire sample except the ith observation. The cross validated log likelihood is calculated as $\tilde L = \frac{1}{n}\sum_{i=1}^{n} L_{-i}$. The likelihood cross validation method minimizes the Kullback-Leibler loss and is therefore asymptotically equivalent to the AIC approach. (See Hall (1987) for an in-depth analysis of the likelihood cross validation method.)

Using the information criteria or the CV method to select the smoothing parameter for the univariate ESE is relatively straightforward. Recall that the selection of the smoothing parameter is equivalent to the selection of basis functions, or characterizing moments, for the ESE. Given a reasonably large candidate set of basis functions, one can evaluate all subsets of the candidate set and select the optimal set of basis functions according to a given selection criterion. However, the process can be time consuming: if the candidate set contains K basis functions, then the number of subsets is $2^K$. The process can be greatly simplified when the basis functions have a natural ordering or hierarchical structure. For example, the polynomial series and the trigonometric series have an intuitive frequency interpretation, in which the low/high order basis functions capture the low/high frequency features of the underlying process. When this type of series is used, it is a common, and sometimes preferred, practice to use a hierarchical selection approach in the sense that if the kth basis function is selected, all lower order basis functions are included automatically. Clearly, the hierarchical selection method is a truncation approach. The number of models that need to be estimated is K, considerably smaller than the $2^K$ required by complete subset selection.

In principle, either the subset selection or the truncation method can be used to select the smoothing parameter for estimating multivariate densities with the ESE. The practical difficulty, however, is that the number of required evaluations increases exponentially with the dimension of x. For the density estimation of a d-dimensional x, we consider tensor products of univariate basis functions given by $\phi_{\mathbf k}(x) = \prod_{j=1}^{d} \phi_{k_j}(x_j)$, where the multi-index $\mathbf k = (k_1, \ldots, k_d)$. Denote the size of a multi-index by $|\mathbf k| = \sum_{j=1}^{d} k_j$. Suppose the candidate set $\mathcal M_K$ consists of basis functions whose sizes are no greater than K, i.e., $\mathcal M_K = \{\phi_{\mathbf k} : 1 \le |\mathbf k| \le K\}$. With a slight abuse of notation, let $M_K$ denote the number of elements in $\mathcal M_K$. One can show that $M_K = \binom{K+d}{d} - 1$. Therefore, if the subset selection method is used, it requires estimating $2^{M_K}$ ESE densities, which can be prohibitively expensive. For instance, if d = 2 and K = 4, we need to estimate $2^{14}$ ESE densities; the number explodes to $2^{34}$ if d = 3. Thus, the subset selection approach is practically infeasible except in the simplest cases.

Now consider the truncation method, which is more economical in terms of basis functions. It seems we can proceed as in the univariate case, estimating the ESE densities with basis functions $\mathcal M_k$, $k = 1, \ldots, K$, and then selecting the optimal set according to some criterion. There is, however, a key difference between the univariate and the multivariate case. For the former, as we raise k from 1 to K, each step increases the number of basis functions by one. In contrast, for the general d-dimensional case, the number of basis functions added to the candidate set grows with k. To be precise, let $m_k = \{\phi_{\mathbf k} : |\mathbf k| = k\}$, so that $\mathcal M_K = (m_1, \ldots, m_K)$. The number of additional basis functions incorporated at each stage of the stepwise process is $|m_k| = M_k - M_{k-1} = \binom{k+d-1}{d-1}$ for $d \ge 2$ and $k \ge 1$. For instance, when d = 2 we have $|m_k| = k + 1$; when d = 3, the corresponding step sizes increase to 3, 6, 10, and 15 for $k = 1, \ldots, 4$. Therefore, in the multivariate case the number of basis functions added along the truncation path rises rapidly with the dimension d, leading to increasingly abrupt expansions in model complexity. Consequently, the suitability of the simple truncation method for the high dimensional case is questionable.
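These counts are easy to check numerically; the following snippet, assuming only Python's math.comb, reproduces the figures quoted above.

```python
from math import comb

def M(K, d):
    """Size of the candidate set M_K: binom(K + d, d) - 1."""
    return comb(K + d, d) - 1

def m(k, d):
    """Basis functions added at stage k: binom(k + d - 1, d - 1)."""
    return comb(k + d - 1, d - 1)

assert M(4, 2) == 14          # subset selection: 2**14 models when d = 2
assert M(4, 3) == 34          # ... and 2**34 models when d = 3
assert [m(k, 2) for k in range(1, 5)] == [2, 3, 4, 5]     # |m_k| = k + 1
assert [m(k, 3) for k in range(1, 5)] == [3, 6, 10, 15]
```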

Lastly, in addition to the practical difficulties associated with model specification in the high dimensional case, there exists another potential problem. Recall that the maximum entropy density can produce spurious spikes when it is based on a large number of moments. Not surprisingly, this problem can be aggravated when the ESE is used to estimate multivariate densities, whose number of characterizing moments increases rapidly with the dimension and the sample size. In addition, the larger the number of basis functions, the higher the probability that the Hessian matrix used in the Newton updating becomes (near) singular, introducing further complications. To mitigate these problems, below we propose an alternative penalized MLE approach for copula densities using the ESE.

3 Penalized Exponential Series Estimation

As discussed above, ESE estimation of multivariate densities can be challenging due to the difficulties associated with the selection of basis functions and the attendant numerical issues. In this section, we propose to use the method of penalized MLE to determine the degree of smoothing: instead of painstakingly selecting a (small) set of basis functions to model a multivariate density, we use a relatively large set of basis functions and shrink the coefficients of the resulting model toward zero to penalize its complexity.

3.1 The Model

Good and Gaskins (1971) introduce the idea of roughness penalized density estimation. Their idea is to use as an estimate the density that maximizes a penalized version of the likelihood. The penalized likelihood is defined as

$$Q = \frac{1}{n}\sum_{i=1}^{n} \ln f(X_i) - \lambda J(f),$$

where J(f) is a roughness penalty on the density f and λ is the smoothing/tuning parameter. The log likelihood pushes the estimate to adapt to the data, the roughness penalty counteracts by demanding less variation, and the smoothing parameter controls the tradeoff between the two conflicting goals. Various roughness penalties have been proposed in the literature. For instance, Good and Gaskins (1971) use $J(f) = \int (f')^2(x)/f(x)\,dx$, Silverman (1982) sets $J(f) = \int \{(d/dx)^3 \ln f(x)\}^2\,dx$, and Gu and Qiu (1993) propose general quadratic roughness penalties for smoothing spline density estimation.

Without loss of generality and for simplicity, we consider a bivariate random variable (x, y) with a strictly positive and bounded density $f_0$ defined on the unit square $[0,1] \times [0,1]$. Let $\phi_k(x, y)$, $k = 1, \ldots, M$, be orthonormal basis functions with respect to the Lebesgue measure on the unit square. To ease notation, we write $M = M_K$, where M is understood to be a function of K, which in turn is a function of the sample size; we also replace the multi-index $\mathbf k$ with a single index k, $k = 1, \ldots, M$. We consider approximating $f_0$ by

$$f(x, y) = \frac{\exp\big(\sum_{k=1}^{M} c_k \phi_k(x, y)\big)}{\int \exp\big(\sum_{k=1}^{M} c_k \phi_k(x, y)\big)\,dx\,dy} \equiv \frac{\exp(g(x, y))}{\int \exp(g(x, y))\,dx\,dy}, \qquad (5)$$

where $g(x, y) = c'\phi(x, y)$ with $c = (c_1, \ldots, c_M)'$ and $\phi(x, y) = (\phi_1(x, y), \ldots, \phi_M(x, y))'$. Throughout this section, integration is taken over the unit square.

For the roughness penalty, we adopt a quadratic penalty on the log density g. The penalized MLE objective function is then given by

$$Q = \frac{1}{n}\sum_{i=1}^{n} c'\phi(X_i, Y_i) - \ln \int \exp(g(x, y))\,dx\,dy - \frac{\lambda}{2}\, c'Wc, \qquad (6)$$

where W is a positive definite weight matrix for the roughness penalty.

Given the smoothing parameter and the roughness penalty, one can use Newton's method to solve for c iteratively. The gradient and the (negative) Hessian of (6) are respectively

$$S = \hat\phi - \mu_{\hat g}(\phi) - \lambda W \hat c, \qquad H = \mu_{\hat g}(\phi\phi') - \mu_{\hat g}(\phi)\mu_{\hat g}(\phi') + \lambda W,$$

where $\mu_g$ is given by (4) and $\hat g = \hat c'\phi$, and the Newton update is $\hat c \leftarrow \hat c + H^{-1}S$. One can establish the existence and uniqueness of the penalized MLE within the general exponential family under rather mild conditions (see, e.g., Lemma 2.1 of Gu and Qiu (1993)).
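Under this convention H is positive definite, so the update applies directly; a minimal sketch of one penalized Newton step follows, with the moment vector, the covariance matrix, W, and λ assumed given (the function name and arguments are illustrative).

```python
import numpy as np

def penalized_newton_step(c, phi_hat, mu, V_phi, W, lam):
    """One Newton update for the penalized objective (6).

    phi_hat : sample characterizing moments
    mu      : mu_ghat(phi), the basis means under the current density
    V_phi   : mu_ghat(phi phi') - mu_ghat(phi) mu_ghat(phi)', the covariance
    """
    S = phi_hat - mu - lam * (W @ c)        # penalized score
    H = V_phi + lam * W                     # curvature; positive definite
    return c + np.linalg.solve(H, S)
```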

To implement the penalized MLE, we must specify several ingredients: (i) the basis functions φ; (ii) the weight matrix W; and (iii) the smoothing parameter λ. The smoothing parameter plays the most crucial role and is discussed at length in the next subsection. As for the choice of basis functions, commonly used orthogonal series include the Legendre series, the trigonometric series, and the splines. Although there are subtle differences between these series (e.g., in terms of their boundary biases), they lead to the same convergence rates under suitable regularity conditions. We also need to determine the size of the basis. We stress that for the penalized MLE, the number of basis functions is generally not treated as a smoothing parameter. For instance, in smoothing spline density estimation, the size of the basis can be as large as the sample size. In practice, a sufficiently large (but smaller than sample size) basis often suffices. The basis used in penalized likelihood estimation is usually considerably larger than one selected according to an information criterion, and thus calls for a roughness penalty.⁶

⁶ See Gu and Wang (2003) for an asymptotic analysis of the size of the basis in smoothing spline estimation.

Next we need to choose the weight matrix W. We consider penalizing the roughness of the log density $g(x, y) = c'\phi(x, y)$ via

$$J(g) = \int \{g^{(m)}(x, y)\}^2\,dx\,dy = \int \{c'\phi^{(m)}(x, y)\}^2\,dx\,dy = c'\Big\{\int \phi^{(m)} \phi^{(m)\prime}\,dx\,dy\Big\} c,$$

where $g^{(m)}(x, y) = \sum_{m_1+m_2=m} \partial^m g(x, y)/\partial x^{m_1}\partial y^{m_2}$ with $m \ge 0$, and $\phi^{(m)}$ is defined analogously. Therefore, W is given by the middle factor of the last equality. Using orthonormal series simplifies the construction of the weight matrix, since it leads to a diagonal weight matrix. When m = 0, W equals the identity matrix and the coefficients of all basis functions are penalized equally; when $m \ge 1$, the coefficients of higher order/frequency moments are increasingly penalized, with the rate of increase rising geometrically in m.⁷ Popular choices include m = 0 and m = 2, corresponding to the natural splines and the cubic splines, respectively, in smoothing spline estimation.

⁷ For instance, the penalty weight given to a univariate cosine basis function $\phi_k(x) = \sqrt{2}\cos(\pi k x)$, $x \in [0, 1]$, is $(\pi k)^{2m}$.

Since there is a one-to-one correspondence between the characterizing moments and their coefficients in the ESE density, the penalized MLE can be viewed as a shrinkage estimator that shrinks the sample characterizing moments toward zero. In addition, the roughness penalty defines a null space $\mathcal J$, on which J(g) = 0. This null space is usually finite-dimensional to avoid interpolation of the data. As the smoothing parameter $\lambda \to \infty$, the penalized MLE converges to the MLE on $\mathcal J$, which is the smoothest density induced by the given roughness penalty. For instance, when m = 2, the smoothest function for g is linear in x; when m = 3, the smoothest function is quadratic in x, leading to the normal distribution (see Silverman (1982)).
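As a concrete illustration of footnote 7, a minimal sketch of the diagonal weight matrix for the univariate cosine basis; the basis size and the derivative order are arbitrary example values.

```python
import numpy as np

def cosine_penalty_weights(K, m):
    """Diagonal W for phi_k(x) = sqrt(2) cos(pi k x), k = 1..K:
    the m-th derivative has squared L2 norm (pi k)^{2m} on [0, 1]."""
    k = np.arange(1, K + 1)
    return np.diag((np.pi * k) ** (2 * m))

W = cosine_penalty_weights(K=6, m=2)   # cubic-spline-like penalty
```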

3.2 Selection of Smoothing Parameter

One advantage of using the penalized MLE for model selection is that it avoids the difficult subset selection or truncation process for determining the optimal degree of smoothing. Instead, a relatively large number of basis functions is used, and the degree of smoothing is determined by a single continuous parameter λ. In practice, one has to choose λ. The method of cross validation is commonly used for this purpose. This is a natural choice because the penalized MLE does not involve subset selection, so AIC or BIC type criteria that penalize the number of parameters cannot be easily applied.

Leave-one-out cross validation for linear regressions is quite straightforward. In fact, for a sample with n observations, one usually need not estimate n regressions, because an analytical formula computes the least squares cross validation result from the regression on the full sample. This kind of analytical solution, however, generally does not exist for nonlinear estimations. For the ESE estimation of multivariate densities, this poses a practical difficulty due to the high computational cost, because the coefficients of the ESE are calculated iteratively through Newton's updating. For a basis of size M, the Hessian matrix has M(M+1)/2 distinct elements to evaluate, each requiring multidimensional integration by numerical methods. The computational cost increases rapidly with the dimension because (i) the number of basis functions grows with the dimension in nonparametric estimation, and (ii) multidimensional integration also becomes increasingly expensive with the dimension. Thus it is rather expensive to implement leave-one-out cross validation for multivariate ESEs, especially for penalized MLEs that use a large number of basis functions. We therefore propose a first order approximation to the cross validated log likelihood, which requires only one estimation of the ESE based on the full sample.

Recall that (5) belongs to the general exponential family and that the sample averages $\hat\phi$ are the sufficient statistics for the penalized MLE. Denote the sample averages calculated leaving out the ith observation by $\hat\phi_{-i} = \frac{1}{n-1}\sum_{j \ne i} \phi(X_j, Y_j)$. It follows that $\hat\phi_{-i}$ are the sufficient statistics for the penalized MLE calculated with the ith observation deleted. For given basis functions and smoothing parameter, denote by $\hat f$ and $\hat f_{-i}$ the penalized MLE estimates associated with $\hat\phi$ and $\hat\phi_{-i}$ respectively. Let $\hat c$ and $\hat H$ be the estimated coefficients and the H matrix of $\hat f$, and let $\hat c_{-i}$ and $\hat H_{-i}$ be similarly defined. By Taylor's theorem, we have

$$\hat c_{-i} \approx \hat c - \hat H^{-1}(\hat\phi - \hat\phi_{-i}).$$

The normalization factor can be approximated similarly. Define $c_0 = \ln \int \exp(g(x, y))\,dx\,dy$, and let $\hat c_0$ and $\hat c_{0,-i}$ be the normalization factors of $\hat f$ and $\hat f_{-i}$ respectively. It follows that

$$\hat c_{0,-i} \approx \hat c_0 - \mu_{\hat g}(\phi)' \hat H^{-1}(\hat\phi - \hat\phi_{-i}).$$

Next let $L_{-i}$ be the log likelihood of the ith observation evaluated at $\hat f_{-i}$. The cross validated log likelihood can then be approximated as follows:

$$\begin{aligned}
\tilde L &= \frac{1}{n}\sum_{i=1}^{n} L_{-i}(X_i, Y_i) = \frac{1}{n}\sum_{i=1}^{n}\big\{\hat c_{-i}'\,\phi(X_i, Y_i) - \hat c_{0,-i}\big\} \\
&\approx \frac{1}{n}\sum_{i=1}^{n}\big\{\hat c - \hat H^{-1}(\hat\phi - \hat\phi_{-i})\big\}'\phi(X_i, Y_i) - \frac{1}{n}\sum_{i=1}^{n}\big\{\hat c_0 - \mu_{\hat g}(\phi)'\hat H^{-1}(\hat\phi - \hat\phi_{-i})\big\} \\
&= \frac{1}{n}\sum_{i=1}^{n}\big\{\hat c'\phi(X_i, Y_i) - \hat c_0\big\} - \frac{1}{n}\sum_{i=1}^{n}\phi'(X_i, Y_i)\,\hat H^{-1}(\hat\phi - \hat\phi_{-i}) \\
&= \hat L - \frac{1}{n}\sum_{i=1}^{n}\phi'(X_i, Y_i)\,\hat H^{-1}(\hat\phi - \hat\phi_{-i}), \qquad (7)
\end{aligned}$$

where $\hat L$ is the log likelihood evaluated at the full-sample penalized MLE, and the penultimate equality follows because $\sum_{i=1}^{n}(\hat\phi - \hat\phi_{-i}) = 0$. Next let Φ be the $n \times M$ matrix whose ith row is $(\phi_1(X_i, Y_i), \ldots, \phi_M(X_i, Y_i))$. Noting that $\hat\phi - \hat\phi_{-i} = (\phi(X_i, Y_i) - \hat\phi)/(n-1)$, the cross validated log likelihood (7) can, after straightforward but tedious algebra, be written in the matrix form

$$\tilde L \approx \hat L - \frac{1}{n(n-1)}\,\mathrm{trace}\big[\Phi \hat H^{-1} \Phi'\big] + \frac{1}{n^2(n-1)}\,(\mathbf 1'\Phi)\,\hat H^{-1}(\Phi'\mathbf 1), \qquad (8)$$

where $\mathbf 1$ is an $n \times 1$ vector of ones. We then select the smoothing parameter λ by maximizing (8). Since λ is a scalar, we use a simple grid search to locate the solution.

As discussed above, multidimensional numerical integrations are used repeatedly in our estimations. For the calculation of the $\mu_{\hat g}$'s, we use the Smolyak algorithm for cubatures, following Gu and Wang (2003). Smolyak cubatures are highly accurate for smooth functions. We note that the placement of nodes in Smolyak cubatures is dense near the boundaries; they are therefore particularly suitable for evaluating ESEs of copula densities, which often peak near the boundaries and corners.
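A minimal sketch of (8), assuming the full-sample fit has already produced the basis matrix Φ, the matrix Ĥ, and the log likelihood L̂; in practice one would evaluate this criterion over a grid of λ values (each with its own fit) and keep the maximizer.

```python
import numpy as np

def approx_cv_loglik(L_hat, Phi, H_hat):
    """Approximate cross validated log likelihood of equation (8).

    L_hat : full-sample log likelihood at the penalized MLE
    Phi   : (n, M) matrix with i-th row phi(X_i, Y_i)'
    H_hat : (M, M) matrix H evaluated at the full-sample fit
    """
    n = Phi.shape[0]
    Hinv_PhiT = np.linalg.solve(H_hat, Phi.T)            # H^{-1} Phi'
    trace_term = np.einsum('ij,ji->', Phi, Hinv_PhiT)    # trace(Phi H^{-1} Phi')
    s = Phi.sum(axis=0)                                  # Phi' 1
    cross_term = s @ np.linalg.solve(H_hat, s)           # (1'Phi) H^{-1} (Phi'1)
    return L_hat - trace_term / (n * (n - 1)) + cross_term / (n**2 * (n - 1))
```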

4 Monte Carlo Simulations

To investigate the finite sample performance of the proposed estimators, we conduct a series of Monte Carlo simulations. For the penalized MLE, we penalize the third order derivatives of the log densities. Denote a bivariate ESE density by $f(x, y) = \exp(g(x, y) - c_0)$, where $g(x, y) = \sum_{m=1}^{M} c_m \phi_m(x, y)$. The penalty then takes the form

$$J(g) = \int \Big\{\Big(\frac{\partial^3}{\partial x^3} + \frac{\partial^3}{\partial x^2 \partial y} + \frac{\partial^3}{\partial x \partial y^2} + \frac{\partial^3}{\partial y^3}\Big) g(x, y)\Big\}^2\,dx\,dy = c'Wc,$$

with W constructed from the basis functions as in Section 3.1. When the smoothing parameter goes to infinity, the penalized MLE converges to the smoothest distribution induced by this penalty:

$$f(x, y) = \exp\big(c_1 x + c_2 x^2 + c_3 y + c_4 y^2 + c_5 xy - c_0\big), \qquad x, y \in [0, 1],$$

which is a truncated bivariate normal density defined on the unit square.⁸

Alternatively, one can penalize lower or higher order derivatives of g. We choose the third order derivatives because under this penalty the smoothest distribution is the simplest one that contains useful information on the dependence between x and y, captured by the sample moment $\frac{1}{n}\sum_{i=1}^{n} X_i Y_i$. If a lower order derivative is used, the smoothest distribution contains only moments of the margins and is thus uninformative, since all margins of a copula density are uniform. On the other hand, if higher order derivatives are used, the smoothest distribution contains higher order information on the dependence between x and y, whose coefficients are not penalized.⁹

⁸ We note that this differs from the Gaussian copula, whose distribution function is given by $C(x, y; \rho) = \Phi_\rho(\Phi^{-1}(x), \Phi^{-1}(y))$, $x, y \in [0, 1]$, where Φ is the standard normal distribution function and $\Phi_\rho$ is the standard bivariate normal distribution function with correlation coefficient ρ.

⁹ In contrast to the large literature on the selection of smoothing parameters, theoretical guidance on the specification of penalty forms is scant. On the other hand, the existing literature suggests that estimations are usually not sensitive to the form of the penalty, which is consistent with our own numerical experiments.

We consider both the Legendre series and the cosine series, orthonormalized on the unit square. The results from the two bases are rather similar; hence, to save space, below we report only the results for the Legendre series. We consider three sample sizes: 50, 100 and 200. For all three sizes, we find that Legendre basis functions of degree no larger than 4 produce satisfactory results.¹⁰ The approximate cross validation method described in the previous section is used to select the smoothing parameter. For comparison, we also estimate the copula densities using the kernel density estimator. In particular, we use the product Gaussian kernel, with the bandwidth selected by likelihood cross validation.

We consider four different copulas: the Gaussian, T, Frank, and Galambos; the first two belong to the elliptical class, and the latter two to the Archimedean and the extreme value classes respectively. For each copula, we examine three cases with low, medium and high dependence. The coefficients of the copulas are selected such that the low, medium and high dependence cases correspond to a correlation of 0.2, 0.5 and 0.8 respectively. All experiments are repeated 500 times.

For each experiment, let $(X_i, Y_i)$, $i = 1, \ldots, n$, be an iid sample generated from a given copula. Define the pseudo-observations as

$$\tilde X_i = \frac{1}{n+1}\sum_{j=1}^{n} I(X_j \le X_i), \qquad \tilde Y_i = \frac{1}{n+1}\sum_{j=1}^{n} I(Y_j \le Y_i),$$

where the denominator is set to n + 1 to avoid numerical difficulties. Jäckel (2002) and Charpentier et al. (2007) suggest that using the pseudo-observations instead of the true observations reduces the variation. The intuition is that the above transformation effectively changes both marginal series (after being sorted in ascending order) to $(\frac{1}{n+1}, \ldots, \frac{n}{n+1})$, which is consistent with the fact that copula densities have uniform margins. We use the pseudo-observations in all our estimations.

¹⁰ Let $\phi_k$ be the kth degree Legendre polynomial on [0, 1]. We include in our bivariate density estimations basis functions of the form $\phi_j(x)\phi_k(y)$, $j + k \le 4$. The size of this basis is 14.
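A minimal sketch of the pseudo-observation transform, assuming SciPy's rankdata: since $\sum_j I(X_j \le X_i)$ is simply the rank of $X_i$, dividing the ranks by n + 1 reproduces the definition above.

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(X):
    """Map each column of an (n, d) sample to ranks / (n + 1)."""
    X = np.asarray(X)
    n = X.shape[0]
    return np.column_stack([rankdata(X[:, j]) / (n + 1)
                            for j in range(X.shape[1])])
```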

To gauge the performance of our estimates, we calculate the mean squared errors (MSE) and mean absolute deviations (MAD) between the estimated densities and the true copula densities, evaluated on a 30 by 30 equally spaced grid on the unit square.

Figure 1 reports the estimation results measured by the MSE. The top, middle and bottom rows correspond to the sample sizes 200, 100 and 50, and the left, middle and right columns correspond to the low, medium and high correlation cases. In each plot, the MSEs of the ESEs are represented by circles connected by solid lines, while those of the KDEs are represented by triangles connected by dashed lines. The Gaussian, T, Frank and Galambos copulas are labeled 1, 2, 3 and 4 respectively in each plot. Note that the scales of the plots differ.

In all our experiments, the ESE outperforms the KDE, often considerably. The MSE increases with the degree of dependence and decreases with the sample size. Averaging across the four copulas, the ratios of MSEs between the ESEs and the KDEs are 0.25, 0.49 and 0.77 for the low, medium and high correlation cases respectively. The corresponding ratios for sample sizes 50, 100 and 200 are respectively 0.62, 0.67 and ...

Figure 2 reports the estimation results in MADs. The overall picture is similar to that of the MSEs, but with a larger average performance gap. Averaging across the four copulas, the ratios of MADs between the ESEs and the KDEs are 0.32, 0.42 and 0.61 for the low, medium and high correlation cases respectively. The corresponding ratios for sample sizes 50, 100 and 200 are respectively 0.45, 0.45 and ... Thus our numerical experiments support our contention in the previous sections that the ESE provides a useful nonparametric estimator for copula densities.

Figure 1: Mean squared errors of the estimated copulas. The ESE and KDE results are represented by circles and triangles respectively. Rows 1-3 correspond to n = 200, 100 and 50; columns 1-3 correspond to correlations of 0.2, 0.5 and 0.8; in each plot, copulas 1-4 are the Gaussian, T, Frank and Galambos copulas. Note that the scales of the plots differ.

Figure 2: Mean absolute deviations of the estimated copulas. The ESE and KDE results are represented by circles and triangles respectively. Rows 1-3 correspond to n = 200, 100 and 50; columns 1-3 correspond to correlations of 0.2, 0.5 and 0.8; in each plot, copulas 1-4 are the Gaussian, T, Frank and Galambos copulas. Note that the scales of the plots differ.

5 Concluding Remarks

We have proposed a penalized maximum likelihood estimator based on the exponential series method for copula density estimation. The exponential series density estimator is strictly positive and overcomes the boundary bias issue associated with kernel density estimation. However, the selection of basis functions for the ESE is challenging and can cause severe numerical difficulties, especially for multivariate densities. To avoid the issue of basis function selection, we adopt the strategy of regularization: we employ a relatively large basis and penalize the roughness of the resulting model, which leads to a penalized maximum likelihood estimator. To further reduce the computational cost, we propose an approximate likelihood cross validation method for the selection of the smoothing parameter. Our extensive Monte Carlo simulations demonstrate the usefulness of the proposed estimator for copula density estimation. Generalizations of the estimator to nonparametric multivariate regression, and applications in high dimensional analysis, especially in financial econometrics, may be of interest for future study.

References

Barron, A. and C. Sheu (1991). Approximation of density functions by sequences of exponential families. Annals of Statistics 19.

Charpentier, A., J. Fermanian, and O. Scaillet (2007). The estimation of copulas: Theory and practice. In J. Rank (Ed.), Copulas: From Theory to Application in Finance. Risk Publications.

Chui, C. and X. Wu (2009). Exponential series estimation of empirical copulas with application to financial returns. In Q. Li and J. Racine (Eds.), Advances in Econometrics, Volume 25.

Crain, B. (1974). Estimation of distributions using orthogonal expansions. Annals of Statistics 2.

Good, I. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. Annals of Mathematical Statistics 34.

Good, I. and R. Gaskins (1971). Nonparametric roughness penalties for probability densities. Biometrika 58.

Gu, C. and C. Qiu (1993). Smoothing spline density estimation: Theory. Annals of Statistics 21.

Gu, C. and J. Wang (2003). Penalized likelihood density estimation: Direct cross-validation and scalable approximation. Statistica Sinica 13.

Hall, P. (1987). On Kullback-Leibler loss and density estimation. Annals of Statistics 15.

Jäckel, P. (2002). Monte Carlo Methods in Finance. New York: John Wiley and Sons.

Jaynes, E. (1957). Information theory and statistical mechanics. Physical Review 106.

Nelsen, R. B. (2010). An Introduction to Copulas. Springer.

Neyman, J. (1937). Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift 20.

Silverman, B. W. (1982). On the estimation of a probability density function by the maximum penalized likelihood method. Annals of Statistics 10.

Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8.

Wu, X. (2003). Calculation of maximum entropy densities with application to income distribution. Journal of Econometrics 115.

Wu, X. (2010). Exponential series estimator of multivariate densities. Journal of Econometrics 156.

Zellner, A. and R. Highfield (1988). Calculation of maximum entropy distribution and approximation of marginal posterior distributions. Journal of Econometrics 37.


Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

An Alternative Method for Estimating and Simulating Maximum Entropy Densities

An Alternative Method for Estimating and Simulating Maximum Entropy Densities An Alternative Method for Estimating and Simulating Maximum Entropy Densities Jae-Young Kim and Joonhwan Lee Seoul National University May, 8 Abstract This paper proposes a method of estimating and simulating

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

A Brief Introduction to Copulas

A Brief Introduction to Copulas A Brief Introduction to Copulas Speaker: Hua, Lei February 24, 2009 Department of Statistics University of British Columbia Outline Introduction Definition Properties Archimedean Copulas Constructing Copulas

More information

Kernel B Splines and Interpolation

Kernel B Splines and Interpolation Kernel B Splines and Interpolation M. Bozzini, L. Lenarduzzi and R. Schaback February 6, 5 Abstract This paper applies divided differences to conditionally positive definite kernels in order to generate

More information

Semi-parametric predictive inference for bivariate data using copulas

Semi-parametric predictive inference for bivariate data using copulas Semi-parametric predictive inference for bivariate data using copulas Tahani Coolen-Maturi a, Frank P.A. Coolen b,, Noryanti Muhammad b a Durham University Business School, Durham University, Durham, DH1

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Vectors in Function Spaces

Vectors in Function Spaces Jim Lambers MAT 66 Spring Semester 15-16 Lecture 18 Notes These notes correspond to Section 6.3 in the text. Vectors in Function Spaces We begin with some necessary terminology. A vector space V, also

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories

Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories Entropy 2012, 14, 1784-1812; doi:10.3390/e14091784 Article OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories Lan Zhang

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way. AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

A Review of Basic Monte Carlo Methods

A Review of Basic Monte Carlo Methods A Review of Basic Monte Carlo Methods Julian Haft May 9, 2014 Introduction One of the most powerful techniques in statistical analysis developed in this past century is undoubtedly that of Monte Carlo

More information

Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds.

Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds. Proceedings of the 2016 Winter Simulation Conference T. M. K. Roeder, P. I. Frazier, R. Szechtman, E. Zhou, T. Huschka, and S. E. Chick, eds. A SIMULATION-BASED COMPARISON OF MAXIMUM ENTROPY AND COPULA

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

Imputation Algorithm Using Copulas

Imputation Algorithm Using Copulas Metodološki zvezki, Vol. 3, No. 1, 2006, 109-120 Imputation Algorithm Using Copulas Ene Käärik 1 Abstract In this paper the author demonstrates how the copulas approach can be used to find algorithms for

More information

FRÉCHET HOEFFDING LOWER LIMIT COPULAS IN HIGHER DIMENSIONS

FRÉCHET HOEFFDING LOWER LIMIT COPULAS IN HIGHER DIMENSIONS DEPT. OF MATH./CMA UNIV. OF OSLO PURE MATHEMATICS NO. 16 ISSN 0806 2439 JUNE 2008 FRÉCHET HOEFFDING LOWER LIMIT COPULAS IN HIGHER DIMENSIONS PAUL C. KETTLER ABSTRACT. Investigators have incorporated copula

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Lecture 14: Variable Selection - Beyond LASSO

Lecture 14: Variable Selection - Beyond LASSO Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Max Margin-Classifier

Max Margin-Classifier Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

CV-NP BAYESIANISM BY MCMC. Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUEZ

CV-NP BAYESIANISM BY MCMC. Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUEZ CV-NP BAYESIANISM BY MCMC Cross Validated Non Parametric Bayesianism by Markov Chain Monte Carlo CARLOS C. RODRIGUE Department of Mathematics and Statistics University at Albany, SUNY Albany NY 1, USA

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

MULTIDIMENSIONAL POVERTY MEASUREMENT: DEPENDENCE BETWEEN WELL-BEING DIMENSIONS USING COPULA FUNCTION

MULTIDIMENSIONAL POVERTY MEASUREMENT: DEPENDENCE BETWEEN WELL-BEING DIMENSIONS USING COPULA FUNCTION Rivista Italiana di Economia Demografia e Statistica Volume LXXII n. 3 Luglio-Settembre 2018 MULTIDIMENSIONAL POVERTY MEASUREMENT: DEPENDENCE BETWEEN WELL-BEING DIMENSIONS USING COPULA FUNCTION Kateryna

More information

arxiv: v1 [physics.comp-ph] 22 Jul 2010

arxiv: v1 [physics.comp-ph] 22 Jul 2010 Gaussian integration with rescaling of abscissas and weights arxiv:007.38v [physics.comp-ph] 22 Jul 200 A. Odrzywolek M. Smoluchowski Institute of Physics, Jagiellonian University, Cracov, Poland Abstract

More information

Asymptotic distribution of the sample average value-at-risk

Asymptotic distribution of the sample average value-at-risk Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Penalty Methods for Bivariate Smoothing and Chicago Land Values

Penalty Methods for Bivariate Smoothing and Chicago Land Values Penalty Methods for Bivariate Smoothing and Chicago Land Values Roger Koenker University of Illinois, Urbana-Champaign Ivan Mizera University of Alberta, Edmonton Northwestern University: October 2001

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Boosting Methods: Why They Can Be Useful for High-Dimensional Data

Boosting Methods: Why They Can Be Useful for High-Dimensional Data New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,

More information

An Overly Simplified and Brief Review of Differential Equation Solution Methods. 1. Some Common Exact Solution Methods for Differential Equations

An Overly Simplified and Brief Review of Differential Equation Solution Methods. 1. Some Common Exact Solution Methods for Differential Equations An Overly Simplified and Brief Review of Differential Equation Solution Methods We will be dealing with initial or boundary value problems. A typical initial value problem has the form y y 0 y(0) 1 A typical

More information