Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification
Jialin Zou


Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification

by

Jialin Zou

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Statistical Science, University of Toronto.

© Copyright 2015 by Jialin Zou

Abstract

Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification

Jialin Zou
Doctor of Philosophy
Graduate Department of Statistical Science
University of Toronto
2015

Nonparametric estimation is a statistical methodology that, in contrast to parametric estimation, relaxes distributional assumptions about the relationship between response and covariates. It has been applied to many objects of interest, including density functions, regression models and derivative functions. One important application of nonparametric estimation is modelling dependence among random variables via copulas, which has attracted considerable research attention. With advances in data collection, the strength of dependence often varies with some covariate, which motivates dependence calibration using conditional copulas. We propose a penalized estimation framework for the copula parameter function that inherits the flexibility of a nonparametric method and, at the same time, yields a parsimonious and interpretable dependence structure. The theoretical analysis guarantees that the penalized estimators enjoy the oracle properties and behave asymptotically as well as their nonparametric counterparts, while numerical experiments demonstrate improved empirical performance. We then apply the proposed method to a twin birth weights dataset.

Another important application of nonparametric estimation is classifying functional data. We consider the classification of sparse functional data that are often encountered in longitudinal studies and other scientific experiments. To utilize the information from not only the functional trajectories but also the observed class labels, we propose a probability-enhanced method, achieved by a weighted support vector machine

based on its Fisher consistency property, to estimate the effective dimension reduction space. Since only a few measurements are available for some, or even all, individuals, a cumulative slicing approach is suggested to borrow information across individuals. We provide justification for the validity of the probability-based effective dimension reduction space, and a straightforward implementation that yields a low-dimensional projection space ready for applying standard classifiers. The empirical performance is illustrated through simulated and real examples, particularly in contrast to classification results based on the prominent functional principal component analysis.

Dedication

This thesis is dedicated to my parents.

Acknowledgements

First, I would like to express my deepest thanks to my supervisor, Professor Fang Yao, for his guidance, enthusiasm and patience during my research. Without his help, finishing this thesis would have been impossible. Secondly, I thank my thesis committee members, especially Professor Radu V. Craiu from the University of Toronto and Professor Yichao Wu from North Carolina State University, for their feedback and comments on my thesis. Thirdly, I am grateful to the faculty members, Professor Sheldon Lin, Professor Radford Neal, Professor Nancy Reid, Professor Jeffrey S. Rosenthal, Professor Mike Evans and Professor Lawrence J. Brunner, for teaching my statistics courses. I also express many thanks to all the staff at the Department of Statistical Science, especially Andrea Carter, Christine Bulguryemez and Dermot Whelan, for their support and help during my PhD program. Furthermore, I thank the graduate students at the Department of Statistical Science for their support and help. Finally, I would like to thank my father and mother for their encouragement and love.

Contents

1 Interpretable Dependence Calibration in Conditional Copulas
  1.1 Introduction
  1.2 Proposed Methodology
  1.3 Asymptotic Properties
  1.4 Simulation Study
  1.5 Application to Twin Birth Data
  1.6 Proofs of Main Theorems

2 PEFCS for Classifying Sparse Functional Data
  2.1 Introduction
  2.2 Proposed Methodology
  2.3 Simulations
  2.4 Data Examples
  2.5 Concluding Remarks

Bibliography

List of Tables

1.1 Comparisons between the proposed penalized estimation and the unpenalized local linear estimation, using the true and nonparametrically estimated conditional marginals, respectively, for the Clayton family. The underlying models correspond to $\eta_j$ defined in (1.11), where the IBIAS², IVAR and IMSE with their standard errors in parentheses are with respect to the Kendall's tau functions $\tau_j$ (multiplied by 100 for visualization), $j = 1, 2, 3$.
1.2 Proportion (%) of correctly identified copulas in each family under the calibration functions $\eta_1(x)$, $\eta_2(x)$ and $\eta_3(x)$.
2.1 The average classification error (×100%), with its standard error in parentheses, obtained from 100 Monte Carlo repetitions in Simulation I.
2.2 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the Berkeley growth data.
2.3 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the spinal bone density data.
2.4 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the primary biliary cirrhosis follow-up data.

List of Figures

1.1 The Kendall's tau of the conditional copula estimates: local linear unpenalized estimate (dashed line).
1.2 Kendall's tau functions $\tau_j(x)$ that correspond to $\eta_j(x)$ in (1.11) for $j = 1, 2, 3$ (from left to right).
1.3 Scatterplots and histograms for the original twin birth weights (left) and the transformed responses using the parametric (middle) and nonparametric (right) marginal estimates, respectively.
1.4 The Kendall's tau of the estimated copula parameter functions under three Archimedean copulas: Clayton, Frank and Gumbel. The top panels correspond to parametric marginals, and the bottom panels to nonparametric marginals. In each panel, shown are the penalized estimate (solid) with 95% bootstrap confidence bands (dotted), as well as the unpenalized estimate (dot-dashed).
2.1 Height trajectories of 39 boys (top) and 54 girls (bottom) from the Berkeley growth data.
2.2 Spinal bone density data for Hispanic females (top) and males (bottom).
2.3 Logarithm-transformed measurements of serum bilirubin for the patients that are alive (top) or dead (bottom) beyond ten years, from the primary biliary cirrhosis data.

Chapter 1

Interpretable Dependence Calibration in Conditional Copulas: A Penalized Approach

1.1 Introduction

One of the challenging problems in statistics is how to characterize the dependence structure among random variables. Although correlation is a common and easily computed measure of linear association, a full characterization of dependence among random variables is desirable but more difficult. Pioneered by Sklar's theorem (Sklar, 1959), the copula has become a powerful tool for modelling dependence structure. To be specific, denote the marginal distributions of random variables $Y_1$ and $Y_2$ by $F_1$ and $F_2$, and their joint distribution by $H$; then the existence and uniqueness of the copula function $C$ satisfying
$$H(y_1, y_2) = C\{F_1(y_1), F_2(y_2)\}$$
are guaranteed by Sklar's theorem. Along with theoretical developments, the applications of copulas have also flourished, e.g., in finance and insurance (Frees and Valdez, 1998; Embrechts and Straumann, 2002; Cherubini et al., 2004) and survival analysis (Clayton, 1978; Shih and Louis, 1995; Hougaard, 2000; Wang and Wells, 2000), among others. Although the ordinary copula has been widely studied, it cannot incorporate additional information from covariates. The conditional copula was recently proposed by introducing a covariate into the copula model. By extending

Sklar's theorem (Patton, 2006), the existence and uniqueness of a conditional copula are also guaranteed, and covariate adjustment is brought into the conditional distributions to improve estimation of the copula parameter. For instance, based on information from a covariate $X$, the dependence between $Y_1$ and $Y_2$ can be modelled by $C(U_1, U_2; X)$, where $U_1 = F_{1|X}(Y_1|X)$ and $U_2 = F_{2|X}(Y_2|X)$, with $F_{j|X}$ being the conditional marginal cumulative distribution functions (c.d.f.). Patton (2006) showed that the conditional joint distribution for each $X = x$ is uniquely defined by $H(y_1, y_2; x) = C(U_1, U_2; x)$ for $(y_1, y_2)$ in the support of $(Y_1, Y_2)$.

A copula family defined by $C$ is often indexed by a parameter $\theta$ that plays a critical role in determining the dependence structure. As a consequence, the estimation of the copula parameter is of particular interest. Common approaches for estimating a single copula parameter include the maximum likelihood method (Genest and Rivest, 1993; Joe, 1997) and the nonparametric kernel method (Fermanian and Scaillet, 2003; Chen and Huang, 2007). For parametric estimation of the conditional copula parameter, we refer readers to Bartram et al. (2007), Jondeau and Rockinger (2006) and Patton (2006). Owing to the limitations of a priori parametric assumptions, nonparametric estimation of the functional relationship between the conditional copula parameter $\theta$ and a covariate $X$ has been called for. Gijbels et al. (2011) proposed empirical estimators that are fully nonparametric, whose asymptotic properties were studied in Veraverbeke et al. (2011). Acar et al. (2011) modelled the conditional copula parameter as an unknown function of $X$ and expanded it around each point using the local polynomial technique (Fan and Gijbels, 1996), while treating the conditional marginal distributions as known. Abegaz et al. (2012) extended this framework with a nonparametric kernel estimator for the conditional marginals. Although nonparametric estimation of the conditional copula is flexible, it can

sometimes be hazardous, producing an overly fluctuating dependence over some range of the covariate. This behaviour is inherited from nonparametric regression and might not respect the underlying relationship. This consideration motivates us to inspect the Kendall's tau of the conditional copula estimates obtained by local linear modelling for the twin birth weight data from the Matched Multiple Birth Dataset of the National Centre for Health Statistics (Acar et al., 2011), shown in Figure 1.1. A careful inspection raises the question of whether the dependence strength actually changes in the middle region. Moreover, should the dependence in the middle be constant, the copula model would be more parsimonious with an enhanced interpretation.

Figure 1.1: The Kendall's tau of the conditional copula estimates: local linear unpenalized estimate (dashed line).

We tackle this problem by introducing a penalized estimation framework that detects the region over which the dependence structure is potentially constant, removing undesirable fluctuation (Yao and Zou, 2015). Extensive penalty approaches have emerged in the high-dimensional literature, such as the nonnegative garrote (Breiman,

1995), the (adaptive) LASSO (Tibshirani, 1996; Zou, 2006) and the Smoothly Clipped Absolute Deviation (SCAD) penalty (Fan and Li, 2001), among others. Some of these approaches have also been adapted to nonparametric estimation for identifying regions of particular interest; a relevant work is Kong et al. (2013), which coupled local polynomial regression with the SCAD penalty in a varying coefficient model to identify nonzero regions. In this chapter, we introduce a specially designed penalty in the context of local likelihood for conditional copulas, so that the resulting estimator respects the underlying dependence relationship. Our proposal enjoys the flexibility of nonparametric estimation without suffering from unnecessary fluctuations. A main contribution is the theoretical analysis, which guarantees that the proposed method enjoys the oracle properties and behaves asymptotically as well as its nonparametric counterparts, while the numerical study illustrates its superior finite sample performance. In the following, we briefly review the relevant topics that are involved in our proposed methodology for interpretable dependence calibration in conditional copulas.

1.1.1 Introduction to Copula Modelling

A copula describes the association of two or more random variables from any joint distribution. It can be defined as follows.

Definition 1. A bivariate copula is a joint distribution of two uniform random variables, i.e.,
$$C(u_1, v_1) = P(U_1 \le u_1, V_1 \le v_1), \qquad (1.1)$$
where $U_1 \sim \mathrm{Uniform}(0, 1)$ and $V_1 \sim \mathrm{Uniform}(0, 1)$. Equivalently, a copula has the following properties.

Proposition 1. A bivariate copula is a two-dimensional function $C$ with support $[0, 1]^2$ and range $[0, 1]$ that has the following properties:
1. $C(0, u) = C(u, 0) = 0$ and $C(1, u) = C(u, 1) = u$ for all $u \in [0, 1]$.
2. $C(u_1, v_1) + C(u_2, v_2) - C(u_1, v_2) - C(u_2, v_1) \ge 0$ for all $u_1, u_2, v_1, v_2 \in [0, 1]$ such that $u_1 \le u_2$ and $v_1 \le v_2$.

The central result about copulas is Sklar's theorem (1959), which establishes the relationship among the copula, the joint distribution and the marginal distributions.

Theorem 1 (Sklar's Theorem). Suppose that $H$ is the joint distribution of continuous random variables $Y_1$ and $Y_2$ with marginal distributions $F_1$ and $F_2$. Then there exists a unique copula $C$ such that
$$H(y_1, y_2) = C\{F_1(y_1), F_2(y_2)\}, \quad \text{for all } (y_1, y_2) \in \mathbb{R}^2. \qquad (1.2)$$
Conversely, if $F_1$ and $F_2$ are distribution functions and $C$ is a copula, then the function $H$ defined by (1.2) is a joint distribution function with marginal distributions $F_1$ and $F_2$.

The detailed proof can be found in Schweizer and Sklar (1983). Sklar's theorem guarantees the existence and uniqueness of the so-called copula function $C$. After the ordinary copula had been widely applied, the conditional copula was proposed by introducing a covariate into the copula function.

Definition 2 (Conditional copula). The conditional copula $C(\cdot, \cdot | X)$ is the joint distribution of $U_1 = F_{1|X}(Y_1|X)$ and $V_1 = F_{2|X}(Y_2|X)$ given $X$, where $Y_1 | X \sim F_{1|X}(\cdot|X)$ and $Y_2 | X \sim F_{2|X}(\cdot|X)$.

Patton (2006) extended Sklar's theorem to guarantee the existence and uniqueness of the conditional copula.
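To make the constructive direction of Sklar's theorem concrete, the following minimal sketch builds a joint c.d.f. from a copula and two marginals, and checks the identity (1.2) by simulation. The Clayton family, the standard normal marginals and all names here are illustrative assumptions, not part of the theorem.

```python
import numpy as np
from scipy.stats import norm

def clayton_cdf(u1, u2, theta):
    # Clayton copula C(u1, u2; theta) = (u1^{-theta} + u2^{-theta} - 1)^{-1/theta}, theta > 0
    return (u1**(-theta) + u2**(-theta) - 1.0)**(-1.0/theta)

def joint_cdf(y1, y2, theta):
    # Sklar's theorem: H(y1, y2) = C{F1(y1), F2(y2)}, here with F1 = F2 = Phi
    return clayton_cdf(norm.cdf(y1), norm.cdf(y2), theta)

# Monte Carlo check: sample (U1, U2) from the Clayton copula by inverting the
# conditional c.d.f. of U2 given U1, transform to the marginals, compare probabilities.
rng = np.random.default_rng(1)
theta = 2.0
u1 = rng.uniform(size=200_000)
w = rng.uniform(size=200_000)
u2 = ((w**(-theta/(1.0 + theta)) - 1.0)*u1**(-theta) + 1.0)**(-1.0/theta)
y1, y2 = norm.ppf(u1), norm.ppf(u2)
print(np.mean((y1 <= 0.5) & (y2 <= -0.2)), joint_cdf(0.5, -0.2, theta))  # approximately equal
```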

Theorem 2 (Sklar's theorem for conditional distributions). Suppose that $Y_1 | X = x$ and $Y_2 | X = x$ have conditional distributions $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$, respectively, and that $H_X(\cdot, \cdot; x)$ is the joint distribution of $Y_1$ and $Y_2$ given $X = x$, where the support of $X$ is $\chi$. Then there exists a unique copula $C(\cdot, \cdot | x)$ such that
$$H_X(y_1, y_2; x) = C(F_{1|X}(y_1|x), F_{2|X}(y_2|x); x), \qquad (1.3)$$
provided $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$ are continuous in $y_1$ and $y_2$. Conversely, if $Y_1 | X = x$ and $Y_2 | X = x$ have conditional distributions $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$, respectively, and $C(\cdot, \cdot | x)$ is a conditional copula which is measurable in $x$, then the function $H_X(\cdot, \cdot | x)$ defined by (1.3) is a conditional joint distribution function with conditional marginal distributions $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$.

Details of the proof can be found in Patton (2002). Sklar's theorem for conditional distributions guarantees the existence and uniqueness of the copula function $C$ after bringing in the covariate, which extends the flexibility of the copula. Note that, by the theorem, all of $F_{1|X}(\cdot|x)$, $F_{2|X}(\cdot|x)$ and $C(\cdot, \cdot; x)$ must be conditioned on the same covariate value $x$; otherwise the theorem fails.

1.1.2 Local Polynomial Regression

Smoothing methods are a powerful approach for describing complex data structures without stringent assumptions. Many advanced techniques have been studied extensively in the regression setting and in more complicated frameworks. In the simple nonparametric regression framework, we have the model
$$Y = \eta(X) + \varepsilon, \quad \varepsilon \sim (0, \sigma^2), \qquad (1.4)$$
where $\eta$ is the smooth function of interest and $\varepsilon$ is the noise with mean zero and variance $\sigma^2$.

Local polynomial regression (Fan and Gijbels, 1996) is a popular and simple approach to (1.4). By a Taylor expansion, we approximate $\eta$ near $x$ by its $p$th order expansion,
$$\eta(X_i) \approx \sum_{j=0}^{p} \frac{\eta^{(j)}(x)}{j!}(X_i - x)^j \equiv \sum_{j=0}^{p} \beta_{j,x}(X_i - x)^j. \qquad (1.5)$$
To account for the contribution from $X_i$ in the neighbourhood of $x$, we minimize the local mean squared error
$$\sum_{i=1}^{n}\Big\{Y_i - \sum_{j=0}^{p} \beta_{j,x}(X_i - x)^j\Big\}^2 K\Big(\frac{X_i - x}{h}\Big), \qquad (1.6)$$
where $K$ is a one-dimensional symmetric kernel function and the bandwidth $h$ determines the width of the local window. From (1.5), we obtain the estimators $\hat\eta^{(j)}(x) = j!\,\hat\beta_{j,x}$, $j = 0, \ldots, p$.

Adopting the idea of local polynomials, one can generalize this framework to settings where the least squares loss is not appropriate, which sheds light on further development of likelihood-based techniques. Suppose that the observation $(X_i, Y_i)$ has log-likelihood $l\{\eta(X_i), Y_i\}$, where $\eta(X_i)$ is to be estimated; the log-likelihood, or loss function, for the entire $n$ data points can be written as $\sum_{i=1}^{n} l\{\eta(X_i), Y_i\}$. Accounting for the local contributions via the expansion (1.5), the value of $\hat\eta(x)$ at a grid point $x$ is given by maximizing the local log-likelihood
$$\sum_{i=1}^{n} l\Big\{\sum_{j=0}^{p} \beta_{j,x}(X_i - x)^j, Y_i\Big\} K\Big(\frac{X_i - x}{h}\Big)$$
over $\beta_{0,x}, \ldots, \beta_{p,x}$. Similarly, the estimators are given by $\hat\eta^{(j)}(x) = j!\,\hat\beta_{j,x}$, $j = 0, \ldots, p$.

The local likelihood approach has been extended to a large number of problems: generalized additive models (Hastie and Tibshirani, 1990), generalized linear models (Fan et al., 1995) and varying coefficient models (Cai et al., 2000) have been developed under this framework. In multiparameter regression, Aerts and Claeskens (2000) proposed multiparameter likelihood models, among others.
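As a concrete illustration, a minimal sketch of the local linear fit ($p = 1$): it solves the weighted least squares problem (1.6) at a single point $x_0$ with the Epanechnikov kernel. The test function and all names are illustrative.

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear fit at x0: returns (beta0, beta1) = (eta_hat(x0), eta_hat'(x0)),
    minimizing the locally weighted least squares criterion (1.6) with p = 1."""
    d = (X - x0) / h
    w = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1)      # Epanechnikov kernel weights
    Z = np.column_stack([np.ones_like(X), X - x0])  # local design matrix
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)      # (Z'WZ)^{-1} Z'WY
    return beta[0], beta[1]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 400)
Y = np.sin(2*np.pi*X) + 0.3*rng.standard_normal(400)
print(local_linear(0.5, X, Y, h=0.1))  # roughly (0, -2*pi): value and slope of sin(2*pi*x) at 0.5
```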

1.1.3 Smoothly Clipped Absolute Deviation

Fan and Li (2001) proposed the Smoothly Clipped Absolute Deviation (SCAD) penalty, which yields estimators with three desired properties: sparsity, continuity and unbiasedness. To be specific, sparsity means that small estimates are set to zero to achieve variable selection; continuity means that the penalty leads to an estimator that is continuous in the data, so that model prediction is stable; and unbiasedness means that the estimator is nearly unbiased when the true parameter is large. As a consequence, with the help of SCAD, Fan and Li (2001) provided a novel approach that achieves dimension reduction and variable selection simultaneously. The derivative of the penalty is
$$P'_\lambda(t) = \lambda\Big\{I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a - 1)\lambda}I(t > \lambda)\Big\}, \quad t > 0, \text{ for some } a > 2,$$
and the penalty itself is
$$P_\lambda(t) = \begin{cases} \lambda t, & \text{if } t \le \lambda, \\ -\dfrac{t^2 - 2a\lambda t + \lambda^2}{2(a - 1)}, & \text{if } \lambda < t \le a\lambda, \\ \dfrac{(a + 1)\lambda^2}{2}, & \text{if } t > a\lambda, \end{cases}$$
where $a = 3.7$ is often used and $\lambda$ is the tuning parameter that controls the penalty strength.
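The SCAD penalty and its derivative are straightforward to code directly from the displays above; a minimal sketch, vectorized over $t \ge 0$:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty P_lambda(t) of Fan and Li (2001), evaluated for t >= 0."""
    t = np.abs(t)
    quad = -(t**2 - 2.0*a*lam*t + lam**2) / (2.0*(a - 1.0))  # middle quadratic piece
    return np.where(t <= lam, lam*t,
                    np.where(t <= a*lam, quad, (a + 1.0)*lam**2/2.0))

def scad_deriv(t, lam, a=3.7):
    """Derivative P'_lambda(t) = lam{ I(t <= lam) + (a*lam - t)_+ / ((a-1)lam) I(t > lam) }."""
    t = np.abs(t)
    return lam*((t <= lam) + np.maximum(a*lam - t, 0.0)/((a - 1.0)*lam)*(t > lam))
```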

From these formulas, one can see that the SCAD penalty is continuously differentiable on $\mathbb{R}$ except at 0, with derivative zero on $(-\infty, -a\lambda)$ and $(a\lambda, \infty)$.

In the multivariate linear regression framework, suppose $Y$ is the response and $\beta$ is a $d \times 1$ coefficient vector associated with the covariate vector $X$. We consider the model $Y = X^\top\beta + \epsilon$, where $\epsilon$ is normally distributed with mean zero and constant variance. With the SCAD penalty, the resulting estimators are obtained by solving
$$\min_{\beta_1, \ldots, \beta_d}\ \sum_{i=1}^{n}(Y_i - X_i^\top\beta)^2 + n\sum_{j=1}^{d} P_\lambda(|\beta_j|).$$
Note that when the design matrix is orthonormal, the resulting estimator has the explicit form
$$\hat\beta_{SCAD} = \begin{cases} \mathrm{sign}(\hat\beta)(|\hat\beta| - \lambda)_+, & \text{if } |\hat\beta| \le 2\lambda, \\ \dfrac{(a - 1)\hat\beta - \mathrm{sign}(\hat\beta)a\lambda}{a - 2}, & \text{if } 2\lambda < |\hat\beta| \le a\lambda, \\ \hat\beta, & \text{if } |\hat\beta| > a\lambda, \end{cases}$$
applied coordinatewise, where $\hat\beta$ is the unpenalized estimator, i.e., the least squares estimator. The SCAD penalty has been studied extensively: Fan and Peng (2004) developed the oracle properties with a diverging number of covariates, while Kim et al. (2008) studied the sparsity property in the high-dimensional case.
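The closed-form estimator above is easy to apply coordinatewise; a minimal sketch under the orthonormal-design assumption, with `beta_ols` denoting the least squares estimate:

```python
import numpy as np

def scad_threshold(beta_ols, lam, a=3.7):
    """Closed-form SCAD estimator for an orthonormal design (Fan and Li, 2001),
    applied coordinatewise to the least squares estimate."""
    b, s = np.abs(beta_ols), np.sign(beta_ols)
    soft = s * np.maximum(b - lam, 0.0)                    # soft-thresholding for |b| <= 2*lam
    firm = ((a - 1.0)*beta_ols - s*a*lam) / (a - 2.0)      # for 2*lam < |b| <= a*lam
    return np.where(b <= 2.0*lam, soft,
                    np.where(b <= a*lam, firm, beta_ols))  # unchanged for |b| > a*lam

print(scad_threshold(np.array([0.1, 0.5, 1.2, 3.0]), lam=0.3))
```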

The rest of the chapter is organized as follows. In Section 1.2, we describe the proposed methodology, along with the algorithm and the selection of tuning parameters. Asymptotic properties are presented in Section 1.3, where both parametrically and nonparametrically estimated marginals are considered. We illustrate the empirical performance through a simulation study in Section 1.4, and apply the proposed method to the twin birth weights data in Section 1.5. Technical proofs are deferred to Section 1.6.

1.2 Proposed Methodology

1.2.1 Penalized local likelihood for interpretable dependence

Let $X$ be a continuous covariate on which a pair of continuous responses $(Y_1, Y_2)$ is conditioned, and recall that the marginal c.d.f. given $X$ are $F_{1|X}$ and $F_{2|X}$. We first focus on the estimation of the copula parameter function, treating the marginals as known, and then extend to the case of estimated marginals. There exists a unique copula function $C$ such that the joint conditional distribution of $(Y_1, Y_2)$ given $X$ can be expressed as
$$H\{y_1, y_2; \theta(x)\} = C\{F_{1|X}(y_1|x), F_{2|X}(y_2|x); \theta(x)\},$$
or, in terms of densities,
$$h\{y_1, y_2; \theta(x)\} = c\{F_{1|X}(y_1|x), F_{2|X}(y_2|x); \theta(x)\}\, f_{1|X}(y_1|x)\, f_{2|X}(y_2|x),$$
where $f_{1|X}(y_1|x)$ and $f_{2|X}(y_2|x)$ are the conditional density functions. Letting $U_{1,x} = F_{1|X}(Y_1|x)$ and $U_{2,x} = F_{2|X}(Y_2|x)$, which are uniformly distributed on $[0, 1]$, we have
$$(U_{1i}, U_{2i}) \mid X_i \sim C\{u_{1i}, u_{2i}; \theta(X_i)\},$$
where $\theta(X_i) = g^{-1}\{\eta(X_i)\}$ and $g$ is a known monotone link function that keeps $\theta$ in its proper range, $i = 1, 2, \ldots, n$. We begin with the local polynomial expansion around a fixed $x$ in the support of $X$,
$$\eta(X_i) \approx \eta(x) + \eta'(x)(X_i - x) + \cdots + \eta^{(p)}(x)(X_i - x)^p/p!.$$

Denoting $\beta_{k,x} = \eta^{(k)}(x)/k!$, the copula parameter function is approximated by
$$\theta(z) \approx g^{-1}\{\beta_{0,x} + \beta_{1,x}(z - x) + \cdots + \beta_{p,x}(z - x)^p\}$$
for $z$ in some neighbourhood of $x$, with $\beta_{0,x} = g\{\theta(x)\}$. For brevity we suppress the dependence of $\beta_{k,x}$ on $x$. It is known that estimating higher degree coefficients leads to larger variability and computational complexity, and a customary choice is the local linear smoother with $p = 1$ (Fan and Gijbels, 1996). Denoting $\beta = (\beta_0, \beta_1)^\top$, the local log-likelihood at $x$ is
$$l(\beta; x) = \sum_{i=1}^{n} \log c[U_{1i}, U_{2i}; g^{-1}\{\beta_0 + \beta_1(X_i - x)\}] K_h(X_i - x), \qquad (1.7)$$
where $K_h(\cdot) = h^{-1}K(\cdot/h)$, $K$ is a compactly supported kernel density, and $h$ is the bandwidth that controls the amount of smoothing. Common choices of $K$ include the Epanechnikov kernel, $K(u) = \frac{3}{4}(1 - u^2)I(|u| \le 1)$, where $I(\cdot)$ is the indicator function, as well as the triweight or Gaussian kernels.

Our goal is to encourage the nonparametric estimate to stay constant whenever the underlying relationship is indeed so. Note that the local coefficients of higher degrees regulate how the dependence structure varies over the neighbourhood of $x$; specifically, the local slope $\beta_1$ represents the rate of smooth change of the copula parameter at $x$. To identify the constant region of the dependence, we use a sufficiently dense grid over the domain of $X$, say $\{x_1, \ldots, x_N\}$, on which the estimates will be obtained. If the local slope parameters are zero for a set of consecutive grid points, say $\{x_j, \ldots, x_{j+l}\}$, we will

regard $\theta^{(1)}(x) = 0$ for $x \in (x_j, x_{j+l})$. Given the smooth nature of the local linear fit, the resulting estimator of $\eta(\cdot)$ will appear constant over the region $(x_j, x_{j+l})$.

The above consideration suggests imposing the penalization on the local slope parameters at each grid point. To properly scale the penalty function when coupled with the local log-likelihood $l(\beta; x)$ in (1.7), we divide $l(\beta; x)$ by $K_h(0)$ so that $K_h(X_i - x)/K_h(0) = O(1)$. At any fixed $x$, the data contributing to the estimation of $\eta(x)$ include only those in its local window; thus we define the effective sample size at $x$ by $m_x = \sum_{i=1}^{n} K_h(X_i - x)/K_h(0)$. Lastly, we standardize each column of the design matrix, as in the traditional linear model, before coupling with the penalty. Denote the standard deviation of $\{K_h^{1/2}(X_i - x)\}_{i=1,\ldots,n}$ by $s_x$, and that of $\{(X_i - x)K_h^{1/2}(X_i - x)\}_{i=1,\ldots,n}$ by $r_x$. The local coefficients are then scaled as $\tilde\beta_0 = s_x\beta_0$ and $\tilde\beta_1 = r_x\beta_1$. This scaling facilitates our asymptotic analysis, thanks to the same convergence rates for the estimates of $\tilde\beta_0$ and $\tilde\beta_1$. We now aim to maximize the following penalized local log-likelihood with respect to (w.r.t.) $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$,
$$Q(\tilde\beta; x) = \sum_{i=1}^{n} \log c[U_{1i}, U_{2i}; g^{-1}\{s_x^{-1}\tilde\beta_0 + r_x^{-1}\tilde\beta_1(X_i - x)\}]\, \frac{K_h(X_i - x)}{K_h(0)} - m_x P_{\lambda_x}(|\tilde\beta_1|), \qquad (1.8)$$
where $m_x = \sum_{i=1}^{n} K_h(X_i - x)/K_h(0)$, $P_{\lambda_x}(\cdot)$ is the penalty function that tends to shrink the local slope $\tilde\beta_1$ to zero if the true value is zero, and $\lambda_x$ is the shrinkage parameter. We employ the SCAD penalty, which yields estimators with the desired consistency and sparsity,
$$P'_{\lambda_x}(t) = \lambda_x\Big\{I(t \le \lambda_x) + \frac{(a\lambda_x - t)_+}{(a - 1)\lambda_x}I(t > \lambda_x)\Big\}, \quad t > 0, \text{ for some } a > 2,$$
where $a = 3.7$ is suggested by Fan and Li (2001). Other choices of $P_{\lambda_x}(\cdot)$ are available, such as the MCP (Zhang, 2010) and the (adaptive) LASSO (Tibshirani, 1996; Zou, 2006).
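A numerical sketch of the objective (1.8) for the Clayton family, assuming for illustration the identity inverse link $\theta = \eta$ (so $\eta > 0$) and the Epanechnikov kernel; `penalty` can be the `scad_penalty` sketch from Section 1.1.3, and all names are hypothetical:

```python
import numpy as np

def clayton_logdens(u1, u2, theta):
    # log copula density of the Clayton family, theta > 0:
    # c(u1,u2;theta) = (1+theta)(u1*u2)^{-(theta+1)} (u1^{-theta}+u2^{-theta}-1)^{-(2theta+1)/theta}
    return (np.log1p(theta) - (theta + 1.0)*(np.log(u1) + np.log(u2))
            - (2.0 + 1.0/theta)*np.log(u1**(-theta) + u2**(-theta) - 1.0))

def penalized_local_loglik(beta_t, x, X, U1, U2, h, lam, s_x, r_x, penalty):
    """Q(beta~; x) in (1.8), a minimal sketch: Clayton copula, identity inverse
    link theta = eta (illustrative choices); `penalty` is, e.g., scad_penalty."""
    d = (X - x) / h
    Kh = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1) / h  # Epanechnikov K_h(X_i - x)
    w = Kh / (0.75 / h)                              # K_h(X_i - x) / K_h(0)
    m_x = w.sum()                                    # effective sample size at x
    eta = beta_t[0]/s_x + (beta_t[1]/r_x)*(X - x)    # local linear calibration
    theta = np.maximum(eta, 1e-8)                    # keep the parameter in range
    return np.sum(clayton_logdens(U1, U2, theta) * w) - m_x * penalty(abs(beta_t[1]), lam)
```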

The adaptive LASSO is a convex penalty function, while the MCP and SCAD are non-convex; all of them produce continuous and nearly unbiased solutions.

For practical implementation, to emphasize the constant pattern and remedy the boundary effect, one can make a numerical adjustment to the final estimator of $\eta(x)$ after obtaining $\hat{\tilde\beta}$ on all grid points. For instance, we take $\hat\beta_0(x_{[N/2]})$ at the central grid point $x_{[N/2]}$, where $[N/2]$ denotes the nearest integer, and use the numerical approximation
$$\hat\eta^a(x) = \hat\beta_0^a(x) = \hat\beta_0(x_{[N/2]}) + \sum_{j=[N/2]}^{k-1} \hat\beta_1(x_j)(x_{j+\delta} - x_j) + \hat\beta_1(x_k)(x - x_k),$$
where $x_k$ is the grid point nearest to $x$, $\delta = 1$ if $x > x_{[N/2]}$, and $\delta = -1$ otherwise. It is easy to verify that this adjusted estimator is asymptotically equivalent, $\hat\eta^a(x) - \hat\eta(x) = o_p(1)$ for any $x$, given a sufficiently dense grid.

The assumption of known conditional marginals, $F_{1|X}$ and $F_{2|X}$, may be relaxed. If $F_{1|X}$ and $F_{2|X}$ can be estimated from a parametric model, the estimated marginals are root-$n$ consistent and the additional error is negligible relative to that from estimating the copula function. If no such prior knowledge of the marginals is available, one can estimate the conditional marginals using a nonparametric approach and plug the estimates into the above penalized estimation. This inflates the error of the estimated copula function, as characterized in Section 1.3. For specificity, we use the Nadaraya-Watson estimator suggested by Abegaz et al. (2012): for $j = 1, 2$,
$$\hat F_{j|X}(y|x) = \sum_{i=1}^{n} \omega_{ni}(x, h_j)\, I(Y_{ji} \le y), \qquad \omega_{ni}(x, h_j) = \frac{K_{h_j}(X_i - x)}{\sum_{k=1}^{n} K_{h_j}(X_k - x)}, \qquad (1.9)$$
where the $h_j$'s are bandwidths controlling the smoothness.
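The estimator (1.9) is a one-liner in practice; a minimal sketch, with the Epanechnikov kernel as an illustrative choice:

```python
import numpy as np

def cond_marginal_cdf(y, x, Yj, X, h):
    """Nadaraya-Watson estimate of the conditional marginal c.d.f. F_{j|X}(y|x) in (1.9)."""
    d = (X - x) / h
    K = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1)  # the h^{-1} factors cancel in the weights
    w = K / K.sum()                             # omega_{ni}(x, h_j)
    return np.sum(w * (Yj <= y))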

1.2.2 Optimization algorithm and selection of parameters

To maximize the penalized log-likelihood $Q(\tilde\beta; x)$ w.r.t. $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$, we propose an iterative procedure that modifies the local linear approximation (LLA) algorithm of Zou and Li (2008). Denote the estimates obtained in the $m$th iteration by $\tilde\beta_0^{(m)} = s_x\beta_0^{(m)}$, $\tilde\beta_1^{(m)} = r_x\beta_1^{(m)}$ and $\tilde\beta^{(m)} = (\tilde\beta_0^{(m)}, \tilde\beta_1^{(m)})^\top$, with $\eta^{(m)}(x, X_i) = s_x^{-1}\tilde\beta_0^{(m)} + r_x^{-1}\tilde\beta_1^{(m)}(X_i - x)$, and regard $l(\beta; x)$ in (1.7) as a function of $\tilde\beta$, i.e., $l(\tilde\beta) = \sum_{i=1}^{n} l_i(\tilde\beta)K_h(X_i - x)/K_h(0)$, where the dependence on $x$ is suppressed. We use the unpenalized local linear estimator (Acar et al., 2011) as the initial estimate; in our numerical experience the algorithm usually converges within a few iterations.

Step 1. Update the local intercept $\tilde\beta_0^{(m)}$ by a Newton step, using the gradient $\partial l(\tilde\beta)/\partial\tilde\beta_0$ and the Hessian $\partial^2 l(\tilde\beta)/\partial\tilde\beta_0^2$:
$$\tilde\beta_0^{(m+1)} = \tilde\beta_0^{(m)} - \Big\{\frac{\partial^2 l(\tilde\beta^{(m)})}{\partial\tilde\beta_0^2}\Big\}^{-1} \frac{\partial l(\tilde\beta^{(m)})}{\partial\tilde\beta_0}.$$

Step 2. Update the local slope $\tilde\beta_1^{(m)}$ by the modified LLA algorithm, as follows.

Denote $X_x = (X_1 - x, \ldots, X_n - x)^\top$, $\mu_i = (X_i - x)\tilde\beta_1$, and $D = \mathrm{diag}(D_{11}, \ldots, D_{nn})$, where
$$-\frac{\partial^2 l(\tilde\beta_0^{(m+1)}, \tilde\beta_1^{(m)})}{\partial\tilde\beta_1^2} = X_x^\top D X_x, \qquad D_{ii} = -\frac{\partial^2 l_i(\tilde\beta_0^{(m+1)}, \tilde\beta_1^{(m)})}{\partial\mu_i^2}\Big|_{\mu_i = \hat\mu_i^{(m)}}.$$
Define the working data $y = (D_{11}^{1/2}\hat\mu_1^{(m)}, \ldots, D_{nn}^{1/2}\hat\mu_n^{(m)})^\top$ and $X_x^* = D^{1/2}X_x$. We compute $\tilde\beta_1^{(m+1)}$ as follows.

(a) If $P'_{\lambda_x}(|\tilde\beta_1^{(m)}|) = 0$, then $\tilde\beta_1^{(m+1)} = (X_x^{*\top}X_x^*)^{-1}X_x^{*\top}y$.

(b) If $P'_{\lambda_x}(|\tilde\beta_1^{(m)}|) > 0$, take the further transform $X_x^{**} = \lambda_x X_x^*/P'_{\lambda_x}(|\tilde\beta_1^{(m)}|)$ and apply the coordinate descent algorithm (Friedman et al., 2007),
$$\tilde\beta_1^{**} = \arg\min_{\tilde\beta_1}\Big\{\frac{1}{2}\|y - X_x^{**}\tilde\beta_1\|^2 + m_x\lambda_x|\tilde\beta_1|\Big\}.$$
The local slope is then given by $\tilde\beta_1^{(m+1)} = \lambda_x\tilde\beta_1^{**}/P'_{\lambda_x}(|\tilde\beta_1^{(m)}|)$.

It is important to tune the shrinkage parameter $\lambda_x$, which controls the magnitude of $\tilde\beta_1$ and thus the dependence strength. As suggested by Wang and Leng (2009), we adopt the Bayesian information criterion (BIC),
$$\mathrm{BIC}(\lambda_x) = -2\sum_{i=1}^{n}\log c[U_{1i}, U_{2i}; g^{-1}\{\hat\eta_{\lambda_x}(X_i)\}]\,\frac{K_h(X_i - x)}{K_h(0)} + \mathrm{df}\,\log m_x,$$
where $m_x = \sum_{i=1}^{n}K_h(X_i - x)/K_h(0)$, $\hat\eta_{\lambda_x}(X_i) = \hat\beta_{\lambda_x,0} + \hat\beta_{\lambda_x,1}(X_i - x)$ with the subscript $\lambda_x$ emphasizing the dependence on $\lambda_x$, and $\mathrm{df} = 1 + I(|\hat\beta_{\lambda_x,1}| > 0)$.
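A sketch of the BIC criterion above, reusing `clayton_logdens` from the earlier sketch and again assuming the identity link; `fit(lam)`, which returns the penalized estimates $(\hat\beta_0, \hat\beta_1)$ at $x$ for a given $\lambda_x$, is a hypothetical routine:

```python
import numpy as np

def bic_lambda(lam, x, X, U1, U2, h, fit):
    """BIC(lambda_x), a sketch: -2 * local log-likelihood at the penalized fit
    plus df * log(m_x). `fit(lam) -> (beta0_hat, beta1_hat)` is hypothetical."""
    b0, b1 = fit(lam)
    d = (X - x) / h
    w = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1)  # K_h(X_i - x)/K_h(0); h cancels
    m_x = w.sum()
    eta = b0 + b1*(X - x)
    loglik = np.sum(clayton_logdens(U1, U2, np.maximum(eta, 1e-8)) * w)
    df = 1 + (abs(b1) > 1e-12)                  # intercept, plus one if the slope is nonzero
    return -2.0*loglik + df*np.log(m_x)
```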

For the bandwidth $h$ in the copula estimation (1.8), we maximize a two-fold cross-validated likelihood (CVL; Acar et al., 2011). With a slight abuse of notation, denote the estimate based on the training data by $\hat\beta_{0,h}$, indicating the dependence on $h$, and the testing set by $\{X_1^*, \ldots, X_{[n/2]}^*\}$. Maximizing the objective function w.r.t. $h$,
$$\mathrm{CVL}(h) = \sum_{i=1}^{[n/2]} \log c[U_{1i}^*, U_{2i}^*; g^{-1}\{\hat\beta_{0,h}(X_i^*)\}], \qquad (1.10)$$
yields a data-driven choice of $h$, where the $\hat\beta_{0,h}(X_i^*)$'s are assessed on the testing set. When the marginal distributions are nonparametrically estimated by the kernel method (1.9), the bandwidths $h_1$ and $h_2$ can also be included in the criterion (1.10). Lastly, for choosing an appropriate copula family, since the likelihoods are on different scales, we adopt a two-fold cross-validated prediction error (CVPE). Details can be found in Acar et al. (2011) and are omitted for conciseness.
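The two-fold CVL is likewise simple to sketch; here `eta_fit`, which computes the (penalized or unpenalized) local estimate $\hat\eta$ from a training half, is hypothetical, and `clayton_logdens` is from the earlier sketch. A data-driven $h$ maximizes this criterion over a candidate grid.

```python
import numpy as np

def cvl(h, X, U1, U2, eta_fit):
    """Two-fold cross-validated likelihood (1.10), a sketch: estimate the calibration
    function on one half, evaluate the log copula density on the other half."""
    n = len(X)
    idx = np.random.default_rng(0).permutation(n)
    tr, te = idx[: n // 2], idx[n // 2 :]
    eta = np.array([eta_fit(X[tr], U1[tr], U2[tr], h, x0) for x0 in X[te]])
    return np.sum(clayton_logdens(U1[te], U2[te], np.maximum(eta, 1e-8)))
```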

1.3 Asymptotic Properties

In this section, we show that the proposed penalized estimator enjoys the oracle properties, including estimation consistency, sparsity and asymptotic normality. Here sparsity is in the sense that, if the underlying dependence is constant at $x$, the local slope will be estimated as exactly zero. Recall that $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$ is the scaled version of $\beta = (\beta_0, \beta_1)^\top$, i.e., $\tilde\beta_0 = s_x\beta_0$ and $\tilde\beta_1 = r_x\beta_1$. For convenience, we drop the subscript $x$ in $\lambda_x$, and denote the true values of $\beta = (\beta_0, \beta_1)^\top$ and $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$ by $\beta_0 = (\beta_{00}, \beta_{01})^\top$ and $\tilde\beta_0 = (\tilde\beta_{00}, \tilde\beta_{01})^\top$, respectively. We present the asymptotic properties in terms of the scaled penalized estimators $\tilde\beta_\lambda = (\tilde\beta_{\lambda 0}, \tilde\beta_{\lambda 1})^\top$ that maximize (1.8), and $\tilde\beta_\lambda^N = (\tilde\beta_{\lambda 0}^N, \tilde\beta_{\lambda 1}^N)^\top$ when using the nonparametrically estimated marginals (1.9). Without loss of generality, we assume that the bandwidths $h_j$ for estimating $\hat F_{j|X}$ in (1.9) are of the same order as the bandwidth $h$ for estimating the copula function in (1.8), and use a common kernel density $K$ for both marginal and copula estimation.

We now present the regularity conditions (A1)-(A5) on the conditional copula density, collectively referred to as Conditions (A), which are needed for establishing the asymptotic results and are analogous to those in Abegaz et al. (2012). With a slight abuse of notation, denote $l\{g^{-1}(\eta); u_1, u_2\} = \log c\{u_1, u_2; g^{-1}(\eta)\}$ and $l'(\eta; u_1, u_2) = (\partial/\partial\eta)\,l\{g^{-1}(\eta); u_1, u_2\}$, and similarly for $l''(\eta; u_1, u_2)$ and $l'''(\eta; u_1, u_2)$. Define $l'_{1,s}(\eta; u_1, u_2) = (\partial^2/\partial\eta\,\partial u_s)\,l\{g^{-1}(\eta); u_1, u_2\}$ for $s = 1, 2$.

(A1) The conditional copula density $c\{u_1, u_2; \theta(x)\}$ has a common support in $[0, 1]^2$. There exists an open set $\Theta$ containing the true parameter $\theta(x)$ such that, for almost all $(u_1, u_2)$, $c(u_1, u_2; \theta)$ has third derivatives w.r.t. $u_1$, $u_2$ and $\theta$ for all $\theta \in \Theta$.

(A2) The functions $l$, $l'$, $l''$, $l'''$ and $l'_{1,s}$, $s = 1, 2$, are bounded and continuous. Moreover, $l'''$ is a Lipschitz continuous trivariate function.

(A3) $E_\theta[l'\{g(\theta); U_1, U_2\}] = 0$ for all $\theta \in \Theta$. Moreover, $I(\theta) = E_\theta[l'\{g(\theta); U_1, U_2\}^2] = -E_\theta[l''\{g(\theta); U_1, U_2\}]$ is positive and continuously differentiable on $\Theta$.

(A4) There exist functions $Q_1$ and $Q_2$ such that $|l''\{g(\theta); u_1, u_2\}| \le Q_1(u_1, u_2)$ and $|l'''\{g(\theta); u_1, u_2\}| \le Q_2(u_1, u_2)$ for all $\theta \in \Theta$, and $E_\theta\{Q_1^2(U_1, U_2)\}$ and $E_\theta\{Q_2^2(U_1, U_2)\}$ are uniformly bounded on $\Theta$.

(A5) For some $a_j \ge 0$ and $c_1 > 0$, $|l'\{g(\theta); u_1, u_2\}| \le c_1\prod_{j=1}^{2}\{u_j(1 - u_j)\}^{-a_j}$, such that $E[\prod_{j=1}^{2}\{U_j(1 - U_j)\}^{-a_j}] < \infty$. Moreover, for some $b_j \ge a_j$, $1 \le j \ne k \le 2$, and $c_2 > 0$, $|l'_{1,j}\{g(\theta); u_1, u_2\}| \le c_2\{u_k(1 - u_k)\}^{-a_k}\{u_j(1 - u_j)\}^{-b_j}$, such that $E[\{U_k(1 - U_k)\}^{-a_k}\{U_j(1 - U_j)\}^{\epsilon_j - b_j}] < \infty$ for some $\epsilon_j \in (0, 1/2)$.

Conditions (A1)-(A4) are standard; (A5) allows the score and its partial derivatives w.r.t. $u_1$ and $u_2$ to possibly diverge at the boundaries. This makes the results applicable to some commonly used copula models, such as the Gaussian, Student-t, Clayton or Gumbel copulas. The conditions on the bandwidth $h$, the penalty function $P_{\lambda_x}(\cdot)$ and the shrinkage parameter $\lambda$ are summarized in Conditions (B).

We suppress the dependence of $h$ and $\lambda$ on $n$, and denote $a_n = P'_\lambda(|\tilde\beta_{01}|)$ and $b_n = (nh)^{-1/2}$.

(B1) $nh^{2+\delta} \to \infty$ for some $\delta > 0$, and $\limsup_n n^{1/5}h < \infty$.

(B2) $a_n/h^2 \to 0$ and $a_n^2 nh \to 0$, i.e., $a_n = o(b_n)$.

(B3) $P''_\lambda(|\tilde\beta_{01}|) \to 0$, and $\liminf_{n\to\infty}\liminf_{\theta\to 0^+} P'_\lambda(\theta)/\lambda > 0$.

Condition (B1) ensures that the bias dominates the variance from the copula and marginal estimation; (B2) guarantees that the strength of the true signal dominates the bias and the variance, and thus the existence of a $b_n$-consistent penalized estimator. Condition (B3) makes the penalty negligible relative to the likelihood while retaining the singularity at the origin needed for achieving a sparse solution, and is fulfilled by the SCAD penalty (Fan and Li, 2001). Lastly, Conditions (C) collect the standard requirements on other relevant quantities.

(C) The parameter function $\eta(\cdot)$ has a uniformly bounded second derivative. The monotone link function $g$ is invertible with $g' \ne 0$, and $g^{-1}$ has a continuous third derivative. The density $f$ of $X$ has a continuous first derivative, and for each $x$ in the domain of $X$ there exists some neighbourhood $R_x$ such that $\inf_{x' \in R_x} f(x') > 0$. The kernel density $K$ is symmetric and bounded with compact support on $[-1, 1]$.

Define $\gamma_r = \int x^r K(x)\,dx$, and let $N_2$ and $S_2$ be the $2 \times 2$ matrices with $\gamma_{i+j-2}$ and $\gamma_{i+j-1}$ as their $(i, j)$th entries, respectively. Denote $\eta(x, X) = \eta(x) + \eta'(x)(X - x) = \beta_0 + \beta_1(X - x)$, and write $\eta_0(x, X) = \beta_{00} + \beta_{01}(X - x)$ when evaluated at the true $\beta_0$. Let
$$\Sigma_x = I\{\theta(x)\}f(x)N_2, \qquad \Lambda_x = I\{\theta(x)\}f(x)S_2,$$

$$Z_i = \Big(1, \frac{X_i - x}{h}\Big)^\top, \qquad M_0(Y_{1i}, Y_{2i}, X_i) = h\,l'\{\eta_0(x, X_i); F_{1|X}(Y_{1i}|x), F_{2|X}(Y_{2i}|x)\}\,Z_i\, K_h(X_i - x).$$

Theorem 3 concerns the asymptotic properties of the copula parameter estimation when the true marginals are used. It states that the penalized method will estimate a local slope as exactly zero if the underlying value is indeed zero, and performs asymptotically as well as the unpenalized local linear estimator considered in Acar et al. (2011).

Theorem 3. Assume that Conditions (A), (B) and (C) hold.

(Consistency) If $\lambda \to 0$ as $n \to \infty$, then $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(b_n)$, where $b_n = (nh)^{-1/2}$.

(Sparsity) If $\lambda \to 0$ and $\sqrt{nh}\,\lambda \to \infty$ as $n \to \infty$, then, for the local maximizer $\tilde\beta_\lambda$ of $Q(\tilde\beta; x)$ satisfying $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(b_n)$, $P(\tilde\beta_{\lambda 1} = 0 \mid \tilde\beta_{01} = 0) \to 1$.

(Normality) If $nh^5/\log n = O(1)$ and $nh^3/\log^2 n \to \infty$, then
$$(\Sigma_x^{-1}\Gamma_x\Sigma_x^{-1})^{-1/2}\Big[\sqrt{nh}\,(\tilde\beta_\lambda - \tilde\beta_0) - (\Sigma_x^{-1} - h\Sigma_x^{-1}\Lambda_x\Sigma_x^{-1})(nh)^{-1/2}\sum_{i=1}^{n} E\{M_0(Y_1, Y_2, X)\}\Big] \stackrel{D}{\to} N(0, I_2),$$
where $\Gamma_x$ is the $2 \times 2$ matrix with $(\Gamma_x)_{rs} = I\{\theta(x)\}f(x)\int x^{r+s-2}K^2(x)\,dx$, and $I_2$ is the $2 \times 2$ identity matrix.

The next theorem considers the case when the marginals are nonparametrically estimated by (1.9). Denote $z = (1, (w - x)/h)^\top$, and define

$$M_1(Y_{1i}, X_i) = h\int l'_{1,1}\{\eta_0(x, w); F_{1|X}(y_1|x), F_{2|X}(y_2|x)\}\,\frac{K_h(X_i - x)}{E\{K_h(X - x)\}}\,\{I(Y_{1i} \le y_1) - F_{1|X}(y_1|x)\}\,z\,K_h(w - x)\,dH_X(y_1, y_2; w),$$
$$M_2(Y_{2i}, X_i) = h\int l'_{1,2}\{\eta_0(x, w); F_{1|X}(y_1|x), F_{2|X}(y_2|x)\}\,\frac{K_h(X_i - x)}{E\{K_h(X - x)\}}\,\{I(Y_{2i} \le y_2) - F_{2|X}(y_2|x)\}\,z\,K_h(w - x)\,dH_X(y_1, y_2; w).$$

Theorem 4. Assume that Conditions (A), (B) and (C) hold and, in addition, that the conditional marginal c.d.f. $F_{j|X}$ satisfy conditions (R1) and (R3) in Veraverbeke et al. (2011).

(Consistency) If $\lambda \to 0$ as $n \to \infty$, then $\|\tilde\beta_\lambda^N - \tilde\beta_0\| = O_p(b_n)$, where $b_n = (nh)^{-1/2}$.

(Sparsity) If $\lambda \to 0$ and $\sqrt{nh}\,\lambda \to \infty$ as $n \to \infty$, then, for the local maximizer $\tilde\beta_\lambda^N$ satisfying $\|\tilde\beta_\lambda^N - \tilde\beta_0\| = O_p(b_n)$, $P(\tilde\beta_{\lambda 1}^N = 0 \mid \tilde\beta_{01} = 0) \to 1$.

(Normality) If $nh^5/\log n = O(1)$ and $nh^3/\log^2 n \to \infty$, then
$$(\Sigma_x^{-1}\Gamma_x\Sigma_x^{-1})^{-1/2}\Big[\sqrt{nh}\,(\tilde\beta_\lambda^N - \tilde\beta_0) - (\Sigma_x^{-1} - h\Sigma_x^{-1}\Lambda_x\Sigma_x^{-1})(nh)^{-1/2}\sum_{i=1}^{n} E\Big\{M_0(Y_1, Y_2, X) + n^{-1}\sum_{i=1}^{n} M_1(Y_1, X) + n^{-1}\sum_{i=1}^{n} M_2(Y_2, X)\Big\}\Big] \stackrel{D}{\to} N(0, I_2).$$

We see from Theorem 4 that the asymptotic covariance originating from the copula parameter estimation dominates those from the nonparametric marginals, and is thus the same as in Theorem 3. The bias is inflated by $M_1(Y_1, X)$ and $M_2(Y_2, X)$ due to the nonparametric estimation of the conditional marginals. Similar to Theorem 3, it is not surprising that this penalized estimator has the same asymptotic behaviour as its unpenalized nonparametric counterpart considered in Abegaz et al. (2012).

1.4 Simulation Study

In this section, we examine the performance of the proposed penalized estimation for various types of conditional copula parameter functions. We present results using data generated from the Clayton family; the Gumbel and Frank families lead to similar conclusions and are omitted for conciseness. The inverse link $\theta = g^{-1}(\eta)$, with $\eta > 0$, is taken so that the copula parameter stays in its proper range. To assess the performance under different scenarios, we use three copula parameter functions: $\eta_1$ and $\eta_2$ are smoothly joined

piecewise linear and piecewise quadratic, respectively, while $\eta_3$ is globally quadratic:
$$\eta_1(x) = \tfrac{2}{9}\,I(x \le 3.8) + \big\{\tfrac{2}{9} + \tfrac{52}{9}(x - 3.8)^2\big\}I(3.8 < x \le 3.9) + (x - 3.62)\,I(x > 3.9),$$
$$\eta_2(x) = \tfrac{2}{9}\,I(x \le 2.9) + \big\{\tfrac{2}{9} + \tfrac{1240}{9}(x - 2.9)^2\big\}I(2.9 < x \le 3) + \{4.1 - 10(x - 3.5)^2\}I(3 < x \le 4) + \big\{\tfrac{2}{9} + \tfrac{1240}{9}(x - 4.1)^2\big\}I(4 < x \le 4.1) + \tfrac{2}{9}\,I(x > 4.1), \qquad (1.11)$$
$$\eta_3(x) = 1 + 5(x - 3.5)^2,$$
where the coefficients of the quadratic bridging pieces make $\eta_1$ and $\eta_2$ continuous. To visualize the strength of the dependence, we display these functions in Figure 1.2 on a common scale using the Kendall's tau (Trivedi and Zimmer, 2007),
$$\tau(x) = 4\int_{[0,1]^2} C(u_1, u_2; x)\,dC(u_1, u_2; x) - 1,$$
and a simple calculation yields $\tau(x) = \theta(x)/\{\theta(x) + 2\}$ for the Clayton copula. We first generate $n = 1000$ independent copies of the covariate $X_i$ from $U[2, 5]$, then generate $(U_{1i}, U_{2i})$ from the Clayton copula given $\theta(X_i) = \eta(X_i)$, and further obtain $Y_{ki} = F_{k|X_i}^{-1}(U_{ki}|X_i)$ using $F_{k|X} = \Phi$, the c.d.f. of $N(0, 1)$. When using the estimated marginals, we calculate $\hat U_{ki} = \hat F_{k|X_i}(Y_{ki}|X_i)$, with $\hat F_{k|X}$ obtained by (1.9), $k = 1, 2$, $i = 1, \ldots, n$. The Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)I(|u| \le 1)$ is used, and the bandwidths $h_k$ are selected together with the copula bandwidth by slightly modifying the cross-validated likelihood (1.10).
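The data-generating step described above can be reproduced with standard inverse-c.d.f. conditional sampling of the Clayton copula; a minimal sketch under the $\eta_3$ calibration (the seed and names are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2015)
n = 1000
X = rng.uniform(2.0, 5.0, n)
theta = 1.0 + 5.0*(X - 3.5)**2        # eta_3(x): the global quadratic calibration
# conditional (inverse c.d.f.) sampling from the Clayton copula given theta(X_i)
u1 = rng.uniform(size=n)
w = rng.uniform(size=n)
u2 = ((w**(-theta/(1.0 + theta)) - 1.0)*u1**(-theta) + 1.0)**(-1.0/theta)
Y1, Y2 = norm.ppf(u1), norm.ppf(u2)   # marginals F_{k|X} = Phi, the c.d.f. of N(0, 1)
tau = theta/(theta + 2.0)             # Kendall's tau of the Clayton family
```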

Figure 1.2: Kendall's tau functions $\tau_j(x)$ that correspond to $\eta_j(x)$ in (1.11) for $j = 1, 2, 3$ (from left to right).

For assessment, we use a dense grid of 100 equally spaced points on $[2, 5]$ and apply the local penalized estimation at each point. We examine the estimated Kendall's tau functions $\hat\tau(x)$ as the dependence measure, and define the integrated squared bias (IBIAS²), variance (IVAR) and mean squared error (IMSE),
$$\mathrm{IBIAS}^2(\hat\tau) = \int_{\chi}[E\{\hat\tau(x)\} - \tau(x)]^2\,dx,$$
$$\mathrm{IVAR}(\hat\tau) = \int_{\chi} E([\hat\tau(x) - E\{\hat\tau(x)\}]^2)\,dx,$$
$$\mathrm{IMSE}(\hat\tau) = \int_{\chi} E[\{\hat\tau(x) - \tau(x)\}^2]\,dx = \mathrm{IBIAS}^2 + \mathrm{IVAR},$$
which are approximated with 200 Monte Carlo runs. To evaluate the detection of zero slopes, we define the correct zero coverage (CZ) as the proportion of true zero slopes that are correctly identified on the grid, and the correct nonzero coverage (CNZ) as the proportion of true nonzero slopes that are correctly identified. For comparison, we also perform the unpenalized local linear estimation using the true and estimated marginals, respectively (see Acar et al., 2011, and Abegaz et al., 2012, for detailed procedures). The results are summarized in Table 1.1.
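These integrated error measures are approximated from Monte Carlo output by numerical integration over the assessment grid; a minimal sketch with hypothetical array names:

```python
import numpy as np

def integrated_errors(tau_hats, tau_true, grid):
    """Approximate IBIAS^2, IVAR and IMSE from Monte Carlo output.
    tau_hats: (n_runs, n_grid) array of estimated Kendall's tau curves on `grid`;
    tau_true: the true curve evaluated on the same grid."""
    mean_hat = tau_hats.mean(axis=0)
    ibias2 = np.trapz((mean_hat - tau_true)**2, grid)               # integrate [E tau_hat - tau]^2
    ivar = np.trapz(((tau_hats - mean_hat)**2).mean(axis=0), grid)  # integrate E[tau_hat - E tau_hat]^2
    return ibias2, ivar, ibias2 + ivar                              # IMSE = IBIAS^2 + IVAR
```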

Table 1.1: Comparisons between the proposed penalized estimation and the unpenalized local linear estimation, using the true and nonparametrically estimated conditional marginals, respectively, for the Clayton family. The underlying models correspond to $\eta_j$ defined in (1.11); the IBIAS², IVAR and IMSE with their standard errors in parentheses are with respect to the Kendall's tau functions $\tau_j$ (multiplied by 100 for visualization), $j = 1, 2, 3$.

Marginal:              True                          Nonparametric
Model                  Penalized      Unpenalized    Penalized      Unpenalized
η1     CZ        81.1%(.0036)    -              (.0032)        -
       CNZ       95.6%(.0040)    -              (.0045)        -
       IBIAS²    .2600(.0603)    .8943(.1809)   .3602(.0625)   1.028(.2520)
       IVAR      .5107(.0201)    8.347(.1485)   .5602(.0249)   8.288(.1819)
       IMSE      .7707(.0695)    9.242(.3269)   .9204(.0751)   9.317(.4293)
η2     CZ        91.0%(.0128)    -              (.0131)        -
       CNZ       85.1%(.0089)    -              (.0086)        -
       IBIAS²    1.049(.0773)    15.38(1.885)   1.211(.0995)   15.37(1.938)
       IVAR      1.120(.0534)    .2387(1.632)   1.565(.0642)   .2610(1.665)
       IMSE      2.169(.1208)    15.61(3.515)   2.777(.1517)   15.64(3.602)
η3     CZ        -               -              -              -
       CNZ       100%(.0000)     -              100%(.0000)    -
       IBIAS²    1.452(.0340)    .6527(.1006)   1.314(.0391)   .9142(.1220)
       IVAR      .4462(.0146)    2.040(.0728)   .6506(.0189)   1.937(.0903)
       IMSE      1.898(.0395)    2.693(.1719)   1.964(.0475)   2.851(.2113)

We can see that the proposed penalized estimation correctly identifies the majority of both zero and nonzero slopes in all three cases. Regarding estimation, the penalized estimators improve the integrated mean squared error in all cases, and the gains for $\eta_1$ and $\eta_2$ are more pronounced. Therefore, although the penalized estimators behave asymptotically as well as the unpenalized ones, they in fact achieve more favourable finite sample performance in our simulations. One also observes a slightly increased error from using the nonparametrically estimated marginals, for both the penalized and unpenalized methods. We conduct copula selection using a two-fold CVPE (Acar et al., 2011) among three Archimedean families, Clayton, Gumbel and Frank, and observe that the Clayton copula is correctly chosen in over 95% of all Monte Carlo runs. The results are summarized in Table 1.2.

Table 1.2: Proportion (%) of correctly identified copulas in each family under the calibration functions $\eta_1(x)$, $\eta_2(x)$ and $\eta_3(x)$.

Marginal:          True                       Nonparametric
Calibration        Clayton  Frank  Gumbel     Clayton  Frank  Gumbel
η1(x)
η2(x)
η3(x)

1.5 Application to Twin Birth Data

In this section, we consider the Matched Multiple Birth Dataset, which contains all US multiple births from 1995 to 2000. To be specific, we include live twin births with babies who survived beyond the first year and mothers of age between 18 and 40, and take a random subset of 30 pairs of births at each week of gestational age ranging from 28 to 42 weeks. Of interest is the dependence between the twin birth weights (in grams), denoted by $BW_1$ and $BW_2$, conditional on the gestational age (in weeks), denoted by $GA$. For completeness, we treat the unknown conditional marginals with both parametric and nonparametric estimation. For the former, we follow the suggestion of Acar et al. (2011), fitting a cubic polynomial model with response $BW_{ki}$ and covariate $GA_i$, $k = 1, 2$, respectively. Denoting the fitted values of $BW_{ki}$ by $\hat\mu_k(GA_i)$ and the error variance by $\hat\sigma_k^2$, we calculate $\hat U_{ki} = \Phi[\hat\sigma_k^{-1}\{BW_{ki} - \hat\mu_k(GA_i)\}]$, $k = 1, 2$, $i = 1, \ldots, n$; a sketch of this transform is given below. For the nonparametric case, we compute $\hat U_{ki} = \hat F_{k|X}(BW_{ki}|GA_i)$, with $\hat F_{k|X}$ obtained by (1.9). The cross-validated likelihood (1.10) is used to tune the bandwidths for estimating the copula as well as the marginals, and the tuning parameter $\lambda_x$ is chosen by the BIC of Section 1.2.2. To visualize the marginal transforms, scatterplots with marginal histograms of $BW_{ki}$ are shown in Figure 1.3(a); the transformed data $\hat U_{ki}$ using the parametric and nonparametric marginals are shown in Figures 1.3(b) and 1.3(c), respectively. We perform the proposed penalized estimation under three common Archimedean copula families: Clayton, Frank and Gumbel. For comparison, we also obtain unpenalized local linear estimates using the parametric and nonparametric marginals, respectively. The copula parameter function is expressed in the form of Kendall's tau to put all copula families on the same scale, shown in Figure 1.4, along with 95% bootstrap confidence bands.
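A minimal sketch of the parametric marginal transform just described (cubic polynomial mean fit plus a probability integral transform); the function name and the homoscedastic-error assumption are illustrative:

```python
import numpy as np
from scipy.stats import norm

def parametric_uniforms(BW, GA):
    """Cubic polynomial marginal fit and probability transform, a sketch of the
    parametric option: U_hat = Phi[(BW - mu_hat(GA)) / sigma_hat]."""
    Z = np.vander(GA, 4)                      # columns GA^3, GA^2, GA, 1
    coef, *_ = np.linalg.lstsq(Z, BW, rcond=None)
    resid = BW - Z @ coef                     # BW - mu_hat(GA)
    sigma = resid.std(ddof=4)                 # error s.d. after fitting 4 coefficients
    return norm.cdf(resid / sigma)
```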

Figure 1.3: Scatterplots and histograms for the original twin birth weights (left) and the transformed responses using the parametric (middle) and nonparametric (right) marginal estimates, respectively.

It is interesting to see that, in all cases, the penalized estimation features a constant dependence between the twin birth weights for gestational ages of roughly 34 to 36 weeks, a phenomenon that has not been revealed by other methods in previous studies. The copula selection is conducted using a two-fold CVPE and favours the Clayton family under both marginal settings.

Figure 1.4: The Kendall's tau of the estimated copula parameter functions under three Archimedean copulas: Clayton, Frank and Gumbel. The top panels correspond to parametric marginals, and the bottom panels to nonparametric marginals. In each panel, shown are the penalized estimate (solid) with 95% bootstrap confidence bands (dotted), as well as the unpenalized estimate (dot-dashed).

It is also noted that, based on 5-fold cross-validation with 20 random repetitions, the penalized estimates achieve higher likelihoods than the unpenalized estimates in all three families.

1.6 Proofs of Main Theorems

Proof of Theorem 3. Recall $a_n = P'_\lambda(|\tilde\beta_{01}|)$ and $b_n = (nh)^{-1/2}$, and let $\alpha_n = b_n + a_n$. With a slight abuse of notation, let $l_i(\tilde\beta) = l(g^{-1}\{\eta(x, X_i)\}; u_1, u_2)|_{u_1 = U_{1i},\, u_2 = U_{2i}}$, and similarly for $l'_i(\tilde\beta)$, $l''_i(\tilde\beta)$, $l'''_i(\tilde\beta)$ and $l'_{i1,s}(\tilde\beta)$, $s = 1, 2$, i.e., we suppress the last two arguments when no confusion arises, where $\eta(x, X_i) = s_x^{-1}\tilde\beta_0 + r_x^{-1}\tilde\beta_1(X_i - x)$. Denoting $Q(\tilde\beta) = Q(\tilde\beta; x)$, we aim to show that for any $\epsilon > 0$ there exists a sufficiently large constant $C$ such that
$$P\Big\{\sup_{\|v\| = C} Q(\tilde\beta_0 + \alpha_n v) < Q(\tilde\beta_0)\Big\} \ge 1 - \epsilon, \qquad (1.12)$$
which implies $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(\alpha_n)$. Denote $L(\tilde\beta) = \sum_{i=1}^{n} l_i(\tilde\beta)K_h(X_i - x)/K_h(0)$ and, for $r = 0, 1$, write $w_{i,r} = (X_i - x)^r/\{r_x I(r = 1) + s_x I(r = 0)\}$, and analogously for the indices $s$ and $t$. By a third-order Taylor expansion,
$$Q(\tilde\beta_0 + \alpha_n v) - Q(\tilde\beta_0) \le L(\tilde\beta_0 + \alpha_n v) - L(\tilde\beta_0) + m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\} = A_1 + A_2 + A_3 + A_4,$$
where
$$A_1 = \sum_{i=1}^{n}\sum_{r=0}^{1} l'_i(\tilde\beta_0)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,\alpha_n v_r,$$
$$A_2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{r=0}^{1}\sum_{s=0}^{1} l''_i(\tilde\beta_0)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}\,\alpha_n^2 v_r v_s,$$
$$A_3 = \frac{1}{6}\sum_{i=1}^{n}\sum_{r=0}^{1}\sum_{s=0}^{1}\sum_{t=0}^{1} l'''_i(\tilde\beta^*)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}\,w_{i,t}\,\alpha_n^3 v_r v_s v_t,$$
$$A_4 = m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\},$$
and $\tilde\beta^*$ lies between $\tilde\beta_0$ and $\tilde\beta_0 + \alpha_n v$. We now show that $A_1 = O_p(nh\alpha_n^2)\|v\|$, $A_2 \le -O_p(nh\alpha_n^2)\|v\|^2$, $A_3 \le o_p(nh\alpha_n^2)\|v\|^2$ and $A_4 \le o_p(nh\alpha_n^2)\|v\|^2$, so that $A_2$ dominates the other terms and (1.12) holds.

One has
$$A_1 = \Big[\sum_{i=1}^{n}\sum_{r=0}^{1} l'_i(\tilde\beta_u)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,\alpha_n v_r\Big]\{1 + o_p(1)\} + \frac{1}{2}\sum_{i=1}^{n}\sum_{r=0}^{1}\sum_{s=0}^{1} l''_i(\tilde\beta^{**})\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}\,(\tilde\beta_{us} - \tilde\beta_{0s})\,\alpha_n v_r = A_5 + A_6,$$
where $\tilde\beta^{**}$ lies between $\tilde\beta_0$ and $\tilde\beta_u$. It suffices to show that $A_5 = 0$ and $A_6 \le O_p(nh\alpha_n^2)\|v\|$, so that $A_1 \le O_p(nh\alpha_n^2)\|v\|$. As $\tilde\beta_u$ is the maximizer of $L(\tilde\beta)$, we have $\partial L(\tilde\beta)/\partial\tilde\beta|_{\tilde\beta = \tilde\beta_u} = 0$, i.e., $A_5 = 0$. For the term $A_6$, denote
$$M_{irs} = l''_i(\tilde\beta^{**})\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}.$$
Using a standard Taylor expansion, one can show that
$$m_x = O_p(nh), \quad s_x = O_p(1), \quad r_x = O_p(h), \qquad (1.13)$$
and, for $r = 0, 1$,
$$\frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x)\,|X_i - x|^r = O_p(h^r), \qquad \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x)\,|X_i - x|^{r+2} = O_p(h^{r+2}). \qquad (1.14)$$
Furthermore, using Theorem 1 in Acar et al. (2011), we have $\tilde\beta_{u0} - \tilde\beta_{00} = \tilde\beta_{u1} - \tilde\beta_{01} = O_p(b_n)$, where $\tilde\beta_u = (\tilde\beta_{u0}, \tilde\beta_{u1})^\top$ denotes the unpenalized local linear estimator of $\tilde\beta$. Then
$$A_6 = \sum_{r=0}^{1}\sum_{s=0}^{1}\sum_{i=1}^{n} M_{irs}\,\alpha_n(\tilde\beta_{us} - \tilde\beta_{0s})\,v_r$$


Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Theoretical results for lasso, MCP, and SCAD

Theoretical results for lasso, MCP, and SCAD Theoretical results for lasso, MCP, and SCAD Patrick Breheny March 2 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/23 Introduction There is an enormous body of literature concerning theoretical

More information

A nonparametric method of multi-step ahead forecasting in diffusion processes

A nonparametric method of multi-step ahead forecasting in diffusion processes A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL A Thesis Presented to the Faculty of San Diego State University In Partial Fulfillment of the Requirements for the Degree

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression Local regression I Patrick Breheny November 1 Patrick Breheny STA 621: Nonparametric Statistics 1/27 Simple local models Kernel weighted averages The Nadaraya-Watson estimator Expected loss and prediction

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree

More information

Copulas. MOU Lili. December, 2014

Copulas. MOU Lili. December, 2014 Copulas MOU Lili December, 2014 Outline Preliminary Introduction Formal Definition Copula Functions Estimating the Parameters Example Conclusion and Discussion Preliminary MOU Lili SEKE Team 3/30 Probability

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Statistics for high-dimensional data: Group Lasso and additive models

Statistics for high-dimensional data: Group Lasso and additive models Statistics for high-dimensional data: Group Lasso and additive models Peter Bühlmann and Sara van de Geer Seminar für Statistik, ETH Zürich May 2012 The Group Lasso (Yuan & Lin, 2006) high-dimensional

More information

Comparisons of penalized least squares. methods by simulations

Comparisons of penalized least squares. methods by simulations Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy

More information

PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION. Alireza Bayestehtashk and Izhak Shafran

PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION. Alireza Bayestehtashk and Izhak Shafran PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION Alireza Bayestehtashk and Izhak Shafran Center for Spoken Language Understanding, Oregon Health & Science University, Portland, Oregon, USA

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

arxiv: v3 [stat.me] 25 May 2017

arxiv: v3 [stat.me] 25 May 2017 Bayesian Inference for Conditional Copulas using Gaussian Process Single Index Models Evgeny Levi Radu V. Craiu arxiv:1603.0308v3 [stat.me] 5 May 017 Department of Statistical Sciences, University of Toronto

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Divide-and-combine Strategies in Statistical Modeling for Massive Data

Divide-and-combine Strategies in Statistical Modeling for Massive Data Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

Financial Econometrics and Volatility Models Copulas

Financial Econometrics and Volatility Models Copulas Financial Econometrics and Volatility Models Copulas Eric Zivot Updated: May 10, 2010 Reading MFTS, chapter 19 FMUND, chapters 6 and 7 Introduction Capturing co-movement between financial asset returns

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Estimation of multivariate critical layers: Applications to rainfall data

Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino, ICRA 6 / RISK 2015 () Estimation of Multivariate critical layers Barcelona, May 26-29, 2015 Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino,

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information

On the Choice of Parametric Families of Copulas

On the Choice of Parametric Families of Copulas On the Choice of Parametric Families of Copulas Radu Craiu Department of Statistics University of Toronto Collaborators: Mariana Craiu, University Politehnica, Bucharest Vienna, July 2008 Outline 1 Brief

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

arxiv: v2 [stat.me] 4 Jun 2016

arxiv: v2 [stat.me] 4 Jun 2016 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates 1 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates Ben Sherwood arxiv:1510.00094v2

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning Practice Page 2 of 2 10/28/13 Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify

More information