Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification

Jialin Zou
Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification

by

Jialin Zou

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Statistical Science, University of Toronto.

© Copyright 2015 by Jialin Zou
Abstract

Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification
Jialin Zou
Doctor of Philosophy
Graduate Department of Statistical Science
University of Toronto
2015

Nonparametric estimation is a flexible statistical approach that, in contrast to parametric estimation, relaxes distributional assumptions about the relationship between response and covariate. It has been applied to many problems of interest, including density estimation, regression modelling and derivative estimation. One important application of nonparametric estimation is modelling dependence among random variables via copulas, which has attracted considerable research attention. With advances in data collection, the strength of dependence often varies with some covariate, which motivates dependence calibration using conditional copulas. We propose a penalized estimation framework for the copula parameter function that inherits the flexibility of a nonparametric method and, at the same time, yields a parsimonious and interpretable dependence structure. The theoretical analysis guarantees that the penalized estimators enjoy the oracle properties and behave asymptotically as well as their nonparametric counterparts, while numerical experiments demonstrate improved empirical performance. We then apply the proposed method to twin birth weight data. Another important application of nonparametric estimation is classifying functional data. We consider the classification of sparse functional data that are often encountered in longitudinal studies and other scientific experiments. To utilize the information from not only the functional trajectories but also the observed class labels, we propose a probability-enhanced method, achieved by a weighted support vector machine
based on its Fisher consistency property, to estimate the effective dimension reduction space. Since only a few measurements are available for some, or even all, individuals, a cumulative slicing approach is suggested to borrow information across individuals. We provide justification for the validity of the probability-based effective dimension reduction space, and a straightforward implementation that yields a low-dimensional projection space ready for standard classifiers. The empirical performance is illustrated through simulated and real examples, particularly in contrast to classification results based on the prominent functional principal component analysis.
Dedication

This thesis is dedicated to my parents.
Acknowledgements

First, I would like to express my deepest thanks to my supervisor, Professor Fang Yao, for his guidance, enthusiasm and patience during my research; without his help, finishing this thesis would have been impossible. Secondly, I thank my thesis committee members, especially Professor Radu V. Craiu from the University of Toronto and Professor Yichao Wu from North Carolina State University, for their feedback and comments on my thesis. Thirdly, I am grateful to the faculty members, Professor Sheldon Lin, Professor Radford Neal, Professor Nancy Reid, Professor Jeffrey S. Rosenthal, Professor Mike Evans and Professor Lawrence J. Brunner, for teaching me statistics courses. I also express many thanks to all the staff at the Department of Statistical Science, especially Andrea Carter, Christine Bulguryemez and Dermot Whelan, for their support and help during my PhD program. Furthermore, I thank the graduate students at the Department of Statistical Science for their support and help. Finally, I would like to thank my father and mother for their encouragement and love.
Contents

1 Interpretable Dependence Calibration in Conditional Copulas
  1.1 Introduction
  1.2 Proposed Methodology
  1.3 Asymptotic Properties
  1.4 Simulation Study
  1.5 Application to Twin Birth Data
  1.6 Proofs of Main Theorems

2 PEFCS for Classifying Sparse Functional Data
  2.1 Introduction
  2.2 Proposed Methodology
  2.3 Simulations
  2.4 Data Examples
  2.5 Concluding Remarks

Bibliography
List of Tables

1.1 Comparisons between the proposed penalized estimation and the unpenalized local linear estimation, using the true and nonparametrically estimated conditional marginals, respectively, for the Clayton family. The underlying models correspond to η_j defined in (1.11), where the IBIAS², IVAR and IMSE with their standard errors in parentheses are with respect to the Kendall's tau functions τ_j (multiplied by 100 for visualization), j = 1, 2, 3.

1.2 Proportion (%) of correctly identified copulas in each family under the calibration functions η_1(x), η_2(x) and η_3(x).

2.1 The average classification error (×100%), with its standard error in parentheses, obtained from 100 Monte Carlo repetitions in Simulation I.

2.2 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the Berkeley growth data.

2.3 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the spinal bone density data.

2.4 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the primary biliary cirrhosis follow-up data.
List of Figures

1.1 The Kendall's tau of the conditional copula estimates: local linear unpenalized estimate (dashed line).

1.2 Kendall's tau functions τ_j(x) that correspond to η_j(x) in (1.11) for j = 1, 2, 3 (from left to right).

1.3 Scatterplots and histograms for the original twin birth weights (left) and the transformed responses using the parametric (middle) and nonparametric (right) marginal estimates, respectively.

1.4 The Kendall's tau of estimated copula parameter functions under three Archimedean copulas: Clayton, Frank and Gumbel. The top panels correspond to parametric marginals, and the bottom panels correspond to nonparametric marginals. In each panel, shown are the penalized estimate (solid) with 95% bootstrap confidence bands (dotted), as well as the unpenalized estimate (dot-dashed).

2.1 Height trajectories of 39 boys (top) and 54 girls (bottom) from the Berkeley growth data.

2.2 Spinal bone density data for Hispanic females (top) and males (bottom).

2.3 Logarithm-transformed measurements of serum bilirubin for the patients that are alive (top) or dead (bottom) beyond ten years, from the primary biliary cirrhosis data.
Chapter 1

Interpretable Dependence Calibration in Conditional Copulas: A Penalized Approach

1.1 Introduction

One of the challenging problems in statistics is how to characterize the dependence structure among random variables. Although correlation is easy to obtain as a common means of measuring linear association, a full characterization of dependence among random variables is desirable but more difficult. Pioneered by Sklar's theorem (Sklar, 1959), the copula has become a powerful tool for modelling dependence structure. To be specific, denote the marginal distributions of random variables Y1 and Y2 by F1 and F2, and their joint distribution by H; then the existence and uniqueness of the copula function C are guaranteed by Sklar's theorem, H(y1, y2) = C{F1(y1), F2(y2)}. Along with theoretical developments, the applications of copulas have also flourished, e.g., in finance and insurance (Frees and Valdez, 1998; Embrechts and Straumann, 2002; Cherubini et al., 2004) and survival analysis (Clayton, 1978; Shih and Louis, 1995; Hougaard, 2000; Wang and Wells, 2000), among others. Although the ordinary copula has been widely studied, it cannot exploit additional information from covariates. The conditional copula was recently proposed to introduce covariates into copula modelling. By extending the
Sklar's theorem (Patton, 2006), the existence and uniqueness of a conditional copula are also guaranteed, and covariate adjustment is brought into the conditional distributions for improving estimation of the copula parameter. For instance, based on information from a covariate X, the dependence surface between Y1 and Y2 can be modelled by C(U1, U2; X), where U1 = F1|X(Y1|X) and U2 = F2|X(Y2|X), with Fj|X being the conditional marginal cumulative distribution functions (c.d.f.). Patton (2006) showed that the conditional joint distribution for each X = x is uniquely defined by H(y1, y2; x) = C(U1, U2; x) for (y1, y2) in the support of (Y1, Y2). A copula family defined by C is often indexed by a parameter θ that plays a critical role in determining the dependence structure. As a consequence, the estimation of the copula parameter is of particular interest. Common approaches for estimating a single copula parameter include the maximum likelihood method (Genest and Rivest, 1993; Joe, 1997) and, alternatively, the nonparametric kernel method (Fermanian and Scaillet, 2003; Chen and Huang, 2007). For estimating the conditional copula parameter, we refer readers to Bartram et al. (2007), Jondeau and Rockinger (2006) and Patton (2006) for parametric estimation. Due to the limitations of a priori parametric assumptions, nonparametric estimation of the functional relationship between the conditional copula parameter θ and a covariate X has been called for. Gijbels et al. (2011) proposed empirical estimators that are fully nonparametric, whose asymptotic properties were studied in Veraverbeke et al. (2011). Acar et al. (2011) modelled the conditional copula parameter as an unknown function of X, and expanded it around each point by utilizing the local polynomial technique (Fan and Gijbels, 1996), while treating the conditional marginal distributions as known. Abegaz et al.
(2012) extended this framework with a nonparametric kernel estimator for the conditional marginals. Although nonparametric estimation of the conditional copula is flexible, it sometimes can be hazardous to allow an overly fluctuating dependence over some range of the covariate. This behaviour is inherited from nonparametric regression and might not respect the underlying relationship. This consideration motivates us to inspect the Kendall's tau of the conditional copula estimates obtained by local linear modelling for the twin birth weight data from the Matched Multiple Birth Dataset of the National Centre for Health Statistics (Acar et al., 2011), shown in Figure 1.1. A careful inspection raises the question of whether the dependence strength actually changes in the middle region. Moreover, should the dependence in the middle be constant, the copula model would be more parsimonious with an enhanced interpretation.

Figure 1.1: The Kendall's tau of the conditional copula estimates: local linear unpenalized estimate (dashed line).

We tackle this problem by introducing a penalized estimation framework that detects the region over which the dependence structure is potentially constant, so as to remove undesirable fluctuation (Yao and Zou, 2015). Extensive penalty approaches have emerged in the high-dimensional literature, such as the nonnegative garrote (Breiman,
1995), the (adaptive) LASSO (Tibshirani, 1996; Zou, 2006) and the Smoothly Clipped Absolute Deviation (SCAD) penalty (Fan and Li, 2001), among others. Some of these approaches have also been adapted to nonparametric estimation for identifying regions of particular interest, among which a relevant work is Kong et al. (2013), which coupled local polynomial regression with the SCAD penalty in a varying coefficient model to identify nonzero regions. In this chapter, we introduce a specially designed penalty in the context of local likelihood for conditional copulas, so that the resultant estimator respects the underlying dependence relationship. Our proposal enjoys the flexibility of nonparametric estimation without suffering from unnecessary fluctuations. A main contribution is the theoretical analysis, which guarantees that the proposed method enjoys the oracle properties and behaves asymptotically as well as its nonparametric counterparts, while the numerical study illustrates its superior finite sample performance. In the following, we briefly review the relevant topics that are involved in our proposed methodology for interpretable dependence calibration in conditional copulas.

1.1.1 Introduction to Copula Modeling

A copula is used to describe the association of two or more random variables from any joint distribution. It can be defined as follows.

Definition 1. A bivariate copula is the joint distribution of two uniform random variables, i.e.,

C(u1, v1) = P(U1 ≤ u1, V1 ≤ v1), (1.1)

where U1 ∼ Uniform(0, 1) and V1 ∼ Uniform(0, 1).

Equivalently, a copula has the following properties.
Proposition 1. A bivariate copula is a two-dimensional function C whose support and range are [0, 1]² and [0, 1], respectively, with the following properties:

1. C(0, u) = C(u, 0) = 0 and C(1, u) = C(u, 1) = u for all u ∈ [0, 1].

2. C(u1, v1) + C(u2, v2) − C(u1, v2) − C(u2, v1) ≥ 0 for all u1, u2, v1, v2 ∈ [0, 1] such that u1 ≤ u2 and v1 ≤ v2.

The central result in copula theory is Sklar's theorem (Sklar, 1959), which establishes the relationship among the copula, the joint distribution and the marginal distributions.

Theorem 1. (Sklar's Theorem) Suppose that H is the joint distribution of continuous random variables Y1 and Y2 with marginal distributions F1 and F2. Then there exists a unique copula C such that

H(y1, y2) = C{F1(y1), F2(y2)}, for all (y1, y2) ∈ R². (1.2)

Conversely, if F1 and F2 are distribution functions and C is a copula, then the function H defined by (1.2) is a joint distribution function with marginal distributions F1 and F2.

A detailed proof can be found in Schweizer and Sklar (1983). Sklar's theorem guarantees the existence and uniqueness of the so-called copula function C. After the ordinary copula had been widely applied, the conditional copula was proposed by introducing a covariate into the copula function.

Definition 2. (Conditional copula) The conditional copula C(Y1, Y2 | X) is the joint distribution of U1 = F1|X(Y1|X) and V1 = F2|X(Y2|X) given X, where Y1 | X ∼ F1|X(· | X) and Y2 | X ∼ F2|X(· | X).

Patton (2006) extended Sklar's theorem to guarantee the existence and uniqueness of the conditional copula.
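These defining properties can be checked numerically. The following Python sketch (an illustration, not part of the thesis) implements the Clayton copula, one of the Archimedean families used later in this chapter, verifies the boundary and 2-increasing properties of Proposition 1, and confirms by simulation the known relation τ = θ/(θ + 2) between the Clayton parameter and Kendall's tau; the sampler uses standard conditional inversion.

```python
import numpy as np

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v; theta) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return (u**(-theta) + v**(-theta) - 1.0)**(-1.0 / theta)

def clayton_sample(n, theta, rng):
    """Draw (U, V) from the Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u**(-theta) * (w**(-theta / (1.0 + theta)) - 1.0) + 1.0)**(-1.0 / theta)
    return u, v

theta = 2.0
u = np.linspace(0.05, 0.95, 10)
# Proposition 1, boundary behaviour: C(u, 1) = u, and C(u, v) -> 0 as u -> 0.
assert np.allclose(clayton_cdf(u, 1.0, theta), u)
assert clayton_cdf(1e-9, 0.5, theta) < 1e-8

# Proposition 1, 2-increasing: the C-volume of any rectangle is nonnegative.
u1, u2, v1, v2 = 0.2, 0.6, 0.3, 0.8
vol = (clayton_cdf(u2, v2, theta) - clayton_cdf(u1, v2, theta)
       - clayton_cdf(u2, v1, theta) + clayton_cdf(u1, v1, theta))
assert vol >= 0.0

# Kendall's tau for Clayton is theta / (theta + 2), i.e. 0.5 when theta = 2.
rng = np.random.default_rng(0)
n = 2000
U, V = clayton_sample(n, theta, rng)
du = np.sign(U[:, None] - U[None, :])
dv = np.sign(V[:, None] - V[None, :])
tau_hat = (du * dv).sum() / (n * (n - 1))   # average concordance over all pairs
```

The simulated tau_hat lands close to the theoretical value 0.5, which is one way the copula parameter θ translates into an interpretable dependence strength.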
Theorem 2. (Sklar's theorem for conditional distributions) Suppose that Y1 | X = x and Y2 | X = x have conditional distributions F1|X(· | x) and F2|X(· | x), respectively, and H_X(·, ·; x) is the joint distribution of Y1 and Y2 given X = x, where the support of X is χ. Then there exists a unique copula C(·, ·; x) such that

H_X(y1, y2; x) = C(F1|X(y1 | x), F2|X(y2 | x); x), (1.3)

if F1|X(· | x) and F2|X(· | x) are continuous in y1 and y2. Conversely, if Y1 | X = x and Y2 | X = x have conditional distributions F1|X(· | x) and F2|X(· | x), respectively, and C(·, ·; x) is a conditional copula which is measurable in x, then the function H_X(·, ·; x) defined by (1.3) is a joint distribution function with marginal distributions F1|X(· | x) and F2|X(· | x).

Details of the proof can be found in Patton (2002). Sklar's theorem for conditional distributions guarantees the existence and uniqueness of the copula function C after bringing in the covariate, thereby extending the flexibility of the copula approach. Note that, by the theorem, all of F1|X(· | x), F2|X(· | x) and C(·, ·; x) are conditional on the covariate value x; otherwise the theorem fails.

1.1.2 Local Polynomial Regression

Smoothing methods are a powerful approach for describing complex data structures without stringent assumptions. Many advanced techniques have been extensively studied in the regression setting and in more complicated frameworks. Under the framework of simple nonparametric regression, we have the model

Y = η(X) + ε, ε ∼ (0, σ²), (1.4)

where η is the smooth function of interest and ε is the noise with mean zero and variance σ².
The local polynomial regression (Fan and Gijbels, 1996) is a popular and simple approach to (1.4). By Taylor expansion, we approximate η(x) by a pth order expansion,

η(X_i) ≈ Σ_{j=0}^{p} η^{(j)}(x)(X_i − x)^j / j! ≡ Σ_{j=0}^{p} β_{j,x}(X_i − x)^j. (1.5)

To account for the contribution from each X_i in the neighbourhood of x, we minimize the local mean squared error

Σ_{i=1}^{n} {Y_i − Σ_{j=0}^{p} β_{j,x}(X_i − x)^j}² K((X_i − x)/h), (1.6)

where K is a one-dimensional symmetric kernel function and the bandwidth h determines the width of the local window. From (1.5), we obtain the estimators η̂^{(j)}(x) = j! β̂_{j,x}, j = 0, ..., p.

Adopting the idea of local polynomials, one can generalize it to a likelihood framework when the least squares loss is not appropriate, which sheds light on further development of likelihood-based techniques. Suppose that the observation (X_i, Y_i) has log-likelihood l{η(X_i), Y_i}, where η(X_i) is to be estimated; the log-likelihood (or loss function) for the entire n data points can be written as Σ_{i=1}^{n} l{η(X_i), Y_i}. Accounting for the local contributions and using the expansion (1.5), the value of η̂(x) at a grid point x is given by maximizing the local log-likelihood

Σ_{i=1}^{n} l{Σ_{j=0}^{p} β_{j,x}(X_i − x)^j, Y_i} K((X_i − x)/h)

over β_{0,x}, ..., β_{p,x}. Similarly, the estimators are given by η̂^{(j)}(x) = j! β̂_{j,x}, j = 0, ..., p.
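As a minimal sketch of the local linear special case (p = 1) of (1.5)–(1.6), the weighted least squares problem can be solved in closed form at each grid point. The Epanechnikov kernel, the test function sin(πx) and all tuning values below are illustrative choices, not from the thesis.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = (3/4)(1 - u^2) on |u| <= 1."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def local_polynomial(x_grid, X, Y, h, p=1):
    """Estimate eta(x) on a grid by locally weighted least squares (1.6)."""
    est = np.empty(len(x_grid))
    for k, x in enumerate(x_grid):
        w = epanechnikov((X - x) / h)
        # design matrix with columns (X_i - x)^j, j = 0, ..., p
        Z = np.vander(X - x, N=p + 1, increasing=True)
        WZ = Z * w[:, None]
        beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)   # local coefficients beta_{j,x}
        est[k] = beta[0]                             # eta_hat(x) = 0! * beta_{0,x}
    return est

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1.0, 1.0, n)
Y = np.sin(np.pi * X) + 0.2 * rng.standard_normal(n)
x_grid = np.linspace(-0.8, 0.8, 9)
eta_hat = local_polynomial(x_grid, X, Y, h=0.25)   # local linear fit, p = 1
```

On this simulated sample the local linear fit tracks the underlying sin(πx) curve closely in the interior, illustrating how the bandwidth h trades bias against variance.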
The local likelihood approach has been extended to a large number of problems. The generalized additive models (Hastie and Tibshirani, 1990), the generalized linear models (Fan et al., 1995) and the varying coefficient models (Cai et al., 2000) have been developed under this framework. In multiparameter regression, Aerts and Claeskens (2000) proposed the multiparameter likelihood models, among others.

1.1.3 Smoothly Clipped Absolute Deviation

Fan and Li (2001) proposed the Smoothly Clipped Absolute Deviation (SCAD) penalty, which yields estimators with desired properties including sparsity, continuity and unbiasedness. To be specific, sparsity means that small estimates are set to zero to achieve variable selection; continuity means that the penalty leads to a continuous estimator so that model prediction is stable; and unbiasedness means that the estimator is unbiased when the true parameter is large. As a consequence, with the help of SCAD, Fan and Li (2001) provided a novel approach to achieve dimension reduction and variable selection simultaneously. The derivative of the penalty is

P′_λ(t) = λ{ I(t ≤ λ) + (aλ − t)_+ / ((a − 1)λ) · I(t > λ) }, t > 0, for some a > 2,

and the penalty itself is

P_λ(t) = λt, if t ≤ λ;
P_λ(t) = −(t² − 2aλt + λ²) / {2(a − 1)}, if λ < t ≤ aλ;
P_λ(t) = (a + 1)λ² / 2, if t > aλ,

where a = 3.7 is often used and λ is the tuning parameter which controls the penalty
strength. From the formula, we see that the SCAD penalty is continuously differentiable on R except at 0, with its derivative being zero on (−∞, −aλ) and (aλ, ∞). Under the multivariate linear regression framework, suppose Y is the response and β is a d × 1 coefficient vector associated with the corresponding covariate vector X. We consider the model Y = Xβ + ε, where ε is normally distributed with mean zero and constant variance. With the SCAD penalty, we obtain the resulting estimators by solving

min_{β1,...,βd} Σ_{i=1}^{n} (Y_i − X_iᵀβ)² + n Σ_{j=1}^{d} P_λ(|β_j|).

Note that when the design matrix is orthonormal, the resulting estimator has the explicit form

β̂_SCAD = sign(β̂)(|β̂| − λ)_+, if |β̂| ≤ 2λ;
β̂_SCAD = {(a − 1)β̂ − sign(β̂)aλ} / (a − 2), if 2λ < |β̂| ≤ aλ;
β̂_SCAD = β̂, if |β̂| > aλ,

where β̂ is the unpenalized estimator, i.e., the least squares estimator. The SCAD penalty has been studied extensively. Fan and Peng (2004) developed the oracle properties with a diverging number of covariates, while Kim et al. (2008) studied the sparsity property in the high-dimensional case.

The rest of the chapter is organized as follows. In Section 1.2, we describe the proposed methodology, along with the algorithm and the selection of tuning parameters. Asymptotic properties are presented in Section 1.3, where both parametrically and
nonparametrically estimated marginals are considered. We illustrate the empirical performance through a simulation study in Section 1.4, and apply the proposed method to the twin birth weights data in Section 1.5. Technical proofs are deferred to the last section.

1.2 Proposed Methodology

Penalized local likelihood for interpretable dependence

Let X be a continuous covariate on which a pair of continuous responses (Y1, Y2) is conditioned, and recall that the marginal c.d.f. given X are F1|X and F2|X. We first focus on the estimation of the copula parameter function, treating the marginals as known, and then extend to the case of estimated marginals. There exists a unique copula function C such that the joint conditional distribution of (Y1, Y2) given X can be expressed as

H{y1, y2; θ(x)} = C{F1|X(y1 | x), F2|X(y2 | x); θ(x)},

or

h{y1, y2; θ(x)} = c{F1|X(y1 | x), F2|X(y2 | x); θ(x)} f1|X(y1 | x) f2|X(y2 | x),

where f1|X(y1 | x) and f2|X(y2 | x) are the conditional density functions. Let U1,x = F1|X(Y1 | x) and U2,x = F2|X(Y2 | x), which are uniformly distributed on [0, 1]; then

(U1i, U2i) | X_i ∼ C{u1, u2; θ(X_i)},

where θ(X_i) = g⁻¹{η(X_i)} and g is a known monotone link function that keeps θ in a proper range, i = 1, 2, ..., n. We begin with the local polynomial expansion around a fixed x in the support of X,

η(X_i) ≈ η(x) + η′(x)(X_i − x) + ... + η^{(p)}(x)(X_i − x)^p / p!.
Denoting β_{k,x} = η^{(k)}(x)/k!, the copula parameter function is approximated by θ(z) ≈ g⁻¹{β_{0,x} + β_{1,x}(z − x) + ... + β_{p,x}(z − x)^p} for z in some neighbourhood of x, with β_{0,x} = g{θ(x)}. For brevity we suppress the dependence of β_{k,x} on x. It is known that estimating higher degree coefficients leads to larger variability and computational complexity, while a customary choice is the local linear smoother with p = 1 (Fan and Gijbels, 1996). Denoting β = (β0, β1)ᵀ, the local log-likelihood at x is

l(β; x) = Σ_{i=1}^{n} log c[U1i, U2i; g⁻¹{β0 + β1(X_i − x)}] K_h(X_i − x), (1.7)

where K_h(·) = h⁻¹K(·/h), K is a compactly supported kernel density, and h is the bandwidth controlling the amount of smoothing. Common choices of K include the Epanechnikov kernel, K(u) = (3/4)(1 − u²)I(|u| ≤ 1), where I(·) is the indicator function, as well as the triweight kernel, the Gaussian kernel, etc.

Our goal is to encourage the nonparametric estimate to stay constant whenever the underlying relationship is indeed so. Note that the local coefficients of higher degrees regulate how the dependence structure varies over the neighbourhood of x. Specifically, the local slope β1 represents the rate of smooth change of the copula parameter at x. To identify the constant region of the dependence, we use a sufficiently dense grid over the domain of X, say {x_1, ..., x_N}, on which the estimates will be attained. If the local slope parameters are zero for a set of consecutive grid points, say {x_j, ..., x_{j+l}}, we will
regard θ′(x) = 0 for x ∈ (x_j, x_{j+l}). Given the smooth feature of the local linear fit, the resultant estimator of η(·) will appear constant over the region (x_j, x_{j+l}). The above consideration suggests imposing the penalization on the local slope parameters at each grid point. To properly scale the penalty function when coupled with the local log-likelihood l(β; x) in (1.7), we divide l(β; x) by K_h(0) so that K_h(X_i − x)/K_h(0) = O(1). At any fixed x, the data contributing to the estimation of η(x) only include those in its local window. Thus we define the effective sample size at x by m_x = Σ_{i=1}^{n} K_h(X_i − x)/K_h(0). Lastly, we standardize each column of the design matrix, as in the traditional linear model, before coupling with the penalty. Denote the standard deviation of {K_h^{1/2}(X_i − x)}_{i=1,...,n} by s_x, and the standard deviation of {(X_i − x)K_h^{1/2}(X_i − x)}_{i=1,...,n} by r_x. Then the local coefficients are scaled as β̃0 = s_x β0 and β̃1 = r_x β1. This scaling facilitates our asymptotic analysis, thanks to the same convergence rates for the estimates of β̃0 and β̃1. We now aim to maximize the following penalized local log-likelihood with respect to (w.r.t.) β̃ = (β̃0, β̃1)ᵀ,

Q(β̃; x) = Σ_{i=1}^{n} log c[U1i, U2i; g⁻¹{s_x⁻¹β̃0 + r_x⁻¹β̃1(X_i − x)}] K_h(X_i − x)/K_h(0) − m_x P_{λx}(|β̃1|), (1.8)

where m_x = Σ_{i=1}^{n} K_h(X_i − x)/K_h(0), P_{λx}(·) is the penalty function that tends to shrink the local slope β̃1 to zero if the true value is zero, and λ_x is the shrinkage parameter. We employ the SCAD penalty, which yields estimators with the desired consistency and sparsity,

P′_{λx}(t) = λ_x{ I(t ≤ λ_x) + (aλ_x − t)_+ / ((a − 1)λ_x) · I(t > λ_x) }, t > 0, for some a > 2,

where a = 3.7 is suggested by Fan and Li (2001). Other choices of P_{λx}(·) are available, such as the MCP (Zhang, 2010) and the (adaptive) LASSO (Tibshirani, 1996; Zou, 2006).
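To make (1.7)–(1.8) concrete, the sketch below evaluates the penalized local log-likelihood for a Clayton conditional copula with log link, i.e., θ = g⁻¹(η) = exp(η), a common choice for the Clayton family that is assumed here; the column scalings s_x and r_x are omitted for simplicity, and all data-generating settings are illustrative.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def clayton_logc(u, v, theta):
    """Log-density of the Clayton copula, theta > 0."""
    return (np.log(1.0 + theta) - (1.0 + theta) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u**(-theta) + v**(-theta) - 1.0))

def scad(t, lam, a=3.7):
    """SCAD penalty P_lambda(|t|) of Fan and Li (2001)."""
    t = np.abs(t)
    return np.where(t <= lam, lam * t,
           np.where(t <= a * lam, -(t**2 - 2*a*lam*t + lam**2) / (2*(a - 1)),
                    (a + 1) * lam**2 / 2))

def Q(beta, x, X, U1, U2, h, lam):
    """Penalized local log-likelihood (1.8) at x, local linear, theta = exp(eta)."""
    b0, b1 = beta
    w = epan((X - x) / h) / 0.75           # K_h(X_i - x) / K_h(0)
    theta = np.exp(b0 + b1 * (X - x))      # g^{-1} applied to the local line
    return np.sum(clayton_logc(U1, U2, theta) * w) - w.sum() * scad(b1, lam)

# Simulated data with a constant calibration eta(x) = 1, i.e. theta = e.
rng = np.random.default_rng(1)
n = 800
X = rng.uniform(0.0, 1.0, n)
U1 = rng.uniform(size=n)
W = rng.uniform(size=n)
th = np.exp(1.0)
U2 = (U1**(-th) * (W**(-th / (1.0 + th)) - 1.0) + 1.0)**(-1.0 / th)

q_true = Q((1.0, 0.0), 0.5, X, U1, U2, 0.25, 0.1)  # objective at the truth
q_bad = Q((3.0, 0.0), 0.5, X, U1, U2, 0.25, 0.1)   # grossly misspecified level
```

The objective favours the true dependence level (q_true > q_bad); a full implementation would maximize Q over (β0, β1), with the SCAD term shrinking the local slope toward zero where the calibration is flat.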
The adaptive LASSO is a convex penalty function, while the MCP and SCAD are non-convex penalty functions; all of them produce continuous and nearly unbiased solutions. For practical implementation, to emphasize the constant pattern and remedy the boundary effect, one can make a numerical adjustment to the final estimator of η(x) after obtaining β̂ on all grid points. For instance, we take β̂0(x_{[N/2]}) at the central grid point x_{[N/2]}, where [N/2] denotes the nearest integer, and use the numerical approximation

η̂_a(x) = β̂ᵃ0(x) = β̂0(x_{[N/2]}) + Σ_{j=[N/2]}^{k−1} β̂1(x_j)(x_{j+δ} − x_j) + β̂1(x_k)(x − x_k),

where x_k is the nearest grid point to x when moving from x_{[N/2]} toward x, and δ = 1 if x > x_{[N/2]} and −1 otherwise. It is easy to verify that this adjusted estimator is asymptotically equivalent, η̂_a(x) − η̂(x) = o_p(1) for any x, given a sufficiently dense grid.

The assumption of known conditional marginals, F1|X and F2|X, may be relaxed. If F1|X and F2|X can be estimated from a parametric model, the estimated marginals are root-n consistent and the additional error is negligible relative to that from estimating the copula function. If no such prior knowledge of the marginals is available, one can estimate the conditional marginals using a nonparametric approach and plug the estimates into the above penalized estimation. This inflates the error of the estimated copula function, and is characterized in Section 1.4. For specificity, we use the Nadaraya-Watson estimator suggested by Abegaz et al. (2012): for j = 1, 2,

F̂_{j|X}(y | x) = Σ_{i=1}^{n} ω_{ni}(x, h_j) I(Y_{ji} ≤ y), ω_{ni}(x, h_j) = K_{h_j}(X_i − x) / Σ_{k=1}^{n} K_{h_j}(X_k − x), (1.9)

where the h_j are bandwidths controlling the smoothness.
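Estimator (1.9) is simply a kernel-weighted empirical c.d.f. A minimal Python sketch follows, checked against an illustrative Gaussian conditional model where the truth F(y | x) = Φ(y − x) is known; the kernel, bandwidth and sample sizes are illustrative choices.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def cond_cdf(y, x, Yj, X, h):
    """Nadaraya-Watson estimate of F_{j|X}(y | x), as in (1.9)."""
    w = epan((X - x) / h)
    w = w / w.sum()                       # the weights omega_ni(x, h_j), summing to 1
    return float(np.sum(w * (Yj <= y)))

# Check against a model where the truth is known: Y | X = x ~ N(x, 1),
# so F(y | x) = Phi(y - x) and in particular F(0.5 | 0.5) = 0.5.
rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = X + rng.standard_normal(n)
F_hat = cond_cdf(0.5, 0.5, Y, X, h=0.1)   # should be close to 0.5
```

As a weighted proportion of indicators, the estimate is automatically monotone in y and confined to [0, 1], which is what makes it a convenient plug-in for the pseudo-observations in the penalized copula estimation.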
Optimization algorithm and selection of parameters

To maximize the penalized log-likelihood Q(β̃; x) w.r.t. β̃ = (β̃0, β̃1)ᵀ, we propose an iterative procedure that modifies the local linear approximation (LLA) algorithm of Zou and Li (2008). Denote the estimates obtained in the mth iteration by β̃0^(m) = s_x β0^(m), β̃1^(m) = r_x β1^(m) and β̃^(m) = (β̃0^(m), β̃1^(m))ᵀ, with η^(m)(X_i) = s_x⁻¹β̃0^(m) + r_x⁻¹β̃1^(m)(X_i − x), and regard l(β; x) in (1.7) as a function of β̃, i.e., l(β̃) = Σ_{i=1}^{n} l_i(β̃) K_h(X_i − x)/K_h(0), where the dependence on x is suppressed. We use the unpenalized local linear estimator (Acar et al., 2011) as the initial estimate. From our numerical experience, the algorithm usually converges within a few iterations.

Update the local intercept β̃0^(m) by calculating the gradient ∂l(β̃)/∂β̃0 and the Hessian ∂²l(β̃)/∂β̃0²,

β̃0^(m+1) = β̃0^(m) − {∂²l(β̃^(m))/∂β̃0²}⁻¹ ∂l(β̃^(m))/∂β̃0.

Update the local slope β̃1^(m) by the modified LLA algorithm. Denote X_x = (X_1 −
x, ..., X_n − x)ᵀ, μ_i = (X_i − x)β̃1, and D = diag(D_11, ..., D_nn), with

∂²l(β̃^(m+1),(m))/∂β̃1² = −X_xᵀ D X_x, D_ii = −∂²l_i(β̃^(m+1),(m))/∂μ_i² evaluated at μ_i = μ_i^(m),

where β̃^(m+1),(m) = (β̃0^(m+1), β̃1^(m))ᵀ. Define the working data y* = (D_11^{1/2} μ̂_1^(m), ..., D_nn^{1/2} μ̂_n^(m))ᵀ and X*_x = D^{1/2} X_x. We compute β̃1^(m+1) as follows.

(a) If P′_{λx}(|β̃1^(m)|) = 0, then β̃1^(m+1) = (X*_xᵀ X*_x)⁻¹ X*_xᵀ y*.

(b) If P′_{λx}(|β̃1^(m)|) > 0, take the further transform X**_x = λ_x X*_x / P′_{λx}(|β̃1^(m)|), and apply the coordinate descent algorithm (Friedman et al., 2007),

β̃1**(m+1) = argmin_{β̃1} { (1/2)||y* − X**_x β̃1||² + m_x λ_x |β̃1| }.

Then the local slope is given by β̃1^(m+1) = λ_x β̃1**(m+1) / P′_{λx}(|β̃1^(m)|).

It is important to tune the shrinkage parameter λ_x, which controls the magnitude of β̃1 and thus the dependence strength. As suggested by Wang and Leng (2009), we adopt the Bayesian information criterion (BIC),

BIC(λ_x) = −2 Σ_{i=1}^{n} log c[U1i, U2i; g⁻¹{η̂_{λx}(X_i)}] K_h(X_i − x)/K_h(0) + df · log m_x,

where m_x = Σ_{i=1}^{n} K_h(X_i − x)/K_h(0), η̂_{λx}(X_i) = β̂_{λx,0} + β̂_{λx,1}(X_i − x) with the subscript λ_x emphasizing the dependence on λ_x, and df = 1 + I(|β̂_{λx,1}| > 0).

For the bandwidth h in the copula estimation (1.8), we use the two-fold cross-validated likelihood (CVL; Acar et al., 2011). With a slight abuse of notation, denote the estimate based on the training data by β̂_{0,h}, indicating the dependence on h, and the
testing set by {X*_1, ..., X*_{[n/2]}}. Maximizing the objective function w.r.t. h,

CVL(h) = Σ_{i=1}^{[n/2]} log c[U*_{1i}, U*_{2i}; g⁻¹{β̂_{0,h}(X*_i)}], (1.10)

yields a data-driven choice of h, where the β̂_{0,h}(X*_i) are assessed on the testing set. When the marginal distributions are nonparametrically estimated by the kernel method (1.9), the bandwidths h_1 and h_2 can also be included in the criterion (1.10). Lastly, for choosing an appropriate copula family, since the likelihoods are on different scales, we adopt a two-fold cross-validated prediction error (CVPE). Details can be found in Acar et al. (2011) and are omitted for conciseness.

1.3 Asymptotic Properties

In this section, we show that the proposed penalized estimator enjoys the oracle properties, including estimation consistency, sparsity and asymptotic normality. Here sparsity is in the sense that, if the underlying dependence is constant at x, the local slope will be estimated as exactly zero. Recall that β̃ = (β̃0, β̃1)ᵀ is the scaled version of β = (β0, β1)ᵀ, i.e., β̃0 = s_x β0 and β̃1 = r_x β1. For convenience, we drop the subscript x in λ_x, and denote the true values of β = (β0, β1)ᵀ and β̃ = (β̃0, β̃1)ᵀ by β⁰ = (β00, β01)ᵀ and β̃⁰ = (β̃00, β̃01)ᵀ, respectively. We present the asymptotic properties in terms of the scaled penalized estimator β̃_λ = (β̃_{λ0}, β̃_{λ1})ᵀ that maximizes (1.8), and β̃ᴺ_λ = (β̃ᴺ_{λ0}, β̃ᴺ_{λ1})ᵀ when using the nonparametrically estimated marginals (1.9). Without loss of generality, we assume that the bandwidths h_j for estimating F̂_{j|X} in (1.9) are of the same order as the h for estimating the copula function in (1.8), and use a common kernel density K for both the marginal and copula estimation. We now present the regularity conditions on the conditional copula density in (A1)-
(A5) needed for establishing the asymptotic results, collectively referred to as Conditions (A), which are analogous to those in Abegaz et al. (2012). With a slight abuse of notation, denote l{g⁻¹(η); u1, u2} = log c{u1, u2; g⁻¹(η)}, l′(η; u1, u2) = (∂/∂η) l{g⁻¹(η); u1, u2}, and similarly for l″(η; u1, u2) and l‴(η; u1, u2). Define l_{1,s}(η; u1, u2) = (∂²/∂η∂u_s) l{g⁻¹(η); u1, u2} for s = 1, 2.

(A1) The conditional copula density c{u1, u2; θ(x)} has a common support [0, 1]² for all θ ∈ R. There exists an open set Θ containing the true parameter θ(x) such that, for almost all (u1, u2), c(u1, u2; θ) has third derivatives w.r.t. u1, u2 and θ for all θ ∈ Θ.

(A2) The functions l, l′, l″, l‴ and l_{1,s}, s = 1, 2, are bounded and continuous. Moreover, l‴ is a Lipschitz continuous trivariate function.

(A3) E_θ[l′{g(θ); U1, U2}] = 0 for all θ ∈ Θ. Moreover, I(θ) = E_θ[l′{g(θ); U1, U2}²] = −E_θ[l″{g(θ); U1, U2}] is positive and continuously differentiable on Θ.

(A4) There exist functions Q1 and Q2 such that |l″{g(θ); u1, u2}| ≤ Q1(u1, u2) and |l‴{g(θ); u1, u2}| ≤ Q2(u1, u2) for all θ ∈ Θ, and E_θ{Q1²(U1, U2)} and E_θ{Q2²(U1, U2)} are uniformly bounded on Θ.

(A5) For some a_j ≥ 0 and c_1 > 0, |l′{g(θ); u1, u2}| ≤ c_1 Π_{j=1,2} {u_j(1 − u_j)}^{−a_j}, such that E[Π_{j=1,2} {U_j(1 − U_j)}^{−a_j}] < ∞. Moreover, for some b_j ≥ a_j, 1 ≤ j ≠ k ≤ 2 and c_2 > 0, |l_{1,j}{g(θ); u1, u2}| ≤ c_2 {u_k(1 − u_k)}^{−a_k} {u_j(1 − u_j)}^{−b_j}, such that E[{U_k(1 − U_k)}^{−a_k} {U_j(1 − U_j)}^{ε_j − b_j}] < ∞ for some ε_j ∈ (0, 1/2).

Conditions (A1)–(A4) are standard, while (A5) allows the score and its partial derivatives w.r.t. u1 and u2 to possibly diverge at the boundaries. This makes the results applicable to some commonly used copula models, such as the Gaussian, Student-t, Clayton and Gumbel copulas. The conditions on the bandwidth h, the penalty function P_{λx}(·) and the shrinkage
parameter $\lambda$ are summarized in Conditions (B). We suppress the dependence of $h$ and $\lambda$ on $n$, and denote $a_n = P'_\lambda(|\tilde\beta_{01}|)$ and $b_n = (nh)^{-1/2}$.

(B1) $nh^{2+\delta} \to \infty$ for some $\delta > 0$, and $\limsup_n n^{1/5} h < \infty$.

(B2) $a_n/h^2 \to 0$ and $a_n^2\, nh \to 0$, i.e., $a_n = o(b_n)$.

(B3) $P''_\lambda(|\tilde\beta_{01}|) \to 0$, and $\liminf_{n\to\infty}\, \liminf_{\theta \to 0^+} P'_\lambda(\theta)/\lambda > 0$.

Condition (B1) is to ensure that the bias dominates the variance from the copula and marginal estimation, while (B2) guarantees that the strength of the true signal dominates the bias and the variance, and hence the existence of a $b_n$-consistent penalized estimator. Condition (B3) makes the penalty negligible relative to the likelihood and retains its singularity at the origin for achieving a sparse solution; it is fulfilled by the SCAD penalty (Fan and Li, 2001). Lastly, Conditions (C) collect the standard requirements on other relevant quantities.

(C) The parameter function $\eta(\cdot)$ has a uniformly bounded second derivative. The monotone link function $g$ is invertible with $g' \ne 0$, and $g^{-1}$ has a continuous third derivative. The density $f$ of $X$ has a continuous first derivative, and for each $x$ in the domain of $X$ there exists some neighbourhood $R_x$ such that $\inf_{x' \in R_x} f(x') > 0$. The kernel density $K$ is symmetric and bounded with compact support on $[-1, 1]$.

Define $\gamma_r = \int x^r K(x)\,dx$, and let $N_2$ and $S_2$ be the $2 \times 2$ matrices with $\gamma_{i+j-2}$ and $\gamma_{i+j-1}$ as their $(i,j)$th entries, respectively. Denote $\tilde\eta(x, X) = \eta(x) + \eta'(x)(X - x) = \beta_0 + \beta_1(X - x)$, and $\tilde\eta_0(x, X) = \beta_{00} + \beta_{01}(X - x)$ when evaluated at the true $\beta_0$. Let

$\Sigma_x = I\{\theta(x)\} f(x) N_2, \qquad \Lambda_x = I\{\theta(x)\} f(x) S_2,$
$Z_i = \big(1, (X_i - x)/h\big)^{\top},$

$M_0(Y_{1i}, Y_{2i}, X_i) = h\,\ell'\{\tilde\eta_0(x, X_i); F_{1|X}(Y_{1i} \mid x), F_{2|X}(Y_{2i} \mid x)\}\, Z_i\, K_h(X_i - x).$

Theorem 1 concerns the asymptotic properties of the copula parameter estimation when the true marginals are used. It states that the penalized method estimates a local slope as exactly zero if the underlying value is indeed zero, and performs asymptotically as well as the unpenalized local linear estimator considered in Acar et al. (2011).

Theorem 1. Assume that Conditions (A), (B) and (C) hold.

(Consistency) If $\lambda \to 0$ as $n \to \infty$, then $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(b_n)$, where $b_n = (nh)^{-1/2}$.

(Sparsity) If $\lambda \to 0$ and $(nh)^{1/2}\lambda \to \infty$ as $n \to \infty$, then, for the local maximizer $\tilde\beta_\lambda$ of $Q(\tilde\beta; x)$ satisfying $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(b_n)$, $P(\tilde\beta_{\lambda 1} = 0 \mid \tilde\beta_{01} = 0) \to 1$.

(Normality) If $nh^5/\log n = O(1)$ and $nh^3/\log^2 n \to \infty$, then

$(\Sigma_x^{-1} \Gamma_x^* \Sigma_x^{-1})^{-1/2} \Big[ \sqrt{nh}\,(\tilde\beta_\lambda - \tilde\beta_0) - (\Sigma_x^{-1} - h \Sigma_x^{-1} \Lambda_x \Sigma_x^{-1})\, n (nh)^{-1/2}\, E\{M_0(Y_1, Y_2, X)\} \Big] \xrightarrow{D} N(0, I_2),$

where $\Gamma_x^*$ is the $2 \times 2$ matrix with $(\Gamma_x^*)_{rs} = I\{\theta(x)\} f(x) \int x^{r+s-2} K^2(x)\,dx$, and $I_2$ is the $2 \times 2$ identity matrix.

The next theorem considers the case when the marginals are nonparametrically estimated by (1.9). Denote $\tilde z = \big(1, (w - x)/h\big)^{\top}$, and define

$M_1(Y_{1i}, X_i) = h \int \ell^{(1,1)}\{\tilde\eta_0(x, w); F_{1|X}(y_1 \mid x), F_{2|X}(y_2 \mid x)\}\, \dfrac{K_h(X_i - x)}{E\{K_h(X - x)\}}\, \{I(Y_{1i} \le y_1) - F_{1|X}(y_1 \mid x)\}\, \tilde z\, K_h(w - x)\, dH_X(y_1, y_2; w),$

$M_2(Y_{2i}, X_i) = h \int \ell^{(1,2)}\{\tilde\eta_0(x, w); F_{1|X}(y_1 \mid x), F_{2|X}(y_2 \mid x)\}\, \dfrac{K_h(X_i - x)}{E\{K_h(X - x)\}}\, \{I(Y_{2i} \le y_2) - F_{2|X}(y_2 \mid x)\}\, \tilde z\, K_h(w - x)\, dH_X(y_1, y_2; w).$
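Condition (B3) above is the property met by the SCAD penalty of Fan and Li (2001). As a purely numerical illustration (not part of the theoretical development), the following Python sketch implements SCAD with the conventional choice $a = 3.7$ and checks the two facts used in the analysis: near the origin $P'_\lambda(\theta)/\lambda$ stays bounded away from zero (the singularity that yields sparsity), while $P'_\lambda(\theta) = 0$ for $\theta > a\lambda$, so that $a_n = P'_\lambda(|\tilde\beta_{01}|)$ vanishes for a sufficiently strong true signal. The function names are ours.

```python
import numpy as np

def scad(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |theta|."""
    t = np.abs(theta)
    p1 = lam * t                                            # linear near zero
    p2 = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))  # quadratic blend
    p3 = lam ** 2 * (a + 1) / 2                             # constant tail
    return np.where(t <= lam, p1, np.where(t <= a * lam, p2, p3))

def scad_deriv(theta, lam, a=3.7):
    """Derivative P'_lambda(|theta|); zero beyond a*lambda."""
    t = np.abs(theta)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))
```

For instance, `scad_deriv(1e-9, 0.1) / 0.1` equals 1, matching the lim inf in (B3), while `scad_deriv(1.0, 0.1)` is exactly 0 since $1 > a\lambda = 0.37$.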
Theorem 2. Assume that Conditions (A), (B) and (C) hold. In addition, assume the conditional marginal c.d.f.'s $F_{j|X}$ satisfy conditions (R1)-(R3) in Veraverbeke et al. (2011).

(Consistency) If $\lambda \to 0$ as $n \to \infty$, then $\|\tilde\beta^N_\lambda - \tilde\beta_0\| = O_p(b_n)$, where $b_n = (nh)^{-1/2}$.

(Sparsity) If $\lambda \to 0$ and $(nh)^{1/2}\lambda \to \infty$ as $n \to \infty$, then, for the local maximizer $\tilde\beta^N_\lambda$ satisfying $\|\tilde\beta^N_\lambda - \tilde\beta_0\| = O_p(b_n)$, $P(\tilde\beta^N_{\lambda 1} = 0 \mid \tilde\beta_{01} = 0) \to 1$.

(Normality) If $nh^5/\log n = O(1)$ and $nh^3/\log^2 n \to \infty$, then

$(\Sigma_x^{-1} \Gamma_x^* \Sigma_x^{-1})^{-1/2} \Big[ \sqrt{nh}\,(\tilde\beta^N_\lambda - \tilde\beta_0) - (\Sigma_x^{-1} - h \Sigma_x^{-1} \Lambda_x \Sigma_x^{-1})\, n (nh)^{-1/2}\, E\big\{ M_0(Y_1, Y_2, X) + n^{-1}\sum_{i=1}^n M_1(Y_1, X) + n^{-1}\sum_{i=1}^n M_2(Y_2, X) \big\} \Big] \xrightarrow{D} N(0, I_2).$

We see from Theorem 2 that the asymptotic covariance originating from the copula parameter estimation dominates that from the nonparametric marginals, and is thus the same as in Theorem 1. The bias is inflated by $M_1(Y_1, X)$ and $M_2(Y_2, X)$ due to the nonparametric estimation of the conditional marginals. Similar to Theorem 1, it is not surprising that this penalized estimator has the same asymptotic behaviour as its unpenalized nonparametric counterpart considered in Abegaz et al. (2012).

1.4 Simulation Study

In this section, we examine the performance of the proposed penalized estimation for various types of conditional copula functions. We present the results using data generated from the Clayton family; the Gumbel and Frank families lead to similar conclusions and are omitted for conciseness. The inverse link is taken as $\theta = g^{-1}(\eta) = \eta$ for $\eta > 0$, so that the copula parameter stays in its proper range. To assess the performance under different scenarios, we use three copula parameter functions: $\eta_1$ and $\eta_2$ are smoothly joint
piecewise linear and piecewise quadratic, respectively, and $\eta_3(x)$ is globally quadratic:

$\eta_1(x) = \frac{2}{9}\, I(x \le 3.8) + \big\{\frac{2}{9} + \frac{52}{9}(x - 3.8)^2\big\}\, I(3.8 < x \le 3.9) + (x - 3.62)\, I(x > 3.9),$

$\eta_2(x) = \frac{2}{9}\, I(x \le 2.9) + \big\{\frac{2}{9} + \frac{1240}{9}(x - 2.9)^2\big\}\, I(2.9 < x \le 3) + \{4.1 - 10(x - 3.5)^2\}\, I(3 < x \le 4) + \big\{\frac{2}{9} + \frac{1240}{9}(x - 4.1)^2\big\}\, I(4 < x \le 4.1) + \frac{2}{9}\, I(x > 4.1),$    (1.11)

$\eta_3(x) = 1 + 5(x - 3.5)^2.$

To visualize the strength of the dependence, we display these functions in Figure 1.2 on a common scale using Kendall's tau (Trivedi and Zimmer, 2007),

$\tau(x) = 4 \int\!\!\int C(u_1, u_2; x)\, dC(u_1, u_2; x) - 1,$

and a simple calculation yields $\tau(x) = \theta(x)/\{\theta(x) + 2\}$ for the Clayton copula. We first generate $n = 1000$ independent copies of the covariate $X_i$ from $U[2, 5]$, then generate $(U_{1i}, U_{2i})$ from the Clayton copula given $\theta(X_i) = \eta(X_i)$, and further obtain $Y_{ki} = F^{-1}_{k|X_i}(U_{ki} \mid X_i)$ using $F_{k|X} = \Phi$, the c.d.f. of $N(0, 1)$. When using the estimated marginals, we calculate $\hat U_{ki} = \hat F_{k|X_i}(Y_{ki} \mid X_i)$ with $\hat F_{k|X}$ obtained by (1.9), $k = 1, 2$, $i = 1, \ldots, n$. The Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)\, I(|u| \le 1)$ is used, and the bandwidths $h_k$ are selected together with the copula parameter estimation by slightly modifying the cross-validated likelihood (1.10).

For assessment, we use a dense grid of 100 equally spaced points on $[2, 5]$ and apply the local penalized estimation at each point. We examine the estimated Kendall's tau functions $\hat\tau(x)$ as the dependence measure, and define the integrated squared bias
(IBIAS$^2$), variance (IVAR) and mean squared error (IMSE),

$\mathrm{IBIAS}^2(\hat\tau) = \int_\chi [E\{\hat\tau(x)\} - \tau(x)]^2\, dx,$
$\mathrm{IVAR}(\hat\tau) = \int_\chi E\big([\hat\tau(x) - E\{\hat\tau(x)\}]^2\big)\, dx,$
$\mathrm{IMSE}(\hat\tau) = \int_\chi E[\{\hat\tau(x) - \tau(x)\}^2]\, dx = \mathrm{IBIAS}^2 + \mathrm{IVAR},$

which are approximated with 200 Monte Carlo runs.

Figure 1.2: Kendall's tau functions $\tau_j(x)$ that correspond to $\eta_j(x)$ in (1.11) for $j = 1, 2, 3$ (from left to right).

To evaluate the detection of zero slopes, we define the correct zero coverage (CZ) as the proportion of true zero slopes that are correctly identified on the grid, and the correct nonzero coverage (CNZ) as the proportion of true nonzero slopes that are correctly identified. For comparison, we also perform unpenalized local linear estimation using the true and estimated marginals, respectively (see Acar et al., 2011, and Abegaz et al., 2012, for detailed procedures). The results are summarized in Table 1.1.
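The data-generating scheme just described can be sketched in a few lines. Conditional inversion is a standard way to draw from a Clayton copula; the sketch below pairs it with the quadratic calibration $\eta_3$ and the identity inverse link used in the simulation design. The function names are ours, and the sampler is a generic illustration rather than the thesis code.

```python
import numpy as np

def eta3(x):
    """Globally quadratic calibration function from (1.11)."""
    return 1.0 + 5.0 * (x - 3.5) ** 2

def clayton_sample(theta, rng):
    """Draw (U1, U2) from a Clayton copula by conditional inversion."""
    u1 = rng.uniform(size=np.shape(theta))
    w = rng.uniform(size=np.shape(theta))
    # Invert the conditional c.d.f. C(u2 | u1) = w for the Clayton family.
    u2 = (u1 ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u1, u2

def simulate(n, rng):
    """Covariate, Clayton pair with theta(x) = eta3(x), and Kendall's tau."""
    x = rng.uniform(2.0, 5.0, n)
    theta = eta3(x)                    # identity inverse link, theta > 0
    u1, u2 = clayton_sample(theta, rng)
    tau = theta / (theta + 2.0)        # Clayton: tau = theta / (theta + 2)
    return x, u1, u2, tau
```

Larger $\theta$ produces visibly stronger positive dependence in the sampled pairs, mirroring the Kendall's tau curves in Figure 1.2.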
Table 1.1: Comparisons between the proposed penalized estimation and the unpenalized local linear estimation, using the true and nonparametrically estimated conditional marginals, respectively, for the Clayton family. The underlying models correspond to $\eta_j$ defined in (1.11); the IBIAS$^2$, IVAR and IMSE (multiplied by 100 for visualization), with standard errors in parentheses, are with respect to the Kendall's tau functions $\tau_j$, $j = 1, 2, 3$.

  Marginal                 True                              Nonparametric
  Model           Penalized        Unpenalized      Penalized        Unpenalized
  eta1  CZ        81.1% (.0036)    -                (.0032)          -
        CNZ       95.6% (.0040)    -                (.0045)          -
        IBIAS2    .2600 (.0603)    .8943 (.1809)    .3602 (.0625)    1.028 (.2520)
        IVAR      .5107 (.0201)    8.347 (.1485)    .5602 (.0249)    8.288 (.1819)
        IMSE      .7707 (.0695)    9.242 (.3269)    .9204 (.0751)    9.317 (.4293)
  eta2  CZ        91.0% (.0128)    -                (.0131)          -
        CNZ       85.1% (.0089)    -                (.0086)          -
        IBIAS2    1.049 (.0773)    15.38 (1.885)    1.211 (.0995)    15.37 (1.938)
        IVAR      1.120 (.0534)    .2387 (1.632)    1.565 (.0642)    .2610 (1.665)
        IMSE      2.169 (.1208)    15.61 (3.515)    2.777 (.1517)    15.64 (3.602)
  eta3  CZ        -                -                -                -
        CNZ       100% (.0000)     -                100% (.0000)     -
        IBIAS2    1.452 (.0340)    .6527 (.1006)    1.314 (.0391)    .9142 (.1220)
        IVAR      .4462 (.0146)    2.040 (.0728)    .6506 (.0189)    1.937 (.0903)
        IMSE      1.898 (.0395)    2.693 (.1719)    1.964 (.0475)    2.851 (.2113)

We can see that the proposed penalized estimation correctly identifies the majority of both zero and nonzero slopes in all three cases. Regarding estimation, the penalized estimators improve the mean squared error in all cases, with more pronounced gains for $\eta_1$ and $\eta_2$. Therefore, although the penalized estimators behave asymptotically only as well as the unpenalized ones, they in fact achieve more favourable finite-sample performance in our simulations. One also observes a slightly increased error from using the nonparametrically estimated marginals, for both the penalized and unpenalized methods.
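The integrated error measures reported in Table 1.1 are straightforward to approximate from Monte Carlo replicates on the evaluation grid. A minimal sketch (ours; it uses a simple Riemann approximation of the integrals over an equally spaced grid) is:

```python
import numpy as np

def integrated_errors(tau_hat, tau_true, grid):
    """Approximate IBIAS^2, IVAR and IMSE from Monte Carlo estimates.

    tau_hat  : (n_mc, n_grid) array, one estimated curve per run
    tau_true : (n_grid,) true Kendall's tau on the grid
    grid     : equally spaced evaluation points
    """
    width = grid[-1] - grid[0]              # length of the integration domain
    mean_hat = tau_hat.mean(axis=0)         # Monte Carlo estimate of E{tau_hat(x)}
    ibias2 = np.mean((mean_hat - tau_true) ** 2) * width
    ivar = np.mean(tau_hat.var(axis=0)) * width
    return ibias2, ivar, ibias2 + ivar      # IMSE = IBIAS^2 + IVAR
```

The returned IMSE uses the exact decomposition IMSE = IBIAS$^2$ + IVAR, so the three reported quantities are internally consistent by construction.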
We conduct copula selection using a two-fold CVPE (Acar et al., 2011) among three Archimedean families, Clayton, Gumbel and Frank, and observe that the Clayton copula is correctly chosen in over 95% of all Monte Carlo runs. The results are summarized in Table 1.2.
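Both the bandwidth tuning via (1.10) and the family comparison rest on held-out fit. As a simplified illustration of the cross-validated likelihood idea — local-constant rather than local-linear, Clayton-specific, and with a crude grid search in place of a proper optimizer, so a sketch rather than the thesis's procedure — one might write:

```python
import numpy as np

def clayton_loglik(u1, u2, theta):
    """Log-density of the Clayton copula, theta > 0."""
    s = u1 ** (-theta) + u2 ** (-theta) - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u1) + np.log(u2))
            - (2.0 + 1.0 / theta) * np.log(s))

def local_theta(x0, x, u1, u2, h, grid=np.linspace(0.05, 10.0, 400)):
    """Kernel-weighted MLE of theta at x0 (local-constant, grid search)."""
    w = np.maximum(0.75 * (1.0 - ((x - x0) / h) ** 2), 0.0)  # Epanechnikov
    scores = [np.sum(w * clayton_loglik(u1, u2, t)) for t in grid]
    return grid[int(np.argmax(scores))]

def cvl(h, xtr, u1tr, u2tr, xte, u1te, u2te):
    """Cross-validated likelihood: fit on training half, score on test half."""
    theta_hat = np.array([local_theta(x0, xtr, u1tr, u2tr, h) for x0 in xte])
    return np.sum(clayton_loglik(u1te, u2te, theta_hat))
```

A data-driven bandwidth is then the maximizer of `cvl` over a candidate grid of `h` values; comparing families on a common prediction-error scale, as the CVPE does, avoids the issue that raw likelihoods of different copula families are not directly comparable.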
Table 1.2: Proportion (%) of correctly identified copula family under the calibration functions $\eta_1(x)$, $\eta_2(x)$ and $\eta_3(x)$.

                        True marginals                 Nonparametric marginals
  Calibration    Clayton    Frank    Gumbel       Clayton    Frank    Gumbel
  eta1(x)
  eta2(x)
  eta3(x)
1.5 Application to Twin Birth Data

In this section, we consider the Matched Multiple Birth Dataset, which contains all US multiple births from 1995 to 2000. To be specific, we include live twin births with babies who survived beyond the first year and mothers aged between 18 and 40, and a random subset of 30 pairs of births at each week of gestational age ranging from 28 to 42 weeks. Of interest is the dependence between the twin birth weights (in grams), denoted by BW$_1$ and BW$_2$, conditional on the gestational age (in weeks), denoted by GA.

For completeness, we treat the unknown conditional marginals with both parametric and nonparametric estimation. In the former, we follow the suggestion of Acar et al. (2011) and fit a cubic polynomial model with response BW$_{ki}$ and covariate GA$_i$, for $k = 1, 2$ respectively. Denoting the fitted values of BW$_{ki}$ by $\hat\mu_k(\mathrm{GA}_i)$ and the error variance by $\hat\sigma_k^2$, we then calculate $\hat U_{ki} = \Phi[\hat\sigma_k^{-1}\{\mathrm{BW}_{ki} - \hat\mu_k(\mathrm{GA}_i)\}]$, $k = 1, 2$, $i = 1, \ldots, n$. For the nonparametric case, we compute $\hat U_{ki} = \hat F_{k|X}(\mathrm{BW}_{ki} \mid \mathrm{GA}_i)$, with $\hat F_{k|X}$ obtained by (1.9). The cross-validated likelihood (1.10) is used to tune the bandwidths for estimating the copula as well as the marginals, and the tuning parameter $\lambda_x$ is chosen by BIC. To visualize the marginal transforms, scatterplots with marginal histograms of BW$_{ki}$ are shown in Figure 1.3(a); the transformed data $\hat U_{ki}$ using the parametric and nonparametric marginals are given in Figures 1.3(b) and 1.3(c), respectively. We perform the proposed penalized estimation under three common Archimedean copula families: Clayton, Frank, and Gumbel. For comparison, we also obtain unpenalized local linear estimates using the parametric and nonparametric marginals, respectively.
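The parametric marginal transform described above amounts to a polynomial fit followed by a probability integral transform. A minimal sketch (ours; it assumes homoscedastic normal errors around the cubic fit, consistent with the description) is:

```python
import math
import numpy as np

def parametric_uniforms(bw, ga):
    """Cubic polynomial fit of birth weight on gestational age, then
    transform the standardized residuals through the normal c.d.f."""
    coef = np.polyfit(ga, bw, deg=3)          # fitted cubic mean function
    resid = bw - np.polyval(coef, ga)         # residuals BW - mu_hat(GA)
    sigma = resid.std()                       # error s.d. estimate
    # Phi(z) via the error function, applied elementwise
    return np.array([0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
                     for z in resid / sigma])
```

If the cubic-plus-normal model is adequate, the resulting $\hat U_{ki}$ are approximately uniform on $(0, 1)$, which is what Figure 1.3(b) is meant to convey visually.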
The copula parameter function is expressed in the form of Kendall's tau to have the same scale across different copula families, shown in Figure 1.4, along with the 95% bootstrap confidence bands. It is interesting to see that, in all cases, the penalized estimation features a constant dependence relationship between the twin birth weights for gestational ages of around 34 to 36 weeks. Such a phenomenon has not been revealed by other methods in previous studies. The copula selection, conducted using a two-fold CVPE, favours the Clayton family in both marginal settings. It is also noted that, based on 5-fold cross-validation with 20 random repetitions, the penalized estimates achieve higher likelihoods than the unpenalized estimates in all three families.

Figure 1.3: Scatterplots and histograms for the original twin birth weights (left) and the transformed responses using the parametric (middle) and nonparametric (right) marginal estimates, respectively.

Figure 1.4: The Kendall's tau of the estimated copula parameter functions under three Archimedean copulas: Clayton, Frank and Gumbel. The top panels correspond to parametric marginals, and the bottom panels to nonparametric marginals. In each panel, shown are the penalized estimate (solid) with 95% bootstrap confidence bands (dotted), as well as the unpenalized estimate (dot-dashed).
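The 95% bootstrap confidence bands in Figure 1.4 can be formed pointwise by re-estimating the Kendall's tau curve on resampled triples $(\mathrm{GA}_i, \mathrm{BW}_{1i}, \mathrm{BW}_{2i})$. A generic percentile-band sketch follows; the ordinary pairs bootstrap used here is an assumption for illustration and may differ from the exact resampling scheme in the thesis.

```python
import numpy as np

def bootstrap_bands(estimate_curve, x, y1, y2, grid,
                    n_boot=200, level=0.95, seed=0):
    """Pointwise percentile bootstrap bands for a curve estimator.

    estimate_curve(x, y1, y2, grid) must return the fitted curve on grid.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample triples with replacement
        curves[b] = estimate_curve(x[idx], y1[idx], y2[idx], grid)
    alpha = 1.0 - level
    lo = np.quantile(curves, alpha / 2, axis=0)
    hi = np.quantile(curves, 1 - alpha / 2, axis=0)
    return lo, hi
```

Any curve estimator with the stated signature can be plugged in, including the penalized Kendall's tau estimator, at the cost of `n_boot` refits.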
1.6 Proofs of Main Theorems

Proof of Theorem 1. Recall $a_n = P'_\lambda(|\tilde\beta_{01}|)$ and $b_n = (nh)^{-1/2}$, and let $\alpha_n = b_n + a_n$. With a slight abuse of notation, let $\ell_i(\tilde\beta) = \ell(g^{-1}\{\tilde\eta(x, X_i)\}; u_1, u_2)\big|_{u_1 = U_{1i},\, u_2 = U_{2i}}$, and similarly for $\ell'_i(\tilde\beta)$, $\ell''_i(\tilde\beta)$, $\ell'''_i(\tilde\beta)$ and $\ell^{(1,s)}_i(\tilde\beta)$ for $s = 1, 2$, i.e., suppress the last two arguments when no confusion arises, where $\tilde\eta(x, X_i) = s_x^{-1}\tilde\beta_0 + r_x^{-1}\tilde\beta_1(X_i - x)$. Denoting $Q(\tilde\beta) = Q(\tilde\beta; x)$, we aim to show that for any $\epsilon > 0$ there exists a sufficiently large constant $C$ such that

$P\{\sup_{\|v\| = C} Q(\tilde\beta_0 + \alpha_n v) < Q(\tilde\beta_0)\} \ge 1 - \epsilon,$    (1.12)

which implies $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(\alpha_n)$. Denote $L(\tilde\beta) = \sum_{i=1}^n \ell_i(\tilde\beta) K_h(X_i - x)/K_h(0)$. Then

$Q(\tilde\beta_0 + \alpha_n v) - Q(\tilde\beta_0) \le L(\tilde\beta_0 + \alpha_n v) - L(\tilde\beta_0) + m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\}$
$= \sum_{i=1}^n \{\ell_i(\tilde\beta_0 + \alpha_n v) - \ell_i(\tilde\beta_0)\} + m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\}$
$= \sum_{i=1}^n \sum_{r=0}^1 \ell'_i(\tilde\beta_0)\, \frac{K_h(X_i - x)}{K_h(0)}\, \frac{(X_i - x)^r}{r_x I(r{=}1) + s_x I(r{=}0)}\, \alpha_n v_r$
$\quad + \frac{1}{2} \sum_{i=1}^n \sum_{r=0}^1 \sum_{s=0}^1 \ell''_i(\tilde\beta_0)\, \frac{K_h(X_i - x)}{K_h(0)}\, \frac{(X_i - x)^r}{r_x I(r{=}1) + s_x I(r{=}0)}\, \frac{(X_i - x)^s}{r_x I(s{=}1) + s_x I(s{=}0)}\, \alpha_n^2 v_r v_s$
$\quad + \frac{1}{6} \sum_{i=1}^n \sum_{r=0}^1 \sum_{s=0}^1 \sum_{t=0}^1 \ell'''_i(\beta^*)\, \frac{K_h(X_i - x)}{K_h(0)}\, \frac{(X_i - x)^r}{r_x I(r{=}1) + s_x I(r{=}0)}\, \frac{(X_i - x)^s}{r_x I(s{=}1) + s_x I(s{=}0)}\, \frac{(X_i - x)^t}{r_x I(t{=}1) + s_x I(t{=}0)}\, \alpha_n^3 v_r v_s v_t$
$\quad + m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\}$
$= A_1 + A_2 + A_3 + A_4,$

where $\beta^*$ lies between $\tilde\beta_0$ and $\tilde\beta_0 + \alpha_n v$. We now show that $A_1 = O_p(nh\alpha_n^2)\,\|v\|$, $A_2 = -O_p(nh\alpha_n^2)\,\|v\|^2$, $A_3 = o_p(nh\alpha_n^2)\,\|v\|^2$ and $A_4 = o_p(nh\alpha_n^2)\,\|v\|^2$. Thus $A_2$ dominates
other terms and (1.12) holds. One has

$A_1 = \Big\{\sum_{i=1}^n \sum_{r=0}^1 \ell'_i(\tilde\beta_u)\, \frac{K_h(X_i - x)}{K_h(0)}\, \frac{(X_i - x)^r}{r_x I(r{=}1) + s_x I(r{=}0)}\, \alpha_n v_r\Big\}\{1 + o_p(1)\}$
$\quad + \frac{1}{2} \sum_{i=1}^n \sum_{r=0}^1 \sum_{s=0}^1 \ell''_i(\beta^{**})\, \frac{K_h(X_i - x)}{K_h(0)}\, \frac{(X_i - x)^r}{r_x I(r{=}1) + s_x I(r{=}0)}\, \frac{(X_i - x)^s}{r_x I(s{=}1) + s_x I(s{=}0)}\, (\tilde\beta_{us} - \tilde\beta_{0s})\, \alpha_n v_r$
$= A_5 + A_6,$

where $\beta^{**}$ lies between $\tilde\beta_0$ and $\tilde\beta_u$. It suffices to show that $A_5 = 0$ and $A_6 \le O_p(nh\alpha_n^2)\,\|v\|$, so that $A_1 \le O_p(nh\alpha_n^2)\,\|v\|$. As $\tilde\beta_u$ is the maximizer of $L(\tilde\beta)$, we have $\partial L(\tilde\beta)/\partial\tilde\beta\,\big|_{\tilde\beta = \tilde\beta_u} = 0$, i.e., $A_5 = 0$. For the term $A_6$, denote

$M_{irs} = \ell''_i(\beta^{**})\, \frac{K_h(X_i - x)}{K_h(0)}\, \frac{(X_i - x)^r}{r_x I(r{=}1) + s_x I(r{=}0)}\, \frac{(X_i - x)^s}{r_x I(s{=}1) + s_x I(s{=}0)}.$

Using a standard Taylor expansion, one can show that

$m_x = O_p(nh), \qquad s_x = O_p(1), \qquad r_x = O_p(h),$    (1.13)

and, for $r = 0, 1$,

$\frac{1}{n}\sum_{i=1}^n K_h(X_i - x)\,|X_i - x|^r = O_p(h^r), \qquad \frac{1}{n}\sum_{i=1}^n K_h(X_i - x)\,|X_i - x|^{r+2} = O_p(h^{r+2}).$    (1.14)

Furthermore, using Theorem 1 in Acar et al. (2011), we have $\tilde\beta_{u0} - \tilde\beta_{00} = O_p(b_n)$ and $\tilde\beta_{u1} - \tilde\beta_{01} = O_p(b_n)$, where $\tilde\beta_u = (\tilde\beta_{u0}, \tilde\beta_{u1})$ denotes the unpenalized local linear estimator of $\tilde\beta$. Then

$A_6 = \sum_{r=0}^1 \sum_{s=0}^1 \sum_{i=1}^n M_{irs}\, \alpha_n (\tilde\beta_{us} - \tilde\beta_{0s})\, v_r$
More informationSOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu
SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray
More informationCMSC858P Supervised Learning Methods
CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationHigh-dimensional Ordinary Least-squares Projection for Screening Variables
1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor
More informationPenalized Splines, Mixed Models, and Recent Large-Sample Results
Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong
More informationEcon 582 Nonparametric Regression
Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume
More informationEstimation of multivariate critical layers: Applications to rainfall data
Elena Di Bernardino, ICRA 6 / RISK 2015 () Estimation of Multivariate critical layers Barcelona, May 26-29, 2015 Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino,
More informationCURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University
CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite
More informationOn the Choice of Parametric Families of Copulas
On the Choice of Parametric Families of Copulas Radu Craiu Department of Statistics University of Toronto Collaborators: Mariana Craiu, University Politehnica, Bucharest Vienna, July 2008 Outline 1 Brief
More informationMinimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.
Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationAsymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands
Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationLecture 6: Methods for high-dimensional problems
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationarxiv: v2 [stat.me] 4 Jun 2016
Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates 1 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates Ben Sherwood arxiv:1510.00094v2
More informationReview and continuation from last week Properties of MLEs
Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationDiscussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon
Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationTECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection
DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model
More informationTime Series and Forecasting Lecture 4 NonLinear Time Series
Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations
More informationSTATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify
More information