A Statistical Test for Mixture Detection with Application to Component Identification in Multi-dimensional Biomolecular NMR Studies
Nicoleta Serban(1)
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
nserban@isye.gatech.edu

Pengfei Li
Department of Statistics and Actuarial Science
University of Waterloo
pengfei.li@uwaterloo.ca

Abstract: In this paper, we introduce a statistical hypothesis test for detecting mixtures in a regression model whose regression function is a weighted sum of multi-dimensional unimodal functions. Two regression components are mixed when the distance between their centers is small or when the proportion of their contribution to the mixture is close to zero or one. Two challenges in model estimation under the null hypothesis of one regression component are that the mixing proportion lies on the boundary of the parameter space and that the parameters are non-identifiable. Consequently, the parameter estimators derived from standard nonlinear estimation approaches are inconsistent and unstable. To overcome these challenges, we study a penalized regression test statistic with a relatively simple quadratic approximation which can be used to simulate the quantiles of the test statistic under the null hypothesis. We also show that the parameter estimators obtained under the penalized regression approach in this paper are consistent. The motivating application is the detection of mixed components in multi-dimensional biomolecular Nuclear Magnetic Resonance (NMR) data.

Keywords: biomolecular NMR, density mixture, mixture test, multi-dimensional mixture regression, likelihood ratio test.

(1) Corresponding author
1 Introduction

In this paper, we focus on a specific nonlinear regression model described by

Z_{i_1,...,i_d} = Σ_{l=1}^{L} A_l s(x_{i_1},...,x_{i_d}; ω_l, τ_l) + σ ε_{i_1,...,i_d},  i_1 = 1,...,M_1, ..., i_d = 1,...,M_d,  (1)

where the multi-dimensional regression component A_l s(x_{i_1},...,x_{i_d}; ω_l, τ_l) is a parametric function uniquely identified by a set of parameters describing the amplitude (A_l), the width (τ_l = (τ_{l1},...,τ_{ld})ᵀ) and the location (ω_l = (ω_{l1},...,ω_{ld})ᵀ) of the regression component. The regression function is nonlinear in the location and width parameters. The shape function s is assumed to be known, symmetric and unimodal. The d-dimensional design points (x_{i_1},...,x_{i_d}) are fixed, equally spaced and observed over a compact set. For brevity, we assume (x_{i_1},...,x_{i_d}) ∈ [0, 1]^d. It follows that the location parameters ω_l, l = 1,...,L, also fall within the same compact set [0, 1]^d. The error terms ε_{i_1,...,i_d} are commonly assumed to be independent and identically distributed. The observations Z_{i_1,...,i_d} are intensity values.

Each regression component in the model (1) will be revealed as a local maximum in the observed intensity data Z_{i_1,...,i_d} when the distance between components is large and when the signal-to-noise level is high (the amplitudes are well away from the noise level). That is, under these assumptions, there is a one-to-one mapping between the regression components and the local maxima in the data. However, in many applications, such as the motivating NMR case study, the distance between components is small, and therefore there will be mixed regression components that overlap into one local maximum. In Figure 1, we show an example of two-dimensional overlapped regression components; because the distance between components is small, the two components merge into one local maximum. It is this type of mixed components that is of interest in this paper.
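The merging behavior can be illustrated numerically. The sketch below is a hypothetical one-dimensional analogue of the situation in Figure 1, using a Lorentzian shape function; the centers, width, and grid are illustrative choices, not values taken from the paper. It counts strict local maxima for a well-separated and a nearby pair of components:

```python
import numpy as np

def lorentzian_1d(x, center, width):
    """Symmetric unimodal shape function s (a 1-D Lorentzian)."""
    return 1.0 / (((x - center) / width) ** 2 + 1.0)

def count_local_maxima(z):
    """Count strict interior local maxima of a sampled signal."""
    return int(np.sum((z[1:-1] > z[:-2]) & (z[1:-1] > z[2:])))

x = np.linspace(0.0, 1.0, 512)
width = 0.05

# Well-separated centers: each component shows up as its own local maximum.
z_far = lorentzian_1d(x, 0.3, width) + lorentzian_1d(x, 0.7, width)
# Nearby centers: the two components merge into a single local maximum.
z_near = lorentzian_1d(x, 0.48, width) + lorentzian_1d(x, 0.52, width)

print(count_local_maxima(z_far), count_local_maxima(z_near))  # 2 1
```

The second pair is exactly the problematic case: one local maximum in the data, two underlying components.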
To identify statistically significant mixed components, we introduce a hypothesis test for
mixtures in the regression model described in (1). In practice, application of the mixture test involves two steps: 1. apply a local maxima identification method (Serban, 2007, 2010, and the references therein); and 2. apply the testing procedure in this paper to the local intensity data of each local maximum to decide whether it maps to one or more regression components. In NMR data analysis, the evaluation of local maxima for detection of overlapped components is commonly performed manually. The mixture statistical test introduced in this paper will reduce this manual intervention, which is error-prone and time-consuming.

Figure 1: An example of two-dimensional overlapped regression components: contour and perspective plots.

For illustration of the mixture hypothesis test, we consider a specific parametric function s, the Lorentzian function, commonly used to model Nuclear Magnetic Resonance (NMR) data (see Section 5). However, our theoretical results apply to other symmetric unimodal shape functions as long as the regression components are identifiable. Under the assumption of a mixture of Lorentzian regression components, the model
becomes

Z^{(l)} = α (A / Π_{s=1}^d τ_s) Π_{s=1}^d {(x_{i_s} − ω_{s1})²/τ_s² + 1}^{−1} + (1 − α) (A / Π_{s=1}^d τ_s) Π_{s=1}^d {(x_{i_s} − ω_{s2})²/τ_s² + 1}^{−1} + σε,  (2)

where Z^{(l)} are observations from a local region of the complete data Z_{i_1,...,i_d}. For simplicity of notation, we will refer to Z^{(l)} as Z_{i_1,...,i_d}, but we note that we will apply the hypothesis test locally, since the full data will generally consist of a large number of regression components. We do not need to correct for multiplicity in testing for two components across multiple regions of the complete data since the testing is not simultaneous.

In model (2), the amplitude parameters A_1 and A_2 of the two components are uniquely defined by the parameters A and α (A = A_1 + A_2 and α = A_1/(A_1 + A_2)). We re-parameterize the model as in (2) to express the null hypothesis with respect to one parameter, α, rather than the two parameters A_1 and A_2. Under this modeling framework, the null hypothesis is

H_0 : {α = 0 or α = 1 or ω_1 = ω_2},

where ω_1 = (ω_{11},...,ω_{d1})ᵀ and ω_2 = (ω_{12},...,ω_{d2})ᵀ are the location parameters of the components that are tested for mixture.

For testing the null hypothesis of one regression component, we assume that the width parameters τ_s and the error variance σ² are known and fixed. The assumption of fixed σ² implies that the signal-to-noise ratio is fixed, and therefore the false discovery rate of the regression components is also fixed. At large values of σ², small-amplitude components will not be detectable from the noise level; that is, σ² needs to be well below A max{α, 1 − α} / Π_{s=1}^d τ_s. On the other hand, the width parameters change the shape of the regression function: for fixed parameters α and A but large values of τ_s, the heights of the two components decrease and the tails become fatter, which reduces the identifiability of the model parameters. Therefore, the assumption of fixed widths ensures some level of identifiability of the regression components.
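As a concrete reading of model (2), the sketch below evaluates the two-component mean surface on a d-dimensional grid. The function names and the parameter values in the usage lines are illustrative choices, not values from the paper:

```python
import numpy as np

def lorentzian_component(grid_axes, center, widths):
    """Separable d-dim Lorentzian: prod_s 1 / ((x_s - omega_s)^2 / tau_s^2 + 1)."""
    mesh = np.meshgrid(*grid_axes, indexing="ij")
    out = np.ones_like(mesh[0])
    for xs, omega_s, tau_s in zip(mesh, center, widths):
        out *= 1.0 / (((xs - omega_s) / tau_s) ** 2 + 1.0)
    return out

def mixture_mean(grid_axes, A, alpha, omega1, omega2, widths):
    """Mean surface of model (2): A / prod(tau) times an alpha-weighted
    sum of two Lorentzian components sharing the width vector tau."""
    scale = A / np.prod(widths)
    return scale * (alpha * lorentzian_component(grid_axes, omega1, widths)
                    + (1.0 - alpha) * lorentzian_component(grid_axes, omega2, widths))

# illustrative values: d = 2, A = 10, alpha = 0.3
axes = [np.linspace(0.0, 1.0, 64)] * 2
surface = mixture_mean(axes, A=10.0, alpha=0.3, omega1=(0.3, 0.3),
                       omega2=(0.7, 0.7), widths=(0.04, 0.06))
```

Under this parameterization the component amplitudes are A_1 = αA and A_2 = (1 − α)A, so setting α to 0 or 1, or ω_1 = ω_2, collapses the surface to a single component, which is exactly the null hypothesis H_0 of model (2).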
The assumption of fixed width parameters and error variance is common practice (Koradi et al., 1998; Malmodin et al., 2003). These parameters are commonly estimated from the complete data (all regression components) before applying the hypothesis testing procedure, as we discuss in Section 3. Estimating these parameters from the complete data rather than locally provides more accurate estimates since we pool information from multiple regression components.

We highlight here that the model (1) is not a mixture of regressions but a regression model in which the regression function is a weighted sum of components; therefore, the existing research on detecting mixtures of densities (Titterington et al., 1985; Müller and Sawitzki, 1991; Roeder, 1994; Chen and Kalbfleisch, 1996; Walther, 2004; Chen and Li, 2009; Li et al., 2009, and the references therein) does not apply to the statistical framework introduced in this paper.

A widespread test statistic used in detecting density mixtures is the likelihood ratio test (LRT). Similar to the problem of detecting mixtures of densities, the LRT has its limitations under the regression framework discussed in this paper. Under the null hypothesis, the model parameters are non-identifiable since the null hypothesis depends on two parametric conditions involving α, ω_1 and ω_2. Moreover, the (mixing) proportion parameter α is on the boundary of the parameter space since it takes the values 0 or 1. Due to these two irregularities, standard likelihood-based estimation (nonlinear least squares estimation under the normality assumption) provides inconsistent and unstable parameter estimators (Jennrich, 1969). An alternative approach to the LRT is to place a prior distribution on the parameter α; similarly to Aitkin and Rubin (1985), the mixing proportion becomes a nuisance parameter which is then integrated out.
However, under the regression framework one regularity condition for the asymptotic distribution of the likelihood ratio test does not hold
for the Aitkin and Rubin (1985) approach (Serban, 2005). Different re-parameterizations of the mixture regression model may also provide identifiable parameters under the null hypothesis. For mixtures of densities, re-parameterization has been investigated under the assumption that the true null parameter is known (Lemdani and Pons, 1999) or unknown (Dacunha-Castelle and Gassiat, 1997; Liu and Shao, 2003). Other approaches impose constraints on the density means (Ghosh and Sen, 1985) or on the mixing proportion parameters (Chen et al., 2001; Chen and Kalbfleisch, 2005).

In detecting mixtures in the multi-dimensional regression model described in this paper, we proceed with the latter approach; that is, we constrain the mixing proportion parameter α to be away from the boundary values under the null hypothesis. This constraint overcomes both the identifiability problem and the estimation of the mixing proportion on the boundary of the parameter space. The constraint is defined through a penalty function p(α) which attains its maximum at α = 1/2; under the null hypothesis, the penalty term is 0. To obtain the penalized LR, we therefore need to employ penalized nonlinear regression under the null and alternative hypotheses. Consequently, we introduce an estimation approach that uses the idea of an Expectation-Maximization-type algorithm (Chen and Li, 2009; Li et al., 2009). In this paper, we show the consistency of the parameter estimates under the proposed estimation approach.

One difficulty in penalized likelihood ratio (PLR) testing for mixture detection is deriving the (asymptotic) distribution of the test statistic under the null hypothesis. For density mixture detection, the test statistic commonly has the asymptotic mixture distribution 0.5χ²₀ + 0.5χ²₁ (Chen and Kalbfleisch, 1996; Chen et al., 2001; Chen and Li, 2009; Li et al., 2009).
In contrast, in hypothesis testing for mixtures in the regression function, the asymptotic distribution of the PLR test statistic under the null hypothesis
does not have a closed-form expression. In this paper, we investigate a test statistic derived using an approach similar to the EM-test statistic recommended by Chen and Li (2009) and Li et al. (2009). Because the asymptotic distribution of the EM-test does not have a closed-form expression, we derive a quadratic approximation for the distribution of the EM-test statistic and use this distribution to obtain the quantiles of the null distribution of the test statistic via sampling techniques.

We discuss the testing procedure in Section 2, along with asymptotic results in Section 2.2. In Section 3, we describe the application of the mixture test to the identification of overlapped regression components in the more general model with L > 2. In a simulation study in Section 4, we evaluate the accuracy (type I error) and the efficiency (power) of the testing procedure. The statistical application investigated in this paper is pertinent to the study of three-dimensional protein structure determination using NMR. We introduce the NMR biomolecular studies and the application of the proposed testing procedure to two- and three-dimensional NMR data in Section 5.

2 Testing Procedure

Define the likelihood function l(α, A, ω_1, ω_2) for the regression model in (2). Under the assumption of a normal error distribution, the log-likelihood function is proportional to

− (1/(2σ²)) Σ_{i_1=1}^{M_1} ... Σ_{i_d=1}^{M_d} (Z_{i_1,...,i_d} − Aα s(x_{i_1},...,x_{i_d}; ω_1, τ) − A(1 − α) s(x_{i_1},...,x_{i_d}; ω_2, τ))²,  (3)

where α is a weight parameter and must lie in the interval [0, 1]. We penalize the log-likelihood function using a penalty p(α), a continuous function that is maximized at 0.5 and goes to negative infinity as α goes to 0 or 1. Without loss of generality, we
assume that α ∈ [0, 0.5]. Define the penalized log-likelihood function as follows:

pl(α, A, ω_1, ω_2) = l(α, A, ω_1, ω_2) + p(α).  (4)

2.1 EM-test Statistic

In this section, we introduce a test statistic which measures the discrepancy between the null hypothesis H_0 of one regression component and the observed data. The test statistic is a version of the EM-test introduced by Chen and Li (2009) and Li et al. (2009), extended to our regression framework. The primary motivation for using this test statistic is the efficiency of the LRT under the assumption of a fixed α = α_0. Since α is not fixed, we instead proceed with an EM-type algorithm: at the M-step, we hold α fixed and estimate the other model parameters, and at the E-step, we update the parameter α; we then repeat the E- and M-steps to obtain a more suitable mixing proportion, which improves the power. Only a small number of iterations, K, is used. The asymptotic approximation of the null distribution holds under finite K. The procedure to derive the EM-test statistic is described below.

Step 0: Estimate the scaling parameter A and the location parameters ω_1 = ω_2 under the null model by maximizing the penalized likelihood function in (4). Denote

(Â_0, ω̂_0) = arg max_{A, ω} pl(α = 0.5, A, ω, ω).

Step 1: In the next steps, we obtain estimates for the amplitude and location parameters under the alternative hypothesis, given initial values for the mixing proportion. Choose a number of initial values for α: α_1,...,α_J in (0, 0.5]. For each α_j value, we obtain estimates for the parameters A, ω_1, ω_2 using an iterative algorithm similar in its steps to the EM algorithm. Let α_j^(0) = α_j for j = 1,...,J and start with the first iteration k = 1, where
k ≤ K (K is the maximum number of iterations).

Step 1.1. Estimate A, ω_1, ω_2 by maximizing the penalized likelihood function holding the proportion parameter fixed at α_j^(k−1), i.e.,

(A_j^(k), ω_{j,1}^(k), ω_{j,2}^(k)) = arg max_{A, ω_1, ω_2} pl(α_j^(k−1), A, ω_1, ω_2).

Step 1.2. Update the mixing proportion parameter α_j^(k) by minimizing

g(α) = C_1 α² − 2 C_2 α − p(α),  (5)

where

C_1 = (1/(2σ²)) Σ_{i_1=1}^{M_1} ... Σ_{i_d=1}^{M_d} {A_j^(k) s(x_{i_1},...,x_{i_d}; ω_{j,1}^(k), τ) − A_j^(k) s(x_{i_1},...,x_{i_d}; ω_{j,2}^(k), τ)}²,

C_2 = (1/(2σ²)) Σ_{i_1=1}^{M_1} ... Σ_{i_d=1}^{M_d} {Z_{i_1,...,i_d} − A_j^(k) s(x_{i_1},...,x_{i_d}; ω_{j,2}^(k), τ)} {A_j^(k) s(x_{i_1},...,x_{i_d}; ω_{j,1}^(k), τ) − A_j^(k) s(x_{i_1},...,x_{i_d}; ω_{j,2}^(k), τ)}.

Step 1.3. Let k = k + 1, and iterate Step 1.1 and Step 1.2 while k ≤ K.

Step 2: For j = 1,...,J, define the statistic, which depends on the initial value α_j,

M_j^(K)(α_j) = 2{pl(α_j^(K), A_j^(K), ω_{j,1}^(K), ω_{j,2}^(K)) − pl(0.5, Â_0, ω̂_0, ω̂_0)}.

The EM-test statistic is defined as

EM^(K) = max{M_j^(K)(α_j), j = 1,...,J}.

We reject the null hypothesis if EM^(K) is greater than a specified critical value, which is determined from the limiting distribution of the test statistic (see Section 2.2). In our empirical studies, the maximum number of iterations is K = 1; a larger number of iterations does not enhance the efficiency of the testing procedure.

For the EM-test statistic, we need to specify the initial values of the mixing parameter, the penalty p(α) and the penalizing constant C. In general, we recommend starting with a small number of initial values for the mixing parameter, for example, α ∈ {0.1, 0.2, 0.3, 0.4, 0.5}. Our empirical studies showed that a larger number of initial values for the mixing parameter does not improve the accuracy of the testing procedure.

When selecting the penalty function p(α), we need to consider two important criteria. First, the equation used to update the mixing parameter in (5) does not have a minimum for every penalty function. We show in the Appendix that this equation has a unique minimum for two commonly used penalties, p(α) = C log(1 − |1 − 2α|) (Li et al., 2009) and p(α) = C log(4α(1 − α)) (Chen et al., 2001), for any constant C > 0. Second, the choice of the penalty function affects the trade-off between the type I error and the power of the test, as well as the accuracy of the approximation under the null hypothesis. The penalty p(α) = C log(1 − |1 − 2α|) is a lasso-type penalty (Li et al., 2009); it is continuous but not differentiable at α = 0.5, the null value. As pointed out by Li et al. (2009) and Chen and Li (2009), this penalty has properties similar to the lasso penalty for linear regression; consequently, the probability that the fitted value of α equals 1/2 is positive. In contrast, the penalty p(α) = C log(4α(1 − α)) is smooth for all α. Following the suggestion in Li et al. (2009), we use p(α) = C log(1 − |1 − 2α|) in the implementation of the EM-test procedure.

The constant C controls the penalization of the LRT when α → 1 or α → 0. The selection of the penalty constant C significantly affects the reliability (type I error) and the precision (power) of the test: the smaller C is, the less penalized the null hypothesis is, leading to a higher type I error but enhanced power. In our empirical studies, the optimal value for C is of order σ² N log(N), where σ² is the variance of the error, which is assumed known, and N is the sample size of the data; using the notation in equation (2), N = M_1 M_2 ... M_d. The intuition behind this penalty constant is as follows.
Assume that we know the true parameters A, α, ω_1, ω_2 and replace them in the least squares sum in (3). The result is a
sum of N squared normal errors. We guide our selection of C using the following property of a sequence of normal random variables:

P( sup_{i=1,...,n} |X_i| > σ √(2 log n) ) → 1 as n → ∞.

Although this result does not provide a limiting condition for Σ_{i=1}^N X_i², it can be used to assess its magnitude, which is in the range of 2Nσ² log(N).

An alternative test statistic that can be used to detect mixtures in the regression function is the modified likelihood ratio test (MLRT) in the spirit of Chen et al. (2001). The MLRT can be viewed as a limiting case of the EM-test statistic when the iteration number K goes to infinity. Theorem 1 below indicates that a finite number of iterations will only change the parameter estimates by o_p(1). Therefore, Theorem 1 guarantees, under a finite number of iterations, that the estimators of A, ω_1 and ω_2 are in a small neighborhood of the true parameter values, so the quadratic approximation of the EM-test becomes valid. On the other hand, based on the same result, we cannot infer how the test statistic behaves when K → ∞; an infinite number of iterations may lead to intractable theoretical results.

2.2 Asymptotic Results

We first introduce the notation used in the asymptotic results of this section. Define

U_{i_1,...,i_d} = 1 / Π_{s=1}^d {(x_{i_s} − ω_s)²/τ_s² + 1},

V_{i_1,...,i_d; s} = [2(x_{i_s} − ω_s)/τ_s] / [(x_{i_s} − ω_s)²/τ_s² + 1] · U_{i_1,...,i_d},

W_{i_1,...,i_d; s} = [3(x_{i_s} − ω_s)²/τ_s² − 1] / [(x_{i_s} − ω_s)²/τ_s² + 1]² · U_{i_1,...,i_d},

T_{i_1,...,i_d; st} = [4(x_{i_s} − ω_s)(x_{i_t} − ω_t)/(τ_s τ_t)] / ([(x_{i_s} − ω_s)²/τ_s² + 1][(x_{i_t} − ω_t)²/τ_t² + 1]) · U_{i_1,...,i_d},

where ω = (ω_1,...,ω_d)ᵀ is the location of the regression component under the null hypothesis. The notation above is based on the assumption that the shape function takes
the form of the Lorentzian function. For other shape functions, U_{i_1,...,i_d} is the regression function under the null hypothesis, V_{i_1,...,i_d; s} are the first derivatives of the shape function, and W_{i_1,...,i_d; s} and T_{i_1,...,i_d; st} are the second derivatives of the shape function. Further denote the vectors

a_{i_1,...,i_d} = (U_{i_1,...,i_d}, V_{i_1,...,i_d; 1}, ..., V_{i_1,...,i_d; d})ᵀ,
b_{i_1,...,i_d} = (W_{i_1,...,i_d; 1}, ..., W_{i_1,...,i_d; d}, T_{i_1,...,i_d; 12}, ..., T_{i_1,...,i_d; (d−1)d})ᵀ,
c_{i_1,...,i_d} = (a_{i_1,...,i_d}ᵀ, b_{i_1,...,i_d}ᵀ)ᵀ.

Using the notation above, we define the matrix

B = Σ_{i_1=1}^{M_1} ... Σ_{i_d=1}^{M_d} c_{i_1,...,i_d} c_{i_1,...,i_d}ᵀ,  (6)

which can be further decomposed into

B_11 = Σ a_{i_1,...,i_d} a_{i_1,...,i_d}ᵀ,  B_21 = B_12ᵀ = Σ b_{i_1,...,i_d} a_{i_1,...,i_d}ᵀ,  B_22 = Σ b_{i_1,...,i_d} b_{i_1,...,i_d}ᵀ,

where each sum runs over i_1 = 1,...,M_1, ..., i_d = 1,...,M_d. Further, we denote B̃_22 = B_22 − B_21 B_11⁻¹ B_21ᵀ.

The asymptotic properties depend on the following three conditions.

C1 The penalty function p(α) is a continuous function that is maximized at α = 0.5 and goes to negative infinity as α goes to 0 or 1.

C2 The matrix N⁻¹ B converges to some positive definite matrix as N = Π_s M_s → ∞.

C3 There exists a W > 0 such that ω_{s1}, ω_{s2} ∈ [−W, W] for s = 1,...,d.

In fixed-design nonlinear regression models, one regularity condition is that the matrix N⁻¹ B_11 converges to a positive definite matrix. Condition C2 is stronger than this assumption; it is related to the strong identifiability condition proposed in Chen (1995). This condition
ensures that the estimates of ω_1 and ω_2 have the optimal convergence rate. It can be verified that if the design points x_{i_s}, for i_s = 1,...,M_s, are evenly spaced over [0, 1], then Condition C2 is satisfied.

Theorem 1 ensures that under the null hypothesis the penalized likelihood approach provides consistent parameter estimators.

Theorem 1. Assume Conditions C1, C2 and C3 are satisfied. Under the null hypothesis, we have, for j = 1,...,J and any fixed finite K,

p(α_j^(K)) = O_p(1),  A_j^(K) − A = O_p(N^(−1/2)),  ω_{j,1}^(K) − ω = O_p(N^(−1/4)),  ω_{j,2}^(K) − ω = O_p(N^(−1/4)).

Based on this consistency result, we derive the following quadratic approximation of the EM-test statistic. Because the asymptotic distribution of the penalized likelihood ratio test does not have a closed-form expression, we use this approximation to obtain the null quantiles.

Theorem 2. Assume Conditions C1, C2 and C3 are satisfied and α_1 = 0.5. Let w be a multivariate normal random vector with mean vector 0 and covariance matrix B̃_22. Under the null hypothesis, we have, for any fixed finite K, as N → ∞,

EM^(K) = sup_θ {2 θ̃ᵀ w − θ̃ᵀ B̃_22 θ̃} + o_p(1),

with θ̃ = (θ_1²,...,θ_d², θ_1θ_2,...,θ_{d−1}θ_d)ᵀ and θ = (θ_1,...,θ_d)ᵀ ∈ R^d.

The distribution of the quadratic approximation is difficult to find. To overcome this difficulty, we suggest the following algorithm to simulate the null distribution quantiles.

1. Generate S random vectors {w^(s), s = 1,...,S} from the multivariate normal distribution with mean 0 and covariance matrix B̃_22.
2. For each vector w^(s), calculate Q_s = sup_θ {2 θ̃ᵀ w^(s) − θ̃ᵀ B̃_22 θ̃}. Compute the quantiles of Q_1,...,Q_S and use them to approximate the quantiles of EM^(K).

The proofs of Theorem 1 and Theorem 2 are in the Appendix of this paper.

Remark 1. Our asymptotic results apply to penalties such that p(0.5) = 0, as long as C > 0. If the penalty constant is selected such that C_N → ∞ as N → ∞, for example the suggested penalty constant C_N = σ² N log(N), the penalty term converges to 0 in probability and does not contribute to the EM-test statistic. This does not change the asymptotic properties of the EM-test statistic.

Remark 2. Since our testing problem in (2) is not a homogeneity test in a mixture of densities but a homogeneity test in the mean function of a nonlinear regression, the asymptotic results in Li et al. (2009) and Chen and Li (2009) do not apply to our testing problem in general. However, when d = 1, the limiting distribution of EM^(K) for testing homogeneity in the mean function of a nonlinear regression is 0.5χ²₀ + 0.5χ²₁, which is the same as the limiting distribution for detecting homogeneity in a mixture of densities (Li et al., 2009).

3 Application of Mixture Test

In real applications of the regression model in (1), the primary goal is to accurately identify the regression components and estimate the model parameters. Existing research in this field commonly applies a local maxima identification method to the multiple-component data (Serban, 2007, 2010, and the references therein), followed by manual intervention for assessing the regression components. In our notation, a local maximum is at location (x_{i_1},...,x_{i_d}) if Z_{i_1,...,i_d} is larger than its immediate intensity neighbors Z_{i_1±1,...,i_d}, ..., Z_{i_1,...,i_d±1}. A visual
equivalent of a local maximum in one- and two-dimensional data is a peak/spike. The set of local maxima are initial candidates for the regression components in model (1). A local maximum may correspond to zero (false positive), one, or more regression components. We propose using the mixture test introduced in this paper to assess whether a local maximum corresponds to one or more regression components. The general procedure for the application of the mixture test to the more general model (1) takes three steps:

Step 1. We first apply a local maxima identification method. We reduce the complete intensity data to smaller regions, each region corresponding to the surrounding intensity values of one local maximum. This data segmentation step involves an intermediary step at which we apply wavelet-based denoising; local maxima identification is applied to the denoised data to reduce the number of false positives (Serban, 2010). At this intermediary step, we also estimate the noise variance σ² using the median absolute deviation estimator suggested by Donoho (1995).

Step 2. We fit a model with one regression component (L = 1) to each local maximum region and obtain estimates for τ. We obtain a common estimate for the width parameter by taking the median over all estimates of τ; this estimate of τ relies on the assumption that only a few regression components are mixed. This step also provides initial estimates of the amplitude and the location parameters.

Step 3. Fixing σ² and τ at the estimates obtained in the previous two steps, we apply the testing procedure to each region, where the regions may overlap. The size of a region in a d-dimensional design is M^d, where M is the closest integer to the 90% quantile of the Cauchy distribution with spread min_{s=1,...,d} {τ̂_s}. For each region, we decide whether it contains two regression components, i.e., whether to reject the null hypothesis.
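The region size in Step 3 can be computed directly from the Cauchy quantile. The sketch below assumes, hypothetically, that the estimated widths are expressed in grid-index units; the paper does not state the units explicitly:

```python
import math

def region_size(tau_hats, d, q=0.9):
    """Edge length M of a local test region (the q-quantile of a centered
    Cauchy distribution with scale min_s tau_s, rounded to the nearest
    integer) and the region size M**d, as in Step 3."""
    gamma = min(tau_hats)
    M = round(gamma * math.tan(math.pi * (q - 0.5)))
    return M, M ** d

# illustrative widths: ~5.5 grid points give an edge length of M = 17
print(region_size([5.5, 8.0], d=2))
```

Since the 90% quantile of a standard Cauchy is tan(0.4π) ≈ 3.08, the region edge is roughly three times the smallest estimated width.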
Remark 3. In Step 3, the values of σ² and τ are fixed at the same values for all local maxima.
Therefore, the asymptotic results for fixed σ² are the more relevant ones for this application of the testing procedure.

Remark 4. It is important to note that a number of the discoveries or local maxima are false positives, which arise due to the additive noise, and therefore the mixture test will be applied to true as well as to false positives. In our experiments (simulation and real data), when the mixture test is applied to data consisting of noise only (false positives, or A = 0), we consistently encounter convergence problems in Step 1.1 of the EM-test statistic computation. We can use this observation as an indication of the presence of a false positive, which can be used in screening out the false positives.

4 Simulation Study

In this section, we discuss a simulation study for evaluating the reliability (type I error) and efficiency (power) of the peak mixture test. The significance level is 0.05. We compare the testing procedure introduced in this paper for varying values of the penalization constant C and against an additional, simpler testing procedure, called the over-dispersion test.

Over-dispersion test. The over-dispersion test is often used to test one component versus more than one component or mixtures of densities (Neyman and Scott, 1966; Lindsay, 1995). Its extension to our specific testing problem is as follows. Let S(A, ω) denote the least squares sum:

S(A, ω) = Σ_{i_1,...,i_d} ( Z_{i_1,...,i_d} − (A / Π_{s=1}^d τ_s) Π_{s=1}^d {(x_{i_s} − ω_s)²/τ_s² + 1}^{−1} )².

Under the null hypothesis, S(A, ω) only contains information about random variation, whereas under the alternative hypothesis, S(A, ω) contains two types of information: random
variation and the variation due to the misspecification of the mean function. Assuming σ known, we can use S(A, ω) to construct a test statistic as follows:

T = [ S(Â_0, ω̂_0)/σ² − (N − 1 − d) ] / √(2(N − 1 − d)).

Under the null hypothesis, the test statistic is asymptotically normal as N → ∞. Specifically, if the data are from the one-component model, then T converges to N(0, 1). If the data are from a model consisting of more than one component, then T tends to be large. Therefore, we reject the null hypothesis of one regression component if T is greater than the upper quantile of N(0, 1).

Simulation setting. We simulate data following the general model in equation (1). In this model, the parameters of the l-th component are the location parameter ω_l = (ω_{1l},...,ω_{dl})ᵀ, the width parameter τ_l = (τ_{1l},...,τ_{dl})ᵀ, and the amplitude parameter A_l. We simulate data in two (d = 2) and three (d = 3) dimensions. In our simulation study, we assume that the shape function s is the Lorentzian function. Therefore, the simulation model is

Z_{i_1,...,i_d} = Σ_{l=1}^L (A_l / Π_{s=1}^d τ_{sl}) Π_{s=1}^d {(x_{i_s} − ω_{sl})²/τ_{sl}² + 1}^{−1} + σε.  (7)

The simulation parameters are as follows. The amplitudes A_l, l = 1,...,L, vary in the interval [1, 10], and the noise standard error is set to σ = 2, 5, 10 or 15. The width parameters are τ_1 = 0.4 and τ_2 = 0.6 for the two-dimensional simulation, with τ_3 = 0.8 added for the three-dimensional simulated data. The number of Lorentzian components is L = 50, placed on a 512 × 256 grid of points for d = 2 (i.e., for the two-dimensional study, x_{i_1} ∈ {1/512,...,1} and x_{i_2} ∈ {1/256,...,1}) and on a similar grid for d = 3. We simulate the error terms ε_{i_1,...,i_d} ~ N(0, 1) independently.
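The simulation model (7) can be sketched as follows; the component parameters in the example call are illustrative, not the exact values of the study:

```python
import numpy as np

def simulate_spectrum(grid_axes, components, sigma, rng):
    """Draw one dataset from model (7): a sum of separable Lorentzian
    components plus i.i.d. N(0, sigma^2) noise on a d-dimensional grid."""
    mesh = np.meshgrid(*grid_axes, indexing="ij")
    z = np.zeros_like(mesh[0])
    for A, omega, tau in components:
        comp = np.ones_like(z)
        for xs, w, t in zip(mesh, omega, tau):
            comp *= 1.0 / (((xs - w) / t) ** 2 + 1.0)
        z += (A / np.prod(tau)) * comp
    return z + sigma * rng.standard_normal(z.shape)

# two illustrative components on a 128 x 128 grid
axes = [np.linspace(0.0, 1.0, 128)] * 2
comps = [(5.0, (0.3, 0.3), (0.04, 0.06)),
         (8.0, (0.7, 0.6), (0.04, 0.06))]
z = simulate_spectrum(axes, comps, sigma=2.0, rng=np.random.default_rng(0))
```

With sigma set to 0, the peak of a single isolated component sits at its amplitude over the product of its widths, which makes the generator easy to sanity-check.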
Reliability. To evaluate the reliability of the EM-test introduced in this paper, we simulate from the model in (1) a dataset with L = 50 well-separated regression components; that is, the distance between any two regression components i and j satisfies ||ω_i − ω_j|| > 1 for i ≠ j. We first estimate the width and variance parameters from all 50 regression components following the first two steps described in Section 3. We apply the EM-test to the region of each regression component as in Step 3 of Section 3 and compare the EM-test values with the 95% approximate quantile of the null distribution (see Section 2.2 for the procedure used to compute the null quantiles, with S = 1,000). The type I error is computed as the proportion of rejections among the 50 regression components. The penalty function used in computing the EM-test statistic is p(α) = C log(1 − |1 − 2α|) with varying penalizing constants: C = Nσ² log(N) (optimal), C = Nσ² (medium) and C = σ² log(N) (small). The initial values for the mixing proportion are α ∈ {0.1, 0.2, 0.3, 0.4, 0.5}. Figure 3 shows the type I error for the d = 2 and d = 3 simulations. For both d = 2 and d = 3, the type I error is between 0.05 and 0.1 for medium values of σ² at C = Nσ² log(N). As the penalizing constant decreases, the test becomes significantly unreliable.

Efficiency. To evaluate the efficiency of the EM-test introduced in this paper, we simulate 25 pairs of regression components, for a total of L = 50 components. The distance between the two regression components in a pair satisfies 2 < ||ω_i − ω_j|| < 4. That is, we generate only from the alternative hypothesis, with varying mixing proportions and varying distances between the regression components. We estimate the variance parameter from all 50 regression components but use the true width values. We apply the EM-test to the region of each pair of regression components and compare the EM-test values with the 95% approximate quantile of the null distribution (see Section 2.2 for the procedure used to compute the null quantiles, with S = 1,000).
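For d = 1, the supremum in the quadratic approximation has a closed form, so the null-quantile simulation of Section 2.2 reduces to a few lines. The sketch below uses an arbitrary positive scalar in place of B̃_22; its Monte Carlo 95% quantile should approach the 95% point of the 0.5χ²₀ + 0.5χ²₁ limit noted in Remark 2:

```python
import numpy as np

rng = np.random.default_rng(1)
S = 200_000
b22_tilde = 3.7  # arbitrary positive scalar standing in for B~_22 when d = 1

# Step 1: draw w^(s) ~ N(0, b22_tilde), s = 1, ..., S.
w = rng.normal(0.0, np.sqrt(b22_tilde), size=S)

# Step 2: for d = 1, theta~ = theta_1^2 ranges over [0, inf), so
# Q_s = sup_{t >= 0} {2 t w - t^2 b22_tilde} = max(w, 0)^2 / b22_tilde.
q_samples = np.maximum(w, 0.0) ** 2 / b22_tilde

# The Monte Carlo 95% quantile approximates the EM-test critical value.
q95 = float(np.quantile(q_samples, 0.95))
print(round(q95, 2))  # close to 2.71, the 95% point of 0.5*chi2_0 + 0.5*chi2_1
```

Note that the result does not depend on the value chosen for B̃_22, since w scales with its square root; in higher dimensions the supremum must instead be computed numerically for each draw.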
The power is computed as the proportion of rejections among
19 the 25 pairs of regression components. Figure 3 shows the power for d = 2 and d = 3 simulations. The power is significantly higher for two-dimensional simulated data - there is a slight difference across varying penalizing constant with a lower power for the optimal C = Nσ 2 log(n). In contrast, for three-dimensional simulated data, the difference in the test power for varying penalizing constants is much larger. Discussion. By varying the constant C, there is a significant reduction in the type I error. The difference in type I error can be as high as.35 for two-dimensional data and.25 for three-dimensional data. The difference is less substantial for three dimensional data, because as dimensionality increases, there is more information (larger sample size) to be used in deciding whether there is one component or more. For example, in our implementation, the size of a region surrounding a local maximum is N = 17 2 = 225 for two dimensional data and N = 17 3 = These results validate our asymptotic theory. On the other hand, the power is not as sensitive to different values of C as the type I error. The largest difference for three-dimensional data is about.1. The power also decreases as the signal-to-noise ratio decreases. An explanation for this is that at large noise level, small amplitude components will be non-detectable from the noise level. When comparing with the simpler approach, the overdispersion test, the power is higher than for the optimal C for both two and three-dimensional simulated data. However, the type I error is significantly higher for three-dimensional data when comparing across all C values, and only slightly higher than the optimal C for two-dimensional data. The poor performance for the three-dimensional design is because of the poor separation of the distribution of S(Â, ˆω ) under the null and alternative hypotheses (see Figure??). 
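The rejection-rate computations described above can be sketched generically; this is a minimal illustration with placeholder chi-square statistics standing in for the EM-test statistic, not the paper's implementation:

```python
import numpy as np

def rejection_rates(null_stats, alt_stats, level=0.05):
    """Approximate the null quantile by Monte Carlo and return the threshold
    plus (type I error, power) as proportions of rejections."""
    threshold = np.quantile(null_stats, 1.0 - level)  # simulated 95% null quantile
    type1 = np.mean(null_stats > threshold)           # rejection rate under the null
    power = np.mean(alt_stats > threshold)            # rejection rate under the alternative
    return threshold, type1, power

rng = np.random.default_rng(0)
# Placeholder statistics: chi-square-like under the null, shifted under the alternative.
null_stats = rng.chisquare(df=2, size=10_000)
alt_stats = rng.chisquare(df=2, size=10_000) + 6.0

thr, type1, power = rejection_rates(null_stats, alt_stats)
print(thr, type1, power)
```

In practice the accuracy of the simulated null quantile is controlled by the number of Monte Carlo replicates used to generate `null_stats`.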
We therefore conclude that although we may gain some power by using the overdispersion test, this gain is offset by the loss in test reliability, especially for higher-dimensional data. This is critical for our application since a large number of false positives in higher-dimensional NMR data could lead to significant distortion of the predicted protein structure.

Figure 2: Type I error calculated over 50 regression components generated under the null hypothesis. Panels (a) and (b) show, for the 2D and 3D simulations respectively, the type I error against the error standard deviation for the optimal, medium and small penalizing constants C and for the overdispersion test.

Reliability vs. Efficiency. In conclusion, as the dimensionality and the error variance increase, the test efficiency also decreases. We can enhance the efficiency by decreasing the penalizing constant, but this in turn reduces the reliability of the test. This simulation study clearly illustrates how the penalizing constant controls the trade-off between reliability and efficiency. From this simulation, as well as from other simulations not reported here, we recommend using the constant C = Nσ² log(N). We have performed other simulation studies with a larger number of initial values for the mixing proportion parameter as well as with a different penalty function, p(α) = C log(4α(1 − α)). From this extensive simulation study, we therefore conclude that the most important tuning parameter in the application of the EM-test remains the penalizing constant C.

Figure 3: Test power calculated over 50 regression components (25 pairs) generated under the alternative hypothesis. Panels (a) and (b) show, for the 2D and 3D simulations respectively, the power against the error standard deviation for the optimal, medium and small penalizing constants C and for the overdispersion test.

5 Case Study

5.1 Motivation and Background

In NMR data analysis for biomolecular studies, one primary objective is to estimate parameters (e.g. chemical shifts) of the atomic nuclei of a protein. Under protein magnetization, targeted atomic nuclei in the protein undergo energy transfers; each energy transfer induces a signal which is mathematically described by a decaying sinusoid (see the Supplemental material). Therefore, the NMR signal generated by a d-dimensional NMR experiment is a sum
of noisy decaying sinusoids

S(t_1, t_2, …, t_d) = Σ_{l=1}^{L} A_l e^{iφ_l} ∏_{s=1}^{d} e^{−t_s/τ_{sl}} e^{i t_s ω_{sl}} + ϵ_{t_1,…,t_d}   (8)

where each sinusoid is generated by an energy transfer between d atomic nuclei in d-dimensional NMR experiments (Hoch and Stern, 1996). The model parameters of interest are the resonance frequencies ω_l = (ω_{1l}, …, ω_{dl})^τ (translated into chemical shifts) and the signal amplitudes A_l (translated into structural distances of the atomic nuclei in specific NMR experiments). Also, L is the number of observed energy transfers, which is large and unknown. The protein structure is resolved by accurately estimating the resonance frequencies and the signal amplitudes from data generated by NMR experiments.

Figure 4: The density estimate of the distribution of S(Â, ω̂₀) under the null (solid line) and alternative (dotted line) hypotheses, for (a) σ = 2 and (b) σ = 15.
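A one-dimensional analogue (d = 1) of the signal model in (8) can be simulated as follows; all parameter values here are hypothetical, chosen only to show two energy transfers with nearby frequencies:

```python
import numpy as np

rng = np.random.default_rng(1)

def nmr_signal(t, amps, phases, decays, freqs, sigma=0.05):
    """One-dimensional analogue of model (8):
    S(t) = sum_l A_l e^{i phi_l} e^{-t/tau_l} e^{i t omega_l} + complex noise."""
    S = np.zeros_like(t, dtype=complex)
    for A, phi, tau, omega in zip(amps, phases, decays, freqs):
        S += A * np.exp(1j * phi) * np.exp(-t / tau) * np.exp(1j * t * omega)
    noise = sigma * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
    return S + noise

t = np.linspace(0.0, 1.0, 512)
# Two hypothetical energy transfers with nearby frequencies (illustrative values).
fid = nmr_signal(t, amps=[1.0, 0.6], phases=[0.0, 0.0],
                 decays=[0.2, 0.2], freqs=[2 * np.pi * 60, 2 * np.pi * 64])
spectrum = np.abs(np.fft.fft(fid))  # the Fourier transform reveals the (possibly overlapping) peaks
print(spectrum.argmax())
```

After the Fourier transform, each decaying sinusoid appears as a peak of Lorentzian-type shape, which is exactly the frequency-domain model analyzed in this paper.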
The traditional methodology in biomolecular NMR data analysis involves Fourier transformation (FT) of the NMR signal data, complemented by other pre-processing steps (Hoch and Stern, 1996). After the Fourier transform, the resulting model is a d-dimensional mixture regression model as described by the model in equation (1), where the shape function is an approximate Lorentzian function (Serban, 2005). In this model, the parameters of the lth regression component are the location parameters ω_l = (ω_{1l}, …, ω_{dl})^τ, which are the signal frequencies, the width parameters τ_l = (τ_{1l}, …, τ_{dl})^τ, and the amplitude parameter A_l. Because of the one-to-one mapping between energy transfers and regression components, the problem of identifying the parameters of the atomic nuclei undergoing energy transfers translates into accurately identifying and estimating the model parameters. Importantly, in multi-dimensional NMR frequency data, since a large number of regression components have similar frequencies, we expect that some components partially overlap or mix. It is important to de-mix the components since each regression component provides specific information about the structure of the protein. In certain cases, the lack of a small number of essential components can lead to a significant deviation of the predicted structure (Güntert, 2003). Many of the existing software packages for biomolecular NMR data analysis incorporate routines for component identification, but most of them do not incorporate a routine for detecting mixtures (Güntert, 2003; Gronwald and Kalbitzer, 2004). The common practice for detecting mixed or overlapped components is to visualize the contours of all local maxima and manually select the ones that display significant mixing. However, manual intervention is tedious and time-consuming since generally L is large.
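To illustrate the mixing phenomenon described above, the following sketch places two product-Lorentzian components of model (1) closer together than their widths, so their sum exhibits a single local maximum; the amplitudes, centers and widths are illustrative, not taken from the paper:

```python
import numpy as np

def lorentzian_2d(x1, x2, A, omega, tau):
    """A two-dimensional unimodal component of model (1) with a product
    Lorentzian shape: A / prod_s {((x_s - omega_s)/tau_s)^2 + 1}."""
    return A / ((((x1 - omega[0]) / tau[0]) ** 2 + 1)
                * (((x2 - omega[1]) / tau[1]) ** 2 + 1))

g = np.linspace(0.0, 1.0, 101)
x1, x2 = np.meshgrid(g, g, indexing="ij")

# Two components whose centers are closer than their widths (illustrative values):
# their sum shows a single local maximum even though L = 2.
mix = (lorentzian_2d(x1, x2, 1.0, (0.48, 0.5), (0.05, 0.05))
       + lorentzian_2d(x1, x2, 0.8, (0.52, 0.5), (0.05, 0.05)))

# Count strict interior local maxima on the grid (4-neighbor comparison).
interior = mix[1:-1, 1:-1]
is_max = ((interior > mix[:-2, 1:-1]) & (interior > mix[2:, 1:-1])
          & (interior > mix[1:-1, :-2]) & (interior > mix[1:-1, 2:]))
print(int(is_max.sum()))
```

Moving the two centers further apart than roughly twice the width would instead produce two distinct grid local maxima, which is the well-separated case where peak picking alone suffices.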
The testing procedure in this paper is a knowledge-based means for detecting a small number of candidate overlapped components that can be investigated visually. This procedure will reduce the time spent manually selecting mixed components by at least tenfold.

Remark 5 In Section 1, we mention that the assumption of equal widths is common in biomolecular NMR studies. The width parameters in the frequency domain map to the decay times τ_l, l = 1, …, L in the time domain. Since in many biomolecular NMR studies we detect the exchange of energy between the same pair of nuclei (e.g. 15N-proton), the decay times are similar, and therefore the width parameters are similar.

Remark 6 Another assumption for the mixture regression test is that the location parameters ω_l, l = 1, …, L are bounded (condition C3). For biomolecular NMR data, this assumption holds since the NMR signals are filtered such that the frequencies are within a specific bandwidth.

5.2 Application Data: 2D and 3D NMR Experiments

Data Specifications. The experimental data used to illustrate the applicability of the mixture regression test are for a doubly-labeled sample of a 130-residue RNA binding protein, rho130, using standard double (HSQC) and triple resonance (HNCOCA and HNCA) experiments on a 1 mM protein sample at a proton frequency of 600 MHz, as introduced in Briercheck et al. (1998). The data were processed with FELIX (Accelrys Software Inc.) using apodization and linear prediction methods that are typical for these types of experiments. The HSQC spectrum is a 2D NMR dataset in which the regression components show correlations between the amide nitrogen and the attached amide proton in the protein sequence. The intensities Z_{i1,i2} are observed over a two-dimensional grid of points with M₁ = 512 and M₂ = 256. The HNCOCA is a three-dimensional experiment generating 3D NMR data in which each component arises due to correlations between the amide nitrogen and amide proton of a specific residue and the alpha carbon of the preceding residue in the protein
sequence. HNCA is also a three-dimensional experiment, in which components are paired with similar amide nitrogen and amide proton frequencies. In HNCA, a pair of components arises due to correlations between the amide nitrogen, the amide proton and the alpha carbon nuclei of the preceding residue and of the intra-residue. Therefore, for this experiment, the true number of regression components will be about twice the number of protein residues, and half of the components will match the components in HNCOCA. For both the HNCOCA and HNCA NMR datasets, the intensity values Z_{i1,i2,i3} are observed over a three-dimensional grid of points.

Dataset    No. of Local Maxima    No. of Rejections (EM-test)
HSQC       146                    9
HNCA       235                    0
HNCOCA     118                    0

Table 1: Number of local maxima and the number of mixed regression components (rejections) for the three application datasets.

Mixture Test Application. Using the local maxima identification method briefly described in Section 3 and introduced in Serban (2007, 2010), we identified 146 local maxima for the HSQC experiment, 235 for the HNCA experiment and 118 for the HNCOCA experiment. We applied the EM-test to the two-dimensional and three-dimensional data generated by these three NMR experiments for the RNA binding protein rho130. For computational simplicity and effectiveness, the mixture test was applied only to data in the neighborhood of a local maximum, as specified in Section 3 (step 3). The width parameters τ_s, s = 1, …, d were estimated by estimating the widths of all local maxima and then taking the median over all estimates. The number of rejections (mixed regression components), along with the number of local maxima identified using the method in Serban (2007, 2010), are provided in Table 1. In this study, because the error variance is unknown, the estimated variance is the median
absolute value of the high-resolution wavelet coefficients after a wavelet transformation of each of the three datasets (Serban, 2010). The regression component widths are estimated by first interpolating a Lorentzian function at each local maximum; the common width estimate is then the median over the widths estimated from this interpolation step. The contour plots for the nine rejected regression components for the HSQC data are in Figure 5. A visual inspection of the contour plots further suggests which of the nine rejections correspond to true mixtures. The contour plots 1-3 and 9 are asymmetric and larger in size, a pattern that reveals a mixture of two components overlapped into one local maximum. The five contour plots 4-8 are symmetric, all with similar widths in both directions. In the supplemental material, we complement the contour plots with perspective plots which show the shape and the width of the nine rejections. We complemented the analysis of the two-dimensional data with manual intervention, a visual assessment of each of the 146 local maxima. Through this experiment, we did not identify more than the four mixtures discussed above, which shows that the application of the mixture test minimizes the effort of detecting overlapped or mixed components by reducing the number of screened local maxima from 146 to only 9. For the two three-dimensional datasets, we do not reject the null hypothesis for any local maximum. The reason for not detecting any mixture of regression components is that the protein in this study has a small number of residues, leading to a small number of components (L ≈ 260 for HNCA and L ≈ 130 for HNCOCA) which are spread over a three-dimensional volume, leading to higher resolution. Generally, one advantage of higher-dimensional NMR data is an increase in resolution, measured by the distance between components, which results in a smaller number of mixed components than in lower-dimensional data.
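The robust variance estimate described above (the median absolute high-resolution wavelet coefficient) can be sketched with a finest-scale Haar transform; this is a generic Donoho-Johnstone-style estimator on a hypothetical one-component signal, not the paper's exact routine:

```python
import numpy as np

def sigma_mad_haar(z):
    """Estimate the noise standard deviation from the finest-scale Haar
    detail coefficients: sigma_hat = median|d| / 0.6745, where 0.6745 is
    the median of |N(0, 1)|. A sketch of the idea, not the paper's code."""
    z = np.asarray(z, dtype=float).ravel()
    n = z.size - z.size % 2
    d = (z[1:n:2] - z[0:n:2]) / np.sqrt(2.0)  # finest-scale Haar details
    return np.median(np.abs(d)) / 0.6745

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 4096)
signal = 5.0 / (((x - 0.5) / 0.05) ** 2 + 1)        # one Lorentzian component
noisy = signal + 0.3 * rng.standard_normal(x.size)  # true sigma = 0.3
print(sigma_mad_haar(noisy))
```

The estimator is robust because the smooth regression components contribute little to the finest-scale coefficients, so the median is driven almost entirely by the noise.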
On the other hand, the drawback of higher-dimensional data is a lower signal-to-noise ratio, and hence a smaller number of identifiable regression components against the noisy background; in our application, we underestimate the number of components for the two three-dimensional datasets. Therefore, complementing the NMR data analysis with a mixture test will increase the ability to identify regression components with lower-dimensional NMR experiments, which not only have a higher signal-to-noise ratio but are also less time-consuming and less expensive.

Figure 5: The contour plots (1)-(9) for the nine rejections from the application of the EM-test to the HSQC data.
Appendix A: Existence of a Solution to g(α)

An updated value of α, given that A = A^{(k)}, ω₁ = ω^{(k),1} and ω₂ = ω^{(k),2} are fixed, is derived by maximizing the penalized log-likelihood function with respect to α. This is equivalent to minimizing g(α) in equation (6) in the main manuscript with

C₁ = (1/(2σ²)) Σ (f^{(k),1}_{i_1,…,i_d} − f^{(k),2}_{i_1,…,i_d})²,
C₂ = (1/(2σ²)) Σ (f^{(k),1}_{i_1,…,i_d} − f^{(k),2}_{i_1,…,i_d}) (Z_{i_1,…,i_d} − f^{(k),2}_{i_1,…,i_d}),

where, for j = 1, 2,

f^{(k),j}_{i_1,…,i_d} = (A^{(k)} / ∏_{s=1}^d τ_s) / ∏_{s=1}^d {(x_{i_s} − ω^{(k),j}_s)²/τ_s² + 1}

denotes the jth regression component evaluated at the current iterate.

Penalty p(α) = C log(4α(1 − α)). To obtain an updated value of α by minimizing g(α), we take the first-order derivative of g(α) and equate it to zero. With the penalty p(α) = C log(4α(1 − α)), the first-order derivative is

g′(α) = 2C₁α − 2C₂ − C(1 − 2α)/{α(1 − α)}.

Setting g′(α) = 0, there exists a real solution in the interval (0, 1) for C > 0, since lim_{α→0} g′(α) = −∞ and lim_{α→1} g′(α) = +∞. Moreover, the second-order derivative is

g″(α) = 2C₁ + C ( 2/{α(1 − α)} + (1 − 2α)²/{α²(1 − α)²} ),

which is positive for all α ∈ [0, 1] since C₁ ≥ 0. This implies that g′(α) = 0 has a unique solution in [0, 1] and this solution is the minimum point of g(α).
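The existence-and-uniqueness argument above (g′ increasing from −∞ at α = 0 to +∞ at α = 1) can be checked numerically; the constants C₁, C₂ and C below are hypothetical:

```python
import numpy as np

def make_g(C1, C2, C):
    # g(a) = C1*a^2 - 2*C2*a - C*log(4 a (1-a)), the objective from Appendix A.
    g = lambda a: C1 * a**2 - 2 * C2 * a - C * np.log(4 * a * (1 - a))
    gp = lambda a: 2 * C1 * a - 2 * C2 - C * (1 - 2 * a) / (a * (1 - a))
    return g, gp

def bisect(f, lo, hi, iters=200):
    """Root of an increasing function f on (lo, hi) by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

g, gp = make_g(C1=2.0, C2=0.5, C=1.0)
a_star = bisect(gp, 1e-9, 1 - 1e-9)          # unique stationary point in (0, 1)
grid = np.linspace(1e-4, 1 - 1e-4, 9999)
print(a_star, grid[np.argmin(g(grid))])      # the two minimizers should agree
```

Because g is strictly convex on (0, 1), the bisection root and the brute-force grid minimizer coincide up to the grid resolution.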
Penalty p(α) = C log(1 − |1 − 2α|). Let α* = C₂/C₁, which is the minimum point of C₁α² − 2C₂α. If α* < 0.5, then the minimum point of g(α) should be in (0, 0.5]. Note that when α ∈ (0, 0.5],

g(α) = C₁α² − 2C₂α − C log(2α) and g′(α) = 2C₁α − 2C₂ − C/α.

Setting g′(α) = 0, we get a unique positive solution

α̂ = {2C₂ + √(4C₂² + 8C₁C)} / (4C₁).

The second derivative over (0, 0.5] is always positive. Therefore, for the penalty p(α) = C log(1 − |1 − 2α|), the updated value of α is

α^{(k+1)} = min{ 0.5, {2C₂ + √(4C₂² + 8C₁C)} / (4C₁) }.

Similarly, when α* ≥ 0.5, the updated value of α is

α^{(k+1)} = max{ 0.5, {2C₁ + 2C₂ − √(4(C₁ − C₂)² + 8C₁C)} / (4C₁) }.

In summary,

α^{(k+1)} = min{ 0.5, {2C₂ + √(4C₂² + 8C₁C)} / (4C₁) }, if C₂/C₁ < 0.5,
α^{(k+1)} = max{ 0.5, {2C₁ + 2C₂ − √(4(C₁ − C₂)² + 8C₁C)} / (4C₁) }, if C₂/C₁ ≥ 0.5.

Appendix B: Proofs of Theorems 1 and 2

Without loss of generality, we assume that ω_{s0} = 0, τ_{s0} = 1 and σ₀ = 1. Note that, under the null hypothesis,

Z_{i_1,…,i_d} − A₀ / ∏_{s=1}^d (x_{i_s}² + 1) ∼ N(0, 1).
In the following, we use Σ to denote Σ_{i_1=1}^{M_1} ⋯ Σ_{i_d=1}^{M_d}. A brief roadmap for the proofs is as follows. Lemma 1 shows that any estimator with ᾱ bounded away from 0 or 1, and with a large likelihood value, is consistent for A, ω₁ and ω₂ under the null hypothesis. Lemma 2 strengthens Lemma 1 by providing specific convergence rates. Theorems 1 and 2 then follow directly from these two lemmas.

Lemma 1 Assume the same conditions as in Theorem 1. Let (ᾱ, Ā, ω̄₁, ω̄₂) be estimators of (α, A, ω₁, ω₂) such that δ ≤ ᾱ ≤ 0.5 for some δ ∈ (0, 0.5]. Assume that l(ᾱ, Ā, ω̄₁, ω̄₂) − l(0.5, A₀, ω₀, ω₀) ≥ c > −∞. Then, under the null hypothesis, Ā − A₀ = o_p(1), ω̄₁ = o_p(1) and ω̄₂ = o_p(1).

Proof. Under the assumptions α ∈ [δ, 0.5] and ω_{s1}, ω_{s2} ∈ [−W, W], only when A = A₀ and ω₁ = ω₂ = 0 does the mixture regression model reduce to the null model. That is, the parameters A, ω₁ and ω₂ are identifiable. Then Wald's (1949) idea can be applied to show that Ā − A₀ = o_p(1), ω̄₁ = o_p(1) and ω̄₂ = o_p(1).

For the convenience of presentation, let

ε_{i_1,…,i_d} = Z_{i_1,…,i_d} − A₀ / ∏_{s=1}^d (x_{i_s}² + 1),

which has a N(0, 1) distribution under the null hypothesis, and

h_{i_1,…,i_d}(A, ω) = A / ∏_{s=1}^d {(x_{i_s} − ω_s)² + 1} − A₀ / ∏_{s=1}^d (x_{i_s}² + 1).
In the following, we use Σ to denote Σ_{i_1=1}^{M_1} ⋯ Σ_{i_d=1}^{M_d}.

Lemma 2 Assume the same conditions as in Theorem 1. Let (ᾱ, Ā, ω̄₁, ω̄₂) be estimators of (α, A, ω₁, ω₂) such that, under the null hypothesis, Ā − A₀ = o_p(1), ω̄₁ = o_p(1) and ω̄₂ = o_p(1). Assume that pl(ᾱ, Ā, ω̄₁, ω̄₂) − pl(0.5, A₀, ω₀, ω₀) ≥ c > −∞. Then, under the null hypothesis,

p(ᾱ) = O_p(1), Ā − A₀ = O_p(N^{−1/2}), ω̄₁ = O_p(N^{−1/4}), ω̄₂ = O_p(N^{−1/4}).

Proof. Let R₁(ᾱ, Ā, ω̄₁, ω̄₂) = 2{l(ᾱ, Ā, ω̄₁, ω̄₂) − l(0.5, A₀, ω₀, ω₀)}. Note that the penalty function is non-positive, with p(0.5) = 0. It follows that

2c ≤ 2{pl(ᾱ, Ā, ω̄₁, ω̄₂) − pl(0.5, A₀, ω₀, ω₀)} ≤ R₁(ᾱ, Ā, ω̄₁, ω̄₂).

With the notation h_{i_1,…,i_d}(·, ·), we can write R₁(ᾱ, Ā, ω̄₁, ω̄₂) in the following form:

R₁(ᾱ, Ā, ω̄₁, ω̄₂) = 2 Σ ε_{i_1,…,i_d} {ᾱ h_{i_1,…,i_d}(Ā, ω̄₁) + (1 − ᾱ) h_{i_1,…,i_d}(Ā, ω̄₂)} − Σ {ᾱ h_{i_1,…,i_d}(Ā, ω̄₁) + (1 − ᾱ) h_{i_1,…,i_d}(Ā, ω̄₂)}².

Applying a second-order Taylor expansion to 1/∏_{s=1}^d {(x_{i_s} − ω_{s1})² + 1}, we get

h_{i_1,…,i_d}(Ā, ω̄₁) = (Ā − A₀) U_{i_1,…,i_d} + Σ_{s=1}^d Ā ω̄_{s1} V_{i_1,…,i_d;s} + Σ_{s=1}^d Ā ω̄²_{s1} W_{i_1,…,i_d;s} + Σ_{s<t} Ā ω̄_{s1} ω̄_{t1} T_{i_1,…,i_d;st} + e_{i_1,…,i_d}(Ā, ω̄₁),

where e_{i_1,…,i_d}(Ā, ω̄₁) is the remainder. A similar approximation can also be obtained for h_{i_1,…,i_d}(Ā, ω̄₂).
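The second-order Taylor expansion used in the proof of Lemma 2 can be sanity-checked numerically in one dimension (d = 1, τ = 1); the grid and the values of ω̄ below are illustrative:

```python
import numpy as np

def s0(x, w):
    # One-dimensional Lorentzian factor 1/((x - w)^2 + 1) with tau = 1.
    return 1.0 / ((x - w) ** 2 + 1.0)

x = np.linspace(-3.0, 3.0, 601)
# First and second derivatives of s0 with respect to w, evaluated at w = 0.
d1 = 2 * x / (x**2 + 1) ** 2
d2 = (6 * x**2 - 2) / (x**2 + 1) ** 3

errs = []
for w in (0.1, 0.05, 0.025):
    approx = s0(x, 0.0) + w * d1 + 0.5 * w**2 * d2  # second-order expansion in w
    errs.append(np.max(np.abs(s0(x, w) - approx)))

# The remainder shrinks like w^3: halving w divides the error by roughly 8.
print(errs[0] / errs[1], errs[1] / errs[2])
```

The cubic decay of the remainder is what makes the quadratic approximation of the penalized test statistic accurate for location shifts of order N^{−1/4}.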
Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value
More informationOPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE
17th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE Abdourrahmane M. Atto 1, Dominique Pastor, Gregoire Mercier
More informationMixture of Gaussians Models
Mixture of Gaussians Models Outline Inference, Learning, and Maximum Likelihood Why Mixtures? Why Gaussians? Building up to the Mixture of Gaussians Single Gaussians Fully-Observed Mixtures Hidden Mixtures
More information2.1.3 The Testing Problem and Neave s Step Method
we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationGaussian Models
Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density
More informationSeminar über Statistik FS2008: Model Selection
Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can
More informationHypothesis Testing - Frequentist
Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating
More informationTesting Restrictions and Comparing Models
Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by
More informationEstimating a pseudounitary operator for velocity-stack inversion
Stanford Exploration Project, Report 82, May 11, 2001, pages 1 77 Estimating a pseudounitary operator for velocity-stack inversion David E. Lumley 1 ABSTRACT I estimate a pseudounitary operator for enhancing
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLikelihood-Based Methods
Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)
More informationAn Introduction to Wavelets and some Applications
An Introduction to Wavelets and some Applications Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France An Introduction to Wavelets and some Applications p.1/54
More informationHigh Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data
High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University
More informationForecasting Wind Ramps
Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators
More informationLearning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics )
Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) James Martens University of Toronto June 24, 2010 Computer Science UNIVERSITY OF TORONTO James Martens (U of T) Learning
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationLecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary
ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood
More informationOn Mixture Regression Shrinkage and Selection via the MR-LASSO
On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationTesting Algebraic Hypotheses
Testing Algebraic Hypotheses Mathias Drton Department of Statistics University of Chicago 1 / 18 Example: Factor analysis Multivariate normal model based on conditional independence given hidden variable:
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationSolutions for Examination Categorical Data Analysis, March 21, 2013
STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationParametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory
Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007
More informationApplied Multivariate and Longitudinal Data Analysis
Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationNUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction
NUCLEAR NORM PENALIZED ESTIMATION OF IERACTIVE FIXED EFFECT MODELS HYUNGSIK ROGER MOON AND MARTIN WEIDNER Incomplete and Work in Progress. Introduction Interactive fixed effects panel regression models
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More information2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?
ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we
More informationEM Algorithms for Ordered Probit Models with Endogenous Regressors
EM Algorithms for Ordered Probit Models with Endogenous Regressors Hiroyuki Kawakatsu Business School Dublin City University Dublin 9, Ireland hiroyuki.kawakatsu@dcu.ie Ann G. Largey Business School Dublin
More information1 Lyapunov theory of stability
M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationTesting Homogeneity Of A Large Data Set By Bootstrapping
Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp
More informationReview of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley
Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationRobustness and Distribution Assumptions
Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology
More informationIMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES
IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES Bere M. Gur Prof. Christopher Niezreci Prof. Peter Avitabile Structural Dynamics and Acoustic Systems
More information