A Statistical Test for Mixture Detection with Application to Component Identification in Multi-dimensional Biomolecular NMR Studies


Nicoleta Serban (corresponding author)
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
nserban@isye.gatech.edu

Pengfei Li
Department of Statistics and Actuarial Science, University of Waterloo
pengfei.li@uwaterloo.ca

In this paper, we introduce a statistical hypothesis test for detecting mixtures in a regression model described by a regression function which is a weighted sum of multi-dimensional unimodal functions. Two regression components are mixed when the distance between their centers is small or when the proportion of their contribution to the mixture is close to zero or one. Two challenges in model estimation under the null hypothesis of one regression component are that the mixing proportion lies on the boundary of the parameter space and that the parameters are non-identifiable. Therefore, the parameter estimators derived from standard nonlinear estimation approaches are inconsistent and unstable. To overcome these challenges, we study a penalized regression test statistic with a relatively simple quadratic approximation which can be used to simulate the quantiles of the test statistic under the null hypothesis. We also show that the parameter estimators obtained with the penalized regression approach in this paper are consistent. The motivating application is the detection of mixed components in multi-dimensional biomolecular Nuclear Magnetic Resonance (NMR) data.

Keywords: biomolecular NMR, density mixture, mixture test, multi-dimensional mixture regression, likelihood ratio test.

1 Introduction

In this paper, we focus on a specific nonlinear regression model described by

Z_{i_1,\dots,i_d} = \sum_{l=1}^{L} A_l \, s(x_{i_1},\dots,x_{i_d};\,\omega_l,\tau_l) + \sigma\epsilon_{i_1,\dots,i_d}, \quad i_1 = 1,\dots,M_1, \ \dots, \ i_d = 1,\dots,M_d, \qquad (1)

where the multi-dimensional regression component A_l s(x_{i_1},\dots,x_{i_d};\,\omega_l,\tau_l) is a parametric function uniquely identified by a set of parameters describing the amplitude (A_l), the width (\tau_l = (\tau_{l1},\dots,\tau_{ld})^T) and the location (\omega_l = (\omega_{l1},\dots,\omega_{ld})^T) of the regression component. The regression function is nonlinear in the location and width parameters. The shape function s is assumed to be known, symmetric and unimodal. The d-dimensional design points (x_{i_1},\dots,x_{i_d}) are fixed and equally spaced and observed over a compact set. For brevity, we assume (x_{i_1},\dots,x_{i_d}) \in [0,1]^d. It follows that the location parameters \omega_l, l = 1,\dots,L, also fall within the same compact set [0,1]^d. The error terms \epsilon_{i_1,\dots,i_d} are commonly assumed to be independent and identically distributed. The observations Z_{i_1,\dots,i_d} are intensity values.

Each regression component in model (1) is revealed as a local maximum in the observed intensity data Z_{i_1,\dots,i_d} when the distance between components is large and when the signal-to-noise level is high (the amplitudes are well away from the noise level). That is, under these assumptions, there is a one-to-one mapping between the regression components and the local maxima in the data. However, in many applications, such as the motivating NMR case study, the distance between components is small, and therefore there will be mixed regression components that overlap into one local maximum. In Figure 1, we show an example of two-dimensional overlapped regression components; because the distance between components is small, the two components merge into one local maximum. It is this type of mixed components that is of interest in this paper.

Figure 1: An example of two-dimensional overlapped regression components: contour and perspective plots.

To identify statistically significant mixed components, we introduce a hypothesis test for mixtures in the regression model described in (1). In practice, application of the mixture test involves two steps:

1. Apply a local maxima identification method (Serban, 2007, 2010, and the references therein); and

2. Apply the testing procedure in this paper to the local intensity data of each local maximum to decide whether it maps to one or more regression components.

In NMR data analysis, the evaluation of local maxima for the detection of overlapped components is commonly performed manually. The mixture statistical test introduced in this paper will reduce this manual intervention, which is error prone and time consuming.

For illustration of the mixture hypothesis test, we consider a specific parametric function s, the Lorentzian function, commonly used to model Nuclear Magnetic Resonance (NMR) data (see Section 5). However, our theoretical results apply to other symmetric unimodal shape functions as long as the regression components are identifiable.

Under the assumption of a mixture of Lorentzian regression components, the model becomes

Z^{(l)} = \alpha\,\frac{A/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{s1})^2\tau_s^{-2}+1\}} + (1-\alpha)\,\frac{A/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{s2})^2\tau_s^{-2}+1\}} + \sigma\epsilon, \qquad (2)

where Z^{(l)} are observations from a local region of the complete data Z_{i_1,\dots,i_d}. For simplicity of notation, we will refer to Z^{(l)} as Z_{i_1,\dots,i_d}, but we note that we apply the hypothesis test locally, since the full data will generally consist of a large number of regression components. We do not need to correct for multiplicity in testing for two components across multiple regions of the complete data since the testing is not simultaneous.

In model (2), the amplitude parameters A_1 and A_2 of the two components are uniquely defined by the parameters A and \alpha (A = A_1 + A_2 and \alpha = A_1/(A_1 + A_2)). We re-parameterize the model as in (2) to express the null hypothesis with respect to one parameter \alpha rather than two parameters A_1 and A_2. Under this modeling framework, the null hypothesis is

H_0: \{\alpha = 0 \text{ or } \alpha = 1 \text{ or } \omega_1 = \omega_2\},

where \omega_1 = (\omega_{11},\dots,\omega_{d1})^T and \omega_2 = (\omega_{12},\dots,\omega_{d2})^T are the location parameters of the components that are tested for mixture.

For testing the null hypothesis of one regression component, we assume that the width parameters \tau_s and the error variance \sigma^2 are known and fixed. The assumption of fixed \sigma^2 implies that the signal-to-noise ratio is fixed, and therefore the false discovery rate of the regression components is also fixed. At large values of \sigma^2, small amplitude components will be non-detectable from the noise level; that is, \sigma^2 needs to be well below A\max\{\alpha, 1-\alpha\}/\prod_{s=1}^{d}\tau_s. On the other hand, the width parameters change the shape of the regression function. For fixed parameters \alpha and A but large values of \tau_s, the heights of the two components decrease and the tails become fatter, which reduces the identifiability of the model parameters. Therefore, the assumption of fixed widths ensures some level of identifiability of the regression components.
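For concreteness, the following is a minimal NumPy sketch of the d-dimensional Lorentzian shape function and the two-component mean function in model (2); the function names and array layout are our own choices, not part of the paper.

```python
import numpy as np

def lorentzian(x, omega, tau):
    # d-dimensional Lorentzian shape s(x; omega, tau) at design points x.
    # x: (n, d) array; omega and tau: (d,) arrays.
    x = np.atleast_2d(x)
    return 1.0 / np.prod((x - omega) ** 2 / tau ** 2 + 1.0, axis=1)

def mixture_mean(x, A, alpha, omega1, omega2, tau):
    # Mean function of model (2): a weighted sum of two Lorentzian
    # components sharing the total amplitude A and the widths tau.
    scale = A / np.prod(tau)
    return scale * (alpha * lorentzian(x, omega1, tau)
                    + (1.0 - alpha) * lorentzian(x, omega2, tau))
```

In practice, x would be obtained by flattening a meshgrid of the equally spaced design points, so that each row is one d-dimensional grid location.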

The assumption of fixed width parameters and error variance is common practice (Koradi et al., 1998; Malmodin et al., 2003). These parameters are commonly estimated from the complete data (all regression components) before applying the hypothesis testing procedure, as we discuss in Section 3. Estimating these parameters from the complete data rather than locally provides more accurate estimates, since we pool information from multiple regression components.

We highlight here that the model (1) is not a mixture of regressions but a regression model in which the regression function is a sum of weighted components; therefore, the existing research on detecting mixtures of densities (Titterington et al., 1985; Müller and Sawitzki, 1991; Roeder, 1994; Chen and Kalbfleisch, 1996; Walther, 2004; Chen and Li, 2009; Li et al., 2009, and the references therein) does not apply to the statistical framework introduced in this paper.

A widespread test statistic used in detecting density mixtures is the likelihood ratio test (LRT). Similar to the problem of detecting mixtures of densities, the LRT has its limitations under the regression framework discussed in this paper. Under the null hypothesis, the model parameters are non-identifiable, since the null hypothesis depends on two parametric conditions involving \alpha, \omega_1 and \omega_2. Moreover, the (mixing) proportion parameter \alpha is on the boundary of the parameter space, since it takes values 0 or 1. Due to these two irregularities, standard likelihood-based estimation (nonlinear least squares estimation under the normality assumption) provides inconsistent and unstable parameter estimators (Jennrich, 1969).

An alternative approach to the LRT is to place a prior distribution on the parameter \alpha; similarly to Aitkin and Rubin (1985), the mixing proportion becomes a nuisance parameter which is further integrated out.

However, under the regression framework, one regularity condition for the asymptotic distribution of the likelihood ratio test does not hold for the Aitkin and Rubin (1985) approach (Serban, 2005).

Different re-parameterizations of the mixture regression model may also provide identifiable parameters under the null hypothesis. For mixtures of densities, re-parameterization has been investigated under the assumption that the true null parameter is known (Lemdani and Pons, 1999) or that the true parameter is unknown (Dacunha-Castelle and Gassiat, 1997; Liu and Shao, 2003). Other approaches impose constraints on the density means (Ghosh and Sen, 1985) or on the mixing proportion parameters (Chen et al., 2001; Chen and Kalbfleisch, 2005).

In detecting mixtures in the multi-dimensional regression model described in this paper, we proceed with the latter approach; that is, we constrain the mixing proportion parameter \alpha to be away from the boundary values under the null hypothesis. This constraint overcomes both the identifiability problem and the estimation of the mixing proportion on the boundary of the parameter space. The constraint is defined through a penalty function p(\alpha) which attains its maximum at \alpha = 1/2. Under the null hypothesis, the penalty term is 0. To obtain the penalized LR, we therefore need to employ penalized nonlinear regression under the null and alternative hypotheses. Consequently, we introduce an estimation approach that uses the idea of an Expectation-Maximization type algorithm (Chen and Li, 2009; Li et al., 2009). In this paper, we show the consistency of the parameter estimates under the proposed estimation approach.

One difficulty in penalized likelihood ratio (PLR) testing for mixture detection is deriving the (asymptotic) distribution of the test statistic under the null hypothesis. For density mixture detection, the test statistic commonly has the asymptotic mixture distribution 0.5\chi_0^2 + 0.5\chi_1^2 (Chen and Kalbfleisch, 1996; Chen et al., 2001; Chen and Li, 2009; Li et al., 2009).

In contrast, in hypothesis testing for mixtures in the regression function, the asymptotic distribution of the PLR test statistic under the null hypothesis does not have a closed-form expression. In this paper we investigate a test statistic derived using an approach similar to the EM-test statistic recommended by Chen and Li (2009) and Li et al. (2009). Because the asymptotic distribution of the EM-test does not have a closed-form expression, we derive a quadratic approximation for the distribution of the EM-test statistic, and we use this distribution to obtain the quantiles of the null distribution of the test statistic using sampling techniques.

We discuss the testing procedure in Section 2, along with asymptotic results in Section 2.2. In Section 3, we describe the application of the mixture test to the identification of overlapped regression components in the more general model for L > 2. In a simulation study in Section 4, we evaluate the accuracy (type I error) and the efficiency (power) of the testing procedure. The statistical application investigated in this paper is pertinent to the study of three-dimensional protein structure determination using NMR. We introduce the NMR biomolecular studies in Section 5 and the application of the proposed testing procedure to two- and three-dimensional NMR data in Section 5.2.

2 Testing Procedure

Define the likelihood function l(\alpha, A, \omega_1, \omega_2) for the regression model in (2). Under the assumption of a normal error distribution, the log-likelihood function is proportional to

-\frac{1}{2\sigma^2} \sum_{i_1=1}^{M_1}\cdots\sum_{i_d=1}^{M_d} \left( Z_{i_1,\dots,i_d} - A\alpha\, s(x_{i_1},\dots,x_{i_d};\omega_1,\tau) - A(1-\alpha)\, s(x_{i_1},\dots,x_{i_d};\omega_2,\tau) \right)^2, \qquad (3)

where \alpha is a weight parameter and must be in the interval [0, 1]. We penalize the log-likelihood function using a penalty p(\alpha), a continuous function that is maximized at 0.5 and goes to negative infinity as \alpha goes to 0 or 1.
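As a concrete reference, here is a small sketch of the log-likelihood (3) and its penalized version (4); it reuses `mixture_mean` from the earlier sketch, and it adopts the lasso-type penalty p(\alpha) = C\log(1 - |1 - 2\alpha|) that the paper selects later, so the penalty choice is ours at this point.

```python
import numpy as np

def log_likelihood(Z, x, alpha, A, omega1, omega2, tau, sigma):
    # Log-likelihood (3), up to an additive constant, under normal errors;
    # mixture_mean is the model-(2) mean function sketched earlier.
    resid = Z - mixture_mean(x, A, alpha, omega1, omega2, tau)
    return -np.sum(resid ** 2) / (2.0 * sigma ** 2)

def penalized_log_likelihood(Z, x, alpha, A, omega1, omega2, tau, sigma, C):
    # Penalized log-likelihood (4) with p(alpha) = C log(1 - |1 - 2 alpha|),
    # which is 0 at alpha = 0.5 and goes to -inf at the boundaries.
    p = C * np.log(1.0 - abs(1.0 - 2.0 * alpha))
    return log_likelihood(Z, x, alpha, A, omega1, omega2, tau, sigma) + p
```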

Without loss of generality, we assume that \alpha \in [0, 0.5]. Define the penalized log-likelihood function as follows:

pl(\alpha, A, \omega_1, \omega_2) = l(\alpha, A, \omega_1, \omega_2) + p(\alpha). \qquad (4)

2.1 EM-test Statistic

In this section, we introduce a test statistic which measures the discrepancy between the null hypothesis H_0 of one regression component and the observed data. The test statistic is a version of the EM-test introduced by Chen and Li (2009) and Li et al. (2009), extended to our regression framework. The primary motivation for using this test statistic is the efficiency of the LRT under the assumption of fixed \alpha = \alpha_0. Since \alpha is not fixed, we instead proceed with an EM-type algorithm in which, at the M step, we assume \alpha is fixed and estimate the other model parameters, and at the E step we update the parameter \alpha; we then repeat the E and M steps to obtain a more suitable mixing proportion, which improves the power. Only a small number of iterations, K, is used. The asymptotic approximation of the null distribution holds under finite K. The procedure to derive the EM-test statistic is described below.

Step 0: Estimate the scaling parameter A and the location parameters \omega_1 = \omega_2 under the null model by maximizing the penalized likelihood function in (4). Denote (\hat{A}_0, \hat{\omega}_0) = \arg\max pl(\alpha = 0.5, A, \omega, \omega).

Step 1: In the next steps, we obtain estimates for the amplitude and location parameters under the alternative hypothesis, given initial values for the mixing proportion. Choose a number of initial values for \alpha: \alpha_1, \dots, \alpha_J in (0, 0.5]. For each \alpha_j value, we obtain estimates for the parameters A, \omega_1, \omega_2 using an iterative algorithm similar in its steps to the EM algorithm. Let \alpha_j^{(0)} = \alpha_j for j = 1, \dots, J, and start with the first iteration k = 1, where k \le K (K is the maximum number of iterations).

Step 1.1. Estimate A, \omega_1, \omega_2 by maximizing the penalized likelihood function while holding the proportion parameter fixed, i.e.,

(A_j^{(k)}, \omega_{j,1}^{(k)}, \omega_{j,2}^{(k)}) = \arg\max_{A,\,\omega_1,\,\omega_2} pl(\alpha_j^{(k-1)}, A, \omega_1, \omega_2).

Step 1.2. Update the mixing proportion parameter \alpha_j^{(k)} by minimizing

g(\alpha) = C_1\alpha^2 - 2C_2\alpha - p(\alpha), \qquad (5)

where

C_1 = \frac{1}{2\sigma^2}\sum_{i_1=1}^{M_1}\cdots\sum_{i_d=1}^{M_d} \{A_j^{(k)} s(x_{i_1},\dots,x_{i_d};\omega_{j,1}^{(k)},\tau) - A_j^{(k)} s(x_{i_1},\dots,x_{i_d};\omega_{j,2}^{(k)},\tau)\}^2,

C_2 = \frac{1}{2\sigma^2}\sum_{i_1=1}^{M_1}\cdots\sum_{i_d=1}^{M_d} \{Z_{i_1,\dots,i_d} - A_j^{(k)} s(x_{i_1},\dots,x_{i_d};\omega_{j,2}^{(k)},\tau)\} \{A_j^{(k)} s(x_{i_1},\dots,x_{i_d};\omega_{j,1}^{(k)},\tau) - A_j^{(k)} s(x_{i_1},\dots,x_{i_d};\omega_{j,2}^{(k)},\tau)\}.

Step 1.3. Let k = k + 1, and iterate Step 1.1 and Step 1.2 if k < K.

Step 2: For j = 1, \dots, J, define the statistic, which depends on the initial value \alpha_j,

M_j^{(K)}(\alpha_j) = 2\{pl(\alpha_j^{(K)}, A_j^{(K)}, \omega_{j,1}^{(K)}, \omega_{j,2}^{(K)}) - pl(0.5, \hat{A}_0, \hat{\omega}_0, \hat{\omega}_0)\}.

The EM-test statistic is defined as

EM^{(K)} = \max\{M_j^{(K)}(\alpha_j),\ j = 1,\dots,J\}.

We reject the null hypothesis if EM^{(K)} is greater than a specified critical value, which is determined from the limiting distribution of the test statistic (see Section 2.2). In our empirical studies, the maximum number of iterations is K = 10; a larger number of iterations does not enhance the efficiency of the testing procedure.
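To make Steps 0-2 concrete, the following is a minimal sketch of the whole procedure built on general-purpose optimizers; it reuses the helpers sketched above. The starting values, the use of scipy.optimize.minimize (BFGS by default), and the bounded one-dimensional search in Step 1.2 (which minimizes g(\alpha) up to an additive constant) are our simplifications, not the paper's estimating equations.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def em_test_statistic(Z, x, tau, sigma, C,
                      alphas=(0.1, 0.2, 0.3, 0.4, 0.5), K=10):
    # EM-test statistic EM^(K) for one local-maximum region (Steps 0-2).
    d = x.shape[1]

    def pl(alpha, A, om1, om2):
        return penalized_log_likelihood(Z, x, alpha, A, om1, om2, tau, sigma, C)

    # Step 0: one-component (null) fit with alpha = 0.5 and omega1 = omega2.
    i0 = np.argmax(Z)                                  # crude starting values
    start = np.concatenate(([Z[i0] * np.prod(tau)], x[i0]))
    fit0 = minimize(lambda par: -pl(0.5, par[0], par[1:], par[1:]), start)
    pl0 = -fit0.fun

    em = -np.inf
    for alpha in alphas:                               # Step 1, per alpha_j
        A, om1, om2 = start[0], x[i0] - 0.01, x[i0] + 0.01
        for _ in range(K):
            # Step 1.1: update (A, omega1, omega2) with alpha held fixed.
            fit = minimize(lambda par: -pl(alpha, par[0], par[1:1 + d],
                                           par[1 + d:]),
                           np.concatenate(([A], om1, om2)))
            A, om1, om2 = fit.x[0], fit.x[1:1 + d], fit.x[1 + d:]
            # Step 1.2: update alpha; -pl equals g(alpha) up to a constant.
            alpha = minimize_scalar(lambda a: -pl(a, A, om1, om2),
                                    bounds=(1e-6, 0.5), method="bounded").x
        # Step 2: penalized likelihood ratio M_j^(K) for this start.
        em = max(em, 2.0 * (pl(alpha, A, om1, om2) - pl0))
    return em
```

In the paper, the Step 1.2 update has a closed form (derived in Appendix A); the bounded scalar search above is a slower but simpler stand-in.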

For the EM-test statistic, we need to specify the initial values of the mixing parameter, the penalty p(\alpha) and the penalizing constant C. In general, we recommend starting with a small number of initial values for the mixing parameter, for example, \alpha_0 \in \{0.1, 0.2, 0.3, 0.4, 0.5\}. Our empirical studies showed that a larger number of initial values for the mixing parameter does not improve the accuracy of the testing procedure.

When selecting the penalty function p(\alpha), we need to consider two important criteria. First, the equation used to update the mixing parameter in (5) does not have a minimum for every penalty function. We show in the Appendix that there exists a unique minimum for two commonly used penalties, p(\alpha) = C\log(1 - |1 - 2\alpha|) (Li et al., 2009) and p(\alpha) = C\log(4\alpha(1-\alpha)) (Chen et al., 2001), for any constant C > 0. Second, the choice of the penalty function impacts the trade-off between the type I error and the power of the test, as well as the accuracy of the approximation under the null hypothesis. The penalty p(\alpha) = C\log(1 - |1 - 2\alpha|) is a lasso-type penalty (Li et al., 2009); it is continuous but not smooth at \alpha = 0.5, the null value. As pointed out by Li et al. (2009) and Chen and Li (2009), this penalty has properties similar to the lasso penalty for linear regression; consequently, the probability that the fitted value of \alpha equals 1/2 is positive. In contrast, the penalty p(\alpha) = C\log(4\alpha(1-\alpha)) is smooth for all \alpha. Following the suggestion in Li et al. (2009), we use p(\alpha) = C\log(1 - |1 - 2\alpha|) in the implementation of the EM-test procedure.

The constant C controls the penalization of the LRT when \alpha \to 1 or \alpha \to 0. The selection of the penalty constant C significantly affects the reliability (type I error) and the precision (power) of the test. The smaller C is, the less penalized the null hypothesis is, leading to a high type I error but enhanced power. In our empirical studies, the optimal value for C is of order \sigma^2 N\log(N), where \sigma^2 is the variance of the error, which is assumed known, and N is the sample size of the data; using the notation in equation (2), N = M_1 M_2 \cdots M_d. The intuition behind this penalty constant is as follows.

Assume that we know the true parameters A, \alpha, \omega_1, \omega_2 and replace them in the least squares sum in (3). The result is a sum of N squared normal errors. We guide our selection of C using the following property of a sequence of normal random variables:

P\left( \sup_{i=1,\dots,n} |X_i| > \sigma\sqrt{2\log(n)} \right) \to 1 \quad \text{as } n \to \infty.

Although this result does not provide a limiting condition for \sum_{i=1}^{N} X_i^2, it can be used to assess its magnitude, which is in the range of 2N\sigma^2\log(N).

An alternative test statistic that can be used to detect mixtures in the regression function is the modified likelihood ratio test (MLRT), in the spirit of Chen et al. (2001). The MLRT can be viewed as a limiting case of the EM-test statistic when the iteration number K goes to \infty. Theorem 1 below indicates that a finite number of iterations will only change the parameter estimates by o_p(1). Therefore, Theorem 1 guarantees, under a finite number of iterations, that the estimators of A, \omega_1 and \omega_2 are in a small neighborhood of the true parameter values; therefore, the quadratic approximation of the EM-test becomes valid. On the other hand, based on the same result, we cannot infer how the test statistic behaves when K \to \infty; an infinite number of iterations may lead to intractable theoretical results.

2.2 Asymptotic Results

We first introduce the notation used in the asymptotic results in this section. Define

U_{i_1,\dots,i_d} = \frac{1}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{0s})^2\tau_s^{-2}+1\}},

V_{i_1,\dots,i_d;s} = \frac{2(x_{i_s}-\omega_{0s})/\tau_s}{(x_{i_s}-\omega_{0s})^2\tau_s^{-2}+1}\,U_{i_1,\dots,i_d},

W_{i_1,\dots,i_d;s} = \frac{3(x_{i_s}-\omega_{0s})^2\tau_s^{-2}-1}{\{(x_{i_s}-\omega_{0s})^2\tau_s^{-2}+1\}^2}\,U_{i_1,\dots,i_d},

T_{i_1,\dots,i_d;st} = \frac{4(x_{i_s}-\omega_{0s})(x_{i_t}-\omega_{0t})/(\tau_s\tau_t)}{\{(x_{i_s}-\omega_{0s})^2\tau_s^{-2}+1\}\{(x_{i_t}-\omega_{0t})^2\tau_t^{-2}+1\}}\,U_{i_1,\dots,i_d},

where \omega_0 = (\omega_{01},\dots,\omega_{0d})^T is the location of the regression component under the null hypothesis.

The notation above is based on the assumption that the shape function takes the form of the Lorentzian function. For other shape functions, U_{i_1,\dots,i_d} is the regression function under the null hypothesis, the V_{i_1,\dots,i_d;s} are the first derivatives of the shape function, and the W_{i_1,\dots,i_d;s} and T_{i_1,\dots,i_d;st} are the second derivatives of the shape function. Further denote the vectors

a_{i_1,\dots,i_d} = (U_{i_1,\dots,i_d}, V_{i_1,\dots,i_d;1},\dots, V_{i_1,\dots,i_d;d})^T,
b_{i_1,\dots,i_d} = (W_{i_1,\dots,i_d;1},\dots, W_{i_1,\dots,i_d;d}, T_{i_1,\dots,i_d;12},\dots, T_{i_1,\dots,i_d;(d-1)d})^T,
c_{i_1,\dots,i_d} = (a_{i_1,\dots,i_d}^T, b_{i_1,\dots,i_d}^T)^T.

Using the notation above, we define the matrix

B = \sum_{i_1=1}^{M_1}\cdots\sum_{i_d=1}^{M_d} c_{i_1,\dots,i_d} c_{i_1,\dots,i_d}^T, \qquad (6)

which can be further decomposed into

B_{11} = \sum a_{i_1,\dots,i_d} a_{i_1,\dots,i_d}^T, \quad B_{21} = B_{12}^T = \sum b_{i_1,\dots,i_d} a_{i_1,\dots,i_d}^T, \quad B_{22} = \sum b_{i_1,\dots,i_d} b_{i_1,\dots,i_d}^T,

where the sums run over i_1 = 1,\dots,M_1, \dots, i_d = 1,\dots,M_d. Further, we denote \tilde{B}_{22} = B_{22} - B_{21} B_{11}^{-1} B_{21}^T.

The asymptotic properties depend on the following three conditions.

C1. The penalty function p(\alpha) is a continuous function such that it is maximized at 0.5 and goes to negative infinity as \alpha goes to 0 or 1.

C2. The matrix N^{-1}B converges to some positive definite matrix as N = \prod_s M_s \to \infty.

C3. There exists a W > 0 such that \omega_{s1}, \omega_{s2} \in [-W, W] for s = 1,\dots,d.

In fixed design nonlinear regression models, one regularity condition is that the matrix N^{-1}B_{11} converges to a positive definite matrix. Assumption C2 is stronger than this assumption.
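The matrices defined above can be assembled directly for the Lorentzian shape. The following sketch follows the definitions literally (including their scaling); the function name and return convention are ours.

```python
import numpy as np

def information_blocks(x, omega0, tau):
    # Assemble B of (6) and the projected block B22_tilde for the Lorentzian
    # shape. x: (n, d) design points; omega0, tau: (d,) arrays.
    n, d = x.shape
    y = (x - omega0) / tau                  # standardized deviations
    q = y ** 2 + 1.0                        # per-coordinate Lorentzian factor
    U = 1.0 / q.prod(axis=1)
    V = (2.0 * y / tau) / q * U[:, None]
    W = (3.0 * y ** 2 - 1.0) / q ** 2 * U[:, None]
    T = [4.0 * y[:, s] * y[:, t] / (tau[s] * tau[t]) / (q[:, s] * q[:, t]) * U
         for s in range(d) for t in range(s + 1, d)]
    a = np.column_stack([U, V])             # rows are a_{i_1,...,i_d}
    b = np.column_stack([W] + T)            # rows are b_{i_1,...,i_d}
    B11, B12, B22 = a.T @ a, a.T @ b, b.T @ b
    B22_tilde = B22 - B12.T @ np.linalg.solve(B11, B12)
    return B11, B22, B22_tilde
```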

It is related to the strong identifiability condition proposed in Chen (1995). This condition ensures that the estimates of \omega_1 and \omega_2 have the optimal convergence rate. It can be verified that if the design points x_{i_s}, i = 1,\dots,M_s, are evenly spaced over [0,1], then Assumption C2 is satisfied.

Theorem 1 ensures that under the null hypothesis the penalized likelihood approach provides consistent parameter estimators.

Theorem 1. Assume conditions C1, C2 and C3 are satisfied. Under the null hypothesis, we have, for j = 1,\dots,J and any fixed finite K,

p(\alpha_j^{(K)}) = O_p(1), \quad A_j^{(K)} - A_0 = O_p(N^{-1/2}), \quad \omega_{j,1}^{(K)} - \omega_0 = O_p(N^{-1/4}), \quad \omega_{j,2}^{(K)} - \omega_0 = O_p(N^{-1/4}).

Based on this consistency result, we derive the following quadratic approximation of the EM-test statistic. Because the asymptotic distribution of the penalized likelihood ratio test does not have a closed-form expression, we use this approximation to obtain the null quantiles.

Theorem 2. Assume conditions C1, C2 and C3 are satisfied and \alpha_1 = 0.5. Let w be a multivariate normal random vector with mean vector 0 and covariance matrix \tilde{B}_{22}. Under the null hypothesis, we have, for any fixed finite K, as N \to \infty,

EM^{(K)} = \sup_{\theta} \{2\tilde{\theta}^T w - \tilde{\theta}^T \tilde{B}_{22} \tilde{\theta}\} + o_p(1),

with \tilde{\theta} = (\theta_1^2,\dots,\theta_d^2, \theta_1\theta_2,\dots,\theta_{d-1}\theta_d)^T and (\theta_1,\dots,\theta_d) \in R^d.

The distribution of the quadratic approximation is difficult to derive analytically. To overcome this difficulty, we suggest the following algorithm to simulate the null distribution quantiles.

1. Generate S random vectors \{w^{(s)},\ s = 1,\dots,S\} from the multivariate normal distribution with mean 0 and covariance matrix \tilde{B}_{22}.

2. For each vector w^{(s)}, calculate Q_s = \sup_{\theta}\{2\tilde{\theta}^T w^{(s)} - \tilde{\theta}^T \tilde{B}_{22} \tilde{\theta}\}. Compute the quantiles of Q_1,\dots,Q_S and use them to approximate the quantiles of EM^{(K)}.

The proofs of Theorem 1 and Theorem 2 are in the Appendix of this paper.

Remark 1. Our asymptotic results apply to penalties such that p(0.5) = 0, as long as C > 0. If the penalty constant is selected such that C_N \to \infty as N \to \infty, for example the suggested penalty constant C_N = \sigma^2 N\log(N), the penalty term converges to 0 in probability and does not contribute to the EM-test statistic. This will not change the asymptotic properties of the EM-test statistic.

Remark 2. Since our testing problem in (2) is not a homogeneity test in a mixture of densities but a homogeneity test in the mean function of a nonlinear regression, the asymptotic results in Li et al. (2009) and Chen and Li (2009) do not apply to our testing problem in general. However, when d = 1, the limiting distribution of EM^{(K)} for testing homogeneity in the mean function of a nonlinear regression is 0.5\chi_0^2 + 0.5\chi_1^2, which is the same as the one for detecting homogeneity in a mixture of densities (Li et al., 2009).
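The two-step simulation algorithm above can be sketched as follows; it can consume the \tilde{B}_{22} assembled in the earlier `information_blocks` sketch. The restart count and the default S are our choices.

```python
import numpy as np
from scipy.optimize import minimize

def v_map(theta):
    # (theta_1,...,theta_d) -> (theta_1^2,...,theta_d^2,
    #  theta_1*theta_2,...,theta_{d-1}*theta_d), as in Theorem 2.
    d = len(theta)
    cross = [theta[s] * theta[t] for s in range(d) for t in range(s + 1, d)]
    return np.concatenate([theta ** 2, np.asarray(cross)])

def null_quantiles(B22_tilde, d, levels=(0.95,), S=1000, restarts=5, seed=0):
    # Simulate Q_s = sup_theta { 2 v(theta)'w - v(theta)' B22_tilde v(theta) }
    # for S draws of w ~ N(0, B22_tilde), then return the requested quantiles.
    rng = np.random.default_rng(seed)
    dim = B22_tilde.shape[0]
    Q = np.empty(S)
    for i in range(S):
        w = rng.multivariate_normal(np.zeros(dim), B22_tilde)
        obj = lambda th: -(2.0 * v_map(th) @ w
                           - v_map(th) @ B22_tilde @ v_map(th))
        # random restarts guard against local optima of the sup
        best = max(-minimize(obj, rng.normal(size=d)).fun
                   for _ in range(restarts))
        Q[i] = max(best, 0.0)   # theta = 0 attains 0, so the sup is >= 0
    return np.quantile(Q, levels)
```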

3 Application of Mixture Test

In real applications of the regression model in (1), the primary goal is to accurately identify the regression components and estimate the model parameters. Existing research in this field commonly applies a local maxima identification method to the multiple-component data (Serban, 2007, 2010, and the references therein), followed by manual intervention for assessing the regression components. In our notation, a local maximum is at location (x_{i_1},\dots,x_{i_d}) if Z_{i_1,\dots,i_d} is larger than its immediate intensity neighbors Z_{i_1\pm 1,\dots,i_d},\dots,Z_{i_1,\dots,i_d\pm 1}. A visual equivalent of a local maximum in one- and two-dimensional data is a peak/spike.

The set of local maxima are initial candidates for the regression components in model (1). A local maximum may correspond to zero (false positive), one or more regression components. We propose using the mixture test introduced in this paper to assess whether a local maximum corresponds to one or more regression components. The general procedure for the application of the mixture test to the more general model (1) takes three steps:

Step 1. We first apply a local maxima identification method. We reduce the complete intensity data to smaller regions, each region corresponding to the surrounding intensity values of one local maximum. This data segmentation step involves an intermediary step at which we apply wavelet-based denoising; local maxima identification is applied to the denoised data to reduce the number of false positives (Serban, 2010). At this intermediary step, we also estimate the noise variance \sigma^2 using the median absolute deviation estimator suggested by Donoho (1995).

Step 2. We fit a model with one regression component (L = 1) to each local maximum region and obtain estimates for \tau. We obtain a common estimate for the width parameter by taking the median over all estimates of \tau; this estimate of \tau relies on the assumption that only a few regression components are mixed. This step also provides initial estimates of the amplitude and the location parameters.

Step 3. Fixing \sigma^2 and \tau at their estimates obtained in the previous two steps, we apply the testing procedure to each region, where the regions could overlap. The size of a region in a d-dimensional design is M^d, where M is equal to the closest integer to the 90% quantile of the Cauchy distribution with spread \min_{s=1,\dots,d}\{\hat{\tau}_s\}. For each region, we decide whether it contains two regression components, i.e., whether to reject the null hypothesis.
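A minimal sketch of the local maxima identification in Step 1, using the immediate-neighbor definition above; the wavelet denoising and the MAD noise estimate used in the paper are omitted, and the threshold is left to the caller.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def local_maxima(Z, threshold):
    # Grid points whose intensity exceeds all immediate neighbors and a
    # noise threshold; returns an array of d-dimensional index vectors.
    peaks = (Z == maximum_filter(Z, size=3)) & (Z > threshold)
    return np.argwhere(peaks)
```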

Remark 3. In Step 3, the values of \sigma^2 and \tau are fixed at the same values for all local maxima. Therefore, the asymptotic results for fixed \sigma^2 are more relevant for this application of the testing procedure.

Remark 4. It is important to note that a number of the discoveries or local maxima are false positives, which arise due to the additive noise; therefore, the mixture test will be applied to true as well as to false positives. In our experiments (simulation and real data), when the mixture test is applied to data consisting of noise only (false positives, or A = 0), we consistently observe convergence problems in Step 1.1 of the EM-test statistic computation. We can use this observation as an indication of the presence of a false positive, which can be used in screening out the false positives.

4 Simulation Study

In this section, we discuss a simulation study for evaluating the reliability (type I error) and efficiency (power) of the peak mixture test. The significance level is 1 - \alpha = 0.95. We compare the testing procedure introduced in this paper for varying values of the penalization constant C and with an additional, simpler testing procedure, called the overdispersion test.

Overdispersion test. The overdispersion test is often used to test one component versus more than one component or mixtures of densities (Neyman and Scott, 1966; Lindsay, 1995). Its extension to our specific testing problem is as follows. Let S(A, \omega) denote the least squares sum:

S(A, \omega) = \sum_{i_1,\dots,i_d} \left( Z_{i_1,\dots,i_d} - \frac{A/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_s)^2\tau_s^{-2}+1\}} \right)^2.

Under the null hypothesis, S(A, \omega) only contains information about random variation, whereas under the alternative hypothesis, S(A, \omega) contains two types of information: random variation and the variation due to the misspecification of the mean function.

Assuming \sigma known, we can use S(A, \omega) to construct a test statistic as follows:

T = \frac{S(\hat{A}_0, \hat{\omega}_0)/\sigma^2 - (N - 1 - d)}{\sqrt{2(N - 1 - d)}}.

Under the null hypothesis, the test statistic is asymptotically normal as N \to \infty. Specifically, if the data is from the one-component model, then T converges to N(0, 1). If the data is from a model consisting of more than one component, then T tends to be large. Therefore, we reject the null hypothesis of one regression component if T is greater than the upper quantile of N(0, 1).

Simulation Setting. We simulate data following the general model in equation (1). In this model, the parameters of the l-th component are the location parameter \omega_l = (\omega_{1l},\dots,\omega_{dl})^T, the width parameter \tau_l = (\tau_{1l},\dots,\tau_{dl})^T, and the amplitude parameter A_l. We simulate data in two (d = 2) and three (d = 3) dimensions. In our simulation study, we assume that the mixture regression shape function s is the Lorentzian function. Therefore, the simulation model is

Z_{i_1,\dots,i_d} = \sum_{l=1}^{L} \frac{A_l/\prod_{s=1}^{d}\tau_{sl}}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{sl})^2\tau_{sl}^{-2}+1\}} + \sigma\epsilon_{i_1,\dots,i_d}. \qquad (7)

The simulation parameters are as follows. The amplitudes A_l, l = 1,\dots,L, vary in the interval [10, 100], and the noise standard error is set to \sigma = 2, 5, 10 or 15. The width parameters are \tau_1 = 0.04 and \tau_2 = 0.06 for the two-dimensional simulation, adding \tau_3 = 0.08 for the three-dimensional simulated data. In this example, the number of Lorentzian components is L = 50, on a 512 x 256 grid of points for d = 2 (i.e., x_{i_1} \in \{1/512,\dots,1\} and x_{i_2} \in \{1/256,\dots,1\}) and on a three-dimensional grid for d = 3. We simulate the error terms \epsilon_{i_1,\dots,i_d} \sim N(0, 1) independently.
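To tie the two pieces above together, here is a sketch that simulates one dataset from model (7) and evaluates the overdispersion statistic T as reconstructed above; the one-component fit (A_hat, omega_hat) is assumed to come from, e.g., Step 0 of the EM-test sketch, and all names are ours.

```python
import numpy as np
from scipy.stats import norm

def simulate_model7(x, A, omega, tau, sigma, rng):
    # One draw from model (7): L Lorentzian components plus Gaussian noise.
    # x: (n, d) design points; A: (L,); omega, tau: (L, d).
    Z = np.zeros(len(x))
    for l in range(len(A)):
        Z += (A[l] / np.prod(tau[l])) / np.prod(
            (x - omega[l]) ** 2 / tau[l] ** 2 + 1.0, axis=1)
    return Z + rng.normal(0.0, sigma, len(x))

def overdispersion_test(Z, x, A_hat, omega_hat, tau, sigma, level=0.05):
    # Overdispersion statistic T; a large T indicates misspecification of
    # the one-component mean, so H0 is rejected above the normal quantile.
    N, d = x.shape
    fit = (A_hat / np.prod(tau)) / np.prod(
        (x - omega_hat) ** 2 / tau ** 2 + 1.0, axis=1)
    S = np.sum((Z - fit) ** 2)
    T = (S / sigma ** 2 - (N - 1 - d)) / np.sqrt(2.0 * (N - 1 - d))
    return T, T > norm.ppf(1.0 - level)
```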

Reliability. To evaluate the reliability of the EM-test introduced in this paper, we simulate from the model in (1) a dataset with L = 50 well-separated regression components; that is, the distance between any two regression components i and j satisfies \|\omega_i - \omega_j\| > 0.1 for i \neq j. We first estimate the width and variance parameters from all 50 regression components, following the first two steps described in Section 3. We apply the EM-test to the region of each regression component, as in Step 3 in Section 3, and compare the EM-test values with the 95% approximate quantile of the null distribution (see Section 2.2 for the procedure used to compute the null quantiles, with S = 1000). The type I error is computed as the proportion of rejections among the 50 regression components. The penalty function used in computing the EM-test statistic is p(\alpha) = C\log(1 - |1 - 2\alpha|) with varying penalizing constants: C = \sigma^2 N\log(N) (optimal), C = N\sigma^2 (medium) and C = \sigma^2\log(N) (small). The initial values for the mixing proportion are \alpha_0 \in \{0.1, 0.2, 0.3, 0.4, 0.5\}.

Figure 2 shows the type I error for the d = 2 and d = 3 simulations. For both d = 2 and d = 3, the type I error is between 0.05 and 0.1 for medium values of \sigma^2 at C = \sigma^2 N\log(N). As the penalizing constant decreases, the test becomes significantly unreliable.

Efficiency. To evaluate the efficiency of the EM-test introduced in this paper, we simulate 25 pairs of regression components, a total of L = 50 components. The distance between the two regression components in a pair satisfies 0.02 < \|\omega_i - \omega_j\| < 0.04. That is, we generate only from the alternative hypothesis, with varying mixing proportions and varying distances between the regression components. We estimate the variance parameter from all 50 regression components but use the true width values. We apply the EM-test to the region of each pair of regression components and compare the EM-test values with the 95% approximate quantile of the null distribution (see Section 2.2 for the procedure used to compute the null quantiles, with S = 1000). The power is computed as the proportion of rejections among the 25 pairs of regression components.

Figure 3 shows the power for the d = 2 and d = 3 simulations. The power is significantly higher for the two-dimensional simulated data; there is only a slight difference across the varying penalizing constants, with a lower power for the optimal C = \sigma^2 N\log(N). In contrast, for the three-dimensional simulated data, the difference in the test power for varying penalizing constants is much larger.

Discussion. By varying the constant C, there is a significant reduction in the type I error. The difference in type I error can be as high as 0.35 for two-dimensional data and 0.25 for three-dimensional data. The difference is less substantial for three-dimensional data because, as the dimensionality increases, there is more information (a larger sample size) to be used in deciding whether there is one component or more. For example, in our implementation, the size of a region surrounding a local maximum is N = 17^2 = 289 for two-dimensional data and N = 17^3 = 4913 for three-dimensional data. These results validate our asymptotic theory. On the other hand, the power is not as sensitive to different values of C as the type I error. The largest difference, for three-dimensional data, is about 0.1. The power also decreases as the signal-to-noise ratio decreases. An explanation for this is that at large noise levels, small amplitude components are non-detectable from the noise level.

When comparing with the simpler approach, the overdispersion test, the power is higher than for the optimal C for both two- and three-dimensional simulated data. However, the type I error is significantly higher for three-dimensional data when comparing across all C values, and only slightly higher than the optimal C for two-dimensional data. The poor performance for the three-dimensional design is because of the poor separation of the distribution of S(\hat{A}_0, \hat{\omega}_0) under the null and alternative hypotheses (see Figure 4). We therefore conclude that although we may gain some power by using the overdispersion test, this gain is offset by the loss in test reliability, especially for higher dimensional data. This is critical for our application, since a large number of false positives in higher dimensional NMR data could lead to significant distortion of the predicted protein structure.

Figure 2: Type I error calculated over 50 regression components generated under the null hypothesis. [Panels (a) 2D Simulation and (b) 3D Simulation: type I error versus noise standard deviation for optimal C, medium C, small C, and the overdispersion test.]

Reliability vs. Efficiency. In conclusion, as the dimensionality and the error variance increase, the test efficiency also decreases. We can enhance the efficiency by decreasing the penalizing constant, but this in turn reduces the reliability of the test. This simulation study clearly illustrates how the penalizing constant controls the trade-off between reliability and efficiency. From this simulation, as well as from other simulations not reported here, we recommend using the constant C = \sigma^2 N\log(N). We have performed other simulation studies with a larger number of initial values for the mixing proportion parameter, as well as with a different penalty function, p(\alpha) = C\log(4\alpha(1-\alpha)). From this extensive simulation study, we therefore conclude that the most important tuning parameter in the application of the EM-test remains the penalizing constant C.

Figure 3: Test power calculated over 50 regression components generated under the alternative hypothesis. [Panels (a) 2D Simulation and (b) 3D Simulation: power versus noise standard deviation for optimal C, medium C, small C, and the overdispersion test.]

5 Case Study

5.1 Motivation and Background

In NMR data analysis for biomolecular studies, one primary objective is to estimate parameters (e.g., chemical shifts) of the atomic nuclei of a protein. Under protein magnetization, targeted atomic nuclei in the protein undergo energy transfers; each energy transfer induces a signal which is mathematically described by a decaying sinusoid (see the Supplemental material). Therefore, the NMR signal generated by a d-dimensional NMR experiment is a sum of noisy decaying sinusoids,

S(t_1, t_2, \dots, t_d) = \sum_{l=1}^{L} A_l e^{i\phi_l} \prod_{s=1}^{d} e^{-t_s/\tau_{sl}} e^{i t_s \omega_{sl}} + \epsilon_{t_1,\dots,t_d}, \qquad (8)

where each sinusoid is generated by an energy transfer between d atomic nuclei in d-dimensional NMR experiments (Hoch and Stern, 1996). The model parameters of interest are the resonance frequencies \omega_l = (\omega_{1l},\dots,\omega_{dl})^T (translated into chemical shifts) and the signal amplitudes A_l (translated into structural distances of the atomic nuclei in specific NMR experiments). Also, L is the number of observed energy transfers, which is large and unknown. The protein structure is resolved by accurately estimating the resonance frequencies and the signal amplitudes from data generated by NMR experiments.

Figure 4: The density estimate of the distribution of S(\hat{A}_0, \hat{\omega}_0) under the null (solid line) and alternative (dotted line) hypotheses. [Panels: (a) \sigma = 2 and (b) \sigma = 15.]
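For intuition, the following sketch simulates the time-domain signal (8) on a d-dimensional grid; applying np.fft.fftn to its output yields approximately Lorentzian peaks at the resonance frequencies. The function name and array layout are our own.

```python
import numpy as np

def fid(t_axes, A, phi, tau, omega, sigma, rng):
    # Simulate (8): a sum of L decaying complex sinusoids plus complex noise.
    # t_axes: list of d 1-d time grids; A, phi: (L,); tau, omega: (L, d).
    mesh = np.meshgrid(*t_axes, indexing="ij")
    S = np.zeros(mesh[0].shape, dtype=complex)
    for l in range(len(A)):
        term = A[l] * np.exp(1j * phi[l]) * np.ones_like(S)
        for s, t in enumerate(mesh):
            term *= np.exp(-t / tau[l, s] + 1j * t * omega[l, s])
        S += term
    return S + rng.normal(0, sigma, S.shape) + 1j * rng.normal(0, sigma, S.shape)
```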

The traditional methodology in biomolecular NMR data analysis involves Fourier transformation (FT) of the NMR signal data, complemented by other pre-processing steps (Hoch and Stern, 1996). After the Fourier transform, the resulting model is a d-dimensional mixture regression model as described by the model in equation (1), where the shape function is an approximate Lorentzian function (Serban, 2005). In this model, the parameters of the l-th regression component are the location parameter \omega_l = (\omega_{1l},\dots,\omega_{dl})^T, i.e., the signal frequencies; the width parameter \tau_l = (\tau_{1l},\dots,\tau_{dl})^T; and the amplitude parameter A_l. Because of the one-to-one mapping between energy transfers and regression components, the problem of identifying the parameters of the atomic nuclei undergoing energy transfers translates into accurately identifying and estimating the model parameters.

Importantly, in multi-dimensional NMR frequency data, since a large number of regression components have similar frequencies, we expect that some components partially overlap or mix. It is important to de-mix the components, since each regression component provides specific information about the structure of the protein. In certain cases, the lack of a small number of essential components can lead to a significant deviation of the predicted structure (Güntert, 2003).

Many of the existing software packages for biomolecular NMR data analysis incorporate routines for component identification, but most of them do not incorporate a routine for detecting mixtures (Güntert, 2003; Gronwald and Kalbitzer, 2004). The common practice for detecting mixed or overlapped components is to visualize the contours of all local maxima and manually select the ones that display significant mixing. However, manual intervention is tedious and time consuming, since L is generally large. The testing procedure in this paper is a knowledge-based means of detecting a small number of candidate overlapped components that can then be investigated visually. This procedure will reduce the time spent manually selecting mixed components by at least tenfold.

Remark 5. In Section 1, we mention that the assumption of equal widths is common in NMR biomolecular studies. The width parameters in the frequency domain map to the decay times \tau_l, l = 1,\dots,L, in the time domain. Since in many biomolecular NMR studies we detect the exchange of energy between the same pair of nuclei (e.g., 15N-proton), the decay times are similar, and therefore the width parameters are similar.

Remark 6. Another assumption for the mixture regression test is that the location parameters \omega_l, l = 1,\dots,L, are bounded (condition C3). For NMR biomolecular data, this assumption holds, since the NMR signals are filtered such that the frequencies are within a specific bandwidth.

5.2 Application Data: 2D and 3D NMR Experiments

Data Specifications. The experimental data used to illustrate the applicability of the mixture regression test are for a doubly-labeled sample of a 130-residue RNA binding protein, rho130, using standard double (HSQC) and triple resonance (HNCOCA and HNCA) experiments on a 1 mM protein sample at a proton frequency of 600 MHz, as introduced in Briercheck et al. (1998). The data were processed with FELIX (Accelrys Software Inc.) using apodization and linear prediction methods that are typical for these types of experiments.

The HSQC spectrum is 2D NMR data in which the regression components show correlations between the amide nitrogen and the attached amide proton in the protein sequence. The intensities Z_{i_1,i_2} are observed over a two-dimensional grid of points with M_1 = 512 and M_2 = 256. The HNCOCA is a three-dimensional experiment generating 3D NMR data in which each component arises due to correlations between the amide nitrogen and amide proton of a specific residue and the alpha carbon of the preceding residue in the protein sequence.

Dataset   No. of Local Maxima   No. of Rejections (EM-test)
HSQC      146                   9
HNCA      235                   0
HNCOCA    118                   0

Table 1: Number of local maxima and the number of mixed regression components for the three application datasets.

HNCA is also a three-dimensional experiment, in which components are paired with similar amide nitrogen and amide proton frequencies. In HNCA, a pair of components arises due to correlations between the amide nitrogen, the amide proton, and the alpha carbon nuclei of the preceding residue and of the intra-residue. Therefore, for this experiment, the true number of regression components will be about twice the number of protein residues, and half of the components will match the components in HNCOCA. For both the HNCOCA and HNCA NMR datasets, the intensity values Z_{i_1,i_2,i_3} are observed over a three-dimensional grid of points.

Mixture Test Application. Using the local maxima identification method briefly described in Section 3 and introduced in Serban (2007, 2010), we identified 146 local maxima for the HSQC experiment, 235 for the HNCA experiment and 118 for the HNCOCA experiment. We applied the EM-test to the two-dimensional and three-dimensional data generated by these three NMR experiments for the RNA binding protein rho130. For computational simplicity and effectiveness, the mixture test was applied only to data in the neighborhood of a local maximum, as specified in Section 3 (Step 3). The width parameters \tau_s, s = 1,\dots,d, were estimated by estimating the widths of all local maxima and then taking the median over all estimates. The number of rejections (mixed regression components), along with the number of local maxima identified using the method in Serban (2007, 2010), are provided in Table 1.

In this study, because the error variance is unknown, the estimated variance is the median absolute value of the high resolution wavelet coefficients after wavelet transformation of each of the three datasets (Serban, 2010). The regression component widths are estimated by first interpolating a Lorentzian function at each local maximum; the common width estimate is then the median over the widths estimated from this interpolation step.

The contour plots for the nine rejected regression components for the HSQC data are shown in Figure 5. A visual inspection of the contour plots further suggests which of the nine rejections are true positives. The contour plots 1-3 and 9 are asymmetric and larger in size, a pattern that reveals a mixture of two components overlapped into one local maximum. The five contour plots 4-8 are symmetric, all with similar widths in both directions. In the supplemental material, we complement the contour plots with perspective plots which show the shape and the width of the nine rejections.

We complemented the analysis of the two-dimensional data with manual intervention: visual assessment of each of the 146 local maxima. Through this experiment, we did not identify more than the four mixtures discussed above, which shows that the application of the mixture test minimizes the effort of detecting overlapped or mixed components by reducing the number of screened local maxima from 146 to only 9.

For the two three-dimensional datasets, we do not reject the null hypothesis for any local maximum. The reason for not detecting any mixture of regression components is that the protein in this study has a small number of residues, leading to a small number of components (L \approx 260 for HNCA and L \approx 130 for HNCOCA) which are spread over three-dimensional data, leading to higher resolution. Generally, one advantage of higher dimensional NMR data is an increase in resolution, measured by the distance between components, which results in a smaller number of mixed components than in lower dimensional data.

Figure 5: The contour plots for the nine rejections from the application of the EM-test to the HSQC data. [Panels numbered (1)-(9).]

On the other hand, the drawback of higher dimensional data is a lower signal-to-noise ratio, that is, a smaller number of identifiable regression components from the noisy background; in our application, we underestimate the number of components for the two three-dimensional datasets. Therefore, complementing the NMR data analysis with a mixture test will increase the ability to identify regression components with lower dimensional NMR experiments, which not only have a higher signal-to-noise ratio but are also less time-consuming and less expensive.

Appendix A: Existence of a Solution to g(\alpha)

An updated value of \alpha, given that A = A_j^{(k)}, \omega_1 = \omega_{j,1}^{(k)} and \omega_2 = \omega_{j,2}^{(k)} are fixed, is derived from maximizing the penalized log-likelihood function with respect to \alpha. This is equivalent to minimizing g(\alpha) in equation (5) in the main manuscript, with

C_1 = \frac{1}{2\sigma^2}\sum \left( \frac{A_j^{(k)}/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{j,s1}^{(k)})^2\tau_s^{-2}+1\}} - \frac{A_j^{(k)}/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{j,s2}^{(k)})^2\tau_s^{-2}+1\}} \right)^2,

C_2 = \frac{1}{2\sigma^2}\sum \left( Z_{i_1,\dots,i_d} - \frac{A_j^{(k)}/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{j,s2}^{(k)})^2\tau_s^{-2}+1\}} \right) \left( \frac{A_j^{(k)}/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{j,s1}^{(k)})^2\tau_s^{-2}+1\}} - \frac{A_j^{(k)}/\prod_{s=1}^{d}\tau_s}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_{j,s2}^{(k)})^2\tau_s^{-2}+1\}} \right).

Penalty p(\alpha) = C\log(4\alpha(1-\alpha)). To obtain an updated value of \alpha by minimizing g(\alpha), we take the first order derivative of g(\alpha) and equate it to zero. When using the penalty p(\alpha) = C\log(4\alpha(1-\alpha)), the first order derivative is

g'(\alpha) = 2C_1\alpha - 2C_2 - \frac{C(1-2\alpha)}{\alpha(1-\alpha)}.

Setting g'(\alpha) = 0, there exists a real solution in the interval (0, 1) for C > 0, since \lim_{\alpha\to 0} g'(\alpha) = -\infty and \lim_{\alpha\to 1} g'(\alpha) = +\infty. Moreover, the second order derivative is

g''(\alpha) = 2C_1 + C\left( \frac{2}{\alpha(1-\alpha)} + \frac{(1-2\alpha)^2}{\alpha^2(1-\alpha)^2} \right),

which is positive for all \alpha \in [0, 1] since C_1 \geq 0. This implies that g'(\alpha) = 0 has a unique solution in [0, 1], and this solution is the minimum point of g(\alpha).
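The root whose existence and uniqueness is established above has no simple closed form, but it is easy to compute numerically; a sketch (the bracketing interval endpoints are our choice):

```python
from scipy.optimize import brentq

def update_alpha_smooth(C1, C2, C, eps=1e-10):
    # Unique root of g'(a) = 2*C1*a - 2*C2 - C*(1-2*a)/(a*(1-a)) in (0, 1);
    # g' -> -inf as a -> 0+ and g' -> +inf as a -> 1-, so brentq brackets it.
    gprime = lambda a: 2*C1*a - 2*C2 - C * (1 - 2*a) / (a * (1 - a))
    return brentq(gprime, eps, 1 - eps)
```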

Penalty p(\alpha) = C\log(1 - |1 - 2\alpha|). Let \alpha^* = C_2/C_1, which is the minimum point of C_1\alpha^2 - 2C_2\alpha. If \alpha^* < 0.5, then the minimum point of g(\alpha) should be in (0, 0.5]. Note that when \alpha \in (0, 0.5],

g(\alpha) = C_1\alpha^2 - 2C_2\alpha - C\log(2\alpha) \quad \text{and} \quad g'(\alpha) = 2C_1\alpha - 2C_2 - C/\alpha.

Setting g'(\alpha) = 0, we get a unique positive solution,

\hat{\alpha} = \frac{2C_2 + \sqrt{4C_2^2 + 8C_1 C}}{4C_1}.

The second derivative over (0, 0.5] is always positive. Therefore, for the penalty p(\alpha) = C\log(1 - |1 - 2\alpha|), the updated value of \alpha is

\alpha^{(k+1)} = \min\left\{0.5,\ \frac{2C_2 + \sqrt{4C_2^2 + 8C_1 C}}{4C_1}\right\}.

Similarly, when \alpha^* \geq 0.5, the updated value of \alpha is

\alpha^{(k+1)} = \max\left\{0.5,\ \frac{2C_1 + 2C_2 - \sqrt{4(C_1 - C_2)^2 + 8C_1 C}}{4C_1}\right\}.

In summary,

\alpha^{(k+1)} = \begin{cases} \min\left\{0.5,\ \dfrac{2C_2 + \sqrt{4C_2^2 + 8C_1 C}}{4C_1}\right\}, & \text{if } C_2/C_1 < 0.5, \\[2ex] \max\left\{0.5,\ \dfrac{2C_1 + 2C_2 - \sqrt{4(C_1 - C_2)^2 + 8C_1 C}}{4C_1}\right\}, & \text{if } C_2/C_1 \geq 0.5. \end{cases}

Appendix B: Proofs of Theorems 1 and 2

Without loss of generality, we assume that \omega_{0s} = 0, \tau_s = 1 and \sigma = 1. Let

\epsilon_{i_1,\dots,i_d} = Z_{i_1,\dots,i_d} - \frac{A_0}{\prod_{s=1}^{d}(x_{i_s}^2 + 1)} \sim N(0, 1).

In the following, we use \sum to denote \sum_{i_1=1}^{M_1}\cdots\sum_{i_d=1}^{M_d}. A brief roadmap for the proofs is as follows. Lemma 1 shows that any estimator with \alpha bounded away from 0 or 1, and with a large likelihood value, is consistent for A, \omega_1 and \omega_2 under the null hypothesis. Lemma 2 strengthens Lemma 1 by providing specific convergence rates. Theorems 1 and 2 then follow directly from these two lemmas.

Lemma 1. Assume the same conditions as in Theorem 1. Let (\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) be estimators of (\alpha, A, \omega_1, \omega_2) such that \delta \leq \bar{\alpha} \leq 0.5 for some \delta \in (0, 0.5]. Assume that

l(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) - l(0.5, A_0, \omega_0, \omega_0) \geq -c > -\infty.

Then, under the null hypothesis, \bar{A} - A_0 = o_p(1), \bar{\omega}_1 = o_p(1) and \bar{\omega}_2 = o_p(1).

Proof. Under the assumptions \alpha \in [\delta, 0.5] and \omega_{s1}, \omega_{s2} \in [-W, W], the mixture regression model reduces to the null model only when A = A_0 and \omega_1 = \omega_2 = 0. That is, the parameters A, \omega_1 and \omega_2 are identifiable. Then the idea of Wald (1949) can be applied to show that \bar{A} - A_0 = o_p(1), \bar{\omega}_1 = o_p(1) and \bar{\omega}_2 = o_p(1).

For the convenience of presentation, let

\epsilon_{i_1,\dots,i_d} = Z_{i_1,\dots,i_d} - \frac{A_0}{\prod_{s=1}^{d}(x_{i_s}^2 + 1)},

which has a N(0, 1) distribution under the null hypothesis, and

h_{i_1,\dots,i_d}(A, \omega) = \frac{A}{\prod_{s=1}^{d}\{(x_{i_s}-\omega_s)^2 + 1\}} - \frac{A_0}{\prod_{s=1}^{d}(x_{i_s}^2 + 1)}.

Lemma 2. Assume the same conditions as in Theorem 1. Let (\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) be estimators of (\alpha, A, \omega_1, \omega_2) such that, under the null hypothesis, \bar{A} - A_0 = o_p(1), \bar{\omega}_1 = o_p(1), \bar{\omega}_2 = o_p(1). Assume that

pl(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) - pl(0.5, A_0, \omega_0, \omega_0) \geq -c > -\infty.

Then, under the null hypothesis,

p(\bar{\alpha}) = O_p(1), \quad \bar{A} - A_0 = O_p(N^{-1/2}), \quad \bar{\omega}_1 = O_p(N^{-1/4}), \quad \bar{\omega}_2 = O_p(N^{-1/4}).

Proof. Let R_1(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) = 2\{l(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) - l(0.5, A_0, \omega_0, \omega_0)\}. Note that the penalty function is non-positive. It follows that

-2c \leq 2\{pl(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) - pl(0.5, A_0, \omega_0, \omega_0)\} \leq R_1(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2).

With the notation h_{i_1,\dots,i_d}(\cdot, \cdot), we can write R_1(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) in the following form:

R_1(\bar{\alpha}, \bar{A}, \bar{\omega}_1, \bar{\omega}_2) = 2\sum \epsilon_{i_1,\dots,i_d}\{\bar{\alpha} h_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_1) + (1-\bar{\alpha}) h_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_2)\} - \sum \{\bar{\alpha} h_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_1) + (1-\bar{\alpha}) h_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_2)\}^2.

Applying a second-order Taylor expansion to 1/\prod_{s=1}^{d}\{(x_{i_s}-\omega_{s1})^2 + 1\}, we get

h_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_1) = (\bar{A} - A_0) U_{i_1,\dots,i_d} + \sum_{s=1}^{d} \bar{A}\bar{\omega}_{s1} V_{i_1,\dots,i_d;s} + \sum_{s=1}^{d} \bar{A}\bar{\omega}_{s1}^2 W_{i_1,\dots,i_d;s} + \sum_{s<t} \bar{A}\bar{\omega}_{s1}\bar{\omega}_{t1} T_{i_1,\dots,i_d;st} + e_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_1),

where e_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_1) is the remainder. A similar approximation can also be obtained for h_{i_1,\dots,i_d}(\bar{A}, \bar{\omega}_2).


More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

Some General Types of Tests

Some General Types of Tests Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

TESTS FOR HOMOGENEITY IN NORMAL MIXTURES IN THE PRESENCE OF A STRUCTURAL PARAMETER: TECHNICAL DETAILS

TESTS FOR HOMOGENEITY IN NORMAL MIXTURES IN THE PRESENCE OF A STRUCTURAL PARAMETER: TECHNICAL DETAILS TESTS FOR HOMOGENEITY IN NORMAL MIXTURES IN THE PRESENCE OF A STRUCTURAL PARAMETER: TECHNICAL DETAILS By Hanfeng Chen and Jiahua Chen 1 Bowling Green State University and University of Waterloo Abstract.

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Ch. 5 Hypothesis Testing

Ch. 5 Hypothesis Testing Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Bayesian Nonparametric Point Estimation Under a Conjugate Prior University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE 17th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE Abdourrahmane M. Atto 1, Dominique Pastor, Gregoire Mercier

More information

Mixture of Gaussians Models

Mixture of Gaussians Models Mixture of Gaussians Models Outline Inference, Learning, and Maximum Likelihood Why Mixtures? Why Gaussians? Building up to the Mixture of Gaussians Single Gaussians Fully-Observed Mixtures Hidden Mixtures

More information

2.1.3 The Testing Problem and Neave s Step Method

2.1.3 The Testing Problem and Neave s Step Method we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Hypothesis Testing - Frequentist

Hypothesis Testing - Frequentist Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Estimating a pseudounitary operator for velocity-stack inversion

Estimating a pseudounitary operator for velocity-stack inversion Stanford Exploration Project, Report 82, May 11, 2001, pages 1 77 Estimating a pseudounitary operator for velocity-stack inversion David E. Lumley 1 ABSTRACT I estimate a pseudounitary operator for enhancing

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

An Introduction to Wavelets and some Applications

An Introduction to Wavelets and some Applications An Introduction to Wavelets and some Applications Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France An Introduction to Wavelets and some Applications p.1/54

More information

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics )

Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) James Martens University of Toronto June 24, 2010 Computer Science UNIVERSITY OF TORONTO James Martens (U of T) Learning

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

Testing Algebraic Hypotheses

Testing Algebraic Hypotheses Testing Algebraic Hypotheses Mathias Drton Department of Statistics University of Chicago 1 / 18 Example: Factor analysis Multivariate normal model based on conditional independence given hidden variable:

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Solutions for Examination Categorical Data Analysis, March 21, 2013

Solutions for Examination Categorical Data Analysis, March 21, 2013 STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction NUCLEAR NORM PENALIZED ESTIMATION OF IERACTIVE FIXED EFFECT MODELS HYUNGSIK ROGER MOON AND MARTIN WEIDNER Incomplete and Work in Progress. Introduction Interactive fixed effects panel regression models

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)? ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we

More information

EM Algorithms for Ordered Probit Models with Endogenous Regressors

EM Algorithms for Ordered Probit Models with Endogenous Regressors EM Algorithms for Ordered Probit Models with Endogenous Regressors Hiroyuki Kawakatsu Business School Dublin City University Dublin 9, Ireland hiroyuki.kawakatsu@dcu.ie Ann G. Largey Business School Dublin

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES Bere M. Gur Prof. Christopher Niezreci Prof. Peter Avitabile Structural Dynamics and Acoustic Systems

More information