Optimal Estimation and Completion of Matrices with Biclustering Structures


Journal of Machine Learning Research 17 (2016) 1-29. Submitted 12/15; Revised 5/16; Published 9/16.
(c) 2016 Chao Gao, Yu Lu, Zongming Ma and Harrison H. Zhou.

Optimal Estimation and Completion of Matrices with Biclustering Structures

Chao Gao, Yu Lu (Yale University), Zongming Ma (University of Pennsylvania), Harrison H. Zhou (Yale University)

Editor: Edo Airoldi

Abstract

Biclustering structures in data matrices were first formalized in a seminal paper by John Hartigan (Hartigan, 1972), where one seeks to cluster cases and variables simultaneously. Such structures are also prevalent in block modeling of networks. In this paper, we develop a theory for the estimation and completion of matrices with biclustering structures, where the data is a partially observed and noise-contaminated matrix with a certain underlying biclustering structure. In particular, we show that a constrained least squares estimator achieves minimax rate-optimal performance in several of the most important scenarios. To this end, we derive unified high-probability upper bounds for all sub-Gaussian data and also provide matching minimax lower bounds in both Gaussian and binary cases. Due to the close connection of graphons to stochastic block models, an immediate consequence of our general results is a minimax rate-optimal estimator for sparse graphons.

Keywords: biclustering, graphon, matrix completion, missing data, stochastic block models, sparse network

1. Introduction

In a range of important data analytic scenarios, we encounter matrices with biclustering structures. For instance, in gene expression studies, one can organize the rows of a data matrix to correspond to individual cancer patients and the columns to transcripts. The patients are then expected to form groups according to different cancer subtypes, and the genes are expected to exhibit a clustering effect according to the different pathways they belong to. Therefore, after appropriate reordering of the rows and the columns, the data matrix is expected to have a biclustering structure contaminated by noise (Lee et al., 2010). Here, the observed gene expression levels are real numbers. In a different context, such a biclustering structure can also be present in network data. For example, the stochastic block model (SBM for short) (Holland et al., 1983) is a popular model for exchangeable networks. In SBMs, the graph nodes are partitioned into $k$ disjoint communities and the probability that any pair of nodes are connected is determined entirely by the community memberships of the nodes. Consequently, if one rearranges the nodes from the same communities together in

the graph adjacency matrix, then the mean adjacency matrix, where each off-diagonal entry equals the probability of an edge connecting the nodes represented by the corresponding row and column, also has a biclustering structure.

The goal of the present paper is to develop a theory for the estimation (and completion when there are missing entries) of matrices with biclustering structures. To this end, we consider the following general model

$X_{ij} = \theta_{ij} + \epsilon_{ij},\quad i\in[n_1],\ j\in[n_2],$  (1)

where for any positive integer $m$ we let $[m] = \{1,\dots,m\}$. Here, for each $(i,j)$, $\theta_{ij} = E[X_{ij}]$ and $\epsilon_{ij}$ is an independent piece of mean-zero sub-Gaussian noise. Moreover, we allow entries to be missing completely at random (Rubin, 1976). Thus, let $E_{ij}$ be i.i.d. Bernoulli random variables with success probability $p\in(0,1]$ indicating whether the $(i,j)$th entry is observed, and define the set of observed entries

$\Omega = \{(i,j): E_{ij} = 1\}.$  (2)

Our final observations are

$X_{ij},\quad (i,j)\in\Omega.$  (3)

To model the biclustering structure, we focus on the case where there are $k_1$ row clusters and $k_2$ column clusters, and the values of $\{\theta_{ij}\}$ are constant whenever the rows and the columns belong to the same clusters. The goal is then to recover the signal matrix $\theta\in R^{n_1\times n_2}$ from the observations (3). To accommodate the most interesting cases, especially the case of undirected networks, we shall also consider the case where the data matrix $X$ is symmetric with zero diagonal. In such cases, we also require $X_{ij} = X_{ji}$ and $E_{ij} = E_{ji}$ for all $i\ne j$.

Main contributions. In this paper, we propose a unified estimation procedure for a partially observed data matrix generated from the model (1)-(3). We establish high-probability upper bounds for the mean squared errors of the resulting estimators. In addition, we show that these upper bounds are minimax rate-optimal in both the continuous case and the binary case by providing matching minimax lower bounds. Furthermore, the SBM can be viewed as a special case of the symmetric version of (1). Thus, an immediate application of our results is the network completion problem for SBMs. With partially observed network edges, our method gives a rate-optimal estimator for the probability matrix of the whole network in both the dense and the sparse regimes, which further leads to rate-optimal graphon estimation in both regimes.
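To make the observation model concrete, the following minimal NumPy sketch (our own illustration; the function and variable names are not from the paper) draws a partially observed matrix from (1)-(3) with a biclustering mean structure and Gaussian, hence sub-Gaussian, noise.

```python
import numpy as np

def simulate_biclustered_data(n1, n2, k1, k2, sigma=1.0, p=0.5, M=1.0, seed=0):
    """Draw from model (1)-(3): X_ij = theta_ij + eps_ij, observed iff E_ij = 1."""
    rng = np.random.default_rng(seed)
    # Biclustering structure: theta_ij = Q[z1[i], z2[j]] with Q in [-M, M]^{k1 x k2}.
    Q = rng.uniform(-M, M, size=(k1, k2))
    z1 = rng.integers(0, k1, size=n1)      # row cluster labels
    z2 = rng.integers(0, k2, size=n2)      # column cluster labels
    theta = Q[np.ix_(z1, z2)]
    X = theta + sigma * rng.standard_normal((n1, n2))   # sub-Gaussian noise
    E = rng.binomial(1, p, size=(n1, n2))                # missing completely at random
    return X * E, E, theta, z1, z2
```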

Connection to the literature. If only a low rank constraint is imposed on the mean matrix $\theta$, then (1)-(3) becomes what is known in the literature as the matrix completion problem (Recht et al., 2010). An impressive list of algorithms and theories has been developed for this problem, including but not limited to Candès and Recht (2009); Keshavan et al. (2009); Candès and Tao (2010); Candès and Plan (2010); Cai et al. (2010); Keshavan et al. (2010); Recht (2011); Koltchinskii et al. (2011). In this paper, we investigate an alternative biclustering structural assumption for the matrix completion problem, which was first proposed by John Hartigan (Hartigan, 1972). Note that a biclustering structure automatically implies low-rankness. However, if one applies a low rank matrix completion algorithm directly in the current setting, the resulting estimator suffers an error bound inferior to the minimax rate-optimal one. Thus, a full exploitation of the biclustering structure is necessary, which is the focus of the current paper.

The results of our paper also imply rate-optimal estimation for sparse graphons. Previous results on graphon estimation include Airoldi et al. (2013); Wolfe and Olhede (2013); Olhede and Wolfe (2014); Borgs et al. (2015); Choi (2015) and the references therein. The minimax rates for dense graphon estimation were derived by Gao et al. (2015a). While this paper was being written, we became aware of an independent result on optimal sparse graphon estimation by Klopp et al. (2015). There is also an interesting line of work on biclustering (Flynn and Perry, 2012; Rohe et al., 2012; Choi and Wolfe, 2014). While those papers aim to recover the clustering structures of rows and columns, the goal of the current paper is to estimate the underlying mean matrix with optimal rates.

Organization. After a brief introduction to notation, the rest of the paper is organized as follows. In Section 2, we introduce the precise formulation of the problem and propose a constrained least squares estimator for the mean matrix $\theta$. In Section 3, we show that the proposed estimator leads to minimax optimal performance for both Gaussian and binary data. Section 4 presents extensions of our results to sparse graphon estimation and adaptation. Implementation and simulation results are given in Section 5. In Section 6, we discuss the key points of the paper and propose some open problems for future research. The proofs of the main results are laid out in Section 7, with some auxiliary results deferred to the appendix.

Notation. For a vector $z\in[k]^n$, define the set $z^{-1}(a) = \{i\in[n]: z(i)=a\}$ for $a\in[k]$. For a set $S$, $|S|$ denotes its cardinality and $1_S$ denotes the indicator function. For a matrix $A = (A_{ij})\in R^{n_1\times n_2}$, the $\ell_2$ norm and the $\ell_\infty$ norm are defined by $\|A\| = \sqrt{\sum_{ij}A_{ij}^2}$ and $\|A\|_\infty = \max_{ij}|A_{ij}|$, respectively. The inner product of two matrices $A$ and $B$ is $\langle A,B\rangle = \sum_{ij}A_{ij}B_{ij}$. Given a subset $\Omega\subset[n_1]\times[n_2]$, we use the notation $\langle A,B\rangle_\Omega = \sum_{(i,j)\in\Omega}A_{ij}B_{ij}$ and $\|A\|_\Omega = \sqrt{\sum_{(i,j)\in\Omega}A_{ij}^2}$. Given two numbers $a,b\in R$, we use $a\vee b = \max(a,b)$ and $a\wedge b = \min(a,b)$. The floor function $\lfloor a\rfloor$ is the largest integer no greater than $a$, and the ceiling function $\lceil a\rceil$ is the smallest integer no less than $a$. For two positive sequences $\{a_n\},\{b_n\}$, $a_n\lesssim b_n$ means $a_n\le Cb_n$ for some constant $C>0$ independent of $n$, and $a_n\asymp b_n$ means $a_n\lesssim b_n$ and $b_n\lesssim a_n$. The symbols $P$ and $E$ denote generic probability and expectation operators whose distribution is determined by the context.

2. Constrained least squares estimation

Recall the generative model defined in (1) and the definition of the set $\Omega$ of observed entries in (2). As mentioned above, throughout the paper we assume that the $\epsilon_{ij}$'s are independent sub-Gaussian noises with sub-Gaussianity parameter uniformly bounded from above by $\sigma>0$. More precisely, we assume

$E e^{\lambda\epsilon_{ij}} \le e^{\lambda^2\sigma^2/2},\quad\text{for all } i\in[n_1],\ j\in[n_2]\ \text{and}\ \lambda\in R.$  (4)

We consider two types of biclustering structures. The first is rectangular and asymmetric, where we assume that the mean matrix belongs to the parameter space

$\Theta_{k_1k_2}(M) = \big\{\theta = (\theta_{ij})\in R^{n_1\times n_2}:\ \theta_{ij} = Q_{z_1(i)z_2(j)},\ z_1\in[k_1]^{n_1},\ z_2\in[k_2]^{n_2},\ Q\in[-M,M]^{k_1\times k_2}\big\}.$  (5)

In other words, the mean values within each bicluster are homogeneous, i.e., $\theta_{ij} = Q_{ab}$ if the $i$th row belongs to the $a$th row cluster and the $j$th column belongs to the $b$th column cluster. The second type of structure is the square and symmetric case. Here we impose a symmetry requirement on the data generating process, i.e., $n_1 = n_2 = n$ and

$X_{ij} = X_{ji},\quad E_{ij} = E_{ji},\quad\text{for all } i\ne j.$  (6)

Since this case is mainly motivated by undirected network data where there is no edge linking any node to itself, we also assume $X_{ii} = 0$ for all $i\in[n]$. Finally, the mean matrix is assumed to belong to the parameter space

$\Theta^s_k(M) = \big\{\theta = (\theta_{ij})\in R^{n\times n}:\ \theta_{ii}=0,\ \theta_{ij}=\theta_{ji}=Q_{z(i)z(j)}\ \text{for}\ i>j,\ z\in[k]^n,\ Q=Q^T\in[-M,M]^{k\times k}\big\}.$  (7)

We proceed by assuming that we know the parameter space $\Theta$, which can be either $\Theta_{k_1k_2}(M)$ or $\Theta^s_k(M)$, and the rate $p$ at which an independent entry is observed. Adaptation to unknown numbers of clusters and to an unknown observation rate is addressed later in Section 4.1 and Section 4.2. Given $\Theta$ and $p$, we propose to estimate $\theta$ by the program

$\min_{\theta\in\Theta}\Big\{\|\theta\|^2 - \frac{2}{p}\langle X,\theta\rangle_\Omega\Big\}.$  (8)

If we define

$Y_{ij} = X_{ij}E_{ij}/p,$  (9)

then (8) is equivalent to the constrained least squares problem

$\min_{\theta\in\Theta}\|Y-\theta\|^2,$  (10)

and hence the name of our estimator. When the data is binary, $\Theta = \Theta^s_k(1)$ and $p = 1$, the problem specializes to estimating the mean adjacency matrix in stochastic block models, and the estimator defined as the solution to (10) reduces to the least squares estimator of Gao et al. (2015a).
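As a small illustration (our own code, not the authors'), the de-biased matrix $Y$ in (9) and the least squares objective in (10) evaluated at a candidate biclustering $(Q, z_1, z_2)$ can be computed as follows; optimizing over $\Theta$ is the subject of Section 5.

```python
import numpy as np

def debiased_matrix(X_obs, E, p):
    """Y_ij = X_ij * E_ij / p as in (9); unobserved entries contribute 0."""
    return X_obs * E / p

def ls_objective(Y, Q, z1, z2):
    """Value of ||Y - theta||^2 in (10) for theta_ij = Q[z1[i], z2[j]]."""
    theta = Q[np.ix_(z1, z2)]
    return np.sum((Y - theta) ** 2)
```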

3. Main results

In this section, we provide theoretical justification of the constrained least squares estimator defined as the solution to (10). Our first result is the following universal high-probability upper bound.

Theorem 1. For any global optimizer $\hat\theta$ of (10) and any constant $C'>0$, there exists a constant $C>0$ only depending on $C'$ such that

$\|\hat\theta-\theta\|^2 \le C\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2 + n_1\log k_1 + n_2\log k_2),$

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$ uniformly over $\theta\in\Theta_{k_1k_2}(M)$ and all error distributions satisfying (4). For the symmetric parameter space $\Theta^s_k(M)$, the bound simplifies to

$\|\hat\theta-\theta\|^2 \le C\,\frac{M^2\vee\sigma^2}{p}\,(k^2 + n\log k),$

with probability at least $1-\exp(-C'(k^2+n\log k))$ uniformly over $\theta\in\Theta^s_k(M)$ and all error distributions satisfying (4).

When $M^2\vee\sigma^2$ is bounded, the rate in Theorem 1 is $(k_1k_2+n_1\log k_1+n_2\log k_2)/p$, which can be decomposed into two parts. The part involving $k_1k_2$ reflects the number of parameters in the biclustering structure, while the part involving $n_1\log k_1+n_2\log k_2$ results from the complexity of estimating the clustering structures of rows and columns. It is the price one needs to pay for not knowing the clustering information. In contrast, the minimax rate for matrix completion under the low rank assumption would be $(n_1\vee n_2)(k_1\wedge k_2)/p$ (Koltchinskii et al., 2011; Ma and Wu, 2015), since without any other constraint the biclustering assumption only implies that the rank of the mean matrix $\theta$ is at most $k_1\wedge k_2$. Therefore, we have $(k_1k_2+n_1\log k_1+n_2\log k_2)/p \ll (n_1\vee n_2)(k_1\wedge k_2)/p$ as long as both $n_1\wedge n_2$ and $k_1\wedge k_2$ tend to infinity. Thus, by fully exploiting the biclustering structure, we obtain a better convergence rate than by using the low rank assumption alone.

In the rest of this section, we discuss the two most representative cases, namely the Gaussian case and the symmetric Bernoulli case. The latter is also known in the literature as the stochastic block model.

The Gaussian case. Specializing Theorem 1 to Gaussian random variables, we obtain the following result.

Corollary 2. Assume $\epsilon_{ij}\sim_{iid}N(0,\sigma^2)$ and $M\le C_1\sigma$ for some constant $C_1>0$. For any constant $C'>0$, there exists some constant $C$ only depending on $C_1$ and $C'$ such that

$\|\hat\theta-\theta\|^2 \le C\,\frac{\sigma^2}{p}\,(k_1k_2 + n_1\log k_1 + n_2\log k_2),$

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$ uniformly over $\theta\in\Theta_{k_1k_2}(M)$. For the symmetric parameter space $\Theta^s_k(M)$, the bound simplifies to

$\|\hat\theta-\theta\|^2 \le C\,\frac{\sigma^2}{p}\,(k^2 + n\log k),$

with probability at least $1-\exp(-C'(k^2+n\log k))$ uniformly over $\theta\in\Theta^s_k(M)$.

We now present a rate-matching lower bound in the Gaussian model to show that the result of Corollary 2 is minimax optimal. To this end, we use $P_{(\theta,\sigma^2,p)}$ to denote the probability distribution of the model in which $X_{ij}\sim_{ind}N(\theta_{ij},\sigma^2)$ and each entry is observed independently with rate $p$.

Theorem 3. Assume $\frac{\sigma^2}{p}\Big(\frac{k_1k_2}{n_1n_2}+\frac{\log k_1}{n_2}+\frac{\log k_2}{n_1}\Big)\le M^2$. There exist some constants $C,c>0$ such that

$\inf_{\hat\theta}\sup_{\theta\in\Theta_{k_1k_2}(M)}P_{(\theta,\sigma^2,p)}\Big(\|\hat\theta-\theta\|^2 > C\,\frac{\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)\Big) > c$

when $\log k_1\asymp\log k_2$, and

$\inf_{\hat\theta}\sup_{\theta\in\Theta^s_k(M)}P_{(\theta,\sigma^2,p)}\Big(\|\hat\theta-\theta\|^2 > C\,\frac{\sigma^2}{p}(k^2+n\log k)\Big) > c.$

The symmetric Bernoulli case. When the observed matrix is symmetric with zero diagonal and Bernoulli random variables as its super-diagonal entries, it can be viewed as the adjacency matrix of an undirected network, and the problem of estimating its mean matrix with missing data can be viewed as a network completion problem. Given a partially observed Bernoulli adjacency matrix $\{X_{ij}\}_{(i,j)\in\Omega}$, one can predict the unobserved edges by estimating the whole mean matrix $\theta$. As before, we assume that each edge is observed independently with probability $p$. Given a symmetric adjacency matrix $X = X^T\in\{0,1\}^{n\times n}$ with zero diagonal, the stochastic block model (Holland et al., 1983) assumes that $\{X_{ij}\}_{i>j}$ are independent Bernoulli random variables with means $\theta_{ij} = Q_{z(i)z(j)}\in[0,1]$ for some matrix $Q\in[0,1]^{k\times k}$ and some label vector $z\in[k]^n$. In other words, the probability that there is an edge between the $i$th and the $j$th nodes depends only on their community labels $z(i)$ and $z(j)$. The following class then includes all possible mean matrices of stochastic block models with $n$ nodes, $k$ clusters, and edge probabilities uniformly bounded by $\rho$:

$\Theta^+_k(\rho) = \big\{\theta\in[0,1]^{n\times n}:\ \theta_{ii}=0,\ \theta_{ij}=\theta_{ji}=Q_{z(i)z(j)},\ Q=Q^T\in[0,\rho]^{k\times k},\ z\in[k]^n\big\}.$  (11)

By the definition in (7), $\Theta^+_k(\rho)\subset\Theta^s_k(\rho)$.

Although the tail probability of Bernoulli random variables does not satisfy the sub-Gaussian assumption (4), a slight modification of the proof of Theorem 1 leads to the following result. The proof of Corollary 4 is given in Section A in the appendix.

Corollary 4. Consider the optimization problem (10) with $\Theta = \Theta^s_k(\rho)$. For any global optimizer $\hat\theta$ and any constant $C'>0$, there exists a constant $C>0$ only depending on $C'$ such that

$\|\hat\theta-\theta\|^2 \le C\,\frac{\rho}{p}\,(k^2 + n\log k),$

with probability at least $1-\exp(-C'(k^2+n\log k))$ uniformly over $\theta\in\Theta^+_k(\rho)$.

When $\rho = p = 1$, Corollary 4 implies Theorem 2.1 in Gao et al. (2015a). A rate-matching lower bound is given by the following theorem. We denote the probability distribution of a stochastic block model with mean matrix $\theta\in\Theta^+_k(\rho)$ and observation rate $p$ by $P_{(\theta,p)}$.

Theorem 5. For stochastic block models, we have

$\inf_{\hat\theta}\sup_{\theta\in\Theta^+_k(\rho)}P_{(\theta,p)}\Big(\|\hat\theta-\theta\|^2 > C\Big(\frac{\rho(k^2+n\log k)}{p}\wedge\rho^2n^2\Big)\Big) > c,$

for some constants $C,c>0$.
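For intuition about the network completion setting of this subsection, here is a hedged NumPy sketch (our own helper, not the authors' code) that generates a symmetric SBM adjacency matrix with edge probabilities bounded by $\rho$ and an independent symmetric observation pattern with rate $p$.

```python
import numpy as np

def simulate_partial_sbm(n, k, rho, p, seed=0):
    """Symmetric SBM with zero diagonal, edge probabilities in [0, rho],
    each edge observed independently with probability p."""
    rng = np.random.default_rng(seed)
    Q = rng.uniform(0, rho, size=(k, k))
    Q = np.triu(Q) + np.triu(Q, 1).T            # symmetric connectivity matrix
    z = rng.integers(0, k, size=n)               # community labels
    theta = Q[np.ix_(z, z)]
    np.fill_diagonal(theta, 0.0)
    upper = np.triu(rng.binomial(1, theta), 1)   # X_ij ~ Bernoulli(theta_ij), i < j
    X = upper + upper.T
    E = np.triu(rng.binomial(1, p, size=(n, n)), 1)
    E = E + E.T                                  # E_ij = E_ji: symmetric observations
    return X, E, theta, z
```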

The lower bound is the minimum of two terms. When $\rho\ge\frac{k^2+n\log k}{pn^2}$, the rate becomes $\frac{\rho(k^2+n\log k)}{p}\wedge\rho^2n^2\asymp\frac{\rho(k^2+n\log k)}{p}$, which is achieved by the constrained least squares estimator according to Corollary 4. When $\rho<\frac{k^2+n\log k}{pn^2}$, the rate is dominated by $\rho^2n^2$; in this case, the trivial zero estimator achieves the minimax rate. In the case $p=1$, a comparable result has been obtained independently by Klopp et al. (2015). However, our result here is more general as it accommodates missing observations. Moreover, the general upper bounds in Theorem 1 even hold for networks with weighted edges.

4. Extensions

In this section, we extend the estimation procedure and the theory of Sections 2 and 3 in three directions: adaptation to an unknown observation rate, adaptation to unknown model parameters, and sparse graphon estimation.

4.1 Adaptation to unknown observation rate

The estimator (10) depends on knowledge of the observation rate $p$. When $p$ is not too small, such knowledge is not necessary for achieving the desired rates. Define

$\hat p = \frac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}E_{ij}}{n_1n_2}$  (12)

for the asymmetric case and

$\hat p = \frac{\sum_{1\le i<j\le n}E_{ij}}{\tfrac{1}{2}n(n-1)}$  (13)

for the symmetric case, and redefine

$Y_{ij} = X_{ij}E_{ij}/\hat p,$  (14)

where the definition of $\hat p$ is chosen between (12) and (13) depending on whether one is dealing with the asymmetric or the symmetric parameter space. Then we have the following result for the solution to (10) with $Y$ redefined by (14).

Theorem 6. For $\Theta = \Theta_{k_1k_2}(M)$, suppose that for some absolute constant $C_1>0$,

$p \ge C_1\,\frac{[\log(n_1+n_2)]^2}{k_1k_2+n_1\log k_1+n_2\log k_2}.$

Let $\hat\theta$ be the solution to (10) with $Y$ defined as in (14). Then for any constant $C'>0$, there exists a constant $C>0$ only depending on $C'$ and $C_1$ such that

$\|\hat\theta-\theta\|^2 \le C\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2),$

with probability at least $1-(n_1n_2)^{-C'}$ uniformly over $\theta\in\Theta$ and all error distributions satisfying (4). For $\Theta = \Theta^s_k(M)$, the same result holds if we replace $n_1$ and $n_2$ with $n$ and $k_1$ and $k_2$ with $k$ in the foregoing statement.
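A minimal sketch (our own, with illustrative names) of the plug-in observation rate in (12)-(13) and the redefined $Y$ in (14); the symmetric branch averages only over off-diagonal pairs.

```python
import numpy as np

def estimate_rate_and_debias(X_obs, E, symmetric=False):
    """Plug-in observation rate as in (12)/(13) and the redefined Y of (14)."""
    if symmetric:
        n = E.shape[0]
        p_hat = np.triu(E, 1).sum() / (n * (n - 1) / 2)   # (13): upper triangle only
    else:
        p_hat = E.mean()                                   # (12): overall fraction observed
    return X_obs * E / p_hat, p_hat
```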

4.2 Adaptation to unknown model parameters

We now provide an adaptive procedure for estimating $\theta$ without assuming knowledge of the model parameters $k_1$, $k_2$ and $M$. The procedure can be regarded as a variation of 2-fold cross validation (Wold, 1978). We give details of the procedure for the asymmetric parameter spaces $\Theta_{k_1k_2}(M)$; the procedure for the symmetric parameter spaces $\Theta^s_k(M)$ can be obtained similarly. To adapt to $k_1$, $k_2$ and $M$, we split the data into two halves. Namely, sample i.i.d. $T_{ij}$ from Bernoulli$(\tfrac12)$ and define $S = \{(i,j)\in[n_1]\times[n_2]: T_{ij}=1\}$. Define $Y_{ij} = 2X_{ij}E_{ij}T_{ij}/p$ and $Y^c_{ij} = 2X_{ij}E_{ij}(1-T_{ij})/p$ for all $(i,j)\in[n_1]\times[n_2]$. Then, for given $(k_1,k_2,M)$, the least squares estimators using $Y$ and $Y^c$ are

$\hat\theta_{k_1k_2M} = \mathrm{argmin}_{\theta\in\Theta_{k_1k_2}(M)}\|Y-\theta\|^2,\qquad \hat\theta^c_{k_1k_2M} = \mathrm{argmin}_{\theta\in\Theta_{k_1k_2}(M)}\|Y^c-\theta\|^2.$

Select the parameters by

$(\hat k_1,\hat k_2,\hat M) = \mathrm{argmin}_{(k_1,k_2,M)\in[n_1]\times[n_2]\times\mathcal M}\ \|\hat\theta_{k_1k_2M}-Y^c\|_{S^c}^2,$

where $\mathcal M = \{h/(n_1+n_2): h\in[(n_1+n_2)^6]\}$, and define $\hat\theta = \hat\theta_{\hat k_1\hat k_2\hat M}$. Similarly, we can define $\hat\theta^c$ by validating the parameters using $Y$. The final estimator is given by

$\hat\theta_{ij} = \begin{cases}\hat\theta^c_{ij}, & (i,j)\in S;\\ \hat\theta_{ij}, & (i,j)\in S^c.\end{cases}$

Theorem 7. Assume $(n_1+n_2)^{-1}\le M\le(n_1+n_2)^5$ and $p\ge(n_1+n_2)^{-1}$. For any constant $C'>0$, there exists a constant $C>0$ only depending on $C'$ such that

$\|\hat\theta-\theta\|^2 \le C\,\frac{M^2\vee\sigma^2}{p}\,\big(k_1k_2+n_1\log k_1+n_2\log k_2+\log(n_1+n_2)\big),$

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))-(n_1n_2)^{-C'}$ uniformly over $\theta\in\Theta_{k_1k_2}(M)$ and all error distributions satisfying (4).

Compared with Theorem 1, the rate given by Theorem 7 has an extra $p^{-1}\log(n_1+n_2)$ term. A sufficient condition for this extra term to be inconsequential is $\log(n_1+n_2)\lesssim n_1\wedge n_2$. Theorem 7 is adaptive for all $(k_1,k_2)\in[n_1]\times[n_2]$ and for $(n_1+n_2)^{-1}\le M\le(n_1+n_2)^5$ and $p\ge(n_1+n_2)^{-1}$. In fact, by choosing a larger grid $\mathcal M$, we can extend the adaptive region for $M$ to $(n_1+n_2)^{-a}\le M\le(n_1+n_2)^b$ for arbitrary constants $a,b>0$.
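The selection step of Section 4.2 can be sketched as follows (our own code; `solver` stands for any constrained least squares routine such as Algorithm 1 in Section 5, and the grids are illustrative, not the paper's).

```python
import itertools
import numpy as np

def adaptive_select(X_obs, E, p, solver, k_grid, M_grid, seed=0):
    """2-fold scheme: split observed entries with T_ij ~ Bernoulli(1/2),
    fit on one half, validate on the other.
    solver(Y, k1, k2, M) is assumed to return a constrained LS fit."""
    rng = np.random.default_rng(seed)
    n1, n2 = X_obs.shape
    T = rng.binomial(1, 0.5, size=(n1, n2))
    Y = 2 * X_obs * E * T / p           # fitting half
    Yc = 2 * X_obs * E * (1 - T) / p    # validation half
    best, best_err = None, np.inf
    for k1, k2, M in itertools.product(k_grid, k_grid, M_grid):
        theta_hat = solver(Y, k1, k2, M)
        err = np.sum(((theta_hat - Yc) * (1 - T)) ** 2)   # validation loss on S^c
        if err < best_err:
            best_err, best = err, (k1, k2, M)
    return best
```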

4.3 Sparse graphon estimation

Consider a random graph with adjacency matrix $\{X_{ij}\}\in\{0,1\}^{n\times n}$, whose sampling procedure is determined by

$(\xi_1,\dots,\xi_n)\sim P_\xi,\qquad X_{ij}\mid(\xi_i,\xi_j)\sim\mathrm{Bernoulli}(\theta_{ij}),\qquad\text{where } \theta_{ij} = f(\xi_i,\xi_j).$  (15)

For $i\in[n]$, $X_{ii} = \theta_{ii} = 0$. Conditioning on $(\xi_1,\dots,\xi_n)$, the variables $X_{ij} = X_{ji}$ are independent across $i>j$. The function $f$ on $[0,1]^2$, which is assumed to be symmetric, is called a graphon. The concept of a graphon originates from graph limit theory (Hoover, 1979; Lovász and Szegedy, 2006; Diaconis and Janson, 2007; Lovász, 2012) and the study of exchangeable arrays (Aldous, 1981; Kallenberg, 1989). It is the underlying nonparametric object that generates the random graph. Statistical estimation of graphons has been considered by Wolfe and Olhede (2013); Olhede and Wolfe (2014); Gao et al. (2015a,b); Lu and Zhou (2015) for dense networks. Using Corollary 4, we present a result for sparse graphon estimation.

Let us start by specifying the function class of graphons. Define the derivative operator by

$\nabla_{jk}f(x,y) = \frac{\partial^{j+k}}{(\partial x)^j(\partial y)^k}f(x,y),$

and adopt the convention $\nabla_{00}f(x,y) = f(x,y)$. The Hölder norm is defined as

$\|f\|_{\mathcal H_\alpha} = \max_{j+k\le\lfloor\alpha\rfloor}\ \sup_{(x,y)\in\mathcal D}|\nabla_{jk}f(x,y)| + \max_{j+k=\lfloor\alpha\rfloor}\ \sup_{(x,y)\ne(x',y')\in\mathcal D}\frac{|\nabla_{jk}f(x,y)-\nabla_{jk}f(x',y')|}{\|(x-x',y-y')\|^{\alpha-\lfloor\alpha\rfloor}},$

where $\mathcal D = \{(x,y)\in[0,1]^2: x\ge y\}$. Then the sparse graphon class with Hölder smoothness $\alpha$ is defined by

$\mathcal F_\alpha(\rho,L) = \{0\le f\le\rho:\ \|f\|_{\mathcal H_\alpha}\le L\rho,\ f(x,y)=f(y,x)\ \text{for all}\ (x,y)\in\mathcal D\},$

where $L>0$ is the radius of the class, which is assumed to be a constant. As argued in Gao et al. (2015a), it is sufficient to approximate a graphon with Hölder smoothness by a piecewise constant function. In the random graph setting, a piecewise constant function corresponds to a stochastic block model. Therefore, we can use the estimator defined by (10). Using Corollary 4, a direct bias-variance tradeoff argument leads to the following result. The same result was found independently by Klopp et al. (2015).

Corollary 8. Consider the optimization problem (10) with $Y_{ij} = X_{ij}$ and $\Theta = \Theta^s_k(M)$, where $k = \lceil n^{\frac{1}{(\alpha\wedge1)+1}}\rceil$ and $M = \rho$. Given any global optimizer $\hat\theta$ of (10), we estimate $f$ by $\hat f(\xi_i,\xi_j) = \hat\theta_{ij}$. Then, for any constant $C'>0$, there exists a constant $C>0$ only depending on $C'$ and $L$ such that

$\frac{1}{n^2}\sum_{i,j\in[n]}\big(\hat f(\xi_i,\xi_j)-f(\xi_i,\xi_j)\big)^2 \le C\rho\Big(n^{-\frac{2\alpha}{\alpha+1}}+\frac{\log n}{n}\Big),$

with probability at least $1-\exp(-C'(n^{\frac{1}{\alpha+1}}+n\log n))$ uniformly over $f\in\mathcal F_\alpha(\rho,L)$ and $P_\xi$.

Corollary 8 implies an interesting phase transition phenomenon. When $\alpha\in(0,1)$, the rate becomes $\rho(n^{-\frac{2\alpha}{\alpha+1}}+\frac{\log n}{n})\asymp\rho n^{-\frac{2\alpha}{\alpha+1}}$, which is the typical nonparametric rate times a sparsity index of the network. When $\alpha\ge1$, the rate becomes $\rho(n^{-\frac{2\alpha}{\alpha+1}}+\frac{\log n}{n})\asymp\rho\frac{\log n}{n}$, which does not depend on the smoothness $\alpha$. Corollary 8 extends Theorem 2.3 of Gao et al. (2015a) to the case $\rho<1$.
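As a small worked example of the choice of $k$ in Corollary 8 (our own sketch): one uses $k = \lceil n^{1/((\alpha\wedge1)+1)}\rceil$ blocks and then fits the constrained least squares SBM estimator with $M = \rho$, reading off $\hat f(\xi_i,\xi_j) = \hat\theta_{ij}$.

```python
import math

def graphon_block_count(n, alpha):
    """k = ceil(n^{1/((alpha ^ 1) + 1)}) blocks, as used in Corollary 8."""
    return math.ceil(n ** (1.0 / (min(alpha, 1.0) + 1.0)))

# For example, n = 2000 and alpha = 0.5 give k = ceil(2000^(2/3)) = 159 blocks.
```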

In Wolfe and Olhede (2013), the graphon $f$ is defined in a different way. Namely, they considered the setting where $(\xi_1,\dots,\xi_n)$ are i.i.d. Unif$[0,1]$ random variables under $P_\xi$, and the adjacency matrix is generated with Bernoulli random variables having means $\theta_{ij} = \rho f(\xi_i,\xi_j)$ for a nonparametric graphon $f$ satisfying $\int\!\!\int f(x,y)\,dx\,dy = 1$. For this setting, with an appropriate smoothness assumption, we can estimate $f$ by $\hat f(\xi_i,\xi_j) = \hat\theta_{ij}/\rho$. The rate of convergence would be $\rho^{-1}(n^{-\frac{2\alpha}{\alpha+1}}+\frac{\log n}{n})$.

Using the result of Theorem 7, we present an adaptive version of Corollary 8. The estimator we consider is a symmetric version of the one introduced in Section 4.2. The only difference is that we choose the grid $\mathcal M$ as $\{m/n: m\in[n+1]\}$. The estimator is fully data driven in the sense that it does not depend on $\alpha$ or $\rho$.

Corollary 9. Assume $\rho\ge n^{-1}$. Consider the adaptive estimator $\hat\theta$ introduced in Theorem 7, and set $\hat f(\xi_i,\xi_j) = \hat\theta_{ij}$. Then, for any constant $C'>0$, there exists a constant $C>0$ only depending on $C'$ and $L$ such that

$\frac{1}{n^2}\sum_{i,j\in[n]}\big(\hat f(\xi_i,\xi_j)-f(\xi_i,\xi_j)\big)^2 \le C\rho\Big(n^{-\frac{2\alpha}{\alpha+1}}+\frac{\log n}{n}\Big),$

with probability at least $1-n^{-C'}$ uniformly over $f\in\mathcal F_\alpha(\rho,L)$ and $P_\xi$.

5. Numerical Studies

To introduce an algorithm solving (8) or (10), we write (10) in an alternative way,

$\min_{z_1\in[k_1]^{n_1},\,z_2\in[k_2]^{n_2},\,Q\in[l,u]^{k_1\times k_2}} L(Q,z_1,z_2),$

where $l$ and $u$ are the lower and upper constraints on the parameters and

$L(Q,z_1,z_2) = \sum_{(i,j)\in[n_1]\times[n_2]}\big(Y_{ij}-Q_{z_1(i)z_2(j)}\big)^2.$

For biclustering, we set $l = -M$ and $u = M$. For SBMs, we set $l = 0$ and $u = \rho$. We do not impose symmetry for SBMs, gaining computational convenience without losing much statistical accuracy. The simple form of $L(Q,z_1,z_2)$ indicates that we can iteratively optimize over $(Q,z_1,z_2)$ with explicit formulas. This motivates the following algorithm. The iteration steps (16), (17) and (18) of Algorithm 1 can be equivalently expressed as

$Q = \mathrm{argmin}_{Q\in[l,u]^{k_1\times k_2}}L(Q,z_1,z_2);\quad z_1 = \mathrm{argmin}_{z_1\in[k_1]^{n_1}}L(Q,z_1,z_2);\quad z_2 = \mathrm{argmin}_{z_2\in[k_2]^{n_2}}L(Q,z_1,z_2).$

Thus, each iteration reduces the value of the objective function. It is worth noting that Algorithm 1 can be viewed as a two-way extension of the ordinary k-means algorithm. Since the objective function is non-convex, one cannot guarantee convergence to the global optimum. We initialize the algorithm via a spectral clustering step using multiple random starting points, which worked well on the simulated datasets presented below. We now present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data.

Algorithm 1: A Biclustering Algorithm

Input: $\{X_{ij}\}_{(i,j)\in\Omega}$, the numbers of row and column clusters $(k_1,k_2)$, lower and upper constraints $(l,u)$, and the number of random starting points $m$.
Output: An $n_1\times n_2$ matrix $\hat\theta$ with $\hat\theta_{ij} = Q_{z_1(i)z_2(j)}$.
Preprocessing: Let $X_{ij} = 0$ for $(i,j)\notin\Omega$, $\hat p = |\Omega|/(n_1n_2)$ and $Y = X/\hat p$.

1. Initialization step: Apply the singular value decomposition to obtain $Y = UDV^T$. Run the k-means algorithm on the first $k_1$ columns of $U$ with $m$ random starting points to get $z_1$. Run the k-means algorithm on the first $k_2$ columns of $V$ with $m$ random starting points to get $z_2$.

While not converged:

2. Update $Q$: for each $(a,b)\in[k_1]\times[k_2]$,
$Q_{ab} = \frac{1}{|z_1^{-1}(a)||z_2^{-1}(b)|}\sum_{i\in z_1^{-1}(a)}\sum_{j\in z_2^{-1}(b)}Y_{ij}.$  (16)
If $Q_{ab}>u$, let $Q_{ab}=u$. If $Q_{ab}<l$, let $Q_{ab}=l$.

3. Update $z_1$: for each $i\in[n_1]$,
$z_1(i) = \mathrm{argmin}_{a\in[k_1]}\sum_{j=1}^{n_2}\big(Q_{az_2(j)}-Y_{ij}\big)^2.$  (17)

4. Update $z_2$: for each $j\in[n_2]$,
$z_2(j) = \mathrm{argmin}_{b\in[k_2]}\sum_{i=1}^{n_1}\big(Q_{z_1(i)b}-Y_{ij}\big)^2.$  (18)
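The updates (16)-(18) translate directly into code. Below is a minimal NumPy/SciPy sketch of Algorithm 1 (the names are ours; it uses a single spectral initialization and a fixed number of sweeps instead of the $m$ random restarts and convergence test of the boxed algorithm).

```python
import numpy as np
from scipy.cluster.vq import kmeans2   # k-means for the spectral initialization

def bicluster_ls(X_obs, E, k1, k2, l, u, n_iter=50, seed=0):
    """Sketch of Algorithm 1: constrained least squares biclustering on Y = X / p_hat."""
    rng = np.random.default_rng(seed)
    n1, n2 = X_obs.shape
    p_hat = E.mean()
    Y = np.where(E == 1, X_obs, 0.0) / p_hat

    # Initialization: SVD of Y, then k-means on the leading singular vectors.
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    _, z1 = kmeans2(U[:, :k1], k1, minit='++', seed=rng)
    _, z2 = kmeans2(Vt[:k2, :].T, k2, minit='++', seed=rng)

    Q = np.zeros((k1, k2))
    for _ in range(n_iter):
        # Step (16): block averages of Y, clipped to [l, u].
        for a in range(k1):
            for b in range(k2):
                block = Y[np.ix_(z1 == a, z2 == b)]
                Q[a, b] = np.clip(block.mean(), l, u) if block.size else 0.0
        # Step (17): assign each row to its best row cluster.
        row_cost = ((Y[:, None, :] - Q[None, :, z2]) ** 2).sum(axis=2)      # n1 x k1
        z1 = row_cost.argmin(axis=1)
        # Step (18): assign each column to its best column cluster.
        col_cost = ((Y.T[:, None, :] - Q.T[None, :, z1]) ** 2).sum(axis=2)  # n2 x k2
        z2 = col_cost.argmin(axis=1)
    return Q[np.ix_(z1, z2)], Q, z1, z2
```

Each sweep can only decrease the objective $L(Q,z_1,z_2)$, mirroring the argument in the text; in practice one would add the $m$ random restarts and a convergence test on the objective.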

Bernoulli case. Our theoretical result indicates that the rate of recovery is

$\sqrt{\frac{\rho}{p}\Big(\frac{k^2}{n^2}+\frac{\log k}{n}\Big)}$

for the root mean squared error (RMSE) $\frac{1}{n}\|\hat\theta-\theta\|$. When $k$ is not too large, the dominating term is $\sqrt{\frac{\rho\log k}{pn}}$. We confirm this rate by simulation. We first generate data from SBMs with the number of blocks $k\in\{2,4,8,16\}$ and observation rate $p = 0.5$. For every fixed $k$, we use four different matrices $Q = 0.5\cdot 1_k1_k^T + 0.1t\,I_k$ with $t = 1,2,3,4$ and generate the community labels $z$ uniformly on $[k]$. Then we calculate the error $\frac{1}{n}\|\hat\theta-\theta\|$. Panel (a) of Figure 1 shows the error versus the sample size $n$. In Panel (b), we rescale the x-axis to $N = n/\log k$. The curves for different $k$ align well with each other and the error decreases at the rate $1/\sqrt{N}$. This confirms our theoretical results in Theorem 1.

[Figure 1: Plots of $\frac{1}{n}\|\hat\theta-\theta\|$ when using our algorithm on SBMs. Each curve corresponds to a fixed $k$. (a) Error against the raw sample size $n$. (b) The same error against the rescaled sample size $N = n/\log k$.]

Gaussian case. We simulate data with Gaussian noise under four different settings of $k_1$ and $k_2$. For each $(k_1,k_2)\in\{(4,4),(4,8),(8,8),(8,12)\}$, the entries of the matrix $Q$ are independently and uniformly generated from $\{1,2,3,4,5\}$. The cluster labels $z_1$ and $z_2$ are uniform on $[k_1]$ and $[k_2]$, respectively. After generating $Q$, $z_1$ and $z_2$, we add $N(0,1)$ noise to the data and observe each $X_{ij}$ with probability $p = 0.1$. For each number of rows $n_1$, we set the number of columns to $n_2 = n_1\log k_1/\log k_2$. Panel (a) of Figure 2 shows the error versus $n_1$. In Panel (b), we rescale the x-axis to $N = n_1/\log k_2$. Again, the plots for different $(k_1,k_2)$ align fairly well and the error decreases roughly at the rate $1/\sqrt{N}$.

[Figure 2: Plots of the error $\frac{1}{\sqrt{n_1n_2}}\|\hat\theta-\theta\|$ when using our algorithm on biclustering data with Gaussian noise. Each curve corresponds to a fixed $(k_1,k_2)$. (a) Error against $n_1$. (b) The same error against the rescaled sample size $N = n_1/\log k_2$.]

Sparse Bernoulli case. We also study recovery of sparse SBMs. We run the same simulation as in the Bernoulli case except that the entries of $Q$ are scaled down so that the network is sparse, again with four settings indexed by $t = 1,2,3,4$. The results are shown in Figure 3.

[Figure 3: Plots of the error $\frac{1}{n}\|\hat\theta-\theta\|$ when using our algorithm on sparse SBMs. Each curve corresponds to a fixed $k$. (a) Error against the raw sample size $n$. (b) The same error against the rescaled sample size $N = n/(\rho\log k)$.]

Adaptation to unknown parameters. We use the 2-fold cross validation procedure proposed in Section 4.2 to adaptively choose the unknown number of clusters $k$ and the sparsity level $\rho$. We use the setting of sparse SBMs with the number of blocks $k\in\{4,6\}$ and $Q$ of the same form as above, with diagonal increments $0.1t$ for $t = 1,2,3,4$. When running our algorithms, we search over all $(k,\rho)$ pairs with $k\in\{2,3,\dots,8\}$ and $\rho\in\{0.2,0.3,0.4,0.5\}$. In Table 1, we report the errors for different configurations of $Q$. The first row is the error obtained by our adaptive procedure and the second row is the error using the true $k$ and $\rho$. Consistent with Theorem 7, the error from the adaptive procedure is almost the same as the oracle error.

[Table 1: Errors of the adaptive procedure versus the oracle: adaptive MSE and oracle MSE reported against the rescaled sample size, for $k = 4$ and $k = 6$.]

The effect of constraints. The optimization (10) and Algorithm 1 involve the constraint $Q\in[l,u]^{k_1\times k_2}$. It is natural to ask whether this constraint really helps reduce the error or is merely an artifact of the proof.

We investigate the effect of this constraint on simulated data by comparing Algorithm 1 with its variant without the constraint, for both the Gaussian case and the sparse Bernoulli case. Panel (a) of Figure 4 shows the plots for sparse SBMs with 8 communities. Panel (b) shows the Gaussian case with $(k_1,k_2) = (4,8)$. For both panels, the effect of the constraint is significant when the rescaled sample size is small, while as the rescaled sample size increases the performance of the two estimators becomes similar.

[Figure 4: Constrained versus unconstrained least squares. (a) Sparse SBM with $k = 8$. (b) Gaussian biclustering with $(k_1,k_2) = (4,8)$.]

6. Discussion

This paper studies the optimal rates of recovering a matrix with a biclustering structure. While recent progress in high-dimensional estimation has mainly focused on sparse and low rank structures, the biclustering structure has received much less attention. This paper fills in the gap. In what follows, we discuss some key points of the paper and some possible future directions of research.

Difference from low-rankness. A biclustering structure is implicitly low-rank. We show that by exploiting the stronger biclustering assumption, one can achieve better rates of convergence in estimation and completion. The minimax rates derived in this paper precisely characterize how much one can gain by taking advantage of this structure.

Relation to other structures. A natural question is whether there is a similarity between the biclustering structure and the well-studied sparsity structure. The paper Gao et al. (2015b) gives a general theory of structured estimation in linear models that puts both sparse and biclustering structures in a unified theoretical framework. According to this general theory, the $k_1k_2$ part in the minimax rate is the complexity of parameter estimation and the $n_1\log k_1 + n_2\log k_2$ part is the complexity of structure estimation.

Open problems. The optimization problem (10) is not convex, which causes difficulty in devising a provably optimal polynomial-time algorithm. An open question is whether there is a convex relaxation of (10) that can be solved efficiently without losing much statistical accuracy. Another interesting problem for future research is whether the objective function in (10) can be extended beyond the least squares framework.

7. Proofs

7.1 Proof of Theorem 1

Below, we focus on the proof for the asymmetric parameter space $\Theta_{k_1k_2}(M)$. The result for the symmetric parameter space $\Theta^s_k(M)$ can be obtained by letting $k_1 = k_2$ and by taking care of the diagonal entries. Since $\hat\theta\in\Theta_{k_1k_2}(M)$, there exist $\hat z_1\in[k_1]^{n_1}$, $\hat z_2\in[k_2]^{n_2}$ and $\hat Q\in[-M,M]^{k_1\times k_2}$ such that $\hat\theta_{ij} = \hat Q_{\hat z_1(i)\hat z_2(j)}$. For this $(\hat z_1,\hat z_2)$, we define a matrix $\bar\theta$ by

$\bar\theta_{ij} = \frac{1}{|\hat z_1^{-1}(a)||\hat z_2^{-1}(b)|}\sum_{(i',j')\in\hat z_1^{-1}(a)\times\hat z_2^{-1}(b)}\theta_{i'j'},$

for any $(i,j)\in\hat z_1^{-1}(a)\times\hat z_2^{-1}(b)$ and any $(a,b)\in[k_1]\times[k_2]$. To facilitate the proof, we need the following three lemmas, whose proofs are given in the supplementary material.

Lemma 10. For any constant $C'>0$, there exists a constant $C_1>0$ only depending on $C'$ such that

$\|\hat\theta-\bar\theta\|^2 \le C_1\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2),$

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$.

Lemma 11. For any constant $C'>0$, there exists a constant $C_2>0$ only depending on $C'$ such that the inequality $\|\bar\theta-\theta\|^2\ge C_2(M^2\vee\sigma^2)(n_1\log k_1+n_2\log k_2)/p$ implies

$\Big\langle\frac{\bar\theta-\theta}{\|\bar\theta-\theta\|},\,Y-\theta\Big\rangle \le \sqrt{C_2\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2)},$

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$.

Lemma 12. For any constant $C'>0$, there exists a constant $C_3>0$ only depending on $C'$ such that

$\langle\hat\theta-\bar\theta,\,Y-\theta\rangle \le C_3\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2),$

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$.

Proof of Theorem 1. Applying a union bound, the results of Lemmas 10-12 hold simultaneously with probability at least $1-3\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$. We consider the following two cases.

Case 1: $\|\bar\theta-\theta\|^2 \le C_2\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)$. Then we have

$\|\hat\theta-\theta\|^2 \le 2\|\hat\theta-\bar\theta\|^2 + 2\|\bar\theta-\theta\|^2 \le 2(C_1+C_2)\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2)$

by Lemma 10.

Case 2: $\|\bar\theta-\theta\|^2 > C_2\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)$. By the definition of the estimator, we have $\|\hat\theta-Y\|^2 \le \|\theta-Y\|^2$. After rearrangement, we have

$\|\hat\theta-\theta\|^2 \le 2\langle\hat\theta-\theta,\,Y-\theta\rangle = 2\langle\hat\theta-\bar\theta,\,Y-\theta\rangle + 2\langle\bar\theta-\theta,\,Y-\theta\rangle$
$= 2\langle\hat\theta-\bar\theta,\,Y-\theta\rangle + 2\|\bar\theta-\theta\|\Big\langle\frac{\bar\theta-\theta}{\|\bar\theta-\theta\|},\,Y-\theta\Big\rangle$
$\le 2\langle\hat\theta-\bar\theta,\,Y-\theta\rangle + 2\big(\|\bar\theta-\hat\theta\|+\|\hat\theta-\theta\|\big)\Big\langle\frac{\bar\theta-\theta}{\|\bar\theta-\theta\|},\,Y-\theta\Big\rangle$
$\le 2\big(C_3+\sqrt{C_1C_2}\big)\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2) + 2\sqrt{C_2\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)}\,\|\hat\theta-\theta\|,$

which leads to the bound

$\|\hat\theta-\theta\|^2 \le 4\big(C_2+C_3+\sqrt{C_1C_2}\big)\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2).$

Combining the two cases, we have

$\|\hat\theta-\theta\|^2 \le C\,\frac{M^2\vee\sigma^2}{p}\,(k_1k_2+n_1\log k_1+n_2\log k_2),$

with probability at least $1-3\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$, for $C = 4(C_2+C_3+\sqrt{C_1C_2})\vee 2(C_1+C_2)$.

7.2 Proof of Theorem 7

We first present a lemma on the tail behavior of sums of independent products of sub-Gaussian and Bernoulli random variables. Its proof is given in the supplementary material.

Lemma 13. Let $\{X_i\}$ be independent sub-Gaussian random variables with means $\theta_i\in[-M,M]$ and $Ee^{\lambda(X_i-\theta_i)}\le e^{\lambda^2\sigma^2/2}$. Let $\{E_i\}$ be independent Bernoulli random variables with mean $p$. Assume $\{X_i\}$ and $\{E_i\}$ are all independent. Then for $|\lambda|\le p/(M\vee\sigma)$ and $Y_i = X_iE_i/p$, we have

$Ee^{\lambda(Y_i-\theta_i)} \le 2e^{(M^2+2\sigma^2)\lambda^2/p}.$

Moreover, for $\sum_{i=1}^n c_i^2 = 1$,

$P\Big\{\Big|\sum_{i=1}^n c_i(Y_i-\theta_i)\Big|\ge t\Big\} \le 4\exp\Big\{-\min\Big(\frac{t^2p}{4(M^2+2\sigma^2)},\,\frac{tp}{2(M\vee\sigma)\|c\|_\infty}\Big)\Big\}$  (19)

for any $t>0$.

Proof of Theorem 7. Consider a mean matrix $\theta$ that belongs to the space $\Theta_{k_1k_2}(M)$. By the definition of $(\hat k_1,\hat k_2,\hat M)$, we have

$\|\hat\theta_{\hat k_1\hat k_2\hat M}-Y^c\|_{S^c}^2 \le \|\hat\theta_{k_1k_2m}-Y^c\|_{S^c}^2,$

where $k_1$ and $k_2$ are the true numbers of row and column clusters and $m$ is chosen to be the smallest element of $\mathcal M$ that is no smaller than $M$. After rearrangement, we have

$\|\hat\theta_{\hat k_1\hat k_2\hat M}-\theta\|_{S^c}^2 \le \|\hat\theta_{k_1k_2m}-\theta\|_{S^c}^2 + 2\langle\hat\theta_{\hat k_1\hat k_2\hat M}-\hat\theta_{k_1k_2m},\,Y^c-\theta\rangle_{S^c}$
$\le \|\hat\theta_{k_1k_2m}-\theta\|_{S^c}^2 + 2\|\hat\theta_{\hat k_1\hat k_2\hat M}-\hat\theta_{k_1k_2m}\|_{S^c}\,\max_{(l_1,l_2,h)\in[n_1]\times[n_2]\times\mathcal M}\Big\langle\frac{\hat\theta_{l_1l_2h}-\hat\theta_{k_1k_2m}}{\|\hat\theta_{l_1l_2h}-\hat\theta_{k_1k_2m}\|_{S^c}},\,Y^c-\theta\Big\rangle_{S^c}.$

By Lemma 13 and the independence structure, we have

$\max_{(l_1,l_2,h)\in[n_1]\times[n_2]\times\mathcal M}\Big\langle\frac{\hat\theta_{l_1l_2h}-\hat\theta_{k_1k_2m}}{\|\hat\theta_{l_1l_2h}-\hat\theta_{k_1k_2m}\|_{S^c}},\,Y^c-\theta\Big\rangle_{S^c} \le C(M\vee\sigma)\sqrt{\frac{\log(n_1+n_2)}{p}},$

with probability at least $1-(n_1n_2)^{-C'}$. Using the triangle inequality and the Cauchy-Schwarz inequality, we have

$\|\hat\theta_{\hat k_1\hat k_2\hat M}-\theta\|_{S^c}^2 \le \frac{3}{2}\|\hat\theta_{k_1k_2m}-\theta\|_{S^c}^2 + \frac{1}{2}\|\hat\theta_{\hat k_1\hat k_2\hat M}-\theta\|_{S^c}^2 + 4C^2(M^2\vee\sigma^2)\frac{\log(n_1+n_2)}{p}.$

Rearranging the above inequality, we have

$\|\hat\theta_{\hat k_1\hat k_2\hat M}-\theta\|_{S^c}^2 \le 3\|\hat\theta_{k_1k_2m}-\theta\|_{S^c}^2 + 8C^2(M^2\vee\sigma^2)\frac{\log(n_1+n_2)}{p}.$

A symmetric argument leads to

$\|\hat\theta^c_{\hat k_1\hat k_2\hat M}-\theta\|_{S}^2 \le 3\|\hat\theta^c_{k_1k_2m}-\theta\|_{S}^2 + 8C^2(M^2\vee\sigma^2)\frac{\log(n_1+n_2)}{p}.$

Summing up the above two inequalities, we have

$\|\hat\theta-\theta\|^2 \le 3\big(\|\hat\theta_{k_1k_2m}-\theta\|^2 + \|\hat\theta^c_{k_1k_2m}-\theta\|^2\big) + 16C^2(M^2\vee\sigma^2)\frac{\log(n_1+n_2)}{p}.$  (20)

Using Theorem 1, both $\|\hat\theta_{k_1k_2m}-\theta\|^2$ and $\|\hat\theta^c_{k_1k_2m}-\theta\|^2$ can be bounded by $C\frac{m^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)$. Given that $m = M\big(1+\frac{m-M}{M}\big)\le 2M$ by the choice of $m$, the proof is complete.

7.3 Proof of Theorem 6

Recall the augmented data $Y_{ij} = X_{ij}E_{ij}/p$ and define $\tilde Y_{ij} = X_{ij}E_{ij}/\hat p$. We give two lemmas to facilitate the proof.

Lemma 14. Assume $\log(n_1+n_2)\le pn_1n_2$. For any $C'>0$, there is some constant $C>0$ such that

$\|Y-\tilde Y\|^2 \le C\big[M^2+\sigma^2\log(n_1+n_2)\big]\frac{\log(n_1+n_2)}{p^2},$

with probability at least $1-(n_1n_2)^{-C'}$.

Lemma 15. With $Y$ replaced by $\tilde Y$ in (10), the inequalities in Lemmas 10, 11 and 12 continue to hold with the bounds

$C_1\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2) + 2\|Y-\tilde Y\|^2,$

$\sqrt{C_2\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)} + \|Y-\tilde Y\|,$ and

$C_3\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2) + \|\hat\theta-\bar\theta\|\,\|Y-\tilde Y\|,$

respectively.

Proof of Theorem 6. The proof is similar to that of Theorem 1. We only need to replace Lemmas 10, 11 and 12 by Lemma 14 and Lemma 15 to get the desired result.

7.4 Proofs of Theorem 3 and Theorem 5

This section gives the proofs of the minimax lower bounds. We first introduce some notation. For any probability measures $P$ and $Q$, define the Kullback-Leibler divergence by $D(P\|Q) = \int\log\big(\frac{dP}{dQ}\big)dP$ and the chi-squared divergence by $\chi^2(P\|Q) = \int\big(\frac{dP}{dQ}\big)dP - 1$. The main tool we use is the following proposition.

Proposition 16. Let $(\Xi,\ell)$ be a metric space and $\{P_\xi:\xi\in\Xi\}$ be a collection of probability measures. For any totally bounded $T\subset\Xi$, define the Kullback-Leibler diameter and the chi-squared diameter of $T$ by

$d_{KL}(T) = \sup_{\xi,\xi'\in T}D(P_\xi\|P_{\xi'}),\qquad d_{\chi^2}(T) = \sup_{\xi,\xi'\in T}\chi^2(P_\xi\|P_{\xi'}).$

Then

$\inf_{\hat\xi}\sup_{\xi\in\Xi}P_\xi\Big\{\ell^2(\hat\xi(X),\xi)\ge\frac{\epsilon^2}{4}\Big\} \ge 1-\frac{d_{KL}(T)+\log 2}{\log M(\epsilon,T,\ell)},$  (21)

$\inf_{\hat\xi}\sup_{\xi\in\Xi}P_\xi\Big\{\ell^2(\hat\xi(X),\xi)\ge\frac{\epsilon^2}{4}\Big\} \ge 1-\frac{1}{M(\epsilon,T,\ell)}-\sqrt{\frac{d_{\chi^2}(T)}{M(\epsilon,T,\ell)}},$  (22)

for any $\epsilon>0$, where the packing number $M(\epsilon,T,\ell)$ is the largest number of points in $T$ that are at least $\epsilon$ away from each other.

The inequality (21) is the classical Fano inequality; the version we present here is from Yu (1997). The inequality (22) is a generalization of the classical Fano inequality that uses the chi-squared divergence instead of the KL divergence; it is due to Guntuboyina (2011). The following proposition bounds the KL divergence and the chi-squared divergence for both the Gaussian and Bernoulli models.

Proposition 17. For the Gaussian model, we have

$D\big(P_{(\theta,\sigma^2,p)}\|P_{(\theta',\sigma^2,p)}\big) \le \frac{p}{2\sigma^2}\|\theta-\theta'\|^2,\qquad \chi^2\big(P_{(\theta,\sigma^2,p)}\|P_{(\theta',\sigma^2,p)}\big) \le \exp\Big(\frac{p}{\sigma^2}\|\theta-\theta'\|^2\Big)-1.$

For the Bernoulli model with any $\theta,\theta'\in[\rho/2,3\rho/4]^{n_1\times n_2}$, we have

$D\big(P_{(\theta,p)}\|P_{(\theta',p)}\big) \le \frac{8p}{\rho}\|\theta-\theta'\|^2,\qquad \chi^2\big(P_{(\theta,p)}\|P_{(\theta',p)}\big) \le \exp\Big(\frac{8p}{\rho}\|\theta-\theta'\|^2\Big)-1.$

Finally, we need the following Varshamov-Gilbert bound. The version we present here is due to Massart (2007, Lemma 4.7).

Lemma 18. There exists a subset $\{\omega_1,\dots,\omega_N\}\subset\{0,1\}^d$ such that

$H(\omega_i,\omega_j) \triangleq \|\omega_i-\omega_j\|^2 \ge \frac{d}{4},\quad\text{for any } i\ne j\in[N],$  (23)

for some $N\ge\exp(d/8)$.

Proof of Theorem 3. We focus on the proof for the asymmetric parameter space $\Theta_{k_1k_2}(M)$. The result for the symmetric parameter space $\Theta^s_k(M)$ can be obtained by letting $k_1=k_2$ and by taking care of the diagonal entries. We assume $n_1/k_1$ and $n_2/k_2$ are integers without loss of generality. We first derive the lower bound for the nonparametric rate $\sigma^2k_1k_2/p$. Fix the labels $z_1(i) = \lceil ik_1/n_1\rceil$ and $z_2(j) = \lceil jk_2/n_2\rceil$. For any $\omega\in\{0,1\}^{k_1\times k_2}$, define

$Q^\omega_{ab} = c\sqrt{\frac{\sigma^2k_1k_2}{p\,n_1n_2}}\,\omega_{ab}.$  (24)

By Lemma 18, there exists some $T\subset\{0,1\}^{k_1k_2}$ such that $|T|\ge\exp(k_1k_2/8)$ and $H(\omega,\omega')\ge k_1k_2/4$ for any $\omega\ne\omega'\in T$. We construct the subspace

$\Theta(z_1,z_2,T) = \big\{\theta\in R^{n_1\times n_2}:\ \theta_{ij} = Q^\omega_{z_1(i)z_2(j)},\ \omega\in T\big\}.$

By Proposition 17, we have

$\sup_{\theta,\theta'\in\Theta(z_1,z_2,T)}\chi^2\big(P_{(\theta,\sigma^2,p)}\|P_{(\theta',\sigma^2,p)}\big) \le \exp(c^2k_1k_2).$

For any two different $\theta$ and $\theta'$ in $\Theta(z_1,z_2,T)$ associated with $\omega,\omega'\in T$, we have

$\|\theta-\theta'\|^2 = \frac{c^2\sigma^2}{p}H(\omega,\omega') \ge \frac{c^2\sigma^2}{4p}k_1k_2.$

Therefore, $M\big(\frac{c}{2}\sqrt{\sigma^2k_1k_2/p},\,\Theta(z_1,z_2,T),\,\|\cdot\|\big)\ge\exp(k_1k_2/8)$. Using (22) with an appropriately chosen $c$, we obtain the rate $\frac{\sigma^2k_1k_2}{p}$ in the lower bound.

Now let us derive the clustering rate $\frac{\sigma^2n_2\log k_2}{p}$. Pick $\omega_1,\dots,\omega_{k_2}\in\{0,1\}^{k_1}$ such that $H(\omega_a,\omega_b)\ge k_1/4$ for all $a\ne b$. By Lemma 18, this is possible when $\exp(k_1/8)\ge k_2$. Then, define the columns of $Q$ by

$Q_{\cdot a} = c\sqrt{\frac{\sigma^2n_2\log k_2}{p\,n_1n_2}}\,\omega_a.$  (25)

Define $z_1$ by $z_1(i) = \lceil ik_1/n_1\rceil$. Fix $Q$ and $z_1$ and let $z_2$ vary. Select a set $Z_2\subset[k_2]^{n_2}$ such that $|Z_2|\ge\exp(cn_2\log k_2)$ and $H(z_2,z_2')\ge n_2/6$ for any $z_2\ne z_2'\in Z_2$. The existence of such a $Z_2$ is proved in Gao et al. (2015a). Then, the subspace we consider is

$\Theta(z_1,Z_2,Q) = \big\{\theta\in R^{n_1\times n_2}:\ \theta_{ij} = Q_{z_1(i)z_2(j)},\ z_2\in Z_2\big\}.$

By Proposition 17, we have

$\sup_{\theta,\theta'\in\Theta(z_1,Z_2,Q)}D\big(P_{(\theta,\sigma^2,p)}\|P_{(\theta',\sigma^2,p)}\big) \le c^2n_2\log k_2.$

For any two different $\theta$ and $\theta'$ in $\Theta(z_1,Z_2,Q)$ associated with $z_2,z_2'\in Z_2$, we have

$\|\theta-\theta'\|^2 = \frac{n_1}{k_1}\sum_{j=1}^{n_2}\|Q_{\cdot z_2(j)}-Q_{\cdot z_2'(j)}\|^2 \ge \frac{n_1}{k_1}\,H(z_2,z_2')\,\frac{c^2\sigma^2n_2\log k_2}{p\,n_1n_2}\cdot\frac{k_1}{4} \ge \frac{c^2\sigma^2n_2\log k_2}{24\,p}.$

Therefore, $M\big(c\sqrt{\sigma^2n_2\log k_2/(24p)},\,\Theta(z_1,Z_2,Q),\,\|\cdot\|\big)\ge\exp(cn_2\log k_2)$. Using (21) with an appropriate $c$, we obtain the lower bound $\frac{\sigma^2n_2\log k_2}{p}$. A symmetric argument gives the rate $\frac{\sigma^2n_1\log k_1}{p}$. Combining the three parts using the same argument as in Gao et al. (2015a), the proof is complete.

Proof of Theorem 5. The proof is similar to that of Theorem 3. The only differences are that (24) is replaced by

$Q^\omega_{ab} = \frac{1}{2}\rho + c\Big(\sqrt{\frac{\rho k^2}{p\,n^2}}\wedge\rho\Big)\omega_{ab}$

and (25) is replaced by

$Q_{\cdot a} = \frac{1}{2}\rho + c\Big(\sqrt{\frac{\rho\log k}{p\,n}}\wedge\frac{1}{2}\rho\Big)\omega_a.$

It is easy to check that the constructed subspaces are subsets of $\Theta^+_k(\rho)$. Then, a symmetric modification of the proof of Theorem 3 leads to the desired conclusion.

7.5 Proofs of Corollary 8 and Corollary 9

The result of Corollary 8 can be derived through a standard bias-variance tradeoff argument by combining Corollary 4 and Lemma 2.1 in Gao et al. (2015a). The result of Corollary 9 follows from Theorem 7. By inspecting the proof of Theorem 7, (20) holds for all $k$. Choosing the best $k$ to trade off bias and variance gives the result of Corollary 9. We omit the details here.

Acknowledgments

The research of CG, YL and HHZ is supported in part by NSF grant DMS. The research of ZM is supported in part by NSF CAREER grant DMS.

Appendix A. Proofs of auxiliary results

In this section, we give the proofs of the auxiliary lemmas. We first introduce some notation. Define the set $Z_{k_1k_2} = \{z = (z_1,z_2): z_1\in[k_1]^{n_1}, z_2\in[k_2]^{n_2}\}$. For a matrix $G\in R^{n_1\times n_2}$ and some $z = (z_1,z_2)\in Z_{k_1k_2}$, define

$\bar G_{ab}(z) = \frac{1}{|z_1^{-1}(a)||z_2^{-1}(b)|}\sum_{(i,j)\in z_1^{-1}(a)\times z_2^{-1}(b)}G_{ij},$

for all $a\in[k_1]$, $b\in[k_2]$. To facilitate the proofs, we need the following two results.

Proposition 19. For the estimator $\hat\theta_{ij} = \hat Q_{\hat z_1(i)\hat z_2(j)}$, we have

$\hat Q_{ab} = \mathrm{sign}\big(\bar Y_{ab}(\hat z)\big)\big(|\bar Y_{ab}(\hat z)|\wedge M\big),$

for all $a\in[k_1]$, $b\in[k_2]$.

Lemma 20. Under the setting of Lemma 13, define $S = \frac{1}{\sqrt n}\sum_{i=1}^n(Y_i-\theta_i)$ and $\tau = 2(M^2+2\sigma^2)/(M\vee\sigma)$. Then we have the following results:

a. Let $T = S\,1\{|S|\le\tau\sqrt n\}$; then $Ee^{pT^2/(8(M^2+2\sigma^2))}\le 5$.
b. Let $R = \tau\sqrt n\,|S|\,1\{|S|>\tau\sqrt n\}$; then $Ee^{pR/(8(M^2+2\sigma^2))}\le 9$.

Proof. By (19),

$P(|S|>t) \le 4\exp\Big\{-\min\Big(\frac{t^2p}{4(M^2+2\sigma^2)},\,\frac{\sqrt n\,t\,p}{2(M\vee\sigma)}\Big)\Big\}.$

Then

$Ee^{\lambda T^2} = 1+\int_1^{\infty}P(e^{\lambda T^2}>u)\,du \le 1+\int_1^{e^{\lambda\tau^2n}}P\Big(|S|>\sqrt{\tfrac{\log u}{\lambda}}\Big)du \le 1+4\int_1^{e^{\lambda\tau^2n}}u^{-p/(4\lambda(M^2+2\sigma^2))}\,du.$

Choosing $\lambda = p/(8(M^2+2\sigma^2))$, we get $Ee^{pT^2/(8(M^2+2\sigma^2))}\le 5$. We proceed to prove the second claim:

$Ee^{\lambda R} = P(R=0) + P(R>0)\,E[e^{\lambda R}\mid R>0]$
$= P(R=0) + P(R>0)\int_0^{\infty}P(e^{\lambda R}>u\mid R>0)\,du$
$\le P(R=0) + P(R>0)\,e^{\lambda\tau^2n} + \int_{e^{\lambda\tau^2n}}^{\infty}P(e^{\lambda R}>u)\,du$
$\le 1 + 4e^{-p\tau^2n/(4(M^2+2\sigma^2))+\lambda\tau^2n} + 4\int_{e^{\lambda\tau^2n}}^{\infty}u^{-p/(2\lambda\tau(M\vee\sigma))}\,du.$

Choosing $\lambda = p/(8(M^2+2\sigma^2))$, we get $Ee^{pR/(8(M^2+2\sigma^2))}\le 9$.

Proof of Lemma 10. By the definitions of $\hat\theta_{ij}$ and $\bar\theta_{ij}$ and Proposition 19, for any $(i,j)\in\hat z_1^{-1}(a)\times\hat z_2^{-1}(b)$ we have

$\hat\theta_{ij}-\bar\theta_{ij} = \begin{cases} M-\bar\theta_{ab}(\hat z), & \text{if } \bar Y_{ab}(\hat z)\ge M;\\ \bar Y_{ab}(\hat z)-\bar\theta_{ab}(\hat z), & \text{if } -M\le\bar Y_{ab}(\hat z)<M;\\ -M-\bar\theta_{ab}(\hat z), & \text{if } \bar Y_{ab}(\hat z)<-M. \end{cases}$

Define $W = Y-\theta$; it is easy to check that

$|\hat\theta_{ij}-\bar\theta_{ij}| \le |\bar W_{ab}(\hat z)|\wedge 2M \le |\bar W_{ab}(\hat z)|\wedge\tau,$

where $\hat z = (\hat z_1,\hat z_2)$ and $\tau$ is defined in Lemma 20. Then

$\|\hat\theta-\bar\theta\|^2 \le \sum_{a,b}|\hat z_1^{-1}(a)||\hat z_2^{-1}(b)|\big(|\bar W_{ab}(\hat z)|\wedge\tau\big)^2 \le \max_{z\in Z_{k_1k_2}}\sum_{a,b}|z_1^{-1}(a)||z_2^{-1}(b)|\big(|\bar W_{ab}(z)|\wedge\tau\big)^2.$  (26)

For any $a\in[k_1]$, $b\in[k_2]$ and $z_1\in[k_1]^{n_1}$, $z_2\in[k_2]^{n_2}$, define $n_1(a) = |z_1^{-1}(a)|$, $n_2(b) = |z_2^{-1}(b)|$ and

$V_{ab}(z) = \sqrt{n_1(a)n_2(b)}\,|\bar W_{ab}(z)|\,1\{|\bar W_{ab}(z)|\le\tau\},\qquad R_{ab}(z) = n_1(a)n_2(b)\,\tau\,|\bar W_{ab}(z)|\,1\{|\bar W_{ab}(z)|>\tau\}.$

Then,

$\|\hat\theta-\bar\theta\|^2 \le \max_{z\in Z_{k_1k_2}}\sum_{a,b}\big(V_{ab}^2(z)+R_{ab}(z)\big).$  (27)

By Markov's inequality and Lemma 20, we have

$P\Big(\sum_{a,b}V_{ab}^2(z)>t\Big) \le e^{-\frac{pt}{8(M^2+2\sigma^2)}}\prod_{a,b}Ee^{\frac{pV_{ab}^2(z)}{8(M^2+2\sigma^2)}} \le \exp\Big\{-\frac{pt}{8(M^2+2\sigma^2)}+k_1k_2\log 5\Big\},$

and

$P\Big(\sum_{a,b}R_{ab}(z)>t\Big) \le e^{-\frac{pt}{8(M^2+2\sigma^2)}}\prod_{a,b}Ee^{\frac{pR_{ab}(z)}{8(M^2+2\sigma^2)}} \le \exp\Big\{-\frac{pt}{8(M^2+2\sigma^2)}+k_1k_2\log 9\Big\}.$

Applying a union bound and using the fact that $\log|[k_1]^{n_1}|+\log|[k_2]^{n_2}| = n_1\log k_1+n_2\log k_2$,

$P\Big(\max_{z\in Z_{k_1k_2}}\sum_{a,b}V_{ab}^2(z)>t\Big) \le \exp\Big\{-\frac{pt}{8(M^2+2\sigma^2)}+k_1k_2\log 5+n_1\log k_1+n_2\log k_2\Big\}.$

For any given constant $C'>0$, we choose $t = C_1\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)$ for some sufficiently large $C_1>0$ to obtain

$\max_{z\in Z_{k_1k_2}}\sum_{a,b}V_{ab}^2(z) \le C_1\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)$  (28)

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$. Similarly, for some sufficiently large $C_2>0$, we have

$\max_{z\in Z_{k_1k_2}}\sum_{a,b}R_{ab}(z) \le C_2\frac{M^2\vee\sigma^2}{p}(k_1k_2+n_1\log k_1+n_2\log k_2)$  (29)

with probability at least $1-\exp(-C'(k_1k_2+n_1\log k_1+n_2\log k_2))$. Plugging (28) and (29) into (27), we complete the proof.

Proof of Lemma 11. Note that

$\bar\theta_{ij}-\theta_{ij} = \sum_{a,b}\bar\theta_{ab}(\hat z)1\{(i,j)\in\hat z_1^{-1}(a)\times\hat z_2^{-1}(b)\}-\theta_{ij}$

is a function of $\hat z_1$ and $\hat z_2$. Then we have

$\Big\langle\frac{\bar\theta-\theta}{\|\bar\theta-\theta\|},\,Y-\theta\Big\rangle = \frac{\sum_{ij}(\bar\theta_{ij}-\theta_{ij})(Y_{ij}-\theta_{ij})}{\sqrt{\sum_{ij}(\bar\theta_{ij}-\theta_{ij})^2}} \le \max_{z\in Z_{k_1k_2}}\sum_{ij}\gamma_{ij}(z)(Y_{ij}-\theta_{ij}),$

where

$\gamma_{ij}(z) \propto \sum_{a,b}\bar\theta_{ab}(z)1\{(i,j)\in z_1^{-1}(a)\times z_2^{-1}(b)\}-\theta_{ij}$

is normalized so that $\sum_{ij}\gamma_{ij}(z)^2 = 1$. On the event $\|\bar\theta-\theta\|^2\ge C_2(M^2\vee\sigma^2)(k_1k_2+n_1\log k_1+n_2\log k_2)/p$, for some $C_2$ to be specified later, we have

$|\gamma_{ij}(z)| \le \frac{2M}{\|\bar\theta-\theta\|} \le \sqrt{\frac{4M^2p}{C_2(M^2\vee\sigma^2)(k_1k_2+n_1\log k_1+n_2\log k_2)}}.$

By Lemma 13 and a union bound, we have

$P\Big(\max_{z\in Z_{k_1k_2}}\sum_{ij}\gamma_{ij}(z)(Y_{ij}-\theta_{ij})>t\Big) \le \sum_{z_1\in[k_1]^{n_1},\,z_2\in[k_2]^{n_2}}P\Big(\sum_{ij}\gamma_{ij}(z)(Y_{ij}-\theta_{ij})>t\Big) \le \exp\big(-C'(k_1k_2+n_1\log k_1+n_2\log k_2)\big),$

by setting $t = \sqrt{C_2(M^2\vee\sigma^2)(k_1k_2+n_1\log k_1+n_2\log k_2)/p}$ for some sufficiently large $C_2$ depending on $C'$. Thus, the lemma is proved.

Proof of Lemma 12. By definition,

$\langle\hat\theta-\bar\theta,\,Y-\theta\rangle = \sum_{a,b}\Big(\mathrm{sign}\big(\bar Y_{ab}(\hat z)\big)\big(|\bar Y_{ab}(\hat z)|\wedge M\big)-\bar\theta_{ab}(\hat z)\Big)\bar W_{ab}(\hat z)\,|\hat z_1^{-1}(a)||\hat z_2^{-1}(b)|$
$\le \max_{z\in Z_{k_1k_2}}\sum_{a,b}\Big(\mathrm{sign}\big(\bar Y_{ab}(z)\big)\big(|\bar Y_{ab}(z)|\wedge M\big)-\bar\theta_{ab}(z)\Big)\bar W_{ab}(z)\,|z_1^{-1}(a)||z_2^{-1}(b)|.$

By definition, we have

$\Big(\mathrm{sign}\big(\bar Y_{ab}(z)\big)\big(|\bar Y_{ab}(z)|\wedge M\big)-\bar\theta_{ab}(z)\Big)\bar W_{ab}(z) \le \bar W_{ab}(z)^2\wedge\tau|\bar W_{ab}(z)|.$

For any fixed $z_1\in[k_1]^{n_1}$, $z_2\in[k_2]^{n_2}$, define $n_1(a) = |z_1^{-1}(a)|$ for $a\in[k_1]$, $n_2(b) = |z_2^{-1}(b)|$ for $b\in[k_2]$ and

$V_{ab}(z) = \sqrt{n_1(a)n_2(b)}\,|\bar W_{ab}(z)|\,1\{|\bar W_{ab}(z)|\le\tau\},\qquad R_{ab}(z) = \tau\,n_1(a)n_2(b)\,|\bar W_{ab}(z)|\,1\{|\bar W_{ab}(z)|>\tau\}.$

Then

$\langle\hat\theta-\bar\theta,\,Y-\theta\rangle \le \max_{z\in Z_{k_1k_2}}\sum_{a,b}\big(V_{ab}^2(z)+R_{ab}(z)\big).$

Following the same argument as in the proof of Lemma 10, a choice of $t = C_3(M^2\vee\sigma^2)(k_1k_2+n_1\log k_1+n_2\log k_2)/p$ for some sufficiently large $C_3>0$ completes the proof.

Proof of Lemma 13. When $|\lambda|\le p/(M\vee\sigma)$, we have $|\lambda\theta_i|/p\le 1$ and $\lambda^2\sigma^2/p^2\le 1$. Then

$Ee^{\lambda(Y_i-\theta_i)} = p\,Ee^{\lambda(X_i/p-\theta_i)} + (1-p)e^{-\lambda\theta_i}$
$\le p\,e^{\frac{\lambda^2\sigma^2}{2p^2}}e^{\frac{(1-p)\lambda\theta_i}{p}} + (1-p)e^{-\lambda\theta_i}$
$\le p\Big(1+\frac{\lambda^2\sigma^2}{p^2}\Big)\Big(1+\frac{(1-p)\lambda\theta_i}{p}+\frac{2(1-p)^2\lambda^2\theta_i^2}{p^2}\Big) + (1-p)\big(1-\lambda\theta_i+2\lambda^2\theta_i^2\big)$
$= 1+\frac{2(1-p)\lambda^2\theta_i^2}{p}+\frac{\sigma^2\lambda^2}{p}+\frac{(1-p)\lambda^3\theta_i\sigma^2}{p^2}+\frac{2(1-p)^2\lambda^4\sigma^2\theta_i^2}{p^3}$
$\le 1+\frac{(2M^2+\sigma^2)\lambda^2}{p}+\frac{|\lambda|^3|\theta_i|\sigma^2}{p^2}+\frac{2\lambda^4\sigma^2\theta_i^2}{p^3}$
$\le 1+\frac{(2M^2+\sigma^2)\lambda^2}{p}+\frac{\lambda^2\sigma^2}{p}+\frac{2\lambda^2\sigma^2}{p} = 1+\frac{(2M^2+4\sigma^2)\lambda^2}{p} \le 2e^{(M^2+2\sigma^2)\lambda^2/p}.$

The second inequality is due to the facts that $e^x\le 1+2x$ for all $0\le x\le 1$ and $e^x\le 1+x+2x^2$ for all $|x|\le 1$. Then for $|\lambda|\le p/((M\vee\sigma)\|c\|_\infty)$, Markov's inequality implies

$P\Big(\sum_{i=1}^n c_i(Y_i-\theta_i)\ge t\Big) \le 2\exp\Big\{-\lambda t+\frac{\lambda^2(M^2+2\sigma^2)}{p}\Big\}.$

By choosing $\lambda = \min\big\{\frac{tp}{2(M^2+2\sigma^2)},\,\frac{p}{(M\vee\sigma)\|c\|_\infty}\big\}$, we get (19).

Proof of Corollary 4. Consider independent Bernoulli random variables $X_i\sim\mathrm{Ber}(\theta_i)$ with $\theta_i\in[0,\rho]$ for $i\in[n]$. Let $Y_i = X_iE_i/p$, where $\{E_i\}$ are independent Bernoulli$(p)$ random variables, independent of $\{X_i\}$. Note that $EY_i = \theta_i$, $EY_i^2\le\rho/p$ and $|Y_i|\le 1/p$. Then Bernstein's inequality (Massart, 2007, Corollary 2.10) implies

$P\Big\{\Big|\frac{1}{\sqrt n}\sum_{i=1}^n(Y_i-\theta_i)\Big|\ge t\Big\} \le 2\exp\Big\{-\min\Big(\frac{t^2p}{4\rho},\,\frac{3\sqrt n\,t\,p}{4}\Big)\Big\}$  (30)

for any $t>0$. Let $S = \frac{1}{\sqrt n}\sum_{i=1}^n(Y_i-\theta_i)$, $T = S\,1\{|S|\le 3\rho\sqrt n\}$ and $R = 3\rho\sqrt n\,|S|\,1\{|S|>3\rho\sqrt n\}$. Following the same arguments as in the proof of Lemma 20, we have $Ee^{pT^2/(8\rho)}\le 5$ and $Ee^{pR/(8\rho)}\le 9$. Consequently, Lemma 10, Lemma 11 and Lemma 12 hold for the Bernoulli case. The rest of the proof then follows from the proof of Theorem 1.

Proof of Lemma 14. By the definitions of $Y$ and $\tilde Y$, we have

$\|Y-\tilde Y\|^2 \le \Big(\frac{1}{\hat p}-\frac{1}{p}\Big)^2\,\max_{i,j}X_{ij}^2\,\sum_{ij}E_{ij}.$

Therefore, it is sufficient to bound the three terms. For the first term, we have

$\Big|\frac{1}{\hat p}-\frac{1}{p}\Big| = \frac{|\hat p-p|}{\hat p\,p}.$

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H:

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H: Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 25, 2017 Due: November 08, 2017 A. Growth function Growth function of stum functions.

More information

Best approximation by linear combinations of characteristic functions of half-spaces

Best approximation by linear combinations of characteristic functions of half-spaces Best aroximation by linear combinations of characteristic functions of half-saces Paul C. Kainen Deartment of Mathematics Georgetown University Washington, D.C. 20057-1233, USA Věra Kůrková Institute of

More information

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces Abstract and Alied Analysis Volume 2012, Article ID 264103, 11 ages doi:10.1155/2012/264103 Research Article An iterative Algorithm for Hemicontractive Maings in Banach Saces Youli Yu, 1 Zhitao Wu, 2 and

More information

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Asymptotically Optimal Simulation Allocation under Dependent Sampling Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

A Note on Guaranteed Sparse Recovery via l 1 -Minimization

A Note on Guaranteed Sparse Recovery via l 1 -Minimization A Note on Guaranteed Sarse Recovery via l -Minimization Simon Foucart, Université Pierre et Marie Curie Abstract It is roved that every s-sarse vector x C N can be recovered from the measurement vector

More information

Applications to stochastic PDE

Applications to stochastic PDE 15 Alications to stochastic PE In this final lecture we resent some alications of the theory develoed in this course to stochastic artial differential equations. We concentrate on two secific examles:

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

Sums of independent random variables

Sums of independent random variables 3 Sums of indeendent random variables This lecture collects a number of estimates for sums of indeendent random variables with values in a Banach sace E. We concentrate on sums of the form N γ nx n, where

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

Stochastic integration II: the Itô integral

Stochastic integration II: the Itô integral 13 Stochastic integration II: the Itô integral We have seen in Lecture 6 how to integrate functions Φ : (, ) L (H, E) with resect to an H-cylindrical Brownian motion W H. In this lecture we address the

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Numerous alications in statistics, articularly in the fitting of linear models. Notation and conventions: Elements of a matrix A are denoted by a ij, where i indexes the rows and

More information

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Improved Capacity Bounds for the Binary Energy Harvesting Channel Imroved Caacity Bounds for the Binary Energy Harvesting Channel Kaya Tutuncuoglu 1, Omur Ozel 2, Aylin Yener 1, and Sennur Ulukus 2 1 Deartment of Electrical Engineering, The Pennsylvania State University,

More information

MATH 2710: NOTES FOR ANALYSIS

MATH 2710: NOTES FOR ANALYSIS MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

Analysis of some entrance probabilities for killed birth-death processes

Analysis of some entrance probabilities for killed birth-death processes Analysis of some entrance robabilities for killed birth-death rocesses Master s Thesis O.J.G. van der Velde Suervisor: Dr. F.M. Sieksma July 5, 207 Mathematical Institute, Leiden University Contents Introduction

More information

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables Imroved Bounds on Bell Numbers and on Moments of Sums of Random Variables Daniel Berend Tamir Tassa Abstract We rovide bounds for moments of sums of sequences of indeendent random variables. Concentrating

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various Families of R n Norms and Some Open Problems

Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various Families of R n Norms and Some Open Problems Int. J. Oen Problems Comt. Math., Vol. 3, No. 2, June 2010 ISSN 1998-6262; Coyright c ICSRS Publication, 2010 www.i-csrs.org Various Proofs for the Decrease Monotonicity of the Schatten s Power Norm, Various

More information

An Inverse Problem for Two Spectra of Complex Finite Jacobi Matrices

An Inverse Problem for Two Spectra of Complex Finite Jacobi Matrices Coyright 202 Tech Science Press CMES, vol.86, no.4,.30-39, 202 An Inverse Problem for Two Sectra of Comlex Finite Jacobi Matrices Gusein Sh. Guseinov Abstract: This aer deals with the inverse sectral roblem

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

Cryptanalysis of Pseudorandom Generators

Cryptanalysis of Pseudorandom Generators CSE 206A: Lattice Algorithms and Alications Fall 2017 Crytanalysis of Pseudorandom Generators Instructor: Daniele Micciancio UCSD CSE As a motivating alication for the study of lattice in crytograhy we

More information

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM JOHN BINDER Abstract. In this aer, we rove Dirichlet s theorem that, given any air h, k with h, k) =, there are infinitely many rime numbers congruent to

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

HENSEL S LEMMA KEITH CONRAD

HENSEL S LEMMA KEITH CONRAD HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

On the capacity of the general trapdoor channel with feedback

On the capacity of the general trapdoor channel with feedback On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:

More information

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material Robustness of classifiers to uniform l and Gaussian noise Sulementary material Jean-Yves Franceschi Ecole Normale Suérieure de Lyon LIP UMR 5668 Omar Fawzi Ecole Normale Suérieure de Lyon LIP UMR 5668

More information

THE 2D CASE OF THE BOURGAIN-DEMETER-GUTH ARGUMENT

THE 2D CASE OF THE BOURGAIN-DEMETER-GUTH ARGUMENT THE 2D CASE OF THE BOURGAIN-DEMETER-GUTH ARGUMENT ZANE LI Let e(z) := e 2πiz and for g : [0, ] C and J [0, ], define the extension oerator E J g(x) := g(t)e(tx + t 2 x 2 ) dt. J For a ositive weight ν

More information

p-adic Measures and Bernoulli Numbers

p-adic Measures and Bernoulli Numbers -Adic Measures and Bernoulli Numbers Adam Bowers Introduction The constants B k in the Taylor series exansion t e t = t k B k k! k=0 are known as the Bernoulli numbers. The first few are,, 6, 0, 30, 0,

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS

ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 000-9939XX)0000-0 ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS WILLIAM D. BANKS AND ASMA HARCHARRAS

More information

B8.1 Martingales Through Measure Theory. Concept of independence

B8.1 Martingales Through Measure Theory. Concept of independence B8.1 Martingales Through Measure Theory Concet of indeendence Motivated by the notion of indeendent events in relims robability, we have generalized the concet of indeendence to families of σ-algebras.

More information

ECE 534 Information Theory - Midterm 2

ECE 534 Information Theory - Midterm 2 ECE 534 Information Theory - Midterm Nov.4, 009. 3:30-4:45 in LH03. You will be given the full class time: 75 minutes. Use it wisely! Many of the roblems have short answers; try to find shortcuts. You

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

DISCRIMINANTS IN TOWERS

DISCRIMINANTS IN TOWERS DISCRIMINANTS IN TOWERS JOSEPH RABINOFF Let A be a Dedekind domain with fraction field F, let K/F be a finite searable extension field, and let B be the integral closure of A in K. In this note, we will

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS #A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,

More information

Inequalities for the L 1 Deviation of the Empirical Distribution

Inequalities for the L 1 Deviation of the Empirical Distribution Inequalities for the L 1 Deviation of the Emirical Distribution Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, Marcelo J. Weinberger June 13, 2003 Abstract We derive bounds on the robability

More information

A New Perspective on Learning Linear Separators with Large L q L p Margins

A New Perspective on Learning Linear Separators with Large L q L p Margins A New Persective on Learning Linear Searators with Large L q L Margins Maria-Florina Balcan Georgia Institute of Technology Christoher Berlind Georgia Institute of Technology Abstract We give theoretical

More information

SCHUR S LEMMA AND BEST CONSTANTS IN WEIGHTED NORM INEQUALITIES. Gord Sinnamon The University of Western Ontario. December 27, 2003

SCHUR S LEMMA AND BEST CONSTANTS IN WEIGHTED NORM INEQUALITIES. Gord Sinnamon The University of Western Ontario. December 27, 2003 SCHUR S LEMMA AND BEST CONSTANTS IN WEIGHTED NORM INEQUALITIES Gord Sinnamon The University of Western Ontario December 27, 23 Abstract. Strong forms of Schur s Lemma and its converse are roved for mas

More information

STRONG TYPE INEQUALITIES AND AN ALMOST-ORTHOGONALITY PRINCIPLE FOR FAMILIES OF MAXIMAL OPERATORS ALONG DIRECTIONS IN R 2

STRONG TYPE INEQUALITIES AND AN ALMOST-ORTHOGONALITY PRINCIPLE FOR FAMILIES OF MAXIMAL OPERATORS ALONG DIRECTIONS IN R 2 STRONG TYPE INEQUALITIES AND AN ALMOST-ORTHOGONALITY PRINCIPLE FOR FAMILIES OF MAXIMAL OPERATORS ALONG DIRECTIONS IN R 2 ANGELES ALFONSECA Abstract In this aer we rove an almost-orthogonality rincile for

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

Introduction to Banach Spaces

Introduction to Banach Spaces CHAPTER 8 Introduction to Banach Saces 1. Uniform and Absolute Convergence As a rearation we begin by reviewing some familiar roerties of Cauchy sequences and uniform limits in the setting of metric saces.

More information

arxiv: v2 [stat.me] 3 Nov 2014

arxiv: v2 [stat.me] 3 Nov 2014 onarametric Stein-tye Shrinkage Covariance Matrix Estimators in High-Dimensional Settings Anestis Touloumis Cancer Research UK Cambridge Institute University of Cambridge Cambridge CB2 0RE, U.K. Anestis.Touloumis@cruk.cam.ac.uk

More information

arxiv: v2 [math.na] 6 Apr 2016

arxiv: v2 [math.na] 6 Apr 2016 Existence and otimality of strong stability reserving linear multiste methods: a duality-based aroach arxiv:504.03930v [math.na] 6 Ar 06 Adrián Németh January 9, 08 Abstract David I. Ketcheson We rove

More information

Linear diophantine equations for discrete tomography

Linear diophantine equations for discrete tomography Journal of X-Ray Science and Technology 10 001 59 66 59 IOS Press Linear diohantine euations for discrete tomograhy Yangbo Ye a,gewang b and Jiehua Zhu a a Deartment of Mathematics, The University of Iowa,

More information

IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES

IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES OHAD GILADI AND ASSAF NAOR Abstract. It is shown that if (, ) is a Banach sace with Rademacher tye 1 then for every n N there exists

More information

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation Uniformly best wavenumber aroximations by satial central difference oerators: An initial investigation Vitor Linders and Jan Nordström Abstract A characterisation theorem for best uniform wavenumber aroximations

More information

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar 15-859(M): Randomized Algorithms Lecturer: Anuam Guta Toic: Lower Bounds on Randomized Algorithms Date: Setember 22, 2004 Scribe: Srinath Sridhar 4.1 Introduction In this lecture, we will first consider

More information

LORENZO BRANDOLESE AND MARIA E. SCHONBEK

LORENZO BRANDOLESE AND MARIA E. SCHONBEK LARGE TIME DECAY AND GROWTH FOR SOLUTIONS OF A VISCOUS BOUSSINESQ SYSTEM LORENZO BRANDOLESE AND MARIA E. SCHONBEK Abstract. In this aer we analyze the decay and the growth for large time of weak and strong

More information

Sampling and Distortion Tradeoffs for Bandlimited Periodic Signals

Sampling and Distortion Tradeoffs for Bandlimited Periodic Signals Samling and Distortion radeoffs for Bandlimited Periodic Signals Elaheh ohammadi and Farokh arvasti Advanced Communications Research Institute ACRI Deartment of Electrical Engineering Sharif University

More information

Sobolev Spaces with Weights in Domains and Boundary Value Problems for Degenerate Elliptic Equations

Sobolev Spaces with Weights in Domains and Boundary Value Problems for Degenerate Elliptic Equations Sobolev Saces with Weights in Domains and Boundary Value Problems for Degenerate Ellitic Equations S. V. Lototsky Deartment of Mathematics, M.I.T., Room 2-267, 77 Massachusetts Avenue, Cambridge, MA 02139-4307,

More information

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels oname manuscrit o. will be inserted by the editor) Quantitative estimates of roagation of chaos for stochastic systems with W, kernels Pierre-Emmanuel Jabin Zhenfu Wang Received: date / Acceted: date Abstract

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Extension of Minimax to Infinite Matrices

Extension of Minimax to Infinite Matrices Extension of Minimax to Infinite Matrices Chris Calabro June 21, 2004 Abstract Von Neumann s minimax theorem is tyically alied to a finite ayoff matrix A R m n. Here we show that (i) if m, n are both inite,

More information

Haar type and Carleson Constants

Haar type and Carleson Constants ariv:0902.955v [math.fa] Feb 2009 Haar tye and Carleson Constants Stefan Geiss October 30, 208 Abstract Paul F.. Müller For a collection E of dyadic intervals, a Banach sace, and,2] we assume the uer l

More information

A Social Welfare Optimal Sequential Allocation Procedure

A Social Welfare Optimal Sequential Allocation Procedure A Social Welfare Otimal Sequential Allocation Procedure Thomas Kalinowsi Universität Rostoc, Germany Nina Narodytsa and Toby Walsh NICTA and UNSW, Australia May 2, 201 Abstract We consider a simle sequential

More information

Supplementary Materials for Robust Estimation of the False Discovery Rate

Supplementary Materials for Robust Estimation of the False Discovery Rate Sulementary Materials for Robust Estimation of the False Discovery Rate Stan Pounds and Cheng Cheng This sulemental contains roofs regarding theoretical roerties of the roosed method (Section S1), rovides

More information

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS CASEY BRUCK 1. Abstract The goal of this aer is to rovide a concise way for undergraduate mathematics students to learn about how rime numbers behave

More information

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law On Isoerimetric Functions of Probability Measures Having Log-Concave Densities with Resect to the Standard Normal Law Sergey G. Bobkov Abstract Isoerimetric inequalities are discussed for one-dimensional

More information

Maxisets for μ-thresholding rules

Maxisets for μ-thresholding rules Test 008 7: 33 349 DOI 0.007/s749-006-0035-5 ORIGINAL PAPER Maxisets for μ-thresholding rules Florent Autin Received: 3 January 005 / Acceted: 8 June 006 / Published online: March 007 Sociedad de Estadística

More information

On Character Sums of Binary Quadratic Forms 1 2. Mei-Chu Chang 3. Abstract. We establish character sum bounds of the form.

On Character Sums of Binary Quadratic Forms 1 2. Mei-Chu Chang 3. Abstract. We establish character sum bounds of the form. On Character Sums of Binary Quadratic Forms 2 Mei-Chu Chang 3 Abstract. We establish character sum bounds of the form χ(x 2 + ky 2 ) < τ H 2, a x a+h b y b+h where χ is a nontrivial character (mod ), 4

More information

On a Markov Game with Incomplete Information

On a Markov Game with Incomplete Information On a Markov Game with Incomlete Information Johannes Hörner, Dinah Rosenberg y, Eilon Solan z and Nicolas Vieille x{ January 24, 26 Abstract We consider an examle of a Markov game with lack of information

More information

The Hasse Minkowski Theorem Lee Dicker University of Minnesota, REU Summer 2001

The Hasse Minkowski Theorem Lee Dicker University of Minnesota, REU Summer 2001 The Hasse Minkowski Theorem Lee Dicker University of Minnesota, REU Summer 2001 The Hasse-Minkowski Theorem rovides a characterization of the rational quadratic forms. What follows is a roof of the Hasse-Minkowski

More information

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS #A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS Ramy F. Taki ElDin Physics and Engineering Mathematics Deartment, Faculty of Engineering, Ain Shams University, Cairo, Egyt

More information

2 Asymptotic density and Dirichlet density

2 Asymptotic density and Dirichlet density 8.785: Analytic Number Theory, MIT, sring 2007 (K.S. Kedlaya) Primes in arithmetic rogressions In this unit, we first rove Dirichlet s theorem on rimes in arithmetic rogressions. We then rove the rime

More information

Adaptive Estimation of the Regression Discontinuity Model

Adaptive Estimation of the Regression Discontinuity Model Adative Estimation of the Regression Discontinuity Model Yixiao Sun Deartment of Economics Univeristy of California, San Diego La Jolla, CA 9293-58 Feburary 25 Email: yisun@ucsd.edu; Tel: 858-534-4692

More information

Recursive Estimation of the Preisach Density function for a Smart Actuator

Recursive Estimation of the Preisach Density function for a Smart Actuator Recursive Estimation of the Preisach Density function for a Smart Actuator Ram V. Iyer Deartment of Mathematics and Statistics, Texas Tech University, Lubbock, TX 7949-142. ABSTRACT The Preisach oerator

More information

The non-stochastic multi-armed bandit problem

The non-stochastic multi-armed bandit problem Submitted for journal ublication. The non-stochastic multi-armed bandit roblem Peter Auer Institute for Theoretical Comuter Science Graz University of Technology A-8010 Graz (Austria) auer@igi.tu-graz.ac.at

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE GUSTAVO GARRIGÓS ANDREAS SEEGER TINO ULLRICH Abstract We give an alternative roof and a wavelet analog of recent results

More information

Finding a sparse vector in a subspace: linear sparsity using alternating directions

Finding a sparse vector in a subspace: linear sparsity using alternating directions IEEE TRANSACTION ON INFORMATION THEORY VOL XX NO XX 06 Finding a sarse vector in a subsace: linear sarsity using alternating directions Qing Qu Student Member IEEE Ju Sun Student Member IEEE and John Wright

More information

A construction of bent functions from plateaued functions

A construction of bent functions from plateaued functions A construction of bent functions from lateaued functions Ayça Çeşmelioğlu, Wilfried Meidl Sabancı University, MDBF, Orhanlı, 34956 Tuzla, İstanbul, Turkey. Abstract In this resentation, a technique for

More information

On Wald-Type Optimal Stopping for Brownian Motion

On Wald-Type Optimal Stopping for Brownian Motion J Al Probab Vol 34, No 1, 1997, (66-73) Prerint Ser No 1, 1994, Math Inst Aarhus On Wald-Tye Otimal Stoing for Brownian Motion S RAVRSN and PSKIR The solution is resented to all otimal stoing roblems of

More information

Sharp gradient estimate and spectral rigidity for p-laplacian

Sharp gradient estimate and spectral rigidity for p-laplacian Shar gradient estimate and sectral rigidity for -Lalacian Chiung-Jue Anna Sung and Jiaing Wang To aear in ath. Research Letters. Abstract We derive a shar gradient estimate for ositive eigenfunctions of

More information

Lecture 3 Consistency of Extremum Estimators 1

Lecture 3 Consistency of Extremum Estimators 1 Lecture 3 Consistency of Extremum Estimators 1 This lecture shows how one can obtain consistency of extremum estimators. It also shows how one can find the robability limit of extremum estimators in cases

More information

Elementary theory of L p spaces

Elementary theory of L p spaces CHAPTER 3 Elementary theory of L saces 3.1 Convexity. Jensen, Hölder, Minkowski inequality. We begin with two definitions. A set A R d is said to be convex if, for any x 0, x 1 2 A x = x 0 + (x 1 x 0 )

More information

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm Gabriel Noriega, José Restreo, Víctor Guzmán, Maribel Giménez and José Aller Universidad Simón Bolívar Valle de Sartenejas,

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

Positive decomposition of transfer functions with multiple poles

Positive decomposition of transfer functions with multiple poles Positive decomosition of transfer functions with multile oles Béla Nagy 1, Máté Matolcsi 2, and Márta Szilvási 1 Deartment of Analysis, Technical University of Budaest (BME), H-1111, Budaest, Egry J. u.

More information

Diverse Routing in Networks with Probabilistic Failures

Diverse Routing in Networks with Probabilistic Failures Diverse Routing in Networks with Probabilistic Failures Hyang-Won Lee, Member, IEEE, Eytan Modiano, Senior Member, IEEE, Kayi Lee, Member, IEEE Abstract We develo diverse routing schemes for dealing with

More information

SECTION 5: FIBRATIONS AND HOMOTOPY FIBERS

SECTION 5: FIBRATIONS AND HOMOTOPY FIBERS SECTION 5: FIBRATIONS AND HOMOTOPY FIBERS In this section we will introduce two imortant classes of mas of saces, namely the Hurewicz fibrations and the more general Serre fibrations, which are both obtained

More information

Finding Shortest Hamiltonian Path is in P. Abstract

Finding Shortest Hamiltonian Path is in P. Abstract Finding Shortest Hamiltonian Path is in P Dhananay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune, India bstract The roblem of finding shortest Hamiltonian ath in a eighted comlete grah belongs

More information

A structure theorem for product sets in extra special groups

A structure theorem for product sets in extra special groups A structure theorem for roduct sets in extra secial grous Thang Pham Michael Tait Le Anh Vinh Robert Won arxiv:1704.07849v1 [math.nt] 25 Ar 2017 Abstract HegyváriandHennecartshowedthatifB isasufficientlylargebrickofaheisenberg

More information

arxiv: v2 [quant-ph] 2 Aug 2012

arxiv: v2 [quant-ph] 2 Aug 2012 Qcomiler: quantum comilation with CSD method Y. G. Chen a, J. B. Wang a, a School of Physics, The University of Western Australia, Crawley WA 6009 arxiv:208.094v2 [quant-h] 2 Aug 202 Abstract In this aer,

More information

On the Toppling of a Sand Pile

On the Toppling of a Sand Pile Discrete Mathematics and Theoretical Comuter Science Proceedings AA (DM-CCG), 2001, 275 286 On the Toling of a Sand Pile Jean-Christohe Novelli 1 and Dominique Rossin 2 1 CNRS, LIFL, Bâtiment M3, Université

More information

The Graph Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule

The Graph Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule The Grah Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule STEFAN D. BRUDA Deartment of Comuter Science Bisho s University Lennoxville, Quebec J1M 1Z7 CANADA bruda@cs.ubishos.ca

More information

8 STOCHASTIC PROCESSES

8 STOCHASTIC PROCESSES 8 STOCHASTIC PROCESSES The word stochastic is derived from the Greek στoχαστικoς, meaning to aim at a target. Stochastic rocesses involve state which changes in a random way. A Markov rocess is a articular

More information

2 Asymptotic density and Dirichlet density

2 Asymptotic density and Dirichlet density 8.785: Analytic Number Theory, MIT, sring 2007 (K.S. Kedlaya) Primes in arithmetic rogressions In this unit, we first rove Dirichlet s theorem on rimes in arithmetic rogressions. We then rove the rime

More information

Chater Matrix Norms and Singular Value Decomosition Introduction In this lecture, we introduce the notion of a norm for matrices The singular value de

Chater Matrix Norms and Singular Value Decomosition Introduction In this lecture, we introduce the notion of a norm for matrices The singular value de Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A Dahleh George Verghese Deartment of Electrical Engineering and Comuter Science Massachuasetts Institute of Technology c Chater Matrix Norms

More information