arxiv: v4 [stat.ml] 21 Dec 2016

Size: px
Start display at page:

Download "arxiv: v4 [stat.ml] 21 Dec 2016"

Transcription

1 Online Active Linear Regression via Thresholing Carlos Riquelme Stanfor University Ramesh Johari Stanfor University Baosen Zhang University of Washington arxiv: v4 [stat.ml] 21 Dec 2016 Abstract We consier the problem of online active learning to collect ata for regression moeling. Specifically, we consier a ecision maer with a limite experimentation buget who must efficiently learn an unerlying linear population moel. Our main contribution is a novel threshol-base algorithm for selection of most informative observations; we characterize its performance an funamental lower bouns. We exten the algorithm an its guarantees to sparse linear regression in high-imensional settings. Simulations suggest the algorithm is remarably robust: it provies significant benefits over passive ranom sampling in real-worl atasets that exhibit high nonlinearity an high imensionality significantly reucing both the mean an variance of the square error. 1 Introuction This paper stuies online active learning for estimation of linear moels. Active learning is motivate by the premise that in many sequential ata collection scenarios, labeling or obtaining output from observations is costly. Thus ongoing ecisions must be mae about whether to collect ata on a particular unit of observation. Active learning has a rich history; see, e.g., [3, 6, 7, 8, 17]. As a motivating example, suppose that an online mareting organization plans to sen isplay avertising promotions to a new target maret. Their goal is to estimate the revenue that can be expecte for an iniviual with a given covariate vector. Unfortunately, proviing the promotion an collecting ata on each iniviual is costly. Thus the goal of the mareting organization is to acquire first the most informative observations. They must o this in an online fashion: opportunities to isplay the promotion to iniviuals arrive sequentially over time. In online active learning, this is achieve by selecting those observational units target iniviuals in this case) that provie the most information to the moel fitting proceure. Linear moels are ubiquitous in both theory an practice often use even in settings where the ata may exhibit strong nonlinearity in large part because of their interpretability, flexibility, an simplicity. As a consequence, in practice, people ten to a a large number of features an interactions to the moel, hoping to capture the right signal at the expense of introucing some noise. Moreover, the input space can be upate an extene iteratively after ata collection if the ecision maer feels preictions on a hel-out set are not goo enough. As a consequence, often times the number of covariates becomes higher than the number of available observations. In those cases, selecting the subsequent most informative ata is even more critical. Accoringly, our focus is on actively choosing observations for optimal preiction of the resulting high-imensional linear moels. Our main contributions are as follows. We initially focus on stanar linear moels, an buil the theory that we later exten to high imensional settings. First, we evelop an algorithm that sequentially selects observations if they have sufficiently large norm, in an appropriate space epenent on the ata-generating istribution). Secon, we provie a comprehensive theoretical analysis of our algorithm, incluing upper an lower bouns. We focus on minimizing mean square preiction error MSE), an show a high probability upper boun on the MSE of our approach cf. Theorem 3.1). In aition, we provie a lower boun on the best possible achievable performance in high probability

2 an expectation cf. Section 4). In some istributional settings of interest we show that this lower boun structurally matches our upper boun, suggesting our algorithm is near-optimal. The results above show that the improvement of active learning progressively weaens as the imension of the ata grows, an a new approach is neee. To tacle our original goal an aress this egraation, uner stanar sparsity assumptions, we esign an aaptive extension of the thresholing algorithm that initially evotes some buget to learn the sparsity pattern of the moel, in orer to subsequently apply active learning to the relevant lower imensional subspace. We fin that in this setting, the active learning algorithm provies significant benefit over passive ranom sampling. Theoretical guarantees are given in Theorem 3.3. Finally, we empirically evaluate our algorithm s performance. Our tests on real worl ata show our approach is remarably robust: the gain of active learning remains significant even in settings that fall outsie our theory. Our results suggest that the threshol-base rule may be a valuable tool to leverage in observation-limite environments, even when the assumptions of our theory may not exactly hol. Active learning has mainly been stuie for classification; see, e.g., [1, 2, 9, 10, 28]. For regression, see, e.g., [5, 18, 24] an the references within. A closely relate wor to our setting is [23]: they stuy online or stream-base active learning for linear regression, with ranom esign. They propose a theoretical algorithm that partitions the space by stratification base on Monte-Carlo methos, where a recently propose algorithm for linear regression [14] is use as a blac box. It converges to the globally optimal oracle ris uner possibly misspecifie moels with suitable assumptions). Due to the relatively wea moel assumptions, they achieve a constant gain over passive learning. As we aopt stronger assumptions well-specifie moel), we are able to achieve larger than constant gains, with a computationally simpler algorithm. Suppose covariate vectors are Gaussian with imension ; the total number of observations is n; an the algorithm is allowe to label at most of them. Then, we beat the stanar σ 2 / MSE to obtain σ 2 2 /[ + 2δ 1) log ] when n = δ, so active learning truly improves performance when = Ωexp)) or δ = Ω). While [23] oes not tacle high-imensional settings, we overcome the exponential ata requirements via l 1 -regularization. The remainer of the paper is organize as follows. We efine our setting in Section 2. In Section 3, we introuce the algorithm an provie analysis of a corresponing upper boun. Lower bouns are given in Section 4. Simulations are presente in Section 5, an Section 6 conclues. 2 Problem Definition The online active learning problem for regression is efine as follows. We sequentially observe n covariate vectors in a -imensional space X i R, which are i.i.. When presente with the i-th observation, we must choose whether we want to label it or not, i.e., choose to observe the outcome. If we ecie to label the observation, then we obtain Y i R. Otherwise, we o not see its label, an the outcome remains unnown. We can label at most out of the n observations. We assume covariates are istribute accoring to some nown istribution D, with zero mean EX = 0, an covariance matrix Σ = EXX T. We relax this assumption later. In aition, we assume that Y follows a linear moel: Y = X T β + ɛ, where β R an ɛ N 0, σ 2 ) i.i.. We enote observations by X, X i R, components by X j R, an sets in bolface: X R, Y R. After selecting observations, X, Y), we output an estimate ˆβ R, with no intercept. 1 Our goal is to minimize the expecte MSE of ˆβ in Σ norm, i.e. E ˆβ β 2 Σ, uner ranom esign; that is, when the X i s are ranom an the algorithm may be ranomize. This is relate to the A-optimality criterion, [22]. We use the experimentation buget to minimize the variance of ˆβ by sampling X from a ifferent threshole istribution. Minimizing expecte MSE is equivalent to minimizing the trace of the normalize inverse of the Fisher information matrix X T X, E[Y X T ˆβ ) 2 ] = E[ ˆβ β 2 Σ] + σ 2 1 We assume covariates an outcome are centere. = σ 2 E [ TrΣX T X) 1 ) ] + σ 2 2

3 where expectations are over all sources of ranomness. In this setting, the OLS estimator is the best linear unbiase estimator by the Gauss Marov Theorem. Also, for any set X of i.i.. observations, OLS ˆβ := ˆβ has sampling istribution ˆβ X N β, σ 2 X T X) 1 ), [13]. In Section 3.3, we tacle high-imensionality, where, via Lasso estimators within a two-stage algorithm. 3 Algorithm an Main Results In this section we motivate the algorithm, state the main result quantifying its performance for general istributions, an provie a high-level overview of the proof. A corollary for the Gaussian istribution is presente, an we also exten the algorithm by maing the threshol aaptive. Finally, we show how to generalize the results to sparse linear regression. In Appenix E, we erive a CLT approximation with guarantees that is useful in complex or unnown istributional settings. Without loss of generality, we assume that each observation is white, that is, E[XX T ] is the ientity matrix. For correlate observations X, we apply X := D 1/2 U T X to whiten them, Σ = UDU T see Appenix A). Note that TrΣX T X ) 1 ) = TrX T X) 1 ). We boun the whitene trace as λ max X T X) TrXT X) 1 ) λ min X T X). 1) To minimize the expecte MSE, we nee to maximize the minimum eigenvalue of X T X with high probability. The thresholing proceure in Algorithm 1 maximizes the minimum eigenvalue of X T X through two observations. First, since the sum of eigenvalues of X T X is the trace of X T X, which is in turn the sum of the norm of the observations, the algorithm chooses observations of large weighte) norm. Secon, the eigenvalues of X T X shoul be balance, that is, have similar magnitues. This is achieve by selecting the appropriate weights for the norm. Let ξ R + be a vector of weights efining the norm X 2 ξ = j=1 ξ jxj 2. Let Γ > 0 be a threshol. Algorithm 1 simply selects the observations with ξ-weighte norm larger than Γ. The selecte observations can be thought as i.i.. samples from an inuce istribution D: the original istribution conitional on X ξ Γ. Suppose observations are chosen an enote by X R. Then EX T X = i=1 EXi X it = i=1 Hi = H, where H is the covariance matrix with respect to D. This covariance matrix is iagonal uner ensity symmetry assumptions, as thresholing preserves uncorrelation; its iagonal terms are H jj = E DX 2 j = E D [X 2 j X ξ Γ] =: φ j. 2) Hence, λ min EX T X) = min j φ j, an λ max EX T X) = max j φ j. The main technical result in Theorem 3.1 is to lin the eigenvalues of the ranom matrix X T X to its eterministic counter part EX T X. From the above calculations, the goal is to fin ξ, Γ) such that min j φ j max j φ j, an both are as large as possible. The first objective is achieve when there exists some φ such that E D [X 2 j X ξ Γ] = φ j = φ, for all j. 3) We note that if X has inepenent components with the same marginal istribution after whitening), then it suffices to choose ξ j = 1 for all j. It is necessary to choose unequal weights when the marginal istributions of the components are ifferent, e.g., some are Gaussian an some are uniform, or components are epenent. For joint Gaussian, whitening removes epenencies, so we set ξ j = Thresholing Algorithm The algorithm is simple. For each incoming observation X i we compute its weighte norm X i ξ possibly after whitening if necessary). If the norm is above the threshol Γ, then we select the observation, otherwise we ignore it. We stop when we have collecte observations. Note that ranom sampling is equivalent to setting Γ = 0. We want to catch the largest observations given our buget, therefore we require that Γ satisfies P D X ξ Γ) = /n. 4) 3

4 Algorithm 1 Thresholing Algorithm. 1: Set ξ, Γ) R +1 satisfying 3) an 4). 2: Set S =. 3: for observation 1 i n o 4: Observe X i. 5: Compute X i = D 1/2 U T X i. 6: if X i ξ > Γ or S = n i + 1 then 7: Choose X i : S = S X i. 8: if S = then 9: brea. 10: en if 11: en if 12: en for 13: Return OLS estimate ˆβ base on observations in S. If we apply this rule to n inepenent observations coming from D, on average we select of them: the ξ largest. If ξ, Γ) is a solution to 3) an 4), then c ξ, c Γ) is also a solution for any c > 0. So we require i ξ i =. Algorithm 1 can be seen as a regularizing process similar to rige regression, where the amount of regularization epens on the istribution D an the buget ratio /n; it improves the conitioning of the problem. Guarantees when Σ is unnown can be erive as follows: we allocate an initial sequence of points to estimation of the inverse of the covariance matrix, an the remainer to labeling where we no longer upate our estimate). In this manner observations remain inepenent. Note that O) observations are require for accurate recovery when D is subgaussian, an O log ) if subexponential, [26]. Errors by using the estimate to whiten an mae ecisions are boune, small with high probability via Cauchy Schwarz), an the result is equivalent to using a slightly worse threshol. Algorithm 1 b Aaptive Thresholing Algorithm. 1: Set S =. 2: for observation 1 i n o 3: Observe X i, estimate Σ i = Ûi D i Ûi T. 4: Compute X i 1/2 = D i Ûi T X i. 5: Let ξ i, Γ i ) satisfy 3) an 5). 6: if X i ξi > Γ i or S =n i + 1 then 7: Choose X i : S = S X i. 8: if S = then 9: brea. 10: en if 11: en if 12: en for 13: Return OLS estimate ˆβ base on observations in S. Algorithm 1 eeps the threshol fixe from the beginning, leaing to a mathematically convenient analysis, as it generates i.i.. observations. However, Algorithm 1b, which is aaptive an upates its parameters after each observation, prouces slightly better results, as we empirically show in Appenix K. Before maing a ecision on X i, Algorithm 1b fins ξ i, Γ i ) satisfying 3) an P D X i ξi Γ i ) = S i 1 n i + 1, 5) where S i 1 is the number of observations alreay labele. The iea is ientical: set the threshol to capture, on average, the number of observations still to be labele, that is S i 1, out of the number still to be observe, n i

5 Importantly, active learning not only ecreases the expecte MSE, but also its variance. Since the variance of the MSE for fixe X epens on j 1/λ jx T X) 2 [13], it is also minimize by selecting observations that lea to large eigenvalues of X T X. 3.2 Main Theorem Theorem 3.1 states that by sampling observations from D where ξ, Γ) satisfy 3), the estimation performance is significantly improve, compare to ranomly sampling observations from the original istribution. Section 4 shows the gain in Theorem 3.1 essentially cannot be improve an Algorithm 1 is optimal. A setch of the proof is provie at the en of this section see Appenix B). Theorem 3.1 Let n > >. Assume observations X R are istribute accoring to subgaussian D with covariance matrix Σ R. Also, assume marginal ensities are symmetric aroun zero after whitening. Let X be a matrix with observations sample from the istribution inuce by the thresholing rule with parameters ξ, Γ) R +1 + satisfying 3). Let α > 0, so that t = α C > 0, then, with probability at least 1 2 exp ct 2 ) TrΣX T X) 1 ) where constants c, C epen on the subgaussian norm of D. 1 α) 2 φ, 6) While Theorem 3.1 is state in fairly general terms, we can apply the result to specific settings. We first present the Gaussian case where white components are inepenent. The proof is in Appenix D. Corollary 3.2 If the observations in Theorem 3.1 are jointly Gaussian with covariance matrix Σ R, ξ j = 1 for all j = 1,...,, an Γ = C + 2 logn/), for some constant C 1, then with probability at least 1 2 exp ct 2 ) we have that TrΣX T X) 1 ) ) 1 α) logn/). 7) The MSE of ranom sampling for white Gaussian ata is proportional to / 1), by the inverse Wishart istribution. Active learning provies a gain factor of orer 1/1 + 2 logn/)/) with high probability a very similar 1 α term shows up for ranom sampling). Note that our algorithm may select fewer than observations. Then, when the number of observations yet to be seen equals the remaining labeling buget, we shoul select all of them equivalent to ranom sampling). The number of observations with X ξ > Γ has binomial istribution, is highly concentrate aroun its mean, with variance 1 /n). By the Chernoff Bouns, the probability that the algorithm selects fewer than C ecreases exponentially fast in C. Thus, these eviations are ominate in the boun of Theorem 3.1 by the leaing term. In practice, one may set the threshol in 4) by choosing 1 + ɛ) observations for some small ɛ > 0, or use the aaptive threshol in Algorithm 1b. 3.3 Sparsity an Regularization The gain provie by active learning in our setting suffers from the curse of imensionality, as it iminishes very fast when increases, an Section 4 shows the gain cannot be improve in general. For high imensional settings where ) we assume s-sparsity in β, that is, we assume the support of β contains at most s non-zero components, for some s. In Appenix J, we also provie relate results for Rige regression. We state the two-stage Sparse Thresholing Algorithm see Algorithm 2) an show this algorithm effectively overcomes the curse of imensionality. For simplicity, we assume the ata is Gaussian, D = N 0, Σ). Base, for example, on the results of [25] an Theorem 1 in [16], we coul exten our results to subgaussian ata via the Orthogonal Matching Pursuit algorithm for recovery. The two-stage algorithm wors as follows. First, we focus on recovering the true support, S = Sβ), by selecting the very first 1 observations without thresholing), an computing the Lasso estimator ˆβ 1. Secon, we assign the weights ξ: for i S ˆβ 1 ), we set ξ i = 1, otherwise we set ξ i = 0. Then, we 5

6 apply the thresholing rule to select the remaining 2 = 1 observations. While observations are collecte in all imensions, our final estimate ˆβ 2 is the OLS estimator compute only incluing the observations selecte in the secon stage, an exclusively in those imensions in S ˆβ 1 ). Note that, in general, the points that en up being selecte by our algorithm are informational outliers, while not necessarily geometric outliers in the original space. After applying the whitening transformation, ignoring some imensions base on the Lasso results, an then thresholing base on a weighte norm possibly learnt from ata say, if components are not inepenent, an we recover the covariance matrix in a online fashion), the algorithm is able to ientify goo points for the unerlying ata istribution an β. Algorithm 2 Sparse Thresholing Algorithm. 1: Set S 1 =, S 2 =. Let = 1 + 2, n = 1 + n 2. 2: for observation 1 i 1 o 3: Observe X i. Choose X i : S 1 = S 1 X i. 4: en for 5: Set γ = 1/2, λ = 4σ 2 log)/γ : Compute Lasso estimate ˆβ 1 on S 1, regularization λ. 7: Set weights: ξ i = 1 if i S ˆβ 1 ), ξ i = 0 otherwise. 8: Set Γ = C s + 2 logn 2 / 2 ). 9: Factorize Σ S ˆβ1)S ˆβ 1) = UDU T. 10: for observation i n o 11: Observe X i R. Restrict to X i S := Xi S ˆβ 1) Rs. 12: Compute X i S = D 1/2 U T XS i. 13: if XS i ξ > Γ or 2 S 2 = n i + 1 then 14: Choose XS i : S 2 = S 2 XS i. 15: if S 2 = 2 then 16: brea. 17: en if 18: en if 19: en for 20: Return OLS estimate ˆβ 2 base on observations in S 2. Theorem 3.3 summarizes the performance of Algorithm 2; it requires the stanar assumptions on Σ, λ an min i β i for support recovery see Theorem 3 in [27]). Theorem 3.3 Let D = N 0, Σ). Assume Σ, λ an min i β i satisfy the stanar conitions given in Theorem 3 of [27]. Assume we run the Sparse Thresholing algorithm with 1 = C s log observations to recover the support of β, for an appropriate C 0. Let X 2 be 2 = 1 observations sample via thresholing on S ˆβ 1 ). It follows that for α > 0 such that t = α 2 C s > 0, there exist some universal constants c 1, c 2, an c, C that epen on the subgaussian norm of D S ˆβ 1 ), such that with probability at least it hols that 1 2e minc2 mins,log s)) logc1),ct2 log2)) TrΣ SS X T 2 X 2 ) 1 s ) ). 1 α) logn2/2) 2 Performance for ranom sampling with the Lasso estimator is Os log /). A regime of interest is s, = C 1 s log, an n = C 2, for large enough C 1, an C 2 > 0. In that case, Algorithm 2 leas to a boun of orer smaller than 1/ log), as oppose to a weaer constant guarantee for ranom sampling. The gain is at least a log factor with high probability. The proof is in Appenix H. In practice, the performance of the algorithm is improve by using all the observations to fit the final estimate ˆβ 2, as shown in simulations. However, in that case, observations are no longer i.i.. Also, using thresholing to select the initial 1 observations ecreases the probability of maing a mistae in support recovery. In Section 5 we provie simulations comparing ifferent methos. s 6

7 Normalize MSE max 0.575) Ranom Sampling Algorithm 1 Algorithm 2 Normalize MSE max 0.013) Ranom Sampling Correct Support) Algorithm 1 Correct Support) Algorithm 2 Algorithm 2 all observations) a) Zooming out b) Zooming in. Figure 1: Sparse Linear Regression 700 iters). We fix the effective imension to s = 7, an increase the ambient imension from = 100 to = 500. The buget scales as = Cs log for C 3.4, while n = 4. We set 1 = 2/3 an 2 = / Proof of Theorem 3.1 The complete proof of Theorem 3.1 is in Appenix B. We only provie a setch here. The proof is a irect application of spectral results in [26], which are erive via a covering argument using a iscrete net N on the unit Eucliean sphere S 1, together with a Bernstein-type concentration inequality that controls eviations of Xw 2 for each element w N in the net. Finally, a union boun is taen over the net. Importantly, the proof shows that if our algorithm uses ξ, Γ) which are approximate solutions to 3), then 10) still hols with min j E DX j 2 in the enominator of the RHS, instea of φ. This fact can be quite useful in practice, when F is unnown. We can evote some initial buget X 1,..., X T to recover F, an then fin ξ, Γ) approximately solving 3) an 4) uner ˆF. Note that no labeling is require. Also, the result can be extene to subexponential istributions. In this case, the probabilistic boun will be weaer incluing a term in front of the exponential). More generally, our probabilistic bouns are strongest when C log for some constant C 0, a common situation in active learning [23], where super-linear requirements in seem unavoiable in noisy settings. A simple boun for the parameter φ can be calculate as follows. Assume there exists ξ, Γ) such that φ j = φ an consier the weighte square norm Z ξ = [ ] X 2 j = j=1 ξ jxj 2. Then E D [Z ξ ] = j=1 ξ je D j=1 ξ [ jφ j = φ, an φ = E D Zξ Z ξ Γ 2] / Γ 2 / = F 1 Z ξ 1 /n)/, which implies that 1/λ min EX T X) = 1/φ /Γ 2. For specific istributions, Γ 2 / can be easily compute. The last inequality is close to equality in cases where the conitional ensity ecays extremely fast for values of j=1 ξ jx 2 j above Γ2. Heavy-taile istributions allocate mass to significantly higher values, an φ coul be much larger than Γ 2 /. 4 Lower Boun In this section we erive a lower boun for the > setting. Suppose all the ata are given. Again choose the observations with largest norms, enote by X. To minimize the preiction error, the best possible X T X is iagonal, with ientical entries, an trace equal to the sum of the norms. No selection algorithm, online or offline, can o better. Algorithm 1 achieves this by selecting observations with large norms an uncorrelate entries through whitening if necessary). Theorem 4.1 captures this intuition. 7

8 Normalize MSE max ) n x 1000) Algorithm 1b Ranom Sampling a) Protein Structure; 150 iters. Normalize MSE max ) n x 1000) Algorithm 1b Ranom Sampling b) Bie Sharing; 300 iters. Normalize MSE max ) n x 1000) Algorithm 1b 5 Ranom Sampling c) YearPreictionMSD; 150 iters. Figure 2: MSE of ˆβ OLS. The 0.05, 0.95) quantile conf. int. isplaye. Soli meian; Dashe mean. Theorem 4.1 Let A be an algorithm for the problem we escribe in Section 2. Then, E A TrΣX T X) 1 ) 2 [ ] 8) E i=1 X i) 2 E [ 1 max i [n] X i 2], where X i) is the white observation with the i-th largest norm. Moreover, fix α 0, 1). Let F be the cf of max i [n] X i 2. Then, TrΣX T X) 1 ) 2 / F 1 1 α) with probability at least 1 α. The proof is in Appenix E. The upper boun in Theorem 3.1 has a similar structure, with enominator equal to φ. By Theorem 3.1, φ = E D [Xj 2 X 2 ξ Γ2 ] for every component j. [ Hence, summing over all components: φ = E D X 2 / ]. The latter expectation is taen with respect to D, which only captures the expecte ξ-largest observations out of n, as oppose to E D [1/) i=1 X i) 2 /] in 8). The weights ξ simply account for the fact that, in reality, we cannot mae all components have equal norm, something we implicitly assume in our lower boun. We specialize the lower boun to the Gaussian setting, for which we compute the upper boun of Theorem 3.1. The proofs are base on the Fisher-Tippett Theorem an the Gumbel istribution; see Appenix F. Corollary 4.2 For Gaussian observations X i N 0, Σ) an large n, for any algorithm A E A TrΣX T X) 1 ) ). 2 log n + log log n Moreover, let α 0, 1). Then, for any A with probability at least 1 α an C = 2 log Γ/2)/, TrΣX T X) 1 ) / 2 log n + log log n 1 1 log log 1 α C The results from Corollary 3.2 have the same structure as the lower boun; hence in this setting our algorithm is near optimal. Similar results an conclusions are erive for the CLT approximation in Appenix I. 5 Simulations We conucte experiments in various settings: regularize estimators in high-imensions, an the basic thresholing approach in real-worl ata to explore its performance on strongly non-linear environments. Regularize Estimators. We compare the performance in high-imensional settings of ranom sampling an Algorithm 1 both with an appropriately ajuste Lasso estimator against Algorithm 2, which taes into account the structure of the problem s ). For completeness, we also show 8

9 the performance of Algorithm 2 when all observations are inclue in the final OLS estimate, an that of ranom sampling RS) an Algorithm 1 Thr) when the true support S is nown in avance, an the OLS compute on S. In Figure 1 a), we see that Algorithm 2 ramatically reuces the MSE, while in Figure 1 b) we zoom-in to see that, quite remarably, Algorithm 2 using all observations for the final estimate outperforms ranom sampling that nows the sparsity pattern in hinsight. We use 1 = 2/3) for recovery. More experiments are provie in Appenix K. Real-Worl Data. We show the results of Algorithm 1b online Σ estimation) with the simplest istributional assumption Gaussian threshol, ξ j = 1) versus ranom sampling on publicly available real-worl atasets UCI, [20]), measuring test square preiction error. We fix a sequence of values of n, together with = n, an for each pair n, ) we run a number of iterations. In each one, we ranomly split the ataset in training n observations, ranom orer), an test rest of them). Finally, ˆβ OLS is compute on selecte observations, an the preiction error estimate on the test set. All atasets are initially centere to have zero means covariates an response). Confience intervals are provie. We first analyze the Physicochemical Properties of Protein Tertiary Structure ataset observations), where we preict the size of the resiue, base on = 9 variables, incluing the total surface area of the protein an its molecular mass. Figure 2 a) shows the results; Algorithm 1b outperforms ranom sampling for all values of n, ). The reuction in variance is substantial. In the Bie Sharing ataset [12] we preict the number of hourly users of the service, given weather conitions, incluing temperature, win spee, humiity, an temporal covariates. There are observations, an we use = 12 covariates. Our estimator has lower mean, meian an variance MSE than ranom sampling; Figure 2 b). Finally, for the YearPreictionMSD ataset [4], we preict the year a song was release base on = 90 covariates, mainly metaata an auio features. There are observations. The MSE an variance i strongly improve; Figure 2 c). In the examples we see that, while active learning leas to strong improvements in MSE an variance reuction for moerate values of with respect to, the gain vanishes when grows large. This was expecte; the reason might be that by sampling so many outliers, we en up learning about parts of the space where heavy non-linearities arise, which may not be important to the test istribution. However, the motivation of active learning are situations of limite labeling buget, an hybri approaches combining ranom sampling an thresholing coul be easily implemente if neee. 6 Conclusion Our paper provies a comprehensive analysis of thresholing algorithms for online active learning of linear regression moels, which are shown to perform well both theoretically an empirically. Several natural open irections suggest themselves. Aitional robustness coul be guarantee in other settings by combining our algorithm as a blac box with other approaches: for example, some aition of ranom sampling or stratifie sampling coul be use to etermine if significant nonlinearity is present, an to etermine the fraction of observations that are collecte via thresholing. 7 Acnowlegments The authors woul lie to than Sven Schmit for his excellent comments an suggestions, Mohamma Ghavamzaeh for fruitful iscussions, an the anonymous reviewers for their valuable feebac. We gratefully acnowlege support from the National Science Founation uner grants CMMI , CNS , an CNS References [1] M.-F. Balcan, A. Beygelzimer, an J. Langfor. Agnostic active learning. In Proceeings of the 23r international conference on Machine learning, pages ACM, [2] M.-F. Balcan, A. Broer, an T. Zhang. Margin base active learning. In Learning Theory, pages Springer, [3] M.-F. Balcan, S. Hannee, an J. W. Vaughan. The true sample complexity of active learning. Machine learning, 802-3): ,

10 [4] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, an P. Lamere. The million song ataset [5] W. Cai, Y. Zhang, an J. Zhou. Maximizing expecte moel change for active learning in regression. In Data Mining ICDM), 2013 IEEE 13th International Conference on, pages IEEE, [6] R. M. Castro an R. D. Nowa. Minimax bouns for active learning. pages 5 19, [7] D. Cohn, L. Atlas, an R. Laner. Improving generalization with active learning. Machine learning, 152): , [8] D. A. Cohn, Z. Ghahramani, an M. I. Joran. Active learning with statistical moels. Journal of artificial intelligence research, [9] S. Dasgupta an D. Hsu. Hierarchical sampling for active learning. In Proceeings of the 25th international conference on Machine learning, pages ACM, [10] S. Dasgupta, C. Monteleoni, an D. J. Hsu. A general agnostic active learning algorithm. In Avances in neural information processing systems, pages , [11] P. Embrechts, C. Klüppelberg, an T. Miosch. Moelling extremal events, volume 33. Springer Science & Business Meia, [12] H. Fanaee-T an J. Gama. Event labeling combining ensemble etectors an bacgroun nowlege. Progress in Artificial Intelligence, pages 1 15, [13] A. E. Hoerl an R. W. Kennar. Rige regression: Biase estimation for nonorthogonal problems. Technometrics, 121):55 67, [14] D. Hsu an S. Sabato. Heavy-taile regression with a generalize meian-of-means. In Proceeings of the 31st International Conference on Machine Learning ICML-14), pages 37 45, [15] T. Inglot. Inequalities for quantiles of the chi-square istribution. Probability an Mathematical Statistics, 302): , [16] A. Joseph. Variable selection in high-imension with ranom esigns an orthogonal matching pursuit. Journal of Machine Learning Research, 141): , [17] V. Koltchinsii. Raemacher complexities an bouning the excess ris in active learning. The Journal of Machine Learning Research, 11: , [18] A. Krause an C. Guestrin. Nonmyopic active learning of gaussian processes: an explorationexploitation approach. In Proceeings of the 24th international conference on Machine learning, pages ACM, [19] B. Laurent an P. Massart. Aaptive estimation of a quaratic functional by moel selection. Annals of Statistics, pages , [20] M. Lichman. UCI machine learning repository [21] K. B. Petersen et al. The matrix cooboo. [22] F. Puelsheim. Optimal esign of experiments, volume 50. siam, [23] S. Sabato an R. Munos. Active regression by stratification. In Avances in Neural Information Processing Systems, pages , [24] M. Sugiyama an S. Naajima. Pool-base active learning in approximate linear regression. Machine Learning, 753): , [25] J. Tropp an A. C. Gilbert. Signal recovery from partial information via orthogonal matching pursuit, [26] R. Vershynin. Introuction to the non-asymptotic analysis of ranom matrices. arxiv preprint arxiv: ,

11 [27] M. J. Wainwright. Sharp threshols for high-imensional an noisy sparsity recovery usingconstraine quaratic programming lasso). Information Theory, IEEE Transactions on, 555): , [28] Y. Wang an A. Singh. Noise-aaptive margin-base active learning an lower bouns uner tsybaov noise conition. arxiv preprint arxiv: , Appenix A Whitening Before thresholing the norm of incoming observations, it is useful to ecorrelate an stanarize their components, i.e., to whiten the ata. Then, we apply the algorithm to uncorrelate covariates, with zero mean an unit variance not necessarily inepenent). The covariance matrix Σ can be ecompose as Σ = UDU T, where U is orthogonal, an D iagonal with ii = λ i Σ). We whiten each observation to X = D 1/2 U T X R 1 while for X R, X = XUD 1/2 ), so that E X X T = I. We enote whitene observations by X an X in the appenix. After some algebra we see that, λ max X T X) TrΣXT X) 1 ) = Tr X T X) 1 ). 9) λ min X T X) We focus on algorithms that maximize the minimum eigenvalue of X T X with high probability, or, in general, leaing to large an even eigenvalues of X T X. B Proof of Theorem 3.1 Theorem B.1 Let n > >. Assume observations X R are istribute accoring to subgaussian D with covariance matrix Σ R. Also, assume marginal ensities are symmetric aroun zero after whitening. Let X be a matrix with observations sample from the istribution inuce by the thresholing rule with parameters ξ, Γ) R +1 + satisfying 3). Let α > 0, so that t = α C > 0, then, with probability at least 1 2 exp ct 2 ) TrΣX T X) 1 ) where constants c, C epen on the subgaussian norm of D. 1 α) 2 φ, 10) Proof We woul lie to choose out of n observations X 1,..., X n F ii. Assume our sampling inuces a new istribution F. The loss we want to minimize for our OLS estimate ˆβ = ˆβX, Y) is [ ) ] 2 E X,Y F,X F X T ˆβ X T β = σ 2 [ E X,Y F Tr ΣX T X) 1)], 11) where we assume Gaussian noise with variance σ 2. Let us see how we construct F. We sample X F, we whiten the observation Z = Σ 1/2 X, an then we select it or not accoring to a fixe thresholing rule. If Z ξ Γ, then we eep X = Σ 1/2 Z. We choose ξ an Γ so that there exists φ > 0, such that for all i = 1,...,, E W F[W i) 2 ] = φ, 12) where W i) enotes the i-th component of W F. Note that F = Fξ, Γ). Z is a linear transformation of X; W is not a linear transformation of Z. Moreover, the covariance matrix of F is Σ F = φ I. If F is a general subgaussian istribution, thresholing coul change the mean away from zero. 11

12 Assume after running our algorithm, we en up with X R. We enote by W the observations after whitening, note that by esign every w W passe our test: w ξ Γ. In other wors, w F. We see that W = XΣ 1/2 or, alternatively, X = WΣ 1/2. Now, we can erive Tr ΣX T X) 1) = Tr Σ Σ 1/2 W T WΣ 1/2) ) 1 13) = Tr Σ 1/2 Σ 1/2 W T WΣ 1/2 Σ 1/2) 1 ) 14) W = Tr T W ) ) 1 15) = Tr Σ 1/2 F Σ 1 F Σ1/2 F W T W ) ) 1 16) = Tr Σ 1 1 ) WT W) 17) F 1 = Tr φ I ) 1 WT W) 18) = 1 φ Tr 1 ) WT W) 19) φ 1 = λ min WT W) φ where W is actually white ata. Thus, note that WT W/ I as. 1 λ 1 ). 20) min W T W Assume that F is subgaussian such that if >, then X T X has full ran with probability one. Thresholing will not change the shape of the tails of the istribution, F will also be subgaussian. At this point, we nee to measure how fast λ min WT W) / goes to 1. We can use Theorem 5.39 in [26] which guarantees that, for α > 0 such that t = α C > 0, with probability at least 1 2 exp ct 2 ) we have ) 1 λ min W T W 1 α) 2, 21) as W is white subgaussian. It follows that for α > C /, with probability at least 1 2 exp ct 2 ) Note that 1/1 α) 1 + O /). Tr ΣX T X) 1) C Proof of TrX 1 ) TrDiagX) 1 ) 1 α) 2 φ. 22) In orer to justify that we want S = X T X to be as close as possible to iagonal, we show the following lemma. Uner our assumptions S is symmetric positive efinite with probability 1. Lemma C.1 Let X be a n n symmetric positive efinite matrix. Then, TrX 1 ) TrDiagX) 1 ), 23) where Diag ) returns a iagonal matrix with the same iagonal as the argument. In other wors, we show that for all positive efinite matrices with the same iagonal elements, the iagonal matrix matrix with all off iagonal elements being 0) has the least trace after the inverse operation. 12

13 Proof We show this by inuction. Consier a 2 2 matrix [ ] a b X = b c an TrX 1 1 ) = a + c) 25) ac b2 since ac b 2 > 0 X is positive efinite), the above expression is minimize when b 2 = 0, that is, X is iagonal. 24) Assume the statement is true for all n n matrices. Let X be a n + 1) n + 1) positive efinite matrix. Decompose it as [ ] A b X = b T. 26) c By the bloc inverse formula, see for example [21]) TrX 1 ) = TrA 1 ) TrA 1 bb T A 1 ), 27) where = c b T A 1 b. Note > 0 by Schur s complement for positive efinite matrices. Using the inuction hypothesis, TrA 1 ) TrDiagA) 1 ). By the positive efiniteness of A, b T A 1 b 0, therefore 1 1 c. Also, TrA 1 bb T A 1 ) 0. Thus, an the result follows. D Proof of Corollary 3.2 TrX 1 ) TrA 1 ) + 1 c = TrDiagX) 1 ), 28) Corollary D.1 If the observations in Theorem 3.1 are jointly Gaussian with covariance matrix Σ R, ξ j = 1 for all j = 1,...,, an Γ = C + 2 logn/), for some constant C 1, then with probability at least 1 2 exp ct 2 ) we have that TrΣX T X) 1 ) ) 1 α) logn/). 29) Proof We have to show that ξ j = 1 for all j, an Γ = C + 2 logn/) satisfy the equations P D X ξ Γ ) = α = n, 30) E D [ X 2 j X 2 ξ Γ 2 ] = φ, for all j, 31) an φ > logn/)/). The components of X are inepenent, as observations are jointly Gaussian. It immeiately follows that ξ j = 1, for all 1 j. Thus, Z ξ = j=1 X j χ 2, Γ 2 = F 1 1 ). 32) χ 2 n The value of Z ξ is strongly concentrate aroun its mean, EZ ξ =. We now use two tail approximations to obtain our esire result. By [19], we have that PZ ξ 2 x + 2x) exp x). 33) 13

14 If we tae exp x) = α, then x = logn/). In this case, we conclue that n ) n ) ) P Z ξ + 2 log + 2 log α = n. 34) Note that P X ξ > Γ) = PZ ξ > Γ 2 ) = α. Therefore, by efinition n ) n ) Γ + 2 log + 2 log. 35) On the other han, we woul lie to show that n )) P Z ξ + 2 log α, 36) as that woul irectly imply that Γ + 2 log n/). We can use Proposition 3.1 of [15]. For > 2 an x > 2, PZ ξ x) 1 { e x x + 2 exp x 2) log x ) + log ) }. Tae x = + 2ψ, where ψ = logn/). It follows that PZ ξ + 2ψ) 1 { e 2 + 2ψ ψ exp 1 2ψ 2) log 1 + 2ψ ) )} + log 2 = 1 { e 2 + 2ψ ψ exp log 1 + 2ψ ) 12 } 2 log exp { ψ} exp { ψ}, 37) where we assume, for example, 9 an n/ > 17 as in Proposition 5.1 of [15]). In any case, in those rare cases in our context) where < 9 an n/ very small, the previous boun still hols if we subtract a small constant C [0, 5/2] from the LHS: PZ ξ + 2ψ C). Equivalently, from 37) We conclue that PZ ξ + 2 logn/)) /n = α. 38) n ) n ) n ) + 2 log Γ + 2 log + 2 log. 39) Finally, we have that φ Γ2 By Theorem B.1, the corollary follows. 2 log n/) ) E CLT Approximation As we explain in the main text, it is sometimes ifficult to irectly compute the istribution of the ξ-norm of a white observation, given by Z ξ. Recall that Γ 2 = F 1 Z ξ 1 /n). Fortunately, Z ξ is the sum of ranom variables, an, in high-imensional spaces, a CLT approximation can help us to choose a goo threshol. In this section we erive some theoretical guarantees. The CLT is a goo iea for boune variables as the square is still boune, an therefore subgaussian), but if the unerlying components X j are unboune subgaussian, Z ξ will be at least subexponential as the square of a subgaussian ranom variable is subexponential, [26], an a higher threshol lie that coming from chi-square is more appropriate. 14

15 In aition, in the context of heavy-tails, catastrophic effects are expecte, as Pmax j X j > t) P j X j > t), leaing to observations ominate by single imensions. Assume that components X j are inepenent while not necessarily ientically istribute). By Lyapunov s CLT, one can show that 2 Z ξ = ξ j X2 j N, E[ X4 j ] 1 ). j=1 It follows that Γ satisfies P D Xi ξ Γ ) = /n if Γ 2 + Φ 1 1 ) ξi 2 E[ X4 n i ] 1 ). i=1 In the sequel, assume is large enough, an the approximation error is negligible. Define γ = E[ X4 i ] 1 ). i=1 ξ2 i Corollary E.1 Assume Z ξ = N, γ 2) an Γ 2 = + γ Φ 1 1 /n), with ξ j satisfying 3). Let α > 0, so that t = α C > 0. then with probability at least 1 2 exp ct 2 ) we have that TrΣX T X) 1 ) 1 α) γ )). 41) 2 logn/) γ log logn/) O j=1 ξ 2 j logn/) Proof Note that, by efinition, X 2 ξ Z ξ an Γ jointly solve the equations require by Theorem B.1. In orer to apply the theorem, all we nee to o is to estimate the magnitue of φ = E D [X 2 j X 2 ξ Γ 2 ] Γ2 = 1 + γ Φ 1 1 /n). 42) Therefore, we want to fin bouns on tail probabilities of the normal istribution. By Theorem 2.1 of [15], we have that for small /n 2 logn/) log4 logn/)) + 2 an the result follows. 2 2 logn/) Φ 1 1 ) n 43) log2 logn/)) + 3/2 2 logn/) 2, 44) 2 logn/) We can also show how to apply the previous result to inepenent uniform istributions centere aroun zero. In that case, we have that the fourth moment is E[ X j 4] = 9/5, so γ = 4 5, leaing to a gain factor )) 8 logn/) log logn/) φ = 1 + o. 5 logn/) F Proof of Theorem 4.1 Theorem F.1 Let A be an algorithm for the problem we escribe in Section 2. Then, E A TrΣX T X) 1 ) 2 [ E i=1 X ] 45) i) 2 E [ 1 max i [n] X i 2], 2 Some mil aitional moment/regularity conitions on each X j are require to satisfy Lyapunov s Conition. 15

16 where X i) is the white observation with the i-th largest norm. Moreover, fix α 0, 1). Let F be the cf of max i [n] X i 2. Then, with probability at least 1 α TrΣX T X) 1 ) 2 / F 1 1 α). 46) Proof We want to minimize TrΣX T X) 1 ) = Tr X T X) 1 ). Let us efine S = X T X. One can prove that H TrH 1 ) is convex for symmetric positive efinite matrices H. It then follows by Jensen s Inequality assuming >, so S is symmetric positive efinite with high probability) ETrS 1 ) TrES) 1 1 ) = λ j ES). 47) Let ES be the expecte value of S for an arbitrary algorithm A that selects its observations sequentially. We want to unerstan what is the minimum possible value the RHS of 47) can tae. The sum of eigenvalues is upper boune by λ j ES) = TrES) = ES jj ) = E[ X ij] 2 j=1 j=1 j=1 = j=1 i=1 E[ X i 2 ] i=1 [ ] E X i) 2 E i=1 [ ] max X i 2, i [n] where X i) enotes the observation with the i-th largest norm. Because ES is symmetric positive efinite, its eigenvalues are real non-negative, so that [ 0 < λ min ES) TrES) E i=1 X ] i) 2 E [ max i [n] X i 2]. We conclue that the solution to the minimization problem of 47) that is, when all eigenvalues are equal is lower boune by ETrS 1 1 ) λ j=1 j ES) 2 [ E i=1 X ] 2 i) 2 E [ max i [n] X i 2], which proves 45). In orer to prove the high-probability statement 46), note that TrΣX T X) 1 ) = Tr X T X) 1 1 ) = λ i X T X) i=1 i=1 We irectly conclue that with probability at least 1 α, 1 j=1 X j 2 / 2 j=1 X j) 2 2 max i [n] X i. 48) 2 max X i 2 F 1 1 α) 49) i [n] as F is the cf of max i [n] X i 2. It follows that with probability at least 1 α, TrΣX T X) 1 ) 2 F 1 1 α). 50) 16

17 G Proof of Corollary 4.2 Corollary G.1 For Gaussian observations X i N 0, Σ) an large n, for any algorithm A E A TrΣX T X) 1 ) ). 51) 2 log n + log log n Moreover, let α 0, 1). Then, for any A with probability at least 1 α an C = 2 log Γ/2)/, TrΣX T X) 1 ) ). 52) 2 log n + log log n 1 1 log log 1 α C Proof In orer to apply Theorem F.1, we nee to upper boun E [ max i [n] X i 2], where X i is a -imensional gaussian ranom variable with ientity covariance matrix. In other wors, we nee to upper boun the expecte maximum of n chi-square ranom variables with egrees of freeom. Let us start by proving 51). We can use extreme value theory to fin the limiting istribution of the maximum of n ranom variables. Firstly, note that the chi-square istribution is a particular case of the Gamma istribution. More specifically, χ 2 Γ/2, 2). If we parameterize the Γ istribution by α shape) an β rate), then α = /2 an β = 1/2. By the Fisher-Tippett Theorem we now that there are only three limiting istributions for lim n X n) = lim n max i n X i, where the X i are ii ranom variables, namely, Frechet, Weibull an Gumbel istributions. It is nown that the Gamma istribution is in the max-omain of attraction of the Gumbel istribution. Further, the normalizing constants are nown see Chapter 3 of [11]). In particular, we now that if X n) := max i [n] X i 2 lim P X n) 2x + 2 ln n + 2/2 1) ln ln n 2 ln Γ/2) ) = Λx) = e e x. 53) n We can assume that the asymptotic limit hols, as n is in practice very large, an compute the mean value of X n). As X n) is a positive ranom variable, E[X n) ] = = 0 0 P X n) t ) t 54) 1 P X n) t ) ) t 55) We mae the change of variables t = 2x + C, where C = 2 ln n + 2) ln ln n 2 ln Γ/2). Then, E[X n) ] = = = 0 C/2 C/2 0 C/2 0 C/2 P X n) t ) t 56) 21 P X n) 2x + C ) ) x 57) 21 e e x ) x 58) 21 e e x ) x + where γ is the Euler Mascheroni constant. We conclue that 0 21 e e x ) x 59) 2 x + 2γ = C + 2γ, 60) E[X n) ] C + 2γ 2 ln n + 2) ln ln n. 61) 17

18 If we tae the largest observations, an assume we coul split the weight equally among all imensions which is esirable), we see that the best we can o in expectation is upper boune by ) 2 ln n E[X n)] + ln ln n. 62) Now, let us prove 52). The following inequalities simplify our tas to fining a high-probability upper boun on max i [n] X i 2. We have that TrΣX T X) 1 ) = Tr X T X) 1 ) = 1 λ i X T X) i=1 i=1 1 j=1 X j 2 / 2 j=1 X j) 2 2 max i [n] X i. 63) 2 Fix α [0, 1]. We nee to fin a constant Q such that with probability at least 1 α, Q max i [n] X i 2, so that we conclue that TrΣX T X) 1 ) 2 /Q with high probability. By 53) we now that lim P X n) 2x + 2 ln n + 2/2 1) ln ln n 2 ln Γ/2) ) = Λx) = e e x. 64) n For large n, we assume the previous upper boun for X n) is exact. We want to fin Q > 0 such that P X n) Q ) = 1 α. Note that if 1 α = e e x, then x = log log 1 1 α. 65) It follows that Q = 2 ln n + 2/2 1) ln ln n log log1 α) 1 2 ln Γ/2). Finally, 52) follows as Q = 2 log n H Proof of Theorem log log n log log1 α) ln Γ/2). 66) Recall the Sparse Thresholing Algorithm below. We show the following theorem. Theorem H.1 Let D = N 0, Σ). Assume Σ, λ an min i β i satisfy the stanar conitions given in Theorem 3 of [27]. Assume we run the Sparse Thresholing algorithm with 1 = C s log observations to recover the support of β, for an appropriate C 0. Let X 2 be 2 = 1 observations sample via thresholing on S ˆβ 1 ). It follows that for α > 0 such that t = α 2 C s > 0, there exist some universal constants c 1, c 2, an c, C that epen on the subgaussian norm of D S ˆβ 1 ), such that with probability at least it hols that 1 2e minc2 mins,log s)) logc1),ct2 log2)) TrΣ SS X T 2 X 2 ) 1 s ) ). 1 α) logn2/2) 2 For support recovery, we use Theorem 3 from [27]: Theorem H.2 Consier the linear moel with ranom Gaussian esign Y = Xβ + ɛ, with i.i.. rows x i N 0, Σ) R, 67) with noise ɛ N 0, σ 2 I ). Assume the covariance matrix Σ satisfies Σ S C SΣ SS ) 1 1 γ), for some γ 0, 1], 68) s 18

19 Algorithm 3 Sparse Thresholing Algorithm. 1: Set S 1 =, S 2 =. Let = 1 + 2, n = 1 + n 2. 2: for observation 1 i 1 o 3: Observe X i. Choose X i : S 1 = S 1 X i. 4: en for 5: Set γ = 1/2, λ = 4σ 2 log)/γ : Compute Lasso estimate ˆβ 1 base on S 1, with regularization λ. 7: Set weights: ξ i = 1 if i S ˆβ 1 ), ξ i = 0 otherwise. 8: Set Γ = C s + 2 logn 2 / 2 ). Factorize Σ S ˆβ1)S ˆβ 1) = UDU T. 9: for observation i n o 10: Observe X i R. Restrict to X i S := Xi S ˆβ 1) Rs. 11: Compute X i S = D 1/2 U T XS i. 12: if XS i ξ > Γ or 2 S 2 = n i + 1 then 13: Choose XS i : S 2 = S 2 XS i. 14: if S 2 = 2 then 15: brea. 16: en if 17: en if 18: en for 19: Return OLS estimate ˆβ 2 with observations in S 2. λ min Σ SS ) C min > 0. 69) Let S = s. Consier the family of regularization parameters for φ 2 φ ρ u Σ λ φ ) = SC S) 2σ 2 log) γ 2. 70) If for some fixe δ > 0, the sequence,, s) an regularization sequence {λ } satisfy ) 2s log s) 1 + δ) θ uσ) 1 + σ2 C min λ 2 s, 71) then the following hols with prob at least 1 c 1 exp c 2 min{s, log s)}): 1. The Lasso has a unique solution ˆβ with support in S i.e. S ˆβ) Sβ )). 2. Define the gap gλ ) := c 3 λ Σ 1/2 SS σ 2 log s C min. 72) Then, if β min := min i S β i > gλ ), the signe support S ± ˆβ) is ientical to S ± β ), an moreover ˆβ S β S gλ ). The require efinitions to apply the previous theorem are ρ l Σ) = 1 2 min i j Σ ii + Σ jj 2Σ ij ), θ l Σ) = ρ l Σ SC S) C max 2 γσ)) 2, ρ u Σ) = max Σ ii, 73) i θ lσ) = ρ uσ SC S) C min γ 2 Σ). 74) Proof Theorem H1) 19

20 Let X N 0, Σ) with Σ satisfying 68) an 69). Let λ φ ) be lie in 70), for some φ > 2. Assume we choose the number of observations 1 in the first stage to be at least ) δ) θ u Σ) 1 + σ2 C min λ 2 s s log s) 75) = CΣ,, s) s log s), 76) an that β min is greater than 72). Then, with probability at least 1 c 1 exp c 2 min{s, log s)}), we recover the right support Sβ ) = S ˆβ) in the first stage of the algorithm. Conitional on this event, we apply our algorithm on the remaining observations. In the secon stage, we only loo at those imensions in S ˆβ), by setting weights ξ S ˆβ) = 1, an zero otherwise. Finally, we run OLS along the imensions in the recovere support, an using the observations collecte uring the secon stage. Importantly, note that the new observations are N 0, Σ SS ). We can now apply our original results. Denote by X 2 R 2 s the set of observations collecte in the secon stage of the algorithm. In particular, by Corollary D.1, we conclue that for α > 0 such that t = α 2 C s > 0, the following hols with probability at least 1 2 exp ct 2 ) TrΣ SS X T 2 X 2 ) 1 s ) ). 77) 1 α) logn2/2) 2 Uner the event that the recovery is correct, the contribution to the MSE of the components of β that are not in its support is zero. In other wors, β ˆβ 2 2 Σ = β ˆβ 2 ) T Σβ ˆβ 2 ) 78) = β S ˆβ 2S ) T Σ SS β S ˆβ 2S ) = β S ˆβ 2S 2 Σ SS. 79) As the events that the first an secon stages succee are inepenent, we conclue 77) hols with probability at least 1 c 1 e c2 min{s,log s)} 2e ct2 80) s 1 2e minc2 mins,log s)) logc1),ct2 log2)). 81) I Proof of CLT Lower Boun Corollary I.1 Assume the norm of white observations is istribute accoring to Z ξ = N, γ 2). Then, we have that for any algorithm A E A TrΣX T X) 1 ) Proof By Theorem F.1, we nee to compute E [ max i [n] X i 2]. By assumption X i 2 N, γ 2) for each i, which implies [ ] [ E max X i 2 = E + max γ X i 2 ] i [n] i [n] γ [ ] + γ E max N 0, 1) i [n] an the result follows. 1 + γ ). 82) 2 log n 83) 84) + γ 2 log n, 85) 20

21 J Rige Regression Regularize linear estimators also benefit from large an balance observations. We show that, uner mil assumptions, the performance of the rige regression is irectly aligne with that of previous sections. The rige estimator is ˆβ λ = X T X + λi ) 1 X T Y, given X, Y) an λ > 0. The following result shows how large values of λ min X T X) help to control the MSE of ˆβ λ. As the optimal penalty parameter λ is unnown until the en of the ata collection process, we assume it is uniformly ranom in a small interval. Theorem J.1 Let R > 0. Assume the penalty parameter for rige regression is chosen uniformly at ranom λ U[0, R]. Then, the MSE of ˆβ λ is upper boune by E λ,x ˆβ λ β 2 E X f λ min X T X) ), 86) where f is the following ecreasing function of λ min : fλ min ) = σ2 λ min + R + β λ min R log 1 + R ) + λ ) min. 87) λ min λ min + R Proof The SVD ecomposition of X = USV T implies that X T X = V SU T USV T = V S 2 V T, where U an V are orthogonal matrices. We efine W = X T X + λi ) 1, an see that W = V S 2 + λi)v T ) 1 1 = V Diag s 2 jj + λ In this case, the MSE of ˆβ λ has two sources: square bias an the trace of the covariance matrix. The covariance matrix of ˆβ λ is Cov ˆβ λ ) = σ 2 W X T XW, while its bias is given by λw β see [13]). Thus, Cov ˆβ λ ) = σ 2 V Diag s 2 jj s 2 jj + λ)2 ) j=1 ) j=1 V T. V T. 88) Note that s 2 jj = λ j, where s jj s are the singular values of X, an λ j s the eigenvalues of X T X. As [ V is orthogonal, Tr Cov ˆβ ] λ ) = σ 2 j=1 λ j/λ j + λ) 2. Unfortunately, in practice, the value of λ is unnown before collecting the ata. A common technique consists in using an aitional valiation set to choose the optimal regularization parameter λ. Generally, in supervise learning, the valiation set comes from the same istribution as the test set, while in active learning it oes not. As in the unregularize case, we want to train on unliely ata, but we want to test on liely ata. We achieve robustness against this fact as follows. We fix some fairly large R > 0 such that we assume λ 0, R). We treat λ as a ranom variable, an we impose a uniform prior D λ over 0, R). Then, we see that [ E λ D λ [Tr Cov ˆβ ]] λ ) = σ 2 j=1 = σ 2 j=1 λ j R λ j + λ) 2 R λ 1 λ j + R σ2 λ min + R. 89) 21

22 The square bias can be upper boune by [ λ 2 β T W T W β = β T λ 2 ] V Diag λ j + λ) 2 V T β j ) 2 λ β 2 2 max i λ j + λ ) 2 = β 2 λ 2. 90) λ min + λ for every λ > 0, as λ j 0 for all j. Taing expectations on both sies of 90) with respect to λ D λ, an after some algebra E Dλ Bias 2 ˆβ λ ) β 2 1 2λ min 2 R log 1 + R ) + λ min λ min λ min + R, 91) where the RHS is a ecreasing function of λ min that tens to zero as λ min grows. It follows that E ˆβ λ β 2 can be controlle by minimizing λ min X T X), an we can focus on minimizing λ min X T X) by the equivalence shown in the Problem Definition section of the main paper. K Simulations We conucte several experiments in various settings. We present here some experiments that complement those showe in the main paper. In particular, we show experiments for linear moels, synthetic linear moels, synthetic non-linear ata, an aitional regularize an real-worl atasets. K.1 Linear Moels We first empirically show the results prove in Theorem B.1. For a sequence of values of n, we choose = n observations in R, with fixe = 10. The observations are generate accoring to N 0, I), an y follows a linear moel with β i U 5, 5). For each tuple n, ) we repeat the experiment 200 times, an compute the square error β is nown). The results in Figure 3 a) show the average MSE of Algorithm 1 significantly outperforms that of ranom sampling. We also see a strong variance reuction. Figure 3 b) restricts the comparison to fixe an aaptive threshol algorithms; while the latter outperforms the former, the ifference is small. In Figure 3 c) we eep n an fixe, an vary. Finally, in Figure 4 a) we show the case where Σ I. For completeness, we repeate the simulation with observations generate accoring to a joint Gaussian istribution with a ranom covariance matrix that ha TrΣ) = 21.59, λ min = 5, an λ max = Figure 4 a) shows that thresholing algorithms outperform ranom sampling in a similar way as in the white case presente in the paper. Also, Figure 4 b) shows how the aaptive threshol slightly beats the fixe one. Finally, in Figure 4 c), we show the results of simulations when observations are sample from Laplace correlate marginals through a Gaussian Copula). We compare ranom sampling to two versions of the thresholing algorithm. The most simple one, enote by Unif-Weig Algorithm, assigns uniform weights i.e., ξ i = 1 for all i). On the other han, enote by Opt-Weig Algorithm the algorithm that uses the optimal weights previously pre-compute, in this case max i ξ i / min i ξ i 7, inepenent variables ten to require higher weights). As one woul expect, the latter oes better than the former. However, it is remarable that the ifference between ranom an thresholing is way more substantial than the ifference between optimal an approximate thresholing, an observation that can be very useful in practice. K.2 Synthetic Non-Linear Data The theory an algorithms presente in this paper are base on the linearity of the moel. To unerstan the impact of this assumption, we perform an experiment where the response moel was 22

23 Normalize MSE max 0.004) n 1K 2K 2K 5K 8K 15K 22K 28K 35K Algorithm 1b Algorithm 1 Ranom Sampling Theorem ψ =0.3) Passive Theory Normalize MSE max 0.002) n 1K 2K 2K 5K 8K 15K 22K 28K 35K Algorithm 1b Algorithm 1 Theorem ψ =0.3) a) With = n, = b) With = n, = 10. Normalize MSE max 0.003) Algorithm 1 Ranom Sampling c) With n = 3000, = 10. Figure 3: MSE of ˆβ OLS ; white Gaussian obs, 0.25, 0.75) quantile confience intervals isplaye in a), c). y = x T β + ψx T x for various values of ψ, an β i U 5, 5). Note that high-orer terms an transformations can easily be inclue in the esign matrix not one in this case). As expecte, the results in Figure 5 show an intersection point. The active learning algorithms are robust to some level of non-linearity but, at some point, ranom sampling becomes more effective. K.3 Regularization An appealing property of the propose algorithms is that their gain is preserve uner regularize estimators such as rige an lasso. This is specially relevant as it allows for higher imensional moels where transformations an interactions of the original variables are ae to better capture non-linearities in the ata an regularization is use to avoi overfitting. In fact, our algorithm can be thought of as a type of regularizing process. We repeate the first experiment from the linear moel simulations, using the rige estimator with λ = Figure 6 a) shows that the average MSEs of Algorithms 1 an 1b strongly outperform the results of ranom sampling. Their variance is less than 30% that of ranom sampling in all cases. We performe two experiments with Lasso estimators to investigate the behavior of our algorithms in the presence of sparse moels. We o not test Algorithm 2 here, but only simple thresholing approaches. First, we fixe n = 5000, = 150, = 70 an white Gaussian ata. The imension of the latent subspace, or effective imension of the moel, ranges from eff = 5 to eff = 70. Results are shown in Figure 6 b). Algorithm 1 an Algorithm 1b strongly improve the performance of ranom sampling, while their variance is at most half that of ranom sampling. 23

24 n Algorithm 1 Ranom Sampling n Algorithm 1b Algorithm 1 MSE 0.4 MSE a) With = n, = b) With = n, = 10. Normalize MSE max 0.016) n 1K 1K 1K 1K 1K 1K 1K Unif-Weig Algorithm Opt-Weig Algorithm Ranom Sampling c) Laplace Correlate Data via Gaussian Copula; Uniform vs. Optimal Weights. Figure 4: In a), b), MSE of ˆβ OLS ; N 0, Σ) ata, 0.05, 0.95) conf. intervals. Normalize MSE max 0.010) Algorithm 1b Algorithm 1 Ranom Sampling ψ x 1E-2) Figure 5: Moel is y = i β ix i + ψ i x2 i. In the secon experiment, we fixe eff = 7, an progressively increase the imension of the space from = 70 to = 450. Also, we ept fixe n = 1000 an = 100. Results are shown in Figure 6 c). Thresholing algorithms consistently ecrease the MSE of the lasso estimator with respect to ranom sampling, even though we are aing a large number of purely noisy imensions. The reason is simple. While these algorithms o not actively try to fin the latent subspace Algorithm 2 oes), their observations will be, on average, larger in those imensions too. There may be ways to leverage this 24

25 Normalize MSE max 0.004) n Algorithm 1 Ranom Sampling Normalize MSE max 0.055) Algorithm 1b Algorithm 1 Ranom Sampling a) Rige; = 10, = n effective b) Lasso; n = 5000, = 150, = 70. Normalize MSE max 0.007) Algorithm 1b Algorithm 1 Ranom Sampling c) Lasso; n = 1000, = 100, eff = 7 Figure 6: MSE of regularize estimators, λ = 0.01; white Gaussian obs. The 0.05, 0.95) confience intervals in a), an 0.25, 0.75) in b). fact, lie batche approaches where weights ξ are upate by giving more importance to promising imensions. K.4 Real Worl Datasets The Combine Cycle Power ataset has 9568 observations. The outcome is the net hourly electrical energy output of the plant, an it has = 4 covariates: temperature, pressure, humiity, an exhaust vacuum. In Figure 7, we see the phenomenon explaine in the main paper for large, the gain vanishes). In this case, an after aing all secon orer interactions, active learning solves the problem. Ranom sampling with interactions is not shown as the error was much larger. In aition, in Figure 8 we show the scatterplots of the atasets use in the paper we omitte the YearPreictionMSD ataset as = 90). 25

26 Normalize MSE max ) n x 1000) Algorithm 1b Ranom Sampling Algorithm 1b +interactions) Figure 7: Combine Cycle Power 150 iters). a) Protein Structure Dataset. b) Combine Cycle Power Plant Dataset. c) Bie Sharing Dataset. Figure 8: Scatter Plots of Real Worl Datasets. 26

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012 CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013 Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

A Unified Theorem on SDP Rank Reduction

A Unified Theorem on SDP Rank Reduction A Unifie heorem on SDP Ran Reuction Anthony Man Cho So, Yinyu Ye, Jiawei Zhang November 9, 006 Abstract We consier the problem of fining a low ran approximate solution to a system of linear equations in

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

u!i = a T u = 0. Then S satisfies

u!i = a T u = 0. Then S satisfies Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace

More information

arxiv: v4 [math.pr] 27 Jul 2016

arxiv: v4 [math.pr] 27 Jul 2016 The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

Logarithmic spurious regressions

Logarithmic spurious regressions Logarithmic spurious regressions Robert M. e Jong Michigan State University February 5, 22 Abstract Spurious regressions, i.e. regressions in which an integrate process is regresse on another integrate

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification

Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

Entanglement is not very useful for estimating multiple phases

Entanglement is not very useful for estimating multiple phases PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The

More information

Necessary and Sufficient Conditions for Sketched Subspace Clustering

Necessary and Sufficient Conditions for Sketched Subspace Clustering Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae

More information

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

Analyzing Tensor Power Method Dynamics in Overcomplete Regime

Analyzing Tensor Power Method Dynamics in Overcomplete Regime Journal of Machine Learning Research 18 (2017) 1-40 Submitte 9/15; Revise 11/16; Publishe 4/17 Analyzing Tensor Power Metho Dynamics in Overcomplete Regime Animashree Ananumar Department of Electrical

More information

Lecture 6 : Dimensionality Reduction

Lecture 6 : Dimensionality Reduction CPS290: Algorithmic Founations of Data Science February 3, 207 Lecture 6 : Dimensionality Reuction Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will consier the roblem of maing

More information

Spurious Significance of Treatment Effects in Overfitted Fixed Effect Models Albrecht Ritschl 1 LSE and CEPR. March 2009

Spurious Significance of Treatment Effects in Overfitted Fixed Effect Models Albrecht Ritschl 1 LSE and CEPR. March 2009 Spurious Significance of reatment Effects in Overfitte Fixe Effect Moels Albrecht Ritschl LSE an CEPR March 2009 Introuction Evaluating subsample means across groups an time perios is common in panel stuies

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

1. Aufgabenblatt zur Vorlesung Probability Theory

1. Aufgabenblatt zur Vorlesung Probability Theory 24.10.17 1. Aufgabenblatt zur Vorlesung By (Ω, A, P ) we always enote the unerlying probability space, unless state otherwise. 1. Let r > 0, an efine f(x) = 1 [0, [ (x) exp( r x), x R. a) Show that p f

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu

More information

All s Well That Ends Well: Supplementary Proofs

All s Well That Ends Well: Supplementary Proofs All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee

More information

Jointly continuous distributions and the multivariate Normal

Jointly continuous distributions and the multivariate Normal Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

Linear Regression with Limited Observation

Linear Regression with Limited Observation Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear

More information

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016

arxiv: v2 [cond-mat.stat-mech] 11 Nov 2016 Noname manuscript No. (will be inserte by the eitor) Scaling properties of the number of ranom sequential asorption iterations neee to generate saturate ranom packing arxiv:607.06668v2 [con-mat.stat-mech]

More information

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS BY BENJAMIN AVANZI, LUKE C. CASSAR AND BERNARD WONG ABSTRACT In this paper we investigate the potential of Lévy copulas as a tool for

More information

Hyperbolic Systems of Equations Posed on Erroneous Curved Domains

Hyperbolic Systems of Equations Posed on Erroneous Curved Domains Hyperbolic Systems of Equations Pose on Erroneous Curve Domains Jan Norström a, Samira Nikkar b a Department of Mathematics, Computational Mathematics, Linköping University, SE-58 83 Linköping, Sween (

More information

LECTURE NOTES ON DVORETZKY S THEOREM

LECTURE NOTES ON DVORETZKY S THEOREM LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.

More information

2886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 5, MAY 2015

2886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 5, MAY 2015 886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 61, NO 5, MAY 015 Simultaneously Structure Moels With Application to Sparse an Low-Rank Matrices Samet Oymak, Stuent Member, IEEE, Amin Jalali, Stuent Member,

More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

Qubit channels that achieve capacity with two states

Qubit channels that achieve capacity with two states Qubit channels that achieve capacity with two states Dominic W. Berry Department of Physics, The University of Queenslan, Brisbane, Queenslan 4072, Australia Receive 22 December 2004; publishe 22 March

More information

Some Examples. Uniform motion. Poisson processes on the real line

Some Examples. Uniform motion. Poisson processes on the real line Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]

More information

Simultaneous Input and State Estimation with a Delay

Simultaneous Input and State Estimation with a Delay 15 IEEE 5th Annual Conference on Decision an Control (CDC) December 15-18, 15. Osaa, Japan Simultaneous Input an State Estimation with a Delay Sze Zheng Yong a Minghui Zhu b Emilio Frazzoli a Abstract

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

arxiv: v1 [cs.lg] 22 Mar 2014

arxiv: v1 [cs.lg] 22 Mar 2014 CUR lgorithm with Incomplete Matrix Observation Rong Jin an Shenghuo Zhu Dept. of Computer Science an Engineering, Michigan State University, rongjin@msu.eu NEC Laboratories merica, Inc., zsh@nec-labs.com

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA Sham M. Kakae Karen Livescu Karthik Sriharan Toyota Technological Institute at Chicago, 6045 S. Kenwoo Ave., Chicago, IL kamalika@soe.ucs.eu

More information

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the

More information

Robust Bounds for Classification via Selective Sampling

Robust Bounds for Classification via Selective Sampling Nicolò Cesa-Bianchi DSI, Università egli Stui i Milano, Italy Clauio Gentile DICOM, Università ell Insubria, Varese, Italy Francesco Orabona Iiap, Martigny, Switzerlan cesa-bianchi@siunimiit clauiogentile@uninsubriait

More information

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas The Role of Moels in Moel-Assiste an Moel- Depenent Estimation for Domains an Small Areas Risto Lehtonen University of Helsini Mio Myrsylä University of Pennsylvania Carl-Eri Särnal University of Montreal

More information

Discrete Mathematics

Discrete Mathematics Discrete Mathematics 309 (009) 86 869 Contents lists available at ScienceDirect Discrete Mathematics journal homepage: wwwelseviercom/locate/isc Profile vectors in the lattice of subspaces Dániel Gerbner

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

Why Bernstein Polynomials Are Better: Fuzzy-Inspired Justification

Why Bernstein Polynomials Are Better: Fuzzy-Inspired Justification Why Bernstein Polynomials Are Better: Fuzzy-Inspire Justification Jaime Nava 1, Olga Kosheleva 2, an Vlaik Kreinovich 3 1,3 Department of Computer Science 2 Department of Teacher Eucation University of

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

Binary Discrimination Methods for High Dimensional Data with a. Geometric Representation

Binary Discrimination Methods for High Dimensional Data with a. Geometric Representation Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

KNN Particle Filters for Dynamic Hybrid Bayesian Networks

KNN Particle Filters for Dynamic Hybrid Bayesian Networks KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030

More information

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties Journal of Machine Learning Research 16 (2015) 1547-1572 Submitte 1/14; Revise 9/14; Publishe 8/15 Flexible High-Dimensional Classification Machines an Their Asymptotic Properties Xingye Qiao Department

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms On Characterizing the Delay-Performance of Wireless Scheuling Algorithms Xiaojun Lin Center for Wireless Systems an Applications School of Electrical an Computer Engineering, Purue University West Lafayette,

More information

Local Linear ICA for Mutual Information Estimation in Feature Selection

Local Linear ICA for Mutual Information Estimation in Feature Selection Local Linear ICA for Mutual Information Estimation in Feature Selection Tian Lan, Deniz Erogmus Department of Biomeical Engineering, OGI, Oregon Health & Science University, Portlan, Oregon, USA E-mail:

More information

Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets

Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets Proceeings of the 4th East-European Conference on Avances in Databases an Information Systems ADBIS) 200 Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets Eleftherios Tiakas, Apostolos.

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

CONTROL CHARTS FOR VARIABLES

CONTROL CHARTS FOR VARIABLES UNIT CONTOL CHATS FO VAIABLES Structure.1 Introuction Objectives. Control Chart Technique.3 Control Charts for Variables.4 Control Chart for Mean(-Chart).5 ange Chart (-Chart).6 Stanar Deviation Chart

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.

More information

A Review of Multiple Try MCMC algorithms for Signal Processing

A Review of Multiple Try MCMC algorithms for Signal Processing A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications

More information

The total derivative. Chapter Lagrangian and Eulerian approaches

The total derivative. Chapter Lagrangian and Eulerian approaches Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function

More information

Designing of Acceptance Double Sampling Plan for Life Test Based on Percentiles of Exponentiated Rayleigh Distribution

Designing of Acceptance Double Sampling Plan for Life Test Based on Percentiles of Exponentiated Rayleigh Distribution International Journal of Statistics an Systems ISSN 973-675 Volume, Number 3 (7), pp. 475-484 Research Inia Publications http://www.ripublication.com Designing of Acceptance Double Sampling Plan for Life

More information

TMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments

TMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments Problem F U L W D g m 3 2 s 2 0 0 0 0 2 kg 0 0 0 0 0 0 Table : Dimension matrix TMA 495 Matematisk moellering Exam Tuesay December 6, 2008 09:00 3:00 Problems an solution with aitional comments The necessary

More information

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH English NUMERICAL MATHEMATICS Vol14, No1 Series A Journal of Chinese Universities Feb 2005 TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH He Ming( Λ) Michael K Ng(Ξ ) Abstract We

More information

On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness

On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness Journal of Machine Learning Research 18 2018 1-60 Submitte 3/17; Revise 2/18; Publishe 4/18 On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct an Reverse Nearest Neighbors, an

More information

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

II. First variation of functionals

II. First variation of functionals II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

Homework 2 Solutions EM, Mixture Models, PCA, Dualitys

Homework 2 Solutions EM, Mixture Models, PCA, Dualitys Homewor Solutions EM, Mixture Moels, PCA, Dualitys CMU 0-75: Machine Learning Fall 05 http://www.cs.cmu.eu/~bapoczos/classes/ml075_05fall/ OUT: Oct 5, 05 DUE: Oct 9, 05, 0:0 AM An EM algorithm for a Mixture

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis

Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis Chuang Wang, Yonina C. Elar, Fellow, IEEE an Yue M. Lu, Senior Member, IEEE Abstract We present a high-imensional analysis

More information

Image Denoising Using Spatial Adaptive Thresholding

Image Denoising Using Spatial Adaptive Thresholding International Journal of Engineering Technology, Management an Applie Sciences Image Denoising Using Spatial Aaptive Thresholing Raneesh Mishra M. Tech Stuent, Department of Electronics & Communication,

More information