LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy
Xuebin Ren, Chia-Mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie A. McCann, and Philip S. Yu

Abstract — High-dimensional crowdsourced data collected from numerous users produces rich knowledge for our society. However, it also brings unprecedented privacy threats to the participants. Local privacy, a variant of differential privacy, has been proposed to eliminate these privacy concerns. Unfortunately, achieving local privacy on high-dimensional crowdsourced data raises great challenges in terms of both computational efficiency and effectiveness. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms that maintain local privacy. Then, we develop a locally privacy-preserving high-dimensional data publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, both correlations and joint distributions among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus achieving both efficiency and effectiveness in high-dimensional data publication. To the best of our knowledge, this is the first work addressing high-dimensional crowdsourced data publication with local privacy. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed, and confirm that our LoPub scheme can keep, on average, 80% and 60% accuracy over the published approximate datasets in terms of SVM and random forest classification, respectively.
Index Terms — Local privacy, high-dimensional data, crowdsourced data, data publication

1 INTRODUCTION

With the development of various integrated sensors and crowd sensing systems [19], crowdsourced information from all aspects can be collected and analyzed to produce rich knowledge about the group, which can benefit everyone in the crowdsourced system [2]. Particularly, with multi-dimensional crowdsourced data (data with multiple attributes), much potential information and many patterns behind the data can be mined or extracted to provide accurate dynamics and reliable prediction for both groups and individuals. However, the participants' privacy can still be easily inferred or identified from the publication of crowdsourced data [15], [33], especially high-dimensional data, even when existing privacy-preserving schemes and end-to-end encryption are used. The reasons for privacy leaks are two-fold:

Non-local Privacy. Most existing solutions for privacy protection focus on centralized datasets under the assumption that the server is trusted. However, despite the privacy protection against difference and inference attacks from aggregate queries, an individual's data may still suffer from privacy leakage before aggregation because of the lack of local privacy [17], [7] on the user side.

Curse of High-dimensionality. With the increase of data dimensions, some existing privacy-preserving techniques like differential privacy [8], if straightforwardly applied to multiple attributes with high correlations, will become vulnerable [25], [35], thereby increasing the success ratio of many inference attacks like cross-checking. Even worse, according

X. Ren, S. Yang, and X. Yang are with Xi'an Jiaotong University. E-mails: {xb.ren@stu, shusenyang@mail, yxyphd@mail}.xjtu.edu.cn
C.-M. Yu is with National Chung Hsing University. E-mail: chimayu@gmail.com
W. Yu is with Imperial College London and Aston University. E-mails: weiren.yu@imperial.ac.uk, w.yu3@aston.ac.uk
J. McCann is with Imperial College London.
E-mail: j.mccann@imperial.ac.uk
P. S. Yu is with the University of Illinois at Chicago. E-mail: psyu@uic.edu

to the composition theorem [26], differential privacy degrades exponentially when multiple correlated queries are processed. In addition to privacy vulnerability, the large scale of data records collected from many distributed users can exacerbate the inefficiency of data processing. Especially in IoT applications, the ubiquitous but resource-constrained sensors require extremely high efficiency and low overhead. For example, privacy-preserving real-time pricing mechanisms require not only effective privacy guarantees for individuals' electricity usage but also fast response to the dynamic changes of demand and supply in the smart grid [24]. Thus, it is important to provide an efficient privacy-preserving method to publish crowdsourced high-dimensional data.

Contributions. To address the above concerns, this paper makes the following contributions.

- We are the first, to the best of our knowledge, to address the problem of high-dimensional crowdsourced data publication with local privacy. We propose a locally privacy-preserving scheme for crowdsensing systems to collect and build high-dimensional data from distributed users. In particular, differential privacy is achieved directly for each distributed user.
- Based on EM and Lasso regression, we propose efficient algorithms for multivariate joint distribution estimation. By taking advantage of specific marginal distributions from the locally privacy-preserved data after dimensionality and sparsity reduction, we propose the LoPub solution, which can generate an approximation of the original crowdsourced data with the guarantee of local privacy.
- We implemented and evaluated our schemes on real-world datasets. Experimental results confirm the efficiency and effectiveness of our proposed distribution estimation and data release mechanisms.
Due to the page limit, some detailed examples and explanations not presented in this paper can be found in our full-length preprint technical report [28].
Fig. 1: Main procedures of high-dimensional data publishing with non-local (ǫ = ǫ1 + ǫ2) privacy

2 RELATED WORK

2.1 Privacy in Centralized Setting

Differential privacy [8] forms a mathematical foundation for privacy protection by imposing proper randomness on statistical query results. Examples of the use of differential privacy include privacy-preserving data aggregation, where the differential privacy of individuals can be guaranteed by injecting carefully-calibrated Laplacian noise [5], [13], [18], [22], [35]. For privacy-preserving low-dimensional data publication, to show crowd statistics and draw the correlations between attributes, both the differentially privacy-preserving histogram (univariate distribution) [3] and contingency table [27] are widely investigated. However, the techniques for non-interactive differential privacy [9], [1] in these works suffer from the curse of dimensionality [35], [5]. Particularly, the composition theorems [26] have pointed out that the privacy level degrades when multiple related queries are processed. To deal with the correlations in high-dimensional data, different schemes (e.g., approximations via low-dimensional data clusters) have been proposed [5], [6], [18], [21], [32], [35]. Among them, the state-of-the-art scheme [5] proposed to reduce the dimensionality by using a junction tree to model the correlations. Moreover, Su et al. [31] proposed a multi-party setting to publish a synthetic dataset from multiple data curators. However, their multi-party computation can only protect privacy between data servers, and individuals' local privacy cannot be guaranteed. Due to the lack of a local privacy guarantee, these works, as summarized in Figure 1, may be exposed to insider attackers and thus cannot be directly applied to crowdsourced systems.

2.2 Privacy in Distributed Setting

The schemes mentioned above mainly deal with centralized datasets. Nonetheless, there are scenarios where distributed users contribute to aggregate statistics.
Despite the privacy protection against difference and inference attacks from aggregate queries, an individual's data may also suffer from privacy leakage before aggregation [11]. Hence, local privacy [7], [16], [17] has been proposed to provide local privacy guarantees for distributed users. In addition, local privacy at the end user can ensure the consistency of the privacy guarantees when there are multiple accesses to users' data, in contrast to non-local privacy schemes that have to properly split and assign privacy budgets to different steps [5], [21], [35]. In existing work [15], [12], [14], local privacy is implemented with the randomized response technique [34]. However, the correlations and sparsity in high-dimensional data are not well considered, which causes low scalability and utility for high-dimensional data [25], [35].

Fig. 2: An architecture of distributed high-dimensional private data collection and publication

Different from these works, we propose a novel mechanism to publish high-dimensional crowdsourced data with local privacy for individuals. We compare our work with three similar existing solutions in Table 1. More specifically, our method has lower communication cost and lower time and storage complexity than state-of-the-art approaches.

TABLE 1: Comparison of LoPub with existing methods

Comparison       | LoPub (ours)  | RAPPOR [12]   | EM [14]       | JTree [5]
Local privacy    | Y             | Y             | Y             | N
High dimension   | Y             | N             | N             | Y
Communication    | O(Σ_j |Ω_j|)  | O(∏_j |Ω_j|)  | O(Σ_j |Ω_j|)  | -
Time complexity  | Low           | Large         | Large         | -
Space complexity | Low           | Large         | Large         | -

(|Ω_j| is the domain size of the j-th dimension.)

3 SYSTEM MODEL

Our system model is depicted in Figure 2, where a number of users and a central server constitute a crowdsourcing system. The users generate multi-dimensional data records and then send these data to the central server.
The server gathers all the data and estimates the high-dimensional crowdsourced data distribution with local privacy, aiming to release a privacy-preserving dataset to third parties for conducting data analysis. In this paper, we mainly focus on data privacy, and thus the detailed network model is omitted.

Problem Statement. Given a collection of data records with d attributes from different users, our goal is to help the central server publish a synthetic dataset that approximates the joint distribution of the d attributes with local privacy. Formally, let N be the total number of users (i.e., data records^1) and sufficiently large. Let X = {X_1, X_2, ..., X_N} be the crowdsourced dataset, where X_i denotes the data record from the i-th user. We assume that there are d attributes A = {A_1, A_2, ..., A_d} in X. Then each data record X_i can be represented as X_i = {x_1^i, x_2^i, ..., x_d^i}, where x_j^i denotes the j-th element of the i-th user's record. For each attribute A_j (j = 1, 2, ..., d), we denote Ω_j = {ω_j^1, ω_j^2, ..., ω_j^{|Ω_j|}} as the domain of A_j, where ω_j^i is the i-th possible attribute value in Ω_j and |Ω_j| is the cardinality of Ω_j. With the above notation, our problem can be formulated as follows. Given a dataset X with local privacy, we aim to release an approximate dataset X' with the same attributes A and N user records such that

P_X(A_1 ... A_d) ≈ P_X'(A_1 ... A_d),   (1)

1. For brevity, we assume that each user sends only one data record to the central server.
where P_X(A_1 ... A_d) denotes the probabilities P_X(x_1^i = ω_1, ..., x_d^i = ω_d) for i = 1, ..., N and ω_1 ∈ Ω_1, ..., ω_d ∈ Ω_d, and P_X(x_1^i = ω_1, ..., x_d^i = ω_d) is defined as the d-dimensional joint distribution on X. To focus our research on data privacy, we assume that the central server and users are all honest-but-curious, in the sense that they honestly follow the protocols in the system without maliciously manipulating their received data. However, they may be curious about others' data and may even collude to infer others' data. In addition, the central server and users share the same public information, such as the privacy-preserving protocols (including the hash functions used).

4 PRELIMINARIES

4.1 Differential Privacy

Differential privacy is the de facto standard for providing privacy guarantees [8]. It limits the adversary's ability to infer the participation or absence of any user in a dataset by adding carefully calibrated noise (e.g., Laplacian noise [8]) to query results. An algorithm M is ǫ-differentially private if, for all neighboring datasets D_1 and D_2 that differ in a single element (e.g., the data of one person), and all subsets S of the image of M,

Pr[M(D_1) ∈ S] ≤ e^ǫ · Pr[M(D_2) ∈ S],   (2)

where ǫ is the privacy budget that specifies the level of privacy protection; a smaller ǫ means better privacy. According to the composition theorem [29], an extra privacy budget is required when multiple related queries are sequentially applied to differential privacy mechanisms.

4.2 Local Differential Privacy

Generally, differential privacy research focuses on centralized databases and implicitly assumes a trusted server. Aiming to eliminate this assumption, local differential privacy (or simply local privacy) has been proposed for crowdsourced systems to provide a stringent privacy guarantee in which data contributors trust no one [7], [17].
In particular, for any user i, a mechanism M satisfies ǫ-local privacy if for any two data records X_i, Y_i ∈ Ω_1 × ... × Ω_d, and for any possible privacy-preserving output X_i* ∈ Range(M),

Pr[M(X_i) = X_i*] ≤ e^ǫ · Pr[M(Y_i) = X_i*],   (3)

where the probability is taken over M's randomness and ǫ has a similar impact on privacy as in ordinary differential privacy (Equation (2)). The simplest form of local privacy is the randomized response [34], which has been widely used in surveys of people's yes-or-no opinions on a private issue. Participants in the survey are required to give their true answers with a certain probability, or random answers with the remaining probability. Due to the randomness, the surveyor cannot determine an individual's true answer (i.e., local privacy is guaranteed) but can still estimate the true proportions of the alternative answers. Recently, RAPPOR was proposed for statistics aggregation [12]. The basic idea of RAPPOR is to extend the randomized response technique via long binary strings that uniquely represent an arbitrary domain. However, it is not directly applicable to multi-dimensional data with a large domain size, since the length of the binary strings increases exponentially with the number of dimensions. To address this problem, Fanti et al. [14] propose an association learning scheme, which extends the 1-dimensional RAPPOR to estimate 2-dimensional joint distributions. However, the sparsity of the multi-dimensional domain and the way it iteratively scans RAPPOR strings mean that it incurs considerable computational complexity.

5 LOPUB: HIGH-DIMENSIONAL DATA PUBLICATION WITH LOCAL PRIVACY

We propose LoPub, a novel solution to achieve high-dimensional crowdsourced data publication with local privacy. In this section, we first introduce the basic idea behind LoPub and then elaborate the algorithmic procedures in more detail.
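The randomized response primitive recalled in Section 4.2 can be sketched as follows. This is a generic Warner-style yes/no mechanism, not the paper's Bloom filter scheme: the truthful-answer probability p = e^ǫ/(1 + e^ǫ) is a standard choice that makes the report ǫ-locally private, and the debiasing formula inverts the expected report mean.

```python
import math
import random

def randomized_response(truth: bool, eps: float) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    The worst-case output-probability ratio is exactly e^eps (eps-local privacy)."""
    p = math.exp(eps) / (1 + math.exp(eps))
    return truth if random.random() < p else not truth

def estimate_proportion(reports, eps: float) -> float:
    """Debias the surveyor's estimate: E[mean] = p*pi + (1-p)*(1-pi),
    so solve for the true proportion pi."""
    p = math.exp(eps) / (1 + math.exp(eps))
    mean = sum(reports) / len(reports)
    return (mean - (1 - p)) / (2 * p - 1)

random.seed(0)
true_bits = [i < 300 for i in range(1000)]            # true proportion: 0.3
reports = [randomized_response(b, eps=1.0) for b in true_bits]
print(round(estimate_proportion(reports, eps=1.0), 2))  # close to 0.3
```

No individual report reveals its owner's true answer, yet the aggregate proportion is recoverable up to sampling noise, which is exactly the property LoPub builds on.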
5.1 Basic Idea

Privacy-preserving high-dimensional crowdsourced data publication aims at releasing an approximate dataset with statistical information similar to the source data (i.e., in terms of the statistical distribution defined in Equation (1)) while guaranteeing local privacy. This problem can be considered in four stages. First, to achieve local privacy, a local transformation should be applied on the user side to cloak individuals' original data records. Then, the central server needs to obtain the statistical information, i.e., the distribution of the original data. There are two plausible approaches. One is to estimate the 1-dimensional distribution of each attribute independently. Unfortunately, ignoring the correlations between dimensions loses much of the utility of the original dataset. The other is to treat all attributes as one and compute the d-dimensional joint distribution. However, due to attribute-value combinations, the possible domain increases exponentially with the number of dimensions, leading to both low scalability and signal-to-noise ratio problems [35]. Therefore, the next crucial problem is to find a way to reduce the dimensionality while keeping the necessary correlations. Finally, with the statistical distribution information on the low-dimensional data, synthesizing a new dataset is the remaining problem.

To this end, we present LoPub, a locally privacy-preserving data publication scheme for high-dimensional crowdsourced data. Figure 3 shows the overview of LoPub, which mainly consists of four mechanisms: local privacy protection, multi-dimensional distribution estimation, dimensionality reduction, and data synthesizing.

1) Local Privacy Protection. We first propose a local transformation process that adopts the randomized response technique to cloak the original multi-dimensional data records on distributed users, providing local privacy for all individuals in the crowdsourced system.
In particular, we locally transform each attribute value into a random bit string. Then, the locally privacy-preserved data is sent to and aggregated at the central server.

2) Multi-dimensional Distribution Estimation. We then propose multi-dimensional joint distribution estimation schemes to obtain both the joint and marginal probability distributions of multi-dimensional data. Inspired by [14], we first extend the EM-based approach to high-dimensional
Fig. 3: An overview of LoPub

distribution estimation. However, such a straightforward extension does not consider the sparsity of high-dimensional data, which leads to high complexity in distribution estimation. To guarantee fast estimation, we then present a Lasso-based approach at the cost of slight accuracy degradation. Finally, we propose a hybrid approach striking a balance between accuracy and efficiency.

3) Dimensionality Reduction. Based on the multi-dimensional distribution information, we then propose to reduce the dimensionality by identifying mutually correlated attributes among all dimensions and splitting the high-dimensional attributes into several compact low-dimensional attribute clusters. In this paper, considering the heterogeneous attributes, we adopt mutual information and an undirected dependency graph to measure and model the correlations of attributes, respectively. Then, we propose to split the attributes according to the junction tree built from the dependency graph. In addition, we also propose a heuristic pruning scheme to further speed up the process of correlation identification.

4) Synthesizing the New Dataset. Finally, we propose to sample each low-dimensional dataset according to the connectivity of the attribute clusters and the estimated joint or conditional distribution on each attribute cluster, thus synthesizing a new privacy-preserving dataset.

5.2 Local Transformation for High-dimensional Data Records

Design Rationale. A common framework of locally private distribution estimation is that each individual user applies a local transformation to the data for privacy protection and then sends the transformed data to the server. The server estimates the joint distribution from the transformed data. The local transformation in our design includes two key steps: one is mapping to Bloom filters and the other is adding randomness.
In particular, Bloom filters over an attribute domain Ω with multiple hash functions can hash all the values in the domain into a pre-defined space. Thus, the unique bit strings are representative features of the original report. Then, after privacy protection by randomized response, a large number of samples with various levels of noise are generated by the individual users. After aggregation, the central server obtains a large sample space with random noise. As a result, one may estimate the distribution from the noisy sample space by taking advantage of machine learning techniques such as the EM algorithm and regression analysis.

TABLE 2: Notation

N          number of users (data records) in the system
X          entire crowdsourced dataset on the server side
X_i        data record from the i-th user
x_j^i      j-th element of X_i
d          number of attributes in X
R          set of all attribute clusters
A_j        j-th attribute of X
Ω_j        domain of A_j
ω_j        candidate attribute value in Ω_j
H_j(x)     hash functions for A_j that map x into a Bloom filter
s_j^i      Bloom filter of x_j^i (s_j^i = H_j(x_j^i))
s_j^i[b]   b-th bit of s_j^i
ŝ_j^i      randomized Bloom filter of s_j^i
ŝ_j^i[b]   b-th bit of ŝ_j^i
m_j        length of s_j^i
f          probability of flipping a bit of a Bloom filter

Under the above framework, a key observation can be made: if features are mutually independent, then combinations of features from different candidate sets are also mutually independent. Therefore, when the Bloom filters of each attribute are mutually independent (i.e., no collisions on any bit), the Cartesian products of Bloom filters of different attributes are also mutually independent. In this sense, with mutually independent Bloom filter features, existing machine learning techniques like EM and Lasso regression are effective for multivariate distribution estimation. Some notation used in this paper is listed in Table 2.

Algorithmic Procedures of Local Transformation. Before describing the distribution estimation, we present the details of the local transformation for high-dimensional crowdsourced data.
In essence, the local transformation consists of three steps:

1) For the i-th user, we have an original data record X_i = {x_1^i, x_2^i, ..., x_d^i} with d attributes. For each attribute A_j (j = 1, ..., d), we employ h hash functions H_j(·) to map x_j^i to a length-m_j bit string s_j^i (called a Bloom filter); that is, we calculate s_j^i = H_j(x_j^i), j = 1, ..., d.

2) Each bit s_j^i[b] (b = 1, 2, ..., m_j) of s_j^i is randomly flipped to 0 or 1 according to the following rule:

ŝ_j^i[b] = { s_j^i[b],  with probability 1 − f;
             1,         with probability f/2;
             0,         with probability f/2,    (4)

where f ∈ [0, 1] is a user-controlled flipping probability that quantifies the level of randomness for local privacy.

3) After deriving the randomized Bloom filters ŝ_j^i (j = 1, ..., d), we concatenate ŝ_1^i, ..., ŝ_d^i to obtain a stochastic (Σ_{j=1}^d m_j)-bit vector

[ ŝ_1^i[1], ..., ŝ_1^i[m_1] | ... | ŝ_d^i[1], ..., ŝ_d^i[m_d] ]   (5)
and send it to the server. Detailed examples illustrating the above procedures can be found in [28].

Parameter Setup. According to the characteristics of Bloom filters [3], given the false-positive probability p and the number |Ω_j| of elements to be inserted, the optimal length m_j of the Bloom filter can be calculated as

m_j = (ln(1/p) / (ln 2)^2) · |Ω_j|.   (6)

Furthermore, the optimal number h_j of hash functions in the Bloom filter is

h_j = (m_j / |Ω_j|) · ln 2 = ln(1/p) / ln 2.   (7)

So, the optimal h = ln(1/p) / ln 2 is used for all dimensions.

Privacy Analysis. Because the local transformation is performed by the individual user, no one else can obtain the original record X_i; hence local privacy is easily achieved, and we only have to analyze the privacy guarantee on the user side. In addition, since both the hash operations and the randomized responses on all attributes are independent, the local transformation consumes no extra privacy budget as the number of dimensions d increases, as pointed out by the composition theorem [26]. According to the conclusion in [12], the differential privacy obtained on the user side is

ǫ = 2h · ln((2 − f) / f),   (8)

where h is the number of hash functions in the Bloom filter and f is the probability that a bit is flipped. Since the same transformation is done by all users independently, this ǫ-local privacy guarantee is equivalent for all distributed users.

Communication Overhead.

Theorem 1: The minimal communication cost C_LoPub after the local transformation is

C_LoPub = Σ_{j=1}^d m_j = (ln(1/p) / (ln 2)^2) · Σ_{j=1}^d |Ω_j|.   (9)

Proof: If we assume that the domain of each attribute is publicly known by both the users and the server, then the communication cost of non-private collection is basically Σ_{j=1}^d ln|Ω_j|, which is related to the domain sizes. Nevertheless, in our method, due to local privacy, the communication cost is Σ_{j=1}^d m_j, which is related to the lengths of the Bloom filters, because only randomly flipped bit strings (not the original data) are sent.
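The parameter setup (Equations (6)–(8)) and the three-step local transformation above can be sketched together as follows. This is an illustrative implementation, not the paper's exact code: the h hash functions are simulated with salted SHA-256, which is an assumption of the sketch (any hash family agreed upon by users and server would do), and the flipping rule implements Equation (4) directly.

```python
import hashlib
import math
import random

def bloom_params(domain_size: int, p: float):
    """Eq. (6)/(7): optimal Bloom filter length m_j and hash count h
    for false-positive rate p and |Omega_j| = domain_size (rounded up)."""
    m = math.ceil(math.log(1 / p) / (math.log(2) ** 2) * domain_size)
    h = math.ceil(math.log(1 / p) / math.log(2))
    return m, h

def bloom_filter(value: str, m: int, h: int, attr: int):
    """Step 1: map one attribute value to the length-m bit string s_j^i = H_j(x_j^i).
    Salted SHA-256 simulates the h hash functions (illustrative choice)."""
    bits = [0] * m
    for k in range(h):
        digest = hashlib.sha256(f"{attr}|{k}|{value}".encode()).hexdigest()
        bits[int(digest, 16) % m] = 1
    return bits

def randomize(bits, f: float):
    """Step 2, Eq. (4): keep each bit w.p. 1 - f, else reset to 1 or 0 w.p. f/2 each."""
    out = []
    for b in bits:
        r = random.random()
        out.append(b if r < 1 - f else (1 if r < 1 - f / 2 else 0))
    return out

def local_transform(record, m, h, f):
    """Step 3, Eq. (5): concatenate the randomized Bloom filters of all d attributes."""
    vec = []
    for j, x in enumerate(record):
        vec += randomize(bloom_filter(x, m, h, attr=j), f)
    return vec

def privacy_budget(h: int, f: float) -> float:
    """Eq. (8): epsilon = 2h * ln((2 - f) / f); larger f means stronger privacy."""
    return 2 * h * math.log((2 - f) / f)

random.seed(1)
m, h = bloom_params(domain_size=10, p=0.05)
report = local_transform(("yes", "young"), m, h, f=0.5)
print(m, h, len(report))                    # 63 5 126
print(round(privacy_budget(h, f=0.5), 2))   # 10.99
```

The report length (here d · m = 126 bits) grows with the sum of the per-attribute domain sizes, matching the communication cost in Equation (9) rather than the product-sized cost of applying RAPPOR to the full Cartesian domain.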
For comparison, under the same conditions, when RAPPOR [12] is directly applied to k-dimensional data, all |Ω_1| · ... · |Ω_k| candidate values are regarded as one 1-dimensional domain, so the cost is

C_RAPPOR = (ln(1/p) / (ln 2)^2) · ∏_{j=1}^k |Ω_j|,   (10)

where ∏_{j=1}^k |Ω_j| is the size of the candidate set Ω_1 × ... × Ω_k. The difference between Equations (9) and (10) arises because our LoPub, compared with straightforward RAPPOR, exploits the mutual independence between multiple attributes.

5.3 Multivariate Distribution Estimation with Local Privacy

After receiving the randomized bit strings, the central server can aggregate them and estimate their joint distribution. For example, an EM-based estimation algorithm [14] was proposed to estimate 2-dimensional joint distributions. However, due to its high complexity and overhead, it is only suitable for low dimensions with small domains, which is impractical for many real-world high-dimensional datasets. Therefore, we propose a Lasso regression based algorithm with high efficiency, and also a hybrid algorithm that achieves a balance between efficiency and accuracy.

5.3.1 EM-based Distribution Estimation

Here, we first extend the EM-based estimation [14] to k-dimensional datasets (2 ≤ k ≤ d) and then analyze its computational complexity to show its inefficiency on high-dimensional crowdsourced data. Before illustrating the algorithm, we introduce the following notation. Without loss of generality, we consider k specified attributes A_1, A_2, ..., A_k and their index collection C = {1, 2, ..., k}. For simplicity, the event A_j = ω_j (or x_j = ω_j) is abbreviated as ω_j. For example, the prior probability P(x_1 = ω_1, x_2 = ω_2, ..., x_k = ω_k) can be simplified to P(ω_1 ω_2 ... ω_k) or P(ω_C). Algorithm 1 depicts the extended EM-based approach for estimating a k-dimensional joint distribution. More specifically, it consists of the following five main steps.
Algorithm 1 EM-based k-dimensional Joint Distribution (EM_JD)

Require: C: attribute index cluster, i.e., C = {1, 2, ..., k}; A_j: the k attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ_j^i: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability; δ: convergence accuracy.
Ensure: P(A_C): joint distribution of the k attributes specified by C.
1: initialize P_0(ω_C) = 1 / ∏_{j∈C} |Ω_j|
2: for each i = 1, ..., N do
3:   for each j ∈ C do
4:     compute P(ŝ_j^i | ω_j) = ∏_{b=1}^{m_j} P(ŝ_j^i[b] | H_j(ω_j)[b]), where each factor is 1 − f/2 if ŝ_j^i[b] = H_j(ω_j)[b] and f/2 otherwise
5:   end for
6:   compute P(ŝ_C^i | ω_C) = ∏_{j∈C} P(ŝ_j^i | ω_j)
7: end for
8: initialize t = 0  /* number of iterations */
9: repeat
10:   for each i = 1, ..., N do
11:     for each ω_C ∈ Ω_1 × Ω_2 × ... × Ω_k do
12:       compute P_t(ω_C | ŝ_C^i) = P_t(ω_C) · P(ŝ_C^i | ω_C) / Σ_{ω_C'} P_t(ω_C') · P(ŝ_C^i | ω_C')
13:     end for
14:   end for
15:   set P_{t+1}(ω_C) = (1/N) Σ_{i=1}^N P_t(ω_C | ŝ_C^i)
16:   update t = t + 1
17: until max_{ω_C} |P_t(ω_C) − P_{t−1}(ω_C)| ≤ δ
18: return P(A_C) = P_t(ω_C)

1) Before executing the EM procedure, we set the uniform distribution P(ω_1 ω_2 ... ω_k) = 1 / ∏_{j=1}^k |Ω_j| as the initial prior probability.

2) According to Equation (4), each bit s_j^i[b] is flipped with probability f/2. Thus, by comparing the
bits H_j(ω_j) with the randomized bits, the conditional probability P(ŝ_j^i | ω_j) can be computed (see line 4 of Algorithm 1).

3) Due to the independence between attributes (and their Bloom filters), the joint conditional probability can be easily calculated by combining the individual attributes; i.e., P(ŝ_C^i | ω_C) = ∏_{j∈C} P(ŝ_j^i | ω_j).

4) Given all the conditional distributions of one particular combination of bit strings, the corresponding posterior probability can be computed by Bayes' theorem,

P_t(ω_C | ŝ_C^i) = P_t(ω_C) · P(ŝ_C^i | ω_C) / Σ_{ω_C'} P_t(ω_C') · P(ŝ_C^i | ω_C'),   (11)

where P_t(ω_C) = P_t(ω_1 ω_2 ... ω_k) is the k-dimensional joint probability at the t-th iteration.

5) After identifying the posterior probability for each user, we take the mean of the posterior probabilities over the large number of users to update the prior probability. The updated prior probability is then used to compute the posterior probabilities in the next iteration.

The above EM-like procedure is executed iteratively until convergence, i.e., until the maximum difference between two successive estimates is smaller than the specified threshold δ. The algorithm converges to a good estimate when the initial value is well chosen. EM-based k-dimensional joint distribution estimation can also fail by converging to a local optimum. Especially as k increases, many local optima can prevent good convergence, because the sample space of all combinations in Ω_{j1} × Ω_{j2} × ... × Ω_{jk} explodes exponentially.

Complexity. Before the analysis of complexity, we note that the number of user records N needs to be sufficiently large; according to the analysis in [12], N ≫ v^k, where v denotes the average size of Ω_j. Otherwise, it is difficult to estimate reliably from a small sample space with a low signal-to-noise ratio.

Theorem 2: Suppose that the average length m_j is m and the average |Ω_j| is v. Then, the time complexity of Algorithm 1 is

O(Nkmv^k + tNv^{2k}).
(12)

Proof: EM-based estimation scans all N users' bit strings, each of length km, one by one to compute the conditional probabilities for the v^k different combinations, so this part of the time complexity is O(N(km)(v^k)). Also, over t iterations, computing the posterior probability of each combination for each observed bit string incurs a time complexity of O(tN(v^k)^2). As a consequence, the overall time complexity is O(tNv^{2k} + Nkmv^k).

Theorem 3: The space complexity of Algorithm 1 is

O(Nkm + 2Nv^k).   (13)

Proof: In Algorithm 1, the necessary storage includes the N users' bit strings of length km, i.e., O(Nkm). The prior probabilities over the k dimensions take O(v^k). The conditional and posterior probabilities over the v^k candidates for all bit strings take O(2Nv^k). So, the overall complexity is O(Nkm + 2Nv^k + v^k) = O(Nkm + 2Nv^k), since N is the dominant variable.

According to Theorem 3, the space overhead can be daunting when either N or k is large. This makes the performance of EM-based k-dimensional distribution estimation degrade dramatically, so it is not applicable to high-dimensional data.

5.3.2 Lasso-based Distribution Estimation

To improve the efficiency of k-dimensional joint distribution estimation, we present a Lasso regression based algorithm. As mentioned in Section 5.2, the bit strings are representative features of the original report. After randomized response and flipping, a large number of noisy samples are generated by the individual users. More precisely, one may consider that the central server receives a large number of samples from a specific distribution, but corrupted with random noise. In this sense, one may estimate the distribution from the noisy sample space by means of linear regression y = Mβ, where M holds the predictor variables, y is the response variable, and β is the regression coefficient vector.
The use of Bloom filters guarantees that the features (predictor variables M) re-extracted at the server are the same as those extracted by the users. Moreover, the response variable y can be estimated from the randomized bit strings according to the statistical properties of the known flipping probability f. Therefore, the only remaining problem is to find a good solution to the linear regression y = Mβ. Obviously, k-dimensional data may induce an output domain Ω_1 × ... × Ω_k of size |Ω_1| · ... · |Ω_k|, which increases exponentially with k. With a fixed N entries in the dataset X, the frequencies of many combinations ω_1 ω_2 ... ω_k ∈ Ω_1 × ... × Ω_k are rather small or even zero. So M is sparse, and only the sparse but effective predictor variables need to be chosen; otherwise, general linear regression techniques lead to overfitting. Here, we resort to Lasso regression, which effectively solves the sparse linear regression by selecting predictor variables.

Algorithm 2 Lasso-based k-dimensional Joint Distribution (Lasso_JD)

Require: C: attribute index cluster, i.e., C = {1, 2, ..., k}; A_j: the k attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ_j^i: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability.
Ensure: P(A_C): joint distribution of the k attributes specified by C.
1: for each j ∈ C do
2:   for each b = 1, 2, ..., m_j do
3:     compute ŷ_j[b] = Σ_{i=1}^N ŝ_j^i[b]
4:     compute y_j[b] = (ŷ_j[b] − N·f/2) / (1 − f)
5:   end for
6:   set H_j(Ω_j) = {H_j(ω) | ω ∈ Ω_j}
7: end for
8: set y = [y_1[1], ..., y_1[m_1] | y_2[1], ..., y_2[m_2] | ... | y_k[1], ..., y_k[m_k]]
9: set M = [H_1(Ω_1) × H_2(Ω_2) × ... × H_k(Ω_k)]
10: compute β = Lasso_regression(M, y)
11: return P(A_C) = β / N

Our Lasso-based estimation is described in Algorithm 2 and consists of the following four major steps.

1) After receiving all the randomized Bloom filters, for each bit b of each attribute j, the server counts the number of 1s as ŷ_j[b] = Σ_{i=1}^N ŝ_j^i[b].
2) The true count sum of each bit, y_j[b], can be estimated as y_j[b] = (ŷ_j[b] − Nf/2)/(1 − f), according to the randomized response applied to the true counts.
Fig. 4: Illustration of Lasso_JD

These count sums of all bits form a vector y of length Σ_{j=1}^{k} m_j.

3) To construct the features of the overall candidate set of attribute combinations ω_1...ω_k, the Bloom filters on each domain Ω_j are re-implemented by the server with the same hash functions H_j(·). Suppose all distinct Bloom filters on Ω_j are H_j(Ω_j) = {H_j(ω) | ω ∈ Ω_j}, where they are orthogonal to each other. The candidate set of Bloom filters is then M = [H_1(Ω_1) H_2(Ω_2) ... H_k(Ω_k)], and the members of M are still mutually orthogonal.

4) Fit a Lasso regression model to the counter vector y and the candidate matrix M, and then take the non-zero coefficients as the corresponding frequencies of each candidate string. By reshaping the coefficient vector into a k-dimensional matrix in natural order and dividing by N, we can derive the k-dimensional joint distribution estimate P(A_1A_2...A_k). For example, in Figure 4, we fit a linear regression to y_12 and the candidate matrix M to estimate the joint distribution P(A_1A_2).

Generally, the regression operation, the core of the estimation, loses accuracy only when there are many collisions between Bloom filter strings. However, as mentioned in Section 5.2.1, if there is no collision in the bit strings of each single dimension, then there is no collision in the conjuncted bit strings of different dimensions. In fact, the probability of collision in conjuncted bit strings does not increase with the number of dimensions. For example, suppose the collision rate of the Bloom filter in one dimension is p; then the collision rate decreases to p^k when we concatenate the bit strings of k dimensions. Therefore, we only need to choose proper m and h according to Equations (6) and (7) to lower the collision probability for each dimension, and then we are guaranteed a proper estimation for multiple dimensions.

Complexity: Compared with Algorithm 1, our Lasso-based estimation can effectively reduce the time and space complexity.
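To make steps 1)-4) concrete, the following Python sketch (illustrative names, not the paper's code; it assumes scikit-learn's Lasso and that the candidate Bloom-filter matrix M has already been rebuilt at the server) debiases the per-bit counts and fits a non-negative Lasso whose coefficients approximate the candidate counts:

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_jd(noisy_bits, M, f):
    """Sketch of Lasso_JD (steps 1-4).

    noisy_bits: (N x km) 0/1 matrix of randomized Bloom filters.
    M:          (km x #candidates) matrix; column c is the concatenated
                Bloom filter of candidate value combination c.
    f:          flipping probability of the randomized response.
    """
    N = noisy_bits.shape[0]
    y_hat = noisy_bits.sum(axis=0)            # step 1: per-bit counts of 1s
    y = (y_hat - N * f / 2.0) / (1.0 - f)     # step 2: debias the counts
    # steps 3-4: non-negative Lasso; the non-zero coefficients are the
    # estimated counts of the candidate strings (alpha is a tunable
    # regularization strength, chosen here purely for illustration)
    reg = Lasso(alpha=0.1, positive=True, fit_intercept=False)
    reg.fit(M, y)
    return reg.coef_ / N                      # normalize to a distribution
```

Dividing the recovered coefficients by N mirrors line 11 of Algorithm 2; positive=True encodes the fact that the coefficients approximate non-negative counts.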
Theorem 4: The time complexity of Algorithm 2 is

O(v^{3k} + kmv^{2k} + Nkm). (14)

Proof: Algorithm 2 involves two parts. To compute the bit counter vector, N bit strings, each of length km, are summed up, which incurs at most O(Nkm). Lasso regression with v^k candidates (the total domain size) and km samples (the length of the bit counter vector) has complexity O((v^k)^3 + (v^k)^2(km)).

Under the general assumption that N is the dominant variable, the complexity in Equation (14) is much less than that in Equation (12) of Theorem 2.

Theorem 5: The space complexity of Algorithm 2 is

O(Nkm + v^k km). (15)

Proof: In Algorithm 2, the storage overhead consists of three parts: the users' bit strings, O(Nkm); a count vector of size O(km); and the candidate bit matrix M of size O(kmv^k). Therefore, the overall space complexity of our proposed Lasso-based estimation algorithm is O(Nkm + km + v^k km) = O(Nkm + v^k km), which is also smaller than Equation (13), as N is dominant. The empirical results are shown in Section 6.

The efficiency comes from the fact that the N bit strings of length km are scanned only once to compute the count sums, and then a one-time Lasso regression is fitted to estimate the distribution. In addition, Lasso regression can extract the important (i.e., frequent) features with high probability, which fits well with the sparsity of high-dimensional data.

5.3.3 Hybrid Algorithm

Recall that, with sufficient samples, EM-based estimation demonstrates good convergence but also high complexity. On the other hand, Lasso-based estimation can be very efficient, with a slight accuracy deviation compared with the EM-based algorithm. The high complexity of the EM-based algorithm stems from two parts. First, it iteratively scans users' reports and builds a prior distribution table of size O(Nv^k); for each record of the table, one has to compare m_j bits.
However, when the dimension is high, the combination space Ω_1 × ... × Ω_k is very sparse and contains many zero items. Second, the uniformly random assignment of initial values leads to slow convergence.

To achieve a balance between EM-based and Lasso-based estimation, we propose a hybrid algorithm, Lasso+EM_JD (Algorithm 3), which first eliminates the redundant candidates and estimates the initial values with the Lasso-based algorithm, and then refines the estimate to convergence with the EM-based algorithm. The hybrid algorithm has two advantages:

1) The sparse candidates are pre-selected by the Lasso-based estimation algorithm. The EM-based algorithm therefore only needs to compute the conditional probabilities on these sparse candidates instead of all candidates, which greatly reduces both time and space complexity.

2) The Lasso-based algorithm gives a good initial estimate of the joint distribution. Compared with randomly assigned initial values, using the initial values estimated by the Lasso-based algorithm further boosts the convergence of the EM algorithm, which is sensitive to the initial values, especially when the candidate space is sparse.

Theorem 6: The time complexity of Algorithm 3 is

O((v^{3k} + kmv^{2k} + Nkm) + (tN(v′)^2 + Nkm·v′)), (16)

where v′ is the average number of sparse items in Ω_1 × ... × Ω_k, and v′ < v^k.
Algorithm 3 Lasso+EM k-dimensional Joint Distribution (Lasso+EM_JD)
Require: A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability.
Ensure: P(A_1A_2...A_k): k-dimensional joint distribution.
1: compute P′(ω_1ω_2...ω_k) = Lasso_JD(A_j, Ω_j, {ŝ^i_j}_{i=1}^{N}, f)
2: set C_0 = {x | x ∈ Ω_1 × ... × Ω_k, P′(x) = 0}.
3: for each i = 1,...,N do
4:   for each j = 1,...,k do
5:     compute P(ŝ^i_j | ω_j) = Π_{b=1}^{m_j} (f/2)^{ŝ^i_j[b]} (1 − f/2)^{1 − ŝ^i_j[b]}.
6:   end for
7:   if ω_1ω_2...ω_k ∈ C_0 then
8:     P(ŝ^i_1ŝ^i_2...ŝ^i_k | ω_1ω_2...ω_k) = 0
9:   else
10:    compute P(ŝ^i_1ŝ^i_2...ŝ^i_k | ω_1ω_2...ω_k) = Π_{j=1}^{k} P(ŝ^i_j | ω_j).
11:  end if
12: end for
13: initialize t = 0 /* number of iterations */
14: repeat
15:   ...
16:   /* (similar to Algorithm 1) */
17:   ...
18: until P_t(ω_1ω_2...ω_k) converges.
19: return P(A_1A_2...A_k) = P_t(ω_1ω_2...ω_k)

Proof: See Theorem 2 and Theorem 4; the only difference is that, after the Lasso-based estimation, only the sparse items in Ω_1 × ... × Ω_k are selected.

Theorem 7: The space complexity of Algorithm 3 is

O(Nkm + v^k km + 2Nv′). (17)

Proof: See Theorem 3 and Theorem 5.

5.4 Dimension Reduction with Local Privacy

5.4.1 Dimension Reduction via 2-dimensional Joint Distribution Estimation

The key to reducing the dimensionality of a high-dimensional dataset is to find compact clusters within which all attributes are tightly correlated with or dependent on each other. Inspired by [35], [5], but without spending extra privacy budget on dimension reduction, our dimension reduction based on locally once-for-all privacy-preserved data records consists of the following three steps:

1) Pairwise Correlation Computation. We use mutual information to measure pairwise correlations between attributes. The mutual information is calculated as

I_{m,n} = Σ_{i∈Ω_m} Σ_{j∈Ω_n} p_ij ln(p_ij / (p_i p_j)), (18)

where Ω_m and Ω_n are the domains of attributes A_m and A_n, respectively, and p_i and p_j represent the probability that A_m takes the i-th value in Ω_m and the probability that A_n takes the j-th value in Ω_n, respectively.
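A minimal sketch of the EM refinement stage of Algorithm 3, under the assumption that the candidate set has already been pruned to the columns Lasso_JD kept and that the conditional probabilities P(ŝ_i | candidate) were precomputed as in lines 3-12 (all names are illustrative, not the paper's code):

```python
import numpy as np


def em_refine(cond_prob, p0, tol=1e-6, max_iter=500):
    """EM iterations over the surviving sparse candidates.

    cond_prob: (N x C) matrix; cond_prob[i, c] = P(s_i | candidate c),
               with the columns of Lasso-pruned (zero-frequency)
               candidates already removed.
    p0:        initial distribution over the C candidates, taken from
               the Lasso_JD estimate instead of a uniform guess.
    """
    p = np.asarray(p0, dtype=float)
    p = p / p.sum()
    for _ in range(max_iter):
        w = cond_prob * p                    # E-step: unnormalized posteriors
        w = w / w.sum(axis=1, keepdims=True)
        p_new = w.mean(axis=0)               # M-step: average the posteriors
        if np.abs(p_new - p).max() < tol:
            return p_new
        p = p_new
    return p
```

Starting from the Lasso estimate rather than a uniform p0 is exactly the convergence boost described in advantage 2).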
Then, p_ij is their joint probability. In particular, p_ij can be efficiently obtained with our proposed multi-dimensional joint distribution estimation algorithms in Section 5.3, i.e., the hybrid estimation in Algorithm 3. Without loss of generality, the term JD refers to the multi-dimensional joint distribution estimation algorithms. As the corresponding marginal distributions, both p_i and p_j can then be learned from p_ij, or estimated as the 2-dimensional joint distribution of A_m (or A_n) with itself.

2) Dependency Graph Construction. A dependency graph can be used to depict the correlations among attributes. Each attribute A_j is a node in the dependency graph, and an edge between two nodes A_m and A_n indicates that attributes A_m and A_n are correlated. Based on mutual information, the dependency graph of attributes can be constructed as follows. First, an adjacency matrix G_{d×d} (the dependency graph of all d attributes) is initialized with all 0s. Then, all attribute pairs (A_m, A_n) are chosen to compare their mutual information with a threshold τ_{m,n}, which is defined as

τ_{m,n} = min(|Ω_m| − 1, |Ω_n| − 1) · φ^2 / 2, (19)

where φ (0 ≤ φ ≤ 1) is a flexible parameter determining the desired correlation level. Normally, φ = 0.2 represents the basic correlation level. G_{m,n} and G_{n,m} are both set to 1 if and only if I_{m,n} > τ_{m,n}.

3) Compact Cluster Building. By triangulation, the dependency graph G_{d×d} can be transformed into a junction tree, in which each node represents a clique of attributes. Then, based on the junction tree algorithm, several clusters C_1, C_2, ..., C_l can be obtained as compact clusters of attributes, in which the attributes are mutually correlated. Hence, the whole attribute set can be divided into several compact attribute clusters, and the number of dimensions can be effectively reduced. Detailed examples can be found in [28].
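Steps 1) and 2) can be sketched as follows, assuming the pairwise joint distributions have already been estimated by JD and are passed in as arrays (the dictionary layout is an illustrative assumption, not the paper's interface):

```python
import numpy as np


def dependency_graph(joints, domains, phi=0.2):
    """Build the 0/1 adjacency matrix G from pairwise mutual information.

    joints:  dict mapping (m, n), m < n, to the estimated
             |Omega_m| x |Omega_n| joint distribution of A_m and A_n.
    domains: domains[m] = |Omega_m|.
    phi:     dependency degree in [0, 1].
    """
    d = len(domains)
    G = np.zeros((d, d), dtype=int)
    for m in range(d - 1):
        for n in range(m + 1, d):
            p = joints[(m, n)]
            pi = p.sum(axis=1)              # marginal of A_m
            pj = p.sum(axis=0)              # marginal of A_n
            mask = p > 0                    # 0 * ln(0) = 0 convention
            I = float(np.sum(p[mask] * np.log(p[mask] / np.outer(pi, pj)[mask])))
            tau = min(domains[m] - 1, domains[n] - 1) * phi ** 2 / 2.0
            if I > tau:                     # threshold test of Equation (19)
                G[m, n] = G[n, m] = 1
    return G
```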
Algorithm 4 Dimension reduction with local privacy
Require: A_j: d attributes (1 ≤ j ≤ d); Ω_j: domain of A_j (1 ≤ j ≤ d); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ d); f: flipping probability; φ: dependency degree.
Ensure: C_1, C_2, ..., C_l: attribute index clusters
1: initialize G_{d×d} = 0.
2: for each j = 1,2,...,d do
3:   estimate P(A_j) by JD (i.e., Lasso+EM_JD, Algorithm 3)
4: end for
5: for each attribute m = 1,2,...,d−1 do
6:   for each attribute n = m+1, m+2,...,d do
7:     estimate P(A_m A_n) by JD
8:     compute I_{m,n} = Σ_{i∈Ω_m} Σ_{j∈Ω_n} p_ij ln(p_ij / (p_i p_j))
9:     compute τ_{m,n} = min(|Ω_m| − 1, |Ω_n| − 1) · φ^2 / 2
10:    if I_{m,n} > τ_{m,n} then
11:      set G_{m,n} = G_{n,m} = 1
12:    end if
13:  end for
14: end for
15: build the dependency graph with G_{d×d}
16: triangulate the dependency graph into a junction tree
17: split the junction tree into several cliques C_1, C_2, ..., C_l with the elimination algorithm.
18: return C = {C_1, C_2, ..., C_l}

Theorem 8: The time complexity of Algorithm 4 is

O(d^2(v^6 + 2mv^4 + 2Nm + tN(v′)^2 + 2Nm·v′)). (20)

Proof: The core of the dimension reduction process is the (d choose 2) runs of 2-dimensional joint distribution estimation. The complexity of each 2-dimensional joint distribution estimation can be derived from Equation (16) by adopting the hybrid algorithm (Algorithm 3) with k = 2. The complexity of building the junction tree on the d×d dependency graph is negligible compared with the joint distribution estimation.
Theorem 9: The space complexity of Algorithm 4 is

O(2Nm + 2v^2 m + 2Nv′). (21)

Proof: Whenever we compute the mutual correlation between a pair of attributes, a 2-dimensional joint distribution estimation is triggered, with space complexity O(2Nm + 2mv^2 + 2Nv′), obtained by substituting k = 2 into Equation (17). This maximum complexity dominates Algorithm 4. The space complexity of building the junction tree on the d×d dependency graph is negligible compared with the joint distribution estimation.

5.4.2 Entropy-based Pruning Scheme

In existing work [18], [32] on homogeneous data, correlations can simply be captured by distance or similarity metrics [36]. In our work, however, mutual information is used to measure general correlations, since heterogeneous attributes (i.e., attributes with different domains) are also considered. As shown in Equation (18), calculating the mutual information of variables X and Y inevitably requires the joint probability over their joint combinations, thus making the pairwise computation of dependency necessary. Although mutual information is already simpler than the Kendall rank coefficients used in the similar work [21], we also propose a pruning-based heuristic to speed up this pairwise correlation learning process. Intuitively, there are different situations in Algorithm 4:

1. When φ = 0 or φ = 1, all attributes will be considered mutually correlated or mutually independent, respectively. Thus, there is no need to compute pairwise correlations.

2. As φ increases (0 < φ < 1), fewer dependencies are included in the adjacency matrix G_{d×d} of the dependency graph, which becomes sparser. This also means that we may selectively neglect some pairs.

Inspired by the relationship between mutual information and information entropy (see footnote 2), we first heuristically filter out a portion of the attributes A_x with the least relative information entropy RH(A_x) = H(A_x)/|Ω_x|, and then verify the mutual information among the remaining attributes, thus reducing the number of pairwise computations.
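The filtering step can be sketched as follows; the paper specifies ranking by least relative entropy RH(A_x) = H(A_x)/|Ω_x|, while tying the kept fraction to (1 − φ) is an assumption borrowed from line 7 of Algorithm 5 (names are illustrative):

```python
import numpy as np


def prune_by_entropy(marginals, domains, phi):
    """Keep only high-relative-entropy attributes for pairwise checks.

    marginals: marginals[x] is the estimated distribution P(A_x)
               (e.g. from JD); domains[x] = |Omega_x|.
    phi:       dependency degree; a (1 - phi) fraction of attributes
               survives the pruning (illustrative assumption).
    """
    def rh(x):
        p = np.asarray(marginals[x], dtype=float)
        p = p[p > 0]                        # 0 * log(0) = 0 convention
        return -float(np.sum(p * np.log(p))) / domains[x]

    order = sorted(range(len(domains)), key=rh, reverse=True)
    keep = max(1, int(round(len(order) * (1 - phi))))
    return order[:keep]   # only these indices enter the pairwise MI loop
```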
Furthermore, the adjacency matrix G_{d×d} of the dependency graph varies across datasets. For example, the adjacency matrix G_{d×d} is rarely sparse in binary datasets but very sparse in non-binary datasets. Based on this observation, we can further simplify the calculation by searching for independence in binary datasets and for dependence in non-binary datasets. Specifically, for a binary dataset, we first set all entries of G_{d×d} to 1s and start from the attributes with the least relative information entropy RH(A_x) = H(A_x)/|Ω_x| to find the uncorrelated attributes. For a non-binary dataset, we first set G_{d×d} to 0s and then start from the attributes with the largest average entropy to find the correlated attributes.

5.5 Synthesizing New Dataset

For brevity, we first define A_C = {A_j | j ∈ C} and X̂_C = {x_j | j ∈ C}. The process of synthesizing the new dataset via sampling is then shown in Algorithm 6.

2. The relationship between mutual information and information entropy can be represented as I(X;Y) = H(X) + H(Y) − H(X,Y), where H(X) and H(X,Y) denote the information entropy of variable X and the joint entropy of X and Y, respectively.

Algorithm 5 Entropy-based Pruning Scheme
Require: A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability; φ: dependency degree.
Ensure: G_{d×d}: adjacency matrix of the dependency graph of attributes A_j (j = 1,2,...,d)
1: initialize G_{d×d} = 0
2: for each j = 1,2,...,k do
3:   compute P(A_j) = JD(A_j, Ω_j, {ŝ^i_j}_{i=1}^{N}, f)
4:   compute RH(A_j) = −(1/|Ω_j|) Σ_{p∈P(A_j)} p log p
5: end for
6: sort list_A = {A_1, A_2, ..., A_d} according to entropy H(A_j)
7: pick the first length(list_A) × (1 − φ) items from list_A as a new list list_A′
8: ...
9: compute the pairwise mutual information among list_A′ and set the dependency graph G_{d×d} as in Algorithm 4.
10: return G_{d×d}

Algorithm 6 New Dataset Synthesizing
Require: C: a collection of attribute index clusters C_1, ..., C_l; A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability.
Ensure: X̂: synthetic dataset of X
1: initialize R = ∅
2: repeat
3:   randomly choose an attribute index cluster C ∈ C
4:   estimate the joint distribution P(A_C) by JD
5:   sample X̂_C according to P(A_C)
6:   C = C \ C, R = R ∪ C, D = {D ∈ C | D ∩ R ≠ ∅}
7:   for each D ∈ D do
8:     estimate the joint distribution P(A_D) by JD
9:     obtain the conditional distribution P(A_{D\R} | A_{D∩R}) from P(A_D)
10:    sample X̂_{D\R} according to P(A_{D\R} | A_{D∩R}) and X̂_{D∩R}
11:    C = C \ D, R = R ∪ D, D = {D ∈ C | D ∩ R ≠ ∅}
12:  end for
13: until C = ∅
14: return X̂

We first initialize a set R to keep the sampled attribute indexes. Then, we randomly choose an attribute index cluster C, estimate its joint distribution, and sample new data X̂_C on the attributes A_j, j ∈ C. Next, we move C from the cluster collection C into R, and find the connected component D of C. Each cluster D in the connected component is traversed and sampled as follows: first, estimate the joint distribution over the attributes A_D with our proposed distribution estimation algorithms and obtain the conditional distribution P(A_{D\R} | A_{D∩R}); then, sample X̂_{D\R} according to this conditional distribution and the already sampled data X̂_{D∩R}. After the traversal of D, the attributes in the first connected component have all been sampled. We then randomly choose a cluster from the remaining C to sample the attributes in the second connected component, and so on, until all clusters are sampled. Finally, a new synthetic dataset X̂ is generated according to the correlations and distributions estimated from the original dataset X.

Theorem 10: The time complexity of Algorithm 6 is

O(l(v^{3k} + kmv^{2k} + Nkm + tN(v′)^2 + Nkm·v′)), (22)

where l is the number of clusters after dimension reduction and k here refers to the average number of dimensions in these clusters.
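The per-cluster sampling step of Algorithm 6 (line 5) can be sketched as below; sampling X̂_{D\R} on line 10 would reuse the same routine on the renormalized conditional slice of P(A_D). All names are illustrative:

```python
import numpy as np


def sample_cluster(joint, candidates, n, rng=None):
    """Draw n synthetic records for one attribute cluster.

    joint:      estimated probability vector over the cluster's candidate
                value combinations (may be slightly noisy/unnormalized).
    candidates: candidates[c] is the value combination with probability
                joint[c].
    """
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.clip(np.asarray(joint, dtype=float), 0.0, None)
    p = p / p.sum()                  # renormalize the noisy estimate
    idx = rng.choice(len(candidates), size=n, p=p)
    return [candidates[i] for i in idx]
```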
Fig. 5: Main procedures of high-dimensional data publishing with ε-local privacy

Proof: The core of the dataset synthesizing is actually multiple (l) runs of k-dimensional joint distribution estimation.

Theorem 11: The space complexity of Algorithm 6 is

O(Nkm + v^k km + 2Nv′ + Nd). (23)

Proof: Each time, a k-dimensional joint distribution estimation algorithm (with space complexity O(Nkm + v^k km + 2Nv′)) is run to draw new data. A new dataset of size O(Nd) is maintained during synthesis.

The overall process of LoPub is summarized in Figure 5. Clearly, all the processing is conducted on locally privacy-preserved data. Therefore, compared with the existing non-local privacy schemes in Figure 1, LoPub provides a consistent local privacy guarantee for all crowdsourced users, thus avoiding insider attacks and multiple assignments of the privacy budget.

6 EVALUATION

In this section, we conducted extensive experiments on real datasets to demonstrate the efficiency of our algorithms in terms of computation time and accuracy. We used three real-world datasets: Retail [1], Adult [4], and TPC-E [2]. Retail is part of a retail market basket dataset; each record contains the distinct items purchased in one shopping visit. Adult is extracted from the 1994 US Census and contains personal information such as gender, salary, and education level. TPC-E contains trade records from the Trade type, Security, and Security status tables of the TPC-E benchmark. It should be noted that some continuous domains were binned in preprocessing for simplicity.

Datasets  Type     #Records (N)  #Attributes (d)  Domain Size
Retail    Binary   27,...        ...              ...
Adult     Integer  45,...        ...              ...
TPC-E     Mixed    4,...         ...              ...

All the experiments were run on a machine with an Intel Core i5-5200U CPU at 2.2GHz and 8GB RAM, running Windows 7. We simulated the crowdsourced environment as follows. First, users read each data record individually and locally transform it into privacy-preserving bit strings.
Then, the crowdsourced bit strings are gathered by the central server for synthesizing and publishing the high-dimensional dataset. LoPub can be realized by combining the distribution estimation and data synthesizing techniques. Thus, we implemented different LoPub realizations in Python 2.7 with the following three strategies.

1) EM_JD, the generalized EM-based multivariate joint distribution estimation algorithm.

2) Lasso_JD, our proposed Lasso-based multivariate joint distribution estimation algorithm.

3) Lasso+EM_JD, our proposed hybrid estimation algorithm, which uses Lasso_JD to filter out candidates to reduce the complexity and to provide the initial values to boost the convergence of EM_JD.

It is worth mentioning that we compared only the above algorithms, since our algorithm adopts a novel local privacy paradigm for high-dimensional data. Other competitors either target non-local privacy [5], [35], [21] or low-dimensional data [12], [14], [16], and are therefore not comparable. For fair comparison, we randomly chose 10 combinations of k attributes from the d-dimensional data. For simplicity, we sampled 30%-50% of the data from the Retail dataset and 10% of the data from the Adult and TPC-E datasets, respectively.

The efficiency of our algorithms is measured by computation time and accuracy. The computation time includes CPU time and IO cost. Each set of experiments was run 10 times, and the average running time is reported. To measure accuracy, we used the AVD (average variant distance) metric on the three datasets, as suggested in [5], to quantify the closeness between the estimated joint distribution P(ω) and the original joint distribution Q(ω). The AVD error is defined as

Dist_AVD(P,Q) = (1/2) Σ_{ω∈Ω} |P(ω) − Q(ω)|. (24)

The default parameters are as follows. For the binary dataset Retail, the maximum number of bits and the number of hash functions used in the Bloom filter are m = 32 and h = 4, respectively.
For the non-binary datasets Adult and TPC-E, the maximum number of bits and the number of hash functions used in the Bloom filter are m = 128 and h = 4, respectively. The convergence gap is set to 0.1 for fast convergence.

6.1 Multivariate Distribution Estimation

Here, we show the performance of our proposed distribution estimation algorithms in terms of both efficiency and effectiveness. The efficiency is measured by computation time, and the effectiveness by estimation accuracy.

6.1.1 Computation Time

We first evaluate the computation time of EM_JD, Lasso_JD, and Lasso+EM_JD for k-dimensional joint distribution estimation on the three real datasets. Figures 6 and 7 compare the computation time on the binary dataset Retail for k = 3 and k = 5. It can be seen that, for each dimension k, Lasso_JD is consistently much faster than EM_JD and Lasso+EM_JD, especially when k is large. This is because EM_JD has to repeatedly scan each user's bit string. In particular, the time consumption of EM_JD increases with f, because more iterations are needed to reach the fixed convergence gap. In contrast, Lasso_JD uses regression to estimate the joint distribution more efficiently. Furthermore, the complexity of Lasso+EM_JD is much less than that of EM_JD, as the initial estimate from Lasso_JD can greatly reduce the candidate attribute space and the number of iterations needed. When k grows, the computation time of Lasso_JD increases slowly, unlike EM_JD, which increases dramatically. This is because the

3. It should be noted that, with sampled data, the differential privacy level can be further enhanced [23]. But sampling is used here only for simplicity.
More information3770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 24, NO. 6, DECEMBER Muhammad Shahzad and Alex X. Liu
3770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 24, NO. 6, DECEMBER 2016 Fast and Reliable Detection and Identiication o Missing RFID Tags in the Wild Abstract Radio-requency identiication RFID) systems
More informationEstimation and detection of a periodic signal
Estimation and detection o a periodic signal Daniel Aronsson, Erik Björnemo, Mathias Johansson Signals and Systems Group, Uppsala University, Sweden, e-mail: Daniel.Aronsson,Erik.Bjornemo,Mathias.Johansson}@Angstrom.uu.se
More informationA Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators.
A Systematic Approach to Frequency Compensation o the Voltage Loop in oost PFC Pre- regulators. Claudio Adragna, STMicroelectronics, Italy Abstract Venable s -actor method is a systematic procedure that
More informationReceived: 30 July 2017; Accepted: 29 September 2017; Published: 8 October 2017
mathematics Article Least-Squares Solution o Linear Dierential Equations Daniele Mortari ID Aerospace Engineering, Texas A&M University, College Station, TX 77843, USA; mortari@tamu.edu; Tel.: +1-979-845-734
More informationHao Ren, Wim J. van der Linden and Qi Diao
psychometrika vol. 82, no. 2, 498 522 June 2017 doi: 10.1007/s11336-017-9553-1 CONTINUOUS ONLINE ITEM CALIBRATION: PARAMETER RECOVERY AND ITEM UTILIZATION Hao Ren, Wim J. van der Linden and Qi Diao PACIFIC
More informationBenny Pinkas Bar Ilan University
Winter School on Bar-Ilan University, Israel 30/1/2011-1/2/2011 Bar-Ilan University Benny Pinkas Bar Ilan University 1 Extending OT [IKNP] Is fully simulatable Depends on a non-standard security assumption
More informationAdditional exercises in Stationary Stochastic Processes
Mathematical Statistics, Centre or Mathematical Sciences Lund University Additional exercises 8 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
More informationPower Spectral Analysis of Elementary Cellular Automata
Power Spectral Analysis o Elementary Cellular Automata Shigeru Ninagawa Division o Inormation and Computer Science, Kanazawa Institute o Technology, 7- Ohgigaoka, Nonoichi, Ishikawa 92-850, Japan Spectral
More informationNONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS
REVSTAT Statistical Journal Volume 16, Number 2, April 2018, 167 185 NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS Authors: Frank P.A. Coolen Department
More informationTelescoping Decomposition Method for Solving First Order Nonlinear Differential Equations
Telescoping Decomposition Method or Solving First Order Nonlinear Dierential Equations 1 Mohammed Al-Reai 2 Maysem Abu-Dalu 3 Ahmed Al-Rawashdeh Abstract The Telescoping Decomposition Method TDM is a new
More informationTHE use of radio frequency channels assigned to primary. Traffic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks
EEE JOURNAL ON SELECTED AREAS N COMMUNCATONS, VOL. X, NO. X, X 01X 1 Traic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks Chun-Hao Liu, Jason A. Tran, Student Member, EEE, Przemysław Pawełczak,
More information(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)
Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not
More informationOn the Girth of (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs
On the Girth o (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs Sudarsan V S Ranganathan, Dariush Divsalar and Richard D Wesel Department o Electrical Engineering, University o Caliornia, Los
More informationAH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge
0 E ttenuator Pair Ratio or vs requency NEEN-ERLN 700 Option-E 0-0 k Ultra-precision apacitance/loss ridge ttenuator Ratio Pair Uncertainty o in ppm or ll Usable Pairs o Taps 0 0 0. 0. 0. 07/08/0 E E E
More informationBetter Than Advertised: Improved Collision-Resistance Guarantees for MD-Based Hash Functions
Better Than Advertised: Improved Collision-Resistance Guarantees or MD-Based Hash Functions Mihir Bellare University o Caliornia San Diego La Jolla, Caliornia mihir@eng.ucsd.edu Joseph Jaeger University
More informationObjectives. By the time the student is finished with this section of the workbook, he/she should be able
FUNCTIONS Quadratic Functions......8 Absolute Value Functions.....48 Translations o Functions..57 Radical Functions...61 Eponential Functions...7 Logarithmic Functions......8 Cubic Functions......91 Piece-Wise
More informationFinite Dimensional Hilbert Spaces are Complete for Dagger Compact Closed Categories (Extended Abstract)
Electronic Notes in Theoretical Computer Science 270 (1) (2011) 113 119 www.elsevier.com/locate/entcs Finite Dimensional Hilbert Spaces are Complete or Dagger Compact Closed Categories (Extended bstract)
More informationIntroduction to Simulation - Lecture 2. Equation Formulation Methods. Jacob White. Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy
Introduction to Simulation - Lecture Equation Formulation Methods Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Formulating Equations rom Schematics Struts and Joints
More informationProvably Secure Double-Block-Length Hash Functions in a Black-Box Model
Provably Secure Double-Block-ength Hash Functions in a Black-Box Model Shoichi Hirose Graduate School o Inormatics, Kyoto niversity, Kyoto 606-8501 Japan hirose@i.kyoto-u.ac.jp Abstract. In CRYPTO 89,
More informationConstrained Keys for Invertible Pseudorandom Functions
Constrained Keys or Invertible Pseudorandom Functions Dan Boneh, Sam Kim, and David J. Wu Stanord University {dabo,skim13,dwu4}@cs.stanord.edu Abstract A constrained pseudorandom unction (PRF) is a secure
More informationEquidistant Polarizing Transforms
DRAFT 1 Equidistant Polarizing Transorms Sinan Kahraman Abstract arxiv:1708.0133v1 [cs.it] 3 Aug 017 This paper presents a non-binary polar coding scheme that can reach the equidistant distant spectrum
More informationSupplementary Information Reconstructing propagation networks with temporal similarity
Supplementary Inormation Reconstructing propagation networks with temporal similarity Hao Liao and An Zeng I. SI NOTES A. Special range. The special range is actually due to two reasons: the similarity
More informationA Particle Swarm Optimization Algorithm for Neighbor Selection in Peer-to-Peer Networks
A Particle Swarm Optimization Algorithm or Neighbor Selection in Peer-to-Peer Networks Shichang Sun 1,3, Ajith Abraham 2,4, Guiyong Zhang 3, Hongbo Liu 3,4 1 School o Computer Science and Engineering,
More informationThe concept of limit
Roberto s Notes on Dierential Calculus Chapter 1: Limits and continuity Section 1 The concept o limit What you need to know already: All basic concepts about unctions. What you can learn here: What limits
More informationLectures 1&2: Introduction to Secure Computation, Yao s and GMW Protocols
CS 294 Secure Computation January 19, 2016 Lectures 1&2: Introduction to Secure Computation, Yao s and GMW Protocols Instructor: Sanjam Garg Scribe: Pratyush Mishra 1 Introduction Secure multiparty computation
More informationarxiv: v1 [cs.ds] 3 Feb 2018
A Model for Learned Bloom Filters and Related Structures Michael Mitzenmacher 1 arxiv:1802.00884v1 [cs.ds] 3 Feb 2018 Abstract Recent work has suggested enhancing Bloom filters by using a pre-filter, based
More informationELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables
Department o Electrical Engineering University o Arkansas ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables Dr. Jingxian Wu wuj@uark.edu OUTLINE 2 Two discrete random variables
More informationComputing proximal points of nonconvex functions
Mathematical Programming manuscript No. (will be inserted by the editor) Warren Hare Claudia Sagastizábal Computing proximal points o nonconvex unctions the date o receipt and acceptance should be inserted
More informationA Simple Explanation of the Sobolev Gradient Method
A Simple Explanation o the Sobolev Gradient Method R. J. Renka July 3, 2006 Abstract We have observed that the term Sobolev gradient is used more oten than it is understood. Also, the term is oten used
More informationA Method for Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model
APRIL 2006 S A L M A N E T A L. 1081 A Method or Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model H. SALMAN, L.KUZNETSOV, AND C. K. R. T. JONES Department o Mathematics, University
More informationStochastic Game Approach for Replay Attack Detection
Stochastic Game Approach or Replay Attack Detection Fei Miao Miroslav Pajic George J. Pappas. Abstract The existing tradeo between control system perormance and the detection rate or replay attacks highlights
More informationDesign and Optimal Configuration of Full-Duplex MAC Protocol for Cognitive Radio Networks Considering Self-Interference
Received November 8, 015, accepted December 10, 015, date o publication December 17, 015, date o current version December 8, 015. Digital Object Identiier 10.1109/ACCE.015.509449 Design and Optimal Coniguration
More informationPercentile Policies for Inventory Problems with Partially Observed Markovian Demands
Proceedings o the International Conerence on Industrial Engineering and Operations Management Percentile Policies or Inventory Problems with Partially Observed Markovian Demands Farzaneh Mansouriard Department
More informationThe Deutsch-Jozsa Problem: De-quantization and entanglement
The Deutsch-Jozsa Problem: De-quantization and entanglement Alastair A. Abbott Department o Computer Science University o Auckland, New Zealand May 31, 009 Abstract The Deustch-Jozsa problem is one o the
More informationFeedback Linearization
Feedback Linearization Peter Al Hokayem and Eduardo Gallestey May 14, 2015 1 Introduction Consider a class o single-input-single-output (SISO) nonlinear systems o the orm ẋ = (x) + g(x)u (1) y = h(x) (2)
More informationSolving Multi-Mode Time-Cost-Quality Trade-off Problem in Uncertainty Condition Using a Novel Genetic Algorithm
International Journal o Management and Fuzzy Systems 2017; 3(3): 32-40 http://www.sciencepublishinggroup.com/j/ijms doi: 10.11648/j.ijms.20170303.11 Solving Multi-Mode Time-Cost-Quality Trade-o Problem
More informationPublished in the American Economic Review Volume 102, Issue 1, February 2012, pages doi: /aer
Published in the American Economic Review Volume 102, Issue 1, February 2012, pages 594-601. doi:10.1257/aer.102.1.594 CONTRACTS VS. SALARIES IN MATCHING FEDERICO ECHENIQUE Abstract. Firms and workers
More informationChapter 2. Basic concepts of probability. Summary. 2.1 Axiomatic foundation of probability theory
Chapter Basic concepts o probability Demetris Koutsoyiannis Department o Water Resources and Environmental Engineering aculty o Civil Engineering, National Technical University o Athens, Greece Summary
More informationOn Security Arguments of the Second Round SHA-3 Candidates
On Security Arguments o the Second Round SA-3 Candidates Elena Andreeva Andrey Bogdanov Bart Mennink Bart Preneel Christian Rechberger March 19, 2012 Abstract In 2007, the US National Institute or Standards
More informationThe Analysis of Electricity Storage Location Sites in the Electric Transmission Grid
Proceedings o the 2010 Industrial Engineering Research Conerence A. Johnson and J. Miller, eds. The Analysis o Electricity Storage Location Sites in the Electric Transmission Grid Thomas F. Brady College
More informationBANDELET IMAGE APPROXIMATION AND COMPRESSION
BANDELET IMAGE APPOXIMATION AND COMPESSION E. LE PENNEC AND S. MALLAT Abstract. Finding eicient geometric representations o images is a central issue to improve image compression and noise removal algorithms.
More informationCMPUT651: Differential Privacy
CMPUT65: Differential Privacy Homework assignment # 2 Due date: Apr. 3rd, 208 Discussion and the exchange of ideas are essential to doing academic work. For assignments in this course, you are encouraged
More informationA Brief Survey on Semi-supervised Learning with Graph Regularization
000 00 002 003 004 005 006 007 008 009 00 0 02 03 04 05 06 07 08 09 020 02 022 023 024 025 026 027 028 029 030 03 032 033 034 035 036 037 038 039 040 04 042 043 044 045 046 047 048 049 050 05 052 053 A
More informationCommon Errors: How to (and Not to) Control for Unobserved Heterogeneity *
Common Errors: How to (and ot to) Control or Unobserved Heterogeneity * Todd A. Gormley and David A. Matsa June 6 0 Abstract Controlling or unobserved heterogeneity (or common errors ) such as industry-speciic
More informationarxiv:quant-ph/ v2 12 Jan 2006
Quantum Inormation and Computation, Vol., No. (25) c Rinton Press A low-map model or analyzing pseudothresholds in ault-tolerant quantum computing arxiv:quant-ph/58176v2 12 Jan 26 Krysta M. Svore Columbia
More informationA PROBABILISTIC POWER DOMAIN ALGORITHM FOR FRACTAL IMAGE DECODING
Stochastics and Dynamics, Vol. 2, o. 2 (2002) 6 73 c World Scientiic Publishing Company A PROBABILISTIC POWER DOMAI ALGORITHM FOR FRACTAL IMAGE DECODIG V. DRAKOPOULOS Department o Inormatics and Telecommunications,
More informationSemideterministic Finite Automata in Operational Research
Applied Mathematical Sciences, Vol. 0, 206, no. 6, 747-759 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/0.2988/ams.206.62 Semideterministic Finite Automata in Operational Research V. N. Dumachev and
More informationCORRESPONDENCE ANALYSIS
CORRESPONDENCE ANALYSIS INTUITIVE THEORETICAL PRESENTATION BASIC RATIONALE DATA PREPARATION INITIAL TRANSFORAMATION OF THE INPUT MATRIX INTO PROFILES DEFINITION OF GEOMETRIC CONCEPTS (MASS, DISTANCE AND
More informationRESOLUTION MSC.362(92) (Adopted on 14 June 2013) REVISED RECOMMENDATION ON A STANDARD METHOD FOR EVALUATING CROSS-FLOODING ARRANGEMENTS
(Adopted on 4 June 203) (Adopted on 4 June 203) ANNEX 8 (Adopted on 4 June 203) MSC 92/26/Add. Annex 8, page THE MARITIME SAFETY COMMITTEE, RECALLING Article 28(b) o the Convention on the International
More informationReliability Assessment with Correlated Variables using Support Vector Machines
Reliability Assessment with Correlated Variables using Support Vector Machines Peng Jiang, Anirban Basudhar, and Samy Missoum Aerospace and Mechanical Engineering Department, University o Arizona, Tucson,
More informationEstimation of Sample Reactivity Worth with Differential Operator Sampling Method
Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.842-850 (2011) ARTICLE Estimation o Sample Reactivity Worth with Dierential Operator Sampling Method Yasunobu NAGAYA and Takamasa MORI Japan Atomic
More informationBasic properties of limits
Roberto s Notes on Dierential Calculus Chapter : Limits and continuity Section Basic properties o its What you need to know already: The basic concepts, notation and terminology related to its. What you
More informationProbabilistic Optimisation applied to Spacecraft Rendezvous on Keplerian Orbits
Probabilistic Optimisation applied to pacecrat Rendezvous on Keplerian Orbits Grégory aive a, Massimiliano Vasile b a Université de Liège, Faculté des ciences Appliquées, Belgium b Dipartimento di Ingegneria
More informationSimpler Functions for Decompositions
Simpler Functions or Decompositions Bernd Steinbach Freiberg University o Mining and Technology, Institute o Computer Science, D-09596 Freiberg, Germany Abstract. This paper deals with the synthesis o
More informationAggregate Growth: R =αn 1/ d f
Aggregate Growth: Mass-ractal aggregates are partly described by the mass-ractal dimension, d, that deines the relationship between size and mass, R =αn 1/ d where α is the lacunarity constant, R is the
More informationComptes rendus de l Academie bulgare des Sciences, Tome 59, 4, 2006, p POSITIVE DEFINITE RANDOM MATRICES. Evelina Veleva
Comtes rendus de l Academie bulgare des ciences Tome 59 4 6 353 36 POITIVE DEFINITE RANDOM MATRICE Evelina Veleva Abstract: The aer begins with necessary and suicient conditions or ositive deiniteness
More informationSEPARATED AND PROPER MORPHISMS
SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN Last quarter, we introduced the closed diagonal condition or a prevariety to be a prevariety, and the universally closed condition or a variety to be complete.
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More information