LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy

Xuebin Ren, Chia-Mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie A. McCann, and Philip S. Yu

Abstract. High-dimensional crowdsourced data collected from numerous users produces rich knowledge for our society. However, it also brings unprecedented privacy threats to the participants. Local privacy, a variant of differential privacy, has been proposed to eliminate such privacy concerns. Unfortunately, achieving local privacy on high-dimensional crowdsourced data raises great challenges in terms of both computational efficiency and effectiveness. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms that maintain local privacy. Then, we develop LoPub, a Locally privacy-preserving high-dimensional data Publication algorithm, by taking advantage of our distribution estimation techniques. In particular, both correlations and joint distributions among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus achieving both efficiency and effectiveness in high-dimensional data publication. To the best of our knowledge, this is the first work addressing high-dimensional crowdsourced data publication with local privacy. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed, and confirm that our LoPub scheme can keep, on average, 80% and 60% accuracy over the published approximate datasets in terms of SVM and random forest classification, respectively.
Index Terms: Local privacy, high-dimensional data, crowdsourced data, data publication

1 INTRODUCTION

With the development of various integrated sensors and crowd sensing systems [19], crowdsourced information from all aspects can be collected and analyzed to better produce rich knowledge about the group, which can benefit everyone in the crowdsourced system [2]. Particularly, with multi-dimensional crowdsourced data (data with multiple attributes), a lot of potential information and patterns behind the data can be mined or extracted to provide accurate dynamics and reliable prediction for both groups and individuals. However, the participants' privacy can still be easily inferred or identified due to the publication of crowdsourced data [15], [33], especially high-dimensional data, even when some existing privacy-preserving schemes and end-to-end encryption are used. The reasons for privacy leaks are two-fold:

Non-local Privacy. Most existing solutions for privacy protection focus on centralized datasets under the assumption that the server is trusted. However, despite the privacy protection against difference and inference attacks from aggregate queries, an individual's data may still suffer from privacy leakage before aggregation because of the lack of local privacy [17], [7] on the user side.

Curse of High-dimensionality. With the increase of data dimensions, some existing privacy-preserving techniques like differential privacy [8], if straightforwardly applied to multiple attributes with high correlations, will become vulnerable [25], [35], thereby increasing the success ratio of many inference attacks like cross-checking. Even worse, according

X. Ren, S. Yang, and X. Yang are with Xi'an Jiaotong University. E-mails: {xb.ren@stu, shusenyang@mail, yxyphd@mail}.xjtu.edu.cn
C.-M. Yu is with National Chung Hsing University. E-mail: chimayu@gmail.com
W. Yu is with Imperial College London and Aston University. E-mails: weiren.yu@imperial.ac.uk, w.yu3@aston.ac.uk
J. McCann is with Imperial College London. E-mail:
j.mccann@imperial.ac.uk
P. Yu is with University of Illinois at Chicago. E-mail: psyu@uic.edu

to the composition theorem [26], differential privacy degrades exponentially when multiple correlated queries are processed. In addition to this privacy vulnerability, the large scale of the various data records collected from many distributed users can exacerbate the inefficiency of data processing. Especially in IoT applications, the ubiquitous but resource-constrained sensors require extremely high efficiency and low overhead. For example, privacy-preserving real-time pricing mechanisms require not only effective privacy guarantees for individuals' electricity usage but also fast response to the dynamic changes of demand and supply in the smart grid [24]. Thus, it is important to provide an efficient privacy-preserving method to publish crowdsourced high-dimensional data.

Contributions. To address the above concerns, this paper makes the following contributions.

We are the first to address the problem of high-dimensional crowdsourced data publication with local privacy, to the best of our knowledge.

We propose a locally privacy-preserving scheme for crowdsensing systems to collect and build high-dimensional data from distributed users. Particularly, differential privacy is directly achieved for each distributed user. Then, based on EM and Lasso regression, we propose efficient algorithms for multivariate joint distribution estimation.

By taking advantage of specific marginal distributions from the locally privacy-preserved data after dimensionality and sparsity reduction, we propose the LoPub solution that can generate an approximation of the original crowdsourced data with the guarantee of local privacy.

We implemented and evaluated our schemes on real-world datasets. Experimental results confirm the efficiency and effectiveness of our proposed distribution estimation and data release mechanisms.
Due to the page limit, some detailed examples and explanations that are not presented in this paper can be found in our full-length preprint technical report [28].

Fig. 1: Main procedures of high-dimensional data publishing with non-local (ǫ = ǫ1 + ǫ2) privacy

2 RELATED WORK

2.1 Privacy in Centralized Setting

Differential privacy [8] forms a mathematical foundation for privacy protection by imposing proper randomness on statistical query results. Examples of the use of differential privacy include privacy-preserving data aggregation, where the differential privacy of individuals can be guaranteed by injecting carefully-calibrated Laplacian noise [5], [13], [18], [22], [35]. For privacy-preserving low-dimensional data publication, to show crowd statistics and draw the correlations between attributes, both the differentially privacy-preserving histogram (univariate distribution) [3] and contingency table [27] are widely investigated. However, the techniques for non-interactive differential privacy [9], [1] in these works suffer from the curse of dimensionality [35], [5]. Particularly, the composition theorems [26] have pointed out that the privacy levels degrade when multiple related queries are processed. To deal with the correlations in high-dimensional data, different schemes (e.g., approximations via low-dimensional data clusters) have been proposed [5], [6], [18], [21], [32], [35]. Among them, the state-of-the-art scheme [5] proposed to reduce the dimension by using a junction tree to model the correlations. Moreover, Su et al. [31] proposed a multi-party setting to publish a synthetic dataset from multiple data curators. However, their multi-party computation can only protect privacy between data servers, and individuals' local privacy cannot be guaranteed. Due to the lack of a local privacy guarantee, these works, as summarized in Figure 1, may be exposed to insider attackers, and thus cannot be directly applied to crowdsourced systems.

2.2 Privacy in Distributed Setting

The schemes mentioned above mainly deal with centralized datasets. Nonetheless, there could be scenarios where distributed users contribute to the aggregate statistics.
Despite the privacy protection against difference and inference attacks from aggregate queries, an individual's data may also suffer from privacy leakage before aggregation [11]. Hence, local privacy [7], [16], [17] has been proposed to provide privacy guarantees for distributed users. In addition, local privacy from the end user can ensure the consistency of the privacy guarantees when there are multiple accesses to users' data, in contrast to non-local privacy schemes that have to properly split and assign privacy budgets to different steps [5], [21], [35]. In existing work [15], [12], [14], local privacy is implemented with the randomized response technique [34]. However, the correlations and sparsity in high-dimensional data are not well considered, which causes low scalability and utility for high-dimensional data [25], [35].

Fig. 2: An architecture of distributed high-dimensional private data collection and publication

Different from these works, we propose a novel mechanism to publish high-dimensional crowdsourced data with local privacy for individuals. We compare our work with three similar existing solutions in Table 1. More specifically, our method has lower communication costs and lower time and storage complexity, compared to state-of-the-art approaches.

TABLE 1: Comparison of LoPub with existing methods

Comparison        | LoPub (ours)  | RAPPOR [12]   | EM [14]       | JTree [5]
Local privacy     | Y             | Y             | Y             | N
High dimension    | Y             | N             | N             | Y
Communication     | O(Σ_j |Ω_j|)  | O(Π_j |Ω_j|)  | O(Π_j |Ω_j|)  | -
Time complexity   | Low           | Large         | Large         | -
Space complexity  | Low           | Large         | Large         | -

|Ω_j| is the domain size of the j-th dimension.

3 SYSTEM MODEL

Our system model is depicted in Figure 2, where a number of users and a central server constitute a crowdsourcing system. The users generate multi-dimensional data records, and then send these data to the central server.
The server gathers all the data and estimates the high-dimensional crowdsourced data distribution with local privacy, aiming to release a privacy-preserving dataset to third parties for conducting data analysis. In this paper, we mainly focus on data privacy, and thus the detailed network model is omitted.

Problem Statement. Given a collection of data records with d attributes from different users, our goal is to help the central server publish a synthetic dataset that has the approximate joint distribution of the d attributes with local privacy. Formally, let N be the total number of users (i.e., data records [1]) and sufficiently large. Let X = {X^1, X^2, ..., X^N} be the crowdsourced dataset, where X^i denotes the data record from the ith user. We assume that there are d attributes A = {A_1, A_2, ..., A_d} in X. Then each data record X^i can be represented as X^i = {x^i_1, x^i_2, ..., x^i_d}, where x^i_j denotes the jth element of the ith user's record. For each attribute A_j (j = 1, 2, ..., d), we denote Ω_j = {ω_j^1, ω_j^2, ..., ω_j^{|Ω_j|}} as the domain of A_j, where ω_j^i is the ith possible attribute value of A_j and |Ω_j| is the cardinality of Ω_j. With the above notations, our problem can be formulated as follows. Given a dataset X with local privacy, we aim to release an approximate dataset X' with the same attributes A and N user records such that

P_X(A_1 ... A_d) ≈ P_X'(A_1 ... A_d),   (1)

[1] For brevity, we assume that each user sends only one data record to the central server.

where P_X(A_1 ... A_d) ≜ P_X(x^i_1 = ω_1, ..., x^i_d = ω_d), for i = 1, ..., N and ω_1 ∈ Ω_1, ..., ω_d ∈ Ω_d, is defined as the d-dimensional joint distribution on X.

To focus our research on data privacy, we assume that the central server and users are all honest-but-curious, in the sense that they will honestly follow the protocols in the system without maliciously manipulating their received data. However, they may be curious about others' data and may even collude to infer others' data. In addition, the central server and users share the same public information, such as the privacy-preserving protocols (including the hash functions used).

4 PRELIMINARIES

4.1 Differential Privacy

Differential privacy is the de facto standard for providing privacy guarantees [8]. It limits the adversary's ability to infer the participation or absence of any user in a dataset by adding carefully calibrated noise (e.g., Laplacian noise [8]) to query results. An algorithm M is ǫ-differentially private if for all neighboring datasets D_1 and D_2 that differ on a single element (e.g., the data of one person), and all subsets S of the image of M,

Pr[M(D_1) ∈ S] ≤ e^ǫ Pr[M(D_2) ∈ S],   (2)

where ǫ is the privacy budget that specifies the level of privacy protection, and a smaller ǫ means better privacy. According to the composition theorem [29], an extra privacy budget will be required when multiple related queries are sequentially applied to differential privacy mechanisms.

4.2 Local Differential Privacy

Generally, differential privacy research focuses on centralized databases and implicitly assumes a trusted server. Aiming to eliminate this assumption, local differential privacy (or simply local privacy) has been proposed for crowdsourced systems to provide a stringent privacy guarantee in which data contributors trust no one [7], [17].
In particular, for any user i, a mechanism M satisfies ǫ-local privacy if for any two data records X^i, Y^i ∈ Ω_1 × ... × Ω_d, and for any possible privacy-preserving output X̂^i ∈ Range(M),

Pr[M(X^i) = X̂^i] ≤ e^ǫ Pr[M(Y^i) = X̂^i],   (3)

where the probability is taken over M's randomness, and ǫ has a similar impact on privacy as in ordinary differential privacy (Equation (2)).

The simplest form of local privacy is the randomized response [34], which has been widely used in surveys of people's "yes" or "no" opinions about a private issue. Participants of the survey are required to give their true answers with a certain probability, or random answers with the remaining probability. Due to the randomness, the surveyor cannot determine an individual's true answer (i.e., local privacy is guaranteed) but can still predict the true proportions of the alternative answers. Recently, RAPPOR has been proposed for statistics aggregation [12]. The basic idea of RAPPOR is to extend the randomized response technique via long binary strings that uniquely represent an arbitrary domain. However, it is not directly applicable to multi-dimensional data with a large domain size, since the binary strings grow exponentially in length with the number of dimensions. To address this problem, Fanti et al. [14] propose an association learning scheme, which extends the 1-dimensional RAPPOR to estimate the 2-dimensional joint distribution. However, the sparsity in the multi-dimensional domain and the way it iteratively scans RAPPOR strings mean that it incurs considerable computational complexity.

5 LOPUB: HIGH-DIMENSIONAL DATA PUBLICATION WITH LOCAL PRIVACY

We propose LoPub, a novel solution to achieve high-dimensional crowdsourced data publication with local privacy. In this section, we first introduce the basic idea behind LoPub and then elaborate the algorithmic procedures in more detail.
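Before detailing LoPub, the classical randomized response primitive it builds on (Section 4.2) can be sketched in a few lines of Python. This is our own illustration, not the authors' code: the truth probability p, the population rate, and the sample size are all illustrative. The surveyor never learns an individual's answer, yet recovers the aggregate proportion by inverting the known randomization bias.

```python
import random

def randomized_response(truth: bool, p: float) -> bool:
    """Answer truthfully with probability p, otherwise answer uniformly at random."""
    if random.random() < p:
        return truth
    return random.random() < 0.5

def estimate_proportion(answers, p: float) -> float:
    """Invert the bias: E[observed 'yes' rate] = p*pi + (1 - p)/2."""
    observed = sum(answers) / len(answers)
    return (observed - (1 - p) / 2) / p

random.seed(0)
true_rate = 0.3  # illustrative true proportion of 'yes' in the population
answers = [randomized_response(random.random() < true_rate, p=0.6)
           for _ in range(200000)]
est = estimate_proportion(answers, p=0.6)
```

With 200,000 simulated respondents, the debiased estimate lands within a fraction of a percentage point of the true 30% rate, even though each individual answer is deniable.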
5.1 Basic idea

Privacy-preserving high-dimensional crowdsourced data publication aims at releasing an approximate dataset with statistical information similar to the source data (i.e., in terms of the statistical distribution defined in Equation (1)) while guaranteeing local privacy. This problem can be considered in four stages. First, to achieve local privacy, some local transformation should be deployed on the user side to cloak individuals' original data records. Then, the central server needs to obtain the statistical information, i.e., the distribution of the original data. There are two plausible solutions. One is to obtain the 1-dimensional distribution on each attribute independently. Unfortunately, the lack of consideration of correlations between dimensions will lose the utility of the original dataset. Another is to consider all attributes as one and compute the d-dimensional joint distribution. However, due to combinations, the possible domain will increase exponentially with the number of dimensions, thus leading to both low scalability and signal-to-noise-ratio problems [35]. Therefore, the next crucial problem is to find a solution for reducing the dimensionality while keeping the necessary correlations. Finally, with the statistical distribution information on the low-dimensional data, how to synthesize a new dataset is the remaining problem.

To this end, we present LoPub, a locally privacy-preserving data publication scheme for high-dimensional crowdsourced data. Figure 3 shows the overview of LoPub, which mainly consists of four mechanisms: local privacy protection, multi-dimensional distribution estimation, dimensionality reduction, and data synthesizing.

1) Local Privacy Protection. We first propose the local transformation process that adopts the randomized response technique to cloak the original multi-dimensional data records on distributed users, so as to provide local privacy for all individuals in the crowdsourced system.
Particularly, we locally transform each attribute value into a random bit string. Then, the locally privacy-preserved data is sent to and aggregated at the central server.

2) Multi-dimensional Distribution Estimation. We then propose multi-dimensional joint distribution estimation schemes to obtain both the joint and marginal probability distributions on multi-dimensional data. Inspired by [14], we first extend the EM-based approach for high-dimensional

Fig. 3: An overview of LoPub

distribution estimation. However, such a straightforward extension does not consider the sparsity in high-dimensional data, which will lead to high complexity for distribution estimation. To guarantee fast estimation, we then present a Lasso-based approach, at the cost of slight accuracy degradation. Finally, we propose a hybrid approach striking a balance between accuracy and efficiency.

3) Dimensionality Reduction. Based on the multi-dimensional distribution information, we then propose to reduce the dimensionality by identifying mutually correlated attributes among all dimensions and splitting the high-dimensional attributes into several compact low-dimensional attribute clusters. In this paper, considering the heterogeneous attributes, we adopt mutual information and an undirected dependency graph to measure and model the correlations of attributes, respectively. Then, we propose to split the attributes according to the junction tree built from the dependency graph. In addition, we also propose a heuristic pruning scheme to further boost the process of correlation identification.

4) Synthesizing the New Dataset. Finally, we propose to sample each low-dimensional dataset according to the connectivity of the attribute clusters and the estimated joint or conditional distribution on each attribute cluster, thus synthesizing a new privacy-preserving dataset.

5.2 Local Transformation for High-dimensional Data Records

5.2.1 Design Rationale

A common framework of locally private distribution estimation is that each individual user applies a local transformation on the data for privacy protection and then sends the transformed data to the server. The server estimates the joint distribution according to the transformed data. Local transformation in our design includes two key steps: one is mapping to Bloom filters and the other is adding randomness.
Particularly, Bloom filters over an attribute domain Ω with multiple hash functions can hash all the variables in the domain into a pre-defined space. Thus, the unique bit strings are the representative features of the original report. Then, after privacy protection by randomized responses, a large number of samples with various levels of noise are generated by individual users. After aggregation, the central server obtains a large sample space with random noise. As a result, one may estimate the distribution from the noised sample space by taking advantage of machine learning techniques such as the EM algorithm and regression analysis.

Under the above framework, a key observation can be made: if features are mutually independent, the combinations of features from different candidate sets are also mutually independent. Therefore, when the Bloom filters of each attribute are mutually independent (i.e., no collisions on any bits), the Cartesian products of Bloom filters of different attributes are also mutually independent. In this sense, with mutually independent Bloom filter features, existing machine learning techniques like EM and Lasso regression are effective for the multivariate distribution estimation. Some notations used in this paper are listed in Table 2.

TABLE 2: Notation

N: number of users (data records) in the system
X: entire crowdsourced dataset on the server side
X^i: data record from the ith user
x^i_j: jth element of X^i
d: number of attributes in X
R: set of all attribute clusters
A_j: jth attribute of X
Ω_j: domain of A_j
ω_j: candidate attribute value in Ω_j
H_j(x): hash functions for A_j that map x into a Bloom filter
s^i_j: Bloom filter of x^i_j (s^i_j = H_j(x^i_j))
s^i_j[b]: bth bit of s^i_j
ŝ^i_j: randomized Bloom filter of s^i_j
ŝ^i_j[b]: bth bit of ŝ^i_j
m_j: length of s^i_j
f: probability of flipping a bit of a Bloom filter

5.2.2 Algorithmic Procedures of Local Transformation

Before describing the distribution estimation, we present the details of the local transformation for high-dimensional crowdsourced data.
In essence, the local transformation consists of three steps:

1) For the ith user, we have an original data record X^i = {x^i_1, x^i_2, ..., x^i_d} with d attributes. For each attribute A_j (j = 1, ..., d), we employ h hash functions H_j(·) to map x^i_j to a length-m_j bit string s^i_j (called a Bloom filter); i.e., we calculate s^i_j = H_j(x^i_j), j = 1, ..., d.

2) Each bit s^i_j[b] (b = 1, 2, ..., m_j) in s^i_j is randomly flipped into 0 or 1 according to the following rule:

ŝ^i_j[b] = { s^i_j[b], with probability 1 − f;
             1,        with probability f/2;
             0,        with probability f/2,   (4)

where f ∈ [0, 1] is a user-controlled flipping probability that quantifies the level of randomness for local privacy.

3) After deriving the randomized Bloom filters ŝ^i_j (j = 1, ..., d), we concatenate ŝ^i_1, ..., ŝ^i_d to obtain a stochastic (Σ_{j=1}^d m_j)-bit vector,

[ŝ^i_1[1], ..., ŝ^i_1[m_1], ..., ŝ^i_d[1], ..., ŝ^i_d[m_d]]   (5)

and send it to the server. Detailed examples illustrating the above procedures can be found in [28].

Parameter Setup: According to the characteristics of the Bloom filter [3], given the false positive probability p and the number |Ω_j| of elements to be inserted, the optimal length m_j of the Bloom filter can be calculated as

m_j = (ln(1/p) / (ln 2)^2) · |Ω_j|.   (6)

Furthermore, the optimal number h_j of hash functions in the Bloom filter is

h_j = (m_j / |Ω_j|) · ln 2 = ln(1/p) / ln 2.   (7)

So, the optimal h = ln(1/p) / ln 2 is used for all dimensions.

Privacy Analysis: Because the local transformation is performed by each individual user and no one else can obtain the original record X^i, local privacy can be easily achieved, and we only have to analyze the privacy guarantee on the user side. In addition, since both the hash operations and the randomized responses on all attributes are independent, the local transformation consumes no extra privacy budget as the number of dimensions d increases, as pointed out by the composition theorem [26]. According to the conclusion in [12], the differential privacy obtained on the user side is

ǫ = 2h ln((2 − f) / f),   (8)

where h is the number of hash functions in the Bloom filter and f is the probability that a bit is flipped. Overall, since the same transformation is done by all users independently, this ǫ-local privacy guarantee is equivalent for all distributed users.

Communication Overhead:

Theorem 1: The minimal communication cost C_LoPub after the local transformation is

C_LoPub = Σ_{j=1}^d m_j = (ln(1/p) / (ln 2)^2) Σ_{j=1}^d |Ω_j|.   (9)

Proof: If we assume that the domain of each attribute is publicly known by both the users and the server, then the communication cost of non-private collection is basically Σ_{j=1}^d ln|Ω_j|, which is related to the domain size. Nevertheless, in our method, due to local privacy, the communication cost is Σ_{j=1}^d m_j, which is related to the lengths of the Bloom filters, because only randomly flipped bit strings (not the original data) are sent.
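As a concrete illustration of steps 1-3 and the parameter setup above, here is a minimal Python sketch of our own (not the authors' implementation). The h hash functions H_j are emulated with salted SHA-256 digests, and the record, domain sizes, p, and f are all illustrative assumptions:

```python
import hashlib
import math
import random

def bloom_params(domain_size: int, p: float):
    """Optimal filter length (Equation (6)) and hash count (Equation (7))."""
    m = math.ceil(math.log(1 / p) / math.log(2) ** 2 * domain_size)
    h = round(math.log(1 / p) / math.log(2))
    return m, h

def local_epsilon(h: int, f: float) -> float:
    """Per-user privacy budget of the randomized response (Equation (8))."""
    return 2 * h * math.log((2 - f) / f)

def bloom_filter(value: str, m: int, h: int, salt: str):
    """Step 1: map an attribute value into a length-m Bloom filter via h salted hashes."""
    bits = [0] * m
    for k in range(h):
        digest = hashlib.sha256(f"{salt}|{k}|{value}".encode()).hexdigest()
        bits[int(digest, 16) % m] = 1
    return bits

def randomize(bits, f: float):
    """Step 2: flip each bit according to Equation (4)."""
    out = []
    for b in bits:
        r = random.random()
        out.append(b if r < 1 - f else int(r < 1 - f / 2))
    return out

def local_transform(record: dict, p: float, f: float):
    """Step 3: concatenate the randomized Bloom filters of all attributes (Equation (5))."""
    report = []
    for attr, (value, domain_size) in record.items():
        m, h = bloom_params(domain_size, p)
        report.extend(randomize(bloom_filter(str(value), m, h, attr), f))
    return report

random.seed(0)
# One hypothetical user record: (value, |Omega_j|) per attribute.
report = local_transform({"age": ("30-39", 8), "job": ("engineer", 16)}, p=0.02, f=0.5)
```

With p = 0.02 this gives h = 6 hashes per attribute, so by Equation (8) a flipping probability f = 0.5 yields ǫ = 12 ln 3 ≈ 13.2 per user; increasing f lowers ǫ (stronger privacy) at the cost of noisier estimates.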
For comparison, under the same conditions, when RAPPOR [12] is directly applied to k-dimensional data, all |Ω_1| × ... × |Ω_k| candidate values are regarded as a single 1-dimensional domain, and the cost is then

C_RAPPOR = (ln(1/p) / (ln 2)^2) Π_{j=1}^k |Ω_j|,   (10)

where Π_{j=1}^k |Ω_j| is due to the size of the candidate set Ω_1 × ... × Ω_k. The difference between Equations (9) and (10) arises because our LoPub, unlike straightforward RAPPOR, exploits the mutual independence between multiple attributes.

5.3 Multivariate Distribution Estimation with Local Privacy

After receiving the randomized bit strings, the central server can aggregate them and estimate their joint distribution. For example, an EM-based estimation algorithm [14] was proposed to estimate the 2-dimensional joint distribution. However, due to its high complexity and overheads, it is only preferable for low dimensions with small domains, which is impractical for many real-world datasets with high dimensions. Therefore, we also propose a Lasso regression based algorithm with high efficiency, as well as a hybrid algorithm that achieves a balance between efficiency and accuracy.

5.3.1 EM-based Distribution Estimation

Here, we first extend the EM-based estimation [14] to k-dimensional datasets (2 ≤ k ≤ d) and then elaborate its computational complexity to show its inefficiency on high-dimensional crowdsourced data. Before illustrating the algorithm, we first introduce the following notations. Without loss of generality, we consider k specified attributes A_1, A_2, ..., A_k and their index collection C = {1, 2, ..., k}. For simplicity, the event A_j = ω_j or x_j = ω_j is abbreviated as ω_j. For example, the prior probability P(x_1 = ω_1, x_2 = ω_2, ..., x_k = ω_k) can be simplified to P(ω_1 ω_2 ... ω_k) or P(ω_C). Algorithm 1 depicts the extended EM-based approach for estimating the k-dimensional joint distribution. More specifically, it consists of the following five main steps.
Algorithm 1 EM-based k-dimensional Joint Distribution Estimation (EM_JD)

Require: C: attribute index cluster, i.e., C = {1, 2, ..., k};
  A_j: the k attributes (1 ≤ j ≤ k);
  Ω_j: domain of A_j (1 ≤ j ≤ k);
  ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k);
  f: flipping probability;
  δ: convergence accuracy.
Ensure: P(A_C): joint distribution of the k attributes specified by C.
1: initialize P_0(ω_C) = 1 / (Π_{j∈C} |Ω_j|)
2: for each i = 1, ..., N do
3:   for each j ∈ C do
4:     compute P(ŝ^i_j | ω_j) = Π_{b=1}^{m_j} (f/2)^{|ŝ^i_j[b] − H_j(ω_j)[b]|} (1 − f/2)^{1 − |ŝ^i_j[b] − H_j(ω_j)[b]|}
5:   end for
6:   compute P(ŝ^i_C | ω_C) = Π_{j∈C} P(ŝ^i_j | ω_j)
7: end for
8: initialize t = 0 /* number of iterations */
9: repeat
10:   for each i = 1, ..., N do
11:     for each ω_C ∈ Ω_1 × Ω_2 × ... × Ω_k do
12:       compute P_t(ω_C | ŝ^i_C) = P_t(ω_C) P(ŝ^i_C | ω_C) / Σ_{ω_C} P_t(ω_C) P(ŝ^i_C | ω_C)
13:     end for
14:   end for
15:   set P_{t+1}(ω_C) = (1/N) Σ_{i=1}^N P_t(ω_C | ŝ^i_C)
16:   update t = t + 1
17: until max_{ω_C} |P_t(ω_C) − P_{t−1}(ω_C)| ≤ δ
18: return P(A_C) = P_t(ω_C)

1) Before executing the EM procedure, we set the uniform distribution P(ω_1 ω_2 ... ω_k) = 1 / (Π_{j=1}^k |Ω_j|) as the initial prior probability.

2) According to Equation (4), each bit s^i_j[b] will be flipped with probability f/2. Thus, by comparing the

bits H_j(ω_j) with the randomized bits, the conditional probability P(ŝ^i_j | ω_j) can be computed (see line 4 of Algorithm 1).

3) Due to the independence between attributes (and their Bloom filters), the joint conditional probability can be easily calculated by combining the individual attributes; i.e., P(ŝ^i_C | ω_C) = Π_{j∈C} P(ŝ^i_j | ω_j).

4) Given all the conditional distributions of one particular combination of bit strings, the corresponding posterior probability can be computed by Bayes' theorem,

P_t(ω_C | ŝ^i_C) = P_t(ω_C) P(ŝ^i_C | ω_C) / Σ_{ω_C} P_t(ω_C) P(ŝ^i_C | ω_C),   (11)

where P_t(ω_C) = P_t(ω_1 ω_2 ... ω_k) is the k-dimensional joint probability at the tth iteration.

5) After identifying the posterior probability for each user, we calculate the mean of the posterior probabilities over the large number of users to update the prior probability. The updated prior probability is then used to compute the posterior probability in the next iteration.

The above EM-like procedure is executed iteratively until convergence, i.e., until the maximum difference between two estimations is smaller than the specified threshold. The algorithm can converge to a good estimation when the initial value is well chosen. However, EM-based k-dimensional joint distribution estimation can fail by converging to a local optimum. Especially when k increases, there will be many local optima that prevent good convergence, because the sample space of all combinations in Ω_{j1} × Ω_{j2} × ... × Ω_{jk} explodes exponentially.

Complexity: Before the analysis of complexity, we should note that the number of user records N needs to be sufficiently large according to the analysis in [12], i.e., N ≫ v^k, where v denotes the average size of Ω_j; otherwise it is difficult to estimate reliably from a small sample space with a low signal-to-noise ratio.

Theorem 2: Suppose that the average length of m_j is m and the average |Ω_j| is v. Then, the time complexity of Algorithm 1 is

O(Nkmv^k + tNv^{2k}).
(12)

Proof: EM-based estimation scans all N users' bit strings, each of length km, one by one to compute the conditional probabilities for the v^k different combinations, so this part of the time complexity can be estimated as O(N(km)(v^k)). Also, over t iterations, computing the posterior probability of each combination when observing each bit string incurs a time complexity of O(tN(v^k)^2). As a consequence, the overall time complexity is O(tNv^{2k} + Nkmv^k).

Theorem 3: The space complexity of Algorithm 1 is

O(Nkm + 2Nv^k).   (13)

Proof: In Algorithm 1, the necessary storage includes the N users' bit strings of length km, which is O(Nkm). The prior probabilities on the k dimensions take O(v^k). The conditional and posterior probabilities over the v^k candidates for all bit strings take O(2Nv^k). So, the overall complexity is O(Nkm + 2Nv^k + v^k) = O(Nkm + 2Nv^k), since N is the dominant variable.

According to Theorems 2 and 3, the time and space overheads can be daunting when either N or k is large. This makes the performance of EM-based k-dimensional distribution estimation degrade dramatically, rendering it inapplicable to high-dimensional data.

5.3.2 Lasso-based Distribution Estimation

To improve the efficiency of the k-dimensional joint distribution estimation, we present a Lasso regression-based algorithm here. As mentioned in Section 5.2.1, the bit strings are the representative features of the original report. After randomized response and flipping, a large number of noisy samples are generated by individual users. More precisely, one may consider that the central server receives a large number of samples from a specific distribution, however, with random noise. In this sense, one may estimate the distribution from the noisy sample space by taking advantage of linear regression y = Mβ, where M contains the predictor variables, y is the response variable, and β is the regression coefficient vector.
The use of Bloom filters can guarantee that the features (predictor variables M) re-extracted at the server are the same as those extracted by the users. Moreover, the response variable y can be estimated from the randomized bit strings according to the statistical characteristics of the known f. Therefore, the only problem is to find a good solution to the linear regression y = Mβ. Obviously, k-dimensional data may incur an output domain Ω_1 × ... × Ω_k with size |Ω_1| × ... × |Ω_k|, which increases exponentially with k. With a fixed number N of entries in the dataset X, the frequencies of many combinations ω_1 ω_2 ... ω_k ∈ Ω_1 × ... × Ω_k are rather small or even zero. So, M is sparse, and only part of the sparse but effective predictor variables need to be chosen. Otherwise, general linear regression techniques will lead to an overfitting problem. Here, we resort to Lasso regression, which effectively solves the sparse linear regression by choosing predictor variables.

Algorithm 2 Lasso-based k-dimensional Joint Distribution Estimation (Lasso_JD)

Require: C: attribute index cluster, i.e., {1, 2, ..., k};
  A_j: the k attributes (1 ≤ j ≤ k);
  Ω_j: domain of A_j (1 ≤ j ≤ k);
  ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k);
  f: flipping probability.
Ensure: P(A_C): joint distribution of the k attributes specified by C.
1: for each j ∈ C do
2:   for each b = 1, 2, ..., m_j do
3:     compute ŷ_j[b] = Σ_{i=1}^N ŝ^i_j[b]
4:     compute y_j[b] = (ŷ_j[b] − Nf/2) / (1 − f)
5:   end for
6:   set H_j(Ω_j) = {H_j(ω) | ω ∈ Ω_j}
7: end for
8: set y = [y_1[1], ..., y_1[m_1], y_2[1], ..., y_2[m_2], ..., y_k[1], ..., y_k[m_k]]
9: set M = [H_1(Ω_1) × H_2(Ω_2) × ... × H_k(Ω_k)]
10: compute β = Lasso_regression(M, y)
11: return P(A_C) = β / N

Our Lasso-based estimation is described in Algorithm 2 and consists of the following four major steps.

1) After receiving all the randomized Bloom filters, for each bit b of each attribute j, the server counts the number of 1s as ŷ_j[b] = Σ_{i=1}^N ŝ^i_j[b].
2) The true count sum of each bit, y_j[b], can be estimated as y_j[b] = (ŷ_j[b] − fN/2)/(1 − f), by inverting the randomized response applied to the true count.

Fig. 4: Illustration of Lasso_JD

These count sums of all bits form a vector y of length Σ_{j=1}^{k} m_j.
3) To construct the features of the overall candidate set of attribute combinations ω_1...ω_k, the Bloom filters on each domain Ω_j are re-implemented by the server with the same hash functions H_j(·). Suppose all distinct Bloom filters on Ω_j are H_j(Ω_j) = {H_j(ω) | ω ∈ Ω_j}, where they are mutually orthogonal. The candidate set of Bloom filters is then M = [H_1(Ω_1) H_2(Ω_2) ... H_k(Ω_k)], and the members of M remain mutually orthogonal.
4) Fit a Lasso regression model to the counter vector y and the candidate matrix M, and then take the non-zero coefficients as the corresponding frequencies of each candidate string. By reshaping the coefficient vector into a k-dimensional matrix in natural order and dividing by N, we derive the k-dimensional joint distribution estimate P(A_1A_2...A_k). For example, in Figure 4, we fit a linear regression to y_{12} and the candidate matrix M to estimate the joint distribution P(A_1A_2).

Generally, the regression operation, the core of the estimation, loses accuracy only when there are many collisions between Bloom filter strings. However, as mentioned in Section 5.2.1, if there is no collision in the bit strings of each single dimension, then there is no collision in the concatenated bit strings of different dimensions. In fact, the probability of collision in concatenated bit strings does not increase with the number of dimensions. For example, if the collision rate of the Bloom filter in one dimension is p, then the collision rate decreases to p^k when we concatenate the bit strings of k dimensions. Therefore, we only need to choose proper m and h according to Equations (6) and (7) to lower the collision probability for each dimension, and we are then guaranteed a proper estimation for multiple dimensions.

Complexity: Compared with Algorithm 1, our Lasso-based estimation effectively reduces both the time and space complexity.
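Steps 1)-4) can be sketched end to end as follows. This is a minimal illustration on a toy orthogonal candidate matrix, not the paper's implementation: the tiny non-negative coordinate-descent solver stands in for any off-the-shelf Lasso library, and all names are ours.

```python
def debias(counts, n, f):
    """Step 2: invert the randomized response, y = (ŷ − fN/2) / (1 − f)."""
    return [(c - f * n / 2) / (1 - f) for c in counts]

def lasso_nonneg(M, y, alpha, iters=200):
    """Coordinate descent for min (1/2n)||y − Mβ||² + α||β||₁ with β ≥ 0,
    a stand-in for Lasso_regression(M, y) in Algorithm 2."""
    n, p = len(M), len(M[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            col = [M[i][j] for i in range(n)]
            # partial residual excluding coordinate j
            r = [y[i] - sum(M[i][t] * beta[t] for t in range(p) if t != j)
                 for i in range(n)]
            rho = sum(col[i] * r[i] for i in range(n)) / n
            z = sum(c * c for c in col) / n
            beta[j] = max(0.0, (rho - alpha) / z) if z > 0 else 0.0
    return beta

# four candidate strings, each hashed to 3 disjoint bits of a 12-bit vector
M = [[1 if 3 * j <= i < 3 * (j + 1) else 0 for j in range(4)] for i in range(12)]
beta_true = [50.0, 30.0, 0.0, 20.0]        # true candidate counts, N = 100
y = [sum(M[i][j] * beta_true[j] for j in range(4)) for i in range(12)]
est = lasso_nonneg(M, y, alpha=0.01)       # recovers ≈ [50, 30, 0, 20]
dist = [b / 100 for b in est]              # step 4: P(A_C) = β / N
```

With orthogonal candidate columns the solver recovers the sparse frequencies almost exactly and drives the zero-frequency candidate to zero; a real deployment would tune the regularization weight and use an optimized solver.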
Theorem 4: The time complexity of Algorithm 2 is

O(v^{3k} + kmv^{2k} + Nkm). (14)

Proof: Algorithm 2 involves two parts. To compute the bit counter vector, N bit strings, each of length km, are summed up; this operation incurs at most O(Nkm). The Lasso regression with v^k candidates (the total domain size) and km samples (the length of the bit counter vector) has complexity O((v^k)^3 + (v^k)^2(km)). Under the general assumption that N dominates Equation (14), the complexity in Equation (14) is much less than that of Equation (12) in Theorem 2.

Theorem 5: The space complexity of Algorithm 2 is

O(Nkm + v^k km). (15)

Proof: In Algorithm 2, the storage overhead consists of three parts: the users' bit strings, O(Nkm); a count vector of size O(km); and the candidate bit matrix M of size O(kmv^k). Therefore, the overall space complexity of our proposed Lasso-based estimation algorithm is O(Nkm + km + v^k km) = O(Nkm + v^k km), which is also smaller than Equation (13) since N is dominant. The empirical results are shown in Section 6.

The efficiency comes from the fact that the N bit strings are scanned only once to compute the count sums, after which a single Lasso regression is fitted to estimate the distribution. In addition, Lasso regression extracts the important (i.e., frequent) features with high probability, which fits well with the sparsity of high-dimensional data.

5.3.3 Hybrid Algorithm

Recall that, with sufficient samples, the EM-based estimation demonstrates good convergence but also high complexity. On the other hand, the Lasso-based estimation can be very efficient, with a slight accuracy deviation compared with the EM-based algorithm. The high complexity of the EM-based algorithm stems from two parts. First, it iteratively scans the users' reports and builds a prior distribution table of size O(Nv^k); for each record of the table, one has to compare m_j bits.
However, when the dimension is high, the combinations in Ω_1 × ... × Ω_k are very sparse, with many zero entries. Second, the uniformly random initial assignment leads to slow convergence.

To strike a balance between the EM-based and Lasso-based estimations, we propose a hybrid algorithm, Lasso+EM_JD (Algorithm 3), which first eliminates the redundant candidates and estimates the initial values with the Lasso-based algorithm, and then refines the result to convergence with the EM-based algorithm. The hybrid algorithm has two advantages:
1) The sparse candidates are selected by the Lasso-based estimation algorithm, so the EM-based algorithm needs to compute the conditional probabilities only on these sparse candidates instead of all candidates, which greatly reduces both time and space complexity.
2) The Lasso-based algorithm gives a good initial estimate of the joint distribution. Compared with randomly assigned initial values, initializing with the Lasso-based estimate further boosts the convergence of the EM algorithm, which is sensitive to the initial values, especially when the candidate space is sparse.

Theorem 6: The time complexity of Algorithm 3 is

O((v^{3k} + kmv^{2k} + Nkm) + (tN(v′)^2 + Nkmv′)), (16)

where v′ is the average number of sparse items in Ω_1 × ... × Ω_k, and v′ < v^k.

Algorithm 3 Lasso+EM k-dimensional Joint Distribution (Lasso+EM_JD)
Require: A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability.
Ensure: P(A_1A_2...A_k): k-dimensional joint distribution.
1: compute P′(ω_1ω_2...ω_k) = Lasso_JD(A_j, Ω_j, {ŝ^i_j}^N_{i=1}, f)
2: set C′ = {x | x ∈ C, P′(x) = 0}
3: for each i = 1,...,N do
4:   for each j = 1,...,k do
5:     compute P(ŝ^i_j | ω_j) = ∏_{b=1}^{m_j} (f/2)^{ŝ^i_j[b]} (1 − f/2)^{1 − ŝ^i_j[b]}
6:   end for
7:   if ω_1ω_2...ω_k ∈ C′ then
8:     P(ŝ^i_1ŝ^i_2...ŝ^i_k | ω_1ω_2...ω_k) = 0
9:   else
10:    compute P(ŝ^i_1ŝ^i_2...ŝ^i_k | ω_1ω_2...ω_k) = ∏_{j=1}^{k} P(ŝ^i_j | ω_j)
11:  end if
12: end for
13: initialize t = 0 /* number of iterations */
14: repeat
15: ...
16: ... /* (similar to Algorithm 1) */
17: ...
18: until P_t(ω_1ω_2...ω_k) converges
19: return P(A_1A_2...A_k) = P_t(ω_1ω_2...ω_k)

Proof: See Theorems 2 and 4; the only difference is that, after the Lasso-based estimation, only the sparse items in Ω_1 × ... × Ω_k are selected.

Theorem 7: The space complexity of Algorithm 3 is

O(Nkm + v^k km + 2Nv′). (17)

Proof: See Theorems 3 and 5.

5.4 Dimension Reduction with Local Privacy

5.4.1 Dimension Reduction via 2-dimensional Joint Distribution Estimation

The key to reducing the dimensionality of a high-dimensional dataset is to find compact clusters, within which all attributes are tightly correlated with, or dependent on, each other. Inspired by [35], [5], but without spending extra privacy budget on dimension reduction, our dimension reduction on the locally once-for-all privacy-preserved data records consists of the following three steps:
1) Pairwise Correlation Computation. We use mutual information to measure pairwise correlations between attributes. The mutual information is calculated as

I_{m,n} = Σ_{i∈Ω_m} Σ_{j∈Ω_n} p_{ij} ln(p_{ij} / (p_i p_j)), (18)

where Ω_m and Ω_n are the domains of attributes A_m and A_n, respectively, and p_i and p_j denote the probability that A_m takes the i-th value in Ω_m and the probability that A_n takes the j-th value in Ω_n, respectively.
Then, p_{ij} is their joint probability. In particular, p_{ij} can be efficiently obtained with our proposed multi-dimensional joint distribution estimation algorithms in Section 5.3, i.e., the hybrid estimation of Algorithm 3. Without loss of generality, the term JD refers to these multi-dimensional joint distribution estimation algorithms. As the corresponding marginal distributions, p_i and p_j can then either be learned from p_{ij} or estimated via the 2-dimensional joint distribution of A_i (or A_j) with itself.
2) Dependency Graph Construction. A dependency graph depicts the correlations among attributes: each attribute A_j is a node, and an edge between two nodes A_m and A_n indicates that attributes A_m and A_n are correlated. Based on mutual information, the dependency graph of the attributes is constructed as follows. First, an adjacency matrix G_{d×d} (the dependency graph of all d attributes) is initialized with all 0s. Then, every attribute pair (A_m, A_n) is examined by comparing its mutual information with a threshold τ_{m,n}, defined as

τ_{m,n} = min(|Ω_m| − 1, |Ω_n| − 1) · φ²/2, (19)

where φ (0 ≤ φ ≤ 1) is a flexible parameter determining the desired correlation level; normally, φ = 0.2 represents the basic correlation level. G_{m,n} and G_{n,m} are both set to 1 if and only if I_{m,n} > τ_{m,n}.
3) Compact Clusters Building. By triangulation, the dependency graph G_{d×d} can be transformed into a junction tree, in which each node represents an attribute A_j. Then, based on the junction tree algorithm, several clusters C_1, C_2, ..., C_l can be obtained as the compact clusters of attributes, within which the attributes are mutually correlated. Hence, the whole attribute set is divided into several compact attribute clusters, and the number of dimensions is effectively reduced. Detailed examples can be found in [28].
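The edge test of steps 1) and 2) is a direct transcription of Equations (18) and (19). The sketch below (with hypothetical helper names and toy 2×2 joint tables) shows that an independent pair yields I = 0 and no edge, while a perfectly correlated pair yields I = ln 2 and an edge at φ = 0.2.

```python
import math

def mutual_information(joint):
    """Eq. (18): I(X;Y) = sum_ij p_ij * ln(p_ij / (p_i * p_j)) from a joint table."""
    rows = [sum(r) for r in joint]           # marginal p_i
    cols = [sum(c) for c in zip(*joint)]     # marginal p_j
    return sum(p * math.log(p / (rows[i] * cols[j]))
               for i, r in enumerate(joint) for j, p in enumerate(r) if p > 0)

def correlated(joint, phi=0.2):
    """Eq. (19) edge test: I_{m,n} > tau = min(|Ω_m|-1, |Ω_n|-1) * phi^2 / 2."""
    tau = min(len(joint) - 1, len(joint[0]) - 1) * phi ** 2 / 2
    return mutual_information(joint) > tau

independent = [[0.25, 0.25], [0.25, 0.25]]   # I = 0, below tau = 0.02
dependent = [[0.5, 0.0], [0.0, 0.5]]         # I = ln 2 ≈ 0.69, above tau
```

In Algorithm 4 this test is run once per attribute pair to fill the adjacency matrix G_{d×d}.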
Algorithm 4 Dimension reduction with local privacy
Require: A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability; φ: dependency degree.
Ensure: C_1, C_2, ..., C_l: attribute index clusters.
1: initialize G_{d×d} = 0
2: for each j = 1,2,...,d do
3:   estimate P(A_j) by JD (i.e., Lasso+EM_JD, Algorithm 3)
4: end for
5: for each attribute m = 1,2,...,d−1 do
6:   for each attribute n = m+1, m+2,...,d do
7:     estimate P(A_m A_n) by JD
8:     compute I_{m,n} = Σ_{i∈Ω_m} Σ_{j∈Ω_n} p_{ij} ln(p_{ij}/(p_i p_j))
9:     compute τ_{m,n} = min(|Ω_m| − 1, |Ω_n| − 1) · φ²/2
10:    if I_{m,n} > τ_{m,n} then
11:      set G_{m,n} = G_{n,m} = 1
12:    end if
13:  end for
14: end for
15: build the dependency graph from G_{d×d}
16: triangulate the dependency graph into a junction tree
17: split the junction tree into several cliques C_1, C_2, ..., C_l with the elimination algorithm
18: return C = {C_1, C_2, ..., C_l}

Theorem 8: The time complexity of Algorithm 4 is

O(d²(v^6 + 2mv^4 + 2Nm + tN(v′)² + 2Nmv′)). (20)

Proof: The core of the dimension reduction process is the d(d−1)/2 runs of 2-dimensional joint distribution estimation. The complexity of each 2-dimensional joint distribution estimation follows from Equation (16) with k = 2 when adopting the hybrid algorithm (Algorithm 3). The complexity of building the junction tree on the d×d dependency graph is negligible compared with the joint distribution estimation.

Theorem 9: The space complexity of Algorithm 4 is

O(2Nm + 2v²m + 2Nv′). (21)

Proof: When we compute the mutual correlation between any pair of attributes, a 2-dimensional joint distribution estimation is triggered, with space complexity O(2Nm + 2mv² + 2Nv′), obtained by substituting k = 2 into Equation (17). This maximum complexity dominates Algorithm 4. The space complexity of building the junction tree on the d×d dependency graph is negligible compared with the joint distribution estimation.

5.4.2 Entropy-based Pruning Scheme

In existing work [18], [32] on homogeneous data, correlations can simply be captured by distance or similarity metrics [36]. In our work, however, mutual information is used to measure general correlations, since heterogeneous attributes (i.e., attributes with different domains) are also considered. As shown in Equation (18), calculating the mutual information of variables X and Y inevitably requires the joint probability over their joint combinations, making the pairwise computation of dependency necessary. Although mutual information is already simpler than the Kendall rank coefficients used in the similar work [21], we also propose a pruning-based heuristic to speed up this pairwise correlation learning process. Intuitively, there are different situations in Algorithm 4:
1. When φ = 0 or φ = 1, all attributes are considered mutually correlated or mutually independent, respectively. Thus, there is no need to compute pairwise correlations.
2. As φ increases (0 < φ < 1), fewer dependencies are included in the adjacency matrix G_{d×d} of the dependency graph, which becomes sparser. This also means that we may selectively neglect some pairs.
Inspired by the relationship between mutual information and information entropy², we first heuristically filter out the portion of attributes A_x with the least relative information entropy RH(A_x) = H(A_x)/|Ω_x|, and then verify the mutual information among the remaining attributes, thus reducing the number of pairwise computations.
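The pruning heuristic can be sketched as follows. This is an illustrative reading of lines 4-7 of Algorithm 5, with hypothetical function names and toy marginals: attributes are ranked by relative entropy RH(A) = H(A)/|Ω|, and only the top (1 − φ) fraction is kept for pairwise testing.

```python
import math

def relative_entropy(dist):
    """RH(A) = H(A) / |Ω|: marginal entropy normalized by the domain size."""
    h = -sum(p * math.log(p) for p in dist if p > 0)
    return h / len(dist)

def prune(attrs, phi):
    """Keep the top (1 - phi) fraction of attributes by relative entropy;
    low-entropy (near-constant) attributes are unlikely to pass the
    mutual-information test and are filtered out up front."""
    ranked = sorted(attrs, key=lambda a: relative_entropy(attrs[a]), reverse=True)
    keep = max(1, int(len(ranked) * (1 - phi)))
    return ranked[:keep]

attrs = {
    "near_constant": [0.98, 0.02],   # RH ≈ 0.049, pruned
    "uniform_bin":   [0.5, 0.5],     # RH ≈ 0.347
    "uniform_4":     [0.25] * 4,     # RH ≈ 0.347
}
kept = prune(attrs, phi=0.2)
```

Only the kept attributes enter the O(d²) pairwise loop of Algorithm 4, which is where the savings come from.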
Furthermore, the adjacency matrix G_{d×d} of the dependency graph varies across datasets. For example, G_{d×d} is rarely sparse for binary datasets but very sparse for non-binary datasets. Based on this observation, we can further simplify the calculation by searching for independencies in binary datasets and for dependencies in non-binary datasets. Concretely, for a binary dataset we first set all entries of G_{d×d} to 1s and start from the attributes with the least relative information entropy RH(A_x) = H(A_x)/|Ω_x| to find the uncorrelated attributes; for a non-binary dataset, we first set G_{d×d} to all 0s and start from the attributes with the largest average entropy to find the correlated attributes.

5.5 Synthesizing New Dataset

For brevity, we first define A_C = {A_j | j ∈ C} and X̂_C = {x_j | j ∈ C}. The process of synthesizing the new dataset via sampling is then shown in the following Algorithm 6.

² The relationship between mutual information and information entropy can be represented as I(X;Y) = H(X) + H(Y) − H(X,Y), where H(X) denotes the information entropy of variable X, and H(X,Y) the joint entropy of X and Y.

Algorithm 5 Entropy-based Pruning Scheme
Require: A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability; φ: dependency degree.
Ensure: G_{d×d}: adjacency matrix of the dependency graph of the attributes A_j (j = 1,2,...,d).
1: initialize G_{d×d} = 0
2: for each j = 1,2,...,k do
3:   compute P(A_j) = JD(A_j, Ω_j, {ŝ^i_j}^N_{i=1}, f)
4:   compute RH(A_j) = −(1/|Ω_j|) Σ_{p∈P(A_j)} p log p
5: end for
6: sort list_A = {A_1, A_2, ..., A_j} according to entropy H(A_j)
7: pick the first length(list_A) · (1 − φ) items from list_A as a new list list_A′
8: ...
9: compute the pairwise mutual information among list_A′ and set the dependency graph G_{d×d} as in Algorithm 4.
10: return G_{d×d}

Algorithm 6 New Dataset Synthesizing
Require: C: a collection of attribute index clusters C_1,...,C_l; A_j: k-dimensional attributes (1 ≤ j ≤ k); Ω_j: domain of A_j (1 ≤ j ≤ k); ŝ^i_j: observed Bloom filters (1 ≤ i ≤ N, 1 ≤ j ≤ k); f: flipping probability.
Ensure: X̂: synthetic dataset of X.
1: initialize R = ∅
2: repeat
3:   randomly choose an attribute index cluster C ∈ C
4:   estimate the joint distribution P(A_C) by JD
5:   sample X̂_C according to P(A_C)
6:   C = C \ C, R = R ∪ C, D = {D ∈ C | D ∩ R ≠ ∅}
7:   for each D ∈ D do
8:     estimate the joint distribution P(A_D) by JD
9:     obtain the conditional distribution P(A_{D\R} | A_{D∩R}) from P(A_D)
10:    sample X̂_{D\R} according to P(A_{D\R} | A_{D∩R}) and X̂_{D∩R}
11:    C = C \ D, R = R ∪ D, D = {D ∈ C | D ∩ R ≠ ∅}
12:  end for
13: until C = ∅
14: return X̂

We first initialize a set R to keep the sampled attribute indexes. Then, we randomly choose an attribute index cluster C, estimate its joint distribution, and sample new data X̂ on the attributes A_j, j ∈ C. Next, we move C from the cluster collection C into R, and find the connected components D of C. Each cluster D in the connected components is traversed and sampled as follows: first, estimate the joint distribution on the attributes A_D with our proposed distribution estimations and obtain the conditional distribution P(A_{D\R} | A_{D∩R}); then, sample X̂_{D\R} according to this conditional distribution and the already sampled data X̂_{D∩R}. After the traversal of D, the attributes in the first connected component have been sampled. We then randomly choose a cluster from the remaining C to sample the attributes of the next connected component, until all clusters are sampled. Finally, a new synthetic dataset X̂ is generated according to the correlations and distributions estimated from the original dataset X.

Theorem 10: The time complexity of Algorithm 6 is

O(l(v^{3k} + kmv^{2k} + Nkm + tN(v′)² + Nkmv′)), (22)

where l is the number of clusters after dimension reduction and k here refers to the average number of dimensions in these clusters.

Fig. 5: Main procedures of high-dimensional data publishing with ε-local privacy

Proof: The core of the dataset synthesizing is essentially l runs of k-dimensional joint distribution estimation.

Theorem 11: The space complexity of Algorithm 6 is

O(Nkm + v^k km + 2Nv′ + Nd). (23)

Proof: Each time, a k-dimensional joint distribution estimation algorithm (with space complexity O(Nkm + v^k km + 2Nv′)) is run to draw new data. A new dataset of size O(Nd) is maintained during synthesizing.

The overall process of LoPub is summarized in Figure 5. Clearly, all the steps are conducted on the locally privacy-preserved data. Therefore, compared with the existing non-local privacy schemes in Figure 1, LoPub provides a consistent local privacy guarantee for all crowdsourced users, thus avoiding insider attacks and multiple assignments of privacy budget.

6 EVALUATION

In this section, we conduct extensive experiments on real datasets to demonstrate the efficiency of our algorithms in terms of computation time and accuracy. We used three real-world datasets: Retail [1], Adult [4], and TPC-E [2]. Retail is part of a retail market basket dataset; each record contains the distinct items purchased in a shopping visit. Adult was extracted from the 1994 US Census and contains personal information such as gender, salary, and education level. TPC-E contains the trade records of the Trade type, Security, and Security status tables in the TPC-E benchmark. It should be noted that some continuous domains were binned in preprocessing for simplicity.

Datasets | Type | # Records (N) | # Attributes (d) | Domain Size
Retail | Binary | 27,… | … | …
Adult | Integer | 45,… | … | …
TPC-E | Mixed | 4,… | … | …

All the experiments were run on a machine with an Intel Core i5-5200U CPU at 2.2 GHz and 8 GB RAM, running Windows 7. We simulated the crowdsourced environment as follows. First, each user reads their data record individually and locally transforms it into privacy-preserving bit strings.
Then, the crowdsourced bit strings are gathered by the central server for synthesizing and publishing the high-dimensional dataset. LoPub is realized by combining the distribution estimation and data synthesizing techniques. Thus, we implemented different LoPub realizations in Python 2.7 with the following three strategies:
1) EM_JD, the generalized EM-based multivariate joint distribution estimation algorithm;
2) Lasso_JD, our proposed Lasso-based multivariate joint distribution estimation algorithm;
3) Lasso+EM_JD, our proposed hybrid estimation algorithm, which uses Lasso_JD to filter out candidates to reduce the complexity and to replace the initial values to boost the convergence of EM_JD.

It is worth mentioning that we compare only the above algorithms, since our algorithm adopts a novel local privacy paradigm on high-dimensional data. Other competitors either provide non-local privacy [5], [35], [21] or work on low-dimensional data [12], [14], [16], and are therefore not comparable. For fair comparison, we randomly chose 10 combinations of k attributes from the d-dimensional data. For simplicity, we sampled 30-50% of the data from the Retail dataset and 10% of the data from the Adult and TPC-E datasets, respectively.³

The efficiency of our algorithms is measured by computation time and accuracy. The computation time includes CPU time and IO cost. Each set of experiments is run 10 times, and the average running time is reported. To measure accuracy, we use the average variant distance (AVD), as suggested in [5], to quantify the closeness between the estimated joint distribution P(ω) and the original joint distribution Q(ω) on the three datasets. The AVD error is defined as

Dist_AVD(P, Q) = (1/2) Σ_{ω∈Ω} |P(ω) − Q(ω)|. (24)

The default parameters are as follows. For the binary dataset Retail, the maximum number of bits and the number of hash functions used in the Bloom filter are m = 32 and h = 4, respectively.
For the non-binary datasets Adult and TPC-E, the maximum number of bits and the number of hash functions used in the Bloom filter are m = 128 and h = 4, respectively. The convergence gap is set to 0.1 for fast convergence.

6.1 Multivariate Distribution Estimation

Here, we show the performance of our proposed distribution estimation algorithms in terms of both efficiency and effectiveness. Efficiency is measured by computation time, and effectiveness by estimation accuracy.

6.1.1 Computation Time

We first evaluate the computation time of EM_JD, Lasso_JD, and Lasso+EM_JD for the k-dimensional joint distribution estimation on the three real datasets. Figures 6 and 7 compare the computation time on the binary dataset Retail with k = 3 and k = 5, respectively. It can be seen that, for each dimension k, Lasso_JD is consistently much faster than EM_JD and Lasso+EM_JD, especially when k is large. This is because EM_JD has to repeatedly scan each user's bit string. In particular, the time consumption of EM_JD increases with f, because more iterations are needed to reach the fixed convergence gap. In contrast, Lasso_JD uses regression to estimate the joint distribution more efficiently. Furthermore, the complexity of Lasso+EM_JD is much less than that of EM_JD, as the initial estimate from Lasso_JD greatly reduces the candidate attribute space and the number of iterations needed. As k grows, the computation time of Lasso_JD increases slowly, unlike EM_JD, which has a dramatic increase. This is because the

³ It should be noted that, with sampled data, the differential privacy level can be further enhanced [23]; sampling is used here only for simplicity.
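The AVD metric of Equation (24) reduces to a few lines. The sketch below uses illustrative toy distributions, not the paper's datasets; the helper name is ours.

```python
def avd(p, q):
    """Eq. (24): Dist_AVD(P, Q) = (1/2) * sum over ω of |P(ω) − Q(ω)|."""
    assert set(p) == set(q), "distributions must share the same domain Ω"
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)

true_dist = {"a": 0.5, "b": 0.3, "c": 0.2}
est_dist = {"a": 0.4, "b": 0.4, "c": 0.2}
err = avd(true_dist, est_dist)  # about 0.1 of the probability mass is misplaced
```

AVD is half the L1 distance between the two distributions, so it ranges from 0 (identical) to 1 (disjoint support) and reads directly as the fraction of misplaced probability mass.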


More information

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL

OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL OBSERVER/KALMAN AND SUBSPACE IDENTIFICATION OF THE UBC BENCHMARK STRUCTURAL MODEL Dionisio Bernal, Burcu Gunes Associate Proessor, Graduate Student Department o Civil and Environmental Engineering, 7 Snell

More information

Evaluating Probabilistic Queries over Imprecise Data

Evaluating Probabilistic Queries over Imprecise Data Evaluating Probabilistic Queries over Imprecise Data Reynold Cheng Dmitri V. Kalashnikov Sunil Prabhakar Department o Computer Science, Purdue University Email: {ckcheng,dvk,sunil}@cs.purdue.edu http://www.cs.purdue.edu/place/

More information

Exact Inference: Variable Elimination

Exact Inference: Variable Elimination Readings: K&F 9.2 9. 9.4 9.5 Exact nerence: Variable Elimination ecture 6-7 Apr 1/18 2011 E 515 tatistical Methods pring 2011 nstructor: u-n ee University o Washington eattle et s revisit the tudent Network

More information

[Title removed for anonymity]

[Title removed for anonymity] [Title removed for anonymity] Graham Cormode graham@research.att.com Magda Procopiuc(AT&T) Divesh Srivastava(AT&T) Thanh Tran (UMass Amherst) 1 Introduction Privacy is a common theme in public discourse

More information

Distributed Optimization Methods for Wide-Area Damping Control of Power System Oscillations

Distributed Optimization Methods for Wide-Area Damping Control of Power System Oscillations Preprints o the 9th World Congress The International Federation o Automatic Control Cape Town, South Arica. August 4-9, 4 Distributed Optimization Methods or Wide-Area Damping Control o Power System Oscillations

More information

DETC A GENERALIZED MAX-MIN SAMPLE FOR RELIABILITY ASSESSMENT WITH DEPENDENT VARIABLES

DETC A GENERALIZED MAX-MIN SAMPLE FOR RELIABILITY ASSESSMENT WITH DEPENDENT VARIABLES Proceedings o the ASME International Design Engineering Technical Conerences & Computers and Inormation in Engineering Conerence IDETC/CIE August 7-,, Bualo, USA DETC- A GENERALIZED MAX-MIN SAMPLE FOR

More information

Fluctuationlessness Theorem and its Application to Boundary Value Problems of ODEs

Fluctuationlessness Theorem and its Application to Boundary Value Problems of ODEs Fluctuationlessness Theorem and its Application to Boundary Value Problems o ODEs NEJLA ALTAY İstanbul Technical University Inormatics Institute Maslak, 34469, İstanbul TÜRKİYE TURKEY) nejla@be.itu.edu.tr

More information

Supplement for In Search of the Holy Grail: Policy Convergence, Experimentation and Economic Performance Sharun W. Mukand and Dani Rodrik

Supplement for In Search of the Holy Grail: Policy Convergence, Experimentation and Economic Performance Sharun W. Mukand and Dani Rodrik Supplement or In Search o the Holy Grail: Policy Convergence, Experimentation and Economic Perormance Sharun W. Mukand and Dani Rodrik In what ollows we sketch out the proos or the lemma and propositions

More information

Provable Seconde Preimage Resistance Revisited

Provable Seconde Preimage Resistance Revisited Provable Seconde Preimage Resistance Revisited Charles Bouillaguet 1 Bastien Vayssiere 2 1 LIFL University o Lille, France 2 PRISM University o Versailles, France SAC 2013 1 / 29 Cryptographic Hash Functions

More information

3770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 24, NO. 6, DECEMBER Muhammad Shahzad and Alex X. Liu

3770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 24, NO. 6, DECEMBER Muhammad Shahzad and Alex X. Liu 3770 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 24, NO. 6, DECEMBER 2016 Fast and Reliable Detection and Identiication o Missing RFID Tags in the Wild Abstract Radio-requency identiication RFID) systems

More information

Estimation and detection of a periodic signal

Estimation and detection of a periodic signal Estimation and detection o a periodic signal Daniel Aronsson, Erik Björnemo, Mathias Johansson Signals and Systems Group, Uppsala University, Sweden, e-mail: Daniel.Aronsson,Erik.Bjornemo,Mathias.Johansson}@Angstrom.uu.se

More information

A Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators.

A Systematic Approach to Frequency Compensation of the Voltage Loop in Boost PFC Pre- regulators. A Systematic Approach to Frequency Compensation o the Voltage Loop in oost PFC Pre- regulators. Claudio Adragna, STMicroelectronics, Italy Abstract Venable s -actor method is a systematic procedure that

More information

Received: 30 July 2017; Accepted: 29 September 2017; Published: 8 October 2017

Received: 30 July 2017; Accepted: 29 September 2017; Published: 8 October 2017 mathematics Article Least-Squares Solution o Linear Dierential Equations Daniele Mortari ID Aerospace Engineering, Texas A&M University, College Station, TX 77843, USA; mortari@tamu.edu; Tel.: +1-979-845-734

More information

Hao Ren, Wim J. van der Linden and Qi Diao

Hao Ren, Wim J. van der Linden and Qi Diao psychometrika vol. 82, no. 2, 498 522 June 2017 doi: 10.1007/s11336-017-9553-1 CONTINUOUS ONLINE ITEM CALIBRATION: PARAMETER RECOVERY AND ITEM UTILIZATION Hao Ren, Wim J. van der Linden and Qi Diao PACIFIC

More information

Benny Pinkas Bar Ilan University

Benny Pinkas Bar Ilan University Winter School on Bar-Ilan University, Israel 30/1/2011-1/2/2011 Bar-Ilan University Benny Pinkas Bar Ilan University 1 Extending OT [IKNP] Is fully simulatable Depends on a non-standard security assumption

More information

Additional exercises in Stationary Stochastic Processes

Additional exercises in Stationary Stochastic Processes Mathematical Statistics, Centre or Mathematical Sciences Lund University Additional exercises 8 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

More information

Power Spectral Analysis of Elementary Cellular Automata

Power Spectral Analysis of Elementary Cellular Automata Power Spectral Analysis o Elementary Cellular Automata Shigeru Ninagawa Division o Inormation and Computer Science, Kanazawa Institute o Technology, 7- Ohgigaoka, Nonoichi, Ishikawa 92-850, Japan Spectral

More information

NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS

NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS REVSTAT Statistical Journal Volume 16, Number 2, April 2018, 167 185 NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS Authors: Frank P.A. Coolen Department

More information

Telescoping Decomposition Method for Solving First Order Nonlinear Differential Equations

Telescoping Decomposition Method for Solving First Order Nonlinear Differential Equations Telescoping Decomposition Method or Solving First Order Nonlinear Dierential Equations 1 Mohammed Al-Reai 2 Maysem Abu-Dalu 3 Ahmed Al-Rawashdeh Abstract The Telescoping Decomposition Method TDM is a new

More information

THE use of radio frequency channels assigned to primary. Traffic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks

THE use of radio frequency channels assigned to primary. Traffic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks EEE JOURNAL ON SELECTED AREAS N COMMUNCATONS, VOL. X, NO. X, X 01X 1 Traic-Aware Channel Sensing Order in Dynamic Spectrum Access Networks Chun-Hao Liu, Jason A. Tran, Student Member, EEE, Przemysław Pawełczak,

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

On the Girth of (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs

On the Girth of (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs On the Girth o (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs Sudarsan V S Ranganathan, Dariush Divsalar and Richard D Wesel Department o Electrical Engineering, University o Caliornia, Los

More information

AH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge

AH 2700A. Attenuator Pair Ratio for C vs Frequency. Option-E 50 Hz-20 khz Ultra-precision Capacitance/Loss Bridge 0 E ttenuator Pair Ratio or vs requency NEEN-ERLN 700 Option-E 0-0 k Ultra-precision apacitance/loss ridge ttenuator Ratio Pair Uncertainty o in ppm or ll Usable Pairs o Taps 0 0 0. 0. 0. 07/08/0 E E E

More information

Better Than Advertised: Improved Collision-Resistance Guarantees for MD-Based Hash Functions

Better Than Advertised: Improved Collision-Resistance Guarantees for MD-Based Hash Functions Better Than Advertised: Improved Collision-Resistance Guarantees or MD-Based Hash Functions Mihir Bellare University o Caliornia San Diego La Jolla, Caliornia mihir@eng.ucsd.edu Joseph Jaeger University

More information

Objectives. By the time the student is finished with this section of the workbook, he/she should be able

Objectives. By the time the student is finished with this section of the workbook, he/she should be able FUNCTIONS Quadratic Functions......8 Absolute Value Functions.....48 Translations o Functions..57 Radical Functions...61 Eponential Functions...7 Logarithmic Functions......8 Cubic Functions......91 Piece-Wise

More information

Finite Dimensional Hilbert Spaces are Complete for Dagger Compact Closed Categories (Extended Abstract)

Finite Dimensional Hilbert Spaces are Complete for Dagger Compact Closed Categories (Extended Abstract) Electronic Notes in Theoretical Computer Science 270 (1) (2011) 113 119 www.elsevier.com/locate/entcs Finite Dimensional Hilbert Spaces are Complete or Dagger Compact Closed Categories (Extended bstract)

More information

Introduction to Simulation - Lecture 2. Equation Formulation Methods. Jacob White. Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Introduction to Simulation - Lecture 2. Equation Formulation Methods. Jacob White. Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Introduction to Simulation - Lecture Equation Formulation Methods Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Formulating Equations rom Schematics Struts and Joints

More information

Provably Secure Double-Block-Length Hash Functions in a Black-Box Model

Provably Secure Double-Block-Length Hash Functions in a Black-Box Model Provably Secure Double-Block-ength Hash Functions in a Black-Box Model Shoichi Hirose Graduate School o Inormatics, Kyoto niversity, Kyoto 606-8501 Japan hirose@i.kyoto-u.ac.jp Abstract. In CRYPTO 89,

More information

Constrained Keys for Invertible Pseudorandom Functions

Constrained Keys for Invertible Pseudorandom Functions Constrained Keys or Invertible Pseudorandom Functions Dan Boneh, Sam Kim, and David J. Wu Stanord University {dabo,skim13,dwu4}@cs.stanord.edu Abstract A constrained pseudorandom unction (PRF) is a secure

More information

Equidistant Polarizing Transforms

Equidistant Polarizing Transforms DRAFT 1 Equidistant Polarizing Transorms Sinan Kahraman Abstract arxiv:1708.0133v1 [cs.it] 3 Aug 017 This paper presents a non-binary polar coding scheme that can reach the equidistant distant spectrum

More information

Supplementary Information Reconstructing propagation networks with temporal similarity

Supplementary Information Reconstructing propagation networks with temporal similarity Supplementary Inormation Reconstructing propagation networks with temporal similarity Hao Liao and An Zeng I. SI NOTES A. Special range. The special range is actually due to two reasons: the similarity

More information

A Particle Swarm Optimization Algorithm for Neighbor Selection in Peer-to-Peer Networks

A Particle Swarm Optimization Algorithm for Neighbor Selection in Peer-to-Peer Networks A Particle Swarm Optimization Algorithm or Neighbor Selection in Peer-to-Peer Networks Shichang Sun 1,3, Ajith Abraham 2,4, Guiyong Zhang 3, Hongbo Liu 3,4 1 School o Computer Science and Engineering,

More information

The concept of limit

The concept of limit Roberto s Notes on Dierential Calculus Chapter 1: Limits and continuity Section 1 The concept o limit What you need to know already: All basic concepts about unctions. What you can learn here: What limits

More information

Lectures 1&2: Introduction to Secure Computation, Yao s and GMW Protocols

Lectures 1&2: Introduction to Secure Computation, Yao s and GMW Protocols CS 294 Secure Computation January 19, 2016 Lectures 1&2: Introduction to Secure Computation, Yao s and GMW Protocols Instructor: Sanjam Garg Scribe: Pratyush Mishra 1 Introduction Secure multiparty computation

More information

arxiv: v1 [cs.ds] 3 Feb 2018

arxiv: v1 [cs.ds] 3 Feb 2018 A Model for Learned Bloom Filters and Related Structures Michael Mitzenmacher 1 arxiv:1802.00884v1 [cs.ds] 3 Feb 2018 Abstract Recent work has suggested enhancing Bloom filters by using a pre-filter, based

More information

ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables

ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables Department o Electrical Engineering University o Arkansas ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables Dr. Jingxian Wu wuj@uark.edu OUTLINE 2 Two discrete random variables

More information

Computing proximal points of nonconvex functions

Computing proximal points of nonconvex functions Mathematical Programming manuscript No. (will be inserted by the editor) Warren Hare Claudia Sagastizábal Computing proximal points o nonconvex unctions the date o receipt and acceptance should be inserted

More information

A Simple Explanation of the Sobolev Gradient Method

A Simple Explanation of the Sobolev Gradient Method A Simple Explanation o the Sobolev Gradient Method R. J. Renka July 3, 2006 Abstract We have observed that the term Sobolev gradient is used more oten than it is understood. Also, the term is oten used

More information

A Method for Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model

A Method for Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model APRIL 2006 S A L M A N E T A L. 1081 A Method or Assimilating Lagrangian Data into a Shallow-Water-Equation Ocean Model H. SALMAN, L.KUZNETSOV, AND C. K. R. T. JONES Department o Mathematics, University

More information

Stochastic Game Approach for Replay Attack Detection

Stochastic Game Approach for Replay Attack Detection Stochastic Game Approach or Replay Attack Detection Fei Miao Miroslav Pajic George J. Pappas. Abstract The existing tradeo between control system perormance and the detection rate or replay attacks highlights

More information

Design and Optimal Configuration of Full-Duplex MAC Protocol for Cognitive Radio Networks Considering Self-Interference

Design and Optimal Configuration of Full-Duplex MAC Protocol for Cognitive Radio Networks Considering Self-Interference Received November 8, 015, accepted December 10, 015, date o publication December 17, 015, date o current version December 8, 015. Digital Object Identiier 10.1109/ACCE.015.509449 Design and Optimal Coniguration

More information

Percentile Policies for Inventory Problems with Partially Observed Markovian Demands

Percentile Policies for Inventory Problems with Partially Observed Markovian Demands Proceedings o the International Conerence on Industrial Engineering and Operations Management Percentile Policies or Inventory Problems with Partially Observed Markovian Demands Farzaneh Mansouriard Department

More information

The Deutsch-Jozsa Problem: De-quantization and entanglement

The Deutsch-Jozsa Problem: De-quantization and entanglement The Deutsch-Jozsa Problem: De-quantization and entanglement Alastair A. Abbott Department o Computer Science University o Auckland, New Zealand May 31, 009 Abstract The Deustch-Jozsa problem is one o the

More information

Feedback Linearization

Feedback Linearization Feedback Linearization Peter Al Hokayem and Eduardo Gallestey May 14, 2015 1 Introduction Consider a class o single-input-single-output (SISO) nonlinear systems o the orm ẋ = (x) + g(x)u (1) y = h(x) (2)

More information

Solving Multi-Mode Time-Cost-Quality Trade-off Problem in Uncertainty Condition Using a Novel Genetic Algorithm

Solving Multi-Mode Time-Cost-Quality Trade-off Problem in Uncertainty Condition Using a Novel Genetic Algorithm International Journal o Management and Fuzzy Systems 2017; 3(3): 32-40 http://www.sciencepublishinggroup.com/j/ijms doi: 10.11648/j.ijms.20170303.11 Solving Multi-Mode Time-Cost-Quality Trade-o Problem

More information

Published in the American Economic Review Volume 102, Issue 1, February 2012, pages doi: /aer

Published in the American Economic Review Volume 102, Issue 1, February 2012, pages doi: /aer Published in the American Economic Review Volume 102, Issue 1, February 2012, pages 594-601. doi:10.1257/aer.102.1.594 CONTRACTS VS. SALARIES IN MATCHING FEDERICO ECHENIQUE Abstract. Firms and workers

More information

Chapter 2. Basic concepts of probability. Summary. 2.1 Axiomatic foundation of probability theory

Chapter 2. Basic concepts of probability. Summary. 2.1 Axiomatic foundation of probability theory Chapter Basic concepts o probability Demetris Koutsoyiannis Department o Water Resources and Environmental Engineering aculty o Civil Engineering, National Technical University o Athens, Greece Summary

More information

On Security Arguments of the Second Round SHA-3 Candidates

On Security Arguments of the Second Round SHA-3 Candidates On Security Arguments o the Second Round SA-3 Candidates Elena Andreeva Andrey Bogdanov Bart Mennink Bart Preneel Christian Rechberger March 19, 2012 Abstract In 2007, the US National Institute or Standards

More information

The Analysis of Electricity Storage Location Sites in the Electric Transmission Grid

The Analysis of Electricity Storage Location Sites in the Electric Transmission Grid Proceedings o the 2010 Industrial Engineering Research Conerence A. Johnson and J. Miller, eds. The Analysis o Electricity Storage Location Sites in the Electric Transmission Grid Thomas F. Brady College

More information

BANDELET IMAGE APPROXIMATION AND COMPRESSION

BANDELET IMAGE APPROXIMATION AND COMPRESSION BANDELET IMAGE APPOXIMATION AND COMPESSION E. LE PENNEC AND S. MALLAT Abstract. Finding eicient geometric representations o images is a central issue to improve image compression and noise removal algorithms.

More information

CMPUT651: Differential Privacy

CMPUT651: Differential Privacy CMPUT65: Differential Privacy Homework assignment # 2 Due date: Apr. 3rd, 208 Discussion and the exchange of ideas are essential to doing academic work. For assignments in this course, you are encouraged

More information

A Brief Survey on Semi-supervised Learning with Graph Regularization

A Brief Survey on Semi-supervised Learning with Graph Regularization 000 00 002 003 004 005 006 007 008 009 00 0 02 03 04 05 06 07 08 09 020 02 022 023 024 025 026 027 028 029 030 03 032 033 034 035 036 037 038 039 040 04 042 043 044 045 046 047 048 049 050 05 052 053 A

More information

Common Errors: How to (and Not to) Control for Unobserved Heterogeneity *

Common Errors: How to (and Not to) Control for Unobserved Heterogeneity * Common Errors: How to (and ot to) Control or Unobserved Heterogeneity * Todd A. Gormley and David A. Matsa June 6 0 Abstract Controlling or unobserved heterogeneity (or common errors ) such as industry-speciic

More information

arxiv:quant-ph/ v2 12 Jan 2006

arxiv:quant-ph/ v2 12 Jan 2006 Quantum Inormation and Computation, Vol., No. (25) c Rinton Press A low-map model or analyzing pseudothresholds in ault-tolerant quantum computing arxiv:quant-ph/58176v2 12 Jan 26 Krysta M. Svore Columbia

More information

A PROBABILISTIC POWER DOMAIN ALGORITHM FOR FRACTAL IMAGE DECODING

A PROBABILISTIC POWER DOMAIN ALGORITHM FOR FRACTAL IMAGE DECODING Stochastics and Dynamics, Vol. 2, o. 2 (2002) 6 73 c World Scientiic Publishing Company A PROBABILISTIC POWER DOMAI ALGORITHM FOR FRACTAL IMAGE DECODIG V. DRAKOPOULOS Department o Inormatics and Telecommunications,

More information

Semideterministic Finite Automata in Operational Research

Semideterministic Finite Automata in Operational Research Applied Mathematical Sciences, Vol. 0, 206, no. 6, 747-759 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/0.2988/ams.206.62 Semideterministic Finite Automata in Operational Research V. N. Dumachev and

More information

CORRESPONDENCE ANALYSIS

CORRESPONDENCE ANALYSIS CORRESPONDENCE ANALYSIS INTUITIVE THEORETICAL PRESENTATION BASIC RATIONALE DATA PREPARATION INITIAL TRANSFORAMATION OF THE INPUT MATRIX INTO PROFILES DEFINITION OF GEOMETRIC CONCEPTS (MASS, DISTANCE AND

More information

RESOLUTION MSC.362(92) (Adopted on 14 June 2013) REVISED RECOMMENDATION ON A STANDARD METHOD FOR EVALUATING CROSS-FLOODING ARRANGEMENTS

RESOLUTION MSC.362(92) (Adopted on 14 June 2013) REVISED RECOMMENDATION ON A STANDARD METHOD FOR EVALUATING CROSS-FLOODING ARRANGEMENTS (Adopted on 4 June 203) (Adopted on 4 June 203) ANNEX 8 (Adopted on 4 June 203) MSC 92/26/Add. Annex 8, page THE MARITIME SAFETY COMMITTEE, RECALLING Article 28(b) o the Convention on the International

More information

Reliability Assessment with Correlated Variables using Support Vector Machines

Reliability Assessment with Correlated Variables using Support Vector Machines Reliability Assessment with Correlated Variables using Support Vector Machines Peng Jiang, Anirban Basudhar, and Samy Missoum Aerospace and Mechanical Engineering Department, University o Arizona, Tucson,

More information

Estimation of Sample Reactivity Worth with Differential Operator Sampling Method

Estimation of Sample Reactivity Worth with Differential Operator Sampling Method Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.842-850 (2011) ARTICLE Estimation o Sample Reactivity Worth with Dierential Operator Sampling Method Yasunobu NAGAYA and Takamasa MORI Japan Atomic

More information

Basic properties of limits

Basic properties of limits Roberto s Notes on Dierential Calculus Chapter : Limits and continuity Section Basic properties o its What you need to know already: The basic concepts, notation and terminology related to its. What you

More information

Probabilistic Optimisation applied to Spacecraft Rendezvous on Keplerian Orbits

Probabilistic Optimisation applied to Spacecraft Rendezvous on Keplerian Orbits Probabilistic Optimisation applied to pacecrat Rendezvous on Keplerian Orbits Grégory aive a, Massimiliano Vasile b a Université de Liège, Faculté des ciences Appliquées, Belgium b Dipartimento di Ingegneria

More information

Simpler Functions for Decompositions

Simpler Functions for Decompositions Simpler Functions or Decompositions Bernd Steinbach Freiberg University o Mining and Technology, Institute o Computer Science, D-09596 Freiberg, Germany Abstract. This paper deals with the synthesis o

More information

Aggregate Growth: R =αn 1/ d f

Aggregate Growth: R =αn 1/ d f Aggregate Growth: Mass-ractal aggregates are partly described by the mass-ractal dimension, d, that deines the relationship between size and mass, R =αn 1/ d where α is the lacunarity constant, R is the

More information

Comptes rendus de l Academie bulgare des Sciences, Tome 59, 4, 2006, p POSITIVE DEFINITE RANDOM MATRICES. Evelina Veleva

Comptes rendus de l Academie bulgare des Sciences, Tome 59, 4, 2006, p POSITIVE DEFINITE RANDOM MATRICES. Evelina Veleva Comtes rendus de l Academie bulgare des ciences Tome 59 4 6 353 36 POITIVE DEFINITE RANDOM MATRICE Evelina Veleva Abstract: The aer begins with necessary and suicient conditions or ositive deiniteness

More information

SEPARATED AND PROPER MORPHISMS

SEPARATED AND PROPER MORPHISMS SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN Last quarter, we introduced the closed diagonal condition or a prevariety to be a prevariety, and the universally closed condition or a variety to be complete.

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information