Privacy-Preserving Bayesian Network Learning From Heterogeneous Distributed Data


Jianjie Ma and Krishnamoorthy Sivakumar
School of EECS, Washington State University, Pullman, WA, USA

Abstract. In this paper, we propose a post randomization technique to learn a Bayesian network (BN) from distributed heterogeneous data in a privacy-sensitive fashion. In this setting, two or more parties own sensitive data but want to learn a Bayesian network from the combined data. We consider both structure and parameter learning for the BN. The only information required from the data set is a set of sufficient statistics for learning both the network structure and the parameters. The proposed method estimates these sufficient statistics from the randomized data; the estimated sufficient statistics are then used to learn a BN. For structure learning, we face the familiar extra-link problem, since estimation errors tend to break the conditional independence among the variables. We propose modifications of the score functions used for BN learning to solve this problem. We show both theoretically and experimentally that post randomization is an efficient, flexible, and easy-to-use method to learn a Bayesian network from privacy-sensitive data.

Index Terms: Privacy Preserving Data Mining, Bayesian Network, Post Randomization

I. INTRODUCTION

Privacy-preserving data mining deals with the problem of building accurate data mining models over aggregate data while protecting privacy at the level of individual records. There are two main approaches to privacy-preserving data mining. One approach is to perturb or randomize the data before sending it to the data miner; the perturbed or randomized data are then used to learn or mine the models and patterns [1]. Evfimievski et al. [5] proposed a select-a-size randomization technique for privacy-preserving mining of association rules. Du et al. [4] suggested using randomized response techniques for privacy-preserving data mining and constructed decision trees from randomized data. The other approach is to use secure multiparty computation (SMC) to enable two or more parties to build data models without any party learning anything about the other parties' data [9]. Though the SMC approach is appealing in its generality and simplicity, specific and efficient protocols have to be developed for data mining purposes, since generic SMC is inefficient for data mining applications [3]. All currently available SMC techniques are based on a semi-honest model.

A. Related Work

Privacy-preserving Bayesian network learning is a more recent topic. Wright and Yang [15] discuss privacy-preserving Bayesian network structure computation on distributed heterogeneous data, while Meng et al. [12] have considered the privacy-sensitive Bayesian network parameter learning problem. The underlying method in both works is to convert the computations required for BN learning into a series of inner product computations and then to use a secure inner product computation method proposed elsewhere: an SMC-based method is used in [15], whereas [12] uses a method based on the random projection technique proposed in [10]. The number of secure computation operations increases exponentially with the number of possible configurations of the problem variables. The accuracy of the structure learned using [15] is also not clear, since their method is based on an approximated score function, which might cause erroneous links.
Our experiments show that the extra-link problem is severe even with small estimation errors. The existing literature on privacy-preserving Bayesian network learning focuses on multiparty models. In addition to this model, our paper also considers a model in which a data miner actually does all the computations and learning for the participating parties. SMC-based methods have two drawbacks: (a) an unrealistic semi-honest model assumption, and (b) large volumes of cooperative or synchronized computations among the parties involved, most of which are overhead due to the privacy requirement. Other related works include [4], [5], [14], all of which consider the case where there is a data miner who does all the learning. In [14], [5], the focus is on association rule mining: [5] uses a select-a-size randomization, while Rizvi and Haritsa [14] proposed a randomization scheme called MASK, based on a simple probabilistic distortion of user data. Post randomization gives a general framework for randomizing categorical data after the data are collected; select-a-size randomization can be considered a special, intelligently designed post randomization, and MASK [14] is a post randomization technique for binary variables. In [14], [5] the data is randomized by record, which effectively randomizes all variables simultaneously and thereby introduces unnecessary randomness. The proposed post randomization method provides a more flexible way to cope with situations where different variables have different privacy requirements.

[Fig. 1. (Left) Binary randomization. (Right) Ternary symmetric randomization.]

B. Our Contribution

This paper uses post randomization techniques to preserve privacy during Bayesian network learning. Gouweleeuw et al. [6] introduced post randomization for statistical databases for information disclosure control, where the technique has proved effective. We explore the possibility of using post randomization in privacy-preserving data mining, taking privacy-preserving Bayesian network learning as an example. We consider two privacy-preserving Bayesian network learning setups on distributed heterogeneous data, where different sets of variables are collected at different sites. We develop estimators for the frequency counters used in learning a Bayesian network (both structure and parameters), and expressions for their covariance, based on the randomized data. Our experiments show that post randomization is an efficient, flexible, and easy-to-use method to learn a Bayesian network from privacy-sensitive data. Using post randomization in privacy-preserving data mining overcomes the inherent drawbacks of the SMC method and provides reasonable privacy and accuracy: a malicious party in SMC may be able to obtain private information of other parties, whereas it can only obtain randomized data if post randomization has been applied. Post randomization also provides a general framework for randomizing categorical data in privacy-preserving data mining.

II. POST RANDOMIZATION AND ITS PRIVACY ANALYSIS

A. Post Randomization

Consider a data set D with a set of variables X_1, X_2, ..., X_n, where X_i takes discrete values from a set S_i whose cardinality is K_i. Post randomization for variable X_i is a (random) mapping R_i : S_i -> S_i, based on a set of transition probabilities p^i_{lm} = P(\tilde{X}_i = k_m | X_i = k_l), where k_m, k_l are in S_i and \tilde{X}_i denotes the (randomized) variable corresponding to X_i. The transition probability p^i_{lm} is the probability that a variable with original value k_l is randomized to the value k_m. Let P_i = {p^i_{lm}} denote the K_i x K_i matrix that has p^i_{lm} as its (l, m)th entry. The randomized data set is \tilde{D} = {\tilde{y}_1, ..., \tilde{y}_N}, where \tilde{y}_i is an instance of the randomized variables {\tilde{X}_1, ..., \tilde{X}_n}. For example, binary randomization can be used if the variable is binary, and ternary symmetric randomization is a choice if the variable is ternary; both are shown in Fig. 1. We can apply the same randomization scheme independently to all of the variables (uniform randomization of the data set). Alternatively, we can use a non-uniform randomization, where different post randomization schemes are applied to different variables, independently. For example, we can choose different randomization parameters p_1 and p_2 for two binary variables whose privacy requirements differ. Non-uniform randomization is effective when different variables require different levels of privacy; as a special case, it covers variables with no privacy requirement, which need not be randomized at all. From the above, if variable X_i takes K_i values (or categories), the dimension of P_i is K_i x K_i. With larger K_i, more randomization is introduced into variable X_i in general. This is good from a privacy point of view; however, the variances of the estimators of the frequency counters will also be larger for a given number of training samples.
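To make the operator concrete, the following minimal Python sketch (ours; the function names, the NumPy dependency, and the parameter values are illustrative assumptions, not part of the paper) applies the two schemes of Fig. 1 independently to each record of a column:

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_symmetric(p: float) -> np.ndarray:
    """Transition matrix that flips a binary value with probability p."""
    return np.array([[1.0 - p, p],
                     [p, 1.0 - p]])

def ternary_symmetric(p: float) -> np.ndarray:
    """Transition matrix that moves a ternary value to each of the other
    two categories with probability p (it stays put with probability 1 - 2p)."""
    return np.full((3, 3), p) + (1.0 - 3.0 * p) * np.eye(3)

def post_randomize(column: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Post-randomize one categorical column: a record with value k is
    replaced by a draw from row k of the transition matrix P."""
    return np.array([rng.choice(P.shape[1], p=P[k]) for k in column])

# Non-uniform randomization: different schemes for different variables.
X = rng.integers(0, 2, size=1000)                    # a binary variable
L = rng.integers(0, 3, size=1000)                    # a ternary variable
X_tilde = post_randomize(X, binary_symmetric(0.25))
L_tilde = post_randomize(L, ternary_symmetric(0.15))
```

A uniform randomization would simply reuse one matrix for every variable. In either case, the matrices used must accompany the randomized data, since the estimators of Section IV depend on them.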
One solution to this problem is to partition the K_i categories of variable X_i into several groups such that a value in one group can only be randomized to a value in the same group; the matrix P_i then becomes block diagonal. Post randomization can also be applied to several variables simultaneously. For example, the variables X_i and X_j can be randomized simultaneously according to transition probabilities P(\tilde{X}_i = l_1, \tilde{X}_j = l_2 | X_i = k_1, X_j = k_2). We can treat variables randomized simultaneously as a combined variable when estimating the frequency counters. Proper simultaneous randomization can avoid possible inconsistencies in the randomized data set that independent randomization might cause.

B. Privacy Analysis of Post Randomization

We consider the notion of privacy introduced by Evfimievski et al. [5] in terms of an amplification factor gamma. The gamma-amplification of [5] is proposed in a framework where every data record must be randomized with a factor less than gamma, to limit the privacy breach, before the data are sent to the data miner. In this paper, however, we use the amplification gamma purely as a worst-case quantification of privacy for a designed post randomization scheme. We first briefly review the notion of gamma-amplification in our context of post randomization. A post randomization operator for variable X_i with transition probability matrix P_i is at most gamma-amplifying for \tilde{X}_i = k if

    P_i(k_1, k) / P_i(k_2, k) <= gamma  for all k_1, k_2 in S_i,

where k is in S_i and |S_i| = K_i. A post randomization operator is at most gamma-amplifying for variable X_i if it is at most gamma-amplifying for every k in S_i. An upward rho_1-to-rho_2 privacy breach occurs when the posterior belief P(X_i = k' | \tilde{X}_i = k) >= rho_2 while the prior belief P(X_i = k') <= rho_1. A downward rho_1-to-rho_2 privacy breach occurs when the posterior belief P(X_i != k' | \tilde{X}_i = k) >= rho_2 while the prior belief P(X_i != k') <= rho_1. If the randomization operator is at most gamma-amplifying for X_i, revealing \tilde{X}_i will cause neither an upward nor a downward rho_1-to-rho_2 privacy breach if rho_2(1 - rho_1) / (rho_1(1 - rho_2)) > gamma. Clearly, the smaller the value of gamma, the better the worst-case privacy; ideally we would like gamma = 1. Interested readers can refer to [5] for a detailed discussion of gamma-amplification.
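The worst-case factor can be read directly off a transition matrix by maximizing the ratio of any two entries within each column. A small sketch (ours; the helper name is hypothetical):

```python
import numpy as np

def gamma_amplification(P: np.ndarray) -> float:
    """Worst-case amplification of a PRAM matrix, P[l, m] = Pr(X~=m | X=l):
    the largest ratio P[k1, k] / P[k2, k] over randomized categories k and
    original categories k1, k2 (infinite if a column mixes zero and
    non-zero entries)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = P[:, None, :] / P[None, :, :]   # ratios[k1, k2, k]
    return float(np.nanmax(ratios))

P = np.array([[0.75, 0.25],
              [0.25, 0.75]])                     # binary symmetric, p = 0.25
print(gamma_amplification(P))                    # 3.0
```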

For binary symmetric randomization (Fig. 1) with p_{01} = p_{10} = p <= 0.5, it is easy to see that the operator is at most gamma-amplifying for gamma = (1 - p)/p. For ternary symmetric randomization with parameter p <= 1/3, the amplification is at most gamma = (1 - 2p)/p. The at-most-gamma amplification provides a worst-case quantification of privacy: for a given gamma and prior belief rho_1, we can find a rho_2 satisfying rho_2(1 - rho_1)/(rho_1(1 - rho_2)) = gamma, and no privacy breach with posterior belief greater than rho_2 can occur. However, at-most-gamma amplification does not, by itself, describe how much privacy is preserved in general. Besides gamma, we therefore use K = min_k #{k' : P(\tilde{X}_i = k | X_i = k') > 0}, the minimum number of categories that can be randomized to a category k under the designed post randomization, where the minimum is taken over all categories of X_i. This K indicates the privacy preserved in general; it is similar to the K of K-anonymity [13], but in a probabilistic sense.

III. FRAMEWORK OF PRIVACY-PRESERVING BN LEARNING USING POST RANDOMIZATION

The problem of Bayesian network learning is to find a network G and corresponding parameters that best match a given training data set D = {y_1, y_2, ..., y_N}, where each record y_i is an instance of the variables {X_1, X_2, ..., X_n}. Each party in our case observes a subset of the variables {X_1, X_2, ..., X_n}. We note that all the information required for parameter and structure learning consists of the sufficient statistics N_{ijk} of each candidate structure (only the fixed structure G for parameter learning) computed from the training data D, where N_{ijk} is the number of records in which variable X_i is in its kth configuration and its parents Pa(X_i) are in their jth configuration. Therefore, the problem of (privacy-sensitive) Bayesian network learning is equivalent to the problem of calculating the sufficient statistics (in a privacy-sensitive manner). In this paper, we estimate those sufficient statistics from the randomized data. We consider the following two setups: (I) several parties want to learn a global Bayesian network but are concerned about the privacy of their individual data, corresponding to the multiparty model of SMC; (II) all parties send their randomized data to a data miner who does the learning.

A. Parameter Learning

For parameter learning, the structure G is assumed fixed and known to every party. For setup I, we use the definitions of cross variable and cross parent from [12]; sensitive cross parents are the only variables that need to be randomized in setup I. Learning parameters for setup I can be done as follows, for each party a_i:
(1) Randomize the sensitive cross parents belonging to party a_i according to their respective privacy requirements, using post randomization as described in Section II. Randomizations are done independently for each (combined) variable and each record.
(2) Send the randomized cross parents of party a_i that are needed by party a_j to party a_j, together with the probability transition matrices used.
(3) Learn the parameters of the local variables of party a_i. This step does not involve randomized data.
(4) Estimate the sufficient statistics N_{ijk} for each cross variable at site a_i, using the local data and the randomized parent data from other parties.
(5) Estimate the parameters of the cross variables using the estimated sufficient statistics.
(6) Share the parameters with all other parties.
In setup II, every party randomizes all of its sensitive variables according to their respective privacy requirements using post randomization (similar to the randomization of cross parents in setup I). The randomized data and the corresponding probability transition matrices are then sent to the data miner.
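In setup II the party-side work is just randomize-and-ship. A minimal sketch (ours; the dictionary layout and names are assumptions, and post_randomize is the helper from the Section II sketch):

```python
import numpy as np

def randomize_party_data(data: dict, schemes: dict):
    """Randomize every sensitive variable of one party independently.

    `schemes` maps a sensitive variable's name to its transition matrix;
    variables without an entry are not sensitive and pass through as-is.
    Both the randomized columns and the matrices are shipped to the miner,
    since the estimators of Section IV need the matrices.
    """
    randomized = {name: (post_randomize(col, schemes[name])
                         if name in schemes else col)
                  for name, col in data.items()}
    return randomized, schemes
```

Setup I differs only in which variables are randomized (the sensitive cross parents) and in the fact that steps (3)-(6) above are executed by the parties themselves.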
The data miner then estimates the sufficient statistics N_{ijk} and the parameters of each node X_i from the randomized data. The details of the estimation of the sufficient statistics N_{ijk}, and of parameter learning from the estimated sufficient statistics, are described in Sections IV and V, respectively.

B. Structure Learning

During the search for a BN structure (a directed acyclic graph, DAG) that best fits the data, we perform randomization and estimation of sufficient statistics as described in Section III-A for each candidate structure, treated as a fixed structure G. The estimated sufficient statistics are used to calculate the score of the candidate structure G, and we choose a structure with maximum score. We use the K2 algorithm to search for a DAG that approximately maximizes the score. The details of structure learning from randomized data using the K2 algorithm are presented in Section VI.

IV. ESTIMATION OF SUFFICIENT STATISTICS FROM RANDOMIZED DATA

From Section III, the problem of privacy-preserving Bayesian network learning decomposes into a series of estimations of the N_{ijk} for each node X_i and each candidate structure. The parents Pa(X_i) of node X_i are given by the (candidate) structure G. Consider the following general case: node X_i has cardinality K_i and Q parent nodes Pa(X_i) = {Pa_i(1), Pa_i(2), ..., Pa_i(Q)} in the candidate structure, where parent Pa_i(q) has cardinality K_{Pa_i(q)}. These variables can be arbitrarily vertically partitioned across parties in both setups, and the randomization of each (combined) variable may also group the categories of the variable. Our discussion below is independent of the specific partitioning and grouping of the variables. Because of simultaneous randomization, we have the following cases for estimating the N_{ijk} from the randomized data \tilde{D} (note that only variables belonging to the same party can be randomized simultaneously in both setups):
(a) X_i and all of its parents are randomized independently;
(b) some parents of X_i are randomized simultaneously;
(c) X_i is randomized simultaneously with some of its parents;
(d) X_i is randomized simultaneously with other variables (not its parents).
For cases (b) and (c), we can treat the simultaneously randomized variables as a combined variable, as the sketch below illustrates.
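The combined-variable reduction is mechanical: a value pair of two columns with cardinalities (K_a, K_b) is re-indexed as one category of a K_a * K_b-valued variable (a sketch under our own index convention):

```python
import numpy as np

def fuse_columns(a: np.ndarray, b: np.ndarray, card_b: int) -> np.ndarray:
    """Fuse two categorical columns into one combined variable: the pair
    (k, j) becomes category k * card_b + j, so the joint transition matrix
    of a simultaneous randomization is an ordinary PRAM matrix over the
    fused categories."""
    return a * card_b + b
```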

For example, if node X_i is randomized simultaneously with one of its parents Pa_i(1), then N_{ijk} is the number of records such that (X_i; Pa_i(1)) = (k; j_1), Pa_i(2) = j_2, ..., Pa_i(Q) = j_Q, where (X_i; Pa_i(1)) is a combined variable. Thus we can estimate the N_{ijk} from the randomized data by treating (X_i; Pa_i(1)) as a single variable with cardinality K_i * K_{Pa_i(1)}. For case (d), since N_{ijk} does not involve the variable randomized simultaneously with X_i, we can obtain the marginal transition probability matrix from the given transition probability matrix, which is defined for the combined variable. Hence, without loss of generality, we consider case (a) only. Suppose the transition probability matrices of X_i and its parents are P_i, P_{Pa_i(1)}, ..., P_{Pa_i(Q)}, respectively. The problem is to estimate the sufficient statistics N_{ijk} from the randomized data. We denote by Pa(X_i) the compound variable comprising all the parents of node X_i; Pa(X_i) takes J_i = prod_{q=1}^{Q} K_{Pa_i(q)} different values. The following notation is used in the sequel. A tilde denotes a quantity computed from the randomized data \tilde{D}, and a hat denotes an estimate of the corresponding quantity. N_{ijk} is as defined in Section III and N_{ij} = sum_{k=1}^{K_i} N_{ijk}, where K_i is the cardinality of variable X_i. N_i is the J_i K_i-dimensional vector of N_{ijk} values, that is, N_i = (N_{i11}, N_{i12}, ..., N_{i1K_i}, N_{i21}, ..., N_{iJ_iK_i})^t, where t denotes matrix transpose; N_i(l), for 1 <= l <= J_i K_i, is the number of records in which {X_i, Pa(X_i)} takes the lth configuration, where l = K_i(j - 1) + k for the j and k such that X_i is in its kth configuration and Pa(X_i) in its jth configuration. \tilde{N}_{ijk}, \tilde{N}_{ij}, and \tilde{N}_i are defined like N_{ijk}, N_{ij}, and N_i, but on the randomized data \tilde{D}. \hat{N}_{ijk}, \hat{N}_{ij}, and \hat{N}_i are the estimates of N_{ijk}, N_{ij}, and N_i, respectively. Given a training data set D with N records, a candidate structure G, and a randomization scheme characterized by the probability transition matrices P_i, P_{Pa_i(1)}, ..., P_{Pa_i(Q)}, we have the following theorems.

Theorem 1.
(a) E[\tilde{N}_i | D] = P^t N_i, where P = P_i (x) P_{Pa} and P_{Pa} = P_{Pa_i(1)} (x) P_{Pa_i(2)} (x) ... (x) P_{Pa_i(Q)}, with (x) denoting the Kronecker matrix product.
(b) Let Y^i_{ml} be the binomial random variable giving the number of records for which {\tilde{X}_i, \tilde{Pa}(X_i)} is in the lth configuration while {X_i, Pa(X_i)} is in the mth configuration. Then Y^i_{ml} ~ B(N_i(m), pi) with pi = P(m, l), the (m, l)th element of the matrix P defined in (a), where B denotes the binomial distribution. Moreover,

    Cov{Y^i_{m l_1}, Y^i_{n l_2}} = N_i(m) P(m, l_1)(1 - P(m, l_1))  if n = m, l_1 = l_2;
                                   -N_i(m) P(m, l_1) P(m, l_2)      if n = m, l_1 != l_2;
                                   0                                 if n != m.

(c) For l = 1, 2, ..., J_i K_i, \tilde{N}_i(l) = sum_{m=1}^{J_i K_i} Y^i_{ml}. Moreover, Cov{\tilde{N}_i | D} = sum_{l=1}^{J_i K_i} N_i(l) V_l, where V_l is a K_i J_i x K_i J_i covariance matrix whose (l_1, l_2)th element is

    V_l(l_1, l_2) = P(l, l_1)(1 - P(l, l_1))  if l_1 = l_2;
                    -P(l, l_1) P(l, l_2)       if l_1 != l_2.

Proofs are omitted here due to page limitations; interested readers can refer to a longer version of this paper [11] for details. The following theorem establishes the bias and variance of the estimator \hat{N}_i = (P^t)^{-1} \tilde{N}_i.

Theorem 2. \hat{N}_i = (P^t)^{-1} \tilde{N}_i is an unbiased estimator of N_i, and Cov{\hat{N}_i | D} = (P^{-1})^t Cov{\tilde{N}_i | D} P^{-1}, where P and Cov{\tilde{N}_i | D} are given in Theorem 1.

A binomial distribution B(n, p) can be approximated by a normal distribution N(np, np(1 - p)) when n is large. Since Bayesian network learning usually involves a relatively large sample size, the distribution of \tilde{N}_i(l) is well approximated by a normal distribution, by Theorem 1(c).
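Theorems 1 and 2 translate directly into code. The sketch below (ours; the helper names and the count layout are assumptions) inverts the combined transition matrix to undo the randomization in expectation. Conventions for the Kronecker product vary, so the factors are ordered here so that the child's category varies fastest in the count vector, matching the layout l = K_i(j - 1) + k defined above:

```python
import numpy as np
from functools import reduce

def estimate_counts(N_tilde: np.ndarray,
                    P_child: np.ndarray,
                    P_parents: list) -> np.ndarray:
    """Unbiased estimator of Theorem 2: N_hat = (P^t)^{-1} N_tilde.

    N_tilde holds the counts of (X_i, Pa(X_i)) configurations in the
    randomized data, with the child's category varying fastest within
    each parent configuration; the Kronecker factors match that layout.
    """
    P = reduce(np.kron, P_parents + [P_child])
    return np.linalg.solve(P.T, N_tilde)

# Binary child flipped with p = 0.25, one ternary parent with p = 0.15.
P_x = np.array([[0.75, 0.25], [0.25, 0.75]])
P_pa = np.full((3, 3), 0.15) + (1 - 3 * 0.15) * np.eye(3)
N_tilde = np.array([310., 95., 240., 80., 180., 95.])   # illustrative counts
N_hat = estimate_counts(N_tilde, P_x, [P_pa])
```

The covariance of Theorem 2 is obtained from the same matrix P and the expression in Theorem 1(c).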
By Theorem 2, \tilde{N}_i and \hat{N}_i can be approximated by J_i K_i-dimensional jointly normal random vectors; in particular, \hat{N}_i ~ N(N_i, Cov{\hat{N}_i | D}), where Cov{\hat{N}_i | D} is given by Theorem 2.

V. PARAMETER ESTIMATOR AND ITS DISTRIBUTION

The maximum likelihood (ML) estimate of a BN parameter from the estimated sufficient statistics is \hat{theta}^{ML}_{ijk} = \hat{N}_{ijk} / \hat{N}_{ij} = \hat{N}_{ijk} / sum_{k=1}^{K_i} \hat{N}_{ijk}, and the maximum a posteriori (MAP) estimate is \hat{theta}_{ijk} = (alpha_{ijk} + \hat{N}_{ijk}) / (alpha_{ij} + \hat{N}_{ij}), where the alpha_{ijk} come from the assumed Dirichlet prior for theta_{ij}, that is, P(theta_{ij} | G) = Dirichlet(alpha_{ij1}, ..., alpha_{ijK_i}). Here we use the ML estimator and analyze its performance; results for the MAP estimator can be obtained in a similar fashion. The estimated parameter is a ratio of two dependent (approximately normal) random variables with non-zero means. The exact distribution of the ratio W = X_1/X_2 of two normal variables with (X_1, X_2) ~ N(mu_1, mu_2, sigma_1^2, sigma_2^2, rho), for arbitrary means, variances, and correlation, is well known [7], but the general form is quite complicated. An approximation to the exact distribution when P(X_2 > 0) is close to 1, i.e., when mu_2/sigma_2 is large, is also given in [7]: Z = (mu_2 W - mu_1) / (sigma_1 sigma_2 a(W)) is approximately standard normal if mu_2/sigma_2 is large, where a(w) = (w^2/sigma_1^2 - 2 rho w/(sigma_1 sigma_2) + 1/sigma_2^2)^{1/2}. A Taylor series expansion of Z around mu_1/mu_2 then shows that the distribution of W can be approximated by a normal distribution with mean mu_1/mu_2 and variance (mu_1^2/mu_2^2)(sigma_1^2/mu_1^2 - 2 rho sigma_1 sigma_2/(mu_1 mu_2) + sigma_2^2/mu_2^2). We have \hat{theta}^{ML}_{ijk} = \hat{N}_{ijk} / \hat{N}_{ij}, and the jointly normal pair (\hat{N}_{ijk}, \hat{N}_{ij}) has (approximate) distribution N(mu_1, mu_2, sigma_1^2, sigma_2^2, rho) with mu_1 = N_{ijk} and mu_2 = N_{ij}; sigma_1, sigma_2, and rho can be obtained from Theorem 2. From Theorems 1 and 2, mu_2/sigma_2 is of the order of sqrt(N_{ij}) for a given data set and transition probability matrix P, so mu_2/sigma_2 is large even for relatively small sample sizes. Hence the simpler approximation of the ratio distribution works well for us. The normal approximation via the Taylor expansion is likewise applicable here, since \hat{N}_{ijk} <= \hat{N}_{ij}.
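Given the estimated count vector, both estimators are one reshape away (a sketch; the symmetric Dirichlet prior in map_parameters is our simplifying assumption):

```python
import numpy as np

def ml_parameters(N_hat: np.ndarray, K_i: int) -> np.ndarray:
    """theta_ijk = N_hat_ijk / N_hat_ij: reshape the count vector into a
    (parent configuration) x (child category) table and normalize rows."""
    counts = N_hat.reshape(-1, K_i)
    return counts / counts.sum(axis=1, keepdims=True)

def map_parameters(N_hat: np.ndarray, K_i: int, alpha: float = 1.0) -> np.ndarray:
    """MAP estimate under a symmetric Dirichlet(alpha, ..., alpha) prior:
    theta_ijk = (alpha + N_hat_ijk) / (K_i * alpha + N_hat_ij)."""
    counts = N_hat.reshape(-1, K_i) + alpha
    return counts / counts.sum(axis=1, keepdims=True)

theta_hat = ml_parameters(N_hat, K_i=2)   # N_hat from the previous sketch
```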

Hence, for a given data set D and probability transition matrix P, \hat{theta}^{ML}_{ijk} = \hat{N}_{ijk}/\hat{N}_{ij} can be approximated by a normal distribution with mean theta_{ijk} = N_{ijk}/N_{ij} and variance theta_{ijk}^2 (sigma_1^2/N_{ijk}^2 - 2 rho sigma_1 sigma_2/(N_{ijk} N_{ij}) + sigma_2^2/N_{ij}^2). This variance is of the order 1/N, since the sigma's are of the order sqrt(N) by Theorem 2.

VI. STRUCTURE LEARNING FROM RANDOMIZED DATA

The problem of learning a Bayesian network structure from sample data D is to find a network structure G that best matches D; our problem here is to learn the network structure from the randomized data \tilde{D}. We use the K2 algorithm for structure learning. K2 is a greedy search algorithm that searches for a DAG G that (approximately) maximizes a score function Score(D, G). The two score functions we discuss in this paper are the Bayesian score and the BIC/MDL score. The Bayesian score for a node is

    Score(X_i, Pa(X_i)) = prod_{j=1}^{J_i} [Gamma(alpha_{ij}) / Gamma(alpha_{ij} + N_{ij})] prod_{k=1}^{K_i} [Gamma(alpha_{ijk} + N_{ijk}) / Gamma(alpha_{ijk})].

The BIC/MDL score for a node is

    Score(X_i, Pa(X_i)) = sum_{k=1}^{K_i} sum_{j=1}^{J_i} N_{ijk} log(theta_{ijk}) - (log N / 2) #(X_i, Pa(X_i)),

where #(X_i, Pa(X_i)) is the number of parameters needed to represent P(X_i | Pa(X_i)). The decomposability of these score functions makes a single operation in the K2 algorithm the addition of one parent to a variable. The addition of a parent corresponds to two candidate structures G_1 and G_2. By comparing P_old = Score(\tilde{D}, X_i, Pa(X_i)) and P_new = Score(\tilde{D}, X_i, Pa(X_i) + {Z}), where Z is a new candidate parent for variable X_i, the K2 algorithm decides whether there is a link between the candidate parent Z and node X_i. We use the two sets of estimated sufficient statistics \hat{N}_{ijk} to calculate P_old and P_new; the estimation is done as described in Section IV for the structures G_1 and G_2, and the difference between P_old and P_new is caused only by the sufficient statistics associated with node X_i. The framework of structure learning for the two setups was discussed in Section III-B. One problem with structure learning is that estimation errors in the sufficient statistics tend to cause extra links: links that appear in the structure learned from the randomized data \tilde{D} but not in the structure learned from the original data D. Missing links, that is, links learned from the original data D but not from the randomized data \tilde{D}, happen only when the randomization is relatively large. The extra-link problem is not difficult to understand from the statistical definition of independence. For example, given sample data from two independent discrete random variables A and B, we conclude statistically that A and B are independent if P(A = i, B = j) = P(A = i)P(B = j) +/- epsilon for all i, j, where P(A = i, B = j), P(A = i), and P(B = j) are estimated from their respective relative frequencies in the sample and epsilon depends on the specific independence test used. If the samples of A and B are post-randomized, the relative frequencies can only be estimated from the available randomized samples. The estimation errors tend to cause P(A = i, B = j) > P(A = i)P(B = j) + epsilon or P(A = i, B = j) < P(A = i)P(B = j) - epsilon for some i and j; with the same independence test, we then tend to conclude that A and B are dependent. A similar argument holds for the conditional independences encoded by a Bayesian network structure. Our experiments show that extra links appear even for relatively small estimation errors.
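The effect is easy to reproduce in simulation (an illustrative sketch, ours): two independent binary variables are randomized, the joint counts are corrected with (P^t)^{-1} on each axis, and the gap between the corrected joint and the product of its marginals is typically noticeably larger than on the raw data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10_000, 0.25
P = np.array([[1 - p, p], [p, 1 - p]])

A = rng.integers(0, 2, n)                      # A and B are independent
B = rng.integers(0, 2, n)
A_t = np.where(rng.random(n) < p, 1 - A, A)    # binary symmetric PRAM
B_t = np.where(rng.random(n) < p, 1 - B, B)

def independence_gap(a, b, correct=False):
    """max_ij |P(A=i, B=j) - P(A=i) P(B=j)| from (optionally corrected) counts."""
    joint = np.histogram2d(a, b, bins=2)[0]
    if correct:                                # undo the randomization in expectation
        joint = np.linalg.inv(P.T) @ joint @ np.linalg.inv(P)
    pa, pb = joint.sum(1) / n, joint.sum(0) / n
    return np.abs(joint / n - np.outer(pa, pb)).max()

print(independence_gap(A, B), independence_gap(A_t, B_t, correct=True))
```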
Such estimation error in independence testing typically causes Score(C, {A, B}) > Score(C, {A}) even though C is actually independent of B given A; an extra link B -> C then usually results. The above discussion suggests that, to learn correct structures from randomized data, we should penalize complex structures. For the Bayesian score, we propose adding a parent only when P_new > eta * P_old, where eta is a suitable threshold (the scores are computed in log form and are negative, so an eta slightly below 1 tightens the acceptance bar). For the BIC/MDL score, we propose increasing the penalty term (description length); that is,

    Score_B(X_i, Pa(X_i)) = sum_{k=1}^{K_i} sum_{j=1}^{J_i} N_{ijk} log(theta_{ijk}) - C' (log N / 2) #(X_i, Pa(X_i))

for some C' > 1. Experiments show that the threshold eta and the constant C' depend on the level of randomization: the more the randomization, the larger the deviation of eta from 1 and the larger C' should be. The relationship between eta (or C') and the randomization, for the available samples, is a topic for further research; we believe optimal choices of eta and C' exist for a designed randomization scheme. A minimal sketch of both modifications follows.
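In this sketch (ours; the constants are illustrative, and the count guard reflects the fact that the inverted estimates of Section IV can be slightly negative):

```python
import numpy as np

def accept_parent(log_p_new: float, log_p_old: float, eta: float = 0.99) -> bool:
    """Modified K2 acceptance rule: add the candidate parent only when
    P_new > eta * P_old.  The log scores are negative, so eta slightly
    below 1 raises the bar and suppresses estimation-error extra links."""
    return log_p_new > eta * log_p_old

def bic_score(N_hat: np.ndarray, K_i: int, N: int, C: float = 1.0) -> float:
    """BIC/MDL score with the description-length penalty scaled by C
    (C = 1 recovers the usual score; C > 1 penalizes extra parents harder)."""
    counts = np.maximum(N_hat.reshape(-1, K_i), 1e-9)  # guard zero/negative estimates
    theta = counts / counts.sum(axis=1, keepdims=True)
    loglik = float((counts * np.log(theta)).sum())
    n_params = counts.shape[0] * (K_i - 1)
    return loglik - C * 0.5 * np.log(N) * n_params
```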

VII. EXPERIMENTAL RESULTS

We now present experimental results that demonstrate the accuracy of the BN learning algorithm at different levels of randomization. In setup I, fewer variables are randomized than in setup II, so the experimental results for setup I are always better than those for setup II; hence we present only the results for setup II here.

A. Parameter Learning: Non-uniform Randomization

In this experiment, we use the Bayesian network shown in Fig. 2, where the variables are distributed over three parties. All variables are binary except L and B, which are ternary. The conditional probabilities of the different nodes are also shown in the figure. 10,000 samples were generated from this Bayesian network to form the data set D.

[Fig. 2. Bayesian network for Experiment 1, with its conditional probability tables; the variables are distributed over three sites.]

The data set was then randomized according to the scheme described in Table I, where the variables T, S, and G were considered not sensitive and hence were not randomized. Note that this is a non-uniform randomization, with different levels of randomization for different variables; the corresponding at-most-gamma amplifications are also shown in Table I. K = 2 for the binary randomizations and K = 3 for the ternary randomizations.

[Table I. Randomization performed to the variables: A, D: binary symmetric, gamma = 3; L, B: ternary symmetric, gamma = 4.67; E: binary symmetric, gamma = 4; X: binary symmetric, gamma = 3; C, F: binary non-symmetric, gamma = 9.]

Table II shows the parameters of all nodes learnt from the randomized data using the algorithm of Section III for setup II. All values in the table are averages over 50 runs, with the corresponding standard deviations indicated in parentheses. It is clear from Table II that the proposed algorithms accurately learn the BN parameters.

[Table II. Mean and standard deviation (over 50 runs) of the parameters learnt from the randomized data.]

B. Parameter Learning: Uniform Randomization

In this experiment, we use a uniform randomization and test the accuracy of the BN parameters as a function of the randomization parameter p. We used the Bayesian network shown in Fig. 3, in which all nodes are binary, and we consider the parameters of node D, which has parents at a different site. 5,000 samples were generated from this Bayesian network, and binary symmetric randomization with parameter p was used to randomize the variables.

[Fig. 3. Bayesian network for Experiment 2, with binary nodes distributed over two sites.]

Fig. 4 (top) plots the estimated parameters \hat{P}(D | B, C) as a function of p (the mean of the estimates over 10 runs is plotted, along with the mean plus one standard deviation). It is clear from the figure that for randomization parameter p up to about 0.25 we can estimate the BN parameters with almost no error, and even for values up to p = 0.3 we get reasonably good parameter estimates. From a privacy perspective, p = 0.25 corresponds to an amplification factor of gamma = 0.75/0.25 = 3, with K = 2. We point out that this is achieved with 5,000 samples; under the same accuracy requirement, p can intuitively be pushed closer to 0.5 if more samples are available, and the closer p is to 0.5, the closer gamma is to 1. Another way to assess the performance of the algorithms is to determine the maximum level of randomization usable for a given accuracy. To that end, for a given level of required estimation accuracy, defined in terms of an absolute parameter estimation error threshold c, let p* be the smallest value of p (obtained by averaging over ten runs) for which the absolute estimation error exceeds c. Fig. 4 (bottom) shows the variation of p* as a function of the sample size N for c = 1%; note that c = 1% corresponds to a very small parameter estimation error.

[Fig. 4. (Top) Estimated parameters \hat{P}(D | B, C) vs. p. (Bottom) p* vs. N, for c = 1%.]
C. Structure Learning

In this experiment, we test the accuracy of BN structure learning from randomized data. 10,000 samples from the BN in Fig. 3 were used, and all variables were randomized using binary symmetric randomization with parameter p. The K2 algorithm, with threshold eta for the Bayesian score or penalty term C' for the BIC/MDL score, was used to learn the BN structure from the randomized data. We quantify the error in structure learning with two different error measures: (a) the sum of missing links and extra links, and (b) the KL distance between the joint probability of the learnt BN and that of the true BN. The latter incorporates errors in the structure as well as the parameters, and may therefore be the better measure; a brute-force computation of it is sketched below.
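Measure (b) can be computed by enumeration for networks of this size (a sketch, ours; the two networks are represented by callables that return the joint probability of a full configuration, which for a BN is the product of its CPT entries):

```python
import numpy as np
from itertools import product

def kl_joint(p_true, p_learnt, cards) -> float:
    """KL distance D(P_true || P_learnt) between the joint distributions
    of two Bayesian networks over the same variables, by enumerating
    every configuration; `cards` lists each variable's cardinality.
    p_learnt is assumed positive everywhere, e.g. via MAP smoothing."""
    kl = 0.0
    for config in product(*(range(k) for k in cards)):
        pt, pl = p_true(config), p_learnt(config)
        if pt > 0.0:
            kl += pt * np.log(pt / pl)
    return kl
```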

[Fig. 5. Structure learning using the Bayesian score, as a function of the randomization parameter p, for eta = 1, eta = 0.99, and a variable eta: (left) number of links in error; (right) KL distance.]

Fig. 5 (left) shows the number of links in error (the sum of missing links and extra links) as a function of the randomization parameter p, for the case of the Bayesian score; Fig. 5 (right) depicts a similar graph for the KL distance. Three different choices of the threshold were considered: eta = 1, eta = 0.99, and a variable eta depending on the randomization parameter p, chosen as follows: eta = 0.99 if p <= 0.05, eta = 0.98 if 0.05 < p <= 0.15, and eta = 0.97 if 0.15 < p <= 0.3. We have similar results for the BIC/MDL score with three different values of the penalty term: C' = 1, C' = 4, and C' = 8; due to page limitations, the BIC/MDL results are not presented here. We add that the structure error was always contributed by extra links, with just one exception (Bayesian score with variable eta), where we had one missing link. It can be seen from the graphs that a variable value of eta gives better results than a fixed value. For the Bayesian score, we can use a randomization level of p = 0.2 with little error in structure learning; from a privacy perspective, this corresponds to gamma = 4 with K = 2. The performance with the BIC/MDL score is even better: we can use a randomization level of p = 0.25 with little error in structure learning, which corresponds to gamma = 3. We also point out that this is achieved with a sample size of 10,000 points; with more samples, it is intuitive that we can still obtain small errors in structure learning with more randomization, i.e., with p closer to 0.5.

VIII. DISCUSSION AND CONCLUSIONS

We have proposed a post randomization technique to learn the structure and parameters of a Bayesian network from distributed heterogeneous data. Our method estimates the sufficient statistics from the randomized data, and these are subsequently used to learn the BN. For structure learning, we used a modified score function to deal with the familiar extra-link problem. Experimental results with different levels of randomization and different sample sizes show that our method is capable of accurately estimating the BN. We quantified the privacy of our randomization scheme using the concept of gamma-amplification, together with a measure K similar to that of K-anonymity, and showed that we obtain a fairly good level of privacy. We believe post randomization can easily be extended to many other privacy-preserving data mining applications whose computations also depend only on a set of sufficient statistics, such as decision tree learning. The randomization in our experiments was applied to individual variables; however, it can also be applied to combined variables, which might include all the variables of one party. Combining all the variables of one party into one combined variable might prevent privacy breaches between the variables at the same party.

IX. ACKNOWLEDGEMENTS

This work was supported by the United States National Science Foundation Grant IIS. We would like to thank Dr. Kargupta for many fruitful discussions and ideas.
REFERENCES
[1] R. Agrawal and R. Srikant, "Privacy-preserving data mining," in Proceedings of the SIGMOD Conference on Management of Data, May 2000.
[2] R. Chen, K. Sivakumar, and H. Kargupta, "Collective mining of Bayesian networks from distributed heterogeneous data," Knowledge and Information Systems Journal (accepted).
[3] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Zhu, "Tools for privacy preserving distributed data mining," ACM SIGKDD Explorations, 4(2):28-34, 2003.
[4] W. Du and Z. Zhan, "Using randomized response techniques for privacy-preserving data mining," in Proceedings of the 9th ACM SIGKDD, Washington, DC, USA, August 2003.
[5] A. Evfimievski, J. Gehrke, and R. Srikant, "Limiting privacy breaches in privacy preserving data mining," in Proceedings of the ACM SIGMOD/PODS Conference, San Diego, CA, June 2003.
[6] J. M. Gouweleeuw, P. Kooiman, L. C. R. J. Willenborg, and P.-P. de Wolf, "Post randomisation for statistical disclosure control: theory and implementation," Journal of Official Statistics, 14(4), 1998.
[7] D. V. Hinkley, "On the ratio of two correlated normal random variables," Biometrika, 56(3), 1969.
[8] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, "On the privacy preserving properties of random data perturbation techniques," in Proceedings of the IEEE International Conference on Data Mining, pages 99-106, Melbourne, FL, November 2003.
[9] Y. Lindell and B. Pinkas, "Privacy preserving data mining," in Advances in Cryptology - CRYPTO, pages 36-54, 2000.
[10] K. Liu, H. Kargupta, and J. Ryan, "Multiplicative noise, random projection, and privacy preserving data mining from distributed multi-party data," in communication, 2003.
[11] J. Ma and K. Sivakumar, "Privacy-preserving Bayesian network learning using post randomization," in preparation, 2006.
[12] D. Meng, K. Sivakumar, and H. Kargupta, "Privacy-sensitive Bayesian network parameter learning," in Proceedings of the Fourth IEEE International Conference on Data Mining, Brighton, UK, November 2004.
[13] L. Sweeney, "k-anonymity: a model for protecting privacy," International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557-570, 2002.
[14] S. Rizvi and J. R. Haritsa, "Maintaining data privacy in association rule mining," in Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
[15] R. Wright and Z. Yang, "Privacy preserving Bayesian network structure computation on distributed heterogeneous data," in Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.


More information

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection

More information

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem An Ant Colony Otimization Aroach to the Probabilistic Traveling Salesman Problem Leonora Bianchi 1, Luca Maria Gambardella 1, and Marco Dorigo 2 1 IDSIA, Strada Cantonale Galleria 2, CH-6928 Manno, Switzerland

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

STK4900/ Lecture 7. Program

STK4900/ Lecture 7. Program STK4900/9900 - Lecture 7 Program 1. Logistic regression with one redictor 2. Maximum likelihood estimation 3. Logistic regression with several redictors 4. Deviance and likelihood ratio tests 5. A comment

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

Positive decomposition of transfer functions with multiple poles

Positive decomposition of transfer functions with multiple poles Positive decomosition of transfer functions with multile oles Béla Nagy 1, Máté Matolcsi 2, and Márta Szilvási 1 Deartment of Analysis, Technical University of Budaest (BME), H-1111, Budaest, Egry J. u.

More information

An Analysis of TCP over Random Access Satellite Links

An Analysis of TCP over Random Access Satellite Links An Analysis of over Random Access Satellite Links Chunmei Liu and Eytan Modiano Massachusetts Institute of Technology Cambridge, MA 0239 Email: mayliu, modiano@mit.edu Abstract This aer analyzes the erformance

More information

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i Comuting with Haar Functions Sami Khuri Deartment of Mathematics and Comuter Science San Jose State University One Washington Square San Jose, CA 9519-0103, USA khuri@juiter.sjsu.edu Fax: (40)94-500 Keywords:

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables High-dimensional Ordinary Least-squares Projection for Screening Variables Xiangyu Wang and Chenlei Leng arxiv:1506.01782v1 [stat.me] 5 Jun 2015 Abstract Variable selection is a challenging issue in statistical

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

Collaborative Place Models Supplement 1

Collaborative Place Models Supplement 1 Collaborative Place Models Sulement Ber Kaicioglu Foursquare Labs ber.aicioglu@gmail.com Robert E. Schaire Princeton University schaire@cs.rinceton.edu David S. Rosenberg P Mobile Labs david.davidr@gmail.com

More information

1. Introduction. 2. Background of elliptic curve group. Identity-based Digital Signature Scheme Without Bilinear Pairings

1. Introduction. 2. Background of elliptic curve group. Identity-based Digital Signature Scheme Without Bilinear Pairings Identity-based Digital Signature Scheme Without Bilinear Pairings He Debiao, Chen Jianhua, Hu Jin School of Mathematics Statistics, Wuhan niversity, Wuhan, Hubei, China, 43007 Abstract: Many identity-based

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

Notes on Instrumental Variables Methods

Notes on Instrumental Variables Methods Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of

More information

arxiv: v2 [math.ac] 5 Jan 2018

arxiv: v2 [math.ac] 5 Jan 2018 Random Monomial Ideals Jesús A. De Loera, Sonja Petrović, Lily Silverstein, Desina Stasi, Dane Wilburne arxiv:70.070v [math.ac] Jan 8 Abstract: Insired by the study of random grahs and simlicial comlexes,

More information

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract

A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS. 1. Abstract A CONCRETE EXAMPLE OF PRIME BEHAVIOR IN QUADRATIC FIELDS CASEY BRUCK 1. Abstract The goal of this aer is to rovide a concise way for undergraduate mathematics students to learn about how rime numbers behave

More information

John Weatherwax. Analysis of Parallel Depth First Search Algorithms

John Weatherwax. Analysis of Parallel Depth First Search Algorithms Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel

More information

arxiv: v3 [physics.data-an] 23 May 2011

arxiv: v3 [physics.data-an] 23 May 2011 Date: October, 8 arxiv:.7v [hysics.data-an] May -values for Model Evaluation F. Beaujean, A. Caldwell, D. Kollár, K. Kröninger Max-Planck-Institut für Physik, München, Germany CERN, Geneva, Switzerland

More information

Slash Distributions and Applications

Slash Distributions and Applications CHAPTER 2 Slash Distributions and Alications 2.1 Introduction The concet of slash distributions was introduced by Kafadar (1988) as a heavy tailed alternative to the normal distribution. Further literature

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

A Simple Weight Decay Can Improve. Abstract. It has been observed in numerical simulations that a weight decay can improve

A Simple Weight Decay Can Improve. Abstract. It has been observed in numerical simulations that a weight decay can improve In Advances in Neural Information Processing Systems 4, J.E. Moody, S.J. Hanson and R.P. Limann, eds. Morgan Kaumann Publishers, San Mateo CA, 1995,. 950{957. A Simle Weight Decay Can Imrove Generalization

More information

Generalized Coiflets: A New Family of Orthonormal Wavelets

Generalized Coiflets: A New Family of Orthonormal Wavelets Generalized Coiflets A New Family of Orthonormal Wavelets Dong Wei, Alan C Bovik, and Brian L Evans Laboratory for Image and Video Engineering Deartment of Electrical and Comuter Engineering The University

More information

Machine Learning: Homework 4

Machine Learning: Homework 4 10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,

More information

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory

More information

Lecture: Condorcet s Theorem

Lecture: Condorcet s Theorem Social Networs and Social Choice Lecture Date: August 3, 00 Lecture: Condorcet s Theorem Lecturer: Elchanan Mossel Scribes: J. Neeman, N. Truong, and S. Troxler Condorcet s theorem, the most basic jury

More information

Bilinear Entropy Expansion from the Decisional Linear Assumption

Bilinear Entropy Expansion from the Decisional Linear Assumption Bilinear Entroy Exansion from the Decisional Linear Assumtion Lucas Kowalczyk Columbia University luke@cs.columbia.edu Allison Bisho Lewko Columbia University alewko@cs.columbia.edu Abstract We develo

More information

Published: 14 October 2013

Published: 14 October 2013 Electronic Journal of Alied Statistical Analysis EJASA, Electron. J. A. Stat. Anal. htt://siba-ese.unisalento.it/index.h/ejasa/index e-issn: 27-5948 DOI: 1.1285/i275948v6n213 Estimation of Parameters of

More information

SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS *

SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS * Journal of Comutational Mathematics Vol.8, No.,, 48 48. htt://www.global-sci.org/jcm doi:.48/jcm.9.-m6 SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS

More information

GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION E. G. MANSOORI, M. J. ZOLGHADRI, S. D. KATEBI, H. MOHABATKAR, R. BOOSTANI AND M. H.

GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION E. G. MANSOORI, M. J. ZOLGHADRI, S. D. KATEBI, H. MOHABATKAR, R. BOOSTANI AND M. H. Iranian Journal of Fuzzy Systems Vol. 5, No. 2, (2008). 21-33 GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION E. G. MANSOORI, M. J. ZOLGHADRI, S. D. KATEBI, H. MOHABATKAR, R. BOOSTANI AND M. H. SADREDDINI

More information

Analysis of some entrance probabilities for killed birth-death processes

Analysis of some entrance probabilities for killed birth-death processes Analysis of some entrance robabilities for killed birth-death rocesses Master s Thesis O.J.G. van der Velde Suervisor: Dr. F.M. Sieksma July 5, 207 Mathematical Institute, Leiden University Contents Introduction

More information

Introduction to Probability for Graphical Models

Introduction to Probability for Graphical Models Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,

More information

Coding Along Hermite Polynomials for Gaussian Noise Channels

Coding Along Hermite Polynomials for Gaussian Noise Channels Coding Along Hermite olynomials for Gaussian Noise Channels Emmanuel A. Abbe IG, EFL Lausanne, 1015 CH Email: emmanuel.abbe@efl.ch Lizhong Zheng LIDS, MIT Cambridge, MA 0139 Email: lizhong@mit.edu Abstract

More information

LINEAR SYSTEMS WITH POLYNOMIAL UNCERTAINTY STRUCTURE: STABILITY MARGINS AND CONTROL

LINEAR SYSTEMS WITH POLYNOMIAL UNCERTAINTY STRUCTURE: STABILITY MARGINS AND CONTROL LINEAR SYSTEMS WITH POLYNOMIAL UNCERTAINTY STRUCTURE: STABILITY MARGINS AND CONTROL Mohammad Bozorg Deatment of Mechanical Engineering University of Yazd P. O. Box 89195-741 Yazd Iran Fax: +98-351-750110

More information

On parameter estimation in deformable models

On parameter estimation in deformable models Downloaded from orbitdtudk on: Dec 7, 07 On arameter estimation in deformable models Fisker, Rune; Carstensen, Jens Michael Published in: Proceedings of the 4th International Conference on Pattern Recognition

More information

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS #A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,

More information

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition

Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition TNN-2007-P-0332.R1 1 Uncorrelated Multilinear Discriminant Analysis with Regularization and Aggregation for Tensor Object Recognition Haiing Lu, K.N. Plataniotis and A.N. Venetsanooulos The Edward S. Rogers

More information