Fundamental Limits of Database Alignment
Daniel Cullina, Dept. of Electrical Engineering, Princeton University
Prateek Mittal, Dept. of Electrical Engineering, Princeton University
Negar Kiyavash, Dept. of Electrical and Computer Engineering, Dept. of Industrial and Enterprise Systems Engineering, Coordinated Science Lab, University of Illinois at Urbana-Champaign

Abstract: We consider the problem of aligning a pair of databases with correlated entries. We introduce a new measure of correlation in a joint distribution that we call cycle mutual information. This measure has operational significance: it determines whether exact recovery of the correspondence between database entries is possible for any algorithm. Additionally, there is an efficient algorithm for database alignment that achieves this information theoretic threshold.

I. THE DATABASE DEANONYMIZATION PROBLEM

Suppose that we have two databases. Each item in the databases contains information about a single individual. Some individuals appear in both databases. When an entry in the first database and an entry in the second database concern the same individual, their contents are correlated. The entries may be two noisy observations of the same signal, they may be two completely different types of data that have some correlation through population statistics, or they may even be correlated through the sampling process used to determine which individuals appear in the database. We consider the following question: if the databases are published with user identities removed from each entry, is it possible to learn the association between database entries that correspond to the same individual by exploiting the correlation between them?
Clearly, when there is enough correlation between entries about the same individual and the databases are small enough, it is possible to learn the true alignment between the database entries. Our goal is to find the precise conditions under which it is possible to learn the complete correspondence between entries with high probability. In particular, we would like to determine the measure of correlation that characterizes feasibility of perfect deanonymization in this setting.

This framework for database alignment is related to several practical deanonymization attacks. Narayanan and Shmatikov linked an anonymized dataset of film ratings to a publicly available dataset using correlations between the ratings [1]. Differential privacy has been widely used to quantify privacy issues related to databases [2]. More recently, generative adversarial privacy has been proposed [3]. In both cases, if users are present in multiple databases, knowledge of alignment is required to fully apply these frameworks. Takbiri, Houmansadr, Goeckel, and Pishro-Nik have recently investigated a closely related user privacy problem [4].

Fig. 1. Two databases, $F_a$ and $F_b$, with alphabets $\mathcal{X}_a = \mathcal{X}_b = \{0,1\}^4$ and a matching $M$ between their user identifier sets.

A. Notation

For finite sets $\mathcal{X}$ and $\mathcal{Y}$, let $\mathbb{R}^{\mathcal{X} \times \mathcal{Y}}$ be the set of real-valued matrices with rows indexed by $\mathcal{X}$ and columns indexed by $\mathcal{Y}$. For $x \in \mathbb{R}^{\mathcal{X} \times \mathcal{Y}}$, let $x^{\circ k} \in \mathbb{R}^{\mathcal{X} \times \mathcal{Y}}$ be the entry-wise power of $x$, i.e. the matrix such that $(x^{\circ k})_{i,j} = (x_{i,j})^k$. Let $x^{\otimes k} \in \mathbb{R}^{\mathcal{X}^k \times \mathcal{Y}^k}$ be the tensor power of $x$, i.e. the matrix such that for $a \in \mathcal{X}^k$ and $b \in \mathcal{Y}^k$, $(x^{\otimes k})_{a,b} = \prod_{i=0}^{k-1} x_{a_i,b_i}$. Let $\mathcal{P}(\mathcal{X})$ be the set of probability distributions on $\mathcal{X}$.

B. Formal description

We have the following sets related to the user identifiers:

$U_a$: set of user identifiers in the first database
$U_b$: set of user identifiers in the second database
$M \subseteq U_a \times U_b$: bijective matching between the two types of user identifiers

A bijection between $U_a$ and $U_b$ is a subset of $U_a \times U_b$ in which each element of $U_a$ and $U_b$ appears exactly once. The matching $M$ contains the pairs of ids that correspond to the same user. The fact that $M$ is a bijection implies that $|M| = |U_a| = |U_b|$. Throughout, we let $n = |M|$.

We have the following sets, functions, and distributions associated with the databases:

$\mathcal{X}_a$: alphabet of entries in the first database
$\mathcal{X}_b$: alphabet of entries in the second database
$F_a : U_a \to \mathcal{X}_a$: first database
$F_b : U_b \to \mathcal{X}_b$: second database
$F = (F_a, F_b)$: the pair of databases
$p \in \mathcal{P}(\mathcal{X}_a \times \mathcal{X}_b)$: joint distribution between related entries
$p_a \in \mathcal{P}(\mathcal{X}_a)$: marginal distribution on the first alphabet
$p_b \in \mathcal{P}(\mathcal{X}_b)$: marginal distribution on the second alphabet

Figure 1 illustrates a pair of databases.

© 2018 IEEE
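As a reading aid, the objects of the formal description can be written down as plain Python values. This is only a sketch under assumptions: the identifiers `u1`, `v1`, ..., the 4-bit entries, and the helper name `is_bijection` are all hypothetical and chosen for illustration; they do not come from the paper.

```python
def is_bijection(matching, ids_a, ids_b):
    """Return True if every id in ids_a and ids_b appears in exactly one pair."""
    left = sorted(u for u, _ in matching)
    right = sorted(v for _, v in matching)
    return left == sorted(ids_a) and right == sorted(ids_b)


# Hypothetical user identifier sets U_a and U_b.
U_a = ["u1", "u2", "u3"]
U_b = ["v1", "v2", "v3"]

# Hypothetical matching M, a subset of U_a x U_b.
M = {("u1", "v2"), ("u2", "v3"), ("u3", "v1")}

# Hypothetical databases F_a and F_b with alphabet {0,1}^4 (illustrative values).
F_a = {"u1": (0, 1, 1, 0), "u2": (1, 1, 0, 0), "u3": (0, 0, 0, 1)}
F_b = {"v1": (0, 0, 1, 1), "v2": (0, 1, 1, 1), "v3": (1, 1, 0, 0)}

n = len(M)  # n = |M| = |U_a| = |U_b| when M is a bijection
```

Representing the matching as a set of pairs mirrors the definition of a bijection as a subset of $U_a \times U_b$; a dictionary from $U_a$ to $U_b$ would work equally well.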
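C. Generative model

For each user $u \in U_a$, there is a database entry $F_a(u) \in \mathcal{X}_a$. For a pair $(u,v) \in M$, the entries $F_a(u)$ and $F_b(v)$ are correlated via the joint distribution $p$:

$$\Pr[F_a(u) = i, F_b(v) = j \mid M] = p(i,j).$$

For distinct $u, v \in U_a$, $F_a(u)$ and $F_a(v)$ are independent. The same is true for distinct $u, v \in U_b$. Thus we define

$$r(f_a, f_b; \mu) = \prod_{(u,v) \in \mu} p(f_a(u), f_b(v)),$$

so the joint distribution of the databases is $\Pr[F_a = f_a, F_b = f_b \mid M = \mu] = r(f_a, f_b; \mu)$.

D. Relationship to graph alignment

The methods used in this paper are related to those used to analyze information theoretic thresholds for exact graph alignment [5]-[7]. An undirected graph $G$ can be represented by its edge indicator function $\binom{V(G)}{2} \to \{0,1\}$, so we have a very simple type of information about each user pair. The analogue to the generative model is the correlated Erdős-Rényi distribution on graph pairs, where corresponding edge indicator random variables are sampled i.i.d. from some joint distribution on $\{0,1\}^2$. Once the marginal distributions are fixed, the one remaining degree of freedom specifies the level of correlation. In the database problem, we instead have larger blocks of information about individual users. This allows for more complicated forms of correlation. In this paper, we identify the relevant one-dimensional summary of that correlation. A further connection is that graph alignment falls into the database alignment framework when seed vertices are used [8], [9]: the list of adjacent seeds is essentially a database entry.

II. RESULTS

Both our achievability and converse bounds use the following measure of correlation in a joint distribution. We propose to call this quantity cycle mutual information.

Definition 1. For $p \in \mathcal{P}(\mathcal{X}_a \times \mathcal{X}_b)$, let $z \in \mathbb{R}^{\mathcal{X}_a \times \mathcal{X}_b}$ be the matrix such that $z_{i,j} = \sqrt{p(i,j)}$ for $i \in \mathcal{X}_a$ and $j \in \mathcal{X}_b$. For an integer $\ell \ge 2$, define the order-$\ell$ cycle mutual information

$$I_\ell(p) = \frac{1}{1-\ell} \log \operatorname{tr}\left((z z^T)^\ell\right).$$

Then $z$ has a singular value decomposition $z = U \Sigma V^T$ where $\Sigma = \operatorname{diag}(\sigma^{\circ 1/2})$. Observe that

$$\operatorname{tr}(\Sigma^2) = \operatorname{tr}(U \Sigma V^T V \Sigma U^T) = \operatorname{tr}(z z^T) = \sum_{i,j} z_{i,j}^2 = 1,$$

so $\sigma$, the vector of squared singular values, constitutes a probability distribution. Thus we have another expression for cycle mutual information of order $\ell$: $I_\ell(p) = H_\ell(\sigma)$, where $H_\ell$ is the Rényi entropy of order $\ell$. This expression allows us to extend the definition of $I_\ell(p)$ to all nonnegative real $\ell$.

Our achievability theorem allows for arbitrary structure in the joint distribution of database entries.

Theorem 1. Let $M \subseteq U_a \times U_b$ be a uniformly random bijection. Let the alphabets $\mathcal{X}_a$ and $\mathcal{X}_b$ and the joint distribution $p \in \mathcal{P}(\mathcal{X}_a \times \mathcal{X}_b)$ depend on $n$. If $I_2(p) \ge 2 \log n + \omega(1)$, there is an estimator for $M$ given $F$ that is correct with probability $1 - o(1)$.

When the database entries are vectors of independent identically distributed components, we have a converse bound with a leading term that matches the achievability.

Theorem 2. Let $M \subseteq U_a \times U_b$ be a uniformly random bijection. Fix alphabets $\mathcal{Y}_a$ and $\mathcal{Y}_b$ and a joint distribution $q \in \mathcal{P}(\mathcal{Y}_a \times \mathcal{Y}_b)$. Let $\mathcal{X}_a = \mathcal{Y}_a^\ell$, $\mathcal{X}_b = \mathcal{Y}_b^\ell$, and $p = q^{\otimes \ell}$, where $\ell$ can depend on $n$. If $I_2(p) \le (2 - \Omega(1)) \log n$, any estimator for $M$ given $F$ is correct with probability $o(1)$.

III. MAP ESTIMATION

The optimal estimator for $M$ given $F$ is the maximum a posteriori estimator:

$$\hat{\mu}(f_a, f_b) = \arg\max_\mu \Pr[M = \mu \mid F = (f_a, f_b)] = \arg\max_\mu \frac{\Pr[F = (f_a, f_b) \mid M = \mu] \Pr[M = \mu]}{\Pr[F = (f_a, f_b)]} \overset{(a)}{=} \arg\max_\mu \Pr[F = (f_a, f_b) \mid M = \mu].$$

In (a) we use the fact that $M$ is uniformly distributed. Define the event

$$\mathcal{E}(\mu_1, \mu_2) = \{(f_a, f_b) : r(f_a, f_b; \mu_1) \le r(f_a, f_b; \mu_2)\}.$$

When $\mu_1$ is the true matching, this is the error event in which $\mu_2$ is incorrectly preferred to $\mu_1$.

A. Algorithm for computing the MAP estimator

Define the matrix $Q(f_a, f_b) \in \mathbb{R}^{U_a \times U_b}$, $Q(f_a, f_b)_{u,v} = \log p(f_a(u), f_b(v))$. The MAP estimator is the max weight matching in $Q(f_a, f_b)$:

$$\hat{\mu}(f_a, f_b) = \arg\max_\mu \sum_{(u,v) \in \mu} Q(f_a, f_b)_{u,v}.$$

Thus $\hat{\mu}$ can be computed in $O(n^3)$ time [10].

IV. GENERATING FUNCTIONS

Let $x$ and $y$ be two matrices of formal variables indexed by $\mathcal{X}_a \times \mathcal{X}_b$, let $x_a$ and $y_a$ be vectors of formal variables indexed by $\mathcal{X}_a$, and let $x_b$ and $y_b$ be vectors of formal variables indexed by $\mathcal{X}_b$. For a matching $\mu \subseteq U_a \times U_b$ and a pair of databases $f_a : U_a \to \mathcal{X}_a$ and $f_b : U_b \to \mathcal{X}_b$, define the generating function of the joint type

$$t(\mu; f_a, f_b; x) = \prod_{(u,v) \in \mu} x_{f_a(u), f_b(v)}.$$

Observe that $t(\mu; f_a, f_b; p) = r(f_a, f_b; \mu)$. For a pair of matchings $\mu_1, \mu_2$, define the generating function

$$B_{\mu_1,\mu_2}(x, y) = \sum_{f_a : U_a \to \mathcal{X}_a} \sum_{f_b : U_b \to \mathcal{X}_b} t(\mu_1; f_a, f_b; x)\, t(\mu_2; f_a, f_b; y).$$

By understanding the behavior of this generating function, we can obtain upper bounds on the probability of an estimator making an error. Throughout this section, let $z \in \mathbb{R}^{\mathcal{X}_a \times \mathcal{X}_b}$ be a matrix and let $z_a \in \mathbb{R}^{\mathcal{X}_a}$ and $z_b \in \mathbb{R}^{\mathcal{X}_b}$ be vectors such that $z_{i,j} = \sqrt{p(i,j)}$, $(z_a)_i = \sqrt{p_a(i)}$, and $(z_b)_j = \sqrt{p_b(j)}$.

Lemma 1. For any two bijections $\mu_1, \mu_2 \subseteq U_a \times U_b$,

$$\Pr[\mathcal{E}(\mu_1, \mu_2) \mid M = \mu_1] \le B_{\mu_1,\mu_2}(z, z).$$

Proof: For any $\theta \ge 0$, Markov's inequality gives

$$\Pr[\mathcal{E}(\mu_1, \mu_2) \mid M = \mu_1] = \Pr\left[\frac{r(F_a, F_b; \mu_2)}{r(F_a, F_b; \mu_1)} \ge 1 \,\middle|\, M = \mu_1\right] \le \mathbb{E}\left[\left(\frac{r(F_a, F_b; \mu_2)}{r(F_a, F_b; \mu_1)}\right)^{\theta} \,\middle|\, M = \mu_1\right].$$

Furthermore,

$$\mathbb{E}\left[\left(\frac{r(F_a, F_b; \mu_2)}{r(F_a, F_b; \mu_1)}\right)^{\theta} \,\middle|\, M = \mu_1\right] = \sum_{f_a, f_b} r(f_a, f_b; \mu_1) \left(\frac{r(f_a, f_b; \mu_2)}{r(f_a, f_b; \mu_1)}\right)^{\theta}$$
$$= \sum_{f_a, f_b} r(f_a, f_b; \mu_1)^{1-\theta}\, r(f_a, f_b; \mu_2)^{\theta}$$
$$= \sum_{f_a, f_b} t(\mu_1; f_a, f_b; p)^{1-\theta}\, t(\mu_2; f_a, f_b; p)^{\theta}$$
$$= \sum_{f_a, f_b} t(\mu_1; f_a, f_b; p^{\circ(1-\theta)})\, t(\mu_2; f_a, f_b; p^{\circ\theta})$$
$$= B_{\mu_1,\mu_2}(p^{\circ(1-\theta)}, p^{\circ\theta}),$$

where the matrix and vector exponents with $\circ$ are applied entrywise. Selecting $\theta = 1/2$ gives the claim.

Define the generating function $b_\ell(x, y) = \operatorname{tr}\left((x y^T)^\ell\right)$. Regard $\mu_1$ as a function $U_a \to U_b$ and regard $\mu_2^T$ as a function $U_b \to U_a$. Then their composition $\mu_2^T \circ \mu_1$ is a permutation of $U_a$.

Lemma 2. Let $\mu_1, \mu_2 \subseteq U_a \times U_b$ be bijections. Let $t_\ell$ be the number of cycles of length $\ell$ in the permutation $\mu_2^T \circ \mu_1$. Then $t_1 = |\mu_1 \cap \mu_2|$, $\sum_\ell \ell\, t_\ell = |U_a| = n$, and

$$B_{\mu_1,\mu_2}(x, y) = \prod_{\ell \in \mathbb{N}} b_\ell(x, y)^{t_\ell}.$$

Lemma 3. For $z \in \mathbb{R}^{\mathcal{X}_a \times \mathcal{X}_b}$ with nonnegative entries and for $\ell \ge 2$, $b_\ell(z, z) \le b_2(z, z)^{\ell/2}$.

Proof: We have $b_\ell(z, z) = \sum_k \sigma_k^{2\ell}$, where $\sigma_k$ are the singular values of $z$ (in this proof $\sigma_k$ denotes a singular value itself, not its square as in Section II). By a standard inequality on $p$-norms,

$$\left(\sum_k \sigma_k^{2\ell}\right)^{1/\ell} \le \left(\sum_k \sigma_k^{4}\right)^{1/2},$$

and the right-hand side equals $b_2(z, z)^{1/2}$.

Lemma 4. Let $\mu_1, \mu_2 \subseteq U_a \times U_b$ be bijections and let $d = n - |\mu_1 \cap \mu_2|$. Then $B_{\mu_1,\mu_2}(z, z) \le b_2(z, z)^{d/2}$.

Proof: From $\operatorname{tr}(z z^T) = 1$, we have $b_1(z, z) = 1$. Then the claim follows from Lemmas 2 and 3, because $\sum_{\ell \ge 2} \ell\, t_\ell = n - t_1 = d$.

V. ACHIEVABILITY

Proof of Theorem 1: We will use a union bound over all possible errors:

$$\Pr\left[\bigcup_{\mu' \ne \mu} \mathcal{E}(\mu, \mu') \,\middle|\, M = \mu\right] \le \sum_{d=2}^{n} \sum_{\mu' \in S_{\mu,d}} \Pr[\mathcal{E}(\mu, \mu') \mid M = \mu],$$

where $S_{\mu,d}$ is the set of matchings that differ from $\mu$ in exactly $d$ places. We have $|S_{\mu,d}| \le \binom{n}{d} d! \le n^d$. From Lemma 1 and Lemma 4, we have

$$\Pr[\mathcal{E}(\mu, \mu') \mid M = \mu] \le \prod_{\ell \ge 2} b_\ell(z, z)^{t_\ell} \le \prod_{\ell \ge 2} b_2(z, z)^{\ell t_\ell / 2} = b_2(z, z)^{d/2}.$$

Thus the overall probability of error is at most

$$\sum_{d=2}^{n} n^d\, b_2(z, z)^{d/2}.$$

From the main condition of the theorem, we have

$$I_2(p) \ge 2 \log n + \omega(1), \qquad b_2(z, z) \le \exp(-2 \log n - \omega(1)) = o(n^{-2}),$$

so for sufficiently large $n$, $n\, b_2(z, z)^{1/2} < 1$ and we have

$$\sum_{d=2}^{n} n^d\, b_2(z, z)^{d/2} \le \frac{n^2\, b_2(z, z)}{1 - n\, b_2(z, z)^{1/2}} \le o(1),$$

which proves the claim.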
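The generative model, the order-2 cycle mutual information, and the MAP estimator above can be tried on toy inputs with a short Python sketch. Several things here are assumptions made for illustration and are not from the paper: entries are integer-coded symbols, users are indexed 0 to n-1, all function names are hypothetical, and the MAP estimator is computed by brute force over all permutations (only practical for very small n) rather than by the $O(n^3)$ matching algorithm of [10].

```python
import itertools
import math
import random


def cycle_mutual_information_2(p):
    """I_2(p) = -log tr((z z^T)^2), where z[i][j] = sqrt(p[i][j])."""
    rows = len(p)
    z = [[math.sqrt(x) for x in row] for row in p]
    # Gram matrix g = z z^T (symmetric).
    g = [[sum(a * b for a, b in zip(z[i], z[k])) for k in range(rows)]
         for i in range(rows)]
    # For symmetric g, tr(g^2) is the sum of its squared entries.
    trace = sum(g[i][k] ** 2 for i in range(rows) for k in range(rows))
    return -math.log(trace)


def sample_databases(p, n, rng):
    """Draw n correlated entry pairs from p and hide the matching by shuffling.

    Returns (f_a, f_b, perm), where user u of the first database corresponds
    to user perm[u] of the second database.
    """
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]
    weights = [p[i][j] for i, j in cells]
    pairs = rng.choices(cells, weights=weights, k=n)
    perm = list(range(n))
    rng.shuffle(perm)
    f_a = [i for i, _ in pairs]
    f_b = [None] * n
    for u, (_, j) in enumerate(pairs):
        f_b[perm[u]] = j
    return f_a, f_b, perm


def map_estimate(p, f_a, f_b):
    """Brute-force max-weight matching for Q[u][v] = log p(f_a[u], f_b[v])."""
    n = len(f_a)

    def weight(perm):
        total = 0.0
        for u, v in enumerate(perm):
            prob = p[f_a[u]][f_b[v]]
            if prob == 0.0:
                return -math.inf  # this matching has likelihood zero
            total += math.log(prob)
        return total

    return max(itertools.permutations(range(n)), key=weight)
```

For anything beyond a handful of users, the brute-force search would be replaced by a polynomial-time assignment solver; the brute-force version is kept here only because it follows the argmax definition line by line.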
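VI. CONVERSE

Lemma 5. For any two bijections $\mu_1, \mu_2 \subseteq U_a \times U_b$, $B_{\mu_1,\mu_2}(x, y) = B_{\mu_2,\mu_1}(x, y)$.

Proof: For each $\ell$, $b_\ell(x, y) = b_\ell(y, x)$. The permutations $\mu_2^T \circ \mu_1$ and $\mu_1^T \circ \mu_2$ are inverses and thus have the same cycle decomposition. The claim follows from Lemma 2.

Lemma 6. Fix alphabets $\mathcal{Y}_a$ and $\mathcal{Y}_b$ and a joint distribution $q \in \mathcal{P}(\mathcal{Y}_a \times \mathcal{Y}_b)$. Let $\ell$ depend on $n$ such that $\ell = \omega(1)$. Let $\mathcal{X}_a = \mathcal{Y}_a^\ell$, $\mathcal{X}_b = \mathcal{Y}_b^\ell$, and $p = q^{\otimes \ell}$. For any two bijections $\mu_1, \mu_2 \subseteq U_a \times U_b$ such that $|\mu_1 \cap \mu_2| = n - 2$,

$$\Pr[\mathcal{E}(\mu_1, \mu_2) \mid M = \mu_1] \ge b_2(z, z)^{1 + o(1)}.$$

Proof: The function $c(\theta) = B_{\mu_1,\mu_2}(p^{\circ(1-\theta)}, p^{\circ\theta})$ is a conditional moment generating function:

$$c(\theta) = \mathbb{E}\left[\exp\left(\theta \log \frac{r(F_a, F_b; \mu_2)}{r(F_a, F_b; \mu_1)}\right) \,\middle|\, M = \mu_1\right].$$

From Lemma 2, we have

$$B_{\mu_1,\mu_2}(p^{\circ(1-\theta)}, p^{\circ\theta}) = b_1(p^{\circ(1-\theta)}, p^{\circ\theta})^{n-2}\, b_2(p^{\circ(1-\theta)}, p^{\circ\theta}) = b_2(p^{\circ(1-\theta)}, p^{\circ\theta})$$

because

$$b_1(p^{\circ(1-\theta)}, p^{\circ\theta}) = \operatorname{tr}\left(p^{\circ(1-\theta)} (p^{\circ\theta})^T\right) = \sum_{i,j} (p^{\circ(1-\theta)})_{i,j} (p^{\circ\theta})_{i,j} = 1.$$

By Lemma 5,

$$c(\theta) = b_2(p^{\circ(1-\theta)}, p^{\circ\theta}) = b_2(p^{\circ\theta}, p^{\circ(1-\theta)}) = c(1 - \theta).$$

Moment generating functions are log-convex, so $c(\theta)$ is minimized at $\theta = 1/2$. Because $p = q^{\otimes \ell}$, $c(\theta)$ is the product of $\ell$ identical terms. Let $u = q^{\circ(1-\theta)}$ and $v = q^{\circ\theta}$. Then

$$b_2(p^{\circ(1-\theta)}, p^{\circ\theta}) = b_2(u^{\otimes \ell}, v^{\otimes \ell}) = \operatorname{tr}\left(u^{\otimes \ell} (v^{\otimes \ell})^T u^{\otimes \ell} (v^{\otimes \ell})^T\right) = \left(\operatorname{tr}(u v^T u v^T)\right)^{\ell} = b_2(u, v)^{\ell} = b_2(q^{\circ(1-\theta)}, q^{\circ\theta})^{\ell}.$$

By Cramér's Theorem on the asymptotic tightness of the Chernoff bound [11],

$$\Pr\left[\log \frac{r(F_a, F_b; \mu_2)}{r(F_a, F_b; \mu_1)} \ge 0 \,\middle|\, M = \mu_1\right] \ge b_2(q^{\circ 1/2}, q^{\circ 1/2})^{\ell(1 + o_\ell(1))} = b_2(p^{\circ 1/2}, p^{\circ 1/2})^{1 + o_\ell(1)}.$$

Because $\ell = \omega(1)$, $o_\ell(1)$ and $o(1)$ are equivalent.

Lemma 7. For any three bijections $\mu_1, \mu_2, \mu_3 \subseteq U_a \times U_b$,

$$\Pr[\mathcal{E}(\mu_1, \mu_2) \cap \mathcal{E}(\mu_1, \mu_3) \mid M = \mu_1] \le b_2(z, z)^{d/2},$$

where $d = n - |\mu_2 \cap \mu_3|$.

Proof: Write $r_k = r(F_a, F_b; \mu_k)$. For $\theta \ge 0$ and $\theta' \ge 0$,

$$\Pr\left[\frac{r_2}{r_1} \ge 1 \text{ and } \frac{r_3}{r_1} \ge 1 \,\middle|\, M = \mu_1\right] = \mathbb{E}\left[\mathbf{1}_{\mathcal{E}(\mu_1,\mu_2)} \mathbf{1}_{\mathcal{E}(\mu_1,\mu_3)} \,\middle|\, M = \mu_1\right] \le \mathbb{E}\left[\left(\frac{r_2}{r_1}\right)^{\theta} \left(\frac{r_3}{r_1}\right)^{\theta'} \,\middle|\, M = \mu_1\right]$$
$$= \sum_{f_a, f_b} r(f_a, f_b; \mu_1) \left(\frac{r(f_a, f_b; \mu_2)}{r(f_a, f_b; \mu_1)}\right)^{\theta} \left(\frac{r(f_a, f_b; \mu_3)}{r(f_a, f_b; \mu_1)}\right)^{\theta'} = \sum_{f_a, f_b} r(f_a, f_b; \mu_2)^{\theta}\, r(f_a, f_b; \mu_3)^{\theta'}\, r(f_a, f_b; \mu_1)^{1 - \theta - \theta'}.$$

Choosing $\theta = \theta' = 1/2$, we obtain

$$\mathbb{E}\left[\mathbf{1}_{\mathcal{E}(\mu_1,\mu_2)} \mathbf{1}_{\mathcal{E}(\mu_1,\mu_3)} \,\middle|\, M = \mu_1\right] \le \sum_{f_a, f_b} \sqrt{r(f_a, f_b; \mu_2)\, r(f_a, f_b; \mu_3)} = B_{\mu_2,\mu_3}(z, z) \overset{(a)}{\le} b_2(z, z)^{d/2},$$

where (a) follows from Lemma 4.

Proof of Theorem 2: Let $\mu$ be the matching used to generate the databases and let $S = S_{\mu,2}$ be the set of matchings of size $n$ that differ from $\mu$ in exactly two places. That is, for all $\mu' \in S$, $|\mu \cap \mu'| = n - 2$. Observe that $|S| = \binom{n}{2}$, because each element of $S$ can be specified by the two users in $U_a$ that it matches differently than $\mu$ does. Let $X$ be the number of error events that occur:

$$X = \sum_{\mu' \in S} \mathbf{1}_{\mathcal{E}(\mu, \mu')}.$$

Let $\epsilon_1 = \Pr[\mathcal{E}(\mu, \mu') \mid M = \mu]$, i.e. the probability that a specific transposition error occurs. We need a lower bound on the probability that $X > 0$. From Chebyshev's inequality, we have

$$\Pr[X = 0] \le \Pr\left[|X - \mathbb{E}[X]| \ge \mathbb{E}[X]\right] \le \frac{\mathbb{E}\left[(X - \mathbb{E}[X])^2\right]}{\mathbb{E}[X]^2} = \frac{\mathbb{E}[X^2] - \mathbb{E}[X]^2}{\mathbb{E}[X]^2},$$

and we need to find conditions that make this $o(1)$. We have

$$X^2 = \sum_{\mu_2, \mu_3 \in S} \mathbf{1}_{\mathcal{E}(\mu, \mu_2)} \mathbf{1}_{\mathcal{E}(\mu, \mu_3)} = \sum_{\mu' \in S} \mathbf{1}_{\mathcal{E}(\mu, \mu')} + 2 \sum_{\{\mu_2, \mu_3\} \subseteq S} \mathbf{1}_{\mathcal{E}(\mu, \mu_2)} \mathbf{1}_{\mathcal{E}(\mu, \mu_3)}.$$

For a set $\{\mu_2, \mu_3\} \subseteq S$, either $|\mu_2 \cap \mu_3| = n - 3$ or $|\mu_2 \cap \mu_3| = n - 4$. There are $3\binom{n}{3}$ pairs of the former type and $3\binom{n}{4}$ pairs of the latter type. In the latter case, the indicator variables $\mathbf{1}_{\mathcal{E}(\mu, \mu_2)}$ and $\mathbf{1}_{\mathcal{E}(\mu, \mu_3)}$ are independent. In the former case, let $\epsilon_2 = \Pr[\mathcal{E}(\mu, \mu_2) \cap \mathcal{E}(\mu, \mu_3) \mid M = \mu]$. Now we compute

$$\mathbb{E}[X]^2 = \binom{n}{2}^2 \epsilon_1^2 = \left(\binom{n}{2} + 6\binom{n}{3} + 6\binom{n}{4}\right) \epsilon_1^2$$

and

$$\mathbb{E}[X^2] = \binom{n}{2} \epsilon_1 + 6\binom{n}{3} \epsilon_2 + 6\binom{n}{4} \epsilon_1^2,$$

so

$$\frac{\mathbb{E}[X^2] - \mathbb{E}[X]^2}{\mathbb{E}[X]^2} = \frac{\binom{n}{2}(\epsilon_1 - \epsilon_1^2) + 6\binom{n}{3}(\epsilon_2 - \epsilon_1^2)}{\binom{n}{2}^2 \epsilon_1^2} \le O\left(\frac{1}{n^2 \epsilon_1} + \frac{\epsilon_2}{n \epsilon_1^2}\right).$$

From Lemma 7 we have $\epsilon_2 \le b_2(z, z)^{3/2}$ and from Lemma 6 we have $\epsilon_1 \ge b_2(z, z)^{1 + o(1)}$, so

$$\Pr[X = 0] \le O\left(\frac{1}{n^2\, b_2(z, z)^{1 + o(1)}} + \frac{1}{n\, b_2(z, z)^{1/2 + o(1)}}\right).$$

If $b_2(z, z) \ge n^{-2 + \Omega(1)}$, then

$$n^2\, b_2(z, z)^{1 + o(1)} \ge n^{2 + (1 + o(1))(-2 + \Omega(1))} \ge n^{\Omega(1)} \ge \omega(1)$$

and $\Pr[X = 0] \le o(1)$.

VII. PROPERTIES OF CYCLE MUTUAL INFORMATION

Consider a joint distribution $p \in \mathcal{P}(\mathcal{X}_a \times \mathcal{X}_b)$ and recall the definitions of $z$ and $\sigma$ from Section II. The properties of $\sigma$ reflect the correlation in the distribution $p$. The following three conditions are equivalent: $\sigma$ is supported on one point, the rank of the matrix $z$ is one, and $p$ is the product of distributions on $\mathcal{X}_a$ and $\mathcal{X}_b$.

$I_\ell(p)$ shares several properties with mutual information. It is symmetric: $I_\ell(p) = I_\ell(p^T)$. It tensorizes: $I_\ell(p^{\otimes k}) = k I_\ell(p)$. It reduces to entropy in the case of identical random variables: if $\mathcal{X}_a = \mathcal{X}_b$ and the joint distribution is $\operatorname{diag}(p)$ for a distribution $p$ on $\mathcal{X}_a$, then $I_\ell(\operatorname{diag}(p)) = H_\ell(p)$ because $\sigma$ is a rearrangement of $p$. In general, we have

$$I_\ell(p) \le \min(H_\ell(p_a), H_\ell(p_b)).$$

Something stronger is true: the distribution $\sigma$ majorizes $p_a$ and $p_b$. The diagonal of $z z^T$ is the marginal distribution $p_a$:

$$(z z^T)_{i,i} = \sum_j z_{i,j}^2 = \sum_j p_{i,j}.$$

Furthermore,

$$(z z^T)_{i,i} = (U \Sigma V^T V \Sigma U^T)_{i,i} = \sum_k U_{i,k}^2 \sigma_k.$$

Because $U$ is an orthogonal matrix, the Hadamard product $U \circ U$ is doubly stochastic. Thus $\sigma$ majorizes $p_a$. The diagonal of $z^T z$ contains $p_b$, which is also majorized by $\sigma$.

A. Data processing inequality

Lemma 8. Let $p \in \mathcal{P}(\mathcal{X})$, let $q : \mathcal{X} \to \mathcal{P}(\mathcal{Y})$, and let $r : \mathcal{Y} \to \mathcal{P}(\mathcal{Z})$, so $\operatorname{diag}(p) \in \mathcal{P}(\mathcal{X} \times \mathcal{X})$, $\operatorname{diag}(p) q \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$, and $\operatorname{diag}(p) q r \in \mathcal{P}(\mathcal{X} \times \mathcal{Z})$. Then for integer $\ell \ge 2$,

$$I_\ell(\operatorname{diag}(p) q) \ge I_\ell(\operatorname{diag}(p) q r).$$

Proof: Define the matrices $z_{i,k} = \sqrt{(\operatorname{diag}(p) q)_{i,k}}$ and $w_{i,l} = \sqrt{(\operatorname{diag}(p) q r)_{i,l}}$. Then $(z z^T)_{i,i} = (w w^T)_{i,i} = p_i$. We have

$$(z z^T)_{i,j} = \sqrt{p_i p_j} \sum_{k \in \mathcal{Y}} \sqrt{q_{i,k}\, q_{j,k}}.$$

The sum is the Bhattacharyya coefficient of the distributions $q_{i,\cdot}$ and $q_{j,\cdot}$, which can be written in terms of the Bhattacharyya divergence as

$$\exp\left(-\tfrac{1}{2} D_{1/2}(q_{i,\cdot} \,\|\, q_{j,\cdot})\right).$$

Similarly,

$$(w w^T)_{i,j} = \sqrt{p_i p_j} \sum_{l \in \mathcal{Z}} \sqrt{(qr)_{i,l}\, (qr)_{j,l}}.$$

By the data processing inequality for Rényi divergences [12], we have

$$D_{1/2}(q_{i,\cdot} \,\|\, q_{j,\cdot}) \ge D_{1/2}((qr)_{i,\cdot} \,\|\, (qr)_{j,\cdot}).$$

Thus for all integer $\ell \ge 2$,

$$(z z^T)_{i,j} \le (w w^T)_{i,j}, \qquad \operatorname{tr}\left((z z^T)^\ell\right) \le \operatorname{tr}\left((w w^T)^\ell\right), \qquad I_\ell(\operatorname{diag}(p) q) \ge I_\ell(\operatorname{diag}(p) q r).$$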
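Some of the properties listed in Section VII can be checked numerically on small examples. The sketch below is a hedged illustration under assumptions, not part of the paper: the function names are hypothetical, matrices are plain nested lists, only the tensor square ($k = 2$) is built, and the trace of the matrix power is computed by repeated multiplication instead of a singular value decomposition.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def cycle_mi(p, order):
    """I_order(p) = log tr((z z^T)^order) / (1 - order), with z = entrywise sqrt(p)."""
    z = [[math.sqrt(x) for x in row] for row in p]
    z_t = [list(col) for col in zip(*z)]
    gram = matmul(z, z_t)
    power = gram
    for _ in range(order - 1):
        power = matmul(power, gram)
    trace = sum(power[i][i] for i in range(len(power)))
    return math.log(trace) / (1 - order)


def renyi_entropy(dist, order):
    """Renyi entropy of the given order for a probability vector."""
    return math.log(sum(x ** order for x in dist if x > 0)) / (1 - order)


def tensor_square(p):
    """Kronecker product of p with itself: rows (i1, i2), columns (j1, j2)."""
    rows, cols = len(p), len(p[0])
    return [[p[i1][j1] * p[i2][j2] for j1 in range(cols) for j2 in range(cols)]
            for i1 in range(rows) for i2 in range(rows)]


# Illustrative joint distribution of two correlated bits (values are made up).
p_example = [[0.4, 0.1], [0.1, 0.4]]
single = cycle_mi(p_example, 2)
doubled = cycle_mi(tensor_square(p_example), 2)  # close to 2 * single
```

The same helpers give a quick finite-size feel for the thresholds in Theorems 1 and 2: for i.i.d. components, $I_2(p)$ grows linearly in the number of components and can be compared against $2 \log n$, keeping in mind that the theorems themselves are asymptotic statements.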
ACKNOWLEDGEMENT

This work was supported in part by NSF grants CCF 6-96, CCF 6-786, and CNS.

REFERENCES

[1] A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets," in IEEE Symposium on Security and Privacy, 2008, pp. 111-125.
[2] C. Dwork, "Differential privacy: A survey of results," in International Conference on Theory and Applications of Models of Computation. Springer, 2008, pp. 1-19.
[3] C. Huang, P. Kairouz, X. Chen, L. Sankar, and R. Rajagopal, "Context-aware generative adversarial privacy," Entropy, vol. 19, no. 12, p. 656, 2017.
[4] N. Takbiri, A. Houmansadr, D. L. Goeckel, and H. Pishro-Nik, "Matching Anonymized and Obfuscated Time Series to Users' Profiles," arXiv [cs, math], Sep. 2017.
[5] D. Cullina and N. Kiyavash, "Exact alignment recovery for correlated Erdős-Rényi graphs," 2017.
[6] D. Cullina and N. Kiyavash, "Improved achievability and converse bounds for Erdős-Rényi graph matching," in Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science. ACM, 2016, pp. 63-72.
[7] P. Pedarsani and M. Grossglauser, "On the privacy of anonymized networks," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011.
[8] S. Ji, W. Li, N. Z. Gong, P. Mittal, and R. A. Beyah, "On your social network de-anonymizablity: Quantification and large scale evaluation with seed knowledge," in The Network and Distributed System Security Symposium (NDSS), 2015.
[9] O. E. Dai, D. Cullina, N. Kiyavash, and M. Grossglauser, "On the Performance of a Canonical Labeling for Matching Correlated Erdős-Rényi Graphs," arXiv [cs, stat], Apr. 2018.
[10] J. Edmonds and R. M. Karp, "Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems," J. ACM, vol. 19, no. 2, pp. 248-264, Apr. 1972.
[11] B. Hajek, Random Processes for Engineers. Cambridge University Press, 2015.
[12] T. van Erven and P. Harremoës, "Rényi divergence and Kullback-Leibler divergence," IEEE Transactions on Information Theory, vol. 60, no. 7.
Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics
More informationCOS 424: Interacting with Data. Written Exercises
COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well
More informationLean Walsh Transform
Lean Walsh Transfor Edo Liberty 5th March 007 inforal intro We show an orthogonal atrix A of size d log 4 3 d (α = log 4 3) which is applicable in tie O(d). By applying a rando sign change atrix S to the
More informationASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical
IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul
More informationarxiv: v1 [cs.lg] 8 Jan 2019
Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v
More informationA Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness
A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,
More informationNon-Parametric Non-Line-of-Sight Identification 1
Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,
More informationHamming Compressed Sensing
Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing
More informationA Theoretical Analysis of a Warm Start Technique
A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful
More informationSupplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators
Suppleentary Inforation for Design of Bending Multi-Layer Electroactive Polyer Actuators Bavani Balakrisnan, Alek Nacev, and Elisabeth Sela University of Maryland, College Park, Maryland 074 1 Analytical
More informationThe Fundamental Basis Theorem of Geometry from an algebraic point of view
Journal of Physics: Conference Series PAPER OPEN ACCESS The Fundaental Basis Theore of Geoetry fro an algebraic point of view To cite this article: U Bekbaev 2017 J Phys: Conf Ser 819 012013 View the article
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationReed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product.
Coding Theory Massoud Malek Reed-Muller Codes An iportant class of linear block codes rich in algebraic and geoetric structure is the class of Reed-Muller codes, which includes the Extended Haing code.
More informationUnderstanding Machine Learning Solution Manual
Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y
More informationUniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval
Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,
More informationIEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 2, FEBRUARY ETSP stands for the Euclidean traveling salesman problem.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO., FEBRUARY 015 37 Target Assignent in Robotic Networks: Distance Optiality Guarantees and Hierarchical Strategies Jingjin Yu, Meber, IEEE, Soon-Jo Chung,
More informationDetection and Estimation Theory
ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer
More informationError Exponents in Asynchronous Communication
IEEE International Syposiu on Inforation Theory Proceedings Error Exponents in Asynchronous Counication Da Wang EECS Dept., MIT Cabridge, MA, USA Eail: dawang@it.edu Venkat Chandar Lincoln Laboratory,
More informationLONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES
Journal of Marine Science and Technology, Vol 19, No 5, pp 509-513 (2011) 509 LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES Ming-Tao Chou* Key words: fuzzy tie series, fuzzy forecasting,
More informationIntroduction to Discrete Optimization
Prof. Friedrich Eisenbrand Martin Nieeier Due Date: March 9 9 Discussions: March 9 Introduction to Discrete Optiization Spring 9 s Exercise Consider a school district with I neighborhoods J schools and
More informationIn this chapter, we consider several graph-theoretic and probabilistic models
THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions
More informationDesign of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding
IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas
More informationInteractive Markov Models of Evolutionary Algorithms
Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary
More informationLecture October 23. Scribes: Ruixin Qiang and Alana Shine
CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single
More informationTHE POLYNOMIAL REPRESENTATION OF THE TYPE A n 1 RATIONAL CHEREDNIK ALGEBRA IN CHARACTERISTIC p n
THE POLYNOMIAL REPRESENTATION OF THE TYPE A n RATIONAL CHEREDNIK ALGEBRA IN CHARACTERISTIC p n SHEELA DEVADAS AND YI SUN Abstract. We study the polynoial representation of the rational Cherednik algebra
More informationConvex Programming for Scheduling Unrelated Parallel Machines
Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly
More informationCollision-based Testers are Optimal for Uniformity and Closeness
Electronic Colloquiu on Coputational Coplexity, Report No. 178 (016) Collision-based Testers are Optial for Unifority and Closeness Ilias Diakonikolas Theis Gouleakis John Peebles Eric Price USC MIT MIT
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher
More informationIntelligent Systems: Reasoning and Recognition. Artificial Neural Networks
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial
More informationA remark on a success rate model for DPA and CPA
A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic
More informationPAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification Eilie Morvant eilieorvant@lifuniv-rsfr okol Koço sokolkoco@lifuniv-rsfr Liva Ralaivola livaralaivola@lifuniv-rsfr Aix-Marseille
More informationBayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)
Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3
More informationResearch Article On the Isolated Vertices and Connectivity in Random Intersection Graphs
International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for
More informationarxiv: v2 [math.co] 3 Dec 2008
arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20
More informationComputable Shell Decomposition Bounds
Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago
More informationHybrid System Identification: An SDP Approach
49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial
More informationA Note on the Applied Use of MDL Approximations
A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention
More information