Estimating properties of distributions. Ronitt Rubinfeld MIT and Tel Aviv University
|
|
- Patricia Charles
- 5 years ago
- Views:
Transcription
1 Estimating properties of distributions Ronitt Rubinfeld MIT and Tel Aviv University
2 Distributions are everywhere
3 What properties do your distributions have?
4 Play the lottery? Is it independent? Is it uniform?
5 Is the lottery unfair? From Hitlotto.com: Lottery experts agree, past number histories can be the key to predicting future winners.
6 True Story! Polish lottery Multilotek Choose uniformly at random distinct 20 numbers out of 1 to 80. Initial machine biased e.g., probability of too small Past results:
7 New Jersey Pick 3,4 Lottery New Jersey Pick k ( =3,4) Lottery. Pick k digits in order. 10 k possible values. Assume lottery draws iid Data: Pick results from 5/22/75 to 10/15/00 χ 2 -test gives 42% confidence Pick results from 9/1/77 to 10/15/00. fewer results than possible values χ 2 -test gives no confidence
8 Testing Independence: Shopping patterns: Independent of zip code?
9 Testing closeness of two distributions: Transactions of yr olds Transactions of yr olds trend change?
10 Outbreak of diseases Similar patterns? Correlated with income level? More prevalent near large airports? Flu 2011 Flu 2012
11 Information in neural spike trails [Strong, Koberle, de Ruyter van Steveninck, Bialek 98] Neural signals Each application of stimuli gives sample of signal (spike trail) time Entropy of (discretised) signal indicates which neurons respond to stimuli
12 Compressibility of data
13 Worm detection find ``heavy hitters nodes that send to many distinct addresses
14 Testing properties of distributions: Decisions based on samples of distribution No assumptions on shape of distribution i.e., smoothness, monotonicity, Normal distribution, Focus on large domains Can sample complexity be sublinear in size of the domain? Rules out standard statistical techniques, learning distribution
15 Model: p p is arbitrary black-box distribution over [n], generates iid samples. samples p i = Prob[ p outputs i ] Test Sample complexity in terms of n? Pass/Fail?
16 Some properties Similarities of distributions: Testing uniformity Testing identity Testing closeness Entropy estimation Support size Independence and limited independence properties Monotonicity
17 Similarities of distributions Are p and q close or far? q is known to the tester q is uniform q is given via samples
18 Two Distances L 1 Distance: p q 1 = p i q i L 2 Distance: i ε D p q 2 = ( ( p i q i ) 2 ) 1/2 i ε D
19 Is p uniform? p Test samples Theorem: ([Goldreich Ron] [Paninski]) Sample complexity of distinguishing p = U from p U 1 > ε is θ(n 1/ 2 ) Nearly same complexity to test if p is any known distribution [Batu Fischer Fortnow Kumar R. White]: Testing identity Pass/Fail?
20 Is p uniform? p samples Theorem: ([Goldreich Ron] [Paninski]) Sample complexity of distinguishing p = U from p U 1 > ε is θ(n 1/ 2 ) Test Pass/Fail?
21 Upper bound for L 2 distance [Goldreich Ron] L 2 distance: p q 2 2 = pi q i 2 p-u 2 2 = Σ(p i -1/n) 2 = Σp i 2-2Σp i /n + Σ1/n 2 ======== = Σp i 2-1/n Estimate collision probability to estimate L 2 distance from uniform
22 Estimating collision probabilities [GR]? Naïve idea: Use independent samples to estimate collision probability Get O(k) samples of collision probability from k samples of p?? Better idea: Use all pairs in sample to estimate collision probabilities Get O(k 2 ) samples of collision probability from k samples of p Not all independent, use variance to bound accuracy??????
23 Testing uniformity [GR] Upper bound: Estimate collision probability Issues: Collision probability of uniform is 1/n Use O(sqrt(n)) samples Relation between L 1 and L 2 norms Comment: [P] uses different estimator Easy lower bound: Ω(n ½ ) Can get Ω (n ½ /ε 2 ) [P]
24 Back to the lottery plenty of samples!
25 Is p uniform? p samples Theorem: ([Goldreich Ron][Batu Fortnow R. Smith White] [Paninski]) Sample complexity of distinguishing p=u from p-u 1 >ε is θ(n 1/2 ) Test Pass/Fail? Nearly same complexity to test if p is any known distribution [Batu Fischer Fortnow Kumar R. White]: Testing identity
26 This image cannot currently be displayed. This image cannot currently be displayed. Testing identity via testing uniformity on subdomains: (Relabel domain so that q monotone) Partition domain into O(log n) groups, so that each group almost flat -- differ by <(1+ε) multiplicative factor q close to uniform over each group Test: Test that p close to uniform over each group Test that p assigns approximately correct total weights to each group q (known)
27 Testing closeness p Test q Theorem: ([BFRSW] [P. Valiant]) Sample complexity of distinguishing p=q from p-q 1 >ε ~ is θ(n 2/3 ) Pass/Fail?
28 Why so different? Collision statistics are all that matter Collisions on heavy elements can hide collision statistics of rest of the domain Construct pairs of distributions where heavy elements are identical, but light elements are either identical or very different
29 Lower bound p: n 2/3 1/2 n/4 n/4 1/2 Claim: No algorithm can distinguish (p,p) from (p,q) with only o(n 2/3 ) samples. p i =1/2n 2/3 p i =0 p i =2/n p i =0 q: 1/2 1/2 q i =1/2n 2/3 heavy elements q i =0 q i =0 light elements q i =2/n p-q 1 =1
30 Lower Bound Argument Assume algorithm is symmetric or we can permute the underlying domain. Only collision statistics matter! If we use only o(n 2/3 ) samples 3 + -way collisions are only in heavy elements and only a small fraction of these. Can focus only on 2-way collisions (one from each source)
31 Lower Bound (cont.) (p,p) case # 2-way collisions = H + L (p,q) case # 2-way collisions = H H: # 2-way collisions in heavy elements L: # 2-way collisions in light elements When sample size s=o(n 2/3 ), St.Dev.(H) = Θ(s n -1/3 ) >> E[L] = o(s n -1/3 ) = Var[L] Cannot distinguish H+L from H!!!
32 P. Valiant s characterization for symmetric properties: Collisions tell all! Canonical tester identifies if there is a distribution with the property that expects observed collision statistics Difficulty in analysis: Collision statistics aren t independent Low frequency elements can be ignored? Applies to symmetric properties with continuity condition
33 Upper Bound for L2 distance: Theorem: [Batu Fortnow R. Smith White] There is an algorithm which on input distributions p and q satisfies: if p-q 2 < ε/2 output Pass if p-q 2 > ε output Fail error probability < 1/3 sample, time complexity O(ε -4 ) No dependence on n?!?!
34 What about L 1 distance?
35 Naïve nearly linear time algorithm learning approach 1. Estimate probabilities p i and q i s using frequencies 2. Estimate L 1 distance directly Uses O(n log n) samples in general.
36 L 2 distance test Use O(ε -4 ) algorithm for estimating L 2 distance to within ε Naïve use of relationship between L 1 and L 2 distance leads to O(n 2 ) algorithm
37 Naïve nearly linear time algorithm learning approach (revisited) 1. Estimate probabilities p i and q i s using frequencies 2. Estimate L 1 distance directly Uses O(n log n) samples in general. If minimum nonzero p i or q i is > b, then need only O(1/b log n) samples. When b=n -2/3, need Õ(n 2/3 ) samples
38 L 2 distance test (revisited) Use O(ε -4 ) algorithm for estimating L 2 distance to within ε Naïve use of relationship between L 1 and L 2 distance leads to O(n 2 ) algorithm Can analyze tighter variance bound for L 2 test in terms of maximum probability b of p i and q i s. When b<n -2/3, need Õ(n 2/3 ) samples for L 1 distance
39 Algorithm Identify all ``heavy elements Estimate probabilities of heavy elements and fail if not close Filter out heavy elements Use L 2 test and fail if not close
40 Theorem: There exists an algorithm which on input distributions p and q uses O(ε -4 n 2/3 log n) samples and satisfies: if p-q 1 < ε/n 1/ 3 outputs Pass if p-q 1 > ε outputs Fail probability of error is < 1/3.
41 A historical note: Interest in [GR] and [BFRSW] sparked by search for property testers for expanders Eventual success! [Czumaj Sohler, Kale Seshadri, Nachmias Shapira] Used to give O(n 2/3 ) time property testers for rapidly mixing Markov chains [BFRSW]
42 Approximating the distance between two Estimating distributions? p q 1 requires Ө(n/log n) samples! [P. Valiant 08] [G. Valiant and P. Valiant STOC11, FOCS11]
43 Information theoretic quantities Entropy Support size
44 Can we approximate the entropy? In general, not to within a multiplicative factor... 0 entropy distributions are hard to distinguish (even in superlinear time) What if entropy is bigger? Can γ-multiplicatively approximate the entropy with Õ(n 1/γ2 ) samples (when entropy >2γ/ε) [Batu Dasgupta R. Kumar] requires Ω(n 1/γ2 ) [Valiant] better bounds when support size is small [Brautbar Samorodnitsky] Similar bounds for estimating support size
45 What about additive approximations? Intense interest in information theory, neurobiology, physics, machine learning. Previously, O(n) algorithms made the above people very happy. (better than trivial O(n log n)!) O(n/log n) algorithms possible! [Raskhodnikova, Ron, Shpilka, Smith] [Valiant Valiant] And Ω(n/log n) samples required [Valiant Valiant] Similar story for support size
46 Estimating Compressibility of Data [Raskhodnikova Ron Rubinfeld Smith] General question undecidable Run-length encoding Huffman coding Entropy Lempel-Ziv ``Color number = Number of elements with probability at least 1/n (support size!)
47 Testing Independence: Shopping patterns: Independent of zip code?
48 Independence of pairs p is joint distribution on pairs <a,b> from [n] x [m] (wlog n m) Marginal distributions p 1,p 2 p independent if p = p 1 x p 2, that is p (a,b) =(p 1 ) a (p 2 ) b for all a,b
49 Independence vs. product of marginals Lemma: [Sahai Vadhan] If A,B, such that p AxB 1 <ε/3 then p- p 1 x p 2 1 <ε
50 Testing Independence [Batu Fischer Fortnow Kumar R. White] p samples Independence Test Goal: If p = p 1 x p 2 then PASS If p p 1 x p 2 1 >ε then FAIL Pass/Fail?
51 1st try: use closeness test p p 1 x p 2 Simulate p 1 and p 2, and check p- p 1 x p 2 1 <ε. samples Closeness Test Pass/Fail? Behavior: If p- p 1 x p 2 1 <ε/n 1/3 then PASS If p- p 1 x p 2 1 >ε then FAIL Sample complexity: Õ((nm) 2/3 )
52 2nd try: Use identity test Algorithm: Approximate marginal distributions f 1 p 1 and f 2 p 2 Use Identity testing algorithm to test that p f 1 x f 2 Comments: use care when showing that good distributions pass Sample complexity: Õ(n+m + (nm) 1/2 ) Can combine with previous using filtering ideas identity test works well on distribution restricted to ``heavy prefixes from p 1 closeness test works well if max probability element is bounded from above
53 Theorem: [Batu Fischer Fortnow Kumar R. White] There exists an algorithm for testing independence with sample complexity O(n 2/3 m 1/3 poly(log n, ε -1 )) s.t. If p=p 1 x p 2, it outputs PASS If p-q 1 >ε for any independent q, it outputs FAIL This is tight! [Levi Ron Rubinfeld]
54 An open question: What is the complexity of testing independence of distributions over k-tuples from [n 1 ]x x[n k ]? Easy Ω( n i 1/2 ) lower bound
55 Testing the monotonicity of distributions: Does the occurrence of cancer decrease with distance from the nuclear reactor?
56 Monotone distributions p is monotone if 0.4 i<j implies p i p j pi 0.1 Many distributions are monotone or are made of small number of monotone distributions pi
57 First Monotone distributions over totally ordered domains [1..n] pi
58 Form of test? Idea: test that average weight of distribution in range [i..j] less than average weight of distribution in [i j ] for various choices of i<i,j<j Problem: uniform distribution on even numbers passes such tests ARGHHHHHHHHHH!
59 Lower bound [Batu Kumar R.] Lemma: Testing monotonicity requires Ω( n) samples Proof: p close to uniform iff p, p R = reversal of p, are both close to monotone p Rate p R Rate
60 Algorithm idea: Approximate distribution by k-flat distribution: Properties: Partition domain into k intervals Conditional distribution uniform in each Questions: Does it exist for k=o(polylog(n))? How do you find interval boundaries? Check if k-flat distribution close to monotone Solve linear program on O(polylog(n)) variables
61 Upper bound [Batu Kumar R.] Lemma: There is an algorithm for testing monotonicity over totally ordered domains which uses Õ(n 1/2 ε -2 ) samples s.t. (with probability 2/3) If p monotone, outputs PASS If ε far from monotone, outputs FAIL Can also test unimodal distributions
62 More recent work: K-flat distributions [Levi Indyk R.] K-modal distributions [Daskalakis Diakonikolas Servedio] Poisson Binomial Distributions [Daskalakis Diakonikolas Servedio] Monotonicity over general posets [Bhattacharyya Fischer R. Valiant]
63 Other properties? Higher dimensional flat distributions Mixtures of k Gaussians Junta -distributions
64 Getting past the lower bounds Special distributions e.g, uniform on a subset, monotone Other query models Queries to probabilities of elements Other distance measures
65 Flat distributions Entropy can be estimated somewhat faster when distribution is uniform on a subset of the elements [Batu Dasgupta Kumar R.][Brautbar Samorodnitsky]
66 Monotone distributions over totally ordered domains Test uniformity with O(1) samples [Batu Kumar R.] Other tasks doable with polylogarithmic samples: [Batu Dasgupta Kumar R.][BKR] Examples: Testing closeness Testing independence Estimating entropy Algorithm: Use k-flat partitions to approximate distributions Test property on approximation Do these big wins carry over to other partial orders? Yes and no.[r. Servedio] [Bhattacharyya Fischer R. Valiant] Still open: can you get sublinear sample algorithms?
67 Other models: Distribution given explicitly -- i.e. can query probability of element in one step! Entropy [BDKR] Distribution given both by samples and explicitly e.g. data in sorted order, Google n-gram data Entropy, closeness [BDKR][RS]
68 Other distance measures: KL-divergence [Guha McGregor Venkatasubramanian] EMD [Doba Nguyen Nguyen R.]
69 What about many distributions?
70 Collection of distributions: Further refinement: Known or unknown distribution on i s? Two models: D 1 D 2 D m Sampling model: samples Get (i,x) for random i, x D i Query model: Get (i,x) for query i and x D i Test Pass/Fail? Sample complexity in terms of n,m?
71 Some questions (and answers): Are they all equal? Can they be clustered into k groups of similar distributions? Do they all have the same mean? See [Levi Ron R. 2011, Levi Ron R. 2012]
72 Conclusions and Future Directions Distribution property testing problems are everywhere Several useful techniques known Other properties for which sublinear tests exist? Special classes of distributions? Other query models? Non-iid samples?
73 Thank you
Taming Big Probability Distributions
Taming Big Probability Distributions Ronitt Rubinfeld June 18, 2012 CSAIL, Massachusetts Institute of Technology, Cambridge MA 02139 and the Blavatnik School of Computer Science, Tel Aviv University. Research
More informationTesting Distributions Candidacy Talk
Testing Distributions Candidacy Talk Clément Canonne Columbia University 2015 Introduction Introduction Property testing: what can we say about an object while barely looking at it? P 3 / 20 Introduction
More informationTesting that distributions are close
Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, Patrick White May 26, 2010 Anat Ganor Introduction Goal Given two distributions over an n element set, we wish to check whether these distributions
More informationWhat can we do in sublinear time?
What can we do in sublinear time? 0368.4612 Seminar on Sublinear Time Algorithms Lecture 1 Ronitt Rubinfeld Today s goal Motivation Overview of main definitions Examples (Disclaimer this is not a complete
More information1 Distribution Property Testing
ECE 6980 An Algorithmic and Information-Theoretic Toolbox for Massive Data Instructor: Jayadev Acharya Lecture #10-11 Scribe: JA 27, 29 September, 2016 Please send errors to acharya@cornell.edu 1 Distribution
More informationTesting Closeness of Discrete Distributions
Testing Closeness of Discrete Distributions Tuğkan Batu Lance Fortnow Ronitt Rubinfeld Warren D. Smith Patrick White May 29, 2018 arxiv:1009.5397v2 [cs.ds] 4 Nov 2010 Abstract Given samples from two distributions
More informationTesting equivalence between distributions using conditional samples
Testing equivalence between distributions using conditional samples (Extended Abstract) Clément Canonne Dana Ron Rocco A. Servedio July 5, 2013 Abstract We study a recently introduced framework [7, 8]
More informationTesting Cluster Structure of Graphs. Artur Czumaj
Testing Cluster Structure of Graphs Artur Czumaj DIMAP and Department of Computer Science University of Warwick Joint work with Pan Peng and Christian Sohler (TU Dortmund) Dealing with BigData in Graphs
More informationdistributed simulation and distributed inference
distributed simulation and distributed inference Algorithms, Tradeoffs, and a Conjecture Clément Canonne (Stanford University) April 13, 2018 Joint work with Jayadev Acharya (Cornell University) and Himanshu
More informationTesting random variables for independence and identity
Testing random variables for independence and identity Tuğkan Batu Eldar Fischer Lance Fortnow Ravi Kumar Ronitt Rubinfeld Patrick White January 10, 2003 Abstract Given access to independent samples of
More informationTesting that distributions are close
Testing that distributions are close Tuğkan Batu Lance Fortnow Ronitt Rubinfeld Warren D. Smith Patrick White October 12, 2005 Abstract Given two distributions over an n element set, we wish to check whether
More informationWith P. Berman and S. Raskhodnikova (STOC 14+). Grigory Yaroslavtsev Warren Center for Network and Data Sciences
L p -Testing With P. Berman and S. Raskhodnikova (STOC 14+). Grigory Yaroslavtsev Warren Center for Network and Data Sciences http://grigory.us Property Testing [Goldreich, Goldwasser, Ron; Rubinfeld,
More informationCOMS E F15. Lecture 22: Linearity Testing Sparse Fourier Transform
COMS E6998-9 F15 Lecture 22: Linearity Testing Sparse Fourier Transform 1 Administrivia, Plan Thu: no class. Happy Thanksgiving! Tue, Dec 1 st : Sergei Vassilvitskii (Google Research) on MapReduce model
More informationProbability Revealing Samples
Krzysztof Onak IBM Research Xiaorui Sun Microsoft Research Abstract In the most popular distribution testing and parameter estimation model, one can obtain information about an underlying distribution
More informationTesting Monotone High-Dimensional Distributions
Testing Monotone High-Dimensional Distributions Ronitt Rubinfeld Computer Science & Artificial Intelligence Lab. MIT Cambridge, MA 02139 ronitt@theory.lcs.mit.edu Rocco A. Servedio Department of Computer
More informationTESTING PROPERTIES OF DISTRIBUTIONS
TESTING PROPERTIES OF DISTRIBUTIONS A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
More informationTolerant Versus Intolerant Testing for Boolean Properties
Tolerant Versus Intolerant Testing for Boolean Properties Eldar Fischer Faculty of Computer Science Technion Israel Institute of Technology Technion City, Haifa 32000, Israel. eldar@cs.technion.ac.il Lance
More informationLearning and Testing Structured Distributions
Learning and Testing Structured Distributions Ilias Diakonikolas University of Edinburgh Simons Institute March 2015 This Talk Algorithmic Framework for Distribution Estimation: Leads to fast & robust
More informationLecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1
Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a
More informationEstimation of information-theoretic quantities
Estimation of information-theoretic quantities Liam Paninski Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/ liam liam@gatsby.ucl.ac.uk November 16, 2004 Some
More informationComputing the Entropy of a Stream
Computing the Entropy of a Stream To appear in SODA 2007 Graham Cormode graham@research.att.com Amit Chakrabarti Dartmouth College Andrew McGregor U. Penn / UCSD Outline Introduction Entropy Upper Bound
More informationTolerant Versus Intolerant Testing for Boolean Properties
Electronic Colloquium on Computational Complexity, Report No. 105 (2004) Tolerant Versus Intolerant Testing for Boolean Properties Eldar Fischer Lance Fortnow November 18, 2004 Abstract A property tester
More informationQuantum algorithms for testing properties of distributions
Quantum algorithms for testing properties of distributions Sergey Bravyi, Aram W. Harrow, and Avinatan Hassidim July 8, 2009 Abstract Suppose one has access to oracles generating samples from two unknown
More informationAn Automatic Inequality Prover and Instance Optimal Identity Testing
04 IEEE Annual Symposium on Foundations of Computer Science An Automatic Inequality Prover and Instance Optimal Identity Testing Gregory Valiant Stanford University valiant@stanford.edu Paul Valiant Brown
More informationClassy sample correctors
Classy sample correctors 1 Ronitt Rubinfeld MIT and Tel Aviv University joint work with Clement Canonne (Columbia) and Themis Gouleakis (MIT) 1 thanks to Clement and G for inspiring this classy title Our
More informationTesting the Expansion of a Graph
Testing the Expansion of a Graph Asaf Nachmias Asaf Shapira Abstract We study the problem of testing the expansion of graphs with bounded degree d in sublinear time. A graph is said to be an α- expander
More informationTesting monotonicity of distributions over general partial orders
Testing monotonicity of distributions over general partial orders Arnab Bhattacharyya Eldar Fischer 2 Ronitt Rubinfeld 3 Paul Valiant 4 MIT CSAIL. 2 Technion. 3 MIT CSAIL, Tel Aviv University. 4 U.C. Berkeley.
More informationDeclaring Independence via the Sketching of Sketches. Until August Hire Me!
Declaring Independence via the Sketching of Sketches Piotr Indyk Andrew McGregor Massachusetts Institute of Technology University of California, San Diego Until August 08 -- Hire Me! The Problem The Problem
More informationLecture 1: 01/22/2014
COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 1: 01/22/2014 Spring 2014 Scribes: Clément Canonne and Richard Stark 1 Today High-level overview Administrative
More informationAn Optimal Lower Bound for Monotonicity Testing over Hypergrids
An Optimal Lower Bound for Monotonicity Testing over Hypergrids Deeparnab Chakrabarty 1 and C. Seshadhri 2 1 Microsoft Research India dechakr@microsoft.com 2 Sandia National Labs, Livermore scomand@sandia.gov
More informationProclaiming Dictators and Juntas or Testing Boolean Formulae
Proclaiming Dictators and Juntas or Testing Boolean Formulae Michal Parnas The Academic College of Tel-Aviv-Yaffo Tel-Aviv, ISRAEL michalp@mta.ac.il Dana Ron Department of EE Systems Tel-Aviv University
More informationSublinear Time Algorithms for Earth Mover s Distance
Sublinear Time Algorithms for Earth Mover s Distance Khanh Do Ba MIT, CSAIL doba@mit.edu Huy L. Nguyen MIT hlnguyen@mit.edu Huy N. Nguyen MIT, CSAIL huyn@mit.edu Ronitt Rubinfeld MIT, CSAIL ronitt@csail.mit.edu
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 Allan Borodin March 13, 2016 1 / 33 Lecture 9 Announcements 1 I have the assignments graded by Lalla. 2 I have now posted five questions
More informationSquare Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing. Extended Abstract 1
Proceedings of Machine Learning Research vol 65:1 7, 2017 Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing Constantinos Daskslakis EECS and CSAIL, MIT Qinxuan
More informationOptimal Testing for Properties of Distributions
Optimal Testing for Properties of Distributions Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath EECS, MIT {jayadev, costis, g}@mit.edu Abstract Given samples from an unknown discrete distribution
More informationTesting Expansion in Bounded-Degree Graphs
Testing Expansion in Bounded-Degree Graphs Artur Czumaj Department of Computer Science University of Warwick Coventry CV4 7AL, UK czumaj@dcs.warwick.ac.uk Christian Sohler Department of Computer Science
More informationLecture 5: Derandomization (Part II)
CS369E: Expanders May 1, 005 Lecture 5: Derandomization (Part II) Lecturer: Prahladh Harsha Scribe: Adam Barth Today we will use expanders to derandomize the algorithm for linearity test. Before presenting
More informationTesting that distributions are close
Testing that distributions are close Tuğkan Batu Computer Science Department Cornell University Ithaca, NY 14853 batu@cs.cornell.edu Warren D. Smith NEC Research Institute 4 Independence Way Princeton,
More informationMulti-join Query Evaluation on Big Data Lecture 2
Multi-join Query Evaluation on Big Data Lecture 2 Dan Suciu March, 2015 Dan Suciu Multi-Joins Lecture 2 March, 2015 1 / 34 Multi-join Query Evaluation Outline Part 1 Optimal Sequential Algorithms. Thursday
More informationTesting Booleanity and the Uncertainty Principle
Testing Booleanity and the Uncertainty Principle Tom Gur Weizmann Institute of Science tom.gur@weizmann.ac.il Omer Tamuz Weizmann Institute of Science omer.tamuz@weizmann.ac.il Abstract Let f : { 1, 1}
More informationTesting Linear-Invariant Function Isomorphism
Testing Linear-Invariant Function Isomorphism Karl Wimmer and Yuichi Yoshida 2 Duquesne University 2 National Institute of Informatics and Preferred Infrastructure, Inc. wimmerk@duq.edu,yyoshida@nii.ac.jp
More informationLearning Sums of Independent Integer Random Variables
Learning Sums of Independent Integer Random Variables Cos8s Daskalakis (MIT) Ilias Diakonikolas (Edinburgh) Ryan O Donnell (CMU) Rocco Servedio (Columbia) Li- Yang Tan (Columbia) FOCS 2013, Berkeley CA
More informationCPSC 467: Cryptography and Computer Security
CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 16 October 30, 2017 CPSC 467, Lecture 16 1/52 Properties of Hash Functions Hash functions do not always look random Relations among
More informationApproximate Pattern Matching and the Query Complexity of Edit Distance
Krzysztof Onak Approximate Pattern Matching p. 1/20 Approximate Pattern Matching and the Query Complexity of Edit Distance Joint work with: Krzysztof Onak MIT Alexandr Andoni (CCI) Robert Krauthgamer (Weizmann
More informationTesting by Implicit Learning
Testing by Implicit Learning Rocco Servedio Columbia University ITCS Property Testing Workshop Beijing January 2010 What this talk is about 1. Testing by Implicit Learning: method for testing classes of
More informationA Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness
A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story
More informationLower Bounds for Testing Bipartiteness in Dense Graphs
Lower Bounds for Testing Bipartiteness in Dense Graphs Andrej Bogdanov Luca Trevisan Abstract We consider the problem of testing bipartiteness in the adjacency matrix model. The best known algorithm, due
More informationOn Learning Powers of Poisson Binomial Distributions and Graph Binomial Distributions
On Learning Powers of Poisson Binomial Distributions and Graph Binomial Distributions Dimitris Fotakis Yahoo Research NY and National Technical University of Athens Joint work with Vasilis Kontonis (NTU
More informationSofya Raskhodnikova
Sofya Raskhodnikova sofya@wisdom.weizmann.ac.il Work: Department of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel 76100 +972 8 934 3573 http://theory.lcs.mit.edu/
More informationLecture Lecture 9 October 1, 2015
CS 229r: Algorithms for Big Data Fall 2015 Lecture Lecture 9 October 1, 2015 Prof. Jelani Nelson Scribe: Rachit Singh 1 Overview In the last lecture we covered the distance to monotonicity (DTM) and longest
More informationRecoverabilty Conditions for Rankings Under Partial Information
Recoverabilty Conditions for Rankings Under Partial Information Srikanth Jagabathula Devavrat Shah Abstract We consider the problem of exact recovery of a function, defined on the space of permutations
More informationSublinear Algorithms Lecture 3. Sofya Raskhodnikova Penn State University
Sublinear Algorithms Lecture 3 Sofya Raskhodnikova Penn State University 1 Graph Properties Testing if a Graph is Connected [Goldreich Ron] Input: a graph G = (V, E) on n vertices in adjacency lists representation
More informationDRAFT. Algebraic computation models. Chapter 14
Chapter 14 Algebraic computation models Somewhat rough We think of numerical algorithms root-finding, gaussian elimination etc. as operating over R or C, even though the underlying representation of the
More informationLearning and Testing Submodular Functions
Learning and Testing Submodular Functions Grigory Yaroslavtsev http://grigory.us Slides at http://grigory.us/cis625/lecture3.pdf CIS 625: Computational Learning Theory Submodularity Discrete analog of
More informationCS369E: Communication Complexity (for Algorithm Designers) Lecture #8: Lower Bounds in Property Testing
CS369E: Communication Complexity (for Algorithm Designers) Lecture #8: Lower Bounds in Property Testing Tim Roughgarden March 12, 2015 1 Property Testing We begin in this section with a brief introduction
More informationTesting Problems with Sub-Learning Sample Complexity
Testing Problems with Sub-Learning Sample Complexity Michael Kearns AT&T Labs Research 180 Park Avenue Florham Park, NJ, 07932 mkearns@researchattcom Dana Ron Laboratory for Computer Science, MIT 545 Technology
More informationSettling the Query Complexity of Non-Adaptive Junta Testing
Settling the Query Complexity of Non-Adaptive Junta Testing Erik Waingarten, Columbia University Based on joint work with Xi Chen (Columbia University) Rocco Servedio (Columbia University) Li-Yang Tan
More informationFaster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians. Constantinos Daskalakis, MIT Gautam Kamath, MIT
Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians Constantinos Daskalakis, MIT Gautam Kamath, MIT What s a Gaussian Mixture Model (GMM)? Interpretation 1: PDF is a convex
More informationOptimal compression of approximate Euclidean distances
Optimal compression of approximate Euclidean distances Noga Alon 1 Bo az Klartag 2 Abstract Let X be a set of n points of norm at most 1 in the Euclidean space R k, and suppose ε > 0. An ε-distance sketch
More informationLearning and Fourier Analysis
Learning and Fourier Analysis Grigory Yaroslavtsev http://grigory.us CIS 625: Computational Learning Theory Fourier Analysis and Learning Powerful tool for PAC-style learning under uniform distribution
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationTesting Booleanity and the Uncertainty Principle
Electronic Colloquium on Computational Complexity, Report No. 31 (2012) Testing Booleanity and the Uncertainty Principle Tom Gur and Omer Tamuz April 4, 2012 Abstract Let f : { 1, 1} n R be a real function
More informationSparse Fourier Transform (lecture 1)
1 / 73 Sparse Fourier Transform (lecture 1) Michael Kapralov 1 1 IBM Watson EPFL St. Petersburg CS Club November 2015 2 / 73 Given x C n, compute the Discrete Fourier Transform (DFT) of x: x i = 1 x n
More informationInformation Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST
Information Theory Lecture 5 Entropy rate and Markov sources STEFAN HÖST Universal Source Coding Huffman coding is optimal, what is the problem? In the previous coding schemes (Huffman and Shannon-Fano)it
More informationLearning mixtures of structured distributions over discrete domains
Learning mixtures of structured distributions over discrete domains Siu-On Chan UC Berkeley siuon@cs.berkeley.edu. Ilias Diakonikolas University of Edinburgh ilias.d@ed.ac.uk. Xiaorui Sun Columbia University
More informationActive Testing! Liu Yang! Joint work with Nina Balcan,! Eric Blais and Avrim Blum! Liu Yang 2011
! Active Testing! Liu Yang! Joint work with Nina Balcan,! Eric Blais and Avrim Blum! 1 Property Testing Instance space X = R n (Distri D over X)! Tested function f : X->{0,1}! A property P of Boolean fn
More informationProperty Testing Bounds for Linear and Quadratic Functions via Parity Decision Trees
Electronic Colloquium on Computational Complexity, Revision 2 of Report No. 142 (2013) Property Testing Bounds for Linear and Quadratic Functions via Parity Decision Trees Abhishek Bhrushundi 1, Sourav
More informationTesting k-modal Distributions: Optimal Algorithms via Reductions
Testing k-modal Distributions: Optimal Algorithms via Reductions Constantinos Daskalakis MIT Ilias Diakonikolas UC Berkeley Paul Valiant Brown University October 3, 2012 Rocco A. Servedio Columbia University
More informationData Structures and Algorithms CMPSC 465
Data Structures and Algorithms CMPSC 465 LECTURE 9 Solving recurrences Substitution method Adam Smith S. Raskhodnikova and A. Smith; based on slides by E. Demaine and C. Leiserson Review question Draw
More informationA Combinatorial Characterization of the Testable Graph Properties: It s All About Regularity
A Combinatorial Characterization of the Testable Graph Properties: It s All About Regularity ABSTRACT Noga Alon Tel-Aviv University and IAS nogaa@tau.ac.il Ilan Newman Haifa University ilan@cs.haifa.ac.il
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationEarthmover resilience & testing in ordered structures
Earthmover resilience & testing in ordered structures Omri Ben-Eliezer Tel-Aviv University Eldar Fischer Technion Computational Complexity Conference 28 UCSD, San Diago Property testing (RS96, GGR98) Meta
More informationTwo Decades of Property Testing
Two Decades of Property Testing Madhu Sudan Microsoft Research 12/09/2014 Invariance in Property Testing @MIT 1 of 29 Kepler s Big Data Problem Tycho Brahe (~1550-1600): Wished to measure planetary motion
More informationt-closeness: Privacy beyond k-anonymity and l-diversity
t-closeness: Privacy beyond k-anonymity and l-diversity paper by Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian presentation by Caitlin Lustig for CS 295D Definitions Attributes: explicit identifiers,
More informationSuccinct Data Structures for the Range Minimum Query Problems
Succinct Data Structures for the Range Minimum Query Problems S. Srinivasa Rao Seoul National University Joint work with Gerth Brodal, Pooya Davoodi, Mordecai Golin, Roberto Grossi John Iacono, Danny Kryzanc,
More informationTesting for Concise Representations
Testing for Concise Representations Ilias Diakonikolas Columbia University ilias@cs.columbia.edu Ronitt Rubinfeld MIT ronitt@theory.csail.mit.edu Homin K. Lee Columbia University homin@cs.columbia.edu
More informationCPSC 467: Cryptography and Computer Security
CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 14 October 16, 2013 CPSC 467, Lecture 14 1/45 Message Digest / Cryptographic Hash Functions Hash Function Constructions Extending
More informationRobustness Meets Algorithms
Robustness Meets Algorithms Ankur Moitra (MIT) ICML 2017 Tutorial, August 6 th CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately
More informationSuccinct Data Structures for Approximating Convex Functions with Applications
Succinct Data Structures for Approximating Convex Functions with Applications Prosenjit Bose, 1 Luc Devroye and Pat Morin 1 1 School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6, {jit,morin}@cs.carleton.ca
More informationLecture 5. 1 Review (Pairwise Independence and Derandomization)
6.842 Randomness and Computation September 20, 2017 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Tom Kolokotrones 1 Review (Pairwise Independence and Derandomization) As we discussed last time, we can
More informationCSC 8301 Design & Analysis of Algorithms: Lower Bounds
CSC 8301 Design & Analysis of Algorithms: Lower Bounds Professor Henry Carter Fall 2016 Recap Iterative improvement algorithms take a feasible solution and iteratively improve it until optimized Simplex
More informationLecture 16 Oct. 26, 2017
Sketching Algorithms for Big Data Fall 2017 Prof. Piotr Indyk Lecture 16 Oct. 26, 2017 Scribe: Chi-Ning Chou 1 Overview In the last lecture we constructed sparse RIP 1 matrix via expander and showed that
More informationSpectral Methods for Testing Clusters in Graphs
Spectral Methods for Testing Clusters in Graphs Sandeep Silwal, Jonathan Tidor Mentor: Jonathan Tidor August, 8 Abstract We study the problem of testing if a graph has a cluster structure in the framework
More informationFast Approximate PCPs for Multidimensional Bin-Packing Problems
Fast Approximate PCPs for Multidimensional Bin-Packing Problems Tuğkan Batu 1 School of Computing Science, Simon Fraser University, Burnaby, BC, Canada V5A 1S6 Ronitt Rubinfeld 2 CSAIL, MIT, Cambridge,
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationInformation Theory. Mark van Rossum. January 24, School of Informatics, University of Edinburgh 1 / 35
1 / 35 Information Theory Mark van Rossum School of Informatics, University of Edinburgh January 24, 2018 0 Version: January 24, 2018 Why information theory 2 / 35 Understanding the neural code. Encoding
More informationLearning symmetric non-monotone submodular functions
Learning symmetric non-monotone submodular functions Maria-Florina Balcan Georgia Institute of Technology ninamf@cc.gatech.edu Nicholas J. A. Harvey University of British Columbia nickhar@cs.ubc.ca Satoru
More informationLearning and Fourier Analysis
Learning and Fourier Analysis Grigory Yaroslavtsev http://grigory.us Slides at http://grigory.us/cis625/lecture2.pdf CIS 625: Computational Learning Theory Fourier Analysis and Learning Powerful tool for
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationLearning Energy-Based Models of High-Dimensional Data
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal
More informationEnumeration Schemes for Words Avoiding Permutations
Enumeration Schemes for Words Avoiding Permutations Lara Pudwell November 27, 2007 Abstract The enumeration of permutation classes has been accomplished with a variety of techniques. One wide-reaching
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationWhat Can We Learn Privately?
What Can We Learn Privately? Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan Homin Lee Kobbi Nissim Adam Smith Los Alamos UT Austin Ben Gurion Penn State To appear in SICOMP
More informationOn Universal Types. Gadiel Seroussi Hewlett-Packard Laboratories Palo Alto, California, USA. University of Minnesota, September 14, 2004
On Universal Types Gadiel Seroussi Hewlett-Packard Laboratories Palo Alto, California, USA University of Minnesota, September 14, 2004 Types for Parametric Probability Distributions A = finite alphabet,
More informationStephen Scott.
1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training
More informationLecture 21: Algebraic Computation Models
princeton university cos 522: computational complexity Lecture 21: Algebraic Computation Models Lecturer: Sanjeev Arora Scribe:Loukas Georgiadis We think of numerical algorithms root-finding, gaussian
More informationcompare to comparison and pointer based sorting, binary trees
Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:
More informationEfficient Compression of Monotone and m-modal Distributions
Efficient Compression of Monotone and m-modal Distriutions Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh ECE Department, UCSD {jacharya, ashkan, alon, asuresh}@ucsd.edu Astract
More information