Learning Sums of Independent Integer Random Variables

Transcription:

Learning Sums of Independent Integer Random Variables. Costis Daskalakis (MIT), Ilias Diakonikolas (Edinburgh), Ryan O'Donnell (CMU), Rocco Servedio (Columbia), Li-Yang Tan (Columbia). FOCS 2013, Berkeley CA.

learning discrete distributions. Probability distributions over the integers. Learning problem defined by a class C of distributions. Target distribution D ∈ C unknown to the learner. Learner is given a sample of i.i.d. draws from D. Goal: with high probability, output a hypothesis D̂ satisfying d_TV(D̂, D) ≤ ε.
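The quantitative goal on this slide did not survive transcription; the standard formulation in this line of work (an assumption about what the slide showed, not a quote) measures error in total variation distance:

```latex
\Pr\Big[\, d_{\mathrm{TV}}(\widehat{D}, D) \le \varepsilon \,\Big] \;\ge\; 1 - \delta,
\qquad
d_{\mathrm{TV}}(\widehat{D}, D) \;=\; \tfrac{1}{2} \sum_{x} \big| \widehat{D}(x) - D(x) \big| .
```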

analogies with PAC learning Boolean functions. Class C of distributions: unknown target D ∈ C; learner gets i.i.d. samples from D; outputs an approximation D̂ of D. Class C of Boolean functions: unknown target function in C; learner gets labeled examples; outputs an approximation of the target. In both settings: explicit emphasis on computational efficiency.

learning distributions: an easy upper bound. Learning arbitrary distributions: a number of samples growing linearly with the support size is necessary and sufficient. When can we do better? Which distributions are easy to learn, and which are hard?
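For reference, the generic upper bound is simply the empirical distribution: for a support of size N, on the order of N/ε² samples suffice for the empirical frequencies to be ε-close in total variation (a standard fact, stated here because the slide's own bound was lost in transcription). A minimal sketch:

```python
import random
from collections import Counter

def learn_empirical(samples):
    """Generic learner: return the empirical distribution of the samples.

    For an arbitrary distribution on N points, on the order of N / eps^2
    samples make the empirical distribution eps-close in total variation
    (the "easy upper bound" on the slide).
    """
    counts = Counter(samples)
    m = len(samples)
    return {x: c / m for x, c in counts.items()}

# toy usage: learn a biased die from 10,000 draws
true_pmf = {0: 0.5, 1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}
draws = random.choices(list(true_pmf), weights=list(true_pmf.values()), k=10_000)
hypothesis = learn_empirical(draws)
```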

two types of structured distributions. Distributions with shape restrictions: log-concave, monotone, bimodal. Simple combinations of simple distributions: mixtures of simple distributions, e.g. mixtures of Gaussians. This work: sums of independent, simple random variables.

one piece of terminology. k-IRV: Integer-valued Random Variable supported on {0, 1, ..., k−1} (a coin flip is a 2-IRV, a die roll is a 6-IRV). k-SIIRV: Sum of n Independent (not necessarily identical) k-IRVs; e.g., a 2-SIIRV is a sum of n independent 2-IRVs.
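In code, a k-SIIRV is just a sum of n independent draws, each from its own distribution over {0, ..., k−1}. A minimal sketch (illustrative only; the pmfs below are made up):

```python
import random

def sample_k_siirv(pmfs):
    """Draw one sample of a k-SIIRV: the sum of n independent summands,
    where pmfs[i][j] = Pr[X_i = j] and each X_i is supported on
    {0, 1, ..., k-1}.  The summands need not be identically distributed."""
    return sum(random.choices(range(len(p)), weights=p, k=1)[0] for p in pmfs)

# toy usage: a 3-SIIRV with n = 4 non-identical summands
pmfs = [[0.2, 0.5, 0.3], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.4, 0.4, 0.2]]
draw = sample_k_siirv(pmfs)
```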

starting small. Simplest imaginable learning problem: learning 2-IRVs, for which a number of samples depending only on ε is necessary and sufficient. Learning 2-SIIRVs, i.e. sums of n independent coin flips with distinct biases [defined by Poisson in 1837]: a number of samples independent of n! (Daskalakis, Diakonikolas, Servedio [STOC 2012])

more ambitious. Learning k-IRVs: a number of samples depending only on k and ε is necessary and sufficient. Learning k-SIIRVs, i.e. sums of n independent die rolls, each with distinct biases, in time independent of n? Our main result: Yes! Time and sample complexity independent of n.

from 2 to k: a whole new ball game. Even just 3-SIIRVs have significantly richer structure than 2-SIIRVs. 2-SIIRVs: unimodal, log-concave, close to Binomial. 3-SIIRVs: can be far from unimodal, far from log-concave, far from Binomial. Prior to our work nothing was known, even about sample complexity, even for 3-SIIRVs.

our main theorem. Theorem. Let C be the class of k-SIIRVs, i.e. all distributions X_1 + ... + X_n where the X_i are independent (not necessarily identical) r.v.'s supported on {0, 1, ..., k−1}. There is an algorithm that learns C with time and sample complexity independent of n. Recall: a number of samples growing with k is necessary even for a single k-IRV.

our main technical contribution. A new limit theorem for k-SIIRVs: every k-SIIRV is close to a sum of two simple random variables.

Limit Theorem. Let S be a k-SIIRV with sufficiently large variance. Then S is ε-close to a sum of two independent random variables: a discretized normal (the structured, global component) plus a c-IRV (the arbitrary, local component).
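The symbols on this slide were lost in transcription. A plausible reconstruction of the statement's shape, assuming the decomposition from the paper in which the discretized normal carries the multiples of some small period c and the c-IRV carries the residue (the names Z, Y, c below are introduced here, not read off the slide):

```latex
S \;\approx_{\varepsilon}\; c\,Z + Y,
\qquad Z \sim \text{discretized } \mathcal{N}(\mu, \sigma^2),
\qquad Y \text{ a $c$-IRV independent of } Z .
```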


previous limit theorems. Existing k-SIIRV limit theorems: certain highly structured k-SIIRVs are close to discretized normals (structure = shift-invariance). But general k-SIIRVs can be far from any discretized normal. Goal: a limit theorem for arbitrary k-SIIRVs.

k-SIIRVs can be far from discretized normals. Trivial but illustrative example: a k-SIIRV that is far from every discretized normal. Cause for optimism? Our main contribution: build on and generalize existing limit theorems to characterize the structure of all k-SIIRVs.
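The example itself did not survive transcription; a natural candidate (an assumption, not necessarily the one on the slide) is a 3-SIIRV supported only on even values:

```latex
X_i \sim \mathrm{Uniform}\{0, 2\}
\quad\Longrightarrow\quad
S = X_1 + \cdots + X_n \;\overset{d}{=}\; 2 \cdot \mathrm{Bin}(n, 1/2) .
```

Since S puts all of its mass on even integers while a discretized normal of comparable variance spreads its mass over consecutive integers, S stays at constant total variation distance from every discretized normal.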

two kinds of numbers. Heavy numbers: values hit by a large share of the summands; they make up the large, structured global component. Light numbers: values hit by only a small share of the summands; they make up the small, arbitrary local component.

a useful special case: all numbers heavy. Intuition: no mod structure in S, e.g. S equally likely to be 0 or 1 mod 2. Use the [Chen-Goldstein-Shao 2011] limit theorem to establish closeness to a discretized normal.

a sampling procedure for k-SIIRVs. Example: {3, 5} heavy, {0, 1, 2, 4} light. 1. Decide independently for each summand whether its outcome will be heavy or light. 2. Draw a heavy or a light value according to the respective conditional distribution.
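A hedged sketch of this two-stage view (purely illustrative: the heavy/light split is passed in by hand here rather than computed as in the paper); the two-stage draw is distributed exactly like the direct one:

```python
import random

def sample_two_stage(pmfs, heavy_values):
    """Sample a k-SIIRV by first deciding, for each summand, whether its
    outcome is a 'heavy' or a 'light' value, then drawing from the
    corresponding conditional distribution."""
    total = 0
    for p in pmfs:
        heavy_mass = sum(p[v] for v in range(len(p)) if v in heavy_values)
        # Stage 1: heavy or light outcome for this summand?
        is_heavy = random.random() < heavy_mass
        # Stage 2: draw from the conditional distribution on that side.
        support = [v for v in range(len(p))
                   if (v in heavy_values) == is_heavy and p[v] > 0]
        weights = [p[v] for v in support]
        total += random.choices(support, weights=weights, k=1)[0]
    return total

# toy usage: a 6-SIIRV with {3, 5} heavy and {0, 1, 2, 4} light
pmfs = [[0.05, 0.05, 0.05, 0.4, 0.05, 0.4]] * 10
draw = sample_two_stage(pmfs, heavy_values={3, 5})
```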

analysis. Every outcome O of Stage 1 induces a conditional distribution, so S is a mixture of many such conditional distributions. Key technical lemma: with high probability over the outcomes O, the induced conditional distribution is close to one whose structured part is a discretized normal independent of O. The proof uses the all-numbers-heavy special case.

using the limit theorem to learn. Limit Theorem (restated). Let S be a k-SIIRV with sufficiently large variance. Then S is ε-close to a sum of two independent random variables: a discretized normal plus a c-IRV. If the variance is small, S is close to sparse; easily learn it by brute force. Else, guess the value of c. For each guess, learn the discretized-normal component and the c-IRV component separately. Do hypothesis testing over all k possibilities.
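The final "hypothesis testing over all k possibilities" step is the standard hypothesis-selection (Scheffé-style) tournament; the sketch below is a generic version of that tool under the assumption that the candidate hypotheses are given as explicit pmf dictionaries, not the exact routine from the paper:

```python
from collections import Counter

def scheffe_winner(p, q, samples):
    """Compare two candidate pmfs (dicts value -> prob) on the Scheffe set
    A = {x : p(x) > q(x)}: the winner is the candidate whose mass on A is
    closer to the empirical mass of A."""
    support = set(p) | set(q)
    A = {x for x in support if p.get(x, 0.0) > q.get(x, 0.0)}
    emp = sum(1 for s in samples if s in A) / len(samples)
    p_mass = sum(p.get(x, 0.0) for x in A)
    q_mass = sum(q.get(x, 0.0) for x in A)
    return p if abs(p_mass - emp) <= abs(q_mass - emp) else q

def select_hypothesis(candidates, samples):
    """Round-robin tournament: return the candidate with the most wins.
    If some candidate is close to the target in total variation, the
    winner is (with high probability) also close."""
    wins = Counter()
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            w = scheffe_winner(candidates[i], candidates[j], samples)
            wins[i if w is candidates[i] else j] += 1
    best = max(range(len(candidates)), key=lambda i: wins[i])
    return candidates[best]
```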

summary of contributions. 1. A limit theorem for k-SIIRVs: S is close to a structured, global component plus an arbitrary, local component. 2. An efficient algorithm for learning k-SIIRVs: time and sample complexity independent of n.

thank you!