Fully Understanding the Hashing Trick

Fully Understanding the Hashing Trick. Lior Kamma, Aarhus University. Joint work with Casper Freksen and Kasper Green Larsen.

Recommendation and Classification. Items are described by categorical variables, e.g. a movie tagged PG-13, Comic Book, Super Hero, Sci Fi, Adventure, Action, Violent, Scary, Comedy, Drama, Horror. How do we decide two items are close?

Feature Vectors. Encode each item as a Boolean vector, e.g. 1 0 1 1 0 0 1 1 0 … and 1 0 1 1 0 0 1 0 1 …. Denote the feature dimension by n.
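
To make the encoding concrete, here is a minimal Python sketch (not from the talk; the feature vocabulary, the tags, and the encode helper are illustrative, and in practice n is huge):

    # Boolean feature encoding: one coordinate per categorical feature.
    FEATURES = ["PG-13", "Comic Book", "Super Hero", "Sci Fi", "Adventure",
                "Action", "Violent", "Scary", "Comedy", "Drama", "Horror"]

    def encode(tags):
        # 1 if the item has the feature, 0 otherwise; dimension n = len(FEATURES).
        return [1 if feature in tags else 0 for feature in FEATURES]

    x = encode({"PG-13", "Comic Book", "Super Hero", "Action"})
    print(x)  # [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]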

k-Nearest Neighbours. Storing a corpus of M items requires Ω(Mn) memory.

k-Nearest Neighbours. Given a new movie, how do we find the k closest movies in the corpus?
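
A brute-force baseline, as a sketch (the sizes are made up): both the memory and the per-query cost scale with M·n, which is what motivates reducing n.

    import numpy as np

    def k_nearest(corpus, query, k):
        # Brute force: storing the corpus is Omega(M*n) memory and every
        # query scans all M rows in O(M*n) time.
        dists = np.linalg.norm(corpus - query, axis=1)  # distance to each item
        return np.argsort(dists)[:k]                    # indices of the k closest

    rng = np.random.default_rng(0)
    corpus = rng.integers(0, 2, size=(1000, 5000)).astype(np.float32)  # M=1000, n=5000
    print(k_nearest(corpus, corpus[0], k=5))  # corpus[0] itself comes first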

Dimensionality Reduction. Given ε, δ ∈ (0,1) (an approximation ratio and an error probability), find a random f: ℝⁿ → ℝᵐ, for some small m, such that for every x, y ∈ ℝⁿ: Pr[ ‖f(x) − f(y)‖₂² ∈ (1 ± ε)·‖x − y‖₂² ] ≥ 1 − δ. Think of n as HUGE.

Dimensionality Reduction. Focus on linear maps: find a random matrix A ∈ ℝ^(m×n) such that for every x, y ∈ ℝⁿ: Pr[ ‖A(x − y)‖₂² ∈ (1 ± ε)·‖x − y‖₂² ] ≥ 1 − δ. Since A(x − y) = Ax − Ay, it suffices to preserve norms: for every x ∈ ℝⁿ, Pr[ ‖Ax‖₂² ∈ (1 ± ε)·‖x‖₂² ] ≥ 1 − δ. Why linear? Cool math, it supports streaming (updates), and it is good in practice.

Johnson-Lindenstrauss Lemma [JL 84]. Given ε, δ ∈ (0,1), there exists a random linear A ∈ ℝ^(m×n) such that for every x: Pr[ ‖Ax‖₂² ∈ (1 ± ε)·‖x‖₂² ] ≥ 1 − δ, with m = O(lg(1/δ)/ε²). In most proofs the matrix is as dense as possible, so the embedding takes O(mn) operations. If A is sparse, this can be made faster.
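
A minimal sketch of the dense construction, assuming the common i.i.d. Gaussian instantiation (the constant 2 in the choice of m is an illustrative pick, not the tight one):

    import numpy as np

    def jl_matrix(n, eps, delta, rng):
        # m = O(lg(1/delta) / eps^2) rows; entries N(0, 1/m) so that
        # E[||Ax||^2] = ||x||^2. The matrix is fully dense: O(m*n) per embedding.
        m = int(np.ceil(2 * np.log(1 / delta) / eps**2))
        return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

    rng = np.random.default_rng(1)
    n, eps, delta = 5000, 0.1, 0.01
    A = jl_matrix(n, eps, delta, rng)
    x = rng.normal(size=n)
    print((A @ x) @ (A @ x) / (x @ x))  # typically within 1 +/- eps of 1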

Feature Hashing [Weinberger et al. 2009]. General idea: shuffle the entries of x into m buckets and add random signs. Example with m = 3: x = (1 0 1 1 0 0 1 0 1) is mapped to f(x) = (0 1 1). Observation: this operation is linear, and moreover, as a matrix, every column has exactly one non-zero entry (a random sign in a random row).
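
A minimal sketch of the map (not the authors' code; here the bucket assignment h and the signs are drawn once per seed, whereas in practice both come from hash functions of the index, so nothing per-coordinate needs to be stored):

    import numpy as np

    def feature_hash(x, m, seed=0):
        # Each coordinate i goes to bucket h(i) with sign s(i), so the map is
        # linear and its matrix has exactly one non-zero (+/-1) per column.
        rng = np.random.default_rng(seed)
        h = rng.integers(0, m, size=len(x))       # bucket h(i)
        s = rng.choice([-1.0, 1.0], size=len(x))  # sign s(i)
        f = np.zeros(m)
        np.add.at(f, h, s * x)  # f[h(i)] += s(i) * x[i], accumulating collisions
        return f

    x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1], dtype=float)  # the slide's vector
    print(feature_hash(x, m=3))  # one signed sum per bucket; depends on the seed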

The Hashing Trick, With High Probability. Observation: if m is large enough, and the mass of x is not concentrated in a few entries, then the trick works with high probability. A bad example: x = (1, 1, 0, …, 0) with ε = 0.1, so ‖x‖∞/‖x‖₂ = 1/√2. Here the embedding succeeds iff no collision occurs, and Pr over a random h: {1,…,n} → {1,…,m} that h(1) = h(2) is 1/m. So to succeed with probability 1 − δ we need m ≥ 1/δ.
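
A quick Monte Carlo check of that example (the parameter choices are arbitrary): a collision sends ‖f(x)‖₂² to 0 or 4 instead of ‖x‖₂² = 2, far outside (1 ± 0.1)·2, so the failure rate equals the collision rate.

    import numpy as np

    rng = np.random.default_rng(2)
    m, trials = 20, 100_000
    h1 = rng.integers(0, m, size=trials)  # bucket of entry 1 in each trial
    h2 = rng.integers(0, m, size=trials)  # bucket of entry 2 in each trial
    print(np.mean(h1 == h2), 1 / m)       # empirical failure rate vs. predicted 1/m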

Tight Bounds: The Formal Problem. Fix m, ε, δ. Define ν(m, ε, δ) to be the maximum ν such that feature hashing works whenever ‖x‖∞ ≤ ν·‖x‖₂. In other words, we have a fixed budget and a fixed room for error. Evaluating ν has been an open question for almost a decade.

Tight Bounds: Our Result. Fix m, ε, δ. Theorem.
1. If m < c·lg(1/δ)/ε², then ν = 0. Essentially, this means our budget is too small to do anything meaningful.
2. If m ≥ 2/(δε²), then ν = 1. Essentially, this means our budget is rich enough to handle every vector.
3. If C·lg(1/δ)/ε² ≤ m < 1/(δε²), then ν = Θ( √ε · min{ lg(εm/lg(1/δ)) / lg(1/δ), √( lg(ε²m/lg(1/δ)) / lg(1/δ) ) } ).
This bound is tight, which means this is the right expression.

Empirical Analysis. Experiments show that the hidden Θ-constant is close to 1 (roughly 0.725 in the reported fits). This implies that feature hashing's performance can be predicted very well in practice using the formula ν = Θ( √ε · min{ lg(εm/lg(1/δ)) / lg(1/δ), √( lg(ε²m/lg(1/δ)) / lg(1/δ) ) } ).
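
A small helper for evaluating that prediction, assuming base-2 logarithms and using 0.725 as a stand-in for the Θ-constant (both assumptions are mine, for illustration; the parameters below sit in the theorem's intermediate regime):

    import numpy as np

    def nu(m, eps, delta, C=0.725):
        # C * sqrt(eps) * min{ lg(eps*m/L)/L, sqrt(lg(eps^2*m/L)/L) }, L = lg(1/delta)
        L = np.log2(1 / delta)
        t1 = np.log2(eps * m / L) / L
        t2 = np.sqrt(np.log2(eps**2 * m / L) / L)
        return C * np.sqrt(eps) * min(t1, t2)

    # For eps = 0.1, delta = 0.01: lg(1/delta)/eps^2 ~ 664 <= m = 8192 < 1/(delta*eps^2) = 10000.
    print(nu(m=8192, eps=0.1, delta=0.01))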

Questions? Come see the poster, read the paper, talk offline, or all of the above. Thank you! See also: Tight Cell-Probe Bounds for Succinct Boolean Matrix-Vector Multiplication.