Fully Understanding the Hashing Trick

1 Fully Understanding the Hashing Trick. Lior Kamma, Aarhus University. Joint work with Casper Freksen and Kasper Green Larsen.

3 Recommendation and Classification. Categories: PG-13, Comic Book, Super Hero, Sci-Fi, Adventure, Action, Violent, Scary, Comedy, Drama, Horror. These are categorical variables. How do we decide that two movies are close?

4 Feature Vectors. Represent each movie as a Boolean vector with one coordinate per category. Denote the feature dimension by $n$.
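
A minimal sketch of this representation, assuming each movie is described by a set of category tags drawn from the list on slide 3. The function name `to_feature_vector` and the example tags are hypothetical.

```python
# Hypothetical category vocabulary, taken from the categories listed on slide 3.
CATEGORIES = ["PG-13", "Comic Book", "Super Hero", "Sci-Fi", "Adventure",
              "Action", "Violent", "Scary", "Comedy", "Drama", "Horror"]


def to_feature_vector(tags, categories=CATEGORIES):
    """Map a set of category tags to a Boolean (0/1) vector of dimension n = len(categories)."""
    tag_set = set(tags)
    return [1 if c in tag_set else 0 for c in categories]


movie = to_feature_vector({"Comic Book", "Super Hero", "Action"})
print(len(movie), movie)  # 11 [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
```

In realistic settings $n$ is the size of a very large vocabulary (all tags, words, or user IDs), which is what makes the vectors high-dimensional and sparse.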

5 $k$-Nearest Neighbours. Storing a corpus of $M$ items requires $\Omega(nM)$ memory.

6 $k$-Nearest Neighbours. Given a new movie, how do we find the $k$ closest movies?
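
The naive answer is a linear scan over the corpus. The sketch below uses hypothetical helper names and assumes squared Euclidean distance as the closeness measure; it makes the cost explicit, since each query touches all $nM$ stored entries.

```python
def squared_distance(x, y):
    """Squared Euclidean distance; for 0/1 vectors this equals the Hamming distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_nearest(corpus, query, k):
    """Return the indices of the k corpus vectors closest to query.

    Brute force: O(n) work per corpus item, hence O(nM) per query for M items.
    Ties keep corpus order because sorted() is stable.
    """
    order = sorted(range(len(corpus)), key=lambda i: squared_distance(corpus[i], query))
    return order[:k]


corpus = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
print(k_nearest(corpus, [1, 1, 1, 0], k=2))  # [0, 1]
```

Dimensionality reduction attacks the factor $n$ in this cost: if every vector is first mapped to $m \ll n$ dimensions, both storage and each distance computation shrink accordingly.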

9 Dimensionality Reduction. Given an approximation ratio $\varepsilon \in (0,1)$ and an error probability $\delta \in (0,1)$, find, for some small $m$, a random map $f : \mathbb{R}^n \to \mathbb{R}^m$ such that for every $x, y \in \mathbb{R}^n$,
$$\Pr_f\left[\|f(x) - f(y)\|_2^2 \notin (1 \pm \varepsilon)\|x - y\|_2^2\right] \le \delta.$$
Think of $n$ as huge.

11 Dimensionality Reduction with Linear Maps. Focus on linear projections: given $\varepsilon, \delta \in (0,1)$, find a random $A \in \mathbb{R}^{m \times n}$ such that for every $x \in \mathbb{R}^n$,
$$\Pr_A\left[\|Ax\|_2^2 \notin (1 \pm \varepsilon)\|x\|_2^2\right] \le \delta.$$
By linearity, $Ax - Ay = A(x - y)$, so applying this guarantee to $x - y$ yields the pairwise guarantee $\Pr_A\left[\|Ax - Ay\|_2^2 \notin (1 \pm \varepsilon)\|x - y\|_2^2\right] \le \delta$. Why linear? The mathematics is clean, linear maps support streaming updates, and they work well in practice.
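
One reason linear maps suit streaming: if $x$ arrives as a sequence of updates $x_i \leftarrow x_i + \Delta$, the sketch $Ax$ can be maintained by adding $\Delta$ times column $i$ of $A$, without ever storing $x$. This is a minimal NumPy sketch; the Gaussian matrix, its dimensions, and the update stream are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 50
# Illustrative dense Gaussian matrix, scaled so that E[||Ax||^2] = ||x||^2.
A = rng.standard_normal((m, n)) / np.sqrt(m)

x = np.zeros(n)        # tracked here only to verify the result at the end
sketch = np.zeros(m)   # maintains A @ x incrementally

# A stream of (index, delta) updates, each meaning x[i] += delta.
updates = [(3, 2.0), (17, -1.5), (3, 0.5), (999, 4.0)]
for i, delta in updates:
    x[i] += delta
    sketch += delta * A[:, i]  # linearity: A(x + delta * e_i) = Ax + delta * A e_i

print(np.allclose(sketch, A @ x))  # True
```

Each update costs $O(m)$ with a dense $A$; with a matrix that has one non-zero per column, as in feature hashing, it costs $O(1)$.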

13 Johnson-Lindenstrauss Lemma [JL 84]. Given $\varepsilon, \delta \in (0,1)$, there exists a random linear map $A \in \mathbb{R}^{m \times n}$ with $m = O\left(\lg(1/\delta)/\varepsilon^2\right)$ such that for every $x \in \mathbb{R}^n$,
$$\Pr_A\left[\|Ax\|_2^2 \notin (1 \pm \varepsilon)\|x\|_2^2\right] \le \delta.$$
In most proofs the matrix is as dense as possible, so embedding a vector takes $O(mn)$ operations. If $A$ is sparse, this can be made faster.
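
A minimal NumPy sketch of a dense construction, assuming i.i.d. Gaussian entries scaled by $1/\sqrt{m}$ (one common choice in JL proofs) and an illustrative constant 8 in $m = O(\lg(1/\delta)/\varepsilon^2)$; natural log is used, which only changes the constant. Every embedding multiplies a full $m \times n$ matrix, which is where the $O(mn)$ cost comes from. The helper names are hypothetical.

```python
import math

import numpy as np


def jl_dimension(eps, delta, const=8.0):
    """Target dimension m = const * ln(1/delta) / eps^2 (const=8.0 is an illustrative choice)."""
    return math.ceil(const * math.log(1 / delta) / eps ** 2)


def dense_jl(x, m, rng):
    """Embed x with a fresh dense Gaussian matrix; the product costs O(m * n) operations."""
    A = rng.standard_normal((m, x.shape[0])) / math.sqrt(m)
    return A @ x


def failure_rate(x, eps, delta, trials=100, seed=0):
    """Fraction of independent draws of A with ||Ax||^2 outside (1 +/- eps)||x||^2."""
    rng = np.random.default_rng(seed)
    m = jl_dimension(eps, delta)
    norm2 = float(x @ x)
    failures = 0
    for _ in range(trials):
        y = dense_jl(x, m, rng)
        if abs(float(y @ y) - norm2) > eps * norm2:
            failures += 1
    return m, failures / trials


x = np.random.default_rng(1).standard_normal(500)
m, rate = failure_rate(x, eps=0.25, delta=0.05)
print(m, rate)  # m is 384 here; the observed rate is typically well below delta
```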

17 Feature Hashing [Weinberger et al. 2009]. General idea: shuffle the entries of $x$ into $m$ buckets (the illustration uses $m = 3$) and add random signs to obtain $f(x)$. Formally, for a random hash function $h : \{1, \dots, n\} \to \{1, \dots, m\}$ and random signs $\sigma : \{1, \dots, n\} \to \{-1, +1\}$, $f(x)_j = \sum_{i : h(i) = j} \sigma(i)\, x_i$. Observation: this operation is linear. Moreover, every column of the corresponding matrix has exactly one non-zero entry.
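
A minimal NumPy sketch of this map, assuming fully random $h$ and $\sigma$ drawn up front and stored in arrays (practical systems usually compute them with a hash function over feature names instead). Buckets are 0-indexed, and the helper names are hypothetical. It computes $f(x)$ in $O(n)$ time and builds the equivalent matrix to check both observations: linearity and exactly one non-zero per column.

```python
import numpy as np


def make_feature_hash(n, m, seed=0):
    """Draw a random bucket h(i) in {0, ..., m-1} and a random sign sigma(i) for each coordinate."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, m, size=n)
    sigma = rng.choice([-1.0, 1.0], size=n)
    return h, sigma


def feature_hash(x, h, sigma, m):
    """Compute f(x)_j = sum over i with h(i) = j of sigma(i) * x_i, in O(n) time."""
    out = np.zeros(m)
    np.add.at(out, h, sigma * x)  # unbuffered add, so repeated bucket indices accumulate
    return out


def as_matrix(h, sigma, m):
    """Equivalent m x n matrix: column i has its single non-zero, sigma(i), in row h(i)."""
    n = len(h)
    A = np.zeros((m, n))
    A[h, np.arange(n)] = sigma
    return A


n, m = 10, 3
h, sigma = make_feature_hash(n, m)
x = np.arange(1.0, n + 1)
A = as_matrix(h, sigma, m)
print(np.allclose(feature_hash(x, h, sigma, m), A @ x))  # True: f(x) = Ax, so f is linear
print(bool(((A != 0).sum(axis=0) == 1).all()))           # True: one non-zero per column
```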

19 The Hashing Trick With High Probability. Observation: if $m$ is large enough and the mass of $x$ is not concentrated in a few entries, then the trick works with high probability. Example: let $\varepsilon = 0.1$ and $x = (1, 1, 0, \dots, 0)^\top$, so $\|x\|_\infty / \|x\|_2 = 1/\sqrt{2}$. For a random $h : \{1, 2, \dots, n\} \to \{1, 2, \dots, m\}$, $\Pr_h[h(1) = h(2)] = 1/m$. Success occurs iff no collision occurs: if $h(1) \ne h(2)$, then $\|f(x)\|_2^2 = 2 = \|x\|_2^2$; if $h(1) = h(2)$, then $\|f(x)\|_2^2 \in \{0, 4\}$, which lies outside $(1 \pm \varepsilon)\|x\|_2^2$. To succeed with probability at least $1 - \delta$, we therefore need $m \ge 1/\delta$.
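
The collision argument can be checked exactly. Assuming fully random $h$ and $\sigma$, only $h(1), h(2), \sigma(1), \sigma(2)$ affect $f(x)$ for $x = (1, 1, 0, \dots, 0)$, so the failure probability follows from enumerating all equally likely outcomes. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def failure_probability_two_entries(m, eps):
    """Exact Pr[||f(x)||^2 not in (1 +/- eps)||x||^2] for x = (1, 1, 0, ..., 0).

    Enumerates all m * m * 2 * 2 equally likely values of (h(1), h(2), sigma(1), sigma(2)).
    """
    norm2 = 2  # ||x||_2^2 = 1^2 + 1^2
    failures = 0
    total = 0
    for h1, h2, s1, s2 in product(range(m), range(m), (-1, 1), (-1, 1)):
        if h1 == h2:
            fx_norm2 = (s1 + s2) ** 2      # collision: both entries share a bucket
        else:
            fx_norm2 = s1 ** 2 + s2 ** 2   # no collision: norm preserved exactly
        if abs(fx_norm2 - norm2) > eps * norm2:
            failures += 1
        total += 1
    return Fraction(failures, total)


for m in (2, 10, 100):
    print(m, failure_probability_two_entries(m, eps=Fraction(1, 10)))
# 2 1/2
# 10 1/10
# 100 1/100
```

The failure probability is exactly $1/m$, so requiring it to be at most $\delta$ gives $m \ge 1/\delta$.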

21 Tight Bounds: Formal Problem. Fix $m$, $\varepsilon$, $\delta$. Define $\nu(m, \varepsilon, \delta)$ to be the maximum $\nu$ such that whenever $\|x\|_\infty \le \nu \|x\|_2$, feature hashing works, i.e., $\Pr\left[\|f(x)\|_2^2 \notin (1 \pm \varepsilon)\|x\|_2^2\right] \le \delta$. We have a fixed budget ($m$) and a fixed room for error ($\varepsilon$, $\delta$). Evaluating $\nu$ has been an open question for almost a decade.
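
The definition of $\nu$ can be probed empirically. This sketch assumes fully random $h$ and $\sigma$ and uses hypothetical helper names. It builds a vector with $k$ equal non-zero entries, so $\|x\|_\infty/\|x\|_2 = 1/\sqrt{k}$, and estimates the failure probability by Monte Carlo. Because $\nu$ quantifies over all $x$, a failing vector with ratio $r$ shows $\nu < r$, while a passing vector proves nothing about $\nu$; `works` is a heuristic check, not a proof.

```python
import numpy as np


def spread_vector(n, k):
    """x with k equal non-zero entries, so ||x||_inf / ||x||_2 = 1 / sqrt(k)."""
    x = np.zeros(n)
    x[:k] = 1.0
    return x


def estimate_failure(x, m, eps, trials=10_000, seed=0):
    """Monte Carlo estimate of Pr[||f(x)||^2 not in (1 +/- eps)||x||^2] under feature hashing."""
    rng = np.random.default_rng(seed)
    support = np.flatnonzero(x)  # zero coordinates do not affect f(x)
    vals = x[support]
    norm2 = float(vals @ vals)
    failures = 0
    for _ in range(trials):
        h = rng.integers(0, m, size=len(support))
        sigma = rng.choice([-1.0, 1.0], size=len(support))
        fx = np.zeros(m)
        np.add.at(fx, h, sigma * vals)
        if abs(float(fx @ fx) - norm2) > eps * norm2:
            failures += 1
    return failures / trials


def works(x, m, eps, delta, **kw):
    """Heuristic: feature hashing 'works' for x if the estimated failure probability is <= delta."""
    return estimate_failure(x, m, eps, **kw) <= delta


x = spread_vector(1000, 2)                  # ||x||_inf / ||x||_2 = 1/sqrt(2)
print(estimate_failure(x, m=10, eps=0.1))   # close to 1/m = 0.1
print(works(x, m=100, eps=0.1, delta=0.05))
```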

24 Tight Bounds: Our Result. Fix $m$, $\varepsilon$, $\delta$.
Theorem.
1. If $m < c \log(1/\delta)/\varepsilon^2$, then $\nu = 0$. Essentially, this means our budget is too small to do anything meaningful.
2. If $m \ge 2/(\delta\varepsilon^2)$, then $\nu = 1$. Essentially, this means our budget is rich enough to do anything.
3. If $C \log(1/\delta)/\varepsilon^2 \le m < 2/(\delta\varepsilon^2)$, then
$$\nu = \Theta\left(\sqrt{\varepsilon}\,\min\left\{\frac{\log\left(\varepsilon m / \log(1/\delta)\right)}{\log(1/\delta)},\ \sqrt{\frac{\log\left(\varepsilon^2 m / \log(1/\delta)\right)}{\log(1/\delta)}}\right\}\right).$$
This is tight, which means this is the right expression.

25 Empirical Analysis. Results show that the constant hidden in the $\Theta$ is close to 1:
$$\nu \approx 0.725\,\sqrt{\varepsilon}\,\min\left\{\frac{\lg\left(\varepsilon m / \lg(1/\delta)\right)}{\lg(1/\delta)},\ \sqrt{\frac{\lg\left(\varepsilon^2 m / \lg(1/\delta)\right)}{\lg(1/\delta)}}\right\}.$$
This implies that feature hashing's performance can be predicted very well in practice using our formula.
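
To use the formula as a predictor, it can be evaluated directly. This is a minimal sketch with a hypothetical function name: `const` is a placeholder for the constant hidden in the $\Theta$, base-2 logarithms follow the $\lg$ on this slide, and the $\nu = 0$ regime is not modeled because its constant $c$ is not specified. Negative values, which arise when a logarithm's argument drops below 1, are clamped to 0.

```python
import math


def nu_prediction(m, eps, delta, const=1.0):
    """Evaluate const * sqrt(eps) * min{lg(eps*m/L)/L, sqrt(lg(eps^2*m/L)/L)}, with L = lg(1/delta).

    Returns 1.0 in the regime m >= 2/(eps^2 * delta), where the theorem gives nu = 1.
    """
    if m >= 2 / (eps ** 2 * delta):
        return 1.0
    L = math.log2(1 / delta)
    first = math.log2(eps * m / L) / L
    second = math.sqrt(max(0.0, math.log2(eps ** 2 * m / L)) / L)
    value = const * math.sqrt(eps) * min(first, second)
    return min(1.0, max(0.0, value))


for m in (1_000, 2_000, 25_000):
    print(m, round(nu_prediction(m, eps=0.1, delta=0.01), 4))
```

A vector whose ratio $\|x\|_\infty/\|x\|_2$ is below the predicted value is expected to be embedded within $(1 \pm \varepsilon)$ with probability at least $1 - \delta$, up to the accuracy of the constant.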

27 Questions? Come see the poster, read the paper, talk offline, or all of the above. Thank you!
Tight Cell-Probe Bounds for Succinct Boolean Matrix-Vector Multiplication


More information

Yang-Hwan Ahn Based on arxiv:

Yang-Hwan Ahn Based on arxiv: Yang-Hwan Ahn (CTPU@IBS) Based on arxiv: 1611.08359 1 Introduction Now that the Higgs boson has been discovered at 126 GeV, assuming that it is indeed exactly the one predicted by the SM, there are several

More information

Uncertain Compression & Graph Coloring. Madhu Sudan Harvard

Uncertain Compression & Graph Coloring. Madhu Sudan Harvard Uncertain Compression & Graph Coloring Madhu Sudan Harvard Based on joint works with: (1) Adam Kalai (MSR), Sanjeev Khanna (U.Penn), Brendan Juba (WUStL) (2) Elad Haramaty (Harvard) (3) Badih Ghazi (MIT),

More information

Proximity problems in high dimensions

Proximity problems in high dimensions Proximity problems in high dimensions Ioannis Psarros National & Kapodistrian University of Athens March 31, 2017 Ioannis Psarros Proximity problems in high dimensions March 31, 2017 1 / 43 Problem definition

More information

5 1 Worksheet MATCHING: For # 1 4, match each graph to its equation. Not all equations will be used. 1) 2) 3) 4)

5 1 Worksheet MATCHING: For # 1 4, match each graph to its equation. Not all equations will be used. 1) 2) 3) 4) Algebra 1 Name: Per: 5 1 Worksheet MATCHING: For # 1 4, match each graph to its equation. Not all equations will be used. 1) 2) 3) 4) A) yy = xx 3 B) yy = xx + 3 C) yy = 1 2 xx 3 D) yy = xx 3 E) yy = 2

More information

Recent Developments on Circuit Satisfiability Algorithms

Recent Developments on Circuit Satisfiability Algorithms Recent Developments on Circuit Satisfiability Algorithms Suguru TAMAKI Kyoto University Fine-Grained Complexity and Algorithm Design Reunion, December 14, 2016, Simons Institute, Berkeley, CA After the

More information

Foundations II: Data Structures and Algorithms

Foundations II: Data Structures and Algorithms Foundations II: Data Structures and Algorithms Instructor : Yusu Wang Topic 1 : Introduction and Asymptotic notation Course Information Course webpage http://www.cse.ohio-state.edu/~yusu/courses/2331 Office

More information

Multiclass Classification-1

Multiclass Classification-1 CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass

More information

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine)

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Mihai Pǎtraşcu, MIT, web.mit.edu/ mip/www/ Index terms: partial-sums problem, prefix sums, dynamic lower bounds Synonyms: dynamic trees 1

More information

Lecture 3: Pattern Classification

Lecture 3: Pattern Classification EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

More information

Lecture 2. More Algorithm Analysis, Math and MCSS By: Sarah Buchanan

Lecture 2. More Algorithm Analysis, Math and MCSS By: Sarah Buchanan Lecture 2 More Algorithm Analysis, Math and MCSS By: Sarah Buchanan Announcements Assignment #1 is posted online It is directly related to MCSS which we will be talking about today or Monday. There are

More information

Worksheets for GCSE Mathematics. Quadratics. mr-mathematics.com Maths Resources for Teachers. Algebra

Worksheets for GCSE Mathematics. Quadratics. mr-mathematics.com Maths Resources for Teachers. Algebra Worksheets for GCSE Mathematics Quadratics mr-mathematics.com Maths Resources for Teachers Algebra Quadratics Worksheets Contents Differentiated Independent Learning Worksheets Solving x + bx + c by factorisation

More information

Cryptography CS 555. Topic 13: HMACs and Generic Attacks

Cryptography CS 555. Topic 13: HMACs and Generic Attacks Cryptography CS 555 Topic 13: HMACs and Generic Attacks 1 Recap Cryptographic Hash Functions Merkle-Damgård Transform Today s Goals: HMACs (constructing MACs from collision-resistant hash functions) Generic

More information

Fast Dimension Reduction

Fast Dimension Reduction Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard

More information