Monotone Estimation Framework and Applications for Scalable Analytics of Large Data Sets
1 Monotone Estimation Framework and Applications for Scalable Analytics of Large Data Sets. Edith Cohen, Google Research and Tel Aviv University.
2 A Monotone Sampling Scheme. Data domain $V \subseteq R_+^n$; data $v \in V$; random seed $u \sim U[0,1]$. The outcome $S(v,u)$ is a function of the data $v$ and the seed $u$, and the seed value $u$ is available with the outcome. $S(v,u)$ can be interpreted as the set of all data vectors consistent with the outcome and $u$. Monotonicity: fixing $v$, $S(v,u)$ is non-increasing with $u$ (a lower seed reveals at least as much about the data).
3 Monotone Estimation Problem (MEP). Given a monotone sampling scheme $(V,S)$: data domain $V \subseteq R_+$, sampling scheme $S: V \times [0,1] \to$ outcomes, and a nonnegative function $f: V \to R_{\ge 0}$. Goal: estimate $f(v)$ from the sample $(S(v,u), u)$. Desired properties of the estimator $\hat f(S,u)$:
Unbiased: $\forall v,\ \int_0^1 \hat f(S(v,x), x)\,dx = f(v)$ (useful when summing over many MEPs).
Nonnegative: keeps the estimate $\hat f$ in the same range as $f$.
(Pareto) optimal (admissible): any estimator with smaller variance on some data $v$ must have larger variance on some other data.
4 Bounds on f(v) from S and u. Fix the data $v$. The lower the seed $u$ is, the more we know about $v$ and hence about $f(v)$. [Plot: information on $f(v)$ as a function of $u \in (0,1]$.]
5 Estimators for MEPs. We want estimators that are unbiased, nonnegative, with bounded variance, and admissible (Pareto optimal in terms of variance). Results preview: explicit expressions for estimators for any MEP for which such an estimator exists. The solution is not unique, so we consider estimators with additional natural properties; if we require monotonicity ($\hat f(S,u)$ non-increasing with $u$), we get uniqueness. We also introduce a notion of competitiveness of estimators. We will come back to all this, but first some applications.
6 MEP applications in data analysis. Scalable computation of approximate statistics and queries over large data sets: data is sampled (composable, distributed scheme) and the sample is used to estimate statistics/queries expressed as a sum of multiple MEPs.
Key-value pairs with multiple sets of values (instances): take coordinated samples of the instances; we get a MEP for each key.
Sketching graph-based influence and similarity functions: distance-sketch the utility values (relation of a node to all others); we get a MEP for each target node from the sketches of seed nodes.
Sketching generalized coverage functions: coordinated weighted sample of the utility vector of each element; a MEP for each item.
7 Social/Communication data. An activity value $v(x)$ is associated with each key $x = (b,c)$ (e.g. the number of messages from $b$ to $c$). Take a weighted sample of the keys, for example bottom-k (weighted reservoir) or PPS (Probability Proportional to Size). For $\tau > 0$ and i.i.d. $u_x \sim U[0,1]$: $x \in S \iff v_x \ge \tau\, u(x)$. With bottom-k, $\tau$ is set to obtain a fixed sample size $k$; for without-replacement sampling, the condition is $v_x \ge -\tau \ln u(x)$. The sampling scheme is fully composable.
Example Monday activity: (a,b) 40, (f,g) 5, (h,c) 20, (a,z) 10, (h,f) 10, (f,s) 10. Monday sample: (a,b) 40, (a,z) 10, ..., (f,s) 10.
8 Samples of multiple days. Coordinated samples: values differ across days, but each key $x$ is sampled with the same seed $u(x)$ on every day.
Monday activity: (a,b) 40, (f,g) 5, (h,c) 20, (a,z) 10, (h,f) 10, (f,s) 10; Monday sample: (a,b) 40, (a,z) 10, ..., (f,s) 10.
Tuesday activity: (a,b) 3, (f,g) 5, (g,c) 10, (a,z) 50, (s,f) 20, (g,h) 10; Tuesday sample: (g,c) 10, (a,z) 50, ..., (g,h) 10.
Wednesday activity: (a,b) 30, (g,c) 5, (h,c) 10, (a,z) 10, (b,f) 20, (d,h) 10; Wednesday sample: (a,b) 30, (b,f) 20, ..., (d,h) 10.
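The coordination above can be sketched in a few lines. This is an illustrative sketch, not the talk's code: the hash-based `seed` function, the key/value tables, and $\tau = 30$ are all made up for demonstration. The point is that every day's sample uses the same per-key seed $u(x)$:

```python
import hashlib

def seed(x):
    # Shared per-key seed u(x) in [0,1), derived by hashing the key so that
    # every day (instance) uses the same seed -- this coordinates the samples.
    h = hashlib.sha256(repr(x).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def pps_sample(activity, tau):
    # PPS: key x enters the sample iff v(x) >= tau * u(x).
    return {x: v for x, v in activity.items() if v >= tau * seed(x)}

monday  = {("a", "b"): 40, ("f", "g"): 5, ("h", "c"): 20, ("a", "z"): 10}
tuesday = {("a", "b"): 3, ("f", "g"): 5, ("a", "z"): 50, ("g", "h"): 10}

s_mon = pps_sample(monday, tau=30)
s_tue = pps_sample(tuesday, tau=30)
```

Because the seed is shared across days, a key with a low seed tends to appear in every day's sample, which is what makes cross-day statistics estimable from the samples.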
9 Matrix view: keys by instances. In our example, keys $x = (a,b)$ are user pairs and instances are days (Su Mo Tu We Th Fr Sa); the rows are the keys (a,b), (g,c), (h,c), (a,z), (h,f), (f,s), (d,h). [Matrix figure; entry values not recoverable from the transcription.]
10 Example Statistics. Specify a segment $Y$ of the keys; examples: one user in CA and one in NY; an Apple device to an Android device. Queries/statistics have the form $\sum_{x \in Y} f(v_1(x), v_2(x), \ldots, v_r(x))$:
Total communication of the segment on Wednesday: $\sum_{x\in Y} v_{Wed}(x)$.
$L_p^p$ distance / weighted Jaccard (change in activity of the segment between Friday and Saturday): $\sum_{x\in Y} |v_{Fri}(x) - v_{Sat}(x)|^p$.
Increase/decrease: $\sum_{x\in Y} \max\{0, v_{Fri}(x) - v_{Sat}(x)\}$.
Coverage of segment $Y$ in days $D$: $\sum_{x\in Y} \max_{d\in D} v_d(x)$.
Average/sum of a median/max/min/top-3/concave aggregate of the activity values over days $D$.
We would like to compute an estimate from the sample.
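For concreteness, here is how these statistics read when computed exactly from a full key-by-day matrix (the keys and values below are made up, and $p = 1$ is used for the $L_p$ statistics). The sampling machinery in the following slides is about approximating these sums without the full data:

```python
# Exact example statistics over a tiny made-up key/day matrix.
data = {
    ("a", "b"): {"Fri": 40, "Sat": 30},
    ("a", "z"): {"Fri": 10, "Sat": 50},
    ("f", "s"): {"Fri": 10, "Sat": 0},
}
segment = [("a", "b"), ("a", "z"), ("f", "s")]  # the segment Y

total_sat = sum(data[x]["Sat"] for x in segment)                        # one day's total
l1_change = sum(abs(data[x]["Fri"] - data[x]["Sat"]) for x in segment)  # L1 distance
increase  = sum(max(0, data[x]["Sat"] - data[x]["Fri"]) for x in segment)
coverage  = sum(max(data[x].values()) for x in segment)                 # max over days
```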
11 Matrix view (keys by instances) of a coordinated PPS sample with $\tau = 100$ for all entries. [Matrix figure of seeds $u$ and sampled entries; values not recoverable from the transcription.]
12 Estimate sum statistics, one key at a time. The statistic is $\sum_{x\in Y} f(v(x))$, where $v(x) = (v_1(x), v_2(x), \ldots)$; for the $L_p^p$ distance, $f(v) = |v_1 - v_2|^p$. We estimate one key at a time: $\sum_{x\in Y} \hat f(S(v(x)), u(x))$, applying the estimator for $f(v)$ to the sample of $v$.
13 Easy statistics: sum over entries. Estimate a single entry at a time. Example: total communication of segment $Y$ on Monday. Inverse probability estimate (Horvitz-Thompson) [HT52]: sum over sampled $x \in Y$ of $v_{Mon}(x)/p_{Mon}(x)$. The inclusion probability $p_{Mon}(x)$ can be computed from $v(x)$ and $\tau$: since $x \in S \iff v_x \ge \tau\, u(x)$, we have $p_d(x) = \Pr_{u \sim U[0,1]}[v_d(x) \ge \tau\, u] = \min\{1, v_d(x)/\tau\}$.
14-15 HT estimator (single-instance). Coordinated PPS sample with $\tau = 100$; day: Wednesday, segment: CA-NY. [Matrix figures; entries not recoverable from the transcription.]
16 HT estimator for a single instance. $\tau = 100$; day: Wednesday; segment: CA-NY. Wednesday values: (a,b) 43, (g,c) 0, (h,c) 60, (a,z) 24, (h,f) 3, (f,s) 20, (d,h) 0. Of the segment's keys, (a,b) is sampled with $p = 0.43$ and (f,s) with $p = 0.20$. Exact total: $43 + 60 + 20 = 123$. The HT estimate is 0 for keys that are not sampled and $v/p$ when the key is sampled, so the HT estimate is $43/0.43 + 20/0.20 = 200$.
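The slide's numbers can be reproduced directly. A sketch, with the caveat that which segment keys were sampled is inferred from the stated probabilities ($p = v/\tau$), so treat the key sets as illustrative:

```python
def inclusion_prob(v, tau):
    # p_d(x) = Pr_{u ~ U[0,1]}[ v >= tau*u ] = min(1, v/tau)
    return min(1.0, v / tau)

tau = 100
values = {("a", "b"): 43, ("h", "c"): 60, ("f", "s"): 20}  # segment's Wednesday values
sampled = {("a", "b"), ("f", "s")}                          # keys present in the sample

exact = sum(values.values())  # 43 + 60 + 20 = 123
# HT: v/p for sampled keys, 0 for unsampled keys.
ht = sum(v / inclusion_prob(v, tau) for x, v in values.items() if x in sampled)
# 43/0.43 + 20/0.20 = 100 + 100 = 200
```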
17 Inverse-Probability (HT) estimator.
Unbiased: $(1-p_x)\cdot 0 + p_x \cdot \frac{f(v_x)}{p_x} = f(v_x)$.
Nonnegative: $v_x \ge 0$, so $\hat f \ge 0$.
Bounded variance (for all $v$).
Monotone: more information yields a higher estimate.
Optimal: UMVU, the unique minimum-variance unbiased, nonnegative estimator.
This works when $f$ depends on a single entry. What about general $f$?
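The unbiasedness identity above can be checked numerically by averaging the single-entry HT estimate over a fine grid of seeds. This is a sanity-check sketch; the value 43 and $\tau = 100$ are just the running example:

```python
def ht_single(v, tau, u):
    # Single-entry HT estimate at seed u: v/p when sampled (v >= tau*u), else 0.
    p = min(1.0, v / tau)
    return v / p if v >= tau * u else 0.0

# Midpoint-rule approximation of E_u[ht_single] = (1-p)*0 + p*(v/p) = v.
v, tau, n = 43.0, 100.0, 100_000
avg = sum(ht_single(v, tau, (i + 0.5) / n) for i in range(n)) / n
```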
18 Queries involving multiple columns. $L_p^p$ distance: $f(v) = |v_1 - v_2|^p$; $L_p^p$ increase: $f(v) = \max\{0,\, v_1 - v_2\}^p$. The HT estimate is positive only when we know $f(v)$ exactly from the sample. But when $v_2 = 0$ and $v_1 > 0$, we have $f(v) > 0$ while the sample never reveals $f(v)$, because the second entry is never sampled. Thus HT is biased. Even when unbiased, HT may not be optimal: e.g., when $v_1$ is sampled and we can deduce from $\tau_2$ and $u$ that $v_2 < v_1$, we have a positive lower bound on $f(v)$ even without knowing it exactly. An optimal estimator will use this incomplete information. We want estimators with the same nice properties as HT, plus optimality.
19 Sampled data. Coordinated PPS sample with $\tau = 100$. [Matrix figure of seeds and sampled values; entries not recoverable from the transcription.] We want to estimate the $L_1$ statistic; let's look at key (a,z) and estimate its contribution.
20 Information on f. Fix the data $v$. The lower $u$ is, the more we know about $v$ and about $f(v)$ (in the example, $f(v) = 81$). We plot the lower bound we have on $f(v)$ as a function of the seed $u$. [Plot: lower bound on $f(v)$ versus $u$.]
21 This is a MEP! A monotone sampling scheme $(V, S)$: data domain $V \subseteq R_+$, here $(v_1, v_2) \in R_+^2$; sampling scheme $S: V \times [0,1]$, here $S((v_1,v_2),u)$ reveals $v_i$ when $v_i \ge 100\,u$; a nonnegative function $f$, here $|v_1 - v_2|$. Goal: estimate $f(v)$, i.e., specify an estimator $\hat f(S,u)$ that is unbiased, nonnegative, with bounded variance, and admissible (optimal). The solution is not unique.
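This concrete MEP is easy to code up. The sketch below (my notation; `None` marks a hidden entry) implements the outcome $S((v_1,v_2),u)$ and the resulting lower bound on $f$, matching the picture that information grows as $u$ shrinks:

```python
def S(v, u, tau=100.0):
    # Outcome of the monotone scheme: entry v_i is revealed iff v_i >= tau*u.
    # None marks a hidden entry (known only to lie in [0, tau*u)).
    return tuple(vi if vi >= tau * u else None for vi in v)

def f_lower_bound(outcome, u, tau=100.0):
    # Smallest |v1 - v2| over all data vectors consistent with the outcome.
    a, b = outcome
    if a is None and b is None:
        return 0.0                 # both hidden: v1 = v2 is consistent
    if a is None:
        a, b = b, a                # make `a` the revealed entry
    if b is not None:
        return abs(a - b)          # both revealed: f is known exactly
    return max(0.0, a - tau * u)   # hidden entry can be anything below tau*u
```

Lowering $u$ can only reveal more entries, so the outcome (as a set of consistent data vectors) shrinks and the lower bound on $f$ can only grow: exactly the monotonicity required of the scheme.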
22 The optimal (admissible) range. We see $S(v,u)$ and $u$, so we know what $S(v,x)$ is for all $x > u$. Suppose we fix the cumulative estimate $M = \int_u^1 \hat f(S(v,x), x)\,dx$; the question is how high the estimate at $u$ may go while remaining unbiased and nonnegative on all consistent data.
23 MEP Estimators. Order-optimal estimators: for an order $\prec$ on the data domain $V$, any estimator with lower variance on $v$ must have higher variance on some $z \prec v$.
The L* estimator: the unique admissible monotone estimator; order-optimal for $z \prec v \iff f(z) < f(v)$; 4-variance-competitive (defined shortly).
The U* estimator: order-optimal for $z \prec v \iff f(z) > f(v)$.
The choice of estimator depends on the properties we want, possibly informed by the typical data distribution; L* is a good default (monotone and competitive).
24 Variance Competitiveness [CK13]. A worst-case-over-data theoretical indicator of estimator quality. For each $v$, consider the minimum $E_{u\sim U[0,1]}[\hat f^2(S(v,u), u)]$ attainable by an estimator that is unbiased and nonnegative for all other data; we use this optimal-for-$v$ estimator $\hat f^{(v)}$ as a reference point. An estimator $\hat f(S,u)$ is $c$-competitive if for any data $v$, the expectation of its square is within a factor $c$ of the minimum possible for $v$: for all unbiased nonnegative $g$, $E_{u\sim U[0,1]}[\hat f^2(S(v,u), u)] \le c\, E_{u\sim U[0,1]}[g^2(S(v,u), u)]$. The L* estimator is 4-competitive, and this is tight: for some MEPs the ratio is 4.
25 Optimal estimator $\hat f^{(v)}$ for data $v$ (unbiased and nonnegative for all data). The optimal estimates $\hat f^{(v)}$ are the negated derivative of the lower hull of the lower-bound function. [Plot: the lower-bound function for $v$ and its lower hull, over $u \in (0,1]$.] Intuition: the lower bound tells us, on outcome $S$, how high we can go with the estimate in order to optimize variance for $v$ while still being nonnegative on all other consistent data vectors.
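The negated-derivative-of-the-lower-hull construction can be carried out numerically. The sketch below is an illustrative discretization (not code from the talk): it uses the scheme of the next slide ($v_i$ revealed iff $v_i \ge u$) with data $v = (0.6, 0.2)$, computes the lower hull (greatest convex minorant) of the lower-bound curve on a grid, and reads off the estimates as negated slopes:

```python
def lower_bound(u, v=(0.6, 0.2)):
    # Smallest |v1 - v2| over data consistent with the outcome of the scheme
    # "entry v_i is revealed iff v_i >= u", for the fixed data v.
    hi, lo = max(v), min(v)
    if u <= lo:
        return hi - lo            # both entries revealed: f known exactly
    if u <= hi:
        return max(0.0, hi - u)   # hidden entry can be anything in [0, u)
    return 0.0                    # nothing revealed

# Lower hull (greatest convex minorant) of the lower-bound curve on a grid,
# via a monotone-chain scan over points sorted by u.
n = 10_000
pts = [(i / n, lower_bound(i / n)) for i in range(n + 1)]
hull = []
for px, py in pts:
    while len(hull) >= 2:
        (x1, y1), (x2, y2) = hull[-2], hull[-1]
        # pop hull[-1] if it lies on or above the segment hull[-2] -> (px, py)
        if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) <= 0:
            hull.pop()
        else:
            break
    hull.append((px, py))

def opt_estimate(u):
    # Optimal-for-v estimate at seed u: the negated slope of the hull at u.
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= u <= x2:
            return -(y2 - y1) / (x2 - x1)
    return 0.0
```

Here the hull runs linearly from $(0, 0.4)$ down to $(0.6, 0)$, so the estimate is the constant $2/3$ for $u < 0.6$ and $0$ afterwards; its integral over $[0,1]$ is $0.4 = f(v)$, i.e. the estimator is unbiased on $v$.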
26 The L* estimator. We see $S(v,u)$ and $u$, and we know what $S(v,x)$ is for all $x > u$. The L* estimate is determined by the lower bound and the already-committed cumulative estimate: $\hat f(S(v,u), u) = \frac{1}{u}\left( \underline{f}(S(v,u), u) - \int_u^1 \hat f(S(v,x), x)\,dx \right)$, where $\underline{f}(S,u)$ is the lower bound on $f$ over data consistent with the outcome.
27 $L_1$ estimation example. Estimators for $f(v_1, v_2) = |v_1 - v_2|$. Scheme: $v_i \ge 0$ is sampled iff $v_i \ge u$. [Plot: the lower bound (LB) on $f$ for $v = (0.6, 0.2)$ as a function of $u$; the lower hull of LB; and the L*, U*, and optimal-for-$v$ estimators.] U* is optimized for the vector $(0.6, 0.0)$ (always consistent with $S$); L* is optimized locally for the vector $(0.6, u)$, the consistent vector with smallest $f$.
28 $L_p^p$ estimation example. Estimators for $f(v_1, v_2) = |v_1 - v_2|^p$. Scheme: $v_i \ge 0$ is sampled iff $v_i \ge u$. [Plot: the lower bound (LB) on $f$ for $v = (0.6, 0.2)$; the lower hull of LB; and the L*, U*, and optimal-for-$v$ estimators.] U* is optimized for the vector $(0.6, 0.0)$ (always consistent with $S$); L* is optimized locally for the vector $(0.6, u)$, the consistent vector with smallest $f$.
29 Summary. We defined Monotone Estimation Problems (MEPs), motivated by coordinated sampling, and derived Pareto optimal (admissible) unbiased, nonnegative estimators (for any MEP for which they exist): L* (lower end of the range; the unique monotone estimator; dominates HT), U* (upper end of the range), and order-optimal estimators (optimized for certain data patterns).
30 Applications. Estimators for Euclidean and Manhattan distances from samples [C KDD 14]; sketch-based closeness similarity in social networks [CDFGGW COSN 13] (similarity of the sets of long-range interactions); sketching generalized coverage functions, including graph-based influence functions [CDPW 14, C 16].
31 Future. Tighter bounds on the universal ratio: L* is 4-competitive, a better ratio is achievable, and the known lower bound is 1.44. Instance-optimal competitiveness. Efficient constructions for any MEP. Multi-dimensional MEPs: multiple independent seeds (independent samples of the columns); some initial derivations exist for $d = 2$ and for coverage and distance functions [CK 12, C 14], but the full picture is missing.
32 $L_1$ distance [C KDD14]. Independent vs. coordinated PPS sampling; #IP flows to a destination in two time periods. [Experimental plot.]
33 $L_p^p$ distance [C KDD14]. Independent vs. coordinated PPS sampling; surname occurrences in 2007 and 2008 books (Google ngrams). [Experimental plot.]
35 Coordination of samples. A very powerful tool for big-data analysis, with applications well beyond what [Brewer, Early, Joyce 1972] could envision:
Locality Sensitive Hashing (LSH): similar weight vectors have similar samples/sketches.
Multi-objective (universal) samples: a single sample, as small as possible, that provides statistical guarantees for multiple sets of weights.
Statistics/domain queries that span multiple instances (Jaccard similarity, $L_p$ distances, distinct counts, union size, ...); MinHash sketches are a special case with 0/1 weights.
Faster computation of samples; example [C 97]: sketching/sampling the reachability sets and neighborhoods of all nodes in a graph in near-linear time.
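As a final illustration of the MinHash special case mentioned above, here is a minimal coordinated-sketch sketch (the hash choice, sketch size $k$, and example sets are arbitrary): each item's seeds are shared by all sets, which is exactly sample coordination, and the collision rate of the per-coordinate minima estimates the Jaccard similarity.

```python
import hashlib

def u(x, j):
    # j-th shared seed of item x in [0,1): hashing makes the seeds common to
    # all sets, which is exactly sample coordination.
    h = hashlib.sha256(f"{j}:{x}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def minhash(S, k=256):
    # k-coordinate MinHash sketch: per coordinate, the minimum seed over S.
    return [min(u(x, j) for x in S) for j in range(k)]

def jaccard_est(sa, sb):
    # Each coordinate's minima collide with probability |A∩B| / |A∪B|.
    return sum(a == b for a, b in zip(sa, sb)) / len(sa)

A = set(range(100))
B = set(range(50, 150))  # true Jaccard similarity = 50/150 = 1/3
est = jaccard_est(minhash(A), minhash(B))
```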
The Rocket Car UBC Math 403 Lecture Notes by Philip D. Loewen We consider this simple system with control constraints: ẋ 1 = x, ẋ = u, u [ 1, 1], Dynamics. Consider the system evolution on a time interval
More information4 Nonlinear Equations
4 Nonlinear Equations lecture 27: Introduction to Nonlinear Equations lecture 28: Bracketing Algorithms The solution of nonlinear equations has been a motivating challenge throughout the history of numerical
More informationA Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch 1 and Srikanta Tirthapura 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY
More informationASM Study Manual for Exam P, First Edition By Dr. Krzysztof M. Ostaszewski, FSA, CFA, MAAA Errata
ASM Study Manual for Exam P, First Edition By Dr. Krzysztof M. Ostaszewski, FSA, CFA, MAAA (krzysio@krzysio.net) Errata Effective July 5, 3, only the latest edition of this manual will have its errata
More informationMath Models of OR: Some Definitions
Math Models of OR: Some Definitions John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA September 2018 Mitchell Some Definitions 1 / 20 Active constraints Outline 1 Active constraints
More informationTransformation of Quasiconvex Functions to Eliminate. Local Minima. JOTA manuscript No. (will be inserted by the editor) Suliman Al-Homidan Nicolas
JOTA manuscript No. (will be inserted by the editor) Transformation of Quasiconvex Functions to Eliminate Local Minima Suliman Al-Homidan Nicolas Hadjisavvas Loai Shaalan Communicated by Constantin Zalinescu
More informationf x x, where f x (E) f, where ln
AB Review 08 Calculator Permitted (unless stated otherwise) 1. h0 ln e h 1 lim is h (A) f e, where f ln (B) f e, where f (C) f 1, where ln (D) f 1, where f ln e ln 0 (E) f, where ln f f 1 1, where t is
More informationSource Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime
Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University st669@drexel.edu Advisor:
More informationInternal link prediction: a new approach for predicting links in bipartite graphs
Internal link prediction: a new approach for predicting links in bipartite graphs Oussama llali, lémence Magnien and Matthieu Latapy LIP6 NRS and Université Pierre et Marie urie (UPM Paris 6) 4 place Jussieu
More informationExact and Approximate Equilibria for Optimal Group Network Formation
Noname manuscript No. will be inserted by the editor) Exact and Approximate Equilibria for Optimal Group Network Formation Elliot Anshelevich Bugra Caskurlu Received: December 2009 / Accepted: Abstract
More informationELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties
ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties Prof. James She james.she@ust.hk 1 Last lecture 2 Selected works from Tutorial
More informationLecture 4: UMVUE and unbiased estimators of 0
Lecture 4: UMVUE and unbiased estimators of Problems of approach 1. Approach 1 has the following two main shortcomings. Even if Theorem 7.3.9 or its extension is applicable, there is no guarantee that
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationChapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.
Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should
More informationCSE 291: Fourier analysis Chapter 2: Social choice theory
CSE 91: Fourier analysis Chapter : Social choice theory 1 Basic definitions We can view a boolean function f : { 1, 1} n { 1, 1} as a means to aggregate votes in a -outcome election. Common examples are:
More informationMcGILL UNIVERSITY FACULTY OF SCIENCE FINAL EXAMINATION MATHEMATICS A CALCULUS I EXAMINER: Professor K. K. Tam DATE: December 11, 1998 ASSOCIATE
NOTE TO PRINTER (These instructions are for the printer. They should not be duplicated.) This examination should be printed on 8 1 2 14 paper, and stapled with 3 side staples, so that it opens like a long
More informationI will post Homework 1 soon, probably over the weekend, due Friday, September 30.
Random Variables Friday, September 09, 2011 2:02 PM I will post Homework 1 soon, probably over the weekend, due Friday, September 30. No class or office hours next week. Next class is on Tuesday, September
More informationFrequency Estimators
Frequency Estimators Outline for Today Randomized Data Structures Our next approach to improving performance. Count-Min Sketches A simple and powerful data structure for estimating frequencies. Count Sketches
More informationCS 5522: Artificial Intelligence II
CS 5522: Artificial Intelligence II Bayes Nets: Independence Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]
More informationch 3 applications of differentiation notebook.notebook January 17, 2018 Extrema on an Interval
Extrema on an Interval Extrema, or extreme values, are the minimum and maximum of a function. They are also called absolute minimum and absolute maximum (or global max and global min). Extrema that occur
More informationOn the Scalability in Cooperative Control. Zhongkui Li. Peking University
On the Scalability in Cooperative Control Zhongkui Li Email: zhongkli@pku.edu.cn Peking University June 25, 2016 Zhongkui Li (PKU) Scalability June 25, 2016 1 / 28 Background Cooperative control is to
More information