Multi-Objective Weighted Sampling


Edith Cohen
Google Research, Mountain View, CA, USA
arXiv v6 [cs.DB] 13 Jun 2017

Abstract

Multi-objective samples are powerful and versatile summaries of large data sets. For a set of keys X and associated values f_x ≥ 0, a weighted sample taken with respect to f allows us to approximate segment-sum statistics sum(f; H) = Σ_{x∈H} f_x, for any subset H of the keys, with statistically guaranteed quality that depends on the sample size and the relative weight of H. When estimating sum(g; H) for g ≠ f, however, the quality guarantees are lost. A multi-objective sample with respect to a set of functions F provides for each f ∈ F the same statistical guarantees as a dedicated weighted sample while minimizing the summary size. We analyze properties of multi-objective samples and present sampling schemes and meta-algorithms for estimation and optimization while showcasing two important application domains. The first is key-value data sets, where different functions f ∈ F applied to the values correspond to different statistics such as moments, thresholds, capping, and sum; a multi-objective sample allows us to approximate all statistics in F. The second is metric spaces, where keys are points and each f ∈ F is defined by a set of points C, with f_x being the service cost of x by C, so that sum(f; X) models the centrality or clustering cost of C; a multi-objective sample allows us to estimate the cost of each f ∈ F. In these domains, multi-objective samples are often of small size, are efficient to construct, and enable scalable estimation and optimization. We aim here to facilitate further applications of this powerful technique.

1 INTRODUCTION

Random sampling is a powerful tool for working with very large data sets on which exact computation, even of simple statistics, can be time and resource consuming. A small sample of the data allows us to efficiently obtain approximate answers. Consider data in the form of key-value pairs {(x, f_x)}, where the keys x are from some universe X, f_x ≥ 0, and we define f_x ≡ 0 for keys x ∈ X that are not present in the data. Very common statistics over such data are segment-sum statistics sum(f; H) = Σ_{x∈H} f_x, where H ⊂ X is a segment of X. (The alternative term selection is used in the DB literature and the term domain is used in the statistics literature.) Examples of such data sets are IP flow keys and bytes, users and activity, or customers and distance to the nearest facility. Segments may correspond to a certain demographic, location, or other meta-data of the keys. The segment statistics in these examples correspond, respectively, to the total traffic, activity, or service cost of the segment.

When the data set is large, we can compute a weighted sample, which includes each key with probability (roughly) proportional to f_x and allows us to estimate segment-sum statistics ŝum(f; H) for query segments H. Popular weighted sampling schemes [30], [32] include Poisson Probability Proportional to Size (pps) [21], VarOpt [5], [12], and the bottom-k schemes [28], [14], [15]: Sequential Poisson (priority) [24], [17] and pps without replacement (ppswor) [27]. These weighted samples provide us with nonnegative unbiased estimates ŝum(g; H) for any segment and any g ≥ 0 (provided that f_x > 0 whenever g_x > 0). For all statistics sum(f; H) we obtain statistical guarantees on estimation quality: the error, measured by the coefficient of variation (CV), which is the standard deviation divided by the mean, is at most the inverse of the square root of the sample size multiplied by the fraction sum(f; H)/sum(f; X) of the weight that is due to the segment H.
This trade-off of quality (across segments) and sample size is (worst-case) optimal. Moreover, the estimates are well concentrated in the Chernoff-Bernstein sense: the probability of an error that is c times the CV decreases exponentially in c.

In many applications, such as the following examples, there are multiple sets of values f ∈ F that are associated with the keys: (i) Data records can come with explicit multiple weights, as with activity summaries of customers/jobs that specify both bandwidth and computation consumption. (ii) Metric objectives, such as our service-cost example, where each configuration of facility locations induces a different set of distances and hence service costs. (iii) The raw data can be specified as a set of key-value pairs {(x, w_x)}, but we are interested in different functions f_x = f(w_x) of the values that correspond to different statistics:

  Statistic              Function f(w)
  count                  f(w) = 1 for w > 0
  sum                    f(w) = w
  threshold with T > 0   thresh_T(w) = I(w ≥ T)
  moment with p > 0      f(w) = w^p
  capping with T > 0     cap_T(w) = min{T, w}

Example 1.1. Consider a toy data set D:
  (u1, 5), (u3, 100), (u10, 23), (u12, 7), (u17, 1), (u24, 5), (u31, 220), (u42, 19), (u43, 3), (u55, 2).
For a segment H with H ∩ D = {u3, u12, u42, u55}, we have sum(H) = 128, count(H) = 4, thresh_10(H) = 2, cap_5(H) = 17, and 2-moment(H) = 10414.
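The statistics of Example 1.1 are straightforward to verify. The following minimal Python sketch (an illustration, not code from the paper; the data set and segment are the ones above) computes each listed statistic as a segment sum of f(w_x):

```python
# Toy data set D and segment H from Example 1.1.
D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}
H = {"u3", "u12", "u42", "u55"}

def seg_sum(f, data, segment):
    """sum(f; H) = sum of f(w_x) over the keys x of the segment."""
    return sum(f(w) for x, w in data.items() if x in segment)

stats = {
    "count":     lambda w: 1 if w > 0 else 0,
    "sum":       lambda w: w,
    "thresh_10": lambda w: 1 if w >= 10 else 0,
    "cap_5":     lambda w: min(5, w),
    "2-moment":  lambda w: w ** 2,
}

for name, f in stats.items():
    print(name, seg_sum(f, D, H))
# Prints: count 4, sum 128, thresh_10 2, cap_5 17, 2-moment 10414.
```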

For these applications, we are interested in a summary that can provide us with estimates with statistically guaranteed quality for each f ∈ F. The naive solutions are not satisfactory: we can compute a weighted sample taken with respect to a particular f ∈ F, but the quality of the estimates ŝum(g; H) rapidly degrades with the dissimilarity between g and f; we can compute a dedicated sample for each f ∈ F, but the total summary size can be much larger than necessary. Multi-objective samples, a notion crystallized in [16] (the collocated model), provide us with the desired statistical guarantees on quality with minimal summary size. Multi-objective samples build on the classic notion of sample coordination [23], [4], [29], [7], [28], [25]. In a nutshell, coordinated samples are locality-sensitive hashes of f, mapping similar f to similar samples. A multi-objective sample is (roughly) the union S^(F) = ∪_{f∈F} S^(f) of all the keys that are included in the coordinated weighted samples S^(f) for f ∈ F. Because the samples are coordinated, the number of distinct keys included, and hence the size of S^(F), is (roughly) as small as possible. Since for each f ∈ F the sample S^(F) includes the dedicated sample S^(f), the estimate quality from S^(F) dominates that of S^(f).

In this paper, we review the definition of multi-objective samples, study their properties, and present efficient sampling schemes. We consider both general sets F of objectives and families F with special structure. By exploiting special structure, we can bound the overhead, which is the increase factor in sample size necessary to meet the multi-objective quality guarantees, and obtain efficient sampling schemes that avoid dependence of the computation on |F|.

1.1 Organization

In Section 2 we review (single-objective) weighted sampling, focusing on the Poisson pps and bottom-k sampling schemes. We then review the definitions [16] and establish properties of multi-objective samples. In Section 3 we study multi-objective pps samples. We show that the multi-objective sample size is also necessary for meeting the quality guarantees for segment statistics for all f ∈ F. We also show that the guarantees are met when we use upper bounds on the multi-objective pps sampling probabilities instead of working with the exact values. In Section 4 we study multi-objective bottom-k samples. In Section 5 we establish a fundamental property of multi-objective samples: we define the sampling closure F̄ of a set of objectives F as all functions f for which a multi-objective sample S^(F) meets the quality guarantees for segment statistics. Clearly F ⊆ F̄, but we show that the closure F̄ also includes every f that is a non-negative linear combination of functions from F. In Section 6 we consider data sets in the form of key-value pairs and the family M of all monotone non-decreasing functions of the values. This family includes most natural statistics, such as our examples of count, sum, threshold, moments, and capping. Since M is infinite, it is inefficient to apply a generic multi-objective sampling algorithm to compute S^(M). We present efficient near-linear sampling schemes for S^(M) which also apply over streamed or distributed data. Moreover, we establish a bound on the sample size of E[|S^(M)|] ≤ k ln n, where n is the number of keys in our data set and k is the reference size of the single-objective samples S^(f) for each f ∈ M. The design is based on a surprising relation to All-Distances Sketches [7], [8]. Furthermore, we establish that (when key weights are unique) a sample of size Ω(k ln n) is necessary: intuitively, the hardness stems from the need to support all threshold functions. In Section 7 we study the set C = {cap_T | T > 0} of all capping functions. The closure C̄ includes all concave f ∈ M with at most a linear growth (satisfying f'(x) ≤ 1 and f''(x) ≤ 0). Since C ⊂ M, the multi-objective sample S^(M) includes S^(C) and provides estimates with statistical guarantees for all f in the closure C̄.
The more specialized sample S^(C), however, can be much smaller than S^(M). We design an efficient algorithm for computing S^(C) samples. In Section 8 we discuss metric objectives and multi-objective samples as summaries of a set of points that allow us to approximate such objectives. In Section 9 we discuss different types of statistical guarantees across functions f in settings where we are only interested in statistics sum(f; X) over the full data. Our basic multi-objective sample analysis addresses the sample size required for ForEach, where the statistical guarantees apply to each estimate ŝum(f; H) in isolation, and in particular also to each estimate over the full data set. ForAll is much stronger and bounds the (distribution of the) maximum relative error of the estimates ŝum(f; X) over all f ∈ F. Meeting ForAll typically necessitates a larger multi-objective sample size than meeting ForEach. In Section 10 we present a meta-algorithm for optimization over samples. The goal is to maximize a (smooth) function of sum(f; X) over f ∈ F. When X is large, we can instead perform the optimization over a small multi-objective sample of X. This framework has important applications to metric objectives and to estimating the loss of a model from examples. The ForOpt guarantee is for a sample size that facilitates such optimization, that is, the approximate maximizer over the sample is an approximate maximizer over the data set. This guarantee is stronger than ForEach but generally weaker than ForAll. We make the key observation that with a ForEach sample we are only prone to testable one-sided errors on the optimization result. Based on that, we present an adaptive algorithm where the sample size is increased until ForOpt is met. This framework unifies and generalizes previous work on optimization over coordinated samples [13], [11]. We conclude in Section 11.

2 WEIGHTED SAMPLING (SINGLE OBJECTIVE)

We review weighted sampling schemes with respect to a set of values f, focusing on preparation for the multi-objective generalization. The schemes are specified in terms of a sample-size parameter k which allows us to trade off representation size and estimation quality.

2.1 Poisson Probability Proportional to Size (pps)

The pps sample S^(f,k) includes each key x independently with probability

  p_x^(f,k) = min{1, k f_x / Σ_y f_y}.   (1)

Example 2.1. The table below lists the pps sampling probabilities p_x^(f,3) (k = 3, rounded to the nearest hundredth) for the keys in our example data, for sum (f_x = w_x), thresh_10 (f_x = I(w_x ≥ 10)), and cap_5 (f_x = min{5, w_x}). The number in parentheses is sum(f; X) = Σ_x f_x. We can see that the sampling probabilities vary highly between the functions f.

  key            u1    u3    u10   u12   u17   u24   u31   u42   u43   u55
  w_x            5     100   23    7     1     5     220   19    3     2
  sum (385)      0.04  0.78  0.18  0.05  0.01  0.04  1.00  0.15  0.02  0.02
  thresh_10 (4)  0     0.75  0.75  0     0     0     0.75  0.75  0     0
  cap_5 (41)     0.37  0.37  0.37  0.37  0.07  0.37  0.37  0.37  0.22  0.15
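As a concrete illustration of Eq. (1), the following minimal Python sketch (an illustration under the stated formula and the toy data above, not the paper's code) recomputes the probabilities in the table:

```python
D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}

def pps_probabilities(values, k):
    """Return {key: min(1, k * f_x / sum(f))} for the given f-values, Eq. (1)."""
    total = sum(values.values())
    return {x: min(1.0, k * f / total) for x, f in values.items()}

# The three objectives of Example 2.1, applied to the raw weights w_x.
objectives = {
    "sum":       {x: w for x, w in D.items()},
    "thresh_10": {x: (1 if w >= 10 else 0) for x, w in D.items()},
    "cap_5":     {x: min(5, w) for x, w in D.items()},
}

for name, vals in objectives.items():
    p = pps_probabilities(vals, k=3)
    print(name, {x: round(q, 2) for x, q in p.items()})
```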

PPS samples can be computed by associating a random value u_x ∼ U[0, 1] with each key and including the key in the sample if u_x ≤ p_x^(f,k). This formulation is useful to us when there are multiple objectives, as it facilitates the coordination of samples taken with respect to the different objectives. Coordination is achieved by using the same set {u_x}.

2.2 Bottom-k (order) sampling

Bottom-k sampling unifies priority (sequential Poisson) [24], [17] and pps without replacement (ppswor) [27] sampling. To obtain a bottom-k sample for f, we associate a random value u_x ∼ U[0, 1] with each key. To obtain a ppswor sample we use r_x ≡ −ln(1 − u_x), and to obtain a priority sample we use r_x ≡ u_x. The bottom-k sample S^(f,k) for f contains the k keys with minimum f-seed, where f-seed(x) ≡ r_x/f_x. To support estimation, we also retain the threshold τ^(f,k), which is defined to be the (k+1)-st smallest f-seed.

2.3 Estimators

We estimate a statistic sum(g; H) from a weighted sample S^(f,k) using the inverse-probability estimator [22]:

  ŝum(g; H) = Σ_{x ∈ H ∩ S} g_x / p_x^(f,k).   (2)

The estimate is always nonnegative and is unbiased when the functions satisfy g_x > 0 ⟹ f_x > 0 (which ensures that any key with g_x > 0 is sampled with positive probability). To apply this estimator, we need to compute p_x^(f,k) for x ∈ S. To do so with pps samples (1), we include the sum Σ_x f_x with S as auxiliary information. For bottom-k samples, inclusion probabilities of keys are not readily available. We therefore use the inverse-probability estimator (2) with conditional probabilities p_x^(f,k) [17], [15]: a key x, fixing the randomization u_y for all other keys y, is sampled if and only if f-seed(x) < t, where t is the k-th smallest f-seed among the keys y ≠ x. For x ∈ S^(f,k), the k-th smallest f-seed among the other keys is t = τ^(f,k), and thus

  p_x^(f,k) = Pr_{u ∼ U[0,1]}[ r/f_x < τ^(f,k) ].   (3)

Note that the right-hand-side expression for the probability is equal to 1 − e^{−f_x t} with ppswor and to min{1, f_x t} with priority sampling.

2.4 Estimation quality

We consider the variance and concentration of our estimates. A natural measure of estimation quality of our unbiased estimates is the coefficient of variation (CV), which is the ratio of the standard deviation to the mean. We can upper bound the CV of our estimates (2) of sum(g; H) in terms of the (expected) sample size k and the relative g-weight of the segment H, defined as q^(g)(H) = sum(g; H)/sum(g; X). To be able to express a bound on the CV when we estimate a statistic sum(g; H) using a weighted sample taken with respect to f, we define the disparity between f and g as

  ρ(f, g) = max_x (f_x/g_x) · max_x (g_x/f_x).

The disparity always satisfies ρ(f, g) ≥ 1, and we have equality ρ(f, g) = 1 only when g is a scaling of f, that is, g = cf for some c > 0. We obtain the following upper bound:

Theorem 2.1. For pps samples and the estimator (2), for all g and H,

  CV[ŝum(g; H)] ≤ ρ(f, g) / √(q^(g)(H) k).

For bottom-k samples, we replace k by k − 1.

The proof for ρ = 1 is standard for pps, provided in [7], [8] for ppswor, and in [31] for priority samples. The proof for ρ ≥ 1 for ppswor is provided in Theorem A.1. The proof for pps is simpler, using a subset of the arguments. The proof for priority can be obtained by generalizing [31]. Moreover, the estimates obtained from these weighted samples are concentrated in the Chernoff-Hoeffding-Bernstein sense. We provide the proof for the multiplicative form of the bound and for Poisson pps samples:

Theorem 2.2. For δ ≤ 1,

  Pr[ |ŝum(g; H) − sum(g; H)| > δ sum(g; H) ] ≤ 2 exp(−q^(g)(H) k δ² / (3ρ²)).

For δ > 1,

  Pr[ ŝum(g; H) − sum(g; H) > δ sum(g; H) ] ≤ exp(−q^(g)(H) k δ / (3ρ²)).

Proof. Consider Poisson pps sampling and the inverse-probability estimator. The contribution of keys that are sampled with p_x^(f,k) = 1 is computed exactly.
Let the contribution of these keys be (1 − α) sum(g; H), for some α ∈ [0, 1]. If α = 0, the estimate is the exact sum and we are done. Otherwise, it suffices to estimate the remaining α sum(g; H) with relative error δ' = δ/α. Consider the remaining keys, which have inclusion probabilities p_x^(f,k) ≥ k f_x / sum(f). The contribution of such a key x to the estimate is 0 if x is not sampled and is g_x / p_x^(f,k) ≤ ρ(f, g) sum(f)/k when x is sampled. Note that by definition sum(g; H) = q^(g)(H) sum(g) ≥ q^(g)(H) sum(f)/ρ(f, g). We apply the concentration bounds to the sum of random variables in the range [0, ρ(f, g) sum(f)/k]. To use the standard form, we normalize our random variables to have range [0, 1] and accordingly normalize the expectation α sum(g; H) to obtain

  µ = α sum(g; H) / (ρ(f, g) sum(f)/k) ≥ α q^(g)(H) k / ρ(f, g)².

We can now apply multiplicative Chernoff bounds for random variables in the range [0, 1] with δ' = δ/α and expectation µ. The formula bounds the probability of a relative error that exceeds δ' by 2 exp(−δ'² µ/3) when δ' ≤ 1 and by exp(−δ' µ/3) when δ' > 1.
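To make Sections 2.2 and 2.3 concrete, here is a minimal Python sketch (assumptions: ppswor seeds, a hash-derived u_x, small in-memory data; an illustration, not the paper's implementation) of a bottom-k sample and the conditional inverse-probability estimate (2)-(3):

```python
import hashlib
import math

def u_value(key):
    """Pseudo-random u_x in (0, 1), derived from a hash of the key so that it is
    consistent across data sets (this is what enables coordination later on)."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return (h % (2**53)) / 2**53 or 2**-53

def bottom_k_sample(data, f, k):
    """Return (sample, tau): the k keys with smallest ppswor f-seed r_x / f_x,
    where r_x = -ln(1 - u_x), and tau, the (k+1)-st smallest f-seed."""
    seeds = []
    for x, w in data.items():
        fx = f(w)
        if fx > 0:
            r = -math.log(1.0 - u_value(x))
            seeds.append((r / fx, x, fx))
    seeds.sort()
    sample = {x: fx for _, x, fx in seeds[:k]}
    tau = seeds[k][0] if len(seeds) > k else math.inf
    return sample, tau

def estimate_sum(sample, tau, g, data, segment):
    """Inverse-probability estimate of sum(g; H) with conditional inclusion
    probabilities p_x = 1 - exp(-f_x * tau) (the ppswor case of Eq. (3))."""
    est = 0.0
    for x, fx in sample.items():
        if x in segment:
            p = 1.0 if tau == math.inf else 1.0 - math.exp(-fx * tau)
            est += g(data[x]) / p
    return est

D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}
S, tau = bottom_k_sample(D, f=lambda w: w, k=3)
print(estimate_sum(S, tau, g=lambda w: w, data=D,
                   segment={"u3", "u12", "u42", "u55"}))
```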

2.5 Computing the sample

Consider data presented as streamed or distributed elements of the form of key-value pairs (x, f_x), where x ∈ X and f_x > 0. We define f_x ≡ 0 for keys that are not in the data. An important property of our samples (bottom-k or pps) is that they are composable (mergeable), meaning that a sample of the union of two data sets can be computed from the samples of the data sets. Composability facilitates efficient streamed or distributed computation. The sampling algorithms can use a random hash function applied to the key x to generate u_x, so seed values can be computed on the fly from (x, f_x) and do not need to be stored.

With bottom-k sampling we permit keys to occur in multiple elements, in which case we define f_x to be the maximum value over the elements with key x. The sample S(D) of a set D of elements contains the pairs (x, f_x) for the k+1 (unique) keys with smallest f-seeds. (When keys are unique to elements it suffices to keep only the (k+1)-st smallest f-seed without the pair (x, f_x).) The sample of the union ∪_i D_i is obtained from ∪_i S(D_i) by first replacing multiple occurrences of a key with the one with the largest f(w) and then returning the pairs for the k+1 keys with smallest f-seeds.

With pps sampling, the information we store with our sample S(D) includes the sum sum(f; D) ≡ Σ_{x∈D} f_x and the sampled pairs (x, f_x), which are those with u_x ≤ k f_x / sum(f; D). Because we need to accurately track the sum, we require that elements have unique keys. The sample of a union D = ∪_i D_i is obtained using the sum sum(f; D) = Σ_i sum(f; D_i) and retaining only the keys in ∪_i S(D_i) that satisfy u_x ≤ k f_x / sum(f; D).

3 MULTI-OBJECTIVE PPS SAMPLES

Our objectives are specified as pairs (f, k_f), where f ∈ F is a function and k_f specifies a desired estimation quality for sum(f; H) statistics, stated in terms of the quality (Theorem 2.1 and Theorem 2.2) provided by a single-objective sample for f with size parameter k_f. To simplify notation, we sometimes omit k_f when it is clear from context. A multi-objective sample S^(F) [16] is defined by considering dedicated samples S^(f,k_f) for each objective that are coordinated. The dedicated samples are coordinated by using the same randomization, which is the association of u_x ∼ U[0, 1] with the keys. The multi-objective sample S^(F) = ∪_{f∈F} S^(f,k_f) contains all keys that are included in at least one of the coordinated dedicated samples. In the remaining part of this section we study pps samples; multi-objective bottom-k samples are studied in the next section.

Lemma 3.1. A multi-objective pps sample for F includes each key x independently with probability

  p_x^(F) = min{1, max_{f∈F} k_f f_x / sum(f)}.   (4)

Proof. Consider coordinated dedicated pps samples for f ∈ F obtained using the same set {u_x}. The key x is included in at least one of the samples if and only if the value u_x is at most the maximum over the objectives (f, k_f) of the pps inclusion probability for that objective:

  u_x ≤ max_{f∈F} p_x^(f,k_f) = max_{f∈F} min{1, k_f f_x / sum(f)} = min{1, max_{f∈F} k_f f_x / sum(f)}.

Since the u_x are independent, so are the inclusions of different keys.
Poisson sample using π, and appl the respective inverse-probabilit estimator ŝum(g; H) = S H g π (6) Side note: It is sometime useful to use other sampling schemes, in particular, VarOpt (dependent) sampling [5], [20], [12] to obtain a fied sample size The estimation qualit bounds on the CV and concentration also hold with VarOpt (which has negative covariances) 31 Multi-objective pps estimation qualit We show that the estimation qualit, in terms of the bounds on the CV and concentration, of the estimator (5) is at least as good as that of the estimate we obtain from the dedicated samples To do so we prove a more general claim that holds for an Poisson sampling scheme that includes each ke in the sample S with probabilit π p (f,k f ) and the respective inverse probabilit estimator (6) The following lemma shows that estimate qualit can onl improve when inclusion probabilities increase: Lemma 33 The variance var[ŝum(g; H)] of (6) and hence CV [ŝum(g; H)] are non-increasing in π Proof For each ke consider the inverse probabilit estimator ĝ = g /π when is sampled and ĝ = 0 otherwise Note that var[ĝ ] = g 2 (1/π 1), which is decreasing with π We have ŝum(g; H)] = H ĝ When covariances between ĝ are nonpositive, which is the case in particular with independet inclusions, we have var[ŝum(g; H)] = H var[ĝ ] and the claim follows as it applies to each summand We net consider concentration of the estimates Lemma 34 The concentration claim of Theorem 22 carries over when we use the inverse probabilit estimator with an sampling probabilities that satisf π p (f,k) for all Proof The generalization of the proof is immediate, as the range of the random variables ˆf can onl decrease when we increase the inclusion probabilit

3.2 Uniform guarantees across objectives

An important special case is when k_f = k for all f ∈ F, that is, we seek uniform statistical guarantees for all our objectives. We use the notation S^(F,k) for the respective multi-objective sample. We can write the multi-objective pps probabilities (4) as

  p_x^(F,k) = min{1, k max_{f∈F} f_x/sum(f)} = min{1, k p_x^(F,1)}.   (7)

The last equality follows when recalling the definition of the base pps probabilities p_x^(f,1) = f_x/Σ_y f_y and hence p_x^(F,1) ≡ max_{f∈F} p_x^(f,1) = max_{f∈F} f_x/sum(f). We refer to p_x^(f,1) as the base pps probabilities for f. Note that the base pps probabilities are a rescaling of f, and pps probabilities are invariant to this scaling. We refer to p_x^(F,1) as the multi-objective base pps probabilities for F. Finally, for a reason that will soon be clear, we refer to the sum h^(pps)(F) ≡ Σ_x p_x^(F,1) as the multi-objective pps overhead of F. It is easy to see that h(F) ∈ [1, |F|] and that h(F) is closer to 1 when the objectives in F are more similar. We can write k p_x^(F,1) = (k h(F)) p_x^(F,1) / Σ_y p_y^(F,1). That is, the multi-objective pps probabilities (7) are equivalent to single-objective pps probabilities with size parameter k h(F), computed with respect to the base probability weights g_x ≡ p_x^(F,1).

Side note: We can apply any single-objective weighted sampling scheme, such as VarOpt or bottom-k, to the weights p_x^(F,1), or to upper bounds π_x ≥ p_x^(F,1) on these weights while adjusting the sample-size parameter to k Σ_x π_x.

3.3 Lower bound on multi-objective sample size

The following theorem shows that any multi-objective sample for F that meets the quality guarantees on all domain queries for (f, k_f) must include each key x with probability at least p_x^(f,1) k_f / (p_x^(f,1) k_f + 1) ≥ (1/2) p_x^(f,k_f). Moreover, when p_x^(f,1) k_f ≪ 1, the lower bound on the inclusion probability is close to p_x^(f,k_f). This implies that the multi-objective pps sample size is necessary to meet the quality guarantees we seek (Theorem 2.1).

Theorem 3.1. Consider a sampling scheme that for weights f supports estimators that satisfy, for each segment H, CV[ŝum(f; H)] ≤ 1/√(q^(f)(H) k). Then the inclusion probability of a key x must be at least

  p_x ≥ p_x^(f,1) k / (p_x^(f,1) k + 1) ≥ (1/2) min{1, p_x^(f,1) k}.

Proof. Consider a segment consisting of a single key, H = {x}. Then q ≡ q^(f)(H) = f_x/Σ_y f_y = p_x^(f,1). The best nonnegative unbiased sum estimator is the HT estimator: when the key is not sampled, there is no evidence of the segment and the estimate must be 0; when it is, a uniform estimate minimizes the variance. If the key is included with probability p, the CV of the estimate is CV[ŝum(f; {x})] = (1/p − 1)^{0.5}. From the requirement (1/p − 1)^{0.5} ≤ 1/(qk)^{0.5}, we obtain that p ≥ qk/(qk + 1).

4 MULTI-OBJECTIVE BOTTOM-k SAMPLES

The sample S^(F) is defined with respect to random {u_x}. Each dedicated sample S^(f,k_f) includes the k_f lowest f-seeds, computed using {u_x}. S^(F) accordingly includes all keys that have one of the k_f lowest f-seeds for at least one f ∈ F. To estimate statistics sum(g; H) from a bottom-k S^(F), we again apply the inverse-probability estimator (5), but here we use the conditional inclusion probability p_x^(F) for each key x ∈ S^(F) [16]. This is the probability (over u_x ∼ U[0, 1]) that x ∈ S^(F) when fixing u_y for all y ≠ x to be as in the current sample. Note that p_x^(F) = max_{f∈F} p_x^(f), where the p_x^(f) are as defined in (3). In order to compute the probabilities p_x^(F) for x ∈ S^(F), it always suffices to maintain the slightly larger sample ∪_{f∈F} S^(f,k_f+1). For completeness, we show that it suffices to instead maintain, together with S^(F) ≡ ∪_{f∈F} S^(f,k_f), a smaller (possibly empty) set Z ⊂ ∪_{f∈F} S^(f,k_f+1) \ S^(F) of auxiliary keys. We now define the set Z and show how the inclusion probabilities can be computed from S^(F) ∪ Z.
For a key x ∈ S^(F), we denote by g^(x) = arg max_{f∈F : x∈S^(f)} p_x^(f) the objective with the most forgiving threshold for x. If p_x^(g^(x)) < 1, let y_x be the key with the (k+1)-st smallest g^(x)-seed (otherwise y_x is not defined). The auxiliary keys are then Z = {y_x | x ∈ S^(F)} \ S^(F).

We use the sample and auxiliary keys S^(F) ∪ Z as follows to compute the inclusion probabilities. We first compute, for each f ∈ F, the value τ_f, which is the (k_f+1)-st smallest f-seed of the keys in S^(F) ∪ Z. For each x ∈ S^(F), we then use p_x^(F) = min{1, max_{f∈F} f(w_x) τ_f} (for priority) or p_x^(F) = 1 − exp(−max_{f∈F} f(w_x) τ_f) (for ppswor). To see that the p_x^(F) are correctly computed, note that while we can have τ_f > τ^(f,k_f) for some f ∈ F (Z may not include the threshold keys of all the dedicated samples S^(f,k_f)), our definition of Z ensures that τ_f = τ^(f,k_f) for every f such that there is at least one x with f = g^(x) and p_x^(g^(x)) < 1.

Composability: Note that multi-objective samples S^(F) are composable, since they are a union of (composable) single-objective samples S^(f). It is not hard to see that composability applies with the auxiliary keys: the set of auxiliary keys in the composed sample must be a subset of the sampled and auxiliary keys of the components. Therefore, the sample itself includes all the necessary state for streaming or distributed computation.

Estimate quality: We can verify that for any f ∈ F and x, and for any random assignment {u_y} for y ≠ x, we have p_x^(F) ≥ p_x^(f). Therefore (applying Lemma 3.3 and noting zero covariances [15]), the variance and the CV are at most those of the estimator (2) applied to the bottom-k_f sample S^(f).
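A minimal Python sketch of the construction itself (assumptions: ppswor seeds, a hash-derived u_x, in-memory data; the conditional probabilities and the auxiliary set Z are omitted) shows how coordination makes S^(F) a simple union of the dedicated samples:

```python
import hashlib
import math

def u_value(key):
    """Hash-derived u_x in (0, 1); the same u_x drives every objective,
    which is exactly what coordinates the dedicated samples."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return (h % (2**53)) / 2**53 or 2**-53

def f_seed(key, fx):
    return -math.log(1.0 - u_value(key)) / fx     # ppswor f-seed r_x / f_x

def multi_objective_bottom_k(data, objectives):
    """objectives: {name: (f, k_f)}.  Returns the union of the coordinated
    dedicated bottom-k_f samples, i.e. the key set of S^(F)."""
    union = set()
    for name, (f, k) in objectives.items():
        seeds = sorted((f_seed(x, f(w)), x)
                       for x, w in data.items() if f(w) > 0)
        union |= {x for _, x in seeds[:k]}
    return union

D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}
objectives = {"sum": (lambda w: w, 3),
              "thresh_10": (lambda w: 1 if w >= 10 else 0, 3),
              "cap_5": (lambda w: min(5, w), 3)}
print(sorted(multi_objective_bottom_k(D, objectives)))
```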

To summarize, we obtain the following statistical guarantees on estimate quality with multi-objective samples:

Theorem 4.1. For each H and g, the inverse-probability estimator applied to a multi-objective pps sample S^(F) has

  CV[ŝum(g; H)] ≤ min_{f∈F} ρ(f, g) / √(q^(g)(H) k_f).

The estimator applied to a multi-objective bottom-k sample has the same guarantee, but with (k_f − 1) replacing k_f.

Sample-size overhead: We always have E[|S^(F)|] ≤ Σ_{f∈F} k_f. The worst case, where the size of S^(F) is the sum of the sizes of the dedicated samples, materializes when the functions f ∈ F have disjoint supports. The sample size, however, can be much smaller when the functions are more related. With uniform guarantees (k_f ≡ k), we define the multi-objective bottom-k overhead to be h^(botk)(F) ≡ E[|S^(F)|]/k. This is the sample-size overhead of a multi-objective versus a dedicated bottom-k sample.

pps versus bottom-k multi-objective sample size: For some sets F, with the same parameter k, we can have a much larger multi-objective overhead with bottom-k than with pps. A multi-objective pps sample is the smallest sample that can include a pps sample for each f; a multi-objective bottom-k sample must include a bottom-k_f sample for each f. Consider a set of n ≫ k keys. For each subset of n/2 keys we define a function f that is uniform on the subset and 0 elsewhere. It is easy to see that in this case h^(pps)(F) = 2, whereas h^(botk)(F) ≥ (n/2 + k)/k.

Computation: When the data has the form of elements (x, w_x) with unique keys and f_x = f(w_x) for a set of functions F, then, short of further structural assumptions on F, the sampling algorithm that computes S^(F) must apply all functions f ∈ F to all elements. The computation is thus Ω(|F| n), and can be done in O(|F| n + |S^(F)| log k) time by identifying, for each f ∈ F, the k keys with smallest f-seed(x). In the sequel we will see examples of large or infinite sets F with special structure that allows us to compute a multi-objective sample efficiently.

5 THE SAMPLING CLOSURE

We define the sampling closure F̄ of a set of functions F to be the set of all functions f such that for all k and for all H, the estimate of sum(f; H) from S^(F,k) has the CV bound of Theorem 2.1. Note that this definition is with respect to uniform guarantees (the same size parameter k for all objectives). We show that the closure of F contains all non-negative linear combinations of functions from F.

Theorem 5.1. Any f = Σ_{g∈F} α_g g, where α_g ≥ 0, is in F̄.

Proof. We first consider pps samples, where we establish the stronger claim S^(F∪{f},k) = S^(F,k), or equivalently, for all keys x,

  p_x^(f,k) ≤ p_x^(F,k).   (8)

For a function g, we use the notation g(X) ≡ Σ_x g_x, and recall that p_x^(g,k) = min{1, k g_x/g(X)}. We first consider f = cg for some g ∈ F. In this case, p_x^(f,k) = p_x^(g,k) ≤ p_x^(F,k) and (8) follows. To complete the proof, it suffices to establish (8) for f = g^(1) + g^(2) such that g^(1), g^(2) ∈ F. Let c be such that g^(2)_x / g^(2)(X) = c g^(1)_x / g^(1)(X); we can assume WLOG that c ≤ 1 (otherwise reverse the roles of g^(1) and g^(2)). For convenience, denote α = g^(2)(X)/g^(1)(X). Then we can write

  f_x / f(X) = (g^(1)_x + g^(2)_x) / (g^(1)(X) + g^(2)(X)) = (1 + cα) g^(1)_x / ((1 + α) g^(1)(X)) ≤ g^(1)_x / g^(1)(X) = max{ g^(1)_x/g^(1)(X), g^(2)_x/g^(2)(X) }.

Therefore p_x^(f,k) ≤ max{ p_x^(g^(1),k), p_x^(g^(2),k) } ≤ p_x^(F,k).

The proof for multi-objective bottom-k samples is more involved and is deferred to the full version. Note that the multi-objective bottom-k sample S^(F,k) may not include a bottom-k sample S^(f,k), but it is still possible to bound the CV.

6 THE UNIVERSAL SAMPLE FOR MONOTONE STATISTICS
In this section we consider the (infinite) set M of all monotone non-decreasing functions and the objectives (f, k) for all f ∈ M. We show that the multi-objective pps and bottom-k samples S^(M,k), which we refer to as the universal monotone sample, are larger than a single dedicated weighted sample by at most a logarithmic factor in the number of keys. We will also show that this is tight, and we will present an efficient universal monotone bottom-k sampling scheme for streamed or distributed data.

We take the following steps. We consider the multi-objective sample S^(thresh,k) for the set thresh of all threshold functions (recall that thresh_T(x) = 1 if x ≥ T and thresh_T(x) = 0 otherwise). We express the inclusion probabilities p_x^(thresh,k) and bound the sample size. Since all threshold functions are monotone, thresh ⊂ M. We will establish that S^(thresh,k) = S^(M,k). (We also observe that M̄ = thresh̄: this follows from Theorem 5.1, after noticing that any f ∈ M can be expressed as a non-negative combination of threshold functions, f(x) = ∫_0^∞ α(T) thresh_T(x) dT, for some function α(T) ≥ 0. Here we establish the stronger relation S^(thresh,k) = S^(M,k).) We start with the simpler case of pps and then move on to bottom-k samples.

6.1 Universal monotone pps

Theorem 6.1. Consider a data set D = {(x, w_x)} of n keys and the sorted order of the keys by non-increasing w_x. Then a key x that is in position i in the sorted order has base multi-objective pps probability p_x^(thresh,1) = p_x^(M,1) ≤ 1/i. When all keys have unique weights, equality holds.

Proof. Consider the function thresh_{w_x}. This function has weight 1 on all of the (at least i) keys y with weight w_y ≥ w_x. Therefore, the base pps probability is p_x^(thresh_{w_x},1) ≤ 1/i. When the keys have unique weights, there are exactly i keys y with weight w_y ≥ w_x and we have p_x^(thresh_{w_x},1) = 1/i. If we consider all threshold functions, then p_x^(thresh_T,1) = 0 when T > w_x and p_x^(thresh_T,1) ≤ p_x^(thresh_{w_x},1) ≤ 1/i when T ≤ w_x. Therefore, p_x^(thresh,1) = max_T p_x^(thresh_T,1) = p_x^(thresh_{w_x},1). We now consider an arbitrary monotone function f_x = f(w_x). From monotonicity, there are at least i keys y with f_y ≥ f_x; therefore f_x/Σ_y f_y ≤ 1/i. Thus p_x^(f,1) = f_x/Σ_y f_y ≤ 1/i and p_x^(M,1) = max_{f∈M} p_x^(f,1) ≤ 1/i.
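Theorem 6.1 directly yields the probabilities of the simple main-memory scheme described next: p_x^(M,k) = min{1, k/c_x}, where c_x = |{y : w_y ≥ w_x}|. A minimal Python sketch (an illustration assuming the data fits in memory, not the paper's code):

```python
import bisect

def universal_monotone_pps(data, k):
    """p_x^(M,k) = min{1, k / c_x}, where c_x counts the keys y with w_y >= w_x.
    For unique weights c_x is simply the rank of x by non-increasing weight."""
    weights = sorted(data.values())                  # ascending
    n = len(weights)
    probs = {}
    for x, w in data.items():
        c = n - bisect.bisect_left(weights, w)       # number of keys with weight >= w
        probs[x] = min(1.0, k / c)
    return probs

D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}
print({x: round(p, 2) for x, p in universal_monotone_pps(D, k=3).items()})
```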

There is a simple main-memory sampling scheme where we sort the keys, compute the probabilities p_x^(M,1) and then p_x^(M,k) = min{1, k p_x^(M,1)}, and compute a sample accordingly. We next present universal monotone bottom-k samples and sampling schemes that are efficient on streamed or distributed data.

6.2 Universal monotone bottom-k

Theorem 6.2. Consider a data set D = {(x, w_x)} of n keys. The universal monotone bottom-k sample has expected size E[|S^(M,k)|] ≤ k ln n and can be computed using O(n + k log n log k) operations.

For a particular T, the bottom-k sample S^(thresh_T,k) is the set of the k keys with smallest u_x among the keys with w_x ≥ T. The set of keys in the multi-objective sample is S^(thresh,k) = ∪_{T>0} S^(thresh_T,k). We show that a key is in the multi-objective sample for thresh if and only if it is in the bottom-k sample for thresh_{w_x}:

Lemma 6.1. Fixing {u_x}, for any key x, x ∈ S^(thresh,k) ⟺ x ∈ S^(thresh_{w_x},k).

Proof. Consider the position t(x, T) of x in the ordering of the keys induced by thresh_T-seed(). We claim that if for a key y we have thresh_T-seed(y) < thresh_T-seed(x) for some T > 0, then this must hold for T = w_x. The claim can be established by separately considering w_y ≥ w_x and w_y < w_x. The claim implies that t(x, T) is minimized for T = w_x.

We now consider the auxiliary keys Z associated with this sample. Recall that these keys are not technically part of the sample, but the information (u_x, w_x) for x ∈ Z is needed in order to compute the conditional inclusion probabilities p_x^(thresh,k) for x ∈ S. Note that it follows from Lemma 6.1 that for all keys x, p_x^(thresh,k) = p_x^(thresh_{w_x},k). For a key x, let Y_x = {y ≠ x | w_y ≥ w_x} be the set of keys other than x that have weight at least that of x. Let y_x be the key with the k-th smallest u value in Y_x, when |Y_x| ≥ k. The auxiliary keys are Z = {y_x | x ∈ S} \ S. A key x is included in the sample with probability 1 if y_x is not defined (which means x has one of the k largest weights); otherwise, it is (conditionally) included if and only if u_x < u_{y_x}.

To compute the inclusion probability p_x^(thresh,k) from S ∪ Z, we do as follows. If there are k or fewer keys in S ∪ Z with weight at least w_x, then p_x^(thresh,k) = 1. (For correctness, note that in this case all keys with weight ≥ w_x would be in S.) Otherwise, observe that y_x is the key with the (k+1)-st smallest u value in S ∪ Z among all keys with weight ≥ w_x. We compute y_x from the sample and use p_x^(thresh,k) = u_{y_x}. Note that when the weights are unique, Z = ∅.

The definition of S^(thresh,k) is equivalent to that of an All-Distances Sketch (ADS) computed with respect to the weights w_x (as inverse distances) [7], [8], and we can apply some of its algorithms and analysis. In particular, we obtain that E[|S^(thresh,k)|] ≤ k ln n and that the size is well concentrated around this expectation. The argument is simple: consider the keys ordered by decreasing weight. The probability that the i-th key has one of the k smallest u values among the first i keys, and thus is a member of S^(thresh,k), is at most min{1, k/i}. Summing the expectations over all keys we obtain Σ_{i=1}^n min{1, k/i} < k ln n. We shall see that this bound is asymptotically tight when weights are unique. With repeated weights, however, the sample size can be much smaller. (The sample we would obtain with repeated weights is always a subset of the sample we would have obtained with tie breaking; in particular, the sample size can be at most k times the number of unique weights.)

Lemma 6.2. For any data set {(x, w_x)}, when using the same randomization {u_x} to generate both samples, S^(M,k) = S^(thresh,k).
Proof. Consider f ∈ M and the samples obtained for some fixed randomization u_x of all the keys. Suppose that a key x is in the bottom-k sample S^(f,k). By definition, f-seed(x) = r_x/f(w_x) is among the k smallest f-seeds of all keys. Therefore, it must be among the k smallest f-seeds within the set Y_x of keys y with w_y ≥ w_x. From the monotonicity of f, this implies that r_x must be one of the k smallest values in {r_y | y ∈ Y_x}, which is the same as u_x being one of the k smallest values in {u_y | y ∈ Y_x}. This implies that x ∈ S^(thresh_{w_x},k).

6.3 Estimation quality

The estimator (5) with the conditional inclusion probabilities p_x^(M,k) generalizes the HIP estimator of [8] to sketches computed for non-unique weights. Theorem 4.1 implies that for any f ∈ M and H, CV[ŝum(f; H)] ≤ 1/√(q^(f)(H)(k−1)). When weights are unique and we estimate statistics over all keys, we have the tighter bound CV[ŝum(f; X)] ≤ 1/√(2k−1) [8].

6.4 Sampling algorithms

The samples, including the auxiliary information, are composable. Composability holds even when we allow multiple elements to have the same key and interpret w_x as the maximum weight over the elements with key x. To do so, we use a random hash function to generate u_x consistently for multiple elements with the same key. To compose multiple samples, we take the union of the elements, replace multiple elements with the same key by the one with maximum weight, and apply a sampling algorithm to the set of remaining elements. The updated inclusion probabilities can be computed from the composed sample.

We present two algorithms that compute the sample S^(M,k) along with the auxiliary keys Z and the inclusion probabilities p_x^(M,k) for x ∈ S^(M,k). The algorithms process the elements either in order of decreasing w_x or in order of increasing u_x. These two orderings may be suitable for different applications, and it is worthwhile to present both: in the related context of all-distances sketches, both orderings on distances were used [7], [26], [3], [8]. The algorithms are correct when applied to any set of n elements that includes S ∪ Z. Recall that the inclusion probability p_x^(M,k) is u_{y_x}, where y_x has the k-th smallest u value among the keys y ≠ x with w_y ≥ w_x. Therefore, all keys with the same weight have the same inclusion probability.

For convenience, we thus express the probabilities as a function p(w) of the weights.

Algorithm 1: Universal monotone sampling, scan by weight
  Initialize an empty max-heap H of size k        // the k smallest u values processed so far
  ptau ← +∞; prevw ← ⊥                            // ** omit with unique weights
  foreach (x, w_x), by decreasing w_x and then increasing u_x, do
    if |H| < k then
      S ← S ∪ {x}; insert u_x into H; p(w_x) ← 1; continue
    if u_x < max(H) then                          // x is sampled
      S ← S ∪ {x}; p(w_x) ← max(H)
      ptau ← max(H); prevw ← w_x                  // **
      delete max(H) from H; insert u_x into H
    else                                          // **
      if u_x < ptau and w_x = prevw then Z ← Z ∪ {x}; p(w_x) ← u_x

Algorithm 1 processes the keys in order of decreasing weight, breaking ties by increasing u_x. We maintain a max-heap H of the k smallest u values processed so far. When processing a current key x, we include x in S if u_x < max(H). If we include x, we delete max(H) and insert u_x into H. Correctness follows from H containing the k smallest u values of the keys with weight at least w_x. When weights are unique, the probability p(w_x) is max(H) just before u_x is inserted, that is, the k-th smallest u value among the previously processed keys. When weights are not unique, we also need to compute Z. To do so, we track the previous max(H), which we call ptau. If the current key x has u_x ∈ (max(H), ptau), we include x in Z; it is easy to verify that in this case p(w_x) = u_x. Note that the algorithm may overwrite p(w) multiple times, as keys with weight w are inserted into the sample or into Z.

Algorithm 2 processes the keys in order of increasing u_x. The algorithm maintains a min-heap H of the k largest weights processed so far. With unique weights, the current key x is included in the sample if and only if w_x > min(H). If x is included, we delete from the heap H the key y with weight min(H) and insert x. When weights are not unique, we also track the weight of the key previously removed from H; when processing a key x, if w_x equals the minimum weight in H and w_x > prevw, then x is inserted into Z. The computed inclusion probability p(w), with unique weights, is u_z, where z is the key whose processing triggered the deletion of the key y with weight w from H. To establish correctness, consider the heap H just after y is deleted. By construction, H contains the k keys with weight larger than w_y that have the smallest u values. Therefore, p(w_y) = max_{z∈H} u_z. Since we process the keys by increasing u, this maximum u value in H is the value of the most recently inserted key, that is, the key whose insertion triggered the removal of y. Finally, the keys that remain in H at the end are those with the k largest weights; the algorithm correctly assigns p(w_x) = 1 for these keys. When multiple keys can have the same weight, p(w) is the minimum of max_{z∈H} u_z after the first key of weight w is evicted and the minimum u_z of a key z with w_z = w that was not sampled. If the minimum is realized at such a key z, that key is included in Z, and the algorithm sets p(w) accordingly when z is processed. If p(w) has not already been set when the first key of weight w is deleted from H, the algorithm correctly assigns p(w) to be max_{z∈H} u_z. After all keys are processed, p(w_x) is set for the remaining keys x ∈ H for which a key with the same weight was previously deleted from H; the other keys are assigned p(w) = 1.

We now analyze the number of operations. With both algorithms, the cost of processing a key is O(1) if the key is not inserted and O(log k) if the key is included in the sample. Using the bound on the sample size, we obtain a bound of O(n + k ln n log k) on the processing cost. The sorting requires O(n log n) computation, which dominates the computation (since typically k ≪ n). When the u_x are assigned randomly, however, we can generate them already sorted by u_x in O(n) time, enabling a faster O(n + k log k log n) computation.
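The following Python sketch is a simplified port of Algorithm 1 (assumptions: unique weights, so the auxiliary set Z is empty, and the data fits in memory); it is meant only to illustrate the scan order and the heap maintenance, not to replace the full algorithm:

```python
import heapq

def universal_monotone_by_weight(data, u, k):
    """data: {key: weight}; u: {key: u_x in (0,1)} (e.g. hash-derived).
    Returns (sample, p) where p[x] is the conditional inclusion probability."""
    heap = []                       # max-heap (negated values) of the k smallest u's seen
    sample, p = [], {}
    for x, w in sorted(data.items(), key=lambda kv: (-kv[1], u[kv[0]])):
        if len(heap) < k:
            sample.append(x); p[x] = 1.0
            heapq.heappush(heap, -u[x])
        elif u[x] < -heap[0]:
            sample.append(x); p[x] = -heap[0]   # k-th smallest u among heavier keys
            heapq.heapreplace(heap, -u[x])      # evict max(H), insert u_x
    return sample, p

# Example use with the toy data set and arbitrary fixed u values.
D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}
u = {x: (i + 1) / 11 for i, x in enumerate(sorted(D))}   # placeholder u_x values
print(universal_monotone_by_weight(D, u, k=3))
```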
Algorithm 2: Universal monotone sampling, scan by u
  H ← ∅          // min-heap of size k, prioritized by lexicographic order on (w_x, u_x), containing the keys with the largest priorities processed so far
  prevw ← ⊥
  foreach x, by increasing u_x, do
    if |H| < k then
      S ← S ∪ {x}; insert x into H; continue
    y ← arg min_{z∈H} (w_z, u_z)                 // the min-weight key in H with largest u_z
    if w_x > w_y then
      S ← S ∪ {x}                                // add x to the sample
      prevw ← w_y
      if p(prevw) = ⊥ then p(prevw) ← u_x
      delete y from H; insert x into H
    else                                         // **
      if w_x = w_y and w_x > prevw then Z ← Z ∪ {x}; p(w_x) ← u_x
  foreach x ∈ H do                               // the keys with the largest weights
    if p(w_x) = ⊥ then p(w_x) ← 1

6.5 Lower bound on sample size

We now show that the worst-case factor of ln n on the size of the universal monotone sample is, in a sense, necessary. It suffices to show this for threshold functions:

Theorem 6.3. Consider data sets where all keys have unique weights. Any sampling scheme with a nonnegative unbiased estimator that for all T > 0 and H has CV[ŝum(thresh_T; H)] ≤ 1/√(q^(thresh_T)(H) k) must have samples of expected size Ω(k ln n).

Proof. We use Theorem 3.1, which relates estimation quality to sampling probabilities. Consider the key x with the i-th heaviest weight. Applying Theorem 3.1 to x and thresh_{w_x}, we obtain that p_x ≥ k/(k+i). Summing the sampling probabilities over all keys i ∈ [n] to bound the expected sample size, we obtain Σ_x p_x ≥ Σ_{i=1}^n k/(k+i) = k(H_{n+k} − H_k) ≥ k(ln n − ln k).

7 THE UNIVERSAL CAPPING SAMPLE

An important strict subset of the monotone functions is the set C = {cap_T | T > 0} of capping functions. We study the multi-objective bottom-k sample S^(C,k), which we refer to as the universal capping sample.

From Theorem 5.1, the closure of C includes all functions of the form f(x) = ∫_0^∞ α(T) cap_T(x) dT, for some α(T) ≥ 0. This is the set of all non-decreasing concave functions with at most a linear growth, that is, functions f(w) that satisfy df/dw ≤ 1 and d²f/dw² ≤ 0. We show that the sample S^(C,k) can be computed using O(n + k log n log k) operations from any D' ⊆ D that is a superset of the keys in S^(C,k).

We start with properties of S^(C,k) which we will use to design our sampling algorithm. For a key x, let h_x be the number of keys y with w_y ≥ w_x and u_y < u_x. Let l_x be the number of keys y with w_y < w_x and r_y/w_y < r_x/w_x. For a key x and T > 0, and fixing the assignment {u_y} for all keys, let t(x, T) be the position of cap_T-seed(x) in the list of values cap_T-seed(y) for all y. The function t has the following properties:

Lemma 7.1. For a key x, t(x, T) is minimized for T = w_x. Moreover, t(x, T) is non-increasing for T ≤ w_x and non-decreasing for T ≥ w_x.

Proof. We can verify that for any key y such that there is a T > 0 with cap_T-seed(y) < cap_T-seed(x), we must have cap_{w_x}-seed(y) < cap_{w_x}-seed(x). Moreover, the set of T values for which cap_T-seed(y) < cap_T-seed(x) is an interval that contains T = w_x. The claim can be established by separately considering the cases w_y ≥ w_x and w_y < w_x.

As a corollary, we obtain that a key is in the universal capping sample only if it is in the bottom-k sample for cap_{w_x}:

Corollary 7.2. Fixing {u_x}, for any key x, x ∈ S^(C,k) ⟹ x ∈ S^(cap_{w_x},k).

Lemma 7.3. Fixing the assignment {u_x}, x ∈ S^(C,k) ⟺ l_x + h_x < k.

Proof. From Lemma 7.1, a key x is in a bottom-k cap_T sample for some T if and only if it is in the bottom-k sample for cap_{w_x}. The keys with a lower cap_{w_x}-seed than x are those with w_y ≥ w_x and u_y < u_x, which are counted in h_x, and those with w_y < w_x and r_y/w_y < r_x/w_x, which are counted in l_x. Therefore, a key x is in S^(cap_{w_x},k) if and only if there are fewer than k keys with a lower cap_{w_x}-seed, which is the same as having h_x + l_x < k. For reference, a key x is in the universal monotone sample S^(M,k) if and only if it satisfies the weaker condition h_x < k.

Lemma 7.4. A key x can be auxiliary only if l_x + h_x = k.

Proof. A key x is auxiliary (in the set Z) only if, for some y ∈ S, it has the k-th smallest cap_{w_y}-seed among all keys other than y; this means it has the (k+1)-st smallest cap_{w_y}-seed. The number of keys with a seed smaller than that of x is minimized for T = w_x. If the cap_{w_x}-seed of x is one of the k smallest ones, x is included in the sample. Therefore, to be auxiliary, x must have the (k+1)-st smallest cap_{w_x}-seed, that is, l_x + h_x = k.

7.1 Sampling algorithm

We are ready to present our sampling algorithm. We first process the data so that multiple elements with the same key are replaced by the one with maximum weight. The next step is to identify all keys with h_x ≤ k; it suffices to compute S^(M,k) of the data together with the auxiliary keys. We can apply a variant of Algorithm 1: we process the keys in order of decreasing weight, breaking ties by increasing rank, while maintaining a binary search tree H of size k which contains the k lowest u values of the processed keys. When processing x, if u_x > max(H) then h_x > k and the key is discarded; otherwise, h_x is the position of u_x in H, and u_x is inserted into H while max(H) is removed from H.

We now only consider keys with h_x ≤ k; note that in expectation there are at most k ln n such keys. The algorithm then computes l_x for all keys with l_x ≤ k. This is done by scanning the keys in order of increasing weight, tracking in a binary search tree H' the (at most) k smallest r_y/w_y values. When processing x, if r_x/w_x < max(H'), then l_x is the position of r_x/w_x in H'; we then delete max(H') and insert r_x/w_x. The keys that have l_x + h_x < k then constitute the sample S, and the keys with l_x + h_x = k are retained as potentially being auxiliary.
Finally, we perform another pass over the sampled and potentially auxiliary keys. For each key, we determine the (k+1)-st smallest cap_{w_x}-seed, which is τ^(cap_{w_x},k). Using Corollary 7.2, we can use (3) to compute p_x^(cap_{w_x},k) = p_x^(C,k). At the same time we can also determine the precise set of auxiliary keys, by removing those that are not the (k+1)-st smallest seed for any cap_{w_x} with x ∈ S.

7.2 Size of S^(C,k)

The sample S^(C,k) is contained in S^(M,k), but it can be much smaller. Intuitively, this is because two keys with similar, but not necessarily identical, weights are likely to have the same relation between their f-seeds across all f ∈ C. This is not true for M: for a threshold T between the two weights, the thresh_T-seed would always be lower for the higher-weight key, whereas for lower values of T the relation can go either way with almost equal probabilities. In particular, we obtain a bound on |S^(C,k)| that does not depend on n:

Theorem 7.1. E[|S^(C,k)|] ≤ e k ln (max_x w_x / min_x w_x).

Proof. Consider a set of keys Y such that max_{y∈Y} w_y / min_{y∈Y} w_y ≤ ρ. We show that the expected number of keys in Y that, for at least one T > 0, have one of the bottom-k cap_T-seeds is at most ρk. The claim then follows by partitioning the keys into ln(max_x w_x / min_x w_x) groups, within each of which the weights vary by at most a factor of e, and then noticing that the bottom-k over all groups must be a subset of the union of the bottom-k sets within the individual groups.

We now prove the claim for Y. Denote by τ the (k+1)-st smallest r_y value in Y. The set of the k keys with r_y < τ is the bottom-k sample for cap_T whenever T ≤ min_{y∈Y} w_y. Consider a key x ∈ Y. From Lemma 7.1, we have x ∈ S^(C,k) only if x ∈ S^(cap_{w_x},k). A necessary condition for the latter is that r_x/w_x < τ/min_{y∈Y} w_y. The probability of this event is at most

  Pr_{u_x ∼ U[0,1]}[ r_x/w_x < τ/min_{y∈Y} w_y ] ≤ (w_x / min_{y∈Y} w_y) Pr_{u_x ∼ U[0,1]}[ r_x < τ ] ≤ ρ k/|Y|.

Thus, the expected number of keys in Y that satisfy this condition is at most ρk.
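The membership condition of Lemma 7.3 can be checked directly, which is handy for testing. The brute-force Python sketch below (assumptions: hash-derived u_x and quadratic time; it checks the condition h_x + l_x < k and is not the O(n + k log n log k) algorithm of Section 7.1):

```python
import hashlib
import math

def u_value(key):
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return (h % (2**53)) / 2**53 or 2**-53

def universal_capping_sample(data, k):
    """Keys x with h_x + l_x < k, where h_x counts keys y with w_y >= w_x and
    u_y < u_x, and l_x counts keys y with w_y < w_x and r_y/w_y < r_x/w_x."""
    u = {x: u_value(x) for x in data}
    r = {x: -math.log(1.0 - u[x]) for x in data}
    sample = set()
    for x, w in data.items():
        h = sum(1 for y, wy in data.items()
                if y != x and wy >= w and u[y] < u[x])
        l = sum(1 for y, wy in data.items()
                if y != x and wy < w and r[y] / wy < r[x] / w)
        if h + l < k:
            sample.add(x)
    return sample

D = {"u1": 5, "u3": 100, "u10": 23, "u12": 7, "u17": 1,
     "u24": 5, "u31": 220, "u42": 19, "u43": 3, "u55": 2}
print(sorted(universal_capping_sample(D, k=3)))
```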

8 METRIC OBJECTIVES

In this section we discuss the application of multi-objective sampling to additive cost objectives. The formulation has a set of keys X, a set of models Q, and a nonnegative cost function c(Q, x) of servicing x ∈ X by Q ∈ Q. In metric settings, the keys X are points in a metric space M, Q ∈ Q is a configuration of facilities (which can also be points Q ⊂ M), and c(Q, x) is distance based and is the cost of servicing x by Q. For each Q ∈ Q we are interested in the total cost of servicing X, which is c(Q, X) = Σ_{x∈X} c(Q, x). A concrete example is the k-means clustering cost function, where Q is a set of points (centers) of size k and c(Q, x) = min_{q∈Q} d(q, x)².

In this formulation, we are interested in computing a small summary of X that would allow us to estimate c(Q, X) for each Q ∈ Q. Such summaries in a metric setting are sometimes referred to as coresets [1]. Multi-objective samples can be used as such a summary. Each Q ∈ Q has a corresponding function f_x ≡ c(Q, x), and a multi-objective sample for the set F of all these functions for Q ∈ Q allows us to estimate sum(f) = c(Q, X) for each Q. In particular, a sample of size h(F) ε^{-2} allows us to estimate c(Q, X) for each Q ∈ Q with CV at most ε and with good concentration. The challenges, for a domain of such problems, are to (i) upper bound the multi-objective overhead h(F) as a function of the parameters of the domain (X, c, and the structure of Q ∈ Q); the overhead is a fundamental property of the problem domain; and (ii) efficiently compute upper bounds π_x on the multi-objective sampling probabilities p_x^(F,1) so that the sum Σ_x π_x is not much larger than h(F). We are interested in obtaining these bounds without enumerating over Q ∈ Q (which can be infinite or very large).

Recently, we [6] applied multi-objective sampling to the problem of centrality estimation in metric spaces. Here M is a general metric space, X ⊂ M is a set of points, each Q is a single point in M, and the cost function is c(Q, x) = d(Q, x)^p. The centrality of Q is the sum Σ_x c(Q, x). We established that the multi-objective overhead is constant and that upper-bound probabilities (with constant overhead) can be computed very efficiently using O(|X|) distance computations. More recently [10], we generalized the result to the k-means objective, where the Q are subsets of size at most k, and established that the overhead is O(k).

9 FOREACH, FORALL

Our multi-objective sampling probabilities provide statistical guarantees that hold for each f and H: Theorem 4.1 states that the estimate ŝum(f; H) has the stated CV and concentration bounds over the sample distribution S ∼ p (a sample S that includes each x ∈ X independently (or VarOpt) with probability p_x). In this section we focus on uniform per-objective guarantees (k_f = k for all f ∈ F) and statistics sum(f; X) ≡ sum(f) over the full data set. For F and probabilities p, we define the ForEach normalized mean squared error (NMSE)

  NMSE_e(F, p) = max_{f∈F} E_{S∼p} (ŝum(f)/sum(f) − 1)²,   (9)

and the ForAll NMSE

  NMSE_a(F, p) = E_{S∼p} max_{f∈F} (ŝum(f)/sum(f) − 1)².   (10)

The respective normalized root MSE (NRMSE) values are the square roots of the NMSE. Note that ForAll is stronger than ForEach, as it requires a simultaneous good approximation of sum(f) for all f ∈ F: for all p, NMSE_e(F, p) ≤ NMSE_a(F, p). We are interested in the trade-off between the expected size of a sample, which is sum(p) ≡ Σ_x p_x, and the NRMSE. The multi-objective pps probabilities are such that for all l > 1, NMSE_e(F, p^(F,l)) ≤ 1/l. For a parameter l ≥ 1, we can also consider the ForAll error NMSE_a(F, p^(F,l)) and ask for a bound on l so that NRMSE_a ≤ ε.
A union-bound argument establishes that l = ε^{-2} log |F| always suffices. Moreover, when F is the sampling closure of a smaller subset F', then l = ε^{-2} log |F'| suffices. If we only need to bound the maximum error on any subset of F of size m, we can use l = ε^{-2} log m. When F is the set of all monotone functions over n keys, then l = O(ε^{-2} log log n) suffices. To see this intuitively, recall that it suffices to consider all threshold functions, since all monotone functions are nonnegative combinations of threshold functions; there are n threshold functions, but these functions have only O(log n) points where the value changes significantly (by a constant factor).

We provide an example of a family F where the sample-size gap between NMSE_e and NMSE_a is linear in the support size. Consider a set of n keys and define an f for each subset of n/2 keys so that f is uniform on the subset and 0 outside it. The multi-objective base pps sampling probabilities are p_x^(F,1) = 2/n for all x, and hence the overhead is h(F) = 2. Therefore, p of size 2ε^{-2} has NRMSE_e(F, p) ≤ ε. In contrast, any p with NRMSE_a(F, p) ≤ 1/2 must contain at least one key from the support of each f in (almost) all samples, implying an expected sample size of at least about n/2.

When we seek to bound NRMSE_e, probabilities of the form p^(F,l) essentially optimize the size-quality trade-off. For NRMSE_a, however, the minimum-size p that meets a certain error NRMSE_a(F, p) ≤ ε can be much smaller than the minimum-size p that is restricted to the form p^(F,l). Consider ε > 0 and a set F that has k parts F_i with disjoint supports of equal sizes n/k. All the parts F_i except F_1 have a single f that is uniform on its support, which means that with a uniform p of size ε^{-2} we have NRMSE_e(F_i, p) = NRMSE_a(F_i, p) = ε. The part F_1 has a structure similar to our previous example, which means that any p with NRMSE_a(F_1, p) ≤ 1/2 has size at least n/(2k), whereas a uniform p of size 2ε^{-2} has NRMSE_e(F_1, p) = ε. The multi-objective base sampling probabilities are therefore p_x^(F,1) = k/n for keys in the supports of the F_i with i > 1 and p_x^(F,1) = 2k/n for keys in the support of F_1, and thus the overhead is k + 1. The minimum-size p for NRMSE_a(F, p) = 1/2 must have value at least 1/2 for keys in the support of F_1 and value about (k/n) log k for the other keys (there is a ForAll requirement for each part, and a logarithmic factor due to a (tight) union bound). Therefore the sample size is O(n/k + k log k). In contrast, to have NRMSE_a(F, p^(F,l)) = 1/2, we have to use l = Ω(n/k), obtaining a sample size of Ω(n).


More information

INF Anne Solberg One of the most challenging topics in image analysis is recognizing a specific object in an image.

INF Anne Solberg One of the most challenging topics in image analysis is recognizing a specific object in an image. INF 4300 700 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter -6 6inDuda and Hart: attern Classification 303 INF 4300 Introduction to classification One of the most challenging

More information

Review of Essential Skills and Knowledge

Review of Essential Skills and Knowledge Review of Essential Skills and Knowledge R Eponent Laws...50 R Epanding and Simplifing Polnomial Epressions...5 R 3 Factoring Polnomial Epressions...5 R Working with Rational Epressions...55 R 5 Slope

More information

0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work.

0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work. 1 A socioeconomic stud analzes two discrete random variables in a certain population of households = number of adult residents and = number of child residents It is found that their joint probabilit mass

More information

Journal of Inequalities in Pure and Applied Mathematics

Journal of Inequalities in Pure and Applied Mathematics Journal of Inequalities in Pure and Applied Mathematics http://jipam.vu.edu.au/ Volume 2, Issue 2, Article 15, 21 ON SOME FUNDAMENTAL INTEGRAL INEQUALITIES AND THEIR DISCRETE ANALOGUES B.G. PACHPATTE DEPARTMENT

More information

where a 0 and the base b is a positive number other

where a 0 and the base b is a positive number other 7. Graph Eponential growth functions No graphing calculators!!!! EXPONENTIAL FUNCTION A function of the form than one. a b where a 0 and the base b is a positive number other a = b = HA = Horizontal Asmptote:

More information

8 Differential Calculus 1 Introduction

8 Differential Calculus 1 Introduction 8 Differential Calculus Introduction The ideas that are the basis for calculus have been with us for a ver long time. Between 5 BC and 5 BC, Greek mathematicians were working on problems that would find

More information

Summary of Random Variable Concepts March 17, 2000

Summary of Random Variable Concepts March 17, 2000 Summar of Random Variable Concepts March 17, 2000 This is a list of important concepts we have covered, rather than a review that devives or eplains them. Tpes of random variables discrete A random variable

More information

8.1 Exponents and Roots

8.1 Exponents and Roots Section 8. Eponents and Roots 75 8. Eponents and Roots Before defining the net famil of functions, the eponential functions, we will need to discuss eponent notation in detail. As we shall see, eponents

More information

Strain Transformation and Rosette Gage Theory

Strain Transformation and Rosette Gage Theory Strain Transformation and Rosette Gage Theor It is often desired to measure the full state of strain on the surface of a part, that is to measure not onl the two etensional strains, and, but also the shear

More information

Functions of Several Variables

Functions of Several Variables Chapter 1 Functions of Several Variables 1.1 Introduction A real valued function of n variables is a function f : R, where the domain is a subset of R n. So: for each ( 1,,..., n ) in, the value of f is

More information

An Information Theory For Preferences

An Information Theory For Preferences An Information Theor For Preferences Ali E. Abbas Department of Management Science and Engineering, Stanford Universit, Stanford, Ca, 94305 Abstract. Recent literature in the last Maimum Entrop workshop

More information

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem

In applications, we encounter many constrained optimization problems. Examples Basis pursuit: exact sparse recovery problem 1 Conve Analsis Main references: Vandenberghe UCLA): EECS236C - Optimiation methods for large scale sstems, http://www.seas.ucla.edu/ vandenbe/ee236c.html Parikh and Bod, Proimal algorithms, slides and

More information

All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis

All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis 1 All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis Edith Cohen arxiv:136.3284v7 [cs.ds] 17 Jan 215 Abstract Graph datasets with billions of edges, such as social and Web graphs,

More information

Logarithms. Bacteria like Staph aureus are very common.

Logarithms. Bacteria like Staph aureus are very common. UNIT 10 Eponentials and Logarithms Bacteria like Staph aureus are ver common. Copright 009, K1 Inc. All rights reserved. This material ma not be reproduced in whole or in part, including illustrations,

More information

Mathematical Statistics. Gregg Waterman Oregon Institute of Technology

Mathematical Statistics. Gregg Waterman Oregon Institute of Technology Mathematical Statistics Gregg Waterman Oregon Institute of Technolog c Gregg Waterman This work is licensed under the Creative Commons Attribution. International license. The essence of the license is

More information

Roberto s Notes on Integral Calculus Chapter 3: Basics of differential equations Section 3. Separable ODE s

Roberto s Notes on Integral Calculus Chapter 3: Basics of differential equations Section 3. Separable ODE s Roberto s Notes on Integral Calculus Chapter 3: Basics of differential equations Section 3 Separable ODE s What ou need to know alread: What an ODE is and how to solve an eponential ODE. What ou can learn

More information

The Magic of Random Sampling: From Surveys to Big Data

The Magic of Random Sampling: From Surveys to Big Data The Magic of Random Sampling: From Surveys to Big Data Edith Cohen Google Research Tel Aviv University Disclaimer: Random sampling is classic and well studied tool with enormous impact across disciplines.

More information

SEPARABLE EQUATIONS 2.2

SEPARABLE EQUATIONS 2.2 46 CHAPTER FIRST-ORDER DIFFERENTIAL EQUATIONS 4. Chemical Reactions When certain kinds of chemicals are combined, the rate at which the new compound is formed is modeled b the autonomous differential equation

More information

Correlation analysis 2: Measures of correlation

Correlation analysis 2: Measures of correlation Correlation analsis 2: Measures of correlation Ran Tibshirani Data Mining: 36-462/36-662 Februar 19 2013 1 Review: correlation Pearson s correlation is a measure of linear association In the population:

More information

CSE 546 Midterm Exam, Fall 2014

CSE 546 Midterm Exam, Fall 2014 CSE 546 Midterm Eam, Fall 2014 1. Personal info: Name: UW NetID: Student ID: 2. There should be 14 numbered pages in this eam (including this cover sheet). 3. You can use an material ou brought: an book,

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

Stochastic Interpretation for the Arimoto Algorithm

Stochastic Interpretation for the Arimoto Algorithm Stochastic Interpretation for the Arimoto Algorithm Serge Tridenski EE - Sstems Department Tel Aviv Universit Tel Aviv, Israel Email: sergetr@post.tau.ac.il Ram Zamir EE - Sstems Department Tel Aviv Universit

More information

Estimators in simple random sampling: Searls approach

Estimators in simple random sampling: Searls approach Songklanakarin J Sci Technol 35 (6 749-760 Nov - Dec 013 http://wwwsjstpsuacth Original Article Estimators in simple random sampling: Searls approach Jirawan Jitthavech* and Vichit Lorchirachoonkul School

More information

8. BOOLEAN ALGEBRAS x x

8. BOOLEAN ALGEBRAS x x 8. BOOLEAN ALGEBRAS 8.1. Definition of a Boolean Algebra There are man sstems of interest to computing scientists that have a common underling structure. It makes sense to describe such a mathematical

More information

2.2 SEPARABLE VARIABLES

2.2 SEPARABLE VARIABLES 44 CHAPTER FIRST-ORDER DIFFERENTIAL EQUATIONS 6 Consider the autonomous DE 6 Use our ideas from Problem 5 to find intervals on the -ais for which solution curves are concave up and intervals for which

More information

6 = 1 2. The right endpoints of the subintervals are then 2 5, 3, 7 2, 4, 2 9, 5, while the left endpoints are 2, 5 2, 3, 7 2, 4, 9 2.

6 = 1 2. The right endpoints of the subintervals are then 2 5, 3, 7 2, 4, 2 9, 5, while the left endpoints are 2, 5 2, 3, 7 2, 4, 9 2. 5 THE ITEGRAL 5. Approimating and Computing Area Preliminar Questions. What are the right and left endpoints if [, 5] is divided into si subintervals? If the interval [, 5] is divided into si subintervals,

More information

Derivatives 2: The Derivative at a Point

Derivatives 2: The Derivative at a Point Derivatives 2: The Derivative at a Point 69 Derivatives 2: The Derivative at a Point Model 1: Review of Velocit In the previous activit we eplored position functions (distance versus time) and learned

More information

The Fusion of Parametric and Non-Parametric Hypothesis Tests

The Fusion of Parametric and Non-Parametric Hypothesis Tests The Fusion of arametric and Non-arametric pothesis Tests aul Frank Singer, h.d. Ratheon Co. EO/E1/A173.O. Bo 902 El Segundo, CA 90245 310-647-2643 FSinger@ratheon.com Abstract: This paper considers the

More information

Language and Statistics II

Language and Statistics II Language and Statistics II Lecture 19: EM for Models of Structure Noah Smith Epectation-Maimization E step: i,, q i # p r $ t = p r i % ' $ t i, p r $ t i,' soft assignment or voting M step: r t +1 # argma

More information

Mathematics 309 Conic sections and their applicationsn. Chapter 2. Quadric figures. ai,j x i x j + b i x i + c =0. 1. Coordinate changes

Mathematics 309 Conic sections and their applicationsn. Chapter 2. Quadric figures. ai,j x i x j + b i x i + c =0. 1. Coordinate changes Mathematics 309 Conic sections and their applicationsn Chapter 2. Quadric figures In this chapter want to outline quickl how to decide what figure associated in 2D and 3D to quadratic equations look like.

More information

MA123, Chapter 8: Idea of the Integral (pp , Gootman)

MA123, Chapter 8: Idea of the Integral (pp , Gootman) MA13, Chapter 8: Idea of the Integral (pp. 155-187, Gootman) Chapter Goals: Understand the relationship between the area under a curve and the definite integral. Understand the relationship between velocit

More information

MAT389 Fall 2016, Problem Set 2

MAT389 Fall 2016, Problem Set 2 MAT389 Fall 2016, Problem Set 2 Circles in the Riemann sphere Recall that the Riemann sphere is defined as the set Let P be the plane defined b Σ = { (a, b, c) R 3 a 2 + b 2 + c 2 = 1 } P = { (a, b, c)

More information

Generalized Least-Squares Regressions III: Further Theory and Classication

Generalized Least-Squares Regressions III: Further Theory and Classication Generalized Least-Squares Regressions III: Further Theor Classication NATANIEL GREENE Department of Mathematics Computer Science Kingsborough Communit College, CUNY Oriental Boulevard, Brookln, NY UNITED

More information

6.4 graphs OF logarithmic FUnCTIOnS

6.4 graphs OF logarithmic FUnCTIOnS SECTION 6. graphs of logarithmic functions 9 9 learning ObjeCTIveS In this section, ou will: Identif the domain of a logarithmic function. Graph logarithmic functions. 6. graphs OF logarithmic FUnCTIOnS

More information

10.2 The Unit Circle: Cosine and Sine

10.2 The Unit Circle: Cosine and Sine 0. The Unit Circle: Cosine and Sine 77 0. The Unit Circle: Cosine and Sine In Section 0.., we introduced circular motion and derived a formula which describes the linear velocit of an object moving on

More information

Recall Discrete Distribution. 5.2 Continuous Random Variable. A probability histogram. Density Function 3/27/2012

Recall Discrete Distribution. 5.2 Continuous Random Variable. A probability histogram. Density Function 3/27/2012 3/7/ Recall Discrete Distribution 5. Continuous Random Variable For a discrete distribution, for eample Binomial distribution with n=5, and p=.4, the probabilit distribution is f().7776.59.3456.34.768.4.3

More information

Math 115 First Midterm February 8, 2017

Math 115 First Midterm February 8, 2017 EXAM SOLUTIONS Math First Midterm Februar 8, 07. Do not open this eam until ou are told to do so.. Do not write our name anwhere on this eam.. This eam has pages including this cover. There are problems.

More information

From the help desk: It s all about the sampling

From the help desk: It s all about the sampling The Stata Journal (2002) 2, Number 2, pp. 90 20 From the help desk: It s all about the sampling Allen McDowell Stata Corporation amcdowell@stata.com Jeff Pitblado Stata Corporation jsp@stata.com Abstract.

More information

LINEAR REGRESSION ANALYSIS

LINEAR REGRESSION ANALYSIS LINEAR REGRESSION ANALYSIS MODULE V Lecture - 2 Correcting Model Inadequacies Through Transformation and Weighting Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technolog Kanpur

More information

Ordinal One-Switch Utility Functions

Ordinal One-Switch Utility Functions Ordinal One-Switch Utilit Functions Ali E. Abbas Universit of Southern California, Los Angeles, California 90089, aliabbas@usc.edu David E. Bell Harvard Business School, Boston, Massachusetts 0163, dbell@hbs.edu

More information

Section 1.2: A Catalog of Functions

Section 1.2: A Catalog of Functions Section 1.: A Catalog of Functions As we discussed in the last section, in the sciences, we often tr to find an equation which models some given phenomenon in the real world - for eample, temperature as

More information

15. Eigenvalues, Eigenvectors

15. Eigenvalues, Eigenvectors 5 Eigenvalues, Eigenvectors Matri of a Linear Transformation Consider a linear ( transformation ) L : a b R 2 R 2 Suppose we know that L and L Then c d because of linearit, we can determine what L does

More information

2 Ordinary Differential Equations: Initial Value Problems

2 Ordinary Differential Equations: Initial Value Problems Ordinar Differential Equations: Initial Value Problems Read sections 9., (9. for information), 9.3, 9.3., 9.3. (up to p. 396), 9.3.6. Review questions 9.3, 9.4, 9.8, 9.9, 9.4 9.6.. Two Examples.. Foxes

More information

2 Push and Pull Source Sink Push

2 Push and Pull Source Sink Push 1 Plaing Push vs Pull: Models and Algorithms for Disseminating Dnamic Data in Networks. R.C. Chakinala, A. Kumarasubramanian, Kofi A. Laing R. Manokaran, C. Pandu Rangan, R. Rajaraman Push and Pull 2 Source

More information

4.317 d 4 y. 4 dx d 2 y dy. 20. dt d 2 x. 21. y 3y 3y y y 6y 12y 8y y (4) y y y (4) 2y y 0. d 4 y 26.

4.317 d 4 y. 4 dx d 2 y dy. 20. dt d 2 x. 21. y 3y 3y y y 6y 12y 8y y (4) y y y (4) 2y y 0. d 4 y 26. 38 CHAPTER 4 HIGHER-ORDER DIFFERENTIAL EQUATIONS sstems are also able, b means of their dsolve commands, to provide eplicit solutions of homogeneous linear constant-coefficient differential equations.

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

D u f f x h f y k. Applying this theorem a second time, we have. f xx h f yx k h f xy h f yy k k. f xx h 2 2 f xy hk f yy k 2

D u f f x h f y k. Applying this theorem a second time, we have. f xx h f yx k h f xy h f yy k k. f xx h 2 2 f xy hk f yy k 2 93 CHAPTER 4 PARTIAL DERIVATIVES We close this section b giving a proof of the first part of the Second Derivatives Test. Part (b) has a similar proof. PROOF OF THEOREM 3, PART (A) We compute the second-order

More information

Additional Material On Recursive Sequences

Additional Material On Recursive Sequences Penn State Altoona MATH 141 Additional Material On Recursive Sequences 1. Graphical Analsis Cobweb Diagrams Consider a generic recursive sequence { an+1 = f(a n ), n = 1,, 3,..., = Given initial value.

More information

NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part I

NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part I NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Comple Analsis II Lecture Notes Part I Chapter 1 Preliminar results/review of Comple Analsis I These are more detailed notes for the results

More information

Introduction to Differential Equations. National Chiao Tung University Chun-Jen Tsai 9/14/2011

Introduction to Differential Equations. National Chiao Tung University Chun-Jen Tsai 9/14/2011 Introduction to Differential Equations National Chiao Tung Universit Chun-Jen Tsai 9/14/011 Differential Equations Definition: An equation containing the derivatives of one or more dependent variables,

More information

Interspecific Segregation and Phase Transition in a Lattice Ecosystem with Intraspecific Competition

Interspecific Segregation and Phase Transition in a Lattice Ecosystem with Intraspecific Competition Interspecific Segregation and Phase Transition in a Lattice Ecosstem with Intraspecific Competition K. Tainaka a, M. Kushida a, Y. Ito a and J. Yoshimura a,b,c a Department of Sstems Engineering, Shizuoka

More information

Inferential Statistics and Methods for Tuning and Configuring

Inferential Statistics and Methods for Tuning and Configuring DM63 HEURISTICS FOR COMBINATORIAL OPTIMIZATION Lecture 13 and Methods for Tuning and Configuring Marco Chiarandini 1 Summar 2 3 Eperimental Designs 4 Sequential Testing (Racing) Summar 1 Summar 2 3 Eperimental

More information

University of Toronto Mississauga

University of Toronto Mississauga Surname: First Name: Student Number: Tutorial: Universit of Toronto Mississauga Mathematical and Computational Sciences MATY5Y Term Test Duration - 0 minutes No Aids Permitted This eam contains pages (including

More information

Review Topics for MATH 1400 Elements of Calculus Table of Contents

Review Topics for MATH 1400 Elements of Calculus Table of Contents Math 1400 - Mano Table of Contents - Review - page 1 of 2 Review Topics for MATH 1400 Elements of Calculus Table of Contents MATH 1400 Elements of Calculus is one of the Marquette Core Courses for Mathematical

More information

Computation of Csiszár s Mutual Information of Order α

Computation of Csiszár s Mutual Information of Order α Computation of Csiszár s Mutual Information of Order Damianos Karakos, Sanjeev Khudanpur and Care E. Priebe Department of Electrical and Computer Engineering and Center for Language and Speech Processing

More information

Fuzzy Topology On Fuzzy Sets: Regularity and Separation Axioms

Fuzzy Topology On Fuzzy Sets: Regularity and Separation Axioms wwwaasrcorg/aasrj American Academic & Scholarl Research Journal Vol 4, No 2, March 212 Fuzz Topolog n Fuzz Sets: Regularit and Separation Aioms AKandil 1, S Saleh 2 and MM Yakout 3 1 Mathematics Department,

More information

CHAPTER 2: Partial Derivatives. 2.2 Increments and Differential

CHAPTER 2: Partial Derivatives. 2.2 Increments and Differential CHAPTER : Partial Derivatives.1 Definition of a Partial Derivative. Increments and Differential.3 Chain Rules.4 Local Etrema.5 Absolute Etrema 1 Chapter : Partial Derivatives.1 Definition of a Partial

More information

11.4 Polar Coordinates

11.4 Polar Coordinates 11. Polar Coordinates 917 11. Polar Coordinates In Section 1.1, we introduced the Cartesian coordinates of a point in the plane as a means of assigning ordered pairs of numbers to points in the plane.

More information

INTRODUCTION TO DIOPHANTINE EQUATIONS

INTRODUCTION TO DIOPHANTINE EQUATIONS INTRODUCTION TO DIOPHANTINE EQUATIONS In the earl 20th centur, Thue made an important breakthrough in the stud of diophantine equations. His proof is one of the first eamples of the polnomial method. His

More information

8.4. If we let x denote the number of gallons pumped, then the price y in dollars can $ $1.70 $ $1.70 $ $1.70 $ $1.

8.4. If we let x denote the number of gallons pumped, then the price y in dollars can $ $1.70 $ $1.70 $ $1.70 $ $1. 8.4 An Introduction to Functions: Linear Functions, Applications, and Models We often describe one quantit in terms of another; for eample, the growth of a plant is related to the amount of light it receives,

More information

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Copright, 8, RE Kass, EN Brown, and U Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Chapter 6 Random Vectors and Multivariate Distributions 6 Random Vectors In Section?? we etended

More information

Lecture 7: Advanced Coupling & Mixing Time via Eigenvalues

Lecture 7: Advanced Coupling & Mixing Time via Eigenvalues Counting and Sampling Fall 07 Lecture 7: Advanced Coupling & Miing Time via Eigenvalues Lecturer: Shaan Oveis Gharan October 8 Disclaimer: These notes have not been subjected to the usual scrutin reserved

More information

Inference about the Slope and Intercept

Inference about the Slope and Intercept Inference about the Slope and Intercept Recall, we have established that the least square estimates and 0 are linear combinations of the Y i s. Further, we have showed that the are unbiased and have the

More information

Solutions to Two Interesting Problems

Solutions to Two Interesting Problems Solutions to Two Interesting Problems Save the Lemming On each square of an n n chessboard is an arrow pointing to one of its eight neighbors (or off the board, if it s an edge square). However, arrows

More information

UNIT 2 QUADRATIC FUNCTIONS AND MODELING Lesson 2: Interpreting Quadratic Functions Instruction

UNIT 2 QUADRATIC FUNCTIONS AND MODELING Lesson 2: Interpreting Quadratic Functions Instruction Prerequisite Skills This lesson requires the use of the following skills: knowing the standard form of quadratic functions using graphing technolog to model quadratic functions Introduction The tourism

More information

Chapter 6* The PPSZ Algorithm

Chapter 6* The PPSZ Algorithm Chapter 6* The PPSZ Algorithm In this chapter we present an improvement of the pp algorithm. As pp is called after its inventors, Paturi, Pudlak, and Zane [PPZ99], the improved version, pps, is called

More information

x y plane is the plane in which the stresses act, yy xy xy Figure 3.5.1: non-zero stress components acting in the x y plane

x y plane is the plane in which the stresses act, yy xy xy Figure 3.5.1: non-zero stress components acting in the x y plane 3.5 Plane Stress This section is concerned with a special two-dimensional state of stress called plane stress. It is important for two reasons: () it arises in real components (particularl in thin components

More information

Perturbation Theory for Variational Inference

Perturbation Theory for Variational Inference Perturbation heor for Variational Inference Manfred Opper U Berlin Marco Fraccaro echnical Universit of Denmark Ulrich Paquet Apple Ale Susemihl U Berlin Ole Winther echnical Universit of Denmark Abstract

More information

A. Derivation of regularized ERM duality

A. Derivation of regularized ERM duality A. Derivation of regularized ERM dualit For completeness, in this section we derive the dual 5 to the problem of computing proximal operator for the ERM objective 3. We can rewrite the primal problem as

More information

Beyond Distinct Counting: LogLog Composable Sketches of frequency statistics

Beyond Distinct Counting: LogLog Composable Sketches of frequency statistics Beyond Distinct Counting: LogLog Composable Sketches of frequency statistics Edith Cohen Google Research Tel Aviv University TOCA-SV 5/12/2017 8 Un-aggregated data 3 Data element e D has key and value

More information

Module 3, Section 4 Analytic Geometry II

Module 3, Section 4 Analytic Geometry II Principles of Mathematics 11 Section, Introduction 01 Introduction, Section Analtic Geometr II As the lesson titles show, this section etends what ou have learned about Analtic Geometr to several related

More information

DIFFERENTIAL EQUATIONS First Order Differential Equations. Paul Dawkins

DIFFERENTIAL EQUATIONS First Order Differential Equations. Paul Dawkins DIFFERENTIAL EQUATIONS First Order Paul Dawkins Table of Contents Preface... First Order... 3 Introduction... 3 Linear... 4 Separable... 7 Eact... 8 Bernoulli... 39 Substitutions... 46 Intervals of Validit...

More information

Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D.

Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D. Research Design - - Topic 15a Introduction to Multivariate Analses 009 R.C. Gardner, Ph.D. Major Characteristics of Multivariate Procedures Overview of Multivariate Techniques Bivariate Regression and

More information

Study Guide and Intervention

Study Guide and Intervention 6- NAME DATE PERID Stud Guide and Intervention Graphing Quadratic Functions Graph Quadratic Functions Quadratic Function A function defined b an equation of the form f () a b c, where a 0 b Graph of a

More information

A Linear Round Lower Bound for Lovasz-Schrijver SDP Relaxations of Vertex Cover

A Linear Round Lower Bound for Lovasz-Schrijver SDP Relaxations of Vertex Cover A Linear Round Lower Bound for Lovasz-Schrijver SDP Relaxations of Vertex Cover Grant Schoenebeck Luca Trevisan Madhur Tulsiani Abstract We study semidefinite programming relaxations of Vertex Cover arising

More information

UNCORRECTED SAMPLE PAGES. 3Quadratics. Chapter 3. Objectives

UNCORRECTED SAMPLE PAGES. 3Quadratics. Chapter 3. Objectives Chapter 3 3Quadratics Objectives To recognise and sketch the graphs of quadratic polnomials. To find the ke features of the graph of a quadratic polnomial: ais intercepts, turning point and ais of smmetr.

More information

ECEN 689 Special Topics in Data Science for Communications Networks

ECEN 689 Special Topics in Data Science for Communications Networks ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 5 Optimizing Fixed Size Samples Sampling as

More information

LESSON #11 - FORMS OF A LINE COMMON CORE ALGEBRA II

LESSON #11 - FORMS OF A LINE COMMON CORE ALGEBRA II LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS

More information

10.5 Graphs of the Trigonometric Functions

10.5 Graphs of the Trigonometric Functions 790 Foundations of Trigonometr 0.5 Graphs of the Trigonometric Functions In this section, we return to our discussion of the circular (trigonometric functions as functions of real numbers and pick up where

More information

Unit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents

Unit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents Unit NOTES Honors Common Core Math Da : Properties of Eponents Warm-Up: Before we begin toda s lesson, how much do ou remember about eponents? Use epanded form to write the rules for the eponents. OBJECTIVE

More information

The Steiner Ratio for Obstacle-Avoiding Rectilinear Steiner Trees

The Steiner Ratio for Obstacle-Avoiding Rectilinear Steiner Trees The Steiner Ratio for Obstacle-Avoiding Rectilinear Steiner Trees Anna Lubiw Mina Razaghpour Abstract We consider the problem of finding a shortest rectilinear Steiner tree for a given set of pointn the

More information

Multiple Integration

Multiple Integration Contents 7 Multiple Integration 7. Introduction to Surface Integrals 7. Multiple Integrals over Non-rectangular Regions 7. Volume Integrals 4 7.4 Changing Coordinates 66 Learning outcomes In this Workbook

More information