Overcoming Limitations of Sampling for Aggregation Queries

Size: px
Start display at page:

Download "Overcoming Limitations of Sampling for Aggregation Queries"

Transcription

1 CIS 6930 Approxmate Quer Processg Paper Presetato Sprg Istructor: Dr Al Dobra Overcomg Lmtatos of Samplg for Aggregato Queres Authors: Surajt Chaudhur, Gautam Das, Maur Datar, Rajeev Motwa, ad Vvek R arasaa ICDE 2001 Preseted b: Adréa Matsuaga ammatsu@ufledu) 2004, UFL-COE-ECE

2 Outle Itroducto The eed for Approxmate Quer Processg Issues wth uform samplg Solutos Outler-dexes Explotg workload formato Expermetal results

3 Itroducto Data aalss over large data s hard Data aaltcs ofte do ot eed exact aswers ballpark estmates are eough Examples O Le Aaltcal Processg OLAP)/Decso Support Eg what s the percet crease the sales of Wdows XP over last ear Calfora? Data Mg Buldg models eg decso trees) does ot requre precse couts Focus o Aggregate queres

4 Issues Lmtatos of uform samplg aswerg Aggregato queres: Data skew large data varace) Outler-dexes Low selectvt ad small groups Explotg workload formato

5 Data Skew Effect Example 99% Relato R 1% tuples) K C SumC) 109,900 1% uform sample 100 tuples) Extrapolate multpl b 100) Severe uderestmate f outler ot sample o tuple wth 1000: EstSUMC))10,000 R-9900 R-100 P tuple wth 1000: EstSUMC))109,900 Severe overestmate f outler ot sample 2 or more tuples wth 1000: EstSUMC))209,800 EstSUMC))309,700 Probablt of 063 to get large error estmate!!!

6 Theorem 1 U e Y Y 1 S 1 ε R Relato of sze { 1, 2,, } Set of values assocated wth the tuples the relato U uform sample of s of sze wth stadard error: where S stadard devato Ubased estmator of the actual sum 1 ) 1 2 Y S

7 Theorem 1 - Proof U e Y Y 1 S Y Var e ) ε 1 ) ) 1 2 Y S Var S Var Var Y Var U U e ) ) Y E P E E Y E U U e 1 1 ) ) Propertes of varace: For depedet radom varables) ) ) 2 X Var a ax Var X Var X Var ) ) Propertes of expectato: a a E )

8 Soluto 1: Outler Idexg To hadle data skew a aggregato quer The dea: Separate the outlers R O ) from the rest of the data or o-outlers R O ) to a outler dex Keep a uform radom sample of the remag data Use outler dex as well as radom sample to aswer queres

9 Outler Idexg Implemetato Pre-processg 1) Determe the outlers R O Quer Quer processg 3) Aggregate outlers A 1 R R O R O sample 2) Sample o-outlers Quer & extrapolate A2 4) Aggregate o-outlers ote: Sce DB cotet chage over tme, selecto of outlers dexes ad samples should be refreshed appropratel + A 5) Combe aggregates

10 Outler Selecto: Defto 1 For a sub-relato R R R) εr ) stadard error estmatg the sum of values R uform samplg followed b extrapolato) A optmal outler-dex R O R,C,τ) s defed as a sub-relato R O R: R O τ εr\r O ) m R R, R τ {εr\r )}

11 Outler Selecto: Theorem 2 Cosder a multset R { 1, 2,, } where the s are sorted order Let R O R be the subset such that: R O τ SR\RO) m R R, R τ {SR\R )} The exsts some 0 τ τ such that R O { 1 τ } { +τ +1-τ) }

12 Outler Selecto: Algorthm 1) Read the values colum C of the relato R Let { 1, 2,, } be the sorted order of the values appearg C each value correspods to a tuple) 2) For 1 to τ+1, compute E) S{, +1,, -τ+-1 }) 3) Let be the value of where E) takes ts mmum value The the outler-dex s the tuples that correspod to the set of values { j 1 j τ } { j +τ +1-τ) j } where τ -1 The algorthm depeds o computg stadard devatos Stadard devatos computed O1) tme for sertos ad deletos eg E+1) ca be computed from E), ad -τ+1 )

13 Outler Selecto: Example Relato R _ Y 1099 Y 109,900 10,000 tuples 99% 9900 tuples 1% 100 tuples E1) Outlers!!! For τ 100: E1) 999 E2) 1409 E3) 1725 E101) 999 E2) E101) CREATE VIEW c_otl_dx AS SELECT * from R WHERE C > 1000)

14 Low Selectvt ad Small Groups Effect Example Relato R Sample Quer wth group-b s Sample ma ot cota eve a sgle row that belogs to the sub-relato Quer wth low selectvt Sample ma ot cota eve a sgle row selected b the quer

15 Soluto 2: Explotg Workload Iformato To hadle low selectvt ad small groups The dea: Use weghted samplg Sample more from subsets of data that are small sze but are mportat have hgh usage) Explot DB access patter localt Usg pre-computed samples

16 Explotg Workload Iformato Steps: 1) Workload Collecto: obta a workload cosstg of represetatve queres agast the DB eg Mcrosoft SQL Server Profler) 2) Trace Quer Patters: aalze workload to obta parsed formato eg the set of selecto codtos that are posed) 3) Trace Tuple Usage: The executo of the workload reveals addtoal formato o usage of specfc tuples eg frequec of access to each tuple) Sce trackg ths formato at the level of tuples ca be expesve, t ca be kept at coarser graulart eg o page-level) For the expermets, assumed that a tuple t has weght w f the tuple t s requred to aswer w queres the workload) 4) Weghted Samplg: Perform samplg b takg to accout weghts of tuples step 3 The probablt to accept the sample s p w, where: w ' w / w eed to store the ormalzed weght w together wth the tuple sce ts verse multplcato factor) wll be used to aswer the aggregate quer j 1 j

17 Explotg Workload Iformato Whe weghted samplg based o workload formato works well? Access patter of queres are local We have a workload that s a good represetatve of future queres

18 Expermetal Setup Platform: Dell Precso 610 sstem wth a Petum III Xeo 450 MHz processor wth 128 MB RAM ad a exteral 23GB hard drve Databases: 100MB TPC-R databases TPC-R bechmark modfed to var the degree of skew determed b the Zpfa parameter z 5 dstrbuto, sce orgal data s geerated from a uform dstrbuto Workloads: radom quer geerato program wth sum aggregate fucto Parameters: a) skew of the data z) was vared over 1, 15, 2, 25, ad 3 b) the samplg fracto f) was vared over a wde rage from 1% to 100%, c) the storage for the outler-dex was vared over 1%, 5%, 10%, ad 20%; ad d) average over 3 rus Techques:USAMP: uform samplg WSAMP: weghted samplg WSAMP+OTLIDX: weghted samplg + outler-dexg

19 Expermetal Results

20 Expermetal Results

21 Expermetal Results

22 Questos? Thak ou!

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

Econometric Methods. Review of Estimation

Econometric Methods. Review of Estimation Ecoometrc Methods Revew of Estmato Estmatg the populato mea Radom samplg Pot ad terval estmators Lear estmators Ubased estmators Lear Ubased Estmators (LUEs) Effcecy (mmum varace) ad Best Lear Ubased Estmators

More information

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model Lecture 7. Cofdece Itervals ad Hypothess Tests the Smple CLR Model I lecture 6 we troduced the Classcal Lear Regresso (CLR) model that s the radom expermet of whch the data Y,,, K, are the outcomes. The

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

To use adaptive cluster sampling we must first make some definitions of the sampling universe:

To use adaptive cluster sampling we must first make some definitions of the sampling universe: 8.3 ADAPTIVE SAMPLING Most of the methods dscussed samplg theory are lmted to samplg desgs hch the selecto of the samples ca be doe before the survey, so that oe of the decsos about samplg deped ay ay

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Postpoed exam: ECON430 Statstcs Date of exam: Jauary 0, 0 Tme for exam: 09:00 a.m. :00 oo The problem set covers 5 pages Resources allowed: All wrtte ad prted

More information

Summary of the lecture in Biostatistics

Summary of the lecture in Biostatistics Summary of the lecture Bostatstcs Probablty Desty Fucto For a cotuos radom varable, a probablty desty fucto s a fucto such that: 0 dx a b) b a dx A probablty desty fucto provdes a smple descrpto of the

More information

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution: Chapter 4 Exercses Samplg Theory Exercse (Smple radom samplg: Let there be two correlated radom varables X ad A sample of sze s draw from a populato by smple radom samplg wthout replacemet The observed

More information

Chapter 5 Properties of a Random Sample

Chapter 5 Properties of a Random Sample Lecture 6 o BST 63: Statstcal Theory I Ku Zhag, /0/008 Revew for the prevous lecture Cocepts: t-dstrbuto, F-dstrbuto Theorems: Dstrbutos of sample mea ad sample varace, relatoshp betwee sample mea ad sample

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information

= 1. UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Parameters and Statistics. Measures of Centrality

= 1. UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Parameters and Statistics. Measures of Centrality UCLA STAT Itroducto to Statstcal Methods for the Lfe ad Health Sceces Istructor: Ivo Dov, Asst. Prof. of Statstcs ad Neurology Teachg Assstats: Fred Phoa, Krste Johso, Mg Zheg & Matlda Hseh Uversty of

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then Secto 5 Vectors of Radom Varables Whe workg wth several radom varables,,..., to arrage them vector form x, t s ofte coveet We ca the make use of matrx algebra to help us orgaze ad mapulate large umbers

More information

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

Mean is only appropriate for interval or ratio scales, not ordinal or nominal. Mea Same as ordary average Sum all the data values ad dvde by the sample sze. x = ( x + x +... + x Usg summato otato, we wrte ths as x = x = x = = ) x Mea s oly approprate for terval or rato scales, ot

More information

LECTURE - 4 SIMPLE RANDOM SAMPLING DR. SHALABH DEPARTMENT OF MATHEMATICS AND STATISTICS INDIAN INSTITUTE OF TECHNOLOGY KANPUR

LECTURE - 4 SIMPLE RANDOM SAMPLING DR. SHALABH DEPARTMENT OF MATHEMATICS AND STATISTICS INDIAN INSTITUTE OF TECHNOLOGY KANPUR amplg Theory MODULE II LECTURE - 4 IMPLE RADOM AMPLIG DR. HALABH DEPARTMET OF MATHEMATIC AD TATITIC IDIA ITITUTE OF TECHOLOGY KAPUR Estmato of populato mea ad populato varace Oe of the ma objectves after

More information

4. Standard Regression Model and Spatial Dependence Tests

4. Standard Regression Model and Spatial Dependence Tests 4. Stadard Regresso Model ad Spatal Depedece Tests Stadard regresso aalss fals the presece of spatal effects. I case of spatal depedeces ad/or spatal heterogeet a stadard regresso model wll be msspecfed.

More information

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean Research Joural of Mathematcal ad Statstcal Sceces ISS 30 6047 Vol. 1(), 5-1, ovember (013) Res. J. Mathematcal ad Statstcal Sc. Comparso of Dual to Rato-Cum-Product Estmators of Populato Mea Abstract

More information

L5 Polynomial / Spline Curves

L5 Polynomial / Spline Curves L5 Polyomal / Sple Curves Cotets Coc sectos Polyomal Curves Hermte Curves Bezer Curves B-Sples No-Uform Ratoal B-Sples (NURBS) Mapulato ad Represetato of Curves Types of Curve Equatos Implct: Descrbe a

More information

( ) ( ) ( ) f ( ) ( )

( ) ( ) ( ) f ( ) ( ) Idepedet Mote Carlo Iterested Approxmate E X = x dν x = µ µ wth = = x where x, x s sampled rom the probablty, measure ν ( X ). Uder certa regularty codtos, I x,, x x E X = are a d sample rom ν ( X ) x

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

Chapter 3 Sampling For Proportions and Percentages

Chapter 3 Sampling For Proportions and Percentages Chapter 3 Samplg For Proportos ad Percetages I may stuatos, the characterstc uder study o whch the observatos are collected are qualtatve ature For example, the resposes of customers may marketg surveys

More information

The Mathematical Appendix

The Mathematical Appendix The Mathematcal Appedx Defto A: If ( Λ, Ω, where ( λ λ λ whch the probablty dstrbutos,,..., Defto A. uppose that ( Λ,,..., s a expermet type, the σ-algebra o λ λ λ are defed s deoted by ( (,,...,, σ Ω.

More information

Chapter -2 Simple Random Sampling

Chapter -2 Simple Random Sampling Chapter - Smple Radom Samplg Smple radom samplg (SRS) s a method of selecto of a sample comprsg of umber of samplg uts out of the populato havg umber of samplg uts such that every samplg ut has a equal

More information

Chapter 8: Statistical Analysis of Simulated Data

Chapter 8: Statistical Analysis of Simulated Data Marquette Uversty MSCS600 Chapter 8: Statstcal Aalyss of Smulated Data Dael B. Rowe, Ph.D. Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 08 by Marquette Uversty MSCS600 Ageda 8. The Sample

More information

Median as a Weighted Arithmetic Mean of All Sample Observations

Median as a Weighted Arithmetic Mean of All Sample Observations Meda as a Weghted Arthmetc Mea of All Sample Observatos SK Mshra Dept. of Ecoomcs NEHU, Shllog (Ida). Itroducto: Iumerably may textbooks Statstcs explctly meto that oe of the weakesses (or propertes) of

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

Section l h l Stem=Tens. 8l Leaf=Ones. 8h l 03. 9h 58

Section l h l Stem=Tens. 8l Leaf=Ones. 8h l 03. 9h 58 Secto.. 6l 34 6h 667899 7l 44 7h Stem=Tes 8l 344 Leaf=Oes 8h 5557899 9l 3 9h 58 Ths dsplay brgs out the gap the data: There are o scores the hgh 7's. 6. a. beams cylders 9 5 8 88533 6 6 98877643 7 488

More information

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015 Fall 05 Homework : Solutos Problem : (Practce wth Asymptotc Notato) A essetal requremet for uderstadg scalg behavor s comfort wth asymptotc (or bg-o ) otato. I ths problem, you wll prove some basc facts

More information

Chapter -2 Simple Random Sampling

Chapter -2 Simple Random Sampling Chapter - Smple Radom Samplg Smple radom samplg (SRS) s a method of selecto of a sample comprsg of umber of samplg uts out of the populato havg umber of samplg uts such that every samplg ut has a equal

More information

STA302/1001-Fall 2008 Midterm Test October 21, 2008

STA302/1001-Fall 2008 Midterm Test October 21, 2008 STA3/-Fall 8 Mdterm Test October, 8 Last Name: Frst Name: Studet Number: Erolled (Crcle oe) STA3 STA INSTRUCTIONS Tme allowed: hour 45 mutes Ads allowed: A o-programmable calculator A table of values from

More information

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations HP 30S Statstcs Averages ad Stadard Devatos Average ad Stadard Devato Practce Fdg Averages ad Stadard Devatos HP 30S Statstcs Averages ad Stadard Devatos Average ad stadard devato The HP 30S provdes several

More information

Multiple Choice Test. Chapter Adequacy of Models for Regression

Multiple Choice Test. Chapter Adequacy of Models for Regression Multple Choce Test Chapter 06.0 Adequac of Models for Regresso. For a lear regresso model to be cosdered adequate, the percetage of scaled resduals that eed to be the rage [-,] s greater tha or equal to

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER II STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for

More information

Lecture Notes Forecasting the process of estimating or predicting unknown situations

Lecture Notes Forecasting the process of estimating or predicting unknown situations Lecture Notes. Ecoomc Forecastg. Forecastg the process of estmatg or predctg ukow stuatos Eample usuall ecoomsts predct future ecoomc varables Forecastg apples to a varet of data () tme seres data predctg

More information

BIOREPS Problem Set #11 The Evolution of DNA Strands

BIOREPS Problem Set #11 The Evolution of DNA Strands BIOREPS Problem Set #11 The Evoluto of DNA Strads 1 Backgroud I the md 2000s, evolutoary bologsts studyg DNA mutato rates brds ad prmates dscovered somethg surprsg. There were a large umber of mutatos

More information

The expected value of a sum of random variables,, is the sum of the expected values:

The expected value of a sum of random variables,, is the sum of the expected values: Sums of Radom Varables xpected Values ad Varaces of Sums ad Averages of Radom Varables The expected value of a sum of radom varables, say S, s the sum of the expected values: ( ) ( ) S Ths s always true

More information

Functions of Random Variables

Functions of Random Variables Fuctos of Radom Varables Chapter Fve Fuctos of Radom Varables 5. Itroducto A geeral egeerg aalyss model s show Fg. 5.. The model output (respose) cotas the performaces of a system or product, such as weght,

More information

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67.

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67. Ecoomcs 3 Itroducto to Ecoometrcs Sprg 004 Professor Dobk Name Studet ID Frst Mdterm Exam You must aswer all the questos. The exam s closed book ad closed otes. You may use your calculators but please

More information

Some Notes on the Probability Space of Statistical Surveys

Some Notes on the Probability Space of Statistical Surveys Metodološk zvezk, Vol. 7, No., 200, 7-2 ome Notes o the Probablty pace of tatstcal urveys George Petrakos Abstract Ths paper troduces a formal presetato of samplg process usg prcples ad cocepts from Probablty

More information

Handout #1. Title: Foundations of Econometrics. POPULATION vs. SAMPLE

Handout #1. Title: Foundations of Econometrics. POPULATION vs. SAMPLE Hadout #1 Ttle: Foudatos of Ecoometrcs Course: Eco 367 Fall/015 Istructor: Dr. I-Mg Chu POPULATION vs. SAMPLE From the Bureau of Labor web ste (http://www.bls.gov), we ca fd the uemploymet rate for each

More information

Analyzing Two-Dimensional Data. Analyzing Two-Dimensional Data

Analyzing Two-Dimensional Data. Analyzing Two-Dimensional Data /7/06 Aalzg Two-Dmesoal Data The most commo aaltcal measuremets volve the determato of a ukow cocetrato based o the respose of a aaltcal procedure (usuall strumetal). Such a measuremet requres calbrato,

More information

2SLS Estimates ECON In this case, begin with the assumption that E[ i

2SLS Estimates ECON In this case, begin with the assumption that E[ i SLS Estmates ECON 3033 Bll Evas Fall 05 Two-Stage Least Squares (SLS Cosder a stadard lear bvarate regresso model y 0 x. I ths case, beg wth the assumto that E[ x] 0 whch meas that OLS estmates of wll

More information

PTAS for Bin-Packing

PTAS for Bin-Packing CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,

More information

1 Onto functions and bijections Applications to Counting

1 Onto functions and bijections Applications to Counting 1 Oto fuctos ad bectos Applcatos to Coutg Now we move o to a ew topc. Defto 1.1 (Surecto. A fucto f : A B s sad to be surectve or oto f for each b B there s some a A so that f(a B. What are examples of

More information

ε. Therefore, the estimate

ε. Therefore, the estimate Suggested Aswers, Problem Set 3 ECON 333 Da Hugerma. Ths s ot a very good dea. We kow from the secod FOC problem b) that ( ) SSE / = y x x = ( ) Whch ca be reduced to read y x x = ε x = ( ) The OLS model

More information

Introduction to local (nonparametric) density estimation. methods

Introduction to local (nonparametric) density estimation. methods Itroducto to local (oparametrc) desty estmato methods A slecture by Yu Lu for ECE 66 Sprg 014 1. Itroducto Ths slecture troduces two local desty estmato methods whch are Parze desty estmato ad k-earest

More information

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy Bouds o the expected etropy ad KL-dvergece of sampled multomal dstrbutos Brado C. Roy bcroy@meda.mt.edu Orgal: May 18, 2011 Revsed: Jue 6, 2011 Abstract Iformato theoretc quattes calculated from a sampled

More information

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen.

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen. .5 x 54.5 a. x 7. 786 7 b. The raked observatos are: 7.4, 7.5, 7.7, 7.8, 7.9, 8.0, 8.. Sce the sample sze 7 s odd, the meda s the (+)/ 4 th raked observato, or meda 7.8 c. The cosumer would more lkely

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Radom Varables ad Probablty Dstrbutos * If X : S R s a dscrete radom varable wth rage {x, x, x 3,. } the r = P (X = xr ) = * Let X : S R be a dscrete radom varable wth rage {x, x, x 3,.}.If x r P(X = x

More information

Measures of Dispersion

Measures of Dispersion Chapter 8 Measures of Dsperso Defto of Measures of Dsperso (page 31) A measure of dsperso s a descrptve summary measure that helps us characterze the data set terms of how vared the observatos are from

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Marquette Uverst Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Coprght 08 b Marquette Uverst Maxmum Lkelhood Estmato We have bee sag that ~

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

Chapter Two. An Introduction to Regression ( )

Chapter Two. An Introduction to Regression ( ) ubject: A Itroducto to Regresso Frst tage Chapter Two A Itroducto to Regresso (018-019) 1 pg. ubject: A Itroducto to Regresso Frst tage A Itroducto to Regresso Regresso aalss s a statstcal tool for the

More information

Outline. Point Pattern Analysis Part I. Revisit IRP/CSR

Outline. Point Pattern Analysis Part I. Revisit IRP/CSR Pot Patter Aalyss Part I Outle Revst IRP/CSR, frst- ad secod order effects What s pot patter aalyss (PPA)? Desty-based pot patter measures Dstace-based pot patter measures Revst IRP/CSR Equal probablty:

More information

to the estimation of total sensitivity indices

to the estimation of total sensitivity indices Applcato of the cotrol o varate ate techque to the estmato of total sestvty dces S KUCHERENKO B DELPUECH Imperal College Lodo (UK) skuchereko@mperalacuk B IOOSS Electrcté de Frace (Frace) S TARANTOLA Jot

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

Introduction to Matrices and Matrix Approach to Simple Linear Regression

Introduction to Matrices and Matrix Approach to Simple Linear Regression Itroducto to Matrces ad Matrx Approach to Smple Lear Regresso Matrces Defto: A matrx s a rectagular array of umbers or symbolc elemets I may applcatos, the rows of a matrx wll represet dvduals cases (people,

More information

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set.

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set. Addtoal Decrease ad Coquer Algorthms For combatoral problems we mght eed to geerate all permutatos, combatos, or subsets of a set. Geeratg Permutatos If we have a set f elemets: { a 1, a 2, a 3, a } the

More information

Regresso What s a Model? 1. Ofte Descrbe Relatoshp betwee Varables 2. Types - Determstc Models (o radomess) - Probablstc Models (wth radomess) EPI 809/Sprg 2008 9 Determstc Models 1. Hypothesze

More information

Answer key to problem set # 2 ECON 342 J. Marcelo Ochoa Spring, 2009

Answer key to problem set # 2 ECON 342 J. Marcelo Ochoa Spring, 2009 Aswer key to problem set # ECON 34 J. Marcelo Ochoa Sprg, 009 Problem. For T cosder the stadard pael data model: y t x t β + α + ǫ t a Numercally compare the fxed effect ad frst dfferece estmates. b Compare

More information

Block-Based Compact Thermal Modeling of Semiconductor Integrated Circuits

Block-Based Compact Thermal Modeling of Semiconductor Integrated Circuits Block-Based Compact hermal Modelg of Semcoductor Itegrated Crcuts Master s hess Defese Caddate: Jg Ba Commttee Members: Dr. Mg-Cheg Cheg Dr. Daqg Hou Dr. Robert Schllg July 27, 2009 Outle Itroducto Backgroud

More information

Chapter 2 - Free Vibration of Multi-Degree-of-Freedom Systems - II

Chapter 2 - Free Vibration of Multi-Degree-of-Freedom Systems - II CEE49b Chapter - Free Vbrato of Mult-Degree-of-Freedom Systems - II We ca obta a approxmate soluto to the fudametal atural frequecy through a approxmate formula developed usg eergy prcples by Lord Raylegh

More information

Pseudo-random Functions

Pseudo-random Functions Pseudo-radom Fuctos Debdeep Mukhopadhyay IIT Kharagpur We have see the costructo of PRG (pseudo-radom geerators) beg costructed from ay oe-way fuctos. Now we shall cosder a related cocept: Pseudo-radom

More information

Lecture 8: Linear Regression

Lecture 8: Linear Regression Lecture 8: Lear egresso May 4, GENOME 56, Sprg Goals Develop basc cocepts of lear regresso from a probablstc framework Estmatg parameters ad hypothess testg wth lear models Lear regresso Su I Lee, CSE

More information

Continuous Distributions

Continuous Distributions 7//3 Cotuous Dstrbutos Radom Varables of the Cotuous Type Desty Curve Percet Desty fucto, f (x) A smooth curve that ft the dstrbuto 3 4 5 6 7 8 9 Test scores Desty Curve Percet Probablty Desty Fucto, f

More information

MEASURES OF DISPERSION

MEASURES OF DISPERSION MEASURES OF DISPERSION Measure of Cetral Tedecy: Measures of Cetral Tedecy ad Dsperso ) Mathematcal Average: a) Arthmetc mea (A.M.) b) Geometrc mea (G.M.) c) Harmoc mea (H.M.) ) Averages of Posto: a) Meda

More information

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uverst Mchael Bar Sprg 5 Mdterm xam, secto Soluto Thursda, Februar 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes exam.. No calculators of a kd are allowed..

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

Pseudo-random Functions. PRG vs PRF

Pseudo-random Functions. PRG vs PRF Pseudo-radom Fuctos Debdeep Muhopadhyay IIT Kharagpur PRG vs PRF We have see the costructo of PRG (pseudo-radom geerators) beg costructed from ay oe-way fuctos. Now we shall cosder a related cocept: Pseudo-radom

More information

The Selection Problem - Variable Size Decrease/Conquer (Practice with algorithm analysis)

The Selection Problem - Variable Size Decrease/Conquer (Practice with algorithm analysis) We have covered: Selecto, Iserto, Mergesort, Bubblesort, Heapsort Next: Selecto the Qucksort The Selecto Problem - Varable Sze Decrease/Coquer (Practce wth algorthm aalyss) Cosder the problem of fdg the

More information

9.1 Introduction to the probit and logit models

9.1 Introduction to the probit and logit models EC3000 Ecoometrcs Lecture 9 Probt & Logt Aalss 9. Itroducto to the probt ad logt models 9. The logt model 9.3 The probt model Appedx 9. Itroducto to the probt ad logt models These models are used regressos

More information

Sequential Approach to Covariance Correction for P-Field Simulation

Sequential Approach to Covariance Correction for P-Field Simulation Sequetal Approach to Covarace Correcto for P-Feld Smulato Chad Neufeld ad Clayto V. Deutsch Oe well kow artfact of the probablty feld (p-feld smulato algorthm s a too large covarace ear codtog data. Prevously,

More information

ESS Line Fitting

ESS Line Fitting ESS 5 014 17. Le Fttg A very commo problem data aalyss s lookg for relatoshpetwee dfferet parameters ad fttg les or surfaces to data. The smplest example s fttg a straght le ad we wll dscuss that here

More information

Simple Linear Regression

Simple Linear Regression Statstcal Methods I (EST 75) Page 139 Smple Lear Regresso Smple regresso applcatos are used to ft a model descrbg a lear relatoshp betwee two varables. The aspects of least squares regresso ad correlato

More information

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek Partally Codtoal Radom Permutato Model 7- vestgato of Partally Codtoal RP Model wth Respose Error TRODUCTO Ed Staek We explore the predctor that wll result a smple radom sample wth respose error whe a

More information

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model ECON 48 / WH Hog The Smple Regresso Model. Defto of the Smple Regresso Model Smple Regresso Model Expla varable y terms of varable x y = β + β x+ u y : depedet varable, explaed varable, respose varable,

More information

Chapter 11 Systematic Sampling

Chapter 11 Systematic Sampling Chapter stematc amplg The sstematc samplg techue s operatoall more coveet tha the smple radom samplg. It also esures at the same tme that each ut has eual probablt of cluso the sample. I ths method of

More information

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b CS 70 Dscrete Mathematcs ad Probablty Theory Fall 206 Sesha ad Walrad DIS 0b. Wll I Get My Package? Seaky delvery guy of some compay s out delverg packages to customers. Not oly does he had a radom package

More information

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights CIS 800/002 The Algorthmc Foudatos of Data Prvacy October 13, 2011 Lecturer: Aaro Roth Lecture 9 Scrbe: Aaro Roth Database Update Algorthms: Multplcatve Weghts We ll recall aga) some deftos from last tme:

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

Idea is to sample from a different distribution that picks points in important regions of the sample space. Want ( ) ( ) ( ) E f X = f x g x dx

Idea is to sample from a different distribution that picks points in important regions of the sample space. Want ( ) ( ) ( ) E f X = f x g x dx Importace Samplg Used for a umber of purposes: Varace reducto Allows for dffcult dstrbutos to be sampled from. Sestvty aalyss Reusg samples to reduce computatoal burde. Idea s to sample from a dfferet

More information

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 THE ROYAL STATISTICAL SOCIETY 06 EAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 The Socety s provdg these solutos to assst cadtes preparg for the examatos 07. The solutos are teded as learg ads ad should

More information

Dimensionality Reduction and Learning

Dimensionality Reduction and Learning CMSC 35900 (Sprg 009) Large Scale Learg Lecture: 3 Dmesoalty Reducto ad Learg Istructors: Sham Kakade ad Greg Shakharovch L Supervsed Methods ad Dmesoalty Reducto The theme of these two lectures s that

More information

STK4011 and STK9011 Autumn 2016

STK4011 and STK9011 Autumn 2016 STK4 ad STK9 Autum 6 Pot estmato Covers (most of the followg materal from chapter 7: Secto 7.: pages 3-3 Secto 7..: pages 3-33 Secto 7..: pages 35-3 Secto 7..3: pages 34-35 Secto 7.3.: pages 33-33 Secto

More information

ENGI 4421 Propagation of Error Page 8-01

ENGI 4421 Propagation of Error Page 8-01 ENGI 441 Propagato of Error Page 8-01 Propagato of Error [Navd Chapter 3; ot Devore] Ay realstc measuremet procedure cotas error. Ay calculatos based o that measuremet wll therefore also cota a error.

More information

Convergence of the Desroziers scheme and its relation to the lag innovation diagnostic

Convergence of the Desroziers scheme and its relation to the lag innovation diagnostic Covergece of the Desrozers scheme ad ts relato to the lag ovato dagostc chard Méard Evromet Caada, Ar Qualty esearch Dvso World Weather Ope Scece Coferece Motreal, August 9, 04 o t t O x x x y x y Oservato

More information

f f... f 1 n n (ii) Median : It is the value of the middle-most observation(s).

f f... f 1 n n (ii) Median : It is the value of the middle-most observation(s). CHAPTER STATISTICS Pots to Remember :. Facts or fgures, collected wth a defte pupose, are called Data.. Statstcs s the area of study dealg wth the collecto, presetato, aalyss ad terpretato of data.. The

More information

Wendy Korn, Moon Chang (IBM) ACM SIGARCH Computer Architecture News Vol. 35, No. 1, March 2007

Wendy Korn, Moon Chang (IBM) ACM SIGARCH Computer Architecture News Vol. 35, No. 1, March 2007 CPU 2006 Sestvty to Memory Page Szes Wedy Kor, Moo Chag (IBM) ACM SIGARCH Computer Archtecture News Vol. 35, No. 1, March 2007 Memory usage 1. Mmum ad Maxmum memory used 2. Sestvty to page szes 3. 4K,

More information

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Lecture 2: The Simple Regression Model

Lecture 2: The Simple Regression Model Lectre Notes o Advaced coometrcs Lectre : The Smple Regresso Model Takash Yamao Fall Semester 5 I ths lectre we revew the smple bvarate lear regresso model. We focs o statstcal assmptos to obta based estmators.

More information

A Study of the Reproducibility of Measurements with HUR Leg Extension/Curl Research Line

A Study of the Reproducibility of Measurements with HUR Leg Extension/Curl Research Line HUR Techcal Report 000--9 verso.05 / Frak Borg (borgbros@ett.f) A Study of the Reproducblty of Measuremets wth HUR Leg Eteso/Curl Research Le A mportat property of measuremets s that the results should

More information

1 Mixed Quantum State. 2 Density Matrix. CS Density Matrices, von Neumann Entropy 3/7/07 Spring 2007 Lecture 13. ψ = α x x. ρ = p i ψ i ψ i.

1 Mixed Quantum State. 2 Density Matrix. CS Density Matrices, von Neumann Entropy 3/7/07 Spring 2007 Lecture 13. ψ = α x x. ρ = p i ψ i ψ i. CS 94- Desty Matrces, vo Neuma Etropy 3/7/07 Sprg 007 Lecture 3 I ths lecture, we wll dscuss the bascs of quatum formato theory I partcular, we wll dscuss mxed quatum states, desty matrces, vo Neuma etropy

More information

COV. Violation of constant variance of ε i s but they are still independent. The error term (ε) is said to be heteroscedastic.

COV. Violation of constant variance of ε i s but they are still independent. The error term (ε) is said to be heteroscedastic. c Pogsa Porchawseskul, Faculty of Ecoomcs, Chulalogkor Uversty olato of costat varace of s but they are stll depedet. C,, he error term s sad to be heteroscedastc. c Pogsa Porchawseskul, Faculty of Ecoomcs,

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY 3 EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER I STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad

More information