Sketching Sampled Data Streams
- Erin Green
- 6 years ago
Transcription
1 Sketching Sampled Data Streams
Florin Rusu and Alin Dobra, CISE Department, University of Florida. March 31, 2009
2 Motivation & Goal
Motivation
- Multicore processors: how to use all the processing power?
- Parallel algorithms
- Side tasks (analytical, exploratory)
Goal
- Analyze data at wire speed: single pass, small memory
- Skew of a relation read from the disk
- Correlation between flows passing through a high-speed router
3 Class of Queries: Aggregates over Joins
- Equi-join $\bowtie$ between relations F and G with the join constraint F.a = G.a
- Queries specified by a join and an aggregate: COUNT, SUM
- Size of join (dot product), self-join size (second frequency moment)
Example
- Stream F: a = 1, 1, 2, 3, 1, 3; frequency vector $f = [3\ 1\ 2]$
- Stream G: a = 3, 1, 3, 1, 1; frequency vector $g = [3\ 0\ 2]$
- $\mathrm{COUNT}(F \bowtie_a G) = f g^T = \sum_{i} f_i g_i = [3\ 1\ 2]\,[3\ 0\ 2]^T = 13$
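The dot-product view of the join size can be checked with a few lines of Python. This is only an illustrative sketch: the stream values are the ones implied by the sums on the sketch-maintenance slide, and the helper names `frequency_vector` and `join_size` are hypothetical, not from the talk.

```python
from collections import Counter

def frequency_vector(stream, domain):
    """Count how often each domain value occurs in the stream."""
    counts = Counter(stream)
    return [counts[v] for v in domain]

def join_size(f, g):
    """COUNT of the equi-join = dot product of the frequency vectors."""
    return sum(fi * gi for fi, gi in zip(f, g))

domain = [1, 2, 3]
F = [1, 1, 2, 3, 1, 3]   # assumed stream F
G = [3, 1, 3, 1, 1]      # assumed stream G

f = frequency_vector(F, domain)
g = frequency_vector(G, domain)
count_join = join_size(f, g)     # size of join
self_join_f = join_size(f, f)    # second frequency moment of F
```

The self-join size is the same dot product applied to a vector and itself.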
4 Main Idea: Basic Sketching Technique [AGMS99]
- Summarize the frequency table by projecting it onto a random vector: the sketch
- Use sketches to recover the query result
Random vectors: $\xi = [\xi_1 \ldots \xi_n]$ is a random vector of ±1 values, called the ξ family
Sketches:
- Sketch of F: $X_F = f\xi^T$
- Sketch of G: $X_G = g\xi^T$
- $X = X_F X_G$ estimates $\mathrm{COUNT}(F \bowtie_a G)$, since $E[X] = E[f\xi^T \xi g^T] = f\,E[\xi^T\xi]\,g^T = f I g^T = f g^T$ if $E[\xi^T\xi] = I$
- Distinct elements of ξ must be pairwise independent: $\xi_i^2 = 1$ and $E[\xi_i \xi_j] = 0$ for $i \neq j$
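5 Basic Sketching Technique
Example: $\xi = [\xi_1\ \xi_2\ \xi_3] = [-1\ {+1}\ {-1}]$
- $X_F = f\xi^T = [3\ 1\ 2]\,[-1\ {+1}\ {-1}]^T = -4$
- $X_G = g\xi^T = [3\ 0\ 2]\,[-1\ {+1}\ {-1}]^T = -5$
- $X = X_F X_G = (-4)(-5) = 20$
Error of the estimate X is due to its variance:
- $\mathrm{Var}[X] = \sum_{i \in I} f_i^2 \sum_{j \in I} g_j^2 + \left(\sum_{i \in I} f_i g_i\right)^2 - 2\sum_{i \in I} f_i^2 g_i^2$
- $\mathrm{Var}(X) \le 2\, f f^T g g^T = 2\,\mathrm{SJ}(F)\,\mathrm{SJ}(G)$
- ξ is a family of 4-wise independent random variables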
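As a sanity check on the variance formula, the expectation and variance of X can be computed exactly for a three-value domain by enumerating all eight ±1 vectors (fully independent signs are in particular 4-wise independent). The vectors f = [3, 1, 2] and g = [3, 0, 2] are assumed from the running example; the function name `sketch` is illustrative.

```python
from itertools import product

f = [3, 1, 2]   # assumed frequency vector of F
g = [3, 0, 2]   # assumed frequency vector of G

def sketch(freq, xi):
    """Project a frequency vector onto a +/-1 vector."""
    return sum(c * s for c, s in zip(freq, xi))

# The single draw used in the example.
xi = (-1, +1, -1)
x_f = sketch(f, xi)
x_g = sketch(g, xi)
x = x_f * x_g

# Exact moments over all 2^3 equally likely sign vectors.
values = [sketch(f, s) * sketch(g, s) for s in product((-1, 1), repeat=3)]
mean = sum(values) / len(values)
var = sum(v * v for v in values) / len(values) - mean ** 2

# Closed form: sum f_i^2 * sum g_j^2 + (sum f_i g_i)^2 - 2 * sum f_i^2 g_i^2
closed_form = (sum(a * a for a in f) * sum(b * b for b in g)
               + sum(a * b for a, b in zip(f, g)) ** 2
               - 2 * sum(a * a * b * b for a, b in zip(f, g)))
bound = 2 * sum(a * a for a in f) * sum(b * b for b in g)
```

A single draw (X = 20 here) can be far from the true join size of 13, which is why the variance matters.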
6 Sketch Maintenance over Streams
Choose a seed s that generates ξ
Stream F: a = 1, 1, 2, 3, 1, 3; frequency vector $f = [3\ 1\ 2]$
$X_F = f\xi^T = \sum_{i \in I} f_i \xi_i = \sum_{i \in I} (\xi_i + \cdots + \xi_i) = \sum_{t \in F} \xi_{t.a} = \xi_1 + \xi_1 + \xi_2 + \xi_3 + \xi_1 + \xi_3 = h(s,1) + h(s,1) + h(s,2) + h(s,3) + h(s,1) + h(s,3)$
Stream G: a = 3, 1, 3, 1, 1; frequency vector $g = [3\ 0\ 2]$
$X_G = g\xi^T = \sum_{i \in I} g_i \xi_i = \sum_{i \in I} (\xi_i + \cdots + \xi_i) = \sum_{t \in G} \xi_{t.a} = \xi_3 + \xi_1 + \xi_3 + \xi_1 + \xi_1 = h(s,3) + h(s,1) + h(s,3) + h(s,1) + h(s,1)$
Counters $X_F$ and $X_G$ need only log space in the size of the stream
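The per-tuple maintenance can be sketched as a counter that adds $h(s, t.a)$ for each arriving tuple. In the sketch below, `h` is a stand-in built on SHA-256 purely for illustration: it is deterministic in the seed, but it is not one of the 4-wise independent generators the analysis requires. The class name `AGMSCounter` is hypothetical.

```python
import hashlib

def h(seed, key):
    """Illustrative +/-1 generator from a seed (NOT a 4-wise independent scheme)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return 1 if digest[0] & 1 else -1

class AGMSCounter:
    """A single sketch counter X maintained over a stream of keys."""

    def __init__(self, seed):
        self.seed = seed
        self.value = 0

    def update(self, key, weight=1):
        # Each tuple contributes xi_key; a negative weight models a deletion.
        self.value += weight * h(self.seed, key)

F = [1, 1, 2, 3, 1, 3]          # assumed stream F
counter = AGMSCounter(seed=42)
for a in F:
    counter.update(a)
```

Only the seed and one integer are stored, so the state stays small however long the stream is.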
7 Sketch Error Reduction
Estimation of $\mathrm{COUNT}(F \bowtie_a G)$ from single sketches of F and G is too noisy
Solution:
- Average $\frac{8\,\mathrm{Var}(X)}{\varepsilon^2 E^2[X]}$ independent copies of X to reduce the error to ε
- Compute the median of $2\log(1/\delta)$ such averages to increase the confidence to $1 - \delta$
[Diagram: streams F and G are each summarized by sketches generated from shared seeds; the independent copies of X are averaged, and the median of the averages gives $\mathrm{COUNT}(F \bowtie G)(1 \pm \varepsilon)$ with probability $1 - \delta$.]
Memory required is independent of the size of the stream
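The averaging and median steps can be sketched as follows. The slide does not state the base of the logarithm, so base 2 is an assumption here, and all function names are illustrative.

```python
import math
from statistics import mean, median

def copies_to_average(var_x, mean_x, eps):
    """Number of independent copies per average: 8 Var(X) / (eps^2 E[X]^2)."""
    return math.ceil(8 * var_x / (eps ** 2 * mean_x ** 2))

def averages_for_median(delta):
    """Number of averages fed to the median: 2 log(1/delta), base 2 assumed."""
    return math.ceil(2 * math.log2(1 / delta))

def median_of_means(estimates, group_size):
    """Split the basic estimates into groups, average each, take the median."""
    groups = [estimates[i:i + group_size]
              for i in range(0, len(estimates), group_size)]
    return median(mean(group) for group in groups)

# Toy input: the middle group contains outliers that the median discards.
combined = median_of_means([1, 2, 3, 10, 20, 30, 4, 5, 6], group_size=3)
```

Averaging reduces the variance, while the median protects against the occasional average that is still far off.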
8 Speed-Up Methods
Hashing
- Fast-AGMS sketches are faster and have better accuracy
Pseudo-random number generating schemes
- EH3 is as good as any 4-wise scheme, and it is faster and denser
9 Fast-AGMS Sketches [CG05]
Randomization
- Vector h of 2-universal hash functions, $h_i : I \to B$
- Vector ξ of 4-wise independent ±1 random variables, $\xi_i : I \to \{-1, +1\}$
Update (k, w)
- $x_i[h_i(k)] = x_i[h_i(k)] + w\,\xi_i(k)$
- Time: number of rows m
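A minimal sketch of the update rule, assuming m rows of B buckets. The hash and sign functions are SHA-256 stand-ins chosen only to keep the example self-contained; they are not 2-universal or 4-wise independent constructions, and the class name `FastAGMS` is hypothetical.

```python
import hashlib

def _digest(row, key):
    # Illustrative source of per-row randomness (an assumption, not EH3).
    return hashlib.sha256(f"{row}:{key}".encode()).digest()

class FastAGMS:
    def __init__(self, rows, buckets):
        self.rows = rows
        self.buckets = buckets
        self.x = [[0] * buckets for _ in range(rows)]

    def bucket(self, row, key):
        """Stand-in for h_i(k): the bucket that key k maps to in row i."""
        return int.from_bytes(_digest(row, key)[:8], "big") % self.buckets

    def sign(self, row, key):
        """Stand-in for xi_i(k): a +/-1 value for key k in row i."""
        return 1 if _digest(row, key)[8] & 1 else -1

    def update(self, key, weight=1):
        # One counter per row is touched, so the cost is proportional to rows.
        for i in range(self.rows):
            self.x[i][self.bucket(i, key)] += weight * self.sign(i, key)

sketch = FastAGMS(rows=3, buckets=8)
sketch.update(7, 5)
```

Unlike the basic scheme, an update touches one bucket per row rather than every counter.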
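10 Size of Join Estimator: Fast-AGMS Sketches [CG05]
- $E[Z] = \bar{f} \cdot \bar{g} = |F \bowtie G|$
- $Z \in \left(\bar{f} \cdot \bar{g} \pm \varepsilon\, \|\bar{f}\|_2 \|\bar{g}\|_2\right)$ with probability at least $1 - \delta$
- $\|\bar{f}\|_2^2 = \bar{f} \cdot \bar{f} = \sum_i f_i^2$, $\|\bar{g}\|_2^2 = \bar{g} \cdot \bar{g} = \sum_i g_i^2$
- Sketch size: $B = n = O\!\left(\frac{1}{\varepsilon^2}\right)$ and $m = O\!\left(\log \frac{1}{\delta}\right)$
- $X_{ij} = x_{ij}(F)\, x_{ij}(G)$; $Y_i = \sum_{j=1}^{n} X_{ij}$; Z = median of the $Y_i$
[Diagram: the sketches x(F) and x(G), built with the same h and ξ, are multiplied bucket by bucket, summed per row, and combined by the median.]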
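The estimator itself is a row-wise dot product followed by a median. The sketch below takes the two sketch arrays as plain lists of rows; the function name and the toy numbers are illustrative only.

```python
from statistics import median

def estimate_join_size(x_f, x_g):
    """Z = median over rows i of Y_i = sum_j x_ij(F) * x_ij(G)."""
    row_sums = [sum(a * b for a, b in zip(row_f, row_g))
                for row_f, row_g in zip(x_f, x_g)]
    return median(row_sums)

# Toy sketches with m = 3 rows and n = 2 buckets (made-up counters).
x_f = [[1, 2], [0, 3], [2, 2]]
x_g = [[3, 1], [4, 1], [1, 1]]
z = estimate_join_size(x_f, x_g)   # row sums are 5, 3, 4
```

Both sketches must be built with the same hash and sign functions for the bucket-wise products to be meaningful.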
11 Update Time
Setup: sketch size (row = 1), Xeon 2.8 GHz, 512 KB cache, 4 GB main memory
[Plot: time per sketch update (ns) versus bucket size (log scale) for F-AGMS, FC and CM sketches.]
- 100 ns / sketch × 10 sketches = 1 µs per tuple
- 1 million integers / second = 4 MB / second
- Desired rate: 100 MB / second = 25X
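The throughput arithmetic can be spelled out with integers. The 4-byte integer size is an assumption, implied by equating 1 million integers per second with 4 MB per second.

```python
NS_PER_SKETCH_UPDATE = 100      # from the slide
SKETCHES_PER_TUPLE = 10         # from the slide
BYTES_PER_INTEGER = 4           # assumption: 32-bit integers
DESIRED_BYTES_PER_SECOND = 100 * 10**6

ns_per_tuple = NS_PER_SKETCH_UPDATE * SKETCHES_PER_TUPLE    # 1 microsecond
tuples_per_second = 10**9 // ns_per_tuple
bytes_per_second = tuples_per_second * BYTES_PER_INTEGER
required_speedup = DESIRED_BYTES_PER_SECOND // bytes_per_second
```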
12 Sampling
Stream F: a = 1, 1, 2, 3, 1, 3; frequency vector $f = [3\ 1\ 2]$
Sample F': a = 1, 3, 1; sampled frequency vector $f' = [2\ 0\ 1]$
Stream G: a = 3, 1, 3, 1, 1; frequency vector $g = [3\ 0\ 2]$
Sample G': a = 3, 1; sampled frequency vector $g' = [1\ 0\ 1]$
$X = C\, f' g'^T = C\,[2\ 0\ 1]\,[1\ 0\ 1]^T = 3C$
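13 Sampling
Sample at the tuple level, analyze in the frequency domain
Random frequency vector (moment generating function):
- Bernoulli: Binomial
- With replacement (WR): Multinomial
- Without replacement (WOR): Multivariate hypergeometric
Estimator and its moments:
- $X = C \sum_{i \in I} f'_i g'_i$
- $E[X] = C \sum_{i \in I} E[f'_i]\,E[g'_i]$
- $\mathrm{Var}[X] = C^2 \left( \sum_{i \in I} \sum_{j \in I} E[f'_i f'_j]\,E[g'_i g'_j] - \left( \sum_{i \in I} E[f'_i]\,E[g'_i] \right)^2 \right)$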
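A small sketch of tuple-level Bernoulli sampling and the scaled estimator. The scaling constant is taken to be C = 1/(pq), an assumption that makes the estimator unbiased under Bernoulli sampling; the slide leaves C unspecified. Function names are illustrative.

```python
import random
from collections import Counter

def bernoulli_sample(stream, p, rng):
    """Keep each tuple independently with probability p."""
    return [t for t in stream if rng.random() < p]

def estimate_join_size(sample_f, sample_g, p, q):
    """X = C * sum_i f'_i g'_i with C = 1 / (p q) (assumed scaling)."""
    cf, cg = Counter(sample_f), Counter(sample_g)
    dot = sum(cf[k] * cg[k] for k in cf)
    return dot / (p * q)

F = [1, 1, 2, 3, 1, 3]   # assumed stream F
G = [3, 1, 3, 1, 1]      # assumed stream G

rng = random.Random(0)
sample_f = bernoulli_sample(F, 0.5, rng)
sample_g = bernoulli_sample(G, 0.5, rng)
estimate = estimate_join_size(sample_f, sample_g, 0.5, 0.5)

# The fixed samples from the example slide: f' = [2 0 1], g' = [1 0 1].
example = estimate_join_size([1, 3, 1], [3, 1], 0.5, 0.5)
```

With p = q = 1 nothing is dropped and the estimator returns the exact join size.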
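14 Sketches over Sampled Streams
Stream F: a = 1, 1, 2, 3, 1, 3; frequency vector $f = [3\ 1\ 2]$
Sample F': a = 1, 3, 1; sampled frequency vector $f' = [2\ 0\ 1]$
Stream G: a = 3, 1, 3, 1, 1; frequency vector $g = [3\ 0\ 2]$
Sample G': a = 3, 1; sampled frequency vector $g' = [1\ 0\ 1]$
$\xi = [\xi_1\ \xi_2\ \xi_3] = [-1\ {+1}\ {-1}]$
- $X_{F'} = f' \xi^T = [2\ 0\ 1]\,[-1\ {+1}\ {-1}]^T = -3$
- $X_{G'} = g' \xi^T = [1\ 0\ 1]\,[-1\ {+1}\ {-1}]^T = -2$
- $X = C\, X_{F'} X_{G'} = C\,(-3)(-2) = 6C$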
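15 Sketching a Sample
Build the sketch over the non-materialized sample
- $X = C \sum_{i \in I} f'_i \xi_i \sum_{j \in I} g'_j \xi_j$
- $E[X] = C \sum_{i \in I} E[f'_i]\,E[g'_i]$
- $\mathrm{Var}[X] = C^2 \left[ \sum_{i \in I} E[f'^2_i] \sum_{j \in I} E[g'^2_j] + 2 \sum_{i \in I} \sum_{j \in I} E[f'_i f'_j]\,E[g'_i g'_j] - 2 \sum_{i \in I} E[f'^2_i]\,E[g'^2_i] - \left( \sum_{i \in I} E[f'_i]\,E[g'_i] \right)^2 \right]$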
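For a toy domain the moments of the combined estimator can be computed exactly by enumerating every sampled frequency vector and every sign vector. The sketch assumes Bernoulli sampling (independent Binomial entries), C = 1/(pq), fully independent ±1 signs, and the running-example vectors f = [3, 1, 2], g = [3, 0, 2]; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_vectors(freq, p):
    """Yield (sampled vector, probability) with independent Binomial(f_i, p) entries."""
    for vec in product(*(range(c + 1) for c in freq)):
        prob = Fraction(1)
        for c, k in zip(freq, vec):
            prob *= comb(c, k) * p ** k * (1 - p) ** (c - k)
        yield vec, prob

def exact_moments(f, g, p, q):
    """Exact (mean, variance) of X = C * X_F' * X_G' with C = 1/(p q)."""
    scale = 1 / (p * q)
    signs = list(product((-1, 1), repeat=len(f)))
    m1 = m2 = Fraction(0)
    for fs, pf in binomial_vectors(f, p):
        for gs, pg in binomial_vectors(g, q):
            for xi in signs:
                x_f = sum(a * s for a, s in zip(fs, xi))
                x_g = sum(b * s for b, s in zip(gs, xi))
                x = scale * x_f * x_g
                weight = pf * pg / len(signs)
                m1 += weight * x
                m2 += weight * x * x
    return m1, m2 - m1 ** 2

half = Fraction(1, 2)
mean_half, var_half = exact_moments([3, 1, 2], [3, 0, 2], half, half)
mean_full, var_full = exact_moments([3, 1, 2], [3, 0, 2], Fraction(1), Fraction(1))
```

With p = q = 1 the variance reduces to that of the plain sketch; sampling at p = q = 1/2 adds to it while leaving the mean unchanged.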
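16 Averaging Multiple Sketches
Sketches share the same sample, hence they are correlated
- $E\left[ \frac{1}{n} \sum_{k=1}^{n} X_k \right] = C \sum_{i \in I} E[f'_i]\,E[g'_i]$
- $\mathrm{Var}\left[ \frac{1}{n} \sum_{k=1}^{n} X_k \right] = \frac{1}{n} \left[ \mathrm{Var}[X_k] + (n-1)\, \mathrm{Cov}_{k \neq l}[X_k, X_l] \right]$
- $\mathrm{Var}\left[ \frac{1}{n} \sum_{k=1}^{n} X_k \right] = \frac{C^2}{n} \left[ \sum_{i \in I} E[f'^2_i] \sum_{j \in I} E[g'^2_j] + \sum_{i \in I} \sum_{j \in I} E[f'_i f'_j]\,E[g'_i g'_j] - 2 \sum_{i \in I} E[f'^2_i]\,E[g'^2_i] \right] + C^2 \left[ \sum_{i \in I} \sum_{j \in I} E[f'_i f'_j]\,E[g'_i g'_j] - \left( \sum_{i \in I} E[f'_i]\,E[g'_i] \right)^2 \right]$
Var of sketch over samples = Var sketch + Var sampling + Var interaction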
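17 Bernoulli Sampling
p, q are the sampling probabilities in F, G
$\mathrm{Var}\left[ \frac{1}{n} \sum_{k=1}^{n} X_k \right] = \frac{1}{n} \left[ \sum_{i \in I} f_i^2 \sum_{j \in I} g_j^2 + \left( \sum_{i \in I} f_i g_i \right)^2 - 2 \sum_{i \in I} f_i^2 g_i^2 \right]$
$\quad + \left[ \frac{1-p}{p} \sum_{i \in I} f_i g_i^2 + \frac{1-q}{q} \sum_{i \in I} f_i^2 g_i + \frac{(1-p)(1-q)}{pq} \sum_{i \in I} f_i g_i \right]$
$\quad + \frac{1}{n} \left[ \frac{1-p}{p} \sum_{i \in I} f_i \sum_{j \in I, j \neq i} g_j^2 + \frac{1-q}{q} \sum_{i \in I} f_i^2 \sum_{j \in I, j \neq i} g_j + \frac{(1-p)(1-q)}{pq} \sum_{i \in I} f_i \sum_{j \in I, j \neq i} g_j \right]$
The three bracketed groups are the sketch, sampling, and interaction terms, respectively.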
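The three terms can be evaluated directly from the frequency vectors. The function below is a sketch of the formula for the variance of the average of n sketches over Bernoulli samples; the function name is hypothetical, and the inputs f = [3, 1, 2], g = [3, 0, 2] are the assumed running example. Passing `Fraction` probabilities keeps the arithmetic exact.

```python
from fractions import Fraction

def bernoulli_variance_terms(f, g, p, q, n=1):
    """Return (sketch, sampling, interaction) terms of Var[(1/n) sum X_k]."""
    a = (1 - p) / p                  # (1 - p) / p
    b = (1 - q) / q                  # (1 - q) / q
    sum_f2 = sum(x * x for x in f)
    sum_g2 = sum(y * y for y in g)
    sum_g = sum(g)
    dot = sum(x * y for x, y in zip(f, g))

    sketch = Fraction(sum_f2 * sum_g2 + dot ** 2
                      - 2 * sum(x * x * y * y for x, y in zip(f, g)), n)
    sampling = sum(a * x * y * y + b * x * x * y + a * b * x * y
                   for x, y in zip(f, g))
    # Inner sums over j != i are the full sums minus the i-th entry.
    interaction = sum(a * x * (sum_g2 - y * y)
                      + b * x * x * (sum_g - y)
                      + a * b * x * (sum_g - y)
                      for x, y in zip(f, g)) / n
    return sketch, sampling, interaction

half = Fraction(1, 2)
terms_one = bernoulli_variance_terms([3, 1, 2], [3, 0, 2], half, half, n=1)
terms_two = bernoulli_variance_terms([3, 1, 2], [3, 0, 2], half, half, n=2)
no_sampling = bernoulli_variance_terms([3, 1, 2], [3, 0, 2], Fraction(1), Fraction(1))
```

Averaging more sketches shrinks the sketch and interaction terms by 1/n but leaves the sampling term untouched, since all sketches see the same sample.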
18 Variance for Bernoulli Sampling
Term significance as a function of the frequency distribution
[Plots: distribution of the variance terms (interaction, sampling, sketch) as a function of the Zipf coefficient and p; one panel for size of join, one for self-join size.]
19 Error for Bernoulli Sampling
Settings
- 100 million tuples
- F-AGMS sketches with 5,000 buckets
[Plot: relative error (log scale) versus Zipf coefficient, one curve per sampling probability p, up to p = 1.0.]
20 Error for WOR Sampling
Settings
- TPC-H scale 1, lineitem $\bowtie_{l\_orderkey = o\_orderkey}$ orders
[Plot: relative error (log scale) versus sampling rate (log scale).]
21 Conclusions
Sketches over sampled data
- Generic moment analysis
- Sampling in the frequency domain
- Combined estimator
Three types of sampling
- Bernoulli
- With replacement
- Without replacement
Experimental evaluation
- 2 orders of magnitude speed-up without significant error degradation = Fast-AGMS sketches with EH3 random variables over a sample
22 Questions