Estimating Mutual Information on Data Streams
1 Estimating Mutual Information on Data Streams
Pavel Efros
Fabian Keller, Emmanuel Müller, Klemens Böhm
Karlsruhe Institute of Technology (KIT)
KIT: University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
3 Mutual Information
- ... an information-theoretic correlation measure.
- Intuitively: the reduction of uncertainty on one random variable X given knowledge of another variable Y.
- Advantage over traditional correlation measures: non-linear, sensitive to any kind of dependence.
- Application domains: many (our example: stock data).
- Unsolved problem on data streams!
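The claim about non-linear dependence can be illustrated with a minimal sketch that is not part of the slides. It assumes a toy distribution chosen only for illustration: X uniform on {-1, 0, 1} and Y = X². The covariance is zero, yet X fully determines Y, so the mutual information is positive. The function names `mutual_information` and `covariance` are hypothetical helpers.

```python
from math import log


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def covariance(joint):
    """Covariance of X and Y under the same joint distribution."""
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())


# Toy example (an assumption of this sketch): Y = X^2, X uniform on {-1, 0, 1}.
joint = {(x, x * x): 1.0 / 3.0 for x in (-1, 0, 1)}
mi = mutual_information(joint)   # equals the entropy of Y, about 0.637 nats
cov = covariance(joint)          # zero: no linear relationship
```

A linear correlation measure reports nothing here, while the mutual information equals the full entropy of Y.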
4 Mutual Information On Data Streams
- Data streams: dependencies change over time.
- Analysts have many questions:
  - What is the mutual information in 2000?
  - How does the MI of 2000 compare to the overall MI?
  - ... and to the MI in the 90s?
  - ... and in 2000 if we go from February to February instead?
- Not only retrospective: even more examples for the online scenario (continuous / rolling queries).
11 Contributions
- We provide a DSMS (data stream management system) which can answer such ad-hoc queries efficiently.
- But: approximation is required to avoid storing the entire data stream.
12 Queries on Multiple Time Scales
- Analysts may compare:
  - Current minute vs previous minute?
  - Today vs yesterday?
  - Current year vs previous year?
- Raises the question: how to achieve the same quality over all time scales?
13 Contributions
- We provide a DSMS which can answer such ad-hoc queries efficiently.
- Multiscale Sampling
14 Agenda
- Introduction
- Challenges
- Contributions
- MISE (Mutual Information Stream Estimator)
- Query Anchor
- Framework
- Multiscale Sampling
- Experiments
- Conclusion
15 Mutual Information Estimation Basics
- On static data: the Kraskov principle has emerged as the leading estimator (good bias/variance properties).
- Our goal: make use of this established estimation principle.
- Requires: handling the dynamics of nearest neighbor relationships.
24 Dynamics of Nearest Neighbors
- Dynamics of:
  - k nearest neighbor distance (let k = 1)
  - Marginal counts: number of points in marginal slice
- [Figure: marginal count MC_x over time]
25 Mutual Information Estimation
- Why is this data important?
- Adapted Kraskov principle ($\psi$ is the digamma function):
  $\mathrm{MI}_{est} = \psi(k) + \psi(n) - \psi(MC_x + 1) - \psi(MC_y + 1)$
- For subsequence $Q_0^4$: $\mathrm{MI}_{est} = \psi(1) + \psi(5) - \psi(2) - \psi(3)$
- For subsequence $Q_0^7$: $\mathrm{MI}_{est} = \psi(1) + \psi(8) - \psi(3) - \psi(1)$
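The two worked examples can be checked numerically. The following is a minimal sketch, not code from the talk: it evaluates the digamma function at positive integers through the identity ψ(n) = -γ + H_{n-1} (H is the harmonic number), which avoids any external dependency. The names `digamma_int` and `mi_term` are hypothetical. Note that the slide formula gives the contribution of a single point; the Kraskov estimator on static data averages such terms over all points.

```python
EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    if n < 1:
        raise ValueError("digamma_int is only defined here for integers n >= 1")
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def mi_term(k, n, mc_x, mc_y):
    """Per-point term psi(k) + psi(n) - psi(MC_x + 1) - psi(MC_y + 1)."""
    return (digamma_int(k) + digamma_int(n)
            - digamma_int(mc_x + 1) - digamma_int(mc_y + 1))


# Subsequence Q_0^4: k = 1, n = 5, MC_x = 1, MC_y = 2
term_q04 = mi_term(k=1, n=5, mc_x=1, mc_y=2)   # 25/12 - 5/2, about -0.417
# Subsequence Q_0^7: k = 1, n = 8, MC_x = 2, MC_y = 0
term_q07 = mi_term(k=1, n=8, mc_x=2, mc_y=0)   # H_7 - H_2, about 1.093
```

A single term can be negative, as for Q_0^4; only the average over many points is expected to approximate the mutual information.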
35 Query Anchor
... a data structure which keeps track of marginal counts. A query anchor
- lives at a certain time point $t$,
- stores the X and Y values of the corresponding data point,
- has a method INSERTRIGHT($Q_{t'}$) which adds a data point $Q_{t'}$ in forward time direction, i.e., $t' > t$,
- has a method INSERTLEFT($Q_{t'}$) which adds a data point $Q_{t'}$ in backward time direction, i.e., $t' < t$,
- has a method QUERY($t_1$, $t_2$) which returns the marginal counts $MC_x(Q_t, Q_{t_1}^{t_2})$ and $MC_y(Q_t, Q_{t_1}^{t_2})$.
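The interface above can be sketched as a deliberately naive class. This is an illustration under stated assumptions, not the data structure from the paper: it stores every inserted point and recomputes the counts on each query, whereas the actual query anchor keeps only the information needed to answer queries. Following the Kraskov estimator, the sketch assumes the maximum norm for the k-nearest-neighbor distance and a strict inequality when counting points in the marginal slices. The class name `NaiveQueryAnchor` and all method bodies are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveQueryAnchor:
    """Brute-force stand-in for a query anchor living at time t."""
    t: int
    x: float
    y: float
    points: dict = field(default_factory=dict)  # time -> (x, y)

    def insert_right(self, t_new, x, y):
        """Add a data point in forward time direction (t_new > t)."""
        if t_new <= self.t:
            raise ValueError("insert_right requires t_new > t")
        self.points[t_new] = (x, y)

    def insert_left(self, t_new, x, y):
        """Add a data point in backward time direction (t_new < t)."""
        if t_new >= self.t:
            raise ValueError("insert_left requires t_new < t")
        self.points[t_new] = (x, y)

    def query(self, t1, t2, k=1):
        """Return (MC_x, MC_y) of the anchor point within the window [t1, t2]."""
        window = [p for s, p in self.points.items() if t1 <= s <= t2]
        if len(window) < k:
            raise ValueError("window holds fewer than k other points")
        # Distance to the k-th nearest neighbor in the joint space (max norm).
        dists = sorted(max(abs(px - self.x), abs(py - self.y))
                       for px, py in window)
        eps = dists[k - 1]
        # Marginal counts: points strictly inside the slice of half-width eps.
        mc_x = sum(1 for px, _ in window if abs(px - self.x) < eps)
        mc_y = sum(1 for _, py in window if abs(py - self.y) < eps)
        return mc_x, mc_y


# Made-up example data: anchor at t = 2 with value (0, 0).
anchor = NaiveQueryAnchor(t=2, x=0.0, y=0.0)
anchor.insert_left(1, 3.0, 0.5)
anchor.insert_left(0, 1.0, 5.0)
anchor.insert_right(3, -0.5, 2.0)
anchor.insert_right(4, 4.0, -1.0)
counts_full = anchor.query(0, 4)    # window covering all points
counts_short = anchor.query(1, 3)   # narrower window around the anchor
```

The two queries return different counts for the same anchor, which is the dynamic behavior the preceding slides describe: both the nearest-neighbor distance and the marginal counts depend on the queried window.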
36 Information Stored in a Query Anchor
41 MISE Framework
- Idea: ensemble of query anchors
- Insert:
  - Add new anchor
  - Forward initialization
  - Reverse initialization
  - Sampling (decides which anchors to delete)
43 MISE Framework
- Idea: ensemble of query anchors
- Query:
  - Determine relevant query anchors
  - Delegate query to anchors
  - Aggregate estimation results
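The three query steps can be sketched as follows. This is a simplified illustration, not the MISE implementation: the anchors are represented only by their time stamps, the per-anchor terms are recomputed by brute force from a stored stream (which the real system avoids), and the aggregation is assumed to be a plain average. An anchor is assumed to be relevant if its time point lies inside the query window. All function names are hypothetical.

```python
EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + H_{n-1}."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def anchor_term(stream, a, t1, t2, k=1):
    """Per-anchor term for the anchor at time a within the window [t1, t2]."""
    ax, ay = stream[a]
    others = [stream[s] for s in range(t1, t2 + 1) if s != a]
    # k-th nearest neighbor distance in the max norm (an assumption).
    eps = sorted(max(abs(x - ax), abs(y - ay)) for x, y in others)[k - 1]
    mc_x = sum(1 for x, _ in others if abs(x - ax) < eps)
    mc_y = sum(1 for _, y in others if abs(y - ay) < eps)
    n = len(others) + 1
    return (digamma_int(k) + digamma_int(n)
            - digamma_int(mc_x + 1) - digamma_int(mc_y + 1))


def estimate_mi(stream, anchor_times, t1, t2, k=1):
    """Estimate MI over [t1, t2] from the anchors that survived sampling."""
    # Step 1: determine relevant query anchors.
    relevant = [a for a in anchor_times if t1 <= a <= t2]
    if not relevant:
        return None  # no anchor in the window: the query is not answerable
    # Step 2: delegate the query to each anchor.
    terms = [anchor_term(stream, a, t1, t2, k) for a in relevant]
    # Step 3: aggregate the estimation results.
    return sum(terms) / len(terms)


# Made-up stream with perfect dependence (y = x) and a single anchor at t = 1.
tiny_stream = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
tiny_estimate = estimate_mi(tiny_stream, anchor_times=[1], t1=0, t2=2)
```

Returning `None` for a window without anchors mirrors the "answerability" quality measure listed in the experiments: with fewer anchors in a window, the estimate has higher variance or cannot be computed at all.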
44 Multiscale Sampling
- Define a unit-free quantity of window size $w$ vs offset $o$: $\frac{o}{w}$
- Multiscale query equivalence: define an equivalence relation between queries A and B by: $A \sim B \iff \frac{o_A}{w_A} = \frac{o_B}{w_B}$
- Multiscale sampling is a sampling scheme which provides an equal expected number of sampling elements for all queries which belong to the same equivalence class.
50 Required Sampling Distribution
- Multiscale sampling property fulfilled for (derivation in paper):
  $P_n = \begin{cases} 1 & \text{if } n \le \alpha \\ \frac{\alpha}{n} & \text{otherwise} \end{cases} \quad (1)$
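The multiscale property of this distribution can be checked numerically with a short sketch. It rests on assumptions that the slides do not spell out: n is taken to be the age of a stream element in time steps, and a query with offset o and window size w is taken to cover the ages o + 1 to o + w. Under these assumptions the expected number of retained elements is close to α · ln(1 + w/o), which depends only on the ratio of offset and window size. The function names are hypothetical.

```python
from math import log


def retention_probability(n, alpha):
    """P_n from equation (1): keep with certainty up to age alpha, then alpha / n."""
    return 1.0 if n <= alpha else alpha / n


def expected_samples(offset, window, alpha):
    """Expected number of retained elements with ages offset+1 .. offset+window."""
    return sum(retention_probability(n, alpha)
               for n in range(offset + 1, offset + window + 1))


alpha = 100
# Two queries of the same equivalence class (offset / window = 1) at different scales.
small = expected_samples(offset=1_000, window=1_000, alpha=alpha)
large = expected_samples(offset=10_000, window=10_000, alpha=alpha)
continuous_limit = alpha * log(2)   # alpha * ln(1 + w / o) with w / o = 1
```

In this discrete setting the two counts agree closely rather than exactly; both approach the continuous limit of about 69.3 expected samples.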
51 Sample Size
- Relationship between α and sample size S over time T: $E[S] = \alpha + O(\log T)$
- Two MISE versions:
  - fixed α, log-growing sample size S
  - fixed sample size S, logarithmic adjustments to α
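The logarithmic growth follows from summing the retention probabilities $P_n = 1$ for $n \le \alpha$ and $P_n = \alpha / n$ otherwise over all ages up to T. The sketch below computes this sum and also simulates one possible way to realize the distribution on a stream: an element that reaches age n > α survives the step with probability (n - 1) / n, so that the product of the step probabilities equals α / n. This thinning rule is an assumption of the sketch, not necessarily the mechanism used in MISE, and the function names are hypothetical.

```python
import random
from math import log


def expected_sample_size(alpha, T):
    """E[S] after T inserts: sum of the retention probabilities over all ages."""
    return sum(1.0 if n <= alpha else alpha / n for n in range(1, T + 1))


def simulate_sample_size(alpha, T, rng):
    """Simulate per-step thinning and return the sample size after T inserts."""
    ages = []
    for _ in range(T):
        survivors = [1]  # the newly inserted element has age 1
        for age in ages:
            new_age = age + 1
            keep = 1.0 if new_age <= alpha else (new_age - 1) / new_age
            if rng.random() < keep:
                survivors.append(new_age)
        ages = survivors
    return len(ages)


alpha, T = 20, 2000
exact = expected_sample_size(alpha, T)
approx = alpha * (1 + log(T / alpha))   # alpha + alpha * ln(T / alpha)
simulated = simulate_sample_size(alpha, T, random.Random(0))
```

With fixed α the sample grows by roughly α · ln 2 each time the stream length doubles, which corresponds to the first MISE version; the second version would instead shrink α over time to keep S constant.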
52 EXPERIMENTS
53 Simple Example
- [Figure: closing quotes of IBM and GE; MI over 6 months and MI over 5 years, Reference vs MISE]
- Runtime Reference: min
- Runtime MISE: 3.8 min
- Computed with only 1 query for every 10 inserts.
54 Experiment Question 1
... since we precompute queries
- Question 1: How many queries are required to make a stream approach worthwhile?
- Trivially:
  - #queries = 0: nothing to speed up
  - #queries ≫ #inserts: arbitrarily high speed-up
57 Experiment Question 1
- Speed-up with ratio #queries / #inserts:
- [Figure: speed-up plots; axis labels: Reservoir Size S, Window Size]
- Stream processing is worthwhile even when queries are rare.
58 Experiment Question 2
... since part of the query computation occurs on insert
- Question 2: What is the raw processing speed (insert speed) of the stream? Can we process high-frequency streams?
59 Experiment Question 2
- [Figure: time per insert [ms] vs stream length; series: dynamic α = 100, dynamic α = 1000, fixed S = 100, fixed S = 1000]
- Despite precomputation: stream frequencies are possible up to 100 Hz (for high estimation quality) or 20 kHz (for low estimation quality).
60 Experiment Question 3
- Question 3: How does sampling influence estimation quality? What are the benefits of multiscale sampling?
- Huge experiment in the paper, complex results:
  - Data: 26 real-world data streams
  - Ground truth: unsampled estimator
  - Quality measures: answerability, bias, variance
  - Competitors: multiscale vs reservoir vs sliding window sampling
- Traditional sampling does not work with queries comprising multiple time scales.
- Very low estimation error. Example: α = 1000, = 1.0: MI value precise up to ±0.01
63 Conclusions
- Summary:
  - First approach for estimating mutual information on data streams
  - Proposal of multiscale sampling
  - Experiments clearly show the benefit of a stream-specific solution
- Thank you for your attention! Questions?
More informationOptimization and Complexity
Optimization and Complexity Decision Systems Group Brigham and Women s Hospital, Harvard Medical School Harvard-MIT Division of Health Sciences and Technology Aim Give you an intuition of what is meant
More informationAny live cell with less than 2 live neighbours dies. Any live cell with 2 or 3 live neighbours lives on to the next step.
2. Cellular automata, and the SIRS model In this Section we consider an important set of models used in computer simulations, which are called cellular automata (these are very similar to the so-called
More informationHEALTHCARE. 5 Components of Accurate Rolling Forecasts in Healthcare
HEALTHCARE 5 Components of Accurate Rolling Forecasts in Healthcare Introduction Rolling Forecasts Improve Accuracy and Optimize Decisions Accurate budgeting, planning, and forecasting are essential for
More informationStatistical Analysis of BTI in the Presence of Processinduced Voltage and Temperature Variations
Statistical Analysis of BTI in the Presence of Processinduced Voltage and Temperature Variations Farshad Firouzi, Saman Kiamehr, Mehdi. B. Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE
More informationLecture 10 Multistep Analysis
Lecture 10 Multistep Analysis 16.0 Release Introduction to ANSYS Mechanical 1 2015 ANSYS, Inc. February 27, 2015 Chapter Overview In this chapter, aspects of reviewing results will be covered: A. Multistep
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationWavelet decomposition of data streams. by Dragana Veljkovic
Wavelet decomposition of data streams by Dragana Veljkovic Motivation Continuous data streams arise naturally in: telecommunication and internet traffic retail and banking transactions web server log records
More informationProbabilistic Energy Forecasting
Probabilistic Energy Forecasting Moritz Schmid Seminar Energieinformatik WS 2015/16 ^ KIT The Research University in the Helmholtz Association www.kit.edu Agenda Forecasting challenges Renewable energy
More information1 Maintaining a Dictionary
15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition
More information+ + ( + ) = Linear recurrent networks. Simpler, much more amenable to analytic treatment E.g. by choosing
Linear recurrent networks Simpler, much more amenable to analytic treatment E.g. by choosing + ( + ) = Firing rates can be negative Approximates dynamics around fixed point Approximation often reasonable
More informationProject Planning & Control Prof. Koshy Varghese Department of Civil Engineering Indian Institute of Technology, Madras
Project Planning & Control Prof. Koshy Varghese Department of Civil Engineering Indian Institute of Technology, Madras (Refer Slide Time: 00:16) Lecture - 52 PERT Background and Assumptions, Step wise
More informationSlides 12: Output Analysis for a Single Model
Slides 12: Output Analysis for a Single Model Objective: Estimate system performance via simulation. If θ is the system performance, the precision of the estimator ˆθ can be measured by: The standard error
More informationCS246 Final Exam. March 16, :30AM - 11:30AM
CS246 Final Exam March 16, 2016 8:30AM - 11:30AM Name : SUID : I acknowledge and accept the Stanford Honor Code. I have neither given nor received unpermitted help on this examination. (signed) Directions
More informationOutline. Introduction to SpaceStat and ESTDA. ESTDA & SpaceStat. Learning Objectives. Space-Time Intelligence System. Space-Time Intelligence System
Outline I Data Preparation Introduction to SpaceStat and ESTDA II Introduction to ESTDA and SpaceStat III Introduction to time-dynamic regression ESTDA ESTDA & SpaceStat Learning Objectives Activities
More informationPredicting New Search-Query Cluster Volume
Predicting New Search-Query Cluster Volume Jacob Sisk, Cory Barr December 14, 2007 1 Problem Statement Search engines allow people to find information important to them, and search engine companies derive
More informationWatershed Modeling With DEMs
Watershed Modeling With DEMs Lesson 6 6-1 Objectives Use DEMs for watershed delineation. Explain the relationship between DEMs and feature objects. Use WMS to compute geometric basin data from a delineated
More informationFast Multidimensional Subset Scan for Outbreak Detection and Characterization
Fast Multidimensional Subset Scan for Outbreak Detection and Characterization Daniel B. Neill* and Tarun Kumar Event and Pattern Detection Laboratory Carnegie Mellon University neill@cs.cmu.edu This project
More informationCovariance. if X, Y are independent
Review: probability Monty Hall, weighted dice Frequentist v. Bayesian Independence Expectations, conditional expectations Exp. & independence; linearity of exp. Estimator (RV computed from sample) law
More informationGenerative and Discriminative Approaches to Graphical Models CMSC Topics in AI
Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single
More informationTime-domain representations
Time-domain representations Speech Processing Tom Bäckström Aalto University Fall 2016 Basics of Signal Processing in the Time-domain Time-domain signals Before we can describe speech signals or modelling
More informationRS Metrics CME Group Copper Futures Price Predictive Analysis Explained
RS Metrics CME Group Copper Futures Price Predictive Analysis Explained Disclaimer ALL DATA REFERENCED IN THIS WHITE PAPER IS PROVIDED AS IS, AND RS METRICS DOES NOT MAKE AND HEREBY EXPRESSLY DISCLAIMS,
More informationCS5112: Algorithms and Data Structures for Applications
CS5112: Algorithms and Data Structures for Applications Lecture 14: Exponential decay; convolution Ramin Zabih Some content from: Piotr Indyk; Wikipedia/Google image search; J. Leskovec, A. Rajaraman,
More informationLecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1
Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a
More informationUsability Extensions for the Worklet Service
Usability Extensions for the Worklet Service Michael Adams Queensland University of Technology, Brisbane, Australia. mj.adams@qut.edu.au Abstract. The YAWL Worklet Service is an effective approach to facilitating
More informationTEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET
1 TEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET Prof. Steven Goodreau Prof. Martina Morris Prof. Michal Bojanowski Prof. Mark S. Handcock Source for all things STERGM Pavel N.
More informationVerifying Computations in the Cloud (and Elsewhere) Michael Mitzenmacher, Harvard University Work offloaded to Justin Thaler, Harvard University
Verifying Computations in the Cloud (and Elsewhere) Michael Mitzenmacher, Harvard University Work offloaded to Justin Thaler, Harvard University Goals of Verifiable Computation Provide user with correctness
More informationStats Probability Theory
Stats 241.3 Probability Theory Instructor: Office: W.H.Laverty 235 McLean Hall Phone: 966-6096 Lectures: Evaluation: M T W Th F 1:30pm - 2:50pm Thorv 105 Lab: T W Th 3:00-3:50 Thorv 105 Assignments, Labs,
More informationRandom Walks A&T and F&S 3.1.2
Random Walks A&T 110-123 and F&S 3.1.2 As we explained last time, it is very difficult to sample directly a general probability distribution. - If we sample from another distribution, the overlap will
More informationNeutral Bayesian reference models for incidence rates of (rare) clinical events
Neutral Bayesian reference models for incidence rates of (rare) clinical events Jouni Kerman Statistical Methodology, Novartis Pharma AG, Basel BAYES2012, May 10, Aachen Outline Motivation why reference
More informationChallenges of Communicating Weather Information to the Public. Sam Lashley Senior Meteorologist National Weather Service Northern Indiana Office
Challenges of Communicating Weather Information to the Public Sam Lashley Senior Meteorologist National Weather Service Northern Indiana Office Dilbert the Genius Do you believe him? Challenges of Communicating
More informationExtensions of Bayesian Networks. Outline. Bayesian Network. Reasoning under Uncertainty. Features of Bayesian Networks.
Extensions of Bayesian Networks Outline Ethan Howe, James Lenfestey, Tom Temple Intro to Dynamic Bayesian Nets (Tom Exact inference in DBNs with demo (Ethan Approximate inference and learning (Tom Probabilistic
More informationAlgebraic Reconstruction of Ultrafast Tomography Images at the Large Scale Data Facility
Algebraic Reconstruction of Ultrafast Tomography Images at the Xiaoli Yang, Thomas Jejkal, Rainer Stotzka, Halil Pasic, Tomy dos Santos Rolo, Thomas van de Kamp, Achim Streit Institute for Data Processing
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationGCOE Discussion Paper Series
GCOE Discussion Paper Series Global COE Program Human Behavior and Socioeconomic Dynamics Discussion Paper No.34 Inflation Inertia and Optimal Delegation of Monetary Policy Keiichi Morimoto February 2009
More informationMeasurement of Jet Energy Scale and Resolution at ATLAS and CMS at s = 8 TeV
Measurement of Jet Energy Scale and Resolution at ATLAS and CMS at s = 8 TeV EDSBlois 2015 02.07.2015 Dominik Haitz on behalf of the ATLAS and CMS Collaborations INSTITUT FÜR EXPERIMENTELLE KERNPHYSIK
More informationEarthquake Clustering and Declustering
Earthquake Clustering and Declustering Philip B. Stark Department of Statistics, UC Berkeley joint with (separately) Peter Shearer, SIO/IGPP, UCSD Brad Luen 4 October 2011 Institut de Physique du Globe
More informationBright Advance Corporation
USER INSTRUCTIONS TABLE OF CONTENTS INSTRUCTIONS FOR USE2 PREPARING TO USE THE SCALE2 DISPLAYS3 KEYBOARD FUNCTION4 OPERATION7 COUNTING14 DIFFERENT KEYBOARD TYPES21 INTERFACE31 POWER SOURCES40 1 INSTRUCTIONS
More informationSelect/Special Topics in Atomic Physics Prof. P. C. Deshmukh Department of Physics Indian Institute of Technology, Madras
Select/Special Topics in Atomic Physics Prof. P. C. Deshmukh Department of Physics Indian Institute of Technology, Madras Lecture - 9 Angular Momentum in Quantum Mechanics Dimensionality of the Direct-Product
More informationData Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur
Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture - 17 K - Nearest Neighbor I Welcome to our discussion on the classification
More informationCSCB63 Winter Week 11 Bloom Filters. Anna Bretscher. March 30, / 13
CSCB63 Winter 2019 Week 11 Bloom Filters Anna Bretscher March 30, 2019 1 / 13 Today Bloom Filters Definition Expected Complexity Applications 2 / 13 Bloom Filters (Specification) A bloom filter is a probabilistic
More informationHarmonic Oscillator I
Physics 34 Lecture 7 Harmonic Oscillator I Lecture 7 Physics 34 Quantum Mechanics I Monday, February th, 008 We can manipulate operators, to a certain extent, as we would algebraic expressions. By considering
More informationBeyond Distinct Counting: LogLog Composable Sketches of frequency statistics
Beyond Distinct Counting: LogLog Composable Sketches of frequency statistics Edith Cohen Google Research Tel Aviv University TOCA-SV 5/12/2017 8 Un-aggregated data 3 Data element e D has key and value
More informationComputing the Entropy of a Stream
Computing the Entropy of a Stream To appear in SODA 2007 Graham Cormode graham@research.att.com Amit Chakrabarti Dartmouth College Andrew McGregor U. Penn / UCSD Outline Introduction Entropy Upper Bound
More informationBasic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).
Basic Statistics There are three types of error: 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation). 2. Systematic error - always too high or too low
More information