Fair Exponential Smoothing with Small Alpha


Juha Reunanen
juha.reunanen@tomaattinen.com

October 12, 2015

Abstract

This short paper shows that standard exponential smoothing with a small α may not give the desired results s(t) for small values of t, because the first sample x(0) has such a huge impact on the outcome. Fortunately, the paper also proposes a remedy to the problem.

1 Introduction

Exponential smoothing¹ is a simple and effective technique for low-pass filtering a time series. In its most basic form, it is defined as follows:

    s(0) = x(0)
    s(t) = α x(t) + (1 − α) s(t − 1)

where x(t) is the raw input signal, s(t) is the filtered signal, and α is a parameter (0 < α < 1).

The technique has many nice properties, one of which is that even with a small α (i.e., heavy filtering), computing s(t) is very fast compared to calculating a moving average over a long window. This is very nice in case x(t) is a sequence of large images, for instance.

2 Problem

One problem with using exponential smoothing, in particular with a small α, is that if x(0) happens to be an outlier, it affects s(t) even for pretty large values of t. Denote with w(k, t) the weight that x(k) carries in s(t). Clearly:

    w(0, 0) = 1
    w(0, 1) = 1 − α          w(1, 1) = α
    w(0, 2) = (1 − α)²       w(1, 2) = α(1 − α)       w(2, 2) = α
    ...

Note that w(k, t) = 0 for any k > t, because we have a causal filter (i.e., events from the future cannot affect the current filtered value). Now we can see the problem: the weight of the first sample, w(0, t) = (1 − α)^t, is quite significant even for large t, if α is small.

¹ https://en.wikipedia.org/wiki/exponential_smoothing
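The slow decay of w(0, t) is easy to verify numerically. The sketch below is my own illustration (the helper name `w_first` is not from any reference code); it simply evaluates (1 − α)^t for α = 0.001:

```python
alpha = 0.001

def w_first(t, alpha=alpha):
    """Weight of x(0) in s(t) under standard exponential smoothing:
    w(0, t) = (1 - alpha)**t."""
    return (1 - alpha) ** t

print(w_first(100))  # still about 0.905: x(0) dominates after 100 samples
print(w_first(692))  # about 0.5004: x(0) still outweighs all newer samples
print(w_first(693))  # about 0.4999: only now does the weight drop below 50%
```

With α = 0.001, the weight of x(0) stays above one half for almost 700 steps, which is the unfairness discussed next.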

Consider for instance α = 0.001. You need to wait until t = 693 before the relative weight of the first sample drops below 50%! So the 691 samples x(1), ..., x(691), all newer than x(0), together weigh less than x(0) does alone! And what if you want to use an even smaller α? It does not appear far-fetched to say that this can be quite unfair, depending of course on what exactly it is that you are doing.

3 Challenge

So we want to define a new, fairer scheme for w(k, t). What properties should we expect?

Clearly, we need to have

    Σ_{k=0}^{t} w(k, t) = 1

for any t ≥ 0. Let us call this our first requirement.

Also, we would like to keep this asymptotic property of standard exponential smoothing:

    w(k − 1, t) = (1 − α) w(k, t)

But why not just require it for any k and t? Standard exponential smoothing caters for this only when k ≥ 2 (any t ≥ k) and t → ∞ (any k). Yes, let us try this, and call it our second requirement.

Finally, we would like to be able to compute a new s(t) using only a small number of operations.

4 Solution

Putting together the first and the second requirement, we have:

    Σ_{k=0}^{t} w(k, t) = 1
    w(k − 1, t) = (1 − α) w(k, t)                                        (1)

Solving, it appears that we get:

    w(k, t) = (1 − α)^(t − k) / Σ_{τ=0}^{t} (1 − α)^τ

This holds for k ≤ t. For k > t, we have w(k, t) = 0, as pointed out earlier.

For the fessa method being introduced², we are looking for an α_fessa(t) that we can use exactly as we use α in the standard method. We can define α_fessa(t) as follows:

    α_fessa(t) = w(t, t) = 1 / Σ_{τ=0}^{t} (1 − α)^τ = 1 / d(t)

It turns out that we can update α_fessa recursively:

    a(0) = d(0) = α_fessa(0) = 1
    a(t) = a(t − 1) · (1 − α)
    d(t) = d(t − 1) + a(t)
    α_fessa(t) = 1 / d(t)

The filtering itself works as previously; we just use α_fessa(t) instead of α:

    s(t) = α_fessa(t) x(t) + (1 − α_fessa(t)) s(t − 1)

² fessa stands for fair exponential smoothing with small alpha.
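The recursion above is straightforward to implement. The following is a minimal Python sketch of that recursion; the class name and structure are mine, not taken from the reference implementation linked at the end of the paper:

```python
class Fessa:
    """Fair exponential smoothing with small alpha, following the
    recursion of Section 4:
        a(t) = a(t-1) * (1 - alpha)       # a(t) = (1 - alpha)**t
        d(t) = d(t-1) + a(t)              # d(t) = sum of (1 - alpha)**tau
        alpha_fessa(t) = 1 / d(t)
    """

    def __init__(self, alpha):
        self.alpha = alpha
        self.a = None  # (1 - alpha)**t
        self.d = None  # running sum d(t)
        self.s = None  # filtered signal s(t)

    def update(self, x):
        if self.s is None:
            # t = 0: a(0) = d(0) = 1, so alpha_fessa(0) = 1 and s(0) = x(0)
            self.a, self.d, self.s = 1.0, 1.0, x
        else:
            self.a *= 1.0 - self.alpha
            self.d += self.a
            alpha_fessa = 1.0 / self.d
            self.s = alpha_fessa * x + (1.0 - alpha_fessa) * self.s
        return self.s
```

As a sanity check, with α = 0.001 the first few values are α_fessa(0) = 1, α_fessa(1) = 1/1.999 ≈ .5003 and α_fessa(2) = 1/2.997 ≈ .3337, matching the diagonal of Table 2.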

Table 1: Effective w(k, t) in standard exponential smoothing for different k and t ≤ 10, when α = 0.001. The numbers are rounded. It is clear how w(0, t) dominates, which means that the value of x(0) has a significant impact on s(t) when t is small.

         t
  k      0      1      2      3      4      5      6      7      8      9     10
  0  1.000  .9990  .9980  .9970  .9960  .9950  .9940  .9930  .9920  .9910  .9900
  1         .0010  .0010  .0010  .0010  .0010  .0010  .0010  .0010  .0010  .0010
  2                .0010  .0010  .0010  .0010  .0010  .0010  .0010  .0010  .0010
  3                       .0010  .0010  .0010  .0010  .0010  .0010  .0010  .0010
  4                              .0010  .0010  .0010  .0010  .0010  .0010  .0010
  5                                     .0010  .0010  .0010  .0010  .0010  .0010
  6                                            .0010  .0010  .0010  .0010  .0010
  7                                                   .0010  .0010  .0010  .0010
  8                                                          .0010  .0010  .0010
  9                                                                 .0010  .0010
 10                                                                        .0010

Table 2: Effective w(k, t) in the fessa method for different k and t ≤ 10, when α = 0.001. Compared to Table 1, we can see that for each t, all samples received so far have an almost equal significance; however, a more recent sample always weighs slightly more than the earlier values of x do.

         t
  k      0      1      2      3      4      5      6      7      8      9     10
  0  1.000  .4997  .3330  .2496  .1996  .1663  .1424  .1246  .1107  .0996  .0905
  1         .5003  .3333  .2499  .1998  .1664  .1426  .1247  .1108  .0997  .0905
  2                .3337  .2501  .2000  .1666  .1427  .1248  .1109  .0997  .0906
  3                       .2504  .2002  .1667  .1429  .1249  .1110  .0998  .0907
  4                              .2004  .1669  .1430  .1251  .1111  .0999  .0908
  5                                     .1671  .1431  .1252  .1112  .1000  .0909
  6                                            .1433  .1253  .1113  .1001  .0910
  7                                                   .1254  .1114  .1003  .0911
  8                                                          .1116  .1004  .0912
  9                                                                 .1005  .0913
 10                                                                        .0914

5 Examples

For α = 0.001, Tables 1 and 2 show the effective w(k, t) for small values of k and t, for the standard method and the proposed fessa approach, respectively. Based on Table 1, it is clear that in standard exponential smoothing the value of x(0) has a huge impact on s(t) when t is small. On the other hand, based on Table 2, the properties of the fessa method appear to be much nicer: at any time t, all samples processed so far have roughly the same weight; however, the more recent the sample, the (slightly) more it weighs.

As another simple illustration, consider x(t) drawn from a uniform distribution between 0 and 100. Have a look at Figures 1-3.
It is clear that with such a small α (0.001), whatever x(0) happens to be affects the results of the standard exponential smoothing method for a very long time; too long, one could probably say. On the other hand, the fessa method being proposed finds its way near the expected value quite soon.
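The difference is easy to reproduce without plotting. In this small sketch of my own (not from the paper's code), the input is x(0) = 100 followed by zeros, so after t steps s(t) equals 100 · w(0, t) for each method:

```python
alpha = 0.001

# Standard exponential smoothing vs. fessa on x = [100, 0, 0, ...]:
# with zeros after the first sample, s(t) directly shows the
# leftover weight of x(0).
s_std = 100.0          # standard method, s(0) = x(0)
s_fessa = 100.0        # fessa, s(0) = x(0)
a = d = 1.0            # fessa state: a(0) = d(0) = 1
for t in range(1, 11):
    s_std = alpha * 0.0 + (1 - alpha) * s_std
    a *= 1 - alpha
    d += a
    alpha_fessa = 1 / d
    s_fessa = alpha_fessa * 0.0 + (1 - alpha_fessa) * s_fessa

print(round(s_std, 2))    # 99.0: the standard filter has barely moved
print(round(s_fessa, 2))  # 9.05: matches w(0, 10) = .0905 in Table 2
```

After only ten zero-valued samples, the standard filter still sits at about 99 while fessa has already dropped to about 9, exactly the w(0, 10) entries of Tables 1 and 2 scaled by 100.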

[Plot: x(t) and s(t) against time t for t = 0, ..., 40, with α = 0.001; curves: raw data (x), standard exponential smoothing, fessa.]

Figure 1: Comparison between standard exponential smoothing and the proposed fessa method for small t. Note that the standard method gets seriously stuck with x(0), even though it is not really even an outlier.

[Plot: x(t) and s(t) against time t for t = 0, ..., 400, with α = 0.001; curves: raw data (x), standard exponential smoothing, fessa.]

Figure 2: Comparison between standard exponential smoothing and the proposed fessa method for a medium range of t. Given that the expected value is 50, the standard method is still far away, even though the direction is clearly right.

[Plot: x(t) and s(t) against time t for t = 0, ..., 5000, with α = 0.001; curves: raw data (x), standard exponential smoothing, fessa.]

Figure 3: Comparison between standard exponential smoothing and the proposed fessa method for large t. Now even the standard method has reached the expected value. From around here on, the two methods are essentially identical.

6 Runtime performance

Because asymptotically (t → ∞) the fessa method is equivalent to standard exponential smoothing, any implementation can switch to the standard method as soon as (1 − α)^t gets really small (close to machine epsilon or so). Then, for a large t, the computational cost is on the same level as with standard exponential smoothing. With a small t, compared to the standard method there is an additional multiplication (to update a), an addition (to update d), and a division³ (to calculate α_fessa), so do not use fessa if this overhead is too much for you.⁴ Otherwise, feel free to use the proposed method in place of standard exponential smoothing.

7 Missing proofs

It would be nice to prove that lim_{t→∞} α_fessa(t) = α. Also, it is quite obvious that

    w(k, t) = (1 − α)^(t − k) / Σ_{τ=0}^{t} (1 − α)^τ

satisfies (1), but should it be shown that the solution is unique?

³ Actually this is just a reciprocal, which may be faster to calculate than a generic division.
⁴ But in this case, please let me know what it is that you are working with; it sounds quite interesting already!
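The limit is at least easy to check numerically: since d(t) is a partial geometric sum, d(t) → 1/α, so α_fessa(t) = 1/d(t) → α. A small sketch of my own:

```python
alpha = 0.001

# Run the fessa state recursion far enough that (1 - alpha)**t is tiny.
a = d = 1.0
for t in range(1, 20001):
    a *= 1 - alpha
    d += a

alpha_fessa = 1 / d
print(alpha_fessa)  # very close to alpha = 0.001
# The gap alpha_fessa - alpha is on the order of (1 - alpha)**t,
# which is also the point where an implementation might switch over
# to the plain standard update, as suggested in Section 6.
```

This does not replace a proof, but it shows the convergence concretely: after 20000 steps the difference from α is far below double-precision noise in any practical filtering application.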

8 Disclaimer

The technique presented in this paper may well be a known one, but I nevertheless had fun figuring it out, not to mention coming up with the name. Indeed, I think I need the method in practice, since I have a few times hesitated to set a low α for fear of a very noisy x(0). So if after this we can set lower αs and get smoother signals, that is quite enough for me, regardless of whether someone already came up with this method in the 1600s or so.

9 Code

See https://github.com/reunanen/fessa for a simple implementation of the fessa technique.