Data Analysis, Statistics, Machine Learning
Leland Wilkinson
Adjunct Professor, UIC Computer Science
Chief Scientist, H2O.ai
leland.wilkinson@gmail.com

Time Series
Time series statistics involve random processes over time.
Spatial statistics involve random processes over space.
Both involve similar mathematical models.
When there is no temporal or spatial influence, these boil down to ordinary statistical methods.
DO NOT USE OLS methods on temporal/spatial data.
These require stochastic models, not OLS trend lines: measurements at each time/space point are not independent.
[Figure: Quarterly US Ecommerce Retail Sales, Seasonally Adjusted (Sales vs. Year, 1998 to 2016)]
[Figure: Autocorrelation Plot (Correlation vs. Lag, lags 0 to 60)]
Copyright 2016 Leland Wilkinson

Stochastic processes
Up to now, we've been dealing with i.i.d. random variables: Independent. Identically. Distributed.
We assumed there was no ordering of those random variables.
Our models depended on random error plus systematic effects.
Time series analytics deal with ordered random variables.
We (usually) assume these variables are equally spaced across time.
A variable at time $t_i$ is predictable in part by another variable at another time.
The simplest example of this type of behavior is called autoregressive (AR):
$$x_t = \phi x_{t-1} + \epsilon_t$$
In this model each observation at a given time is a function of the previous observation plus random error, where
$$E[\epsilon_t] = 0, \qquad E[\epsilon_t^2] = \sigma^2, \qquad E[\epsilon_s \epsilon_t] = 0 \ \text{for all } s \neq t$$

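A minimal sketch of the AR(1) model above, assuming Gaussian errors, a starting value of zero, and the hypothetical helper names `simulate_ar1` and `lag1_correlation` (none of these choices come from the slides). It simulates a series and checks that the correlation between neighboring observations lands near $\phi$.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with i.i.d. N(0, sigma^2) errors."""
    rng = random.Random(seed)
    x = []
    prev = 0.0  # assumed starting value x_0 = 0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        x.append(prev)
    return x

def lag1_correlation(x):
    """Sample correlation between x_t and x_{t-1}."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

series = simulate_ar1(phi=0.7, n=5000)
print(round(lag1_correlation(series), 2))  # should land near phi = 0.7
```

Because each value carries over a fraction $\phi$ of the previous one, neighboring points are not independent, which is why i.i.d. methods do not apply here.
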
Stochastic processes
Diagnosing a stochastic process
Correlate a series with itself shifted backward by one time period (lag 1).
Then correlate the series with itself shifted backward by two time periods (lag 2).
And so on.
Here's an Autocorrelation Function (ACF) plot of white noise:
$$x_t = \epsilon_t$$
[Figure: ACF plot of white noise]

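The shift-and-correlate recipe can be sketched in a few lines. This is an illustrative version only: the function name `acf` and the common-denominator form of the estimator are assumptions, not something the slides specify.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)  # lag-0 sum of squares
    result = []
    for k in range(max_lag + 1):
        # Correlate the series with itself shifted backward by k periods.
        ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        result.append(ck / c0)
    return result

rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
r = acf(white_noise, 10)
print([round(v, 2) for v in r])  # r[0] is 1; the rest hover near 0
```

For white noise every lag beyond zero should be close to zero, which is the flat pattern the ACF plot shows.
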
Stochastic processes
Diagnosing an autoregressive process
Correlate a series with itself shifted backward by one time period (lag 1).
Then correlate the series with itself shifted backward by two time periods (lag 2).
And so on.
Here's an Autocorrelation Function plot of an AR(1) process:
$$x_t = \phi x_{t-1} + \epsilon_t$$
[Figure: ACF plot of an AR(1) process]

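As a hedged aside that is not on the slide: for a zero-mean stationary AR(1) process with $|\phi| < 1$, the theoretical autocorrelation follows from multiplying the model equation by $x_{t-k}$ and taking expectations. The symbols $\gamma_k$ (autocovariance) and $\rho_k$ (autocorrelation) are notation introduced here.

```latex
\begin{aligned}
\gamma_k &= E[x_t x_{t-k}]
          = \phi\,E[x_{t-1} x_{t-k}] + E[\epsilon_t x_{t-k}]
          = \phi\,\gamma_{k-1} \qquad (k \ge 1), \\
\rho_k   &= \frac{\gamma_k}{\gamma_0} = \phi^{k}.
\end{aligned}
```

This is why the ACF of an AR(1) process decays geometrically instead of cutting off after a fixed lag.
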
Stochastic processes
Moving Average (MA) processes
In this model each observation at a given time is a function of the previous error plus random error:
$$x_t = \theta \epsilon_{t-1} + \epsilon_t$$
Here's an Autocorrelation Function plot of an MA(1) process.
[Figure: ACF plot of an MA(1) process]

Stochastic processes
Moving Average (MA) processes
The $\theta$ parameter can be negative:
$$x_t = \theta \epsilon_{t-1} + \epsilon_t$$
Here's an Autocorrelation Function plot of a negative MA(1) process.
[Figure: ACF plot of a negative MA(1) process]
Negative $\theta$ enhances high frequencies.
Positive $\theta$ enhances low frequencies.

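A small simulation, offered as a sketch, makes the effect of the sign of $\theta$ visible. It assumes standard normal errors and uses the standard MA(1) result that the lag-1 autocorrelation is $\theta / (1 + \theta^2)$ and all higher lags are zero; the helper names are hypothetical.

```python
import random

def simulate_ma1(theta, n, seed=0):
    """Simulate x_t = theta * e_{t-1} + e_t with i.i.d. N(0, 1) errors."""
    rng = random.Random(seed)
    prev_e = rng.gauss(0.0, 1.0)  # error preceding the first observation
    x = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        x.append(theta * prev_e + e)
        prev_e = e
    return x

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return ck / c0

for theta in (0.8, -0.8):
    x = simulate_ma1(theta, n=5000)
    theory = theta / (1 + theta ** 2)  # theoretical lag-1 autocorrelation
    print(theta, round(theory, 3),
          round(autocorrelation(x, 1), 3), round(autocorrelation(x, 2), 3))
```

A positive lag-1 correlation means neighbors move together (smoother, low-frequency look); a negative one means neighbors alternate (jagged, high-frequency look).
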
ACF Plots
Notice that without an ACF plot, diagnosis of raw series is difficult.
[Figure: raw series plots, panels "White noise" and "MA(1)"]

Stochastic processes
ARMA processes (Box & Jenkins)
We can mix these models.
An ARMA model looks like this:
$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t$$
In most cases, the coefficients of the terms decay exponentially, so we do not have to make $p$ and $q$ large for modeling most series.
All the models we've seen so far can include a constant.
We can also add trend to these models:
$$x_t = \alpha + \beta t + \sum_{i=1}^{p} \phi_i x_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t$$

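To make the ARMA recursion concrete, here is a hedged simulation sketch. The function name `simulate_arma`, the Gaussian errors, and the convention that terms reaching before the start of the series are dropped are all assumptions made for illustration; `const` and `trend` play the roles of $\alpha$ and $\beta$.

```python
import random

def simulate_arma(phi, theta, n, seed=0, const=0.0, trend=0.0):
    """Simulate x_t = const + trend*t + sum(phi_i x_{t-i})
    + sum(theta_j e_{t-j}) + e_t with i.i.d. N(0, 1) errors."""
    rng = random.Random(seed)
    x, e = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, 1.0)
        # AR part: weighted sum of the p most recent observations.
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # MA part: weighted sum of the q most recent errors.
        ma = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        x.append(const + trend * t + ar + ma + e_t)
        e.append(e_t)
    return x

arma11 = simulate_arma(phi=[0.5], theta=[0.4], n=200, seed=3)
print([round(v, 2) for v in arma11[:5]])
```

Passing empty lists for `phi` and `theta` returns plain white noise, and a single coefficient in one list recovers the AR(1) or MA(1) models from the earlier slides.
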
Stochastic processes
Seasonal processes
Dependencies in the model can be across seasons.
A seasonal autoregressive model looks like this:
$$x_t = \phi x_{t-s} + \epsilon_t$$
And a seasonal moving average model looks like this:
$$x_t = \theta \epsilon_{t-s} + \epsilon_t$$
Economists love this stuff.
They even mix stochastic and classical models in the same equation.
Their goal is to account for dependencies in the residuals in regression models.
Here's an example of one of their models, Generalized Least Squares:
$$\hat{\beta} = (X' \Sigma^{-1} X)^{-1} (X' \Sigma^{-1} Y)$$

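A short worked step, added here as an illustration, shows how Generalized Least Squares relates to ordinary least squares. Assume the residuals are independent with common variance, so that $\Sigma = \sigma^2 I$ (this special case is an assumption for the example, not a claim about any particular dataset):

```latex
\hat{\beta}
  = \left(X' (\sigma^2 I)^{-1} X\right)^{-1} \left(X' (\sigma^2 I)^{-1} Y\right)
  = \left(\sigma^{-2} X' X\right)^{-1} \left(\sigma^{-2} X' Y\right)
  = (X' X)^{-1} X' Y
```

With no dependence in the residuals the estimator collapses to OLS; a non-diagonal $\Sigma$ is what carries the temporal or seasonal dependence.
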
Stochastic processes
Estimating ARMA models
Are you serious? This is a black art.
And usually you want ARIMA instead of ARMA, which I haven't even told you about.
Even after a semester course in ARIMA models you won't be able to do it.
You have to learn how to diagnose ACF plots, and PACF plots, which I haven't even told you about.
You have to know when to difference your series to achieve stationarity, which I haven't even told you about.
Leave this to the experts.
That brings us to the next topic.
There's a simple model that does better than fancy ARIMA for many real forecasts.
It's called Exponential Smoothing.
It includes seasonal effects as well.

Stochastic processes
Exponential Smoothing
We begin with the moving average smoothing model.
For a point at time $t$ ($1 \le t \le n$), a moving average smoothed value is given by
$$\hat{x}_t = \frac{1}{p} \sum_{i=1}^{p} x_{t-i}$$
Some considerations:
Our smoothing estimate is simply the average of the $p$ previous values.
The first $p$ points in the series are not smoothed.
If each point in the series has a random component, we are averaging fixed and random components of previous points.
In this case, the model smooths only $p$ prior random components (not $n$).
In other words, the model ignores any randomness before the previous $p$ time points.
If we presume only random error governs the process, we call the process a random walk.
If we believe the process is a random walk, then we should set $p = 1$.
If $p = 1$, the smooth is just the previous observation.
If $p = 1$, we are assuming there is no more information we can get out of the data.
If $p > 1$, we are assuming we can eliminate the effects of the errors by averaging them.

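The moving average smoother is short enough to write out directly. This is a sketch with a hypothetical function name; unsmoothed leading points are marked with `None` as one possible convention.

```python
def moving_average_smooth(x, p):
    """Smooth x[t] as the mean of the p previous values.
    The first p points cannot be smoothed and are returned as None."""
    smoothed = [None] * len(x)
    for t in range(p, len(x)):
        smoothed[t] = sum(x[t - p:t]) / p
    return smoothed

data = [2.0, 4.0, 6.0, 8.0, 10.0]
print(moving_average_smooth(data, 2))  # [None, None, 3.0, 5.0, 7.0]
print(moving_average_smooth(data, 1))  # [None, 2.0, 4.0, 6.0, 8.0]
```

The second call shows the $p = 1$ case: the smooth is just the previous observation.
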
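Stochastic processes
Exponential Smoothing
Now go on to the weighted moving average smoothing model:
$$\hat{x}_t = \frac{1}{p} \sum_{i=1}^{p} w_i x_{t-i}$$
Let's make these weights decline exponentially:
$$w_i = p^{-i}$$
And let's normalize them to add to 1:
$$w_i = \frac{p - 1}{1 - p^{-p}} \, p^{-i}$$
With weights that add to 1, the smooth is $\sum_{i=1}^{p} w_i x_{t-i}$, without the leading $1/p$.
This makes the exponentially weighted smoothing model.
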
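A quick numerical check of the normalization, written as a sketch. It assumes the raw weights are $w_i = p^{-i}$ for $i = 1, \dots, p$ with $p > 1$, so the normalizing constant is $(p - 1) / (1 - p^{-p})$; the function names are hypothetical.

```python
def exponential_weights(p):
    """Weights proportional to p**(-i), i = 1..p, rescaled to sum to 1."""
    raw = [p ** (-i) for i in range(1, p + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def closed_form_weights(p):
    """Same weights from the closed-form constant (requires p > 1)."""
    scale = (p - 1) / (1 - p ** (-p))
    return [scale * p ** (-i) for i in range(1, p + 1)]

w = exponential_weights(4)
print([round(v, 4) for v in w], round(sum(w), 6))
print([round(v, 4) for v in closed_form_weights(4)])
```

The two functions should agree to rounding error, and the weights fall off quickly, so the most recent observation dominates the smooth.
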
Stochastic processes
The Exponentially Weighted Moving Average Model (EWMA)
Here is the recursive form of the exponentially weighted smoothing model.
Notice the $\hat{x}_{t-1}$ on the right.
We assume $0 < \alpha < 1$ so things don't explode.
$$\hat{x}_t = \alpha x_{t-1} + (1 - \alpha) \hat{x}_{t-1}$$
This formula gives us a recursive estimation method.
No fancy optimization needed.
What we are doing here is projecting forward local patterns in the series.
We could consider this a statistical estimation method.
Or we could just think of it as a deterministic forward pattern duplicator.

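The EWMA recursion translates almost line for line into code. In this sketch the starting value $\hat{x}_0$ is set to the first observation, which is an assumption (the slide does not say how to initialize), and `ewma_forecasts` is a hypothetical name.

```python
def ewma_forecasts(x, alpha):
    """Return xhat[0..n], where xhat[t] = alpha*x[t-1] + (1-alpha)*xhat[t-1].
    xhat[n] is the one-step-ahead forecast beyond the end of the series."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    xhat = [x[0]]  # assumed initialization: start at the first observation
    for t in range(1, len(x) + 1):
        xhat.append(alpha * x[t - 1] + (1 - alpha) * xhat[t - 1])
    return xhat

print(ewma_forecasts([10.0, 12.0, 11.0, 13.0], alpha=0.5))
# [10.0, 10.0, 11.0, 11.0, 12.0]
```

Each forecast needs only the previous observation and the previous forecast, which is why no optimization step is required once $\alpha$ is chosen.
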
Stochastic processes
The Holt-Winters method
Now it gets powerful.
Holt and Winters added trend and seasonality to EWMA.
H-W fits three types of trend models (none, linear, multiplicative).
H-W does not fit other types of trend functions (although it could be modified to do so).
H-W fits three types of seasonality (none, additive, multiplicative).
H-W can fit more than simple sinusoidal seasonality functions.
H-W does not fit more than one type of seasonality in one model (but it could).
H-W additive linear models parallel specific ARIMA models.
H-W multiplicative models do not have ARIMA parallels.
Forecasting:
Fit first half of series.
Extrapolate to second half to get residuals.
Analyze residuals for anomalies.
Forecast beyond end of series.

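The four forecasting steps can be sketched with a simplified stand-in for Holt-Winters: Holt's linear-trend smoothing, with no seasonal component. Everything here is illustrative: the function names, the initialization from the first two points, the toy series with one injected spike, and the fixed anomaly threshold are all assumptions.

```python
def holt_fit(x, alpha, beta):
    """Run Holt's linear-trend smoothing over x; return final (level, trend)."""
    level, trend = x[0], x[1] - x[0]  # assumed simple initialization
    for value in x[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend

def holt_forecast(level, trend, steps):
    """Extrapolate the fitted level and trend `steps` periods ahead."""
    return [level + h * trend for h in range(1, steps + 1)]

# Toy data: a straight line with one injected spike at time 15.
series = [2.0 * t + 5.0 for t in range(20)]
series[15] += 10.0

half = len(series) // 2
level, trend = holt_fit(series[:half], alpha=0.5, beta=0.5)    # 1. fit first half
predicted = holt_forecast(level, trend, len(series) - half)    # 2. extrapolate
residuals = [a - p for a, p in zip(series[half:], predicted)]
threshold = 3.0  # hypothetical cutoff for flagging a residual
anomalies = [half + i for i, r in enumerate(residuals) if abs(r) > threshold]
print(anomalies)  # 3. analyze residuals -> [15]

full_level, full_trend = holt_fit(series, alpha=0.5, beta=0.5)
future = holt_forecast(full_level, full_trend, 5)              # 4. forecast ahead
```

A full Holt-Winters fit would add a seasonal term and usually estimate the smoothing parameters from the data; the holdout-and-residual workflow stays the same.
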
Stochastic processes
The Holt-Winters method

Stochastic processes
The Holt-Winters method
Here is the H-W forecast for the famous Box-Jenkins Airline dataset:
    hw passengers
    estimate / smooth=.3, linear=.4, season=12, multiplicative=.5, forecast=10
And here is the forecast using ARIMA (0,1,1)(0,1,1):
    arima passengers
    log
    difference
    difference / lag=12
    estimate / q=1, qs=1, season=12, backcast=13, forecast=10

Stochastic processes
Seasonal decomposition
X11/12 (US Census), SABL (Cleveland, Bell Labs)
[Figure: decomposition panels "Trend", "Season", "Residual"]

Correlating Time Series
Don't do this.
I love this site!
http://www.tylervigen.com/spurious-correlations

Correlating Time Series
[Figure: four series plots, "Science Raw", "Science Detrended", "Suicide Raw", "Suicide Detrended" (SCIENCE or SUICIDE vs. Case)]

Correlating Time Series
CCF of raw series vs. CCF of detrended series
[Figure: two Cross Correlation Plots (Correlation vs. Lag, lags -6 to 6)]
Detrending doesn't always get you out of the woods.
There can be second-order artifacts that influence correlation between series.

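A hedged sketch of why correlating raw series is dangerous: two series generated independently, each with its own linear trend, correlate almost perfectly until the trend is removed. The data are simulated, the helper names are hypothetical, and first differencing is used here as one simple way to detrend. As the slide warns, real series can retain artifacts that this simple case does not show.

```python
import random

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def trending_series(n, slope, seed):
    """Linear trend plus independent N(0, 1) noise."""
    rng = random.Random(seed)
    return [slope * t + rng.gauss(0.0, 1.0) for t in range(n)]

def first_difference(x):
    return [x[t] - x[t - 1] for t in range(1, len(x))]

a = trending_series(2000, slope=0.5, seed=1)
b = trending_series(2000, slope=0.3, seed=2)  # generated independently of a

raw_r = correlation(a, b)
diff_r = correlation(first_difference(a), first_difference(b))
print(round(raw_r, 3), round(diff_r, 3))  # raw is near 1, differenced is near 0
```

The raw correlation reflects nothing more than both series going up over time.
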
Multivariate analysis of time series
The cautions mentioned earlier apply to any analysis of time series, e.g., clustering or principal components of time series.
Need to difference to achieve stationarity before clustering.
First differences of a random walk achieve stationarity.
Other models require more exotic measures.
[Figure: first 9 lags of a random walk]

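The claim about first differences can be checked on simulated data. This sketch assumes a Gaussian random walk and uses hypothetical helper names; it compares the lag-1 autocorrelation before and after differencing.

```python
import random

def random_walk(n, seed=0):
    """Cumulative sum of i.i.d. N(0, 1) steps."""
    rng = random.Random(seed)
    x, total = [], 0.0
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        x.append(total)
    return x

def first_difference(x):
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def lag1_correlation(x):
    """Sample correlation between x_t and x_{t-1}."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

walk = random_walk(2000)
steps = first_difference(walk)
print(round(lag1_correlation(walk), 3))   # close to 1: strongly dependent
print(round(lag1_correlation(steps), 3))  # close to 0: looks like white noise
```

Differencing recovers the independent steps, so methods such as clustering or principal components are applied to the differenced series, not the raw walk.
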
Wait, there's more
But if you insist on trying this stuff, you'd better talk to a time-series statistician or economist.
But don't ask the economist to predict the economy!