Extra Online Questions
Determine the trend for time series data
Covers AS 90641 (Statistics and Modelling 3.1)
Scholarship Statistics and Modelling, Chapter 1
Essential exam notes: Time series

1. (Year 2004) The value of the retail sales at NAILS (a large hardware store) was recorded each day between Monday 1st January 2001 and Friday 30th April 2004. You have been provided with some of the resulting time series data, which are shown in the tables and graphs that follow.

Table 1 shows the monthly retail sales from July 2001 to October 2003 inclusive. Centred moving means have been calculated and are shown in the table. For these data, the least squares regression line fitting the centred moving means was obtained. The equation of the regression line is y = 0.91x + 225.21, where x represents the number of months since June 2001 (so x = 1 corresponds to July 2001, etc.) and y represents the value of sales (in thousands of dollars, $000).

Table 2 shows the daily retail sales for the five weeks from Sunday 28th September 2003 until Saturday 1st November 2003 inclusive; moving means of order seven have been calculated.

Graph 1 shows the value of the monthly retail sales from 1st July 2001 to 31st October 2003. Centred moving means of order 12 have also been plotted on the graph.

Graph 2 shows the value of the daily retail sales for the five weeks from Sunday 28th September 2003 until Saturday 1st November 2003 inclusive. Moving means of order seven have also been plotted on the graph.

TABLE 1

Month    Sales ($000)   Centred Moving Mean
Jul-01   171.8          225.4
Aug-01   182.3          225.9
Sep-01   237.2          226.7
Oct-01   270.3          228.2
Nov-01   245.5          229.5
Dec-01   357.3          230.8
Jan-02   281.2          232.3
Feb-02   218.1          234.2
Mar-02   202.3          234.6
Apr-02   223.6          233.5
May-02   202.9          234.6
Jun-02   188.1          236.7
Jul-02   185.9          237.8
Aug-02   213.3          238.5
Sep-02   216.2          239.2
Oct-02   265.3          240.5
Nov-02   276.4          241.6
Dec-02   376.3          242.9
Jan-03   290.1          244.0
Feb-03   224.1          243.0
Mar-03   214.0          242.3
Apr-03   243.0          243.0
May-03   211.1          244.9
Jun-03   209.3          247.0
Jul-03   190.9          247.4
Aug-03   185.9          249.0
Sep-03   226.7          249.9
Oct-03   272.2          250.6
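The regression line quoted in the question can be checked against Table 1. The following is a minimal Python sketch, assuming the centred moving means are numbered x = 1 (Jul-01) to x = 28 (Oct-03); the function name `least_squares_line` is illustrative and not part of the original material.

```python
# Centred moving means from Table 1, Jul-01 (x = 1) to Oct-03 (x = 28).
cmm = [225.4, 225.9, 226.7, 228.2, 229.5, 230.8, 232.3,
       234.2, 234.6, 233.5, 234.6, 236.7, 237.8, 238.5,
       239.2, 240.5, 241.6, 242.9, 244.0, 243.0, 242.3,
       243.0, 244.9, 247.0, 247.4, 249.0, 249.9, 250.6]


def least_squares_line(ys):
    """Fit y = slope * x + intercept for x = 1, 2, ..., len(ys)."""
    n = len(ys)
    xs = range(1, n + 1)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares_line(cmm)
print(round(slope, 2), round(intercept, 2))  # prints: 0.91 225.21
```

To two decimal places the fit agrees with the given equation y = 0.91x + 225.21.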
TABLE 2

Day   Date     Sales ($000)   Moving Mean
Sun   28-Sep    8.9            6.5
Mon   29-Sep    7.4            6.7
Tue   30-Sep    5.4            6.7
Wed   1-Oct     3.7            6.7
Thu   2-Oct     6.3            6.5
Fri   3-Oct     6.2            6.4
Sat   4-Oct     9.0            6.5
Sun   5-Oct     7.7            6.5
Mon   6-Oct     6.2            6.3
Tue   7-Oct     6.1            6.2
Wed   8-Oct     3.9            7.0
Thu   9-Oct     5.0            8.8
Fri   10-Oct    5.7            8.6
Sat   11-Oct   14.1            8.6
Sun   12-Oct   20.6            8.7
Mon   13-Oct    5.0            8.8
Tue   14-Oct    5.7            9.2
Wed   15-Oct    4.8            9.9
Thu   16-Oct    6.0            8.7
Fri   17-Oct    8.0            9.0
Sat   18-Oct   19.0            9.1
Sun   19-Oct   12.6            9.3
Mon   20-Oct    6.6            9.4
Tue   21-Oct    6.7            9.4
Wed   22-Oct    6.2            9.2
Thu   23-Oct    6.6           10.1
Fri   24-Oct    8.4           12.7
Sat   25-Oct   17.1           12.6
Sun   26-Oct   18.9           12.3
Mon   27-Oct   24.7           11.9
Tue   28-Oct    6.2           11.7
Wed   29-Oct    3.9           12.2
Thu   30-Oct    4.1           11.5
Fri   31-Oct    7.2            8.9
Sat   1-Nov    20.2            8.9

GRAPH 1: NAILS Retail Sales. Monthly sales ($000) and centred moving mean, Jul-01 to Oct-03 (vertical axis 150 to 400).
GRAPH 2: NAILS Retail Sales. Daily sales ($000) and moving mean, 28-Sep to 31-Oct (vertical axis 0 to 25).

a. To improve sales, the duty manager for each day is offered a bonus payment if the sales for that day exceed the expected value by at least 5%.
   i. What features of the time series should be considered in setting up this bonus payment scheme for managers?
   ii. Explain how the daily sales targets could be calculated.
b. Labour Day, the last Monday of October, results in a three-day weekend, which is traditionally used for home renovation and gardening. This produces high sales for hardware stores throughout New Zealand at this time.
   i. What effect do the high sales on Labour Day 2003 have on the (centred) moving mean for the daily retail sales?
   ii. How would you allow for the high sales figures for Labour Days in the calculation of a sales forecast for Mondays?

c. Using the given information, forecast the sales for Tuesday 7th December 2004. You must make clear the method you are using to make your forecast and justify your reasoning.
d. Describe two limitations of the forecast you made in part c.
2. (Year 2005) You are contracted as a statistical analyst to investigate sales patterns for an internet café over the previous three years, and to make a sales forecast for February 2006.

You have been provided with the following data:
- The value of sales ($000) from the café for each month from November 2002 (t = 1) to October 2005 (t = 36) inclusive.
- The 12-point centred moving average (CMA) sales values ($000) for the months May 2003 (x = 1) to April 2005 (x = 24) inclusive.

In addition you are provided with the following statistical output or information:
- A graph of the value of the monthly sales on which the CMA values have also been plotted.
- A table showing some summary statistics for the sales over each six-monthly period.
- A linear regression line fitted to the plotted CMA points has equation: y = −0.1112x + 17.709 and R² = 0.2249
- A quadratic regression curve fitted to the plotted CMA points has the equation: y = −0.0324x² + 0.6987x + 14.199 and R² = 0.9527

Write a report, no more than a page long (excluding calculations), to the owner of the internet café that summarises the output. Include in your report two calculations, one using the line and the other using the curve, to forecast sales for February 2006. Comment on the usefulness and limitations of your forecasts.

GRAPH: Internet Café. Monthly sales ($000) and CMA ($000) plotted against months since Oct 2002 (horizontal axis 1 to 35, vertical axis 0 to 40).

Summary of Monthly Sales ($000) in each Six-monthly Period from Nov 2002

Period             Mean    Median   Standard Deviation
Nov 02 to Apr 03   14.17   13.50    2.64
May 03 to Oct 03   16.50   16.50    2.43
Nov 03 to Apr 04   17.17   16.50    2.79
May 04 to Oct 04   19.17   17.50    5.12
Nov 04 to Apr 05   13.67   13.50    2.42
May 05 to Oct 05    9.17    9.50    1.47
t-value   Month   Year   Sales ($000)   x-value   CMA ($000)
 1        Nov     2002   15
 2        Dec     2002   19
 3        Jan     2003   12
 4        Feb     2003   12
 5        Mar     2003   13
 6        Apr     2003   14
 7        May     2003   13              1        15.58
 8        Jun     2003   15              2        15.88
 9        Jul     2003   17              3        16.00
10        Aug     2003   16              4        16.21
11        Sep     2003   18              5        16.46
12        Oct     2003   20              6        16.71
13        Nov     2003   21              7        16.92
14        Dec     2003   20              8        17.13
15        Jan     2004   14              9        17.75
16        Feb     2004   15             10        18.25
17        Mar     2004   16             11        18.21
18        Apr     2004   17             12        18.17
19        May     2004   15             13        18.04
20        Jun     2004   18             14        17.67
21        Jul     2004   29             15        17.29
22        Aug     2004   16             16        17.08
23        Sep     2004   17             17        16.83
24        Oct     2004   20             18        16.54
25        Nov     2004   18             19        16.21
26        Dec     2004   14             20        15.71
27        Jan     2005   11             21        14.63
28        Feb     2005   13             22        13.54
29        Mar     2005   12             23        12.88
30        Apr     2005   14             24        11.96
31        May     2005   10
32        Jun     2005   11
33        Jul     2005   10
34        Aug     2005    9
35        Sep     2005    8
36        Oct     2005    7
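The CMA column can be reproduced from the sales column. The sketch below, in Python, assumes the usual construction of a centred moving average of even order: average two adjacent 12-month windows so that the result is centred on a single month. The function name `centred_moving_average` is illustrative only.

```python
# Monthly sales ($000), Nov 2002 (t = 1) to Oct 2005 (t = 36).
sales = [15, 19, 12, 12, 13, 14, 13, 15, 17, 16, 18, 20,
         21, 20, 14, 15, 16, 17, 15, 18, 29, 16, 17, 20,
         18, 14, 11, 13, 12, 14, 10, 11, 10, 9, 8, 7]


def centred_moving_average(values, order=12):
    """Centred moving average for an even order.

    Each result is the mean of two adjacent windows of length `order`,
    which centres the average on a single time point.
    """
    half = order // 2
    result = []
    for i in range(half, len(values) - half):
        first = sum(values[i - half:i + half]) / order
        second = sum(values[i - half + 1:i + half + 1]) / order
        result.append((first + second) / 2)
    return result


cma = centred_moving_average(sales)
# cma[0] corresponds to May 2003 (x = 1), cma[-1] to April 2005 (x = 24).
print(len(cma), round(cma[0], 2), round(cma[-1], 2))  # prints: 24 15.58 11.96
```

Six months are lost at each end of the series, which is why the CMA values only run from May 2003 to April 2005.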
3. A large swimming pool complex has both indoor and outdoor facilities. At some times of the year (for example during the April, July and October school holidays) there is an increase in the number of admissions.

GRAPH: Swimming Pool Admissions. Number of admissions per month, Apr-95 to Jul-99 (vertical axis 2000 to 8000).

The number of admissions each month has been recorded over a four-year period. The data are shown in the spreadsheet at the end of this question.

a. What is/are the order(s) of the seasonal effect(s) shown in those data? Explain.

b. Which of the following moving averages of order 3 would show the seasonal effects more obviously: a moving median or a moving mean? Support your answer with an explanation.
c. Use the data for December 1995, December 1996 and December 1997 to seasonally adjust the data for December 1998.

d. Explain whether the number of admissions in December 1998 was higher or lower than expected.

Month/Year   Number of Admissions   Moving Mean (order 12)   Seasonal Effect
Jul-95       4501
Aug-95       3341
Sep-95       3373
Oct-95       4471
Nov-95       3676
Dec-95       6003                   4520                      1483
Jan-96       7440                   4588                      2852
Feb-96       6125                   4548                      1577
Mar-96       3290                   4523                     −1233
Apr-96       5295                   4527                       768
May-96       3440                   4537                     −1097
Jun-96       3289                   4532                     −1243
Jul-96       5319                   4560                       759
Aug-96       2859                   4560                     −1701
Sep-96       3080                   4554                     −1474
Oct-96       4518                   4551                       −33
Nov-96       3794                   4550                      −756
Dec-96       5943                   4547                      1396
Jan-97       7772                   4503                      3269
Feb-97       6129                   4510                      1619
Mar-97       3215                   4519                     −1304
Apr-97       5257                   4516                       741
May-97       3430                   4515                     −1085
Jun-97       3254                   4515                     −1261
Jul-97       4788                   4503                       285
Aug-97       2949                   4484                     −1535
Sep-97       3190                   4499                     −1309
Oct-97       4479                   4503                       −24
Nov-97       3779                   4498                      −719
Dec-97       5944                   4512                      1432
Jan-98       7631                   4545                      3086
Feb-98       5903                   4540                      1363
Mar-98       3386                   4536                     −1150
Apr-98       5311                   4530                       781
May-98       3366                   4521                     −1155
Jun-98       3424                   4528                     −1104
Jul-98       5186                   4502                       684
Aug-98       2886                   4515                     −1629
Sep-98       3145                   4508                     −1363
Oct-98       4405                   4515                      −110
Nov-98       3668                   4503                      −835
Dec-98       6025                   4493                      1532
Jan-99       7326
Feb-99       6058
Mar-99       3302
Apr-99       5389
May-99       3229
Jun-99       3299
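The Seasonal Effect column is the number of admissions minus the moving mean for the same month, so a month that falls below the moving mean has a negative effect. A short Python sketch of that column for three sample rows (the variable names are illustrative):

```python
# (month, admissions, moving mean of order 12) for three rows of the spreadsheet.
rows = [("Dec-95", 6003, 4520),
        ("Mar-96", 3290, 4523),
        ("Oct-96", 4518, 4551)]

# Seasonal effect = observed value minus smoothed value.
effects = {month: admissions - moving_mean
           for month, admissions, moving_mean in rows}

for month, effect in effects.items():
    print(month, effect)
```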
4. This graph shows total Australian electricity production in millions of kw-hours measured at quarterly intervals from March 1960 to December 1980.

GRAPH: Quarterly electricity production (millions of kw-hours), Mar-1960 to Mar-1982 (vertical axis 5 000 to 40 000).

a. Give a full description of the features of this time series.

The first 24 items of data are shown in the spreadsheet below. The spreadsheet shows some of the data values and calculations that can be made from these.

      A          B                 C             D
 1    Quarter    Electricity use   Moving mean
 2
 3    Mar-1960   10 091
 4    Jun-1960   11 858
 5    Sep-1960   13 198            11 640.00
 6    Dec-1960   11 413            11 901.25
 7    Mar-1961   11 136            12 196.75
 8    Jun-1961   13 040            12 459.25
 9    Sep-1961   14 248            12 796.50
10    Dec-1961   12 762            13 094.50
11    Mar-1962   12 328            13 457.75
12    Jun-1962   14 493            13 811.00
13    Sep-1962   15 661            14 065.75
14    Dec-1962   13 781            14 308.50
15    Mar-1963   13 299            14 492.75
16    Jun-1963   15 230            14 744.25
17    Sep-1963   16 667            14 920.00
18    Dec-1963   14 484            15 054.75
19    Mar-1964   13 838            15 227.00
20    Jun-1964   15 919            15 347.50
21    Sep-1964   17 149            15 617.50
22    Dec-1964   15 564            15 914.00
23    Mar-1965   15 024            16 200.25
24    Jun-1965   17 064            16 541.00
25    Sep-1965   18 512            16 766.75
26    Dec-1965   16 467            17 073.00
b. What is the order of the moving means shown in column C?

c. Show that the best estimate, from the available data, of the seasonal adjustment for the June quarter is 568.6 million kw-hours.

d. The moving mean for the September 1980 quarter is 35 081 million kw-hours. Use this information, and the first smoothed value, to predict the electricity use for the June quarter of 1995. Include an explanation, with working, of the method you used.
Answers
3.1 Time series

1. a. i. The features of the time series include:
- The long-term trend of a gradual increase in sales over time.
- The seasonal effects at a monthly level: peak sales in December and low sales in July and August.
- The seasonal effects at a daily level: peak sales during the weekends and low sales mid-week.

ii. Obtain a moving mean forecast for the month in question. Obtain the average seasonal effect for that month and adjust the forecast appropriately. Divide the month's forecast by the number of trading days to get the daily forecast for each day in that month. Find the average seasonal effects for the different days of the week. Adjust the daily forecast by the average seasonal effect for the day under consideration. To calculate a sales target of 5% above the expected sales, multiply by 1.05.

b. i. The high sales on Labour Day 2003 cause seven values of the centred moving mean to increase. This means that the centred moving mean values from the Friday before to the Thursday after Labour Day are larger than the rest of the data would suggest they should be. Any analysis of this part of the time series would therefore show an inflated daily trend value.

ii. Labour Day could be treated as an outlier and ignored when calculating the average seasonal effect for Mondays. A particular Monday could have a moving mean value forecast and then seasonally adjusted. If that Monday happened to be a Labour Day, it could be adjusted further using any historical data for the seasonal effect of a Labour Day.

c. December 2004 is 42 months after June 2001 (x = 42), so the raw forecast value for December 2004 is:
   y = 0.91 × 42 + 225.21 = 263.43
The average seasonal effect for December is needed (Dec-01: 357.3 − 230.8 = 126.5; Dec-02: 376.3 − 242.9 = 133.4):
   (126.5 + 133.4) / 2 = 129.95
Forecast daily sales for December 2004:
   (263.43 + 129.95) / 30 = 13.11   [divide by 30 since there are no sales on Christmas Day]
The average seasonal effect for Tuesdays is needed:
   (−1.3 − 0.1 − 3.5 − 2.7 − 5.5) / 5 = −2.62
The forecast sales for Tuesday 7th December 2004 are:
   (13.11 − 2.62) × 1 000 = $10 490   [it is important to include the factor of 1 000]

d. Any two of the following would be appropriate. Any statement made must be consistent with anything that has been said so far.
- Sales within the month of December tend to increase gradually. No account has been taken of the forecast date being at the start of the month, where sales are possibly lower than later in the month.
- The trend line was calculated on data ending in October 2003. It is therefore assumed that the slowly increasing trend continues up to and including December 2004. This is more than 12 months after the end of the data used, and hence might not be the case.
- Both of the calculated average seasonal effects, especially December's, have been found using relatively small numbers of values. December's average seasonal effect was found using only two values, which leaves a lot of room for error.
- The average seasonal effect for Tuesdays was calculated using a value for Tuesday 28th October whose moving mean is inflated by the inclusion of Labour Day.

2. The following bullet points should form the basis of the report:
- The shape of the raw data should be commented on. Aside from a peak of $29 000 in July 2004, the sales fluctuated from a minimum of about $12 000 to a maximum of about $21 000 during the first two years, to October 2004. Over the last year of data, from October 2004, sales have dropped steadily.
- Comment on the summary statistics. Within each six-monthly interval, there appears to be little variation in the monthly sales, as shown by the low standard deviation. The exception is the period May 2004 to October 2004, where the standard deviation is almost double the others. The difference between the mean and median of each interval is minimal, and it could be said that the means and medians are approximately equal.
- Comment on seasonal variation. Some months were clearly higher than average, as seen by their sales being higher than the moving average for that month. These were July, September, October and November. These months were favourable for sales. January, February, March, May and August had lower than average sales, as seen by the moving average values being higher than the raw data values. These months were not favourable for sales.
- The shape of the moving average data should be commented on. The moving average graph steadily increased to February 2004 (t = 16), where it reached a peak. The graph steadily decreased from February 2004 to the end of the data.
- Comment on the fit of the line and the fit of the curve to the moving average data.
The line does not fit the data very well; there is only a weak relationship, as shown by the coefficient of determination R² = 0.2249. The curve fits the data well, and a strong relationship is indicated by the coefficient of determination R² = 0.9527.
- Calculation of the seasonal effect for February:
   Seasonal effect = [(15 − 18.25) + (13 − 13.54)] / 2 = −1.895
- Calculation of the forecast for February 2006 can be done in two different ways.
Using the line: x = 34
   y = −0.1112 × 34 + 17.709 = 13.928
Add the seasonal effect to get the prediction:
   13.928 + (−1.895) = 12.033
The line gives a forecast sales value of about $12 000.
Using the curve: x = 34
   y = −0.0324 × 34² + 0.6987 × 34 + 14.199 = 0.500
Add the seasonal effect to get the prediction:
   0.500 + (−1.895) = −1.395
The curve gives a forecast sales value of about −$1 400.
- Comment on the usefulness and limitations of the forecasts made. The fact that the curve forecasts a negative sales figure (which is not possible) indicates that the curve no longer fits the data in February 2006. As the line is used to predict a value 10 months in the future, its negative gradient most likely no longer holds. This limits the usefulness of this forecast. The average seasonal effect for February is calculated using data from only two Februaries, so it is unlikely to be a good representation of the true seasonal effect. More data are needed to give a better idea of the seasonal effect for February.

3. a. There are two seasonal effects that can be seen in the graph. One occurs every summer and has order 12; the other occurs in each school holiday period and has order 3.

b. The mean of a set of data is more affected by unusually large or small values than the median. Consequently it would be the moving mean that would show the seasonal effects more obviously.

c. The average seasonal effect for the three given Decembers is:
   (1483 + 1396 + 1432) / 3 = 1437
Using this, the seasonally adjusted value for December 1998 is:
   6 025 − 1 437 = 4 588

d. Since the seasonally adjusted value for December 1998 (4 588) is higher than the moving average value (4 493), it can be said that the value for December 1998 was higher than expected.

4. a. There is an almost linear increasing trend. The seasonal variation occurs at regular intervals, with the highest value of each year in the September quarter and the lowest in the March quarter.

b. The moving means in the table are of order 4.

c. Find the individual seasonal effects for June and average them.
   June 61: 13 040 − 12 459.25 = 580.75
   June 62: 14 493 − 13 811 = 682
   June 63: 15 230 − 14 744.25 = 485.75
   June 64: 15 919 − 15 347.5 = 571.5
   June 65: 17 064 − 16 541 = 523
The average seasonal effect for June is:
   (580.75 + 682 + 485.75 + 571.5 + 523) / 5 = 568.6
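The June calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic above, using the June-quarter rows of the spreadsheet; the variable names are illustrative.

```python
# June quarters: year -> (electricity use, moving mean), from the spreadsheet.
june = {
    1961: (13040, 12459.25),
    1962: (14493, 13811.00),
    1963: (15230, 14744.25),
    1964: (15919, 15347.50),
    1965: (17064, 16541.00),
}

# Individual seasonal effect = observed value minus moving mean.
effects = [actual - moving_mean for actual, moving_mean in june.values()]

# Average seasonal effect for the June quarter.
average_effect = sum(effects) / len(effects)
print(effects, round(average_effect, 1))
```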
d. For this question it is assumed that the moving mean data increase in a linear pattern. The first smoothed data value is the 3rd data point and September 1980 is the 83rd, so use the two points (3, 11 640) and (83, 35 081) to find the equation of the line:
   (y − 11 640) / (x − 3) = (35 081 − 11 640) / (83 − 3)
   y = 293.0125x + 10 760.9625
For the purpose of this investigation, and given the accuracy of the given data, the following is more than accurate enough:
   y = 293x + 10 761
The June quarter in 1995 is the 142nd data point, hence its moving mean value is:
   y = 293 × 142 + 10 761 = 52 367
Then add the seasonal adjustment to find the predicted usage for the June quarter:
   52 367 + 569 = 52 936
So the predicted usage of power for the June quarter of 1995 is 52 936 million kw-hours.
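The working for part d can be sketched in Python as follows. The helper `line_through` is an illustrative name, not part of the original answer, and the rounding of the slope, intercept and seasonal adjustment to whole numbers follows the working above.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# First smoothed value (3rd quarter) and the September 1980 value (83rd quarter).
slope, intercept = line_through((3, 11640), (83, 35081))

# Round as in the written answer: y = 293x + 10 761, seasonal adjustment 569.
trend_june_1995 = round(slope) * 142 + round(intercept)
forecast = trend_june_1995 + round(568.6)
print(slope, intercept, trend_june_1995, forecast)
```

Using the unrounded slope and intercept changes the forecast by only about one million kw-hours, which is negligible next to the uncertainty of extrapolating a linear trend almost 15 years beyond the data.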