ASSIGNMENT #1: Atmospheric Statistics

Go to the class web site (https://people.ucsc.edu/~wjcrawfo/assignment1) and download the following files to your computer:

exercise1_p1.r
exercise1_p2.r
exercise1_p3.r
exercise1_p4.r
melbourne_max_temps_1859-2013.dat
melbourne_jan_max_temps_1859-2013.dat

Next, start up R and at the prompt (>) enter your name in the following way:

name <- "Jane Doe"

Important: Be sure to do this before you run each new program. This step identifies you on the results of your work. If your name does not appear on the graphical output that you hand in with your completed assignment, you will receive no credit for this assignment. In addition, if you omit this step, the R programs for this exercise will not work correctly and you will get an error message.

PLEASE HAND IN ALL OF THE FIGURES GENERATED BY THE PROGRAMS THAT YOU RUN FOR THIS ASSIGNMENT.

Probability Density Functions

As we discussed in class, measurements of the atmosphere and ocean can be considered to be random variables, and hence are most precisely described by a probability density function. In this exercise we will solidify the introductory statistical concepts that were discussed in class using another example from the atmosphere.

Question 1:

(a) In your own words, briefly explain how you would calculate a histogram for a random variable, such as measurements of temperature in the atmosphere.

(b) In your own words, briefly explain how you would convert the histogram from part (a) into a probability density function for the same random variable.
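As a rough illustration of the two steps in Question 1, the sketch below bins a synthetic temperature sample and then rescales the bin counts so that the total area is one. The data, the 1-degree bin width, and all variable names (temps, edges, pdf_est) are assumptions made for this example; it is not the code in the exercise1 programs and it does not use the Melbourne data.

```r
# Synthetic stand-in for a temperature record (NOT the Melbourne data).
set.seed(1)
temps <- rnorm(5000, mean = 20, sd = 6)

# Histogram: choose bin edges that cover the data, then count how many
# values fall into each bin. A 1-degree bin width is assumed here.
edges <- seq(floor(min(temps)), ceiling(max(temps)), by = 1)
h <- hist(temps, breaks = edges, plot = FALSE)
counts <- h$counts

# PDF estimate: divide each count by (number of values * bin width),
# so that the bar areas add up to 1.
bin_width <- diff(edges)
pdf_est <- counts / (sum(counts) * bin_width)

# Check the normalization: total area under the estimated PDF.
total_area <- sum(pdf_est * bin_width)
print(total_area)
```

The rescaled values match the density component that hist() returns, so calling hist(temps, breaks = edges, freq = FALSE) would plot the same estimate directly.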
Question 2:

In class, we introduced a number of important fundamental concepts from the fields of probability and statistics using a long time series of measurements of daily maximum temperature made at the Sydney Observatory in Australia. In this exercise, we will consider a similar time series from the Melbourne Observatory. The locations of Melbourne and Sydney in relation to one another are shown in Fig. 1.

Figure 1: Map of Australia showing the location of Melbourne and Sydney.
Figure 2: The Melbourne Observatory.

(a) Run the program exercise1_p1.r in R (using source("exercise1_p1.r")), as you were instructed to do in class and in section. This will load the data for the observed daily maximum temperatures recorded at Melbourne Observatory (Fig. 2) for the period 1859-2013 and will generate three figures:

exercise1_plot1.jpg - a plot of the time series of Melbourne daily maximum temperature;
exercise1_plot2.jpg - a histogram of Melbourne daily maximum temperature;
exercise1_plot3.jpg - the probability density function (PDF) of Melbourne daily maximum temperature.

The time series of daily maximum temperatures is now called mmt_data.

(i) First compute the mean daily maximum temperature for the entire 155-year record using the command "mean(mmt_data)" (you don't need to type the quotes) and record the value that you obtain on plot3.

(ii) Next, estimate the mode of the temperature distribution using either plot2 or plot3, and record this number on plot3.

(iii) Briefly explain how the mean is computed, give the definition of the mode, and compare the two numbers obtained here.
(b) Using the command "sd(mmt_data)" (you don't need to type the quotes), compute the standard deviation of Melbourne daily maximum temperatures. Record this number on plot3. Explain what is meant by the standard deviation.

(c) Now run the program exercise1_p2.r. This will generate a new figure called exercise1_plot4.jpg, which is the same as plot3 except that a normal distribution with the same mean and standard deviation as the Melbourne daily maximum temperature is superimposed on the PDF. Briefly describe how well the PDF of the Melbourne daily maximum temperature is described by a normal distribution.

(d) Now run exercise1_p3.r, which yields an additional figure called exercise1_plot5.jpg showing the cumulative probability distribution for Melbourne daily maximum temperature.

(i) In your own words, explain what a cumulative probability distribution shows.

(ii) Using plot5, estimate the median of the distribution and explain clearly how you did this. Record this number on plot3 and compare it to the mean and the mode from part (a).

(e) If the PDF of Melbourne daily maximum temperature were described by a normal distribution, what could you say about the mean, the mode and the median temperatures?

Question 3:

Consider the following scenario. The Westpac Banking Corporation is sponsoring a major summertime outdoor concert event at the Melbourne Cricket Ground (MCG) to celebrate Australia Day (26 January). The capacity of the MCG is over 100,000. At the event, Norgen Vaaz and Everest Foods will be launching a new ice cream product. With a large potential turn-out for the event, Norgen Vaaz stands to make a handsome profit. However, there is risk involved for Norgen Vaaz. On the one hand, they can plan on manufacturing a large number of units of the new product in the hope that they can sell them all at the event.
However, ice cream is a perishable commodity, so if it turns out to be unseasonably cold on the day of the event then ice cream sales are likely to be low, and Norgen Vaaz stands to lose money. On the other hand, if it is a very warm day, Norgen Vaaz can expect good sales, but in that case they need to ensure that they have sufficient cold storage for their new product so that it does not spoil on the day. So for Norgen Vaaz there is risk involved if the temperature is too low or if the temperature is too high.

Norgen Vaaz and Everest Foods plan to purchase insurance against any loss that they might incur due to either unseasonably warm or cold weather conditions. Such insurance is sometimes called a weather derivative. (An infamous company called Enron was in the business of buying and selling weather derivatives.)

Suppose that you are an insurance assessor for AMP, a leading wealth management and insurance company in Australia. One month before the planned concert at the MCG, a representative of Norgen Vaaz and Everest Foods contacts you about purchasing risk
insurance for their part in the event at the MCG. They are particularly interested in the risk involved when temperatures drop below 20 °C or rise above 35 °C. Market research shows that ice cream sales fall dramatically when the temperature is below 20 °C, while above 35 °C the storage capacity of conventional mobile refrigeration units must be reduced by 25% to account for the anticipated increased power load on each unit (hence more units will be needed on site, at considerable additional cost, to prevent the product from spoiling on very warm days).

Using the cumulative probability distribution function in plot5 for Melbourne daily maximum temperature, estimate the probability that the temperature will be at or below 20 °C, and above 35 °C.

Question 4:

(a) After some reflection, you decide that the risk information that you are about to provide to Norgen Vaaz and Everest Foods would be much more reliable if you considered instead the daily maximum temperatures for Melbourne during the month of January only. Briefly explain why this is a better approach than the one you used in Question 3.

(b) Now run the program exercise1_p4.r, which generates two additional figures: exercise1_plot6.jpg, which shows the PDF for Melbourne January daily maximum temperatures, and exercise1_plot7.jpg, which shows the cumulative probability distribution for January daily maximum temperature. Using plot7, reevaluate the risks for Norgen Vaaz based on the probability that the temperature will be at or below 20 °C, and at or above 35 °C in January.

(c) How would you describe the shape of the PDF for Melbourne January daily maximum temperature in plot6?
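The threshold probabilities in Questions 3 and 4 are read by eye from plot5 and plot7, but the same idea can be sketched numerically with an empirical cumulative distribution. The example below is only an illustration under stated assumptions: the data are synthetic (a crude seasonal cycle plus noise, not the Melbourne record), and the names temps, month, F_all and F_jan are hypothetical and do not come from the exercise1 programs.

```r
# Synthetic daily maximum temperatures with a month label.
# These are NOT the Melbourne data; both vectors are made up for illustration.
set.seed(2)
month <- rep(1:12, each = 300)
n <- length(month)
# Crude seasonal cycle, warmest in January (southern hemisphere), plus noise.
temps <- 20 + 6 * cos(2 * pi * (month - 1) / 12) + rnorm(n, sd = 4)

# ecdf() returns a function giving the fraction of values <= x.
F_all <- ecdf(temps)
p_cold_all <- F_all(20)       # probability of 20 degrees or below, all months
p_hot_all  <- 1 - F_all(35)   # probability of above 35 degrees, all months

# Repeat using January values only.
jan <- temps[month == 1]
F_jan <- ecdf(jan)
p_cold_jan <- F_jan(20)
p_hot_jan  <- 1 - F_jan(35)

# Median: the temperature at which the cumulative probability reaches 0.5.
med_all <- quantile(temps, 0.5, names = FALSE)

print(c(all_months = p_cold_all, january = p_cold_jan))
print(c(all_months = p_hot_all, january = p_hot_jan))
print(med_all)
```

With the real data, the same calls could be applied to mmt_data after running exercise1_p1.r, provided that mmt_data is a plain numeric vector, which is an assumption here.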