ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics


ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Department of Mathematics San Jose State University In Partial Fulfillment of the Requirements for the Degree Master of Arts by Sandra DeSousa May 2006

2006 Sandra DeSousa ALL RIGHTS RESERVED

APPROVED FOR THE DEPARTMENT OF MATHEMATICS Dr. Steven Crunk Dr. Leslie Foster Dr. Bee Leng Lee APPROVED FOR THE UNIVERSITY

ABSTRACT ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS by Sandra DeSousa Much research has been done on random polynomials. Topics of past investigation include estimating the number of zeros, finding the roots (and associated distributions), and confidence intervals for the roots of random polynomials. Research regarding the extrema of random polynomials and their associated confidence intervals is lacking. Fairley performed a study on a method for forming the confidence intervals for the roots of random polynomials. This thesis expands upon Fairley's results by forming confidence intervals for the abscissa and ordinate of the extrema of random polynomials. Three methods to calculate the confidence intervals are compared: the Fairley method, the delta method, and bootstrapping. It was determined that all three methods produced accurate confidence intervals that were not statistically significantly different from one another. An application of the theoretical work was implemented using data provided by the NASA Ames Research Center, associated with the possibility of a runaway greenhouse effect.

ACKNOWLEDGEMENTS I dedicate this to Crystal, Kyle, Joelle, and Logan, my wonderful children, for their unconditional love, patience, understanding, and support. I would like to thank my thesis advisor, Dr. Steve Crunk, for all of his knowledge and guidance with this endeavor.

Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Methods
  Fairley Method
  Delta Method
  Bootstrapping
Chapter 3: Implementation
  Simulations
  Results - Confidence Interval Accuracy
  Results - Confidence Interval Lengths
  Results - Overall
Chapter 4: Empirical Application
Chapter 5: Conclusions
References
Appendix: S-Plus Code

List of Tables
Table 3.1. Percentages of times that the true abscissa lies within the calculated confidence limits for a second degree random polynomial
Table 3.2. Percentages of times that the true ordinate lies within the calculated confidence limits for a second degree random polynomial
Table 3.3. Percentages of times that the true abscissa lies within the calculated confidence limits for a third degree random polynomial
Table 3.4. Percentages of times that the true ordinate lies within the calculated confidence limits for a third degree random polynomial
Table 3.5. Percentages of times that the true abscissa lies within the calculated confidence limits for a fourth degree random polynomial
Table 3.6. Percentages of times that the true ordinate lies within the calculated confidence limits for a fourth degree random polynomial
Table 3.7. Average lengths of the abscissa confidence intervals for a second degree random polynomial
Table 3.8. Average lengths of the ordinate confidence intervals for a second degree random polynomial
Table 3.9. Average lengths of the abscissa confidence intervals for a third degree random polynomial
Table 3.10. Average lengths of the ordinate confidence intervals for a third degree random polynomial
Table 3.11. Average lengths of the abscissa confidence intervals for a fourth degree random polynomial

List of Tables (cont'd)
Table 3.12. Average lengths of the ordinate confidence intervals for a fourth degree random polynomial
Table 4.1. Calculated abscissa and ordinate confidence intervals for the maximum of sea surface temperature vs. outgoing flux

List of Figures
Figure 2.1. Illustration of nonparametric bootstrap
Figure 3.1. One example of a second degree random polynomial with normal distribution noise and associated confidence intervals calculated via the delta method
Figure 3.2. Close-up of example of a second degree random polynomial with normal distribution noise and associated confidence intervals
Figure 3.3. One example of a third degree random polynomial with exponential distribution noise and associated confidence intervals calculated via the bootstrapping method
Figure 3.4. One example of a fourth degree random polynomial with t distribution noise and associated confidence intervals calculated via the Fairley method
Figure 3.5. Plot of (x̂, ŷ) for all simulations of the third degree polynomial with normal noise
Figure 4.1. Graph of Sea Surface Temperature vs. Outgoing Flux with estimated random polynomial and associated maximum

Chapter 1 Introduction The topic of regression analysis is widely used in statistics. According to Myers (1990), "The term regression analysis describes a collection of statistical techniques that serve as a basis for drawing inferences about relationships among quantities in a scientific system." Regression analysis allows scientists to make predictions and to perform variable screening, parameter estimation, and model specification. The most common uses of regression analysis are predicting response values for given inputs and measuring the importance of a variable (e.g., its influence on the response). When given a set of input (regressor variable, x) and response (y) data, regression is often used to fit a curve, p(x), to the data. If the form of the data is assumed to be m(x) = y = β0 + β1 x + β2 x^2 + ... + βk x^k (a true but unknown polynomial), then a curve fit to a set of data is referred to as a random algebraic polynomial, such as p(x) = ŷ = β̂0 + β̂1 x + β̂2 x^2 + ... + β̂k x^k. Here, k is the degree of the polynomial, β0, β1, ..., βk are unknown constants known as regression coefficients, and β̂0, β̂1, ..., β̂k are estimates of these coefficients.
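As a minimal sketch of this setup (in Python with numpy rather than the thesis's S-Plus; the function name fit_poly and the noiseless demo data are ours, not from the thesis), fitting a random algebraic polynomial by least squares looks like:

```python
import numpy as np

def fit_poly(x, y, k):
    """Return least-squares estimates (beta0_hat, ..., betak_hat) of a degree-k polynomial."""
    X = np.vander(x, k + 1, increasing=True)   # design matrix with columns 1, x, x^2, ..., x^k
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# Demo on noiseless data from y = 3 + 4x - x^2; the fit recovers the coefficients exactly.
x = np.linspace(0.0, 5.0, 100)
y = 3.0 + 4.0 * x - x**2
beta_hat = fit_poly(x, y, 2)   # beta0, beta1, beta2 approximately 3, 4, -1
```

With noisy responses the same call returns the random coefficients β̂ that the rest of the thesis studies.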

As an example of a typical regression, suppose there is a set of n observations {(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)}, and we are interested in finding a curve, ŷ = β̂0 + β̂1 x + β̂2 x^2 + ... + β̂k x^k, that is the best representation of the set of data. Each point (xi, yi) in the set of data can be represented as yi = β0 + β1 xi + β2 xi^2 + ... + βk xi^k + εi, where we often assume that the errors, εi, are normally distributed with mean zero and standard deviation σ (which is unknown). In matrix form, we have a system of linear equations:

  [y1]   [1  x1  x1^2 ... x1^k] [β0]   [ε1]
  [y2]   [1  x2  x2^2 ... x2^k] [β1]   [ε2]
  [y3] = [1  x3  x3^2 ... x3^k] [β2] + [ε3]      (1.1)
  [..]   [       ...          ] [..]   [..]
  [yn]   [1  xn  xn^2 ... xn^k] [βk]   [εn]

or Y = Xβ + ε. There are several ways to estimate the coefficients, β0, β1, ..., βk, of the polynomial. The most common are the method of least squares, maximum likelihood estimation, and projection. Under the assumption of normal errors, each of these methods leads to the same solution. Using linear algebra to solve the system of linear equations above, we get β̂ = (X'X)^(-1) X'Y, where X' indicates the transpose of the X matrix. So β̂ is a linear combination of the elements of Y, which are normal, as the X matrix is assumed to be fixed. Therefore, as the εi are

assumed normal, the yi are also normal, and it follows that β̂ follows a normal distribution. In the study of random polynomials, of particular interest are the roots (or zeros) of the equation p(x) = ŷ = 0. Much research has been conducted in estimating the number of real roots, along with distributions of the number of real roots. Additional investigation has been conducted in estimating the roots of random polynomials and their associated distributions. Here, we are going to focus on finding confidence intervals for the abscissa (x value) and ordinate (y value, or m(x)) of the extrema of the true, but unknown, polynomials. The abscissa is the root of the derivative of the random polynomial, and the ordinate is the y = m(x) value associated with this abscissa value. The distributions of the estimated extrema (minimum/maximum) of random polynomials are based on the random coefficients generated from the regression fit of the data. In most cases, the random variable coefficients are assumed normally distributed, as described above. To establish a foundation for the rest of this thesis, we need to introduce some notation. Let (x, y) be the true abscissa and ordinate of the minimum or maximum of interest of the true, unknown polynomial, m(x). Similarly, let (x̂, ŷ) be the abscissa and ordinate of the extrema of the estimated random polynomial, p(x) (i.e., x̂ is the

solution to the equation p'(x̂) = 0 and ŷ = p(x̂)). From Wackerly, Mendenhall, and Scheaffer (2002), we can define (X, Y) as random variables associated with the abscissa and ordinate, since they vary depending on the results of the regression. A confidence interval is defined as an interval that contains a value of interest with some specified level of confidence (e.g., 100(1-α)%). For instance, the probability statement for a normally distributed random variable X is P(x - z_{α/2} √Var(X) ≤ X ≤ x + z_{α/2} √Var(X)) = 1 - α, where z_{α/2} is the upper α/2 percentage point of the standard normal distribution. After some algebra, this becomes P(X - z_{α/2} √Var(X) ≤ x ≤ X + z_{α/2} √Var(X)) = 1 - α, where, although it looks like a probability statement for x (since x is in the center), this is still a probability statement with respect to the random variable X. Upon replacing the random variable X with a value x̂ estimated from a set of data, we form a confidence interval, x̂ ± z_{α/2} √Var(X), for the true value x (Myers, 1990). Just as we define a confidence interval for the true abscissa (x value), similar statements can be made for the ordinate (y value) of the extrema. The confidence intervals for the abscissa and ordinate values, together, create a confidence region for the true extrema of the random

polynomial. Assuming the estimated abscissa and ordinate are uncorrelated, which we have found through simulation to be approximately true, the confidence region would be expected to be an oval. However, we simply present confidence regions that look like rectangular boxes, which are independent confidence intervals for each of the abscissa and ordinate of the extrema, not a joint confidence interval for (x, y), the true location of the extrema. The goal here is to study, prototype, and test three different methods of finding confidence intervals for the extrema of these random polynomials. The three methods investigated are the Fairley method (based on Fieller's Theorem), the delta method (based on Taylor series expansions), and bootstrapping (based on repeated sampling with replacement). Each method is described in detail and is shown along with simulations used to test each method. An empirical application using data from the NASA Ames Research Center, associated with the possibility of a runaway greenhouse effect, is also presented. To test each of the methods, data from three different known polynomials (degrees 2, 3, and 4) were simulated. For each known polynomial, three different types of random noise were added to imitate a set of input and random response data. The random noise came from the normal, exponential, and t

distributions. The details of the polynomials used and the noise added will be discussed further in the following chapters. Chapter 2 gives the mathematical details of each of the confidence interval methods used, as well as theorems and proofs regarding these confidence interval methods. In Chapter 3, we discuss the simulation processes and state the numerical results of the simulations. An empirical application using data from the NASA Ames Research Center is presented in Chapter 4. Finally, Chapter 5 draws conclusions based on the results in Chapter 3 and gives recommendations for areas of future research in this field.

Chapter 2 Methods

Fairley Method

Fairley (1968) expands upon a ratio initially presented by Fieller (1954) and implies that, for a random polynomial of any degree, the ratio between the square of the estimated polynomial and the variance of the polynomial evaluated at the root follows an F-distribution with 1 numerator degree of freedom and n-(k+1) denominator degrees of freedom (i.e., (p(x0))^2 / σ^2(p(x0)) ~ F_{1, n-(k+1)}). As mentioned earlier, the variables n and k represent the sample size of the data (number of observations) and the order of the fitted polynomial, respectively. Regarding the difference between Fieller and Fairley with respect to the distributions of the ratios, the rationale for one using the t-distribution while the other uses the F is that the F-distribution with 1 numerator degree of freedom is equivalent to the square of a t-distribution (Casella and Berger, 2002, p. 255). Fairley develops the use of the ratio method initially presented by Fieller for the linear and quadratic cases. In fact, Fairley (1968) gives a

definition of the confidence interval for the root (x value at which m(x) = 0) of a random polynomial as follows: the region on the x-axis where

  (p(xi))^2 / σ^2(p(xi)) < F_{1, n-(k+1)}(α),      (2.1)

(where F_{1, n-(k+1)}(α) denotes the upper α point of the F_{1, n-(k+1)} distribution) defines a confidence region for the root of the polynomial with confidence coefficient 1-α (p. 125). To compute σ^2(p(xi)) in Equation (2.1), we start with σ^2(p(xi)) = V(p(xi)) = V(ŷi) = V(xi β̂), where xi = (1, xi, xi^2, ..., xi^k) and β̂ = (β̂0, β̂1, ..., β̂k). Now, V(xi β̂) = xi V(β̂) xi', and V(β̂) = V((X'X)^(-1) X'Y) as discussed in the Introduction. From here, since ε is a vector of independent and identically distributed random errors with constant variance σ^2, we have V(Y) = V(Xβ + ε) = σ^2 I. Thus,

  V(β̂) = V((X'X)^(-1) X'Y) = (X'X)^(-1) X' V(Y) X (X'X)^(-1)
        = (X'X)^(-1) X' (σ^2 I) X (X'X)^(-1)
        = σ^2 (X'X)^(-1) X'X (X'X)^(-1)
        = σ^2 (X'X)^(-1)

and

  V(β̂) = σ^2 (X'X)^(-1) = [ V(β̂0)        ...  cov(β̂0, β̂k) ]
                           [   ...         ...      ...      ]
                           [ cov(β̂k, β̂0)  ...  V(β̂k)       ]

Consequently,

  V̂(β̂) = σ̂^2 (X'X)^(-1) = [ V̂(β̂0)        ...  cov(β̂0, β̂k) ]      (2.2)
                            [   ...         ...      ...      ]
                            [ cov(β̂k, β̂0)  ...  V̂(β̂k)       ]

where σ̂^2, an estimate of σ^2, is the mean square error (MSE) of the regression. Therefore, σ^2(p(xi)) = σ̂^2 (xi (X'X)^(-1) xi'). These formulas apply to a random polynomial of any degree.

Theorem 2.1: A 100(1-α)% confidence interval for x, the abscissa of the extrema of a random polynomial, is the set of xi such that z_{α/2} < p'(xi)/σ(p'(xi)) < z_{1-α/2}, where z is the appropriate percentile of the standard normal distribution, p'(xi) = β̂1 + 2β̂2 xi + 3β̂3 xi^2 + ... + k β̂k xi^(k-1), σ^2(p'(xi)) = x̃i V̂(β̂)_{2:k+1, 2:k+1} x̃i' with x̃i = (1, 2xi, 3xi^2, ..., k xi^(k-1)), and V̂(β̂)_{2:k+1, 2:k+1} is the 2:k+1, 2:k+1 portion (rows and columns 2 through k+1) of V̂(β̂) as described in Equation (2.2).

Proof: In this case, we are interested in calculating confidence intervals for the abscissa of the extrema, or the root of the derivative, of the

random polynomial. Since the derivative of a random polynomial is another random polynomial, we simply replace p(x) in Equation (2.1) above (from Fairley) with p'(x). Then the region on the x-axis as described above defines a confidence interval for the abscissa of the extrema of the random polynomial. Equation (2.1) is equivalent to t_{α/2, n-(k+1)} < p'(xi)/σ(p'(xi)) < t_{1-α/2, n-(k+1)}, where t_{n-(k+1)} represents the t-distribution with n-(k+1) degrees of freedom (since the F-distribution with 1 numerator degree of freedom corresponds to the square of the t-distribution, as mentioned earlier). Since the t-distribution converges to the normal distribution, this is equivalent to z_{α/2} < p'(xi)/σ(p'(xi)) < z_{1-α/2}, where σ^2(p'(xi)) = x̃i V̂(β̂)_{2:k+1, 2:k+1} x̃i'. Therefore, a confidence interval for x is the set {xi : z_{α/2} < p'(xi)/σ(p'(xi)) < z_{1-α/2}}.

Theorem 2.2: A 100(1-α)% confidence interval for y, the ordinate of the extrema of a random polynomial, is given by ŷ(x̂) ± z_{α/2} σ̂ √(x̂ (X'X)^(-1) x̂'), where z is the appropriate percentile of the standard normal distribution, x̂ = (1, x̂, x̂^2, ..., x̂^k), and the matrix X is as described in and around Equation (1.1).
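The region of Theorem 2.1 can be prototyped by scanning a grid of x values and keeping those where the standardized derivative falls between the normal quantiles. The following is a rough Python sketch (the thesis used S-Plus; the function name, the grid, and the demo coefficient covariance are ours), assuming numpy and scipy are available:

```python
import numpy as np
from scipy import stats

def fairley_interval(x_grid, beta_hat, V_beta_hat, alpha=0.05):
    """Grid-scan version of the Fairley confidence region for the extremum abscissa:
    keep x where |p'(x)| / sigma(p'(x)) < z_{1-alpha/2}, then report the extremes."""
    beta_hat = np.asarray(beta_hat)
    k = len(beta_hat) - 1
    z = stats.norm.ppf(1 - alpha / 2)
    kept = []
    for x in x_grid:
        # derivative basis (1, 2x, 3x^2, ..., k x^(k-1)) applied to (beta1, ..., betak)
        d = np.array([j * x**(j - 1) for j in range(1, k + 1)])
        p_prime = d @ beta_hat[1:]
        sigma = np.sqrt(d @ V_beta_hat[1:, 1:] @ d)
        if abs(p_prime / sigma) < z:
            kept.append(x)
    return min(kept), max(kept)

# Demo: quadratic fit with coefficients (3, 4, -1), so the maximum is at x = 2;
# the coefficient covariance 0.01*I is illustrative, not from the thesis.
lo, hi = fairley_interval(np.linspace(0.0, 5.0, 501),
                          np.array([3.0, 4.0, -1.0]),
                          0.01 * np.eye(3))
```

In practice V_beta_hat would be the estimated covariance σ̂^2 (X'X)^(-1) from Equation (2.2).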

Proof: The confidence interval for the ordinate of the extrema, y, is calculated using ordinary methods from regression analysis. As in Myers (1990), assuming normal errors, 100(1-α)% confidence bounds for E(y | x = x0) are given by ŷ(x0) ± t_{α/2, n-p} s √(x0 (X'X)^(-1) x0') (p. 112).

Delta Method

The delta method is based on using Taylor series expansions to approximate variances and covariances of functions of parameter estimators. Many references exist for the delta method; for example, details can be found in Meeker and Escobar (1998). The following describes the details regarding the use of the delta method to find confidence intervals for the extrema of a random quadratic polynomial. To calculate the true abscissa, x, of the extrema of, for example, a quadratic equation, m(x) = β0 + β1 x + β2 x^2, set the derivative equal to zero (i.e., m'(x) = β1 + 2β2 x = 0). Solving this equation for x gives x = -β1/(2β2), which we define to be g1(β), where β = (β0, β1, β2) are the true values of the parameters. Now that the true abscissa of the extrema has been calculated, the true ordinate (y value) of the extrema can be computed. Recall that y = β0 + β1 x + β2 x^2, and we are interested in calculating y

β1 β1 when x =. Upon substituting 2β 2β 2 2 in place of x in, we get y y 2 2 β 1 β 1 β1 0 1 2 0 g2 2β2 2β2 4β2 = β + β + β = β = : ( ) β. Define g( β ) = ( g1( β), g2( β)). The estiated rando quadratic equation is p( x) = yˆ = ˆ β + ˆ β x+ ˆ β x 0 1 2 2 whose derivative is p ( x) = yˆ = ˆ β ˆ 1+ 2β2x. Siilar to what was done above for the true abscissa, to calculate the estiated abscissa ( x ˆ ) of the extrea, set the derivative of the estiated rando polynoial equal to zero (i.e., ˆ ˆ ˆ β1 p ( x) = yˆ = β1+ 2β2x= 0). Solving for x gives xˆ =. Then, 2 ˆ β 2 ˆ g1( β ): x ˆ β1 = ˆ = 2 ˆ β 2 where ˆ β = ( ˆ β ˆ ˆ 0, β1, β 2) are estiates of the paraeters. Now, copute the estiated ordinate (y value) of the extrea. Recall 2 that, yˆ ˆ ˆ ˆ ˆ = β + β x + β xˆ and calculate when 0 1 2 1 yˆ x 2 ˆ β2 ˆ ˆ β =. Substitute ˆ β1 2 ˆ β 2 in place of x ˆ in yˆ, to get 2 ˆ ˆ ˆ 2 ˆ ˆ β 1 ˆ β 1 ˆ β1 ˆ 0 1 2 0 g2 2 ˆ β ˆ ˆ 2 2β 2 4β2 β yˆ = β + β + β = β = : ( ). Siilarly, define g( ˆ β ) = ( g ˆ ˆ 1( β), g2( β)). - 12 -

From Meeker and Escobar (1998),

  V[g(β̂)] = [ V(g1(β̂))            cov(g1(β̂), g2(β̂)) ] = [ V(X)       cov(X, Y) ]      (2.3)
             [ cov(g2(β̂), g1(β̂))  V(g2(β̂))           ]   [ cov(Y, X)  V(Y)      ]

and is calculated as V[g(β̂)] ≈ (∂g(β)/∂β)' V(β̂) (∂g(β)/∂β), where

  ∂g(β)/∂β = [ ∂g1/∂β0  ∂g2/∂β0 ]   [ 0              1               ]
             [ ∂g1/∂β1  ∂g2/∂β1 ] = [ -1/(2β̂2)      -β̂1/(2β̂2)     ]
             [ ∂g1/∂β2  ∂g2/∂β2 ]   [ β̂1/(2β̂2^2)   β̂1^2/(4β̂2^2)  ]

evaluated at β = β̂. V(β̂) is estimated from the Fisher information matrix, I_β. Since data are available, one can compute the observed information matrix Î_β := -∂^2 ℓ(β)/∂β ∂β', where ℓ(β) is the log likelihood for the specified model. Then an estimate of the variance of β̂ is V̂(β̂) = (Î_β)^(-1). In our case, V̂(β̂) is calculated as in the Fairley method previously discussed. From here, V(X) = V̂(g1(β̂)) can be used to calculate the confidence interval for the abscissa, assuming X to be normally distributed, as x̂ ± z_{α/2} √V(X), and V(Y) = V̂(g2(β̂)) can be used to compute the confidence interval for the ordinate, as ŷ ± z_{α/2} √V(Y).

One should note, when using the delta method, that the [2,1] element of V[g(β̂)] is the covariance between the random variables associated with the abscissa and ordinate of the extrema of the random polynomial. The covariance is a measure of the linear dependence between the abscissa and ordinate: the larger the absolute value of the covariance, the greater the linear dependence. Once the covariance is calculated, the correlation can be computed as ρ = cov(X, Y)/(σX σY). The correlation gives a standardized value for the linear dependence between the estimated abscissa and ordinate values (Wackerly et al., 2002, p. 250). Although the details are omitted here, the same steps were followed for a random cubic polynomial. However, the complexity of the partial derivatives generally makes the use of this procedure for fourth and higher degree polynomials too complicated to calculate and program.

Theorem 2.3: A 100(1-α)% confidence interval for x is x̂ ± z_{α/2} √V(X), where z is the appropriate percentile of the standard normal distribution. Here, V(X) = V̂(g1(β̂)) is the estimated value of the variance of the random variable X.

Proof: For a normal random variable W, the form of a 100(1-α)% confidence interval is ŵ ± z_{α/2} √V(W). Bharucha-Reid and Sambandham (1986) show that the real roots of random polynomials with normally distributed coefficients are also approximately normally distributed. Since X is a real root of a random polynomial, and the coefficients, β̂, are normally distributed, X is a normal random variable. Therefore, a 100(1-α)% confidence interval for x is x̂ ± z_{α/2} √V(X), where V(X) = V̂(g1(β̂)) is calculated as the [1,1] element of V̂[g(β̂)] in Equation (2.3).

Theorem 2.4: A 100(1-α)% confidence interval for y is ŷ ± z_{α/2} √V(Y), where z is the appropriate percentile of the standard normal distribution. Here, V(Y) = V̂(g2(β̂)) is the estimated value of the variance of the random variable Y.

Proof: For a normal random variable W, the form of a 100(1-α)% confidence interval is ŵ ± z_{α/2} √V(W). From ordinary regression theory, Y is a normal random variable (Myers, 1990). Therefore, a 100(1-α)% confidence interval for y is ŷ ± z_{α/2} √V(Y), where V(Y) = V̂(g2(β̂)) is calculated as the [2,2] element of V̂[g(β̂)] in Equation (2.3).
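The quadratic-case delta method above can be sketched compactly (again in Python rather than the thesis's S-Plus; the function name and the illustrative covariance in the demo are ours):

```python
import numpy as np

def delta_method_extremum(beta_hat, V_beta_hat):
    """Delta-method point estimates and 2x2 covariance for (x_hat, y_hat),
    the extremum of a fitted quadratic b0 + b1*x + b2*x^2."""
    b0, b1, b2 = np.asarray(beta_hat)
    x_hat = -b1 / (2 * b2)               # g1(beta_hat)
    y_hat = b0 - b1**2 / (4 * b2)        # g2(beta_hat)
    # Jacobian of g = (g1, g2); rows = (g1, g2), columns = (b0, b1, b2)
    G = np.array([
        [0.0, -1 / (2 * b2),    b1 / (2 * b2**2)],
        [1.0, -b1 / (2 * b2),   b1**2 / (4 * b2**2)],
    ])
    V = G @ V_beta_hat @ G.T             # [[V(X), cov(X,Y)], [cov(Y,X), V(Y)]]
    return (x_hat, y_hat), V

# Demo with coefficients (3, 4, -1): extremum at (2, 7); 0.01*I is an illustrative V(beta_hat).
(x_hat, y_hat), V = delta_method_extremum(np.array([3.0, 4.0, -1.0]), 0.01 * np.eye(3))
```

The confidence intervals then follow as x̂ ± z_{α/2} √V[0,0] and ŷ ± z_{α/2} √V[1,1], with V[1,0] giving the abscissa-ordinate covariance discussed above.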

Bootstrapping

As Meeker and Escobar (1998, p. 205) describe, "The idea of bootstrap sampling is to simulate the repeated sampling process and use the information from the distribution of appropriate statistics in the bootstrap samples to compute the needed confidence interval (or intervals), reducing the reliance on large-sample approximations." The values of interest are the abscissa and ordinate values for the minimum or maximum of a random polynomial. Therefore, we want to take an observed data set {(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)} and draw B samples (with replacement), each of size n. For the i-th random sample, calculate the extrema (x̂i, ŷi) of the random polynomial generated from the regression fit of the sampled data set. To accomplish this, first estimate the coefficients of the random polynomial via the method of least squares. Then estimate the abscissa of the extrema by setting the derivative of the estimated random polynomial, p(x), equal to zero and solving for x (i.e., find x̂i such that p'(x̂i) = 0). Once the abscissa is calculated, the ordinate follows as ŷi = p(x̂i). After the B samples have been generated and their associated extrema calculated and saved, the data are sorted in ascending order (the x̂i and ŷi independently). The abscissa confidence intervals are computed by taking the upper and lower α/2

quantiles of the sorted {x̂1, x̂2, ..., x̂B} bootstrapped estimates. Similarly, the ordinate confidence intervals are obtained by taking the appropriate quantiles of the sorted ŷi bootstrapped estimates.

Figure 2.1. Illustration of nonparametric bootstrap: the observed DATA of n observations, with true extrema (x, y) and estimated extrema (x̂, ŷ), are resampled with replacement into B samples DATA*1, DATA*2, ..., DATA*B, each of size n, with associated estimated extrema (x̂1, ŷ1), (x̂2, ŷ2), ..., (x̂B, ŷB).
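The bootstrap procedure just described can be sketched as follows (a Python approximation of the thesis's S-Plus program; the function name, B, the seed, and the demo data are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_extremum_ci(x, y, k=2, B=500, alpha=0.05):
    """Percentile-bootstrap confidence intervals for the extremum (abscissa, ordinate)
    of a degree-k polynomial fit: resample pairs, refit, locate the maximum."""
    n = len(x)
    xs, ys = [], []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # resample (x, y) pairs with replacement
        c = np.polyfit(x[idx], y[idx], k)         # coefficients, highest degree first
        roots = np.roots(np.polyder(c))           # roots of the derivative
        r = roots[np.isreal(roots)].real
        x_hat = r[np.argmax(np.polyval(c, r))]    # keep the root giving the maximum
        xs.append(x_hat)
        ys.append(np.polyval(c, x_hat))
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    return np.percentile(xs, [lo, hi]), np.percentile(ys, [lo, hi])

# Demo on the thesis's quadratic example y = -x^2 + 4x + 3 (maximum at (2, 7)) plus N(0,1) noise.
x = np.linspace(0.0, 5.0, 100)
y = -x**2 + 4 * x + 3 + rng.normal(0.0, 1.0, 100)
ci_x, ci_y = bootstrap_extremum_ci(x, y)
```

With these settings the percentile intervals typically land close around the true maximum (2, 7).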

Chapter 3 Implementation

Simulations

For simplicity in comparing the three methods, polynomials were chosen which had a single unique maximum, and the confidence intervals for that one maximum were calculated. The same can be done for the minima of random polynomials or in situations with multiple extrema (assuming it is known in what region the extrema of interest is located). Programs were written in S-Plus Version 7.0 to implement each algorithm (the Fairley method, the delta method, and bootstrapping). Additionally, simulation and testing functions were written. For each method, each degree polynomial, and each type of random noise (normal, exponential, and t-distribution), the following steps were carried out:
- Start with a known polynomial of specified degree, m(x) = β0 + β1 x + β2 x^2 + ... + βk x^k, with a known real-valued maximum
- For a domain of interest (chosen to surround the maximum), specify a set of n = 100 x values {x1, x2, x3, ..., xn} at which the polynomial is evaluated
- Evaluate the polynomial at each of the specified x values
- Repeat these steps for N simulations:
  o Add random noise (normal, exponential, or t-distribution) to simulate response data and get {y1, y2, y3, ..., yn}

  o Using the simulated data {(x1, y1), (x2, y2), ..., (xn, yn)}, fit an algebraic polynomial of specified degree (i.e., we are assuming the degree of the polynomial is known), p(x) = β̂0 + β̂1 x + β̂2 x^2 + ... + β̂k x^k, using the least squares estimation regression function from S-Plus
  o Obtain regression model parameters from S-Plus, such as the standard deviation of the residuals (σ̂) and the covariance matrix for the parameter estimates, V̂(β̂)
  o Calculate the derivative of the random polynomial, p'(x) = β̂1 + 2β̂2 x + ... + k β̂k x^(k-1)
  o Estimate the root x̂ of the derivative of the random polynomial using numerical techniques from S-Plus
  o Calculate the associated y value of the maximum, ŷ = p(x̂) = β̂0 + β̂1 x̂ + β̂2 x̂^2 + ... + β̂k x̂^k
  o Calculate the upper and lower 95% confidence limits for (x, y) using the specified method
- Determine the accuracy of the method by counting the number of times (out of N) that the true maximum, (x, y), lies within the calculated limits

The second degree polynomial used for simulation was m(x) = y = -x^2 + 4x + 3, with m'(x) = y' = -2x + 4. Thus, the only root of the derivative is at x = 2. This polynomial has a maximum at (2, 7). The domain used for simulation was [0, 5]. As noted earlier, for each degree polynomial, three different types of random noise were added. For the

normal distribution noise, a mean of zero and a standard deviation of 1 was used. The standard deviation of the noise for each known polynomial was chosen to be between 10 and 20 percent of the range of the y values, which is large enough to give good variation, but small enough to correctly estimate the random polynomial. For the exponential distribution, a scale of 1 was used to determine the effect of a skewed distribution with a standard deviation of 1. The scale value of 1 was then subtracted to maintain a zero mean (i.e., if exp_i(1) is the i-th exponential random value with a scale value of 1, the i-th noise term was noise_i = exp_i(1) - 1, so that E(noise_i) = 0 and Var(noise_i) = 1), as in the normal distribution. For the t-distribution, we used ν = 3 degrees of freedom in order to examine the effect of a long-tailed distribution. However, in order to maintain consistency, this random quantity was divided by the square root of 3 (i.e., if t_{3,i} is a random value from the t-distribution with 3 degrees of freedom, then the i-th added noise term was noise_i = t_{3,i} / √3, so that E(noise_i) = 0 and Var(noise_i) = 1, as it was for the other distributions). Figure 3.1, below, shows an example of a simulation for a second degree polynomial with noise from the normal distribution. The small solid circles are the n observations, the solid line is the known polynomial, m(x), the dotted line is the estimated random polynomial,

p(x), and the box represents the confidence intervals for the abscissa and ordinate as computed using the delta method.

[Figure 3.1 plot: Y vs. X, showing the true and estimated polynomials over [0, 5]]

Figure 3.1. One example of a second degree random polynomial with normally distributed noise and associated confidence intervals calculated via the delta method

Figure 3.2 is a close-up of Figure 3.1 and shows the true and estimated polynomials, the true maximum, (x, y), the estimated maximum, (x̂, ŷ), as well as the confidence region obtained using the delta method.
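The per-simulation steps described above can be sketched in code. The thesis implemented them in S-Plus; the following is a minimal Python/NumPy re-creation of one simulation run for the second degree polynomial (all names and the random seed are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# True second degree polynomial m(x) = -x^2 + 4x + 3, maximum at (2, 7).
true_coefs = np.array([3.0, 4.0, -1.0])        # beta0, beta1, beta2 (low to high)

n = 100
x = np.linspace(0.0, 5.0, n)                   # domain of interest [0, 5]
m_x = np.polynomial.polynomial.polyval(x, true_coefs)

# One simulated response vector: the true curve plus standard normal noise.
y = m_x + rng.standard_normal(n)

# Least squares fit of a degree-2 polynomial (the degree is assumed known).
beta_hat = np.polynomial.polynomial.polyfit(x, y, deg=2)

# The derivative p'(x) = beta1_hat + 2*beta2_hat*x has a single root,
# which estimates the abscissa of the maximum; plugging it back into
# the fitted polynomial gives the ordinate estimate.
x_hat = -beta_hat[1] / (2.0 * beta_hat[2])
y_hat = float(np.polynomial.polynomial.polyval(x_hat, beta_hat))

print(x_hat, y_hat)   # both should land near the true maximum (2, 7)
```

Repeating this for N simulations and counting how often the true maximum falls inside the computed limits gives the accuracy figures reported later in this chapter.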

[Figure 3.2 plot: Y vs. X close-up, marking the true and estimated maxima]

Figure 3.2. Close-up of an example of a second degree random polynomial with normally distributed noise and associated confidence intervals

For the third degree polynomial we used m(x) = y = x^3 - 7.5x^2 + 12x, with m'(x) = y' = (3x - 3)(x - 4). The roots of the derivative are at x = 1, 4. This polynomial has a maximum at (1, 5.5). The domain used for simulation was [-0.5, 5.5]. Again, three types of random noise were added. For the normal distribution noise, a mean of zero and a standard deviation of 2.5 was used. For the exponential distribution, a scale of 2.5 was used, then subtracted to maintain a zero mean. The t-distribution used 3 degrees of

freedom and was then divided by √3 and multiplied by 2.5, so that the standard deviation was 2.5 as it was for the other two distributions. See Figure 3.3, below, for an example of the third degree polynomial simulation.

[Figure 3.3 plot: Y vs. X, showing the true and estimated polynomials]

Figure 3.3. One example of a third degree random polynomial with exponentially distributed noise and associated confidence intervals calculated via the bootstrapping method

The fourth degree polynomial that was used was m(x) = y = -(1/60)x^4 + (1/9)x^3 - (16/15)x^2 + (78/15)x, with

m'(x) = y' = -(1/15)(x - 3)(x - (1 - 5i))(x - (1 + 5i)). The roots of the derivative are at x = 3, 1 + 5i, 1 - 5i. This polynomial has a maximum at (3, 7.65). The domain used for simulation was [0, 5]. Again, three types of random noise were added. For the normal distribution noise, a mean of zero and a standard deviation of 1.5 was used. For the exponential distribution, a scale of 1.5 was used, then subtracted off to maintain a zero mean. The t-distribution used 3 degrees of freedom and was then divided by √3 and multiplied by 1.5, so that the standard deviation was 1.5 as it was for the other two distributions. See Figure 3.4, below, for an example of a fourth degree polynomial simulation.
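The three noise generators described above (centered normal, centered exponential, and standardized t with 3 degrees of freedom, each scaled to a target standard deviation) can be sketched as follows. This is an illustrative Python version, not the thesis's S-Plus code:

```python
import numpy as np

rng = np.random.default_rng(1)

def noise(kind, sd, size, rng):
    """Zero-mean noise with standard deviation sd, drawn from one of the
    three distributions used in the simulations."""
    if kind == "normal":
        return sd * rng.standard_normal(size)
    if kind == "exponential":
        # exp(scale=sd) has mean sd and standard deviation sd; subtracting
        # the scale centres it at zero, as described in the text.
        return rng.exponential(scale=sd, size=size) - sd
    if kind == "t":
        # t with 3 df has variance 3/(3 - 2) = 3; dividing by sqrt(3)
        # standardises it before scaling by sd.
        return sd * rng.standard_t(df=3, size=size) / np.sqrt(3.0)
    raise ValueError(f"unknown noise kind: {kind}")

exp_draws = noise("exponential", 1.5, 200_000, rng)
print(exp_draws.mean(), exp_draws.std())   # both should be close to 0 and 1.5
```

The same function covers all three polynomial setups by passing sd = 1, 2.5, or 1.5.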

[Figure 3.4 plot: Y vs. X, showing the true and estimated polynomials]

Figure 3.4. One example of a fourth degree random polynomial with t-distribution noise and associated confidence intervals calculated via the Fairley method

As stated in the introduction, the abscissa and ordinate of the extremum of the random polynomial are assumed uncorrelated. Figure 3.5, below, shows an example of one set of simulations where the estimated abscissa, x̂, is plotted against the estimated ordinate, ŷ, for all 2500 simulations of the third order polynomial with normal distribution noise. The sample correlation calculated between the estimated abscissa and ordinate for this case was ρ̂ = 0.065. To test for absence of correlation,

which is the null hypothesis, we used t = ρ̂√(n - 2) / √(1 - ρ̂²) = 0.646 (Wackerly, Mendenhall, and Scheaffer, 2002). This corresponds to a P-value of 0.74, which implies that we retain the null hypothesis of uncorrelated abscissa and ordinate values.

[Figure 3.5 plot: scatter of ŷ vs. x̂]

Figure 3.5. Plot of (x̂, ŷ) for all simulations of the third degree polynomial with normally distributed noise
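The test statistic used above is straightforward to compute. A minimal sketch (the inputs in the example call are illustrative, not the thesis's reported values):

```python
import math

def corr_t_stat(rho_hat, n):
    """t statistic (with n - 2 degrees of freedom) for testing
    H0: rho = 0, given a sample correlation rho_hat from n pairs."""
    return rho_hat * math.sqrt(n - 2) / math.sqrt(1.0 - rho_hat ** 2)

# Illustrative call: rho_hat = 0.1 from n = 102 pairs.
print(round(corr_t_stat(0.1, 102), 3))   # → 1.005
```

Small values of the statistic, as in the thesis's case, lead to retaining the null hypothesis of no correlation.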

Results - Confidence Interval Accuracy

The following tables give the accuracy of each method (Fairley, delta, and bootstrap) for each of the three choices of noise (normal, exponential, and t-distribution). Each table specifies the results for the abscissa or ordinate and is based on the degree of the original simulated polynomial. We would like to see the percentage of times that the true value lies within the calculated confidence interval to be approximately 100(1 - α) (i.e., 100 × (# times true value lies within CI) / N ≈ 100(1 - α), where N is the number of simulations). In our case, N = 2500, and 95% (α = .05) confidence intervals were calculated. Therefore, if the percentage is close to 95%, which is the nominal level of the confidence interval, then the method is deemed successful. The numbers in bold indicate which method was the closest to the nominal level of confidence for each set of simulations run. If the values for two different methods (for a particular degree polynomial and noise) are less than about 1.2% apart, there is no statistically significant difference between the accuracy of the methods. The number 1.2% is approximately two standard errors of the difference of the proportions. With very few exceptions, the three methods do not have statistically significant differences in the accuracy results.
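The 1.2% threshold quoted above can be checked directly: with two independent coverage proportions, each estimated from N = 2500 simulations at nominal p = 0.95, twice the standard error of their difference is about 1.2 percentage points.

```python
import math

# Standard error of the difference of two independent proportions,
# each estimated from N simulations at nominal coverage p.
p, N = 0.95, 2500
se_diff = math.sqrt(2.0 * p * (1.0 - p) / N)
print(round(100 * 2 * se_diff, 2))   # → 1.23, i.e. roughly 1.2%
```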

Table 3.1. Percentages of time that the true abscissa lies within the calculated confidence limits for a second degree random polynomial

                    Normal    Exponential    t
Fairley Method      94.20     94.28          93.96
Delta Method        94.88     94.92          94.60
Bootstrapping       94.20     93.96          94.12

Table 3.2. Percentages of time that the true ordinate lies within the calculated confidence limits for a second degree random polynomial

                    Normal    Exponential    t
Fairley Method      94.20     94.80          94.80
Delta Method        93.80     94.88          94.96
Bootstrapping       94.72     93.76          93.20

For the second degree polynomial abscissa, it seems that the delta method gives the best results (closest to the nominal level) for all choices of noise. However, for the second degree polynomial ordinate, bootstrapping was the best for the normal distribution noise; the delta method was just barely better than the Fairley method for both the exponential and t-distribution noise.
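For the second degree case, the delta-method interval for the abscissa is simple enough to sketch: x̂ = -β̂1/(2β̂2), with first-order variance g'V̂g, where g is the gradient of x̂ with respect to the coefficient estimates. The numbers below are illustrative (not from the thesis), chosen near the true quadratic -x² + 4x + 3:

```python
import numpy as np

def delta_var_abscissa(beta, V):
    """First-order (delta method) variance of x_hat = -b1/(2*b2) for a
    fitted quadratic, given estimates beta = (b0, b1, b2) and their
    3x3 covariance matrix V."""
    b1, b2 = beta[1], beta[2]
    # Gradient of x_hat with respect to (b0, b1, b2).
    g = np.array([0.0, -1.0 / (2.0 * b2), b1 / (2.0 * b2 ** 2)])
    return float(g @ V @ g)

# Illustrative estimates and a small diagonal covariance matrix.
beta_hat = np.array([3.0, 4.0, -1.0])
V_hat = np.diag([0.04, 0.01, 0.0025])

var_x = delta_var_abscissa(beta_hat, V_hat)
x_hat = -beta_hat[1] / (2.0 * beta_hat[2])
half = 1.96 * np.sqrt(var_x)
print(x_hat, x_hat - half, x_hat + half)   # x_hat = 2.0, interval about (1.781, 2.219)
```

A full implementation would use the covariance matrix V̂(β̂) reported by the regression, including its off-diagonal terms, which the formula above already accommodates.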

Table 3.3. Percentages of time that the true abscissa lies within the calculated confidence limits for a third degree random polynomial

                    Normal    Exponential    t
Fairley Method      94.80     94.64          94.60
Delta Method        94.48     94.20          94.08
Bootstrapping       94.00     92.76          93.24

Table 3.4. Percentages of time that the true ordinate lies within the calculated confidence limits for a third degree random polynomial

                    Normal    Exponential    t
Fairley Method      95.16     93.68          95.20
Delta Method        94.92     94.72          94.84
Bootstrapping       94.44     93.00          93.12

For the third order polynomial, the Fairley method was the closest to nominal for the abscissa and ordinate for almost all choices of noise. The only exception was for the ordinate when exponential noise was added, where the delta method provided closer-to-nominal results. Bootstrapping was the farthest from the nominal confidence levels for all third degree simulations run.

Table 3.5. Percentages of time that the true abscissa lies within the calculated confidence limits for a fourth degree random polynomial

                    Normal    Exponential    t
Fairley Method      94.92     94.72          95.16
Bootstrapping       94.20     94.32          94.72

Table 3.6. Percentages of time that the true ordinate lies within the calculated confidence limits for a fourth degree random polynomial

                    Normal    Exponential    t
Fairley Method      94.12     94.40          95.08
Bootstrapping       93.48     95.28          93.52

For the fourth degree polynomial, only the Fairley method and bootstrapping were used to calculate the confidence intervals. As mentioned earlier, the complexity of the partial derivatives required for the delta method made it impractical to use. Again, the results are close, but mixed. The Fairley method percentages were closest to the nominal level for both the abscissa and ordinate when normal and t-distributed noise were used. However, when the noise came from the exponential distribution, the Fairley method was closer to nominal for the abscissa whereas bootstrapping was closer to nominal for the ordinate value.
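The bootstrapping entries in these tables come from a nonparametric percentile bootstrap: resample the (x, y) pairs with replacement, refit, recompute the maximum, and take quantiles of the B replicated maxima. A minimal Python sketch for the second degree polynomial (illustrative, not the thesis's S-Plus code):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data from the true quadratic m(x) = -x^2 + 4x + 3 (maximum (2, 7)).
x = np.linspace(0.0, 5.0, 100)
y = -x**2 + 4.0 * x + 3.0 + rng.standard_normal(x.size)

def max_of_fit(xs, ys):
    """Fit a quadratic by least squares and return its estimated maximum."""
    b = np.polynomial.polynomial.polyfit(xs, ys, deg=2)
    x_hat = -b[1] / (2.0 * b[2])
    return x_hat, float(np.polynomial.polynomial.polyval(x_hat, b))

B = 2000                                       # bootstrap iterations, as in the thesis
maxima = np.empty((B, 2))
for b in range(B):
    i = rng.integers(0, x.size, size=x.size)   # resample (x, y) pairs with replacement
    maxima[b] = max_of_fit(x[i], y[i])

lo_x, hi_x = np.quantile(maxima[:, 0], [0.025, 0.975])   # 95% percentile interval, abscissa
lo_y, hi_y = np.quantile(maxima[:, 1], [0.025, 0.975])   # 95% percentile interval, ordinate
print((lo_x, hi_x), (lo_y, hi_y))
```

The inner loop of B refits per simulation is what makes bootstrapping the most computationally expensive of the three methods.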

Results - Confidence Interval Lengths

The following tables give the average length of the confidence intervals calculated for each method (Fairley, delta, and bootstrap) for each of the three variations of noise (normal, exponential, and t-distribution). Each table specifies the results for the abscissa or ordinate and is based on the degree of the original polynomial simulated. We would like the length of the confidence interval to be as small as possible, assuming the method is sufficiently accurate. Therefore, if the method is accurate and the width is small, then the method is optimal. The numbers in bold indicate the smallest average confidence interval length for each set of simulations run.

Table 3.7. Average lengths of the abscissa confidence intervals for a second degree random polynomial

                    Normal    Exponential    t
Fairley Method      .170      .169           .165
Delta Method        .147      .183           .148
Bootstrapping       .169      .170           .162

Table 3.8. Average lengths of the ordinate confidence intervals for a second degree random polynomial

                    Normal    Exponential    t
Fairley Method      .565      .563           .546
Delta Method        .529      .644           .499
Bootstrapping       .562      .555           .540

Looking at Tables 3.7 and 3.8, there is no clear method that

provides the smallest confidence intervals. The delta method provided the smallest average confidence interval length when normal and t-distribution noise were used. When exponentially distributed noise was used for the simulations, the Fairley method offered the smallest average confidence interval length for the abscissa, whereas bootstrapping gave the smallest average confidence interval length for the ordinate value.

Table 3.9. Average lengths of the abscissa confidence intervals for a third degree random polynomial

                    Normal    Exponential    t
Fairley Method      .139      .139           .135
Delta Method        .135      .138           .133
Bootstrapping       .139      .137           .134

Table 3.10. Average lengths of the ordinate confidence intervals for a third degree random polynomial

                    Normal    Exponential    t
Fairley Method      1.745     1.732          1.679
Delta Method        1.600     1.732          1.675
Bootstrapping       1.731     1.699          1.650

For the third degree polynomial, the average confidence interval length results were similar to those for the second degree. Once again, the delta method had the smallest average confidence interval length for the normal distribution noise. However, bootstrapping provided the smallest average confidence interval length for both the abscissa and ordinate when exponentially distributed noise was used. The results were mixed for the t-distribution noise.

Table 3.11. Average lengths of the abscissa confidence intervals for a fourth degree random polynomial

                    Normal    Exponential    t
Fairley Method      .656      .653           .510
Bootstrapping       .653      .642           .612

Table 3.12. Average lengths of the ordinate confidence intervals for a fourth degree random polynomial

                    Normal    Exponential    t
Fairley Method      1.037     1.028          .792
Bootstrapping       1.027     1.060          .979

Just as with the second and third degree polynomials, there is no method that clearly gives the smallest confidence interval lengths for fourth degree polynomials. Bootstrapping has the smallest average confidence interval length (in both abscissa and ordinate) for the normally distributed noise. For the t-distributed noise, the Fairley method provided the smallest average confidence intervals. However, for the exponential distribution, the results are mixed.

Results - Overall

Looking at the accuracy data (percentages of times the true abscissa or ordinate lies within the calculated confidence intervals) combined with the lengths of the confidence intervals, we can determine whether there is an optimal method for computing these confidence intervals.

For the second degree polynomials simulated, the delta method had abscissa accuracy results that were the closest to the nominal level and the smallest average confidence interval lengths for the normal and t-distribution noise. Although the delta method had the smallest average confidence interval length for the ordinate with normally distributed noise, the bootstrapping method had the best accuracy. For the second degree exponentially distributed noise simulations, the delta method had the best accuracy, but not the smallest average confidence interval lengths, for either the abscissa or ordinate.

The third degree polynomial simulations did not indicate an optimal method. The Fairley method was the closest to nominal for all simulations except the exponential-noise ordinate, where the delta method was closer to nominal. However, the delta method and bootstrapping had the smallest average confidence interval lengths for both the abscissa and ordinate with all choices of noise.

The results for the fourth degree polynomial simulations, again, do not indicate a single optimal method. For both the abscissa and ordinate, the Fairley method provided the most accurate confidence intervals for normally distributed noise, but bootstrapping had the smallest average confidence interval lengths. For the cases where t-distribution noise was added, the Fairley method had the closest-to-nominal percentages and the smallest average confidence interval

lengths for both the abscissa and ordinate values.

When using the Fairley method for fourth degree polynomial simulations with t-distributed noise, caution must be taken. There are cases using a long-tailed distribution where the Fairley method creates a confidence interval of indefinite length. If the estimated variance of p'(x_i), σ̂²(p'(x_i)), increases as x_i increases, then the ratio p'(x_i) / σ̂(p'(x_i)) may remain in the interval (z_{α/2}, z_{1-α/2}) indefinitely. Similarly, if σ̂(p'(x_i)) increases as x_i decreases, this can also cause the ratio to remain in the interval as described above. Either of these cases can cause the Fairley method to create an infinite-length confidence interval. Over the 2500 simulations of the fourth order polynomial with t-distribution noise, there were four instances of infinite confidence intervals using the Fairley method. Each of these four instances was caused by an outlier whose distance from the estimated curve, in terms of number of standard deviations, was 36.3, 45.0, 67.2, or 141.1. When the outliers were removed, the confidence interval calculations were successful and appropriate. One should note that such outliers are reasonable values to occur in n·N simulated draws from a t-distribution with 3 degrees of freedom.
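This failure mode can be guarded against in code: evaluate the standardized ratio on the search grid and flag the interval whenever the acceptance set runs off either end of the grid. The following is a sketch of that check (not the thesis's implementation; the grid, ratios, and z value are illustrative):

```python
import numpy as np

def fairley_abscissa_interval(grid, ratio, z=1.96):
    """Fairley-style confidence set for the root of p'(x), given a search
    grid and the standardised ratio r(x) = p'(x) / sigma_hat(p'(x))
    evaluated on it.  Flags the interval as (potentially) unbounded when
    the acceptance set reaches either end of the grid."""
    inside = np.abs(ratio) < z
    if not inside.any():
        return None, False
    xs = grid[inside]
    unbounded = bool(inside[0] or inside[-1])   # set touches a grid endpoint
    return (float(xs.min()), float(xs.max())), unbounded

grid = np.linspace(0.0, 5.0, 501)

# Well-behaved case: the ratio crosses zero steeply near x = 2, so the
# acceptance set is a short interval interior to the grid.
print(fairley_abscissa_interval(grid, 10.0 * (grid - 2.0)))

# Pathological case: the ratio stays inside (-z, z) over the whole grid,
# so the set runs off both ends -- an "infinite" interval.
print(fairley_abscissa_interval(grid, np.ones_like(grid)))
```

In practice such a flag would prompt inspection for the kind of extreme outlier described above before reporting an interval.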

Chapter 4

Empirical Application

Data from NASA Ames Research Center was gathered over a region of the Pacific Ocean to investigate the possibility of a runaway greenhouse effect. A runaway greenhouse effect occurs when the amount of solar radiation absorbed exceeds the amount reflected or released (i.e., when planetary heat loss begins to decrease as surface temperature rises). The data included measurements of sea surface temperature in degrees Kelvin and clear-sky upward long-wave flux (also known as outgoing flux) measured in watts per meter squared at numerous latitudes and longitudes over the Pacific Ocean. Outgoing flux is a measure of the rate of flow of heat back up out of the atmosphere. Of particular interest was to model the relationship between the sea surface temperature and outgoing flux, determine its maximum, and calculate confidence intervals for this maximum. The data used for this study included weekly observations over a one-year time span from March 1, 2000 through February 28, 2001 (53 weeks). For each week, measurements were observed at ten different latitudes and thirty-six different longitudes. This gives 19,080 observations total.

Several regressions were run to determine which order polynomial best represented the data. It was determined that an eighth degree polynomial best fit the data; the degree was chosen by attempting regressions of various degrees and selecting the one with the smallest mean square error (MSE). Once the random polynomial was estimated, the estimated maximum was computed. The estimated maximum for this set of data was (299.975, 291.6639). This implies that there is evidence of a runaway greenhouse effect occurring in specific regions of the Pacific Ocean, when planetary heat loss begins to decrease as surface temperature continues to rise (i.e., as outgoing flux reaches a maximum and then begins to decrease while sea surface temperature continues to increase). After the maximum was estimated, confidence intervals were calculated using the Fairley method and bootstrapping. Figure 4.1 shows the data, estimated polynomial, and estimated maximum.
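The degree selection by smallest MSE can be sketched as follows. Here MSE is taken to be the residual mean square, SSE / (n - (k + 1)), which accounts for the parameters used; the data below are synthetic and the code is an illustrative Python version, not the thesis's procedure verbatim:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data whose true relationship is quadratic.
x = np.linspace(0.0, 5.0, 200)
y = -x**2 + 4.0 * x + 3.0 + 0.5 * rng.standard_normal(x.size)

def residual_mse(xs, ys, deg):
    """Residual mean square SSE / (n - (deg + 1)) for a degree-`deg` fit."""
    b = np.polynomial.polynomial.polyfit(xs, ys, deg=deg)
    resid = ys - np.polynomial.polynomial.polyval(xs, b)
    return float(resid @ resid) / (xs.size - (deg + 1))

mses = {d: residual_mse(x, y, d) for d in range(1, 9)}
best = min(mses, key=mses.get)
print(best, round(mses[2], 3))   # underfit degree 1 is ruled out; degree-2 MSE near 0.25
```

Because raw SSE always decreases with degree, dividing by the residual degrees of freedom (or using a criterion such as adjusted R², AIC, or BIC) is what keeps the comparison across degrees meaningful.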

[Figure 4.1 plot: clear-sky upward long-wave flux (watts/meter squared) vs. sea surface temperature (Kelvin)]

Figure 4.1. Graph of Sea Surface Temperature vs. Outgoing Flux with estimated random polynomial and associated maximum

The confidence intervals calculated via the Fairley method and bootstrapping are given in Table 4.1. It is interesting to note that the Fairley method has a slightly wider confidence interval for the abscissa than the bootstrapping method. Conversely, the bootstrapping method confidence interval for the ordinate is slightly wider than the corresponding confidence interval as calculated by the Fairley method. It should be noted that both methods' confidence intervals contain the estimated maximum, as would be expected.

Table 4.1. Calculated abscissa and ordinate confidence intervals for the maximum of sea surface temperature vs. outgoing flux

            Abscissa Confidence Interval     Ordinate Confidence Interval
Fairley     (299.9093, 300.0423)             (291.4298, 291.8980)
Bootstrap   (299.9144, 300.0348)             (291.3218, 291.9251)

Chapter 5

Conclusions

The topic of estimating the abscissa and ordinate of an extremum of a random polynomial, and their associated confidence intervals, was studied. Three methods of interest for calculating the confidence intervals were tested: the Fairley method, the delta method, and bootstrapping.

The Fairley method is based on a ratio discussed by Fieller. Fairley (1968) indicated that the region on the x axis where the ratio of the estimated polynomial and the variance of the polynomial is less than an F-distribution quantile with n - (k + 1) degrees of freedom defines a confidence region for the root of the random polynomial. This method was modified to find the confidence interval for the root of the derivative of a random polynomial, therefore giving a confidence interval for the abscissa of the extremum of the random polynomial. A standard confidence interval technique from regression analysis was used to compute the confidence bounds for the ordinate of the extremum of the random polynomial.

The delta method, using Taylor series approximations, was used for second and third degree polynomials only; the partial derivatives required for fourth and higher degree polynomials are too complicated. This method proved

to be the fastest computationally of the three methods investigated. For example, one hundred simulations for the delta method took only a matter of seconds, whereas the Fairley method may take an hour and bootstrapping could take two hours. Both the Fairley method and bootstrapping have internal loops that require a significant amount of processing time, but the delta method involves only simple calculations at each simulation once the initial partial derivatives are computed. Nonparametric bootstrapping, based on repeated sampling with replacement, although by far the easiest to implement (needing only to compute the maxima and take quantiles), was the most time-consuming (computer-intensive) process to run, because each single simulation required B = 2000 bootstrap iterations. The literature recommends using between 2000 and 10,000 bootstrap iterations to achieve reasonably accurate results (Meeker and Escobar, 1998, p. 206).

To fully investigate each of the methods, known polynomials of degree 2, 3, and 4 were used as bases for simulations using random noise from the normal, exponential, and t distributions. It should be noted that in all of the regressions run for the simulations, we assumed that the degree of the polynomial is known. However, this is rarely the case. One should study information about model selection to determine the correct degree polynomial to be fit for a given set of

input and response data. See Myers (1990) for more information on model selection.

In general, all three methods seemed to be roughly equally accurate and provided approximately the nominal level of confidence desired. Of the three methods investigated, the Fairley method is the best overall choice for computing confidence intervals for the abscissa and ordinate of the extrema of random polynomials, in the absence of outliers creating infinite confidence intervals. In any case, such outliers would likely be detected and removed prior to regression analysis, thus obviating this difficulty. This method is the most versatile (applicable to polynomials of all degrees), its accuracy provides approximately the nominal level of confidence, its average confidence interval lengths were similar to those of the other two methods, and its computational times, although longer than the delta method's, were far better than the bootstrapping method's.

Using the data from NASA Ames Research Center proved to be a powerful application of finding and calculating confidence intervals for the extrema of random polynomials. Since an eighth degree polynomial was fit to the data, only the Fairley method and bootstrapping were used to compute the confidence regions. Both methods gave similar results.

Some areas for additional research could include finding other methods for computing the confidence regions for the extrema of random

polynomials and investigating joint confidence regions (instead of independent abscissa and ordinate confidence bounds). The joint confidence regions would give oval-shaped areas (instead of rectangular boxes), and would give a slightly better indication of where the true extremum would lie if the estimated ordinate and abscissa are correlated. The simulations for this thesis were limited to normal, exponential, and t-distribution noise, with relatively small variances. Another area of future research could be to continue the investigation with additional non-normal errors. Similarly, higher order polynomials could be simulated and the results compared. Also, different sample sizes could be used to test the methods. Additionally, these methods were only applied to univariate polynomials (i.e., p(x)); expanding the use of these methods to multivariate polynomials (i.e., p(x1, x2, ..., xs), a random polynomial in s variables) could be useful. Of particular interest would be working out the details and the programming of the delta method for higher order polynomials, since the method is so computationally efficient. With the mathematical software packages available, the partial derivatives are computable. Thus, reducing the complexity of the derivative and associated calculations for the delta method would be a worthwhile endeavor. Such software could be written and made publicly available.