ROBUST AND EFFICIENT ESTIMATION OF THE MODE OF CONTINUOUS DATA: THE MODE AS A VIABLE MEASURE OF CENTRAL TENDENCY


David R. Bickel
Medical College of Georgia
Office of Biostatistics and Bioinformatics
Fifteenth St., AE-337
Augusta, GA
URL:
E-mail: dbickel@mail.mcg.edu, bickel@mailaps.org

Key words: Robust estimation; robust mode; mode estimator; average value; measure of location; asymmetry; transformation; efficiency.

ABSTRACT. Although a natural measure of the central tendency of a sample of continuous data is its mode (the most probable value), the mean and median are the most popular measures of location due to their simplicity and ease of estimation. The median is often used instead of the mean for asymmetric data because it is closer to the mode and is insensitive to extreme values in the sample. However, the mode itself can be reliably estimated by first transforming the data into approximately normal data by raising the values to a real power, and then estimating the mean and standard deviation of the transformed data. With this method, two estimators of the mode of the original data are proposed: a simple estimator based on estimating the mean by the sample mean and the standard deviation by the sample standard deviation, and a more robust estimator based on estimating the mean by the median and the standard deviation by the standardized median absolute deviation. Both of these mode estimators were tested using simulated data drawn from normal (symmetric), lognormal (asymmetric), and Pareto (very asymmetric) distributions. The latter two distributions were chosen to test the generality of the method since they are not power transforms of the normal distribution. Each of the proposed estimators of the mode has a much lower variance than the mean and median for the two asymmetric distributions. When outliers were added to the simulations, the more robust of the two proposed mode estimators had a lower bias and variance than the median for the asymmetric distributions, especially when the level of contamination approached the 50% breakdown point. It is concluded that the mode is often a more reliable measure of location than the mean or median for asymmetric data. The proposed estimators also performed well relative to previous estimators of the mode. While different estimators are better under different conditions, the proposed robust estimator is reliable for a wide variety of distributions and contamination levels.

D. R. Bickel, submitted to InterStat.

1. INTRODUCTION

Although many measures of location have been developed in recent years, researchers still mostly use the mean and median to describe the location or average value of continuous data, largely because those measures are easy to understand and estimate. The concept of the mode is also easily understood and is attractive as the most probable value, but reliable methods of estimating the mode of continuous data are not widely known. Most investigators describe the average of continuous data by the mean except when the data are highly skewed, highly kurtotic, or contaminated with outliers, in which case the median is often used. The median is indeed a good choice in the latter two cases, when the mean is very unreliable or even undefined. In the case of asymmetric data, the median is often preferred to the mean because it is almost always closer to the mode, but in many cases a better approach would be to estimate the mode itself. (Dharmadhikari and Joag-dev (1988) give the conditions under which the median is between the mode and the mean.) The mode is the most intuitive measure of central tendency since it represents the most typical value of the data. However, previous estimators of the mode have suffered from high bias, low efficiency (high variance), or a sensitivity to outliers, and these limitations have probably contributed to the neglect of the mode as a description of the central tendency. To enable a wider use of the mode, it will be demonstrated herein that the mode can be estimated with lower bias and even higher efficiency and robustness to outliers than the median for asymmetric, continuous data.

The mean, median, and mode are all measures of location $\mu(X)$ of a continuous random variable $X$ in the sense that they satisfy $\mu(aX+b)=a\mu(X)+b$ for $a>0$ and any $b$, $\mu(-X)=-\mu(X)$, and $X\ge 0 \Rightarrow \mu(X)\ge 0$ (Staudte and Sheather 1990). The sample mean and sample median are simple nonparametric estimators of the mean and median of the underlying continuous distribution. For symmetric distributions, the sample mean and sample median estimate the same

estimand since the mean and median are equal in this case. For asymmetric distributions, the sample mean and sample median estimate different, but known, estimands. The mode, however, has no natural estimator. In this paper, previous estimators of the mode are compared with estimators designed to have low variance. The proposed strategy of mode estimation consists of the following steps:
1. transform the data such that the transformed data is approximately normal;
2. estimate the mean and standard deviation of the transformed data;
3. assuming that the transformed data were drawn from a normal distribution, use the estimated mean and standard deviation of the transformed data to estimate the mode of the original data.

The transformation used herein is the simple power transformation $y(x;\alpha)=x^{\alpha}$, where $y$ is called the transformed variable, $x$ is called the original variable, and $\alpha$ is a nonzero real constant. We require that $x>0$, but the transformation can be generalized by $y(x;\alpha,\beta)=(x+\beta)^{\alpha}$ to allow negative values of $x$ (Box & Tiao 1992). Thus, given a data set $\{x_i\}_{i=1}^{n}$, the transformed data set is $\{y_i(\alpha)\}_{i=1}^{n}$, where $y_i(\alpha)=x_i^{\alpha}$. The value of $\alpha$ is chosen to make the transformed data as close as possible to normally distributed data. Although $y$ is not exactly normal, it is constructed to be approximately normal through the choice of $\alpha$, so it can be considered normal for the purpose of estimation. If $y$ is normally distributed with parameters $\bar{y}$ and $\sigma$, then the probability density function (PDF) of $y$ is

$$f_y(y;\bar{y},\sigma)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y-\bar{y})^{2}}{2\sigma^{2}}\right] \qquad (1)$$

and thus the PDF of $x$ is

$$f_x(x;\bar{y},\sigma,\alpha)=f_y(x^{\alpha};\bar{y},\sigma)\left|\frac{\partial y}{\partial x}\right|=\frac{|\alpha|\,x^{\alpha-1}}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x^{\alpha}-\bar{y})^{2}}{2\sigma^{2}}\right]. \qquad (2)$$
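The density of x obtained above from the normal density of the transformed variable can be checked numerically. The following Python sketch is illustrative only: the function names `density_x` and `numeric_mode`, the brute-force grid search, and the parameter values in the usage lines are assumptions made here, not part of the paper. It evaluates the density for given values of the mean and standard deviation of the transformed variable and of the exponent, and then locates its maximum on a grid.

```python
import math


def density_x(x, ybar, sigma, alpha):
    """Density of x when y = x**alpha is treated as N(ybar, sigma**2)."""
    if x <= 0.0:
        return 0.0
    y = x ** alpha
    jacobian = abs(alpha) * x ** (alpha - 1.0)  # |dy/dx|
    normal_pdf = math.exp(-0.5 * ((y - ybar) / sigma) ** 2) / (
        math.sqrt(2.0 * math.pi) * sigma
    )
    return normal_pdf * jacobian


def numeric_mode(ybar, sigma, alpha, lo, hi, steps=100_000):
    """Brute-force grid search for the maximizer of density_x on [lo, hi]."""
    best_x, best_f = lo, density_x(lo, ybar, sigma, alpha)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        f = density_x(x, ybar, sigma, alpha)
        if f > best_f:
            best_x, best_f = x, f
    return best_x


if __name__ == "__main__":
    # alpha = 1: x itself is normal, so the maximum sits near ybar.
    print(numeric_mode(ybar=5.0, sigma=1.0, alpha=1.0, lo=0.5, hi=20.0))
    # alpha = 0.5: a right-skewed density with its maximum to the left of ybar**2.
    print(numeric_mode(ybar=3.0, sigma=1.0, alpha=0.5, lo=0.5, hi=30.0))
```

For 0 < α < 1 the factor x^(α−1) makes this density grow without bound as x approaches 0, so the search interval in the sketch starts away from zero in order to pick up the interior maximum.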

Many distributions can be approximated by Eq. (2), in which $\alpha$ quantifies the skewness, with $\alpha=1$ for zero skewness. The mode of $x$, denoted by $M$, is the value of $x$ that maximizes its PDF. Requiring that $\left[\partial f_x(x;\bar{y},\sigma,\alpha)/\partial x\right]_{x=M}=0$, we find that

$$M=\left\{\frac{1}{2}\left[\bar{y}+\sqrt{\bar{y}^{2}+\frac{4\sigma^{2}(\alpha-1)}{\alpha}}\right]\right\}^{1/\alpha}. \qquad (3)$$

(Note that $\alpha=1$ implies that $M=\bar{y}$, as expected from the fact that the mode of a normal distribution equals its mean.) Therefore, $M$ can be estimated by replacing $\bar{y}$ and $\sigma$ with estimates of the mean and standard deviation from the transformed sample $\{y_i(\alpha)\}_{i=1}^{n}$. If $\alpha$ is positive and so low that the argument of the square root is negative, as sometimes occurs for small samples with high skewness, then the estimate of the mode is the minimum value of the sample. This method of estimating the mode is described in detail in Section 2. Its bias, efficiency, and resistance to outliers were studied by simulation, as reported in Section 3.

2. ESTIMATORS OF THE MODE

2.1 Standard parametric estimator

A simple implementation of the mode estimation technique of Section 1 is the following algorithm:
1. transform the data using the value of $\alpha$ that maximizes the standard correlation coefficient between the ordered transformed data and the expected order statistics for a normal distribution;
2. estimate the mean and standard deviation of the transformed data using the sample mean and sample standard deviation;

3. in Eq. (3), substitute for $\bar{y}$ and $\sigma$ the sample mean and sample standard deviation of the transformed data in order to estimate the mode of the original data.

The first step involves computing Pearson's correlation coefficient between $\{y_i(\alpha)\}_{i=1}^{n}$, ordered such that $y_1(\alpha)\le y_2(\alpha)\le\cdots\le y_n(\alpha)$, and $\{z_i\}_{i=1}^{n}$, the expected order statistics given by the cumulative distribution function (CDF) $\Phi$ of the standard normal distribution:

$$z_i:=\Phi^{-1}\left(\frac{\ldots}{n}\right). \qquad (4)$$

The correlation coefficient can be expressed as

$$r(\alpha)=\frac{s_{+}^{2}(\alpha)-s_{-}^{2}(\alpha)}{s_{+}^{2}(\alpha)+s_{-}^{2}(\alpha)}, \qquad (5)$$

where

$$s_{\pm}(\alpha)=\delta\left(\frac{z}{\delta z}\pm\frac{y(\alpha)}{\delta y(\alpha)}\right). \qquad (6)$$

The operator $\delta$ gives the sample standard deviation of its argument; e.g.,

$$\delta y(\alpha)=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left[y_i(\alpha)-\frac{1}{n}\sum_{j=1}^{n}y_j(\alpha)\right]^{2}}. \qquad (7)$$

Let $\alpha_0$ be the value of $\alpha$ for which $r(\alpha)$ reaches a maximum. There is only one maximum for data from single-modal distributions since $r(\alpha)$ decreases monotonically as the transformed data become less and less normal. Thus, $\alpha_0$ is easy to compute numerically; the Appendix gives an algorithm that can find the maximum. The transformation $y_i(\alpha_0)=x_i^{\alpha_0}$ ensures that the transformed data is as close as possible to following a normal distribution. Then the sample mean and sample standard deviation of $\{y_i(\alpha_0)\}_{i=1}^{n}$ are used in Eq. (3) to estimate $M$, the mode of the distribution for which $\{x_i\}_{i=1}^{n}$ is a sample.
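The three steps of the standard parametric estimator can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the plotting positions (i − 0.5)/n used for the expected normal order statistics, the search grid `DEFAULT_GRID` for the exponent (used here in place of the Appendix algorithm), and all function names are hypothetical choices made for this example. Only the closed-form mode in `mode_from_normal_fit` is taken directly from Eq. (3), including the fallback to the sample minimum when the square-root argument is negative.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def normal_scores(n):
    """Approximate expected order statistics of n standard normal draws.

    Assumption: plotting positions (i - 0.5) / n.
    """
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def mode_from_normal_fit(ybar, sigma, alpha, x_min):
    """Eq. (3): mode of x when x**alpha is treated as N(ybar, sigma**2)."""
    disc = ybar ** 2 + 4.0 * sigma ** 2 * (alpha - 1.0) / alpha
    if disc < 0.0:
        return x_min  # fallback stated in the text
    return (0.5 * (ybar + math.sqrt(disc))) ** (1.0 / alpha)


# Hypothetical search grid for alpha: -3.00, -2.95, ..., 3.00 without 0.
DEFAULT_GRID = [k / 20.0 for k in range(-60, 61) if k != 0]


def standard_parametric_mode(xs, grid=DEFAULT_GRID):
    """Grid-search version of the three steps; all xs must be positive."""
    xs = sorted(xs)
    z = normal_scores(len(xs))

    def normality(alpha):
        # Re-sort because a negative exponent reverses the order.
        return pearson_r(sorted(x ** alpha for x in xs), z)

    alpha = max(grid, key=normality)
    y = [x ** alpha for x in xs]
    return mode_from_normal_fit(mean(y), stdev(y), alpha, xs[0])


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.lognormvariate(1.0, 1.0) for _ in range(500)]
    print(standard_parametric_mode(sample))
```

A grid search is used only to keep the sketch short; any one-dimensional maximizer over the exponent could be substituted.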

This estimator of the mode, called the standard parametric mode (SPM), has advantages in its simplicity and its efficiency in the case that $\{y_i(\alpha_0)\}_{i=1}^{n}$ is approximately normal. However, it is not robust to outliers since if the value of a single $x_i$ is sufficiently large, then $\alpha_0$ can be brought past any bound and the estimation can thereby be rendered worthless. The next subsection modifies the algorithm to make it resistant to outliers.

2.2 Robust parametric estimator

The steps in Section 1 for computing the mode become highly robust to contamination in the data when they take this form:
1. transform the data using the value of $\alpha$ that maximizes a robust correlation coefficient between the ordered transformed data and the expected order statistics for a normal distribution;
2. estimate the mean and standard deviation of the transformed data using the median and standardized median absolute deviation (MAD);
3. in Eq. (3), substitute for $\bar{y}$ and $\sigma$ the median and MAD of the transformed data in order to estimate the mode of the original data.

The robust correlation coefficient is based on a generalization of the linear correlation coefficient (Huber, 1981), with the $\delta$ of Eq. (6) denoting a general measure of dispersion or scale, rather than the standard deviation. Again assuming that $y_1(\alpha)\le y_2(\alpha)\le\cdots\le y_n(\alpha)$, we use the correlation coefficient given by

$$R(\alpha)=\frac{S_{+}^{2}(\alpha)-S_{-}^{2}(\alpha)}{S_{+}^{2}(\alpha)+S_{-}^{2}(\alpha)}, \qquad (8)$$

where

$$S_{\pm}(\alpha)=\Delta\left(\frac{z}{\Delta z}\pm\frac{y(\alpha)}{\Delta y(\alpha)}\right). \qquad (9)$$

Here, the operator $\Delta$ yields MAD, normalized such that $\Delta y=\sigma$ if $y$ is normally distributed with standard deviation $\sigma$. For example,

$$\Delta y(\alpha)=\left[\Phi^{-1}(3/4)\right]^{-1}\operatorname{med}_i\left|y_i(\alpha)-\operatorname{med}_j y_j(\alpha)\right|\approx 1.4826\,\operatorname{med}_i\left|y_i(\alpha)-\operatorname{med}_j y_j(\alpha)\right|, \qquad (10)$$

where med is the sample median operator, so that $\operatorname{med}_i y_i(\alpha)$ is the median of $\{y_i(\alpha)\}_{i=1}^{n}$. Since $R(\alpha)$ quantifies the normality of the transformed data, the value $\alpha_0$ is found that maximizes $R(\alpha)$, i.e., $R(\alpha_0)=\max_{\alpha}R(\alpha)$. For a normal distribution, the mean is equal to the median and the standard deviation is equal to MAD, so $\operatorname{med}_i y_i(\alpha_0)$ and $\Delta y(\alpha_0)$ are substituted for $\bar{y}$ and $\sigma$ in Eq. (3) to estimate the mode $M$.

The robustness of this estimator of the mode, termed the robust parametric mode (RPM), can be quantified by its finite-sample breakdown point, the minimum proportion of outliers in a sample that could make an estimator unbounded (Donoho and Huber 1983). For example, for a sample of size $n$, the breakdown point of the median is $(n+1)/(2n)$ for odd $n$ or $1/2$ for even $n$ since at least half of the points in the sample would have to be replaced with sufficiently high or sufficiently low values before the median would be higher or lower than any bound. Being based on the median, the mode estimator described in this subsection has the same breakdown point, which is the highest breakdown point possible for a measure of location. The mode estimator of Section 2.1, on the other hand, is less robust, since the sample mean and sample standard deviation each has a breakdown point of only $1/n$, entailing that a single outlier can make them arbitrarily large.

2.3 Grenander's estimators

The estimators of the mode introduced above are called parametric estimators since they make use of the parameters of the family of normal distributions. A simple class of nonparametric estimators is Grenander's (1965) family of estimators of the mode of $\{x_i\}_{i=1}^{n}$, with $x_1\le x_2\le\cdots\le x_n$:

$$M_{p,k}=\frac{\frac{1}{2}\sum_{i=1}^{n-k}\dfrac{x_{i+k}+x_i}{(x_{i+k}-x_i)^{p}}}{\sum_{i=1}^{n-k}\dfrac{1}{(x_{i+k}-x_i)^{p}}},\qquad 1<p<k, \qquad (11)$$

where $p$ and $k$ are real numbers, fixed for each estimator. $M_{p,k}$ has a breakdown point of only $(k+1)/n$, which approaches 0 as $n\to\infty$ (Bickel, b), so, like the estimator of Subsection 2.1, $M_{p,k}$ is not robust to outliers. $M_{p,k}$ is compared to the parametric estimators in Section 3.

2.4 Robust direct estimators

Grenander's estimators are direct in the sense that they do not require density estimation. A class of direct estimators of the mode that are much more robust to outliers is based on the shortest half sample, the subsample of half of the original data with the minimum difference between the minimum and maximum values. The midpoint of the shortest half sample (location of the least median of squares) and the mean of the shortest half sample (Rousseeuw and Leroy, 1987) are highly biased estimators of the mode (Bickel, a). A low-bias mode estimator, the half-sample mode (HSM), can be computed by repeatedly taking shortest half samples within shortest half samples (Bickel, b). A closely related direct, nonparametric estimator is the half-range mode (HRM), which is based on the modal interval, the interval of a certain width that contains more values than any other interval of that width. The HRM is found by computing modal intervals within modal intervals, where each modal interval has a width equal to half the range of the observations within the previous modal interval, beginning with a modal interval containing the entire sample; Bickel (a) provides a detailed algorithm for this estimator. All of the estimators of this subsection have the same breakdown point as the median and are even more robust than the median in the sense that they are unaffected by any sufficiently high, finite outlier (Bickel, a).

2.5 Nonparametric density estimators

The nonparametric estimators of Sections 2.3 and 2.4 are direct, but there are also nonparametric estimators of the mode that depend on estimation of the probability density. Grenander (1965) and Dharmadhikari and Joag-dev (1988) note that the mode can be estimated as the argument for which a smoothed empirical density function (EDF), an estimate of the PDF, reaches a maximum. The EDF based on a normal kernel function is

$$\hat{f}(x)=\frac{1}{nh\sqrt{2\pi}}\sum_{i=1}^{n}\exp\left[-\frac{1}{2}\left(\frac{x-x_i}{h}\right)^{2}\right]. \qquad (12)$$

Smaller values of the smoothing parameter $h$ yield lower biases, but higher variances, in the mode estimates. Based on optimal estimates of the PDF, Silverman (1986) recommended setting $h$ equal to $(0.9)S$, where $S$ is the minimum of the sample standard deviation and the normal-consistent interquartile range. This recommendation is followed in the simulations below, except that the interquartile range is replaced with the MAD, made consistent with the normal distribution by multiplying by 1.4826. The MAD is preferred for these studies with large numbers of outliers since its asymptotic breakdown point is twice that of the interquartile range (Rousseeuw and Croux, 1993). The mode is estimated by the empirical density function mode (EDFM), denoted by $M$ and defined such that $\hat{f}(M)=\max_{x}\hat{f}(x)$.

3. SIMULATIONS

The methods of Section 2 were used to estimate the mode for samples generated from a normal distribution, which is symmetric (zero skewness), a lognormal distribution, which is moderately asymmetric (finite positive skewness), and a Pareto distribution, which is extremely asymmetric (infinite skewness). The normal distribution has a mean parameter of 6 and a standard deviation parameter of 1, with a median of 6 and a mode of 6; the lognormal distribution has a mean parameter of 1 and a standard deviation parameter of 1,

with a median of $e\approx 2.7$ and a mode of 1; the Pareto distribution has a PDF of $\frac{1}{2}x^{-3/2}$ for $x\ge 1$ and 0 for $x<1$, with a median of 4 and a mode of 1. From each of these distributions, samples, each of $n$ random numbers, were generated for three values of $n$. For each sample, the mode was estimated by the SPM of Subsection 2.1, by the RPM of Subsection 2.2, by two of Grenander's estimators $M_{p,k}$ of Subsection 2.3, by the HRM of Subsection 2.4, and by the EDFM of Subsection 2.5. The sample means and medians were also computed for comparison. The bias, defined as the mean of the estimates minus the value of the estimand (e.g., 6 for the mode of the normal distribution), and the variance of the estimates are displayed in Tables 1-3 for each estimator and sample size, based on the simulations without contamination.
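The robust parametric mode applied in these simulations can be sketched in the same spirit. The code below is a hedged illustration rather than the implementation used for the tables: the plotting positions (i − 0.5)/n, the grid `DEFAULT_GRID` of candidate exponents, and the function names are assumptions introduced here. It builds the robust correlation from MAD-based scales of standardized sums and differences, picks the exponent with the highest robust correlation, and substitutes the median and MAD of the transformed data into Eq. (3).

```python
import math
from statistics import NormalDist, median

MAD_SCALE = 1.0 / NormalDist().inv_cdf(0.75)  # about 1.4826


def mad(values):
    """Median absolute deviation, scaled to estimate sigma for normal data."""
    values = list(values)
    center = median(values)
    return MAD_SCALE * median(abs(v - center) for v in values)


def robust_corr(y, z):
    """Correlation built from MAD scales of standardized sums and differences."""
    sy, sz = mad(y), mad(z)
    if sy == 0.0 or sz == 0.0:
        return -1.0  # degenerate scale: treat as the worst possible fit
    s_plus = mad(b / sz + a / sy for a, b in zip(y, z))
    s_minus = mad(b / sz - a / sy for a, b in zip(y, z))
    total = s_plus ** 2 + s_minus ** 2
    return (s_plus ** 2 - s_minus ** 2) / total if total > 0.0 else -1.0


# Hypothetical search grid for alpha: -3.00, -2.95, ..., 3.00 without 0.
DEFAULT_GRID = [k / 20.0 for k in range(-60, 61) if k != 0]


def robust_parametric_mode(xs, grid=DEFAULT_GRID):
    """Grid-search sketch of the robust estimator; all xs must be positive."""
    xs = sorted(xs)
    n = len(xs)
    nd = NormalDist()
    # Assumed plotting positions for the expected normal order statistics.
    z = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    def normality(alpha):
        # Re-sort because a negative exponent reverses the order.
        return robust_corr(sorted(x ** alpha for x in xs), z)

    alpha = max(grid, key=normality)
    y = [x ** alpha for x in xs]
    ybar, sigma = median(y), mad(y)
    disc = ybar ** 2 + 4.0 * sigma ** 2 * (alpha - 1.0) / alpha
    if disc < 0.0:
        return xs[0]  # fallback to the sample minimum, as for Eq. (3)
    return (0.5 * (ybar + math.sqrt(disc))) ** (1.0 / alpha)


if __name__ == "__main__":
    clean = [10.0 + NormalDist().inv_cdf((i - 0.5) / 101) for i in range(1, 102)]
    print(robust_parametric_mode(clean))
    print(robust_parametric_mode(clean + [1.0e6] * 20))  # about 17% gross outliers
```

Because every ingredient is a median or a MAD, replacing a minority of the values by arbitrarily large numbers moves the estimate only by a bounded amount, which is the property the breakdown-point argument describes.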

Estimator                | Normal distr. | Lognormal distr. | Pareto distr.
Mean                     | .77 (.5795)   | (.439)           | N/A (N/A)
Median                   | .466 (.837)   | .75 (.59698)     | .65 ( )
Standard parametric mode | .368 (.459)   | .4738 (.8785)    | (.6455)
Robust parametric mode   | .538 (.9475)  | (.6567)          | (.35)
Grenander M ,3           | (.6737)       | (.3887)          | (5.46)
Half-range mode          | .69 (.3385)   | (.79)            | (.6493)
Empirical density mode   | .59 (.874)    | .9566 (.333)     | (.556)

Table 1. Bias (variance) of seven estimators of location, based on simulations of samples of observations each, with observations drawn from one of three distributions. N/A indicates that estimates of the mean are unstable since the Pareto distribution has an infinite population mean; the second Grenander estimator is not included here, since the sample size is too small for that estimator. The smallest absolute bias or variance of the mode estimators for a distribution.

Estimator                | Normal distr.  | Lognormal distr. | Pareto distr.
Mean                     | .7595 (.755)   | (.35677)         | N/A (N/A)
Median                   | .386 (.56868)  | (.39)            | .36 (.67445)
Standard parametric mode | .5738 (.669)   | .649 ( )         | (.555)
Robust parametric mode   | .4438 (.64788) | .6989 (.79789)   | .573 (.685)
Grenander M ,3           | (.4634)        | (.5499)          | (.56443)
Grenander M ,            | (.3979)        | ( )              | (.33)
Half-range mode          | (.877)         | (.3335)          | (.78498)
Empirical density mode   | (.74397)       | .6544 (.59535)   | .87 (.758)

Table 2. Bias (variance) of eight estimators of location, based on simulations of samples of observations each, with observations drawn from one of three distributions. N/A indicates that estimates of the mean are unstable since the Pareto distribution has an infinite population mean. The smallest absolute bias or variance of the mode estimators for a distribution.

Estimator                | Normal distr.  | Lognormal distr. | Pareto distr.
Mean                     | (.95839)       | .939 ( )         | N/A (N/A)
Median                   | .35 (.7484)    | .6 (.634)        | (.67678)
Standard parametric mode | (.9474)        | .8966 (.974)     | (.9655)
Robust parametric mode   | .4354 (.8849)  | .4563 (.66873)   | .3349 (.386)
Grenander M ,3           | (.)            | (.9399)          | (.55889)
Grenander M ,            | ( )            | (3.8784)         | (4.8345)
Half-range mode          | .563 (.64674)  | .5975 (.8978)    | .544 (.73768)
Empirical density mode   | (.4846)        | (.67536)         | (.43567)

Table 3. Bias (variance) of eight estimators of location, based on simulations of samples of observations each, with observations drawn from one of three distributions. N/A indicates that estimates of the mean are unstable since the Pareto distribution has an infinite population mean. The smallest absolute bias or variance of the mode estimators for a distribution.

Because of their high breakdown points, the median, RPM, HRM, and EDFM have meaning even in the presence of many outliers, so they were applied to samples generated as described above with four levels of contamination: 5%, 10%, 15%, and 20% for the two smaller sample sizes, and 10%, 20%, 30%, and 40% for the largest sample size. The level of contamination is the probability that a given value in the sample was replaced by a value drawn from a normal distribution with a mean equal to the 99.99th percentile of the main distribution (normal, lognormal, or Pareto) and with a standard deviation equal to a hundredth of the interquartile range of the main distribution divided by the interquartile range of the standard normal distribution. Thus, the normal distribution was contaminated by $N(9.719,(0.01)^{2})$, the lognormal distribution by $N(112.058,(0.0292913)^{2})$, and the Pareto distribution by $N(10^{8},(0.105429)^{2})$. Higher levels of contamination

could not be used for the smaller sample sizes because that would sometimes lead to more than half of the values of a sample being drawn from the outlier distribution, which would break down any estimator. The bias and variance in the estimators for each contamination level and each main distribution are displayed in Fig. 1, Fig. 2, and Fig. 3, one figure for each sample size. Figs. 4-6 display the PDFs estimated from a sample of values from each distribution and the same sample with 40% contamination. The PDFs were estimated by Eq. (2), using the parameters $\alpha$, $\bar{y}$, and $\sigma$ estimated as described in Section 2.2. The value of $x$ yielding the maximum value of each estimated PDF is the RPM.
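The contamination scheme and the bias and variance summaries can be sketched as a small Monte Carlo harness. Everything specific in the code is an assumption made for illustration: the function names, the number of replications, the seed, the outlier location 9.7 used in the usage lines, a unit standard deviation for the normal distribution, unit parameters for the lognormal distribution, and a Pareto sampler chosen so that the median is 4 as stated in the text. The harness works with any estimator that maps a list of numbers to a single number; the sample median is used below because it needs no further code.

```python
import random
from statistics import mean, median, variance


def contaminated_sample(draw, n, level, outlier_mean, outlier_sd, rng):
    """Draw n values; each is replaced by an outlier with probability `level`."""
    sample = []
    for _ in range(n):
        if rng.random() < level:
            sample.append(rng.gauss(outlier_mean, outlier_sd))
        else:
            sample.append(draw(rng))
    return sample


def bias_and_variance(estimator, draw, true_value, n, level,
                      outlier_mean, outlier_sd, reps=200, seed=0):
    """Monte Carlo bias (mean estimate minus estimand) and variance."""
    rng = random.Random(seed)
    estimates = [
        estimator(contaminated_sample(draw, n, level, outlier_mean, outlier_sd, rng))
        for _ in range(reps)
    ]
    return mean(estimates) - true_value, variance(estimates)


def draw_normal(rng):
    return rng.gauss(6.0, 1.0)  # assumed unit standard deviation


def draw_lognormal(rng):
    return rng.lognormvariate(1.0, 1.0)  # assumed parameters; median is e


def draw_pareto(rng):
    # Inverse-CDF sampling for F(x) = 1 - x**-0.5 on x >= 1 (median 4).
    return (1.0 - rng.random()) ** -2.0


if __name__ == "__main__":
    for level in (0.0, 0.1, 0.2):
        bias, var = bias_and_variance(median, draw_normal, 6.0, 100,
                                      level, 9.7, 0.01)
        print(level, round(bias, 3), round(var, 4))
```

In this setup the bias of the sample median grows with the contamination level, because the median shifts toward the outliers rather than rejecting them.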

4. DISCUSSION

The bias and variance of the two proposed parametric estimators of the mode were low for all three distributions, even though the lognormal and Pareto distributions cannot be converted into a normal distribution by the simple power transformation. For the contaminated normal and lognormal distributions and for the Pareto distribution with and without contamination, Figs. 4-6 show large discrepancies between the theoretical distributions and the distributions estimated using Eq. (2). The fact that the estimates of the mode were affected little by those discrepancies suggests that the parametric estimators can be successfully applied not only to power transforms of the normal distribution, but also to a much more general class of contaminated single-modal distributions.

Tables 1-3 give an indication of the relative performance of the location estimators considered in the absence of contamination. The two mode estimators of Grenander (1965) have the highest bias and variance of the mode estimators for all distributions considered. The SPM performs consistently better than the other estimators of the mode, except that the RPM and HRM tend to be less biased for the Pareto distribution, and, at one sample size, the EDFM has a lower bias for the normal distribution. Based on these results, the SPM is a good choice of a mode estimator for uncontaminated data, except when the distribution of the data has extreme skewness or long tails, in which cases the RPM or HRM would be better.

When a particular data set may be contaminated with outliers, the selection of an estimator of the mode for that data can be informed by the estimator properties displayed in Figs. 1-3. While the EDFM has the lowest absolute bias for the contaminated normal distribution, the RPM has the best bias for the other two distributions, except for the Pareto distribution at the sample size of Fig. 3, in which case the HRM has a lower bias and variance. Thus, the RPM is appropriate for many cases of data with outliers and moderate to high skewness. The HRM is better in some instances of high sample size and very high skewness, but in these

cases, the computation speed of HRM is slow and similar results can instead be obtained using the HSM, which can be computed very quickly. The EDFM appears to work well for contaminated normal distributions.

The RPM is recommended as a general-purpose estimator of the mode since it was often the best mode estimator and when it was not, it never performed much worse than the best mode estimator in the simulations of this study, except for the uncontaminated normal distribution. If the data are known to be approximately normal and uncontaminated, then the sample mean would have the lowest bias and variance and would be approximately equal to the mode since the normal distribution is symmetric. Without this knowledge, the RPM is a safe estimator of the mode: although it has less efficiency in some cases, it has low bias and variance in many cases and is never affected much by outliers when the number of outliers is less than the number of good values. Its much greater resistance to high levels of outliers makes the mode a viable alternative to the median as a robust measure of central tendency: the bias and variance of the sample median consistently increase with the level of contamination, a reflection of the fact that the median does not reject outliers, unlike robust estimators of the mode (Bickel, a) such as RPM. Figs. 1-3 show that the median can be much less reliable than RPM for contaminated, skewed distributions.

Modifications of the SPM and RPM may lead to improved estimation; I make three suggestions:
1. Using a trimmed mean to estimate the mean of transformed data would have better efficiency in the uncontaminated normal case than using the median and better robustness to outliers than using the mean, so the resulting estimator of the mode would have characteristics intermediate to those of the SPM and RPM.
2. The method of Section 1 can be implemented with criteria for selecting the transformation exponent other than those proposed in Section 2, e.g., $\alpha_0$

could be defined as the value of $\alpha$ for which the Kolmogorov-Smirnov distance (Press et al., 1996) between the EDF of the transformed data and a normal distribution is minimized.
3. The power transformation was chosen for its simplicity, but other transformations to approximate normality could give better results.

Exploring the properties of these modified estimators and other generalizations of the proposed technique requires further research.

BIBLIOGRAPHY

Bickel, D. R. (a), Robust estimators of the mode and skewness of continuous data, Computational Statistics and Data Analysis (in press); available from preprint server:
Bickel, D. R. (b), Simple estimator of the mode for continuous distributions: The mode as a more robust measure of location, under review.
Box, G. E. P. and G. C. Tiao (1992), Bayesian Inference in Statistical Analysis, John Wiley and Sons (New York).
Dharmadhikari, S. and K. Joag-dev (1988), Unimodality, Convexity, and Applications, Academic Press (New York).
Donoho, D. L. and P. J. Huber (1983), The notion of a breakdown point, in Festschrift for Erich L. Lehmann, edited by P. J. Bickel, K. A. Doksum, J. L. Hodges Jr., Wadsworth: Belmont, CA.
Grenander, U. (1965), Some direct estimates of the mode, Annals of Mathematical Statistics 36.
Huber, P. J. (1981), Robust Statistics, John Wiley & Sons (New York).
Press, W. H., S. A. Teukolsky, W. T. Vetterling, B. P. Flannery (1996), Numerical Recipes in C, Cambridge University Press: Cambridge.
Rousseeuw, P. J. and C. Croux (1993), Alternatives to the median absolute deviation, Journal of the American Statistical Association 88.
Silverman, B. W. (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall (New York).
Staudte, R. G. and S. J. Sheather (1990), Robust Estimation and Testing, Wiley-Interscience: New York.

APPENDIX: Algorithm to find the transformation exponent, $\alpha_0$

Let $\rho(\alpha)$ be a function, such as $r(\alpha)$ or $R(\alpha)$, with a single maximum, $\rho(\alpha_0)$, and no plateaus (monotonically increasing for $\alpha<\alpha_0$ and monotonically decreasing for $\alpha>\alpha_0$). To compute $\alpha_0$, first find three values $\alpha_1$, $\alpha_2$, and $\alpha_3$ that satisfy $\alpha_1<\alpha_2<\alpha_3$, $\rho(\alpha_1)<\rho(\alpha_2)$, and $\rho(\alpha_3)<\rho(\alpha_2)$; this ensures that $\alpha_1<\alpha_0<\alpha_3$. For the simulations in this paper, fixed values of $\alpha_1$, $\alpha_2$, and $\alpha_3$ were used as initial guesses, and $\alpha_1$ was decreased or $\alpha_3$ was increased as needed to ensure that $\rho(\alpha_1)<\rho(\alpha_2)$ and $\rho(\alpha_3)<\rho(\alpha_2)$. Non-integral values were used to avoid the numerical difficulties of evaluating $\rho(0)$ in the following algorithm. The algorithm ArgumentForMax($\alpha_1$, $\alpha_3$) returns $\alpha_0$ to within the desired level of precision (a fixed tolerance was used in this study).

ArgumentForMax($A_1$, $A_5$) [$\rho(A_1)<\rho(\alpha_0)$ and $\rho(A_5)<\rho(\alpha_0)$ must be true]
1. If $A_5-A_1$ is smaller than the desired precision, then return $(A_1+A_5)/2$ and stop; otherwise, proceed to Step 2.
2. Divide the domain $[A_1,A_5]$ into four equally-spaced intervals, $[A_1,A_2]$, $[A_2,A_3]$, $[A_3,A_4]$, $[A_4,A_5]$, which satisfy
$$A_2-A_1=A_3-A_2=A_4-A_3=A_5-A_4. \qquad (13)$$
3. Compute $d_i$, the difference in $\rho(\alpha)$ across each of the four intervals, letting $d_i=\rho(A_{i+1})-\rho(A_i)$ for $i=1,2,3,4$.
4. If there is an interval number $j$ for which $d_j>0$ and $d_{j+1}<0$, then it is known that $\rho(A_j)<\rho(\alpha_0)$ and $\rho(A_{j+2})<\rho(\alpha_0)$ and thus that the recursive call ArgumentForMax($A_j$, $A_{j+2}$) satisfies the conditions needed to return $\alpha_0$, so return ArgumentForMax($A_j$, $A_{j+2}$); otherwise, proceed to Step 5.

5. If $\rho(A_1)<\rho(A_5)$, then return ArgumentForMax($A_4$, $A_5$); otherwise, return ArgumentForMax($A_1$, $A_2$).
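The recursive search of this Appendix can be written compactly in Python. The sketch below is an interpretation under stated assumptions: the function name `argument_for_max`, the default tolerance `tol`, and the use of non-strict sign tests in Step 4 (so that a tie between two grid points on either side of the peak is still bracketed) are choices made here, not details confirmed by the text. The objective `rho` is any single-peaked function of the exponent, such as one of the two correlation coefficients.

```python
def argument_for_max(rho, a1, a5, tol=1e-3):
    """Locate the maximizer of a single-peaked function rho on [a1, a5]."""
    # Step 1: stop when the bracket is narrower than the tolerance.
    if a5 - a1 < tol:
        return 0.5 * (a1 + a5)
    # Step 2: five equally spaced points give four equal intervals.
    step = (a5 - a1) / 4.0
    a = [a1 + k * step for k in range(5)]
    # Step 3: change of rho across each interval.
    r = [rho(v) for v in a]
    d = [r[k + 1] - r[k] for k in range(4)]
    # Step 4: a rise followed by a fall brackets the peak in two intervals.
    for j in range(3):
        if d[j] >= 0.0 and d[j + 1] <= 0.0:
            return argument_for_max(rho, a[j], a[j + 2], tol)
    # Step 5: rho is monotone on the grid, so the peak is in an end interval.
    if r[0] < r[4]:
        return argument_for_max(rho, a[3], a[4], tol)
    return argument_for_max(rho, a[0], a[1], tol)


if __name__ == "__main__":
    # Toy objective with its peak at 0.7 (illustrative only).
    print(argument_for_max(lambda alpha: -(alpha - 0.7) ** 2, -2.3, 3.1))
```

Each call shrinks the bracket to a half or a quarter of its width, so the recursion depth grows only logarithmically with the ratio of the initial width to the tolerance.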

Fig. 1. Bias and variance of location estimators for samples of n values from the normal, lognormal, and Pareto distributions. [Panels: estimator bias and estimator variance for normal, lognormal, and Pareto data. Legend: median, parametric mode, half-range mode, empirical density mode.]

Figures of D. R. Bickel

Fig. 2. Bias and variance of location estimators for samples of n values from the normal, lognormal, and Pareto distributions. [Panels: estimator bias and estimator variance for normal, lognormal, and Pareto data. Legend: median, parametric mode, half-range mode, empirical density mode.]

Fig. 3. Bias and variance of location estimators for samples of n values from the normal, lognormal, and Pareto distributions. The bias of the median for the Pareto distribution with 40% contamination is 34.45, which is too high to plot here. [Panels: estimator bias and estimator variance for normal, lognormal, and Pareto data. Legend: median, parametric mode, half-range mode, empirical density mode.]


Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Convergence of random processes

Convergence of random processes DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Estimation of the Mean of Truncated Exponential Distribution

Estimation of the Mean of Truncated Exponential Distribution Journal of Mathematcs and Statstcs 4 (4): 84-88, 008 ISSN 549-644 008 Scence Publcatons Estmaton of the Mean of Truncated Exponental Dstrbuton Fars Muslm Al-Athar Department of Mathematcs, Faculty of Scence,

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

Statistics for Economics & Business

Statistics for Economics & Business Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

Lossy Compression. Compromise accuracy of reconstruction for increased compression. Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

4.3 Poisson Regression

4.3 Poisson Regression of teratvely reweghted least squares regressons (the IRLS algorthm). We do wthout gvng further detals, but nstead focus on the practcal applcaton. > glm(survval~log(weght)+age, famly="bnomal", data=baby)

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

AS-Level Maths: Statistics 1 for Edexcel

AS-Level Maths: Statistics 1 for Edexcel 1 of 6 AS-Level Maths: Statstcs 1 for Edecel S1. Calculatng means and standard devatons Ths con ndcates the slde contans actvtes created n Flash. These actvtes are not edtable. For more detaled nstructons,

More information

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0 Bézer curves Mchael S. Floater September 1, 215 These notes provde an ntroducton to Bézer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of

More information

SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION Smple Lnear Regresson and Correlaton Introducton Prevousl, our attenton has been focused on one varable whch we desgnated b x. Frequentl, t s desrable to learn somethng about the relatonshp between two

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Statistical Evaluation of WATFLOOD

Statistical Evaluation of WATFLOOD tatstcal Evaluaton of WATFLD By: Angela MacLean, Dept. of Cvl & Envronmental Engneerng, Unversty of Waterloo, n. ctober, 005 The statstcs program assocated wth WATFLD uses spl.csv fle that s produced wth

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

ANOMALIES OF THE MAGNITUDE OF THE BIAS OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF THE REGRESSION SLOPE

ANOMALIES OF THE MAGNITUDE OF THE BIAS OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF THE REGRESSION SLOPE P a g e ANOMALIES OF THE MAGNITUDE OF THE BIAS OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF THE REGRESSION SLOPE Darmud O Drscoll ¹, Donald E. Ramrez ² ¹ Head of Department of Mathematcs and Computer Studes

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Cathy Walker March 5, 2010

Cathy Walker March 5, 2010 Cathy Walker March 5, 010 Part : Problem Set 1. What s the level of measurement for the followng varables? a) SAT scores b) Number of tests or quzzes n statstcal course c) Acres of land devoted to corn

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE

USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE STATISTICA, anno LXXV, n. 4, 015 USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE Manoj K. Chaudhary 1 Department of Statstcs, Banaras Hndu Unversty, Varanas,

More information

Non-Mixture Cure Model for Interval Censored Data: Simulation Study ABSTRACT

Non-Mixture Cure Model for Interval Censored Data: Simulation Study ABSTRACT Malaysan Journal of Mathematcal Scences 8(S): 37-44 (2014) Specal Issue: Internatonal Conference on Mathematcal Scences and Statstcs 2013 (ICMSS2013) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

Notes prepared by Prof Mrs) M.J. Gholba Class M.Sc Part(I) Information Technology

Notes prepared by Prof Mrs) M.J. Gholba Class M.Sc Part(I) Information Technology Inverse transformatons Generaton of random observatons from gven dstrbutons Assume that random numbers,,, are readly avalable, where each tself s a random varable whch s unformly dstrbuted over the range(,).

More information

Explaining the Stein Paradox

Explaining the Stein Paradox Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

A New Method for Estimating Overdispersion. David Fletcher and Peter Green Department of Mathematics and Statistics

A New Method for Estimating Overdispersion. David Fletcher and Peter Green Department of Mathematics and Statistics A New Method for Estmatng Overdsperson Davd Fletcher and Peter Green Department of Mathematcs and Statstcs Byron Morgan Insttute of Mathematcs, Statstcs and Actuaral Scence Unversty of Kent, England Overvew

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

U-Pb Geochronology Practical: Background

U-Pb Geochronology Practical: Background U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result

More information

Bezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0

Bezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0 Bezer curves Mchael S. Floater August 25, 211 These notes provde an ntroducton to Bezer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of the

More information

DETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH

DETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata TC XVII IMEKO World Congress Metrology n the 3rd Mllennum June 7, 3,

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 13 The Smple Lnear Regresson Model and Correlaton 1999 Prentce-Hall, Inc. Chap. 13-1 Chapter Topcs Types of Regresson Models Determnng the Smple Lnear

More information

Definition. Measures of Dispersion. Measures of Dispersion. Definition. The Range. Measures of Dispersion 3/24/2014

Definition. Measures of Dispersion. Measures of Dispersion. Definition. The Range. Measures of Dispersion 3/24/2014 Measures of Dsperson Defenton Range Interquartle Range Varance and Standard Devaton Defnton Measures of dsperson are descrptve statstcs that descrbe how smlar a set of scores are to each other The more

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

RELIABILITY ASSESSMENT

RELIABILITY ASSESSMENT CHAPTER Rsk Analyss n Engneerng and Economcs RELIABILITY ASSESSMENT A. J. Clark School of Engneerng Department of Cvl and Envronmental Engneerng 4a CHAPMAN HALL/CRC Rsk Analyss for Engneerng Department

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Using the estimated penetrances to determine the range of the underlying genetic model in casecontrol

Using the estimated penetrances to determine the range of the underlying genetic model in casecontrol Georgetown Unversty From the SelectedWorks of Mark J Meyer 8 Usng the estmated penetrances to determne the range of the underlyng genetc model n casecontrol desgn Mark J Meyer Neal Jeffres Gang Zheng Avalable

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

Statistical Hypothesis Testing for Returns to Scale Using Data Envelopment Analysis

Statistical Hypothesis Testing for Returns to Scale Using Data Envelopment Analysis Statstcal Hypothess Testng for Returns to Scale Usng Data nvelopment nalyss M. ukushge a and I. Myara b a Graduate School of conomcs, Osaka Unversty, Osaka 560-0043, apan (mfuku@econ.osaka-u.ac.p) b Graduate

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

CHAPTER IV RESEARCH FINDING AND DISCUSSIONS

CHAPTER IV RESEARCH FINDING AND DISCUSSIONS CHAPTER IV RESEARCH FINDING AND DISCUSSIONS A. Descrpton of Research Fndng. The Implementaton of Learnng Havng ganed the whole needed data, the researcher then dd analyss whch refers to the statstcal data

More information

A Note on Test of Homogeneity Against Umbrella Scale Alternative Based on U-Statistics

A Note on Test of Homogeneity Against Umbrella Scale Alternative Based on U-Statistics J Stat Appl Pro No 3 93- () 93 NSP Journal of Statstcs Applcatons & Probablty --- An Internatonal Journal @ NSP Natural Scences Publshng Cor A Note on Test of Homogenety Aganst Umbrella Scale Alternatve

More information

A Hybrid Variational Iteration Method for Blasius Equation

A Hybrid Variational Iteration Method for Blasius Equation Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method

More information

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise. Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n

More information

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES SVANTE JANSON Abstract. We gve explct bounds for the tal probabltes for sums of ndependent geometrc or exponental varables, possbly wth dfferent

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

UNR Joint Economics Working Paper Series Working Paper No Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?

UNR Joint Economics Working Paper Series Working Paper No Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist? UNR Jont Economcs Workng Paper Seres Workng Paper No. 08-005 Further Analyss of the Zpf Law: Does the Rank-Sze Rule Really Exst? Fungsa Nota and Shunfeng Song Department of Economcs /030 Unversty of Nevada,

More information