Fisher Information Test of Normality

Fisher Information Test of Normality
by Yew-Haur Lee

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Statistics

Approved by: Dr George R. Terrell, Chairman; Dr Clint W. Coakley; Dr Klaus Hinkelmann; Dr Eric P. Smith; Dr Keying Ye

September 3, 1998
Blacksburg, Virginia

Keywords: Normality testing, Fisher information for location, Power study, Residuals, Nonparametric density estimation, Calculus of variation

Copyright 1998, Yew-Haur Lee

ABSTRACT

An extremal property of normal distributions is that they have the smallest Fisher Information for location among all distributions with the same variance. A new test of normality proposed by Terrell (1995) utilizes this property by finding the density of maximum likelihood constrained to have the Fisher Information expected under normality, based on the sample variance. The test statistic is then constructed as a ratio of the resulting likelihood against that of normality. Since the asymptotic distribution of this test statistic is not available, the critical values for n = 3 to 200 have been obtained by simulation and smoothed using polynomials. An extensive power study shows that the test has superior power against distributions that are symmetric and leptokurtic (long-tailed). Another advantage of the test over existing ones is the direct depiction of any deviation from normality in the form of a density estimate. This is evident when the test is applied to several real data sets. Testing of normality in residuals is also investigated. Various approaches to dealing with residuals that are possibly heteroscedastic and correlated suffer from a loss of power. The approach with the fewest undesirable features is to use the Ordinary Least Squares (OLS) residuals in place of independent observations. Simulations show that one has to be careful about the levels of the normality tests and also in generalizing the results.

Acknowledgements

I would like to extend my sincere gratitude and appreciation to Dr George R. Terrell for his guidance throughout this period of research. He was always there when I needed his advice. I am indeed fortunate to have this opportunity to learn and work with him. I would also like to thank Dr Klaus Hinkelmann, Dr Eric P. Smith, Dr Clint W. Coakley, and Dr Keying Ye for their time in carefully reading my dissertation and for the many helpful suggestions and comments that greatly improved my dissertation. I would also like to extend my thanks to the Statistics Department for the invaluable learning experience that I received. I would also like to thank Michele Marini and Bill Sydnor for resolving the many computing obstacles that I encountered. I would also like to express my thanks to the many friends that I have come to know for their encouragement and companionship. I would also like to express special thanks to my parents, my sisters and Poh Ling, for their love, encouragement and support all this while. This would not be possible without the sacrifices they made so that I could pursue my goal. Finally, I would like to thank God for his grace, mercy and strength that has sustained me throughout this time of my life.

Table of Contents

Acknowledgements
List of Tables
Table of Figures
Chapter 1 Introduction and Motivation
  1.1 Statement of the Problem
  1.2 Direction of Research
Chapter 2 Existing Normality Tests
  2.1 Moments Tests - Bowman-Shenton $K^2$
  2.2 Distance/ECDF Tests - Anderson-Darling $A^2$
  2.3 Regression/Correlation Tests - Shapiro-Wilk W
Chapter 3 Fisher Information Test
  3.1 Normal Information Inequality (for location)
  3.2 Normal Information Statistic F
  3.3 Solution to the Problem
  3.4 Computational Algorithm
  3.5 Generation of Critical Values Using Simulation
Chapter 4 Evaluations and Applications to Real Data
  4.1 Features of Data that Affect Density Estimate, $g^2$
  4.2 Power Comparison
    4.2.1 Simulation Set-up
    Results
    Summary
  Applications to Real Data Sets
    Male Weights Data
    Mississippi River Data
    PCB Data
    Buffalo Snowfall Data
    Mice Data
Chapter 5 Testing Normality of Residuals
  Background
  Power Comparisons
    Simulation Set-up
    Results
  Conclusion
Chapter 6 Summary and Discussion
Appendix A Critical Values of the Normal Information Statistic, F
Appendix B Computational Details
  B.1 Details of programming for F
  B.2 Details of power study
  B.3 Program Listing for F
Appendix C Results from Power Study
References
Vita

List of Tables

Table 4-1 Properties of symmetric distributions used in simulation study
Table 4-2 Properties of asymmetric distributions used in simulation study
Table 4-3 Discrete distribution with normal moments
Table 4-4 Power estimates of discrete distribution with normal moments
Table 5-1 Power comparisons of normality tests on iid observations and OLS residuals across different values of n based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=
Table 5-2 Measures of correspondence between true and modified test statistics across different values of n based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=
Table 5-3 Power comparisons of normality tests on iid observations and OLS residuals across different values of k based on 000 samples using X=data set from Weisberg (1980) at α=0. and n=
Table 5-4 Measures of correspondence between true and modified test statistics across different values of k based on 000 samples using X=data set from Weisberg (1980) at α=0. and n=
Table B-1 List of critical values used in power study
Table C-1 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0.05 and n=
Table C-2 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0.05 and n=
Table C-3 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0.05 and n=
Table C-4 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0.05 and n=
Table C-5 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0.05 and n=
Table C-6 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0. and n=
Table C-7 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0. and n=
Table C-8 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0. and n=
Table C-9 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0. and n=
Table C-10 Power comparisons of normality tests on iid observations and OLS residuals based on 000 samples using k=4 with $X_1 = 1$ and $X_i$, i=2,3,4 drawn from the uniform distribution at α=0. and n=

Table of Figures

Figure 4-1 Estimated density for {-1, 0, 1}
Figure 4-2 Estimated density for {-1, 0, 0, 1}
Figure 4-3 Estimated density for {-1, 0, 0, 1, 5}
Figure 4-4 Density estimates of Male Weights Data (n = )
Figure 4-5 Density estimates of Mississippi River Data (n = 49)
Figure 4-6 Density estimates of PCB Data (n = 65)
Figure 4-7 Density estimates of Buffalo Snowfall Data (n = 63)
Figure 4-8 Density estimates of Mice Data (n = 99)

Chapter 1 Introduction and Motivation

The problem of normality testing is well known and has generated plenty of attention from researchers; see Mardia (1980) and D'Agostino and Stephens (1986). This is because many classical optimal procedures were developed under the normality assumption. However, researchers soon realized that this assumption was not always satisfied. Three approaches can be taken to deal with non-normality of data. The first is to transform the data to normality so that the classical procedures can still be used. The second is to use nonparametric procedures. The third is to use robust procedures that are less sensitive to deviation from normality, especially in tail behavior. Each of the three comes with strengths and weaknesses; there is no consensus on which is the best approach. The role of normality testing is not just to see whether the data are well approximated by the normal distribution, but also to provide information on the deviation from normality. This information would then guide researchers to the best approach for dealing with the non-normality of their data.

1.1 Statement of the Problem

In this dissertation, it will be assumed that one is testing normality because the user wishes to fit a location model. Suppose the data collected, $x_1, x_2, \ldots, x_n$, represent an independent and identically distributed (iid) random sample of size $n$ from a population with probability density function $f(x)$ and cumulative distribution function (cdf) $F(x)$. Let $\Phi$ be the cdf of a normal distribution with unknown mean and variance. The null hypothesis in this problem of testing for normality is $H_0: F(x) = \Phi(x)$, and the alternative hypothesis simply states that $H_0$ is false. Hence only omnibus tests will be considered in this dissertation. Here, omnibus refers to the ability of a test to detect any deviation from normality given an adequate sample size.

In this problem, the focus is on failing to reject $H_0$, so that the conclusion is that the data come from a normal distribution. As noted by D'Agostino and Stephens (1986), this distinguishes normality testing from most statistical tests. Also, with a vague alternative hypothesis, they commented that the appropriate statistical test will often be by no means clear and no general Neyman-Pearson type (test) appears applicable. Hence, it is unlikely that any single test will have power superior to all of its alternatives.

1.2 Direction of Research

There are literally hundreds of normality tests in the literature. Major power studies done by Shapiro et al. (1968) and Pearson et al. (1977) have not arrived at a definitive answer, but a general consensus has been reached about which tests are powerful. Pearson's (1900) chi-squared test, which is possibly the oldest, is not very sensitive. Data are grouped and compared to the expected counts under normality. Since information is lost in the grouping and this test is not specially tailored to the normal distribution, the conclusion is not surprising.

Bowman and Shenton (1975) proposed the use of joint contour plots of the third and fourth moments for their test. It proves powerful among tests based on moments. For these tests, sample moments are compared to those expected from a normal distribution. However, these tests are not omnibus, since a distribution could have skewness and kurtosis close to those of a normal distribution and yet be non-normal.

Another class of normality tests is based on the empirical cumulative distribution function (ECDF). Deviation from normality is measured as a function of the discrepancies between the empirical and hypothesized distribution functions. Stephens' (1974) version of the Anderson-Darling (1954) test has proved to be the most powerful among these tests. However, it is not clear if this measure of deviation is of primary importance in deciding what to do if the data are non-normal.
A new direction was established in normality testing when Shapiro and Wilk (1965) formalized the evaluation of normality in probability plotting. Probability plotting involves plotting the ordered observations against their expected values under normality. Normality is judged by the linearity of the plot. This test and its modifications have proved popular among researchers, since it is as powerful as the Anderson-Darling and Bowman-Shenton tests, if not more so, against certain alternative distributions. However, as with the ECDF tests, it is unclear if the measure of deviation from normality is what researchers are concerned about.

A well known property of the normal distribution is that it has the least Fisher information among all distributions with the same variance. A new approach to normality testing suggested by Terrell (1995) essentially provides a nonparametric density estimate of the data, constrained by the above property when finding the likelihood of the data. This constraint eliminates the need for a smoothing parameter. The test is then based on the ratio of that likelihood to the likelihood under normality. Excess information would be reflected in a poorer fit of the data to normality, since the Fisher Information is underestimated. Hence, the test is omnibus and sensitive to departures from normality that are manifested in the excess information.

The goal of this dissertation is to gain an understanding of the workings of the Fisher Information Test for normality, provide a comprehensive table of critical values, and evaluate its power against existing tests. The Fisher Information Test will also be modified to test for normality in residuals.

In the next chapter, we give a brief review of the background and details of the existing tests of normality. Chapter 3 develops the theory and operational details behind the Fisher Information Test. Chapter 4 presents an evaluation of the sensitivity of the Fisher Information Test, and a power comparison is made against existing tests. Chapter 5 investigates the testing of normality in residuals, and another power comparison is made to see if the results differ from those using independent observations. Finally, Chapter 6 summarizes the findings and discusses directions for future research.

Chapter 2 Existing Normality Tests

For a detailed survey of the literature, see Mardia (1980) and D'Agostino and Stephens (1986). In this chapter, the focus is on the existing normality tests mentioned in Chapter 1 that are powerful. Each of these tests belongs to a different class of normality tests. A brief background to the general approach in each class is given before details are presented for each test.

2.1 Moments Tests - Bowman-Shenton $K^2$

Since the concepts of skewness and kurtosis can be used to differentiate between distributions, one of the earliest classes of normality tests is based on these moments. The standardized coefficients of skewness, $\sqrt{\beta_1}$, and kurtosis, $\beta_2$, are defined as

$$\sqrt{\beta_1} = \frac{\mu_3}{\sigma^3} \quad\text{and}\quad \beta_2 = \frac{\mu_4}{\sigma^4}$$

where $\mu_i$ is the $i$th central moment. Skewness refers to the symmetry of a distribution. For a symmetric distribution like the normal, $\sqrt{\beta_1} = 0$. A distribution that is skewed to the right has $\sqrt{\beta_1} > 0$ while one that is skewed to the left has $\sqrt{\beta_1} < 0$. Kurtosis refers to the flatness or peakedness of a distribution. The normal distribution has $\beta_2 = 3$ and is used as a reference for other distributions. A leptokurtic distribution is one that is more peaked and has heavier tails than the normal, resulting in $\beta_2 > 3$. A platykurtic distribution is flatter with shorter tails than the normal, hence $\beta_2 < 3$. The sample skewness, $\sqrt{b_1}$, and kurtosis, $b_2$, are defined as

$$\sqrt{b_1} = \frac{m_3}{(s^2)^{3/2}} \quad\text{and}\quad b_2 = \frac{m_4}{(s^2)^2}$$

where $m_i$ is the $i$th sample moment. Since the moments of $\sqrt{b_1}$ and $b_2$ are known, their distributions have been approximated using Pearson curves. The critical values for the normality tests of skewness and kurtosis are tabulated in Pearson and Hartley (1972) for selected sample sizes and significance levels.

Normalizing transformations have been found for $\sqrt{b_1}$ and $b_2$ by D'Agostino (1970) and D'Agostino and Pearson (1973) respectively. $Z(\sqrt{b_1})$ and $Z(b_2)$ denote the resulting approximate standardized normal variables. D'Agostino and Pearson suggested combining $\sqrt{b_1}$ and $b_2$ in the following way:

$$K^2 = Z^2(\sqrt{b_1}) + Z^2(b_2)$$

where $K^2$ is distributed as $\chi^2_2$ since it is the sum of the squares of two standardized normal equivalent deviates. However, they assumed that the squared deviates are independent, which is not true, especially for small sample sizes. Using simulation, Bowman and Shenton (1975) obtained 90%, 95% and 99% contours for $K^2$ for sample sizes between 20 and 1000. Carrying out this test then only requires calculating $\sqrt{b_1}$ and $b_2$, selecting the appropriate contour, and determining whether $(\sqrt{b_1}, b_2)$ falls within the contour. If it does not, normality is rejected.

2.2 Distance/ECDF Tests - Anderson-Darling $A^2$

ECDF or distance tests are another broad class of normality tests. They are based on a comparison between the ECDF, $F_n(x_{(i)}) = i/n$, and the hypothesized distribution under normality, $Z_i$, defined by

$$Z_i = \Phi\!\left(\frac{x_{(i)} - \bar{x}}{s}\right) \quad\text{where}\quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{and}\quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

Stephens (1974) provided versions of the ECDF tests with unknown $\mu$ and $\sigma^2$.
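The moment statistics of Section 2.1 can be sketched in a few lines. This is a minimal illustration, not code from the dissertation: the function name `sample_moments` is hypothetical, and the sketch assumes that $s^2$ in the definitions of $\sqrt{b_1}$ and $b_2$ is the second sample central moment $m_2$ (divisor $n$).

```python
def sample_moments(data):
    """Return (sqrt_b1, b2): sample skewness and kurtosis from central moments."""
    n = len(data)
    mean = sum(data) / n

    def central_moment(k):
        # k-th sample central moment m_k with divisor n (an assumption of this sketch)
        return sum((x - mean) ** k for x in data) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    sqrt_b1 = m3 / m2 ** 1.5   # m3 / (s^2)^(3/2)
    b2 = m4 / m2 ** 2          # m4 / (s^2)^2
    return sqrt_b1, b2


# A symmetric toy sample: the skewness is zero and the kurtosis is below 3.
sym_skew, sym_kurt = sample_moments([1.0, 2.0, 3.0, 4.0, 5.0])
# A toy sample with one large value: the skewness is positive.
right_skew, _ = sample_moments([0.0, 0.0, 0.0, 0.0, 10.0])
```

Turning these into $K^2$ additionally requires the normalizing transformations $Z(\cdot)$ of D'Agostino (1970) and D'Agostino and Pearson (1973), which are not reproduced here.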

ECDF tests can be further classified into those involving either the supremum or the square of the discrepancies, $F_n(x_{(i)}) - Z_i$. The best known ECDF test involving the supremum is the Kolmogorov-Smirnov statistic

$$K = \max(D^+, D^-) \quad\text{where}\quad D^+ = \max_i \left\{ \frac{i}{n} - Z_i \right\} \quad\text{and}\quad D^- = \max_i \left\{ Z_i - \frac{i-1}{n} \right\} .$$

ECDF tests involving the square of the discrepancies are known as the Cramér-von Mises family, with the general form

$$\mathrm{CvM} = n \int \left[ F_n(x_{(i)}) - Z_i \right]^2 \psi(Z_i)\, dZ_i$$

where $\psi(Z_i)$ is the weighting function. If $\psi(Z_i) = 1$, this is the Cramér-von Mises statistic itself, $W^2$. For the Anderson-Darling statistic, $A^2$, $\psi(Z_i) = [Z_i(1 - Z_i)]^{-1}$. This choice of $\psi(Z_i)$ gives emphasis to tail values, and the computational form is given by

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[ \ln(Z_i) + \ln(1 - Z_{n+1-i}) \right] .$$

Stephens found that $A^2$ has the highest power among all ECDF tests. The asymptotic distribution is known, and it was found that the critical values for finite samples quickly converge to their asymptotic values for $n \ge 5$.

2.3 Regression/Correlation Tests - Shapiro-Wilk W

The main idea behind these tests is normal probability plotting. Normal probability plotting is a graphical technique to determine the normality of the data by looking for linearity in a plot of the ordered observations $x_{(i)}$ against the expected values of standard normal order statistics, $m_i$. Formal determination of the linearity uses regression or correlation techniques, hence the name of this group of tests. If $x_{(i)}$ is indeed normal, then the slope gives the standard deviation of $x_i$, $\sigma$, and the intercept gives the mean of the $x_i$'s, $\mu$. Since the ordered observations are not independent, let $V = (v_{ij})$ be the $n \times n$ covariance matrix of the standard normal order statistics, $x' = (x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ and $m' = (m_1, m_2, \ldots, m_n)$. The best linear unbiased estimators of the slope and intercept using generalized least squares are

$$\hat{\sigma} = \frac{m' V^{-1} x}{m' V^{-1} m} \quad\text{and}\quad \hat{\mu} = \bar{x} .$$

The usual symmetric estimate of the variance, regardless of the distribution of $x_i$, is given by $s^2$. The Shapiro and Wilk (1965) W statistic is defined as

$$W = \frac{K^2 \hat{\sigma}^2}{(n-1)s^2} = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where

$$a' = (a_1, a_2, \ldots, a_n) = \frac{m' V^{-1}}{\left[ (m' V^{-1})(V^{-1} m) \right]^{1/2}} \quad\text{and}\quad K^2 = \frac{(m' V^{-1} m)^2}{m' V^{-1} V^{-1} m} .$$

W compares the ratio of two estimates of variance, $\hat{\sigma}^2$ and $s^2$, apart from a normalizing constant, $K^2$, and $(n-1)$. If the distribution of $x_i$ is normal, then W will be close to 1. Otherwise, W is less than 1. The critical values of W are tabulated up to sample sizes of 50. However, values for $\{a_i\}$ are also needed to carry out this test.

For larger sample sizes, Shapiro and Francia (1972) noted that the ordered observations may be treated as independent as $n$ increases (i.e. $v_{ij} = 0$ for $i \ne j$). Treating V as an identity matrix, W can be extended for $n$ larger than 50 by

$$W' = \frac{\left( \sum_{i=1}^{n} m_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} m_i^2} .$$

Values of $\{m_i\}$ are available from Harter (1961) up to sample sizes of 400. However, two tables are still needed to carry out this test.
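The computational form of $A^2$ from Section 2.2 translates directly into code. The following is a minimal sketch under stated assumptions, not the dissertation's implementation: the function name `anderson_darling` is hypothetical, $s$ is computed with the $n-1$ divisor, and no finite-sample correction or critical value is applied.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(data):
    """A^2 for normality, with the mean and variance estimated from the sample."""
    x = sorted(data)
    n = len(x)
    xbar, s = mean(x), stdev(x)  # stdev uses the n - 1 divisor
    std_normal = NormalDist()
    z = [std_normal.cdf((xi - xbar) / s) for xi in x]
    total = 0.0
    for i in range(1, n + 1):
        # (2i - 1) [ln Z_i + ln(1 - Z_{n+1-i})], with i counted from 1 as in the text
        total += (2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
    return -n - total / n


# Two artificial samples of size 50: normal quantiles and exponential quantiles.
probs = [(i - 0.5) / 50 for i in range(1, 51)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]
expo_like = [-math.log(1.0 - p) for p in probs]
a2_normal = anderson_darling(normal_like)
a2_expo = anderson_darling(expo_like)
```

Because the statistic depends on the data only through the standardized values, it is unchanged by shifting or rescaling the sample. Extremely outlying points can drive $Z_i$ to exactly 0 or 1 in floating point, which this sketch does not guard against.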

A further modification was suggested by Weisberg and Bingham (1975) that uses the approximation

$$m_i \approx \Phi^{-1}\!\left( \frac{i - \frac{3}{8}}{n + \frac{1}{4}} \right)$$

due to Blom (1958). This approximation was shown to be close even in small samples, and the null distribution of the resulting statistic was practically identical to that of $W'$. This simplifies the computation of the test statistic, since separate values for $m_i$ need not be kept.

Royston (1982) used another approximation suggested by Shapiro and Wilk (1965) for $\{a_i\}$ and applied the following normalizing transformation to W:

$$y = (1 - W)^{\lambda} \quad\text{and}\quad z = \frac{y - \mu_y}{\sigma_y}$$

where $z$ is standard normal and $\lambda$, $\mu_y$ and $\sigma_y$ are functions of $n$. $\lambda$ is estimated by maximizing the correlation between certain empirical quantiles of W and the corresponding standard normal equivalents, with weights given according to the variance of a normal quantile. The relation between $\mu_y$, $\sigma_y$ and $n$ is then determined by applying $\lambda$ to simulated values of W. The normalizing transformation producing W* does away with any special tables, besides the standard ones, needed to find the critical values of W.
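As an illustration of the Weisberg-Bingham simplification, the sketch below computes the $W'$ form with the $m_i$ replaced by Blom's approximation. The function names `blom_scores` and `shapiro_francia` are hypothetical, and the sketch stops at the statistic itself: critical values and Royston's normalizing transformation are not included.

```python
import math
from statistics import NormalDist, mean


def blom_scores(n):
    """Blom's approximation to the expected standard normal order statistics."""
    inv = NormalDist().inv_cdf
    return [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia(data):
    """W' with the m_i replaced by Blom scores (the Weisberg-Bingham variant)."""
    x = sorted(data)
    m = blom_scores(len(x))
    xbar = mean(x)
    numerator = sum(mi * xi for mi, xi in zip(m, x)) ** 2
    denominator = sum((xi - xbar) ** 2 for xi in x) * sum(mi * mi for mi in m)
    return numerator / denominator


# A sample that lies exactly on the normal scores gives a statistic of one;
# exponentiating the same scores produces a skewed sample with a smaller value.
scores = blom_scores(20)
w_normal = shapiro_francia(scores)
w_skewed = shapiro_francia([math.exp(v) for v in scores])
```

The statistic is a squared correlation between the ordered data and the scores, so it cannot exceed one and is invariant to location and scale changes of the data.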

Chapter 3 Fisher Information Test

In this chapter, the theory behind the Fisher Information Test as suggested by Terrell (1995) is explained in detail. First, the Fisher Information Inequality is derived. Next, the Fisher Information Test is developed and an implementation algorithm is presented. Lastly, details of the simulation done to generate the critical values are given.

3.1 Normal Information Inequality (for location)

The Fisher information number, denoted by $I_F$, is given by

$$I_F = E\left[ -(\log f)'' \right] = E\left[ \{ (\log f)' \}^2 \right] = \int \frac{(f')^2}{f}$$

where $f$ is any density. It measures, on average, how fast the log-likelihood changes as the mean moves away from the center of the distribution. Another way of looking at it is as the ease of using a sample to locate the center of a distribution. A famous property of the normal distribution is that its $I_F$ is the smallest among all distributions with the same variance. The implication is that it is hardest to tell where the mean is in a normal distribution. The proof given by Terrell (1995) is presented here since it is integral to the development of the Fisher Information Test.

The right hand side of the familiar property of a density,

$$1 = \int f ,$$

is integrated by parts for any $\mu$ and any $f$ that goes to zero at the limits of its support, which results in

$$\int f = -\int (x - \mu) f' .$$

Introducing $\sqrt{f}$ into the integral by replacing $f'$ with $\sqrt{f}\, \dfrac{f'}{\sqrt{f}}$ results in

$$1 = -\int (x - \mu) \sqrt{f}\, \frac{f'}{\sqrt{f}} .$$

Applying the Cauchy-Schwarz Inequality yields

$$1 = \left[ \int (x - \mu) \sqrt{f}\, \frac{f'}{\sqrt{f}} \right]^2 \le \int (x - \mu)^2 f \int \frac{(f')^2}{f} .$$

Choosing $\mu$ to minimize the first integral makes $\int (x - \mu)^2 f = \mathrm{var}(X)$. Hence, the inequality after rearranging becomes

$$I_F = \int \frac{(f')^2}{f} \ge \frac{1}{\mathrm{var}(X)}$$

where equality is achieved when $(x - \mu)$ is proportional to $\dfrac{f'}{f} = (\log f)'$ almost everywhere. For a normal distribution,

$$(\log f)' = \frac{d}{d\mu}\left[ -\log\!\left( \sqrt{2\pi}\,\sigma \right) - \frac{(x - \mu)^2}{2\sigma^2} \right] = \frac{(x - \mu)}{\sigma^2} \propto (x - \mu) .$$

Therefore, for the normal distribution, equality is achieved and $I_F = 1/\sigma^2$. This Fisher Information Inequality is a sort of dual to the Cramér-Rao Inequality. The Fisher Information Inequality gives the lower bound for the Fisher Information of any distribution in terms of its variance, while the Cramér-Rao Inequality gives the lower bound for the variance of any location estimator in terms of its Fisher Information.

3.2 Normal Information Statistic F

For any distribution other than the normal, the Fisher Information number is in excess of the inverse of its variance. This excess is thus a natural measure of non-normality, or deviation from normality. A natural test statistic for normality would be a direct estimate of $I_F$. However, that would require an estimate of the density, which relies on nonparametric methods that are asymptotically inefficient. Moreover, there is also the need to specify a smoothing parameter. Terrell circumvented these two problems by formulating the problem using maximum likelihood in the following way:

$$\max_f \sum_{i=1}^{n} \log f(x_i) \quad\text{subject to}\quad I_F = \int \frac{(f')^2}{f} = \frac{1}{s^2} \quad\text{and}\quad \int f = 1 \qquad (3\text{-}1)$$
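The inequality $I_F \ge 1/\mathrm{var}(X)$ can be checked numerically. The sketch below is an illustration only: the helper `fisher_information` is a hypothetical name, the integral $\int (f')^2/f$ is approximated with the midpoint rule and a central difference for $f'$, and the three unit-variance densities are choices made for this example. Their closed-form values are 1 for the normal, $\pi^2/9$ for the logistic and 2 for the Laplace.

```python
import math


def fisher_information(pdf, lo=-30.0, hi=30.0, steps=120_000, eps=1e-5):
    """Approximate the integral of (f')^2 / f by the midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        f = pdf(x)
        if f <= 0.0:
            continue  # skip points where the density underflows to zero
        slope = (pdf(x + eps) - pdf(x - eps)) / (2.0 * eps)  # central difference
        total += slope * slope / f * width
    return total


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace_pdf(x, b=1.0 / math.sqrt(2.0)):   # scale chosen so the variance is 1
    return math.exp(-abs(x) / b) / (2.0 * b)


def logistic_pdf(x, s=math.sqrt(3.0) / math.pi):  # scale chosen so the variance is 1
    t = math.exp(-abs(x) / s)
    return t / (s * (1.0 + t) ** 2)


info = {
    "normal": fisher_information(normal_pdf),
    "logistic": fisher_information(logistic_pdf),
    "laplace": fisher_information(laplace_pdf),
}
```

Since each density has unit variance, the computed values are also the standardized quantity $\mathrm{var}(X)\, I_F$, and only the normal attains the lower bound of one.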

where the first constraint estimates $I_F$ using the statistic $s^2$, which is asymptotically efficient under normality. The second constraint gives the familiar property of a density. The Normal Information statistic, F, is then a log-ratio of this likelihood to that under normality. If the data are normal, the two likelihoods nearly agree since $I_F$ has been correctly estimated. Otherwise, $I_F$ is underestimated and F reflects a poorer fit of the data.

Rewriting (3-1) with Lagrange multipliers gives

$$\max_f \sum_{i=1}^{n} \log f(x_i) - \lambda \int \frac{(f')^2}{f} - \gamma \int f \qquad (3\text{-}2)$$

where the values of $\lambda$ and $\gamma$ are chosen so that the constraints are met. If $\lambda$ is treated as a parameter, the above form is similar to the first penalized maximum likelihood density estimation problem of Good and Gaskins (1971). This problem is a formal dual to theirs, and the forms of the solutions to both are similar. A solution in terms of exponential splines has been found by de Montricher et al. (1975). Hence, the Fisher Information Test of Normality gives a graphical tool in the form of a nonparametric density estimate with the smoothing parameter specified by the constraint on $I_F$.

3.3 Solution to the Problem

Since the resulting density estimate must be non-negative, the technique of Good and Gaskins of obtaining the solution in terms of a function $h = \sqrt{f}$ will be used. This eliminates the need for a non-negativity constraint. With this modification, $f' = 2hh'$. The expression for Fisher information in terms of $h$ is then

$$I_F = \int \frac{(f')^2}{f} = \int \frac{(2hh')^2}{h^2} = 4 \int (h')^2 .$$

The optimization can be modified by maximizing the average log-likelihood, and (3-2) becomes

$$\max_h \frac{2}{n} \sum_{i=1}^{n} \log h(x_i) - 4\lambda \int (h')^2 - \gamma \int h^2 \qquad (3\text{-}3)$$

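where $\lambda$ and $\gamma$ are chosen so that $4 \int (h')^2 = \dfrac{1}{s^2}$ and $\int h^2 = 1$.

(3-3) is a particular case of the so-called isoperimetric problem in the calculus of variation. Calculus of variation is a technique that uses classical calculus methods to solve maximization and minimization problems where the solution is a function instead of a point, while isoperimetric problems are those whose constraints are integrals, here involving the derivative of the unknown function. The main idea is to write the objective function and constraints as a Lagrangian function with $h$ replaced by $g + \varepsilon p$. Here, $g$ is assumed to be the solution to the problem and $\varepsilon p$ is the perturbing function, with $p$ being an arbitrary function that vanishes at $-\infty$ and $\infty$ (as $g$ does) and $\varepsilon$ an arbitrary constant. The Lagrangian function (V) is then a function of $\varepsilon$. Note that as $\varepsilon \to 0$, $h \to g$. Therefore, the first-order necessary condition for the problem is given by $\left. \dfrac{dV}{d\varepsilon} \right|_{\varepsilon = 0} = 0$ and the second-order sufficient condition for maximization problems is $\dfrac{d^2 V}{d\varepsilon^2} < 0$. For a reference on calculus of variation, see Chiang (1992).

To get the Euler-Lagrange variational condition for (3-3) from first principles, (3-3) is rewritten as a function of $\varepsilon$ as follows:

$$V(\varepsilon) = \frac{2}{n} \sum_{i=1}^{n} \log\{ g(x_i) + \varepsilon p(x_i) \} - 4\lambda \int (g' + \varepsilon p')^2 - \gamma \int \{ g + \varepsilon p \}^2 \qquad (3\text{-}4)$$

Expanding (3-4) results in

$$V(\varepsilon) = \frac{2}{n} \sum_{i=1}^{n} \log\{ g(x_i) + \varepsilon p(x_i) \} - 4\lambda \int \{ (g')^2 + 2\varepsilon g' p' + \varepsilon^2 (p')^2 \} - \gamma \int \{ g^2 + 2\varepsilon g p + \varepsilon^2 p^2 \} .$$

Differentiating $V(\varepsilon)$ with respect to $\varepsilon$,

$$\frac{dV(\varepsilon)}{d\varepsilon} = \frac{2}{n} \sum_{i=1}^{n} \frac{p(x_i)}{g(x_i) + \varepsilon p(x_i)} - 4\lambda \int \{ 2 g' p' + 2\varepsilon (p')^2 \} - \gamma \int \{ 2 g p + 2\varepsilon p^2 \}$$

and using the first-order necessary condition gives

$$\left. \frac{dV(\varepsilon)}{d\varepsilon} \right|_{\varepsilon = 0} = \frac{2}{n} \sum_{i=1}^{n} \frac{p(x_i)}{g(x_i)} - 4\lambda \int 2 g' p' - \gamma \int 2 g p = 0 .$$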

Using the shifting property of the Dirac delta function, $p(x_i)$ can be written as

$$p(x_i) = \int p(x)\, \delta(x - x_i)\, dx$$

and integrating the second term by parts,

$$\int g' p' = \Big[ g' p \Big]_{-\infty}^{\infty} - \int g'' p = -\int g'' p$$

where use is made of the fact that $p$ vanishes at $-\infty$ and $\infty$. Substituting the above into the first-order condition and dividing by 2 gives

$$\frac{1}{n} \sum_{i=1}^{n} \int \frac{p(x)\, \delta(x - x_i)}{g(x_i)} + 4\lambda \int g'' p - \gamma \int g p = 0 .$$

Factoring $p$ and collecting terms gives

$$\int p \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\delta(x - x_i)}{g(x_i)} + 4\lambda g'' - \gamma g \right] = 0 .$$

Since $p$ is arbitrary, the above reduces to

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\delta(x - x_i)}{g(x_i)} + 4\lambda g'' - \gamma g = 0 .$$

As the Dirac delta functional is zero except at zero, the final form of the Euler-Lagrange variational condition is given by

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\delta(x - x_i)}{g(x)} = \gamma g - 4\lambda g'' \qquad (3\text{-}4)$$

For the second-order condition,

$$\frac{d^2 V}{d\varepsilon^2} = -\frac{2}{n} \sum_{i=1}^{n} \frac{p(x_i)^2}{\{ g(x_i) + \varepsilon p(x_i) \}^2} - 8\lambda \int (p')^2 - 2\gamma \int p^2 < 0, \qquad p \ne 0,$$

which ensures that the solution $g$ gives a maximum of the problem. Klonias (1982) found that

$$g_h^*(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x - x_i)}{g_h^*(x_i)} \qquad (3\text{-}5)$$

characterizes the solutions to (3-4), where $K_h(x) = \dfrac{1}{h} K\!\left( \dfrac{x}{h} \right)$ and $K(x) = \dfrac{1}{2} e^{-|x|}$. Here, the principle of superposition is employed: (3-4) is solved for each $x_i$ and the solutions are then added together for the final solution to (3-4). Kernel functions, $K$, are used to solve the equation $K - K'' = \delta$, and the scaled version, $K_h$, is a solution to the equation

$$K_h - h^2 K_h'' = \delta \qquad (3\text{-}6)$$

which looks like an $n = 1$ version of (3-4). Dividing (3-6) by $g_h^*(x_i)$ and averaging over $i$ from 1 to $n$ gives

$$\frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x - x_i)}{g_h^*(x_i)} - h^2\, \frac{1}{n} \sum_{i=1}^{n} \frac{K_h''(x - x_i)}{g_h^*(x_i)} = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta(x - x_i)}{g_h^*(x_i)}$$

where $g_h^*(x) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{K_h(x - x_i)}{g_h^*(x_i)}$ and (3-5) are used for the substitution. Since the delta functional is zero except at zero,

$$g_h^*(x) - h^2 g_h^{*\prime\prime}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta(x - x_i)}{g_h^*(x)} \qquad (3\text{-}7)$$

Instead of solving for $\lambda$ and $\gamma$, the solution has been reparameterized to only one parameter, $h$. To tackle the second constraint, that $g_h$ is the square root of a density whose square integrates to one, let $a^2 = \int g_h^*(x)^2\, dx$. Then $g_h(x) = \dfrac{g_h^*(x)}{a}$, where integrating the square of $g_h$ gives one, which verifies that $g_h$ is indeed the square root of a density. Replacing $g_h^*$ in (3-7) with $g_h$ results in

$$a\, g_h(x) - a h^2 g_h''(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta(x - x_i)}{a\, g_h(x)} \qquad (3\text{-}8)$$

where $g_h^*(x) = a\, g_h(x)$.

As for the first constraint on the Fisher Information, an expression for the Fisher Information is obtained by multiplying (3-8) by $g_h(x)$ and integrating. The first term equals $a$ since $\int g_h(x)^2\, dx = 1$ by definition. Integrating the second term by parts gives
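The kernel equation (3-6) holds in the distributional sense, which can be checked numerically in its weak form: for a smooth, rapidly decaying test function $\phi$, integrating by parts twice gives $\int K_h(x)\, [\phi(x) - h^2 \phi''(x)]\, dx = \phi(0)$. The sketch below is an illustration under stated assumptions: it takes $K_h(x) = e^{-|x|/h}/(2h)$, uses $\phi(x) = e^{-x^2}$ as an arbitrary test function chosen for this example, and the function names are hypothetical.

```python
import math


def laplace_kernel(x, h):
    """Scaled kernel K_h(x) = exp(-|x| / h) / (2h)."""
    return math.exp(-abs(x) / h) / (2.0 * h)


def weak_form_lhs(h, lo=-12.0, hi=12.0, steps=240_000):
    """Midpoint-rule value of the integral of K_h * (phi - h^2 phi'') with phi(x) = exp(-x^2)."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        phi = math.exp(-x * x)
        phi_second = (4.0 * x * x - 2.0) * phi  # second derivative of exp(-x^2)
        total += laplace_kernel(x, h) * (phi - h * h * phi_second) * width
    return total


# phi(0) = 1, so each value should be close to one whatever the bandwidth h.
weak_values = {h: weak_form_lhs(h) for h in (0.5, 1.0, 2.0)}
```

An even number of integration cells on a symmetric interval keeps the kink of the kernel at zero on a cell boundary, so the midpoint rule stays accurate.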

$$-h^2 a \int g_h(x)\, g_h''(x)\, dx = -h^2 a \left\{ \Big[ g_h(x)\, g_h'(x) \Big] - \int g_h'(x)^2\, dx \right\} = h^2 a \int g_h'(x)^2\, dx$$

where use is made of the assumption that the density vanishes at the limits of its support. The third term equals $1/a$ since

$$\frac{1}{n} \sum_{i=1}^{n} \int \frac{\delta(x - x_i)}{a}\, dx = \frac{1}{na} \sum_{i=1}^{n} \int \delta(x - x_i)\, dx = \frac{1}{na} \sum_{i=1}^{n} 1 = \frac{1}{a} .$$

After the above simplification, (3-8) becomes

$$a + a h^2 \int (g_h')^2 = \frac{1}{a} .$$

Hence,

$$I_F = 4 \int (g_h')^2 = \frac{4 (1 - a^2)}{a^2 h^2}$$

where the Fisher Information is a continuously decreasing function of $h$. If the data have been standardized, solving for the $h$ that gives $I_F = 1$ produces the required density estimate.

The Normal Information Test statistic, F, is twice the log likelihood ratio that compares the estimated density, $g^2$, to that of normality, $f_0$:

$$F = 2 \log \frac{l(g^2)}{l(f_0)} = 2 \sum_{i=1}^{n} \log g^2(x_i) - 2 \sum_{i=1}^{n} \log f_0(x_i) \qquad (3\text{-}9)$$

Using the maximum likelihood estimators, $\hat{\sigma}^2$ and $\bar{x}$, in $f_0$ yields

$$F = 2 \sum_{i=1}^{n} \log g^2(x_i) + n \log(2\pi) + n \log(\hat{\sigma}^2) + \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

The third term vanishes since $\hat{\sigma}^2 = 1$ as the data have been standardized. In addition, the last term simplifies to $n$ using the definition of $\hat{\sigma}^2$. Finally, F simplifies to

$$F = 4 \sum_{i=1}^{n} \log g(x_i) + n \log(2\pi) + n \qquad (3\text{-}10)$$

3.4 Computational Algorithm

The algorithm to get F is as follows:

1. Standardize the data to get variance one.
2. Solve the fixed point equation, $g_h^*(x) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{K_h(x - x_i)}{g_h^*(x_i)}$, by
   a. using an initial estimate $g^{(0)}$ obtained by taking the square root of a Laplace kernel density estimate,
   b. computing a second estimate by $g^{(1)}(x) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{K_h(x - x_i)}{g^{(0)}(x_i)}$,
   c. suppressing oscillations by $g^{(2)}(x) = \dfrac{1}{2} \left[ g^{(0)}(x) + g^{(1)}(x) \right]$ and iterating to convergence.
3. Normalize the density so that its square integrates to one.
4. Compute the Fisher Information in terms of $h$.
5. Find the $h$ that gives Fisher Information of one using the secant method, and then calculate F.

The details of implementing this algorithm in FORTRAN are given in Appendix B.

3.5 Generation of Critical Values Using Simulation

Since the distribution of F is unknown, critical values have been generated via simulation. Sets of normal deviates were obtained using the subroutine ran from Press et al. (1992). Ten thousand values of F were generated for each sample size, n = 3(1)100(5)200. Different sets of pseudo-random numbers were used for each simulation to avoid dependence between results. The critical values obtained were then smoothed using fifth degree polynomials. The resulting smoothed critical values are tabulated in Appendix A for α at 0.50, 0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02 and 0.01, where the bigger α values are available for those who are more inclined to accept non-normality in their data.
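The five steps of Section 3.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the FORTRAN program of Appendix B. All function names are hypothetical. The kernel is taken to be $K_h(x) = e^{-|x|/h}/(2h)$, the fixed point is solved only at the data points with a fixed number of damped iterations, $a^2$ is evaluated with the closed form $\int e^{-|x - x_i|/h} e^{-|x - x_j|/h}\, dx = (h + d_{ij}) e^{-d_{ij}/h}$ with $d_{ij} = |x_i - x_j|$ (derived for this sketch), and bisection on a log scale is swapped in for the secant method of step 5 because it needs no safeguarding.

```python
import math


def standardize(data):
    """Step 1: centre and scale to variance one (divisor n, the MLE)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
    return [(v - mean) / sd for v in data]


def solve_root_density(x, h, iterations=100):
    """Steps 2 and 3: values g*(x_i) at the data points and a^2 = integral of g*^2."""
    n = len(x)
    kern = [[math.exp(-abs(xi - xj) / h) / (2.0 * h) for xj in x] for xi in x]
    # Step 2a: square root of a Laplace kernel density estimate
    g = [math.sqrt(sum(row) / n) for row in kern]
    for _ in range(iterations):
        # Step 2b: g_new(x_i) = (1/n) sum_j K_h(x_i - x_j) / g(x_j)
        g_new = [sum(k / gj for k, gj in zip(row, g)) / n for row in kern]
        # Step 2c: average successive estimates to suppress oscillations
        g = [0.5 * (old + new) for old, new in zip(g, g_new)]
    # Step 3: g*(x) = sum_j c_j exp(-|x - x_j| / h), integrated in closed form
    c = [1.0 / (2.0 * h * n * gj) for gj in g]
    a2 = sum(ci * cj * (h + abs(xi - xj)) * math.exp(-abs(xi - xj) / h)
             for ci, xi in zip(c, x) for cj, xj in zip(c, x))
    return g, a2


def fisher_info(x, h):
    """Step 4: I_F = 4 (1 - a^2) / (a^2 h^2)."""
    _, a2 = solve_root_density(x, h)
    return 4.0 * (1.0 - a2) / (a2 * h * h)


def normal_information_statistic(data, lo=0.05, hi=50.0):
    """Step 5: find h with I_F = 1 (bisection here, not secant), then F from (3-10)."""
    x = standardize(data)
    n = len(x)
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if fisher_info(x, mid) > 1.0:   # I_F decreases in h, so move right
            lo = mid
        else:
            hi = mid
    h = math.sqrt(lo * hi)
    g, a2 = solve_root_density(x, h)
    a = math.sqrt(a2)
    stat = 4.0 * sum(math.log(gi / a) for gi in g) + n * math.log(2.0 * math.pi) + n
    return stat, h


# Made-up sample used only to exercise the sketch.
toy_sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9, 3.1, 9.5]
toy_stat, toy_h = normal_information_statistic(toy_sample)
```

The bracket `[0.05, 50]` for $h$ is an assumption that suits standardized data of moderate size; the cost grows with the square of the sample size because the kernel matrix is formed explicitly.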

Chapter 4 Evaluations and Applications to Real Data

From the definition of F in (3-9), it can be expected that the power of F is driven by the discrepancies between $g^2$ and $f_0$. The exact relationship is given in (3-10), which shows that F depends on the sample size, $n$, and the resulting square root of the density, $g$. To evaluate the sensitivity of F to non-normality, features of the data that affect $g$ will be examined. Then, based on those features, conjectures will be formed about which aspects of non-normality F is sensitive to. These conjectures can then be confirmed by a power comparison of F against existing tests. Finally, F is applied to some real data sets.

4.1 Features of Data that Affect Density Estimate, $g^2$

If the data are normal, the theory behind F indicates that $g^2$ gives a good estimate of the density. To get a rough bell-shaped density, one would expect clustering of data points in the center and tail behavior to greatly affect the shape of $g^2$. Figure 4-1 shows the estimated density plot for {-1, 0, 1}. There are spikes at each data point, with the one in the center receiving more weight than the other two. For {-1, 0, 0, 1} in Figure 4-2, the middle spike has even more weight with the additional data point in the center. With sparse data, the resulting density estimate has to fill the spaces between and around data points to have a density with area that sums to one. Hence, one would not expect F to be powerful. As sample size increases, the density estimate is increasingly driven by the location of data points and how they cluster together. As a result, the ability of F to detect non-normality increases.

Figure 4-1 Estimated density for {-, 0, } (axes: x, Density; legend: solid = estimated, dotted = normal)

Figure 4-2 Estimated density for {-, 0, 0, } (axes: x, Density; legend: solid = estimated, dotted = normal)

Note that for Figure 4-1 and Figure 4-2, the resulting tails both taper gently down at both ends. With no data in the tails, there is little discrepancy between g and f_0. For {-, 0, 0, , 5} with a prominent outlier, the spike in the right tail in Figure 4-3 testifies to the sensitivity of g to tail behavior. Hence, tail behavior is another feature in the data that affects g. Although g is not a consistent density estimate of the underlying density with non-normal data, the discrepancies between g and f_0 will have the potential to inflate F since the data might exhibit asymmetry and/or significant tail misbehavior. Since g is affected by clustering of data and tail behavior, one would conjecture that F is most sensitive to leptokurtic, symmetric distributions since the ability to inflate F exists in both tails. Next would be leptokurtic, asymmetric distributions, where the ability is now confined to only one tail. With short tails in platykurtic distributions, F should be less powerful.

Figure 4-3 Estimated density for {-, 0, 0, , 5} (axes: x, Density; legend: solid = estimated, dotted = normal)

4.2 Power Comparison

The power of F against existing normality tests is compared through a simulation study. For a review of other major power studies, see Shapiro et al. (1968) and Pearson et al. (1977). Refer to Section . for some general conclusions that have been reached from these major power studies.

4.2.1 Simulation Set-up

The simulation study was carried out with n = 10, 20, 50, 70 and 100 with 000 samples drawn from 31 non-normal distributions specified in Table 4-1 and Table 4-2 for symmetric and asymmetric distributions, respectively. The distributions considered are classified according to the following groups:

I. symmetric, leptokurtic
II. symmetric, platykurtic
III. asymmetric, leptokurtic
IV. asymmetric, platykurtic

The distributions within each group are arranged in order of increasing departure from normality as measured by the standardized Fisher Information, var(X)I_F. This measure is chosen so as to account for differing variances in distributions. Where var(X)I_F does not exist, the distributions are ordered on the basis of their standardized coefficient of kurtosis, β2. In group I, the distributions include SC(ε, σ_ε), which is the scale-contaminated normal with 100ε% of N(0, σ_ε) being the contaminant. Similarly, LC(ε, µ_ε) in group III is the location-contaminated normal with 100ε% of N(µ_ε, 1) being the contaminant.
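As an illustration of the ordering measure var(X)I_F, the sketch below approximates it numerically for three symmetric densities, using the Fisher Information for location written as the expected squared score, I_F = E[(d/dx log f(X))^2]. The function names, the integration range, the grid size and the finite-difference step are assumptions made for this example. The known closed-form values are 1 for the normal, π²/9 for the logistic and 2 for the Laplace, so the numerical results should fall close to these.

```python
import math


def standardized_fisher_information(log_density, lo=-40.0, hi=40.0,
                                    steps=40000, delta=1e-5):
    """Trapezoid-rule approximation of var(X) * I_F for a density given by its log."""
    dx = (hi - lo) / steps
    mass = first = second = info = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        weight = dx * (0.5 if k in (0, steps) else 1.0)
        f = math.exp(log_density(x))
        # Score function d/dx log f(x), approximated by a central difference.
        score = (log_density(x + delta) - log_density(x - delta)) / (2.0 * delta)
        mass += weight * f
        first += weight * x * f
        second += weight * x * x * f
        info += weight * score * score * f
    mean = first / mass
    variance = second / mass - mean * mean
    return variance * (info / mass)


def log_std_normal(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_std_logistic(x):
    # log f(x) = -x - 2 log(1 + exp(-x)); the density is symmetric, so |x| is used.
    return -abs(x) - 2.0 * math.log1p(math.exp(-abs(x)))


def log_std_laplace(x):
    return -abs(x) - math.log(2.0)


results = {
    "normal": standardized_fisher_information(log_std_normal),
    "logistic": standardized_fisher_information(log_std_logistic),
    "laplace": standardized_fisher_information(log_std_laplace),
}
for name, value in results.items():
    print(name, round(value, 3))
```

Because the product var(X)I_F is unchanged by rescaling, the unit-scale versions of each density suffice, and the normal attains the smallest value among the three, in line with the extremal property on which the test is built.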

Table 4-1 Properties of symmetric distributions used in simulation study

Columns: Distributions, Var(X), β2, I_F, Var(X)I_F

I Symmetric, leptokurtic
Normal 3
t
Logistic
SC(0.05, 9) *.4
SC(0.0, 9) *.4
t
SC(0.05, 5) *.95
Laplace 6
SC(0.0, 5) *.7
t
Cauchy

II Symmetric, platykurtic
U(0,)
Beta(.5,.5)
Beta(,)

*using numerical integration

Table 4-2 Properties of asymmetric distributions used in simulation study

Columns: Distributions, Var(X), β, β2, I_F, Var(X)I_F

III Asymmetric, leptokurtic
Weibull()
LC(0.05,3) *.
LC(0.0,3) *.40
LC(0.0,3) *.63
Chi-squared(0)
LC(0.05,5) *.09
LC(0.0,5) * 3.03
LC(0.05,7) * 3.3
LC(0.0,5) * 4.54
LC(0.0,7) * 5.38
LC(0.0,7) * 8.78
Chi-squared(4)
Chi-squared()
Chi-squared()
Weibull(0.5)
Lognormal(0, )

IV Asymmetric, platykurtic
Beta(3,)
Beta(,)

*using numerical integration

Table 4-3 Discrete distribution with normal moments
Columns: X, P(X=x)

Table 4-4 Power estimates of discrete distribution with normal moments
Columns: Sample size, n; K; W; W*; A; F
α =
α =

The existing normality tests considered in this study include W (W′), W* and A. Recall that W is the Shapiro-Wilk (1965) test and A is Stephens' (1974) version of the Anderson-Darling (1954) test. Where the sample size exceeds 50, the Shapiro-Francia (1972) W′ will be used in place of W since it extends the range of W from 50 and below to 400. W*, which is Royston's (1982) approximation to W (W′), will be considered a separate test as it will be informative to compare its power to that of W (W′).

K is left out of the power study since it is not an omnibus test. To illustrate this point, a discrete distribution with normal moments is added to this simulation study. Table 4-3 gives the details of such a discrete distribution that has the same first to fourth moments as the normal. The power of the normality tests with this distribution is given in Table 4-4. Results are not obtained for K at n = 10 since the exact contours are not available. All the tests except K had estimated power above 0.60 even for n as low as 10. For sample sizes 20 or larger, these tests had estimated power of 1.00. The power for K is even lower than the nominal α value, especially for higher sample sizes. Hence, K, in particular, and moments tests, in general, are only able to detect distributions with non-normal moments and are not omnibus tests.

To differentiate between the tests to see if one test is superior to another, the practice in the literature has been to determine which test has the highest power based on the same set of pseudo-random numbers for each distribution. To generalize the results across different distributions, the averaged rank calculated for each test is sometimes used. The fact that a different set of pseudo-random numbers might give rise to a different ordering of the power is usually ignored. To account for this variability, a formal statistical test on the equality of the power of the tests is conducted in this power study. As all the tests are subjected to the same set of pseudo-random numbers, the powers of the individual tests are correlated. Hence, Cochran's Q is used to account for this correlation. In cases where the equal power hypothesis is rejected, McNemar's test with correction for continuity is used for pairwise comparisons to determine whether the test with the highest power is significantly different from the rest.

To maintain the overall type I error rate at 0.05 in the presence of multiple testing, the idea behind Fisher's Least Significant Difference is used here. This means that multiple comparisons are carried out only if the hypothesis of equal power using Cochran's Q is rejected. In addition, the same type I error rate is used for both Cochran's Q and McNemar's tests. For details of both tests, refer to Siegel and Castellan (1988).

The results from using Cochran's Q and McNemar's tests will be reflected as superscripts to the test with the highest power in this power study. The superscripts will denote the number of tests, including the one with the highest power, that are significantly better than the rest. Hence, a 1 would reflect that the test with the highest power has significantly higher power than the rest, while a 4 would mean that all the tests have the same power.
The empirical level of each test is also given, based on samples drawn from the normal distribution. 95% confidence intervals on the empirical level of each test will be used to assess whether they contain the relevant nominal levels. This information is useful since it acts as a check on possible inflation or deflation of the power estimates.
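A check of this kind might look as follows. The text does not say how the intervals were constructed, so the normal-approximation (Wald) interval for a binomial proportion is assumed here; the function names, the 1000 samples and the rejection counts are hypothetical.

```python
import math


def empirical_level_interval(rejections, samples, z=1.96):
    """Normal-approximation 95% interval for the rejection rate under normality."""
    level = rejections / samples
    half_width = z * math.sqrt(level * (1.0 - level) / samples)
    return level - half_width, level + half_width


def contains_nominal(rejections, samples, nominal):
    """True if the interval for the empirical level covers the nominal level."""
    lower, upper = empirical_level_interval(rejections, samples)
    return lower <= nominal <= upper


# Hypothetical counts: 55 and 80 rejections out of 1000 normal samples.
for count in (55, 80):
    lower, upper = empirical_level_interval(count, 1000)
    print(count, round(lower, 4), round(upper, 4), contains_nominal(count, 1000, 0.05))
```

An interval lying entirely above the nominal level suggests that the corresponding power estimates are inflated, and one lying entirely below suggests that they are deflated.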

For programming details involved in this simulation study, please refer to Appendix B.

4.2.2 Results

n = 10

Table C-1(a) shows the results for α = 0.05. The empirical level for each test is given by the power estimates for the normal distribution. Here, all the confidence intervals contain the nominal value of 0.05. For group I, F is the most sensitive for most of the distributions, having significantly higher power than the other tests for SC(0.0, 9), SC(0.0, 5) and t. For t 0 and SC(0.05, 5), where F did not have the highest power, all four tests have powers that are not significantly different from one another. W is the least sensitive in distributions where not all tests have the same power. For group II, A has the highest power in all three distributions. However, its power is not significantly higher than that of W* and W, while F proves to be the least sensitive. For asymmetric and leptokurtic distributions in group III, there is no clear dominance of any one test. For location-contaminated normals (LCs), A, W* and F have the highest power for different LCs, with A having significantly higher power for LC(0.0,5). As for non-LCs, W* clearly is the most sensitive, with significantly higher power for all distributions except Weibull(), Chi-squared(0) and Weibull(0.5); F is the least sensitive, especially for those with higher var(X)I_F. As for distributions in group IV, W* has the highest power but it is not significantly different from W and A, while F again proves to be the least sensitive.

The results for α = 0.0 are given in Table C-6(a). Here, F has the highest power in most of the distributions in group I, with the power being significantly higher for the Laplace distribution. Again, W is the least sensitive for distributions that are symmetric and leptokurtic. As for group II, all tests except F are equally good at detecting non-normality. For non-LCs in group III, W* is the most sensitive, with the power for Chi-squared(4), Chi-squared() and Lognormal(0, ) being significantly higher. As for LCs, A and F are more sensitive than W and W* in detecting non-normality, with A having significantly higher power in LC(0.0,7). For group IV, both A and W have the highest power but none of them is significantly higher than the rest. Once again, F is the least sensitive.

n = 20

The results for α = 0.05 are given in Table C-2(a). F is the most sensitive in detecting non-normality in group I, with significantly higher power in all distributions except t 0 and SC(0.05, 9). For groups II, IV and non-LCs, W is the most sensitive in most distributions, with significantly higher power in Chi-squared(0). W* proves to be equally good in most cases, while F is the least sensitive. As for LCs, F is the most sensitive in six of the distributions, with those for LC(0.05,3), LC(0.0,3) and LC(0.05,7) being significantly higher. W, W* and A are equally sensitive in detecting non-normality for the remaining LCs but are not as dominant as F.

The results for α = 0.0 are given in Table C-7(a). For group I, F has significantly higher power in all distributions except in t 0, where all four tests are equally sensitive. On the whole, both W and W* are most sensitive in detecting non-normality in groups II and IV as well as non-LCs. As for LCs, F has significantly higher power in LC(0.05,3), LC(0.0,3) and LC(0.0,5). For the remaining LCs, F, W and W* are equally sensitive.

n = 50

The results for α = 0.05 are given in Table C-3(a), with the empirical level for W being much lower than the nominal value. Hence, the power for W is underestimated and it is not surprising that W* emerged with significantly higher power in groups II and IV as well as in Weibull() and Chi-squared(0) for the non-LCs. Further, F's position is unchallenged in group I, with significantly higher power in all distributions except for the Cauchy. F is also most sensitive for most LCs, with significantly higher power for LC(0.05,3), LC(0.0,3), LC(0.05,5) and LC(0.0,5). The other thing to note is that certain distributions in group III with higher var(X)I_F are beginning to be so extreme that all tests are equally adept at detecting them.

These distributions include LC(0.0,5), Chi-squared(), Weibull(0.5) and Lognormal(0, ).

Table C-8(a) contains the results for α = 0.0. Here, the power for W is not underestimated. A fairer comparison can then be made of the sensitivity of W* and F. The results are similar for F in group I and in LCs. In groups II and IV, W* still has significantly higher power in Beta(,), Beta(,) and Beta(3,) but is as sensitive as W for the remaining distributions. The same applies to non-LCs. As for LCs, the only anomaly is that A has significantly higher power for LC(0.0,3). Again, distributions with high var(X)I_F in group III are detected by all of the normality tests.

n = 70

Table C-4(a) displays the results for α = 0.05. The empirical level of W is much higher than the nominal value of 0.05. This confirms the hesitance of Pearson et al. (1977) in recommending the use of W, since they pointed out that the empirical critical values were overstated as a result of being based only on 000 samples. They warned that this unfairly enhances the power of W. With this in mind, it is not surprising that for distributions in group I and some LCs, W has significantly higher power than the other tests. For these distributions, F consistently has the second highest power. In spite of the inflated power for W, both W* and A managed to have significantly higher power: W* in groups II and IV as well as Weibull() in group III, and A in LC(0.0,3). In addition, the inflation of power in W did not affect those distributions in group III that are detected by all the normality tests 100% of the time. The results for α = 0.0 in Table C-9(a) are very similar.

n = 100

Since the power of W is inflated for n = 70, the critical value used for W for n = 100 was the average empirical critical value of W obtained by Pearson et al. (1977), to adjust for the inflation and get a fair comparison. This is reflected in Table C-5(a) and Table C-10(a), where the nominal levels are contained in the 95% confidence intervals for the empirical levels.

In Table C-5(a) for α = 0.05, F and W are sensitive in detecting non-normality in group I, with F having significantly higher power for t 4 and the Laplace distribution. As for groups II and IV as well as non-LC distributions like Weibull() and Chi-squared(0), W* has significantly higher power. As for LCs, W has significantly higher power in LC(0.05,3) and LC(0.0,3) while A excels in detecting LC(0.0,3). For most of the remaining distributions in group III, all the tests are able to detect non-normality 100% of the time.

A look at Table C-10(a) for α = 0.0 reveals similar findings. W* has significantly higher power in group II as well as for Weibull() in group III and Beta(3,) in group IV. Again, W and A have significantly higher power in the same LCs, and slightly more than half of the distributions in group III are detected 100% of the time by all the normality tests. However, in group I, F and W are now equally sensitive in detecting non-normality.

4.2.3 Summary

As expected, no one test has significantly higher power than all other tests for all the distributions. However, some broad patterns have emerged regarding the sensitivity of each test to the different types of distribution. The following summarizes the results from the power study:

1. For distributions that are symmetric and leptokurtic, F is superior to the other tests for detecting non-normality.
2. For distributions that are platykurtic or asymmetric, excluding LCs, W* is superior for larger sample sizes (n ≥ 50) while both W and W* are equally sensitive for smaller ones.
3. LCs behave like a continuum between leptokurtic distributions that are symmetric and those that are asymmetric as n, p and µ_ε increase. Hence, no one test is superior. With p and µ_ε small, F is more sensitive for smaller n (≤ 50) while W is better at larger n. As p and µ_ε increase, A is more sensitive. However, there comes a point when p and µ_ε become so big that all the normality tests easily detect non-normality 100% of the time.
4. It is not surprising that when sample sizes are small (<50), W* has power that is equal to that of W. However, at larger sample sizes, W* is preferred, since its power is neither inflated nor deflated. In some cases, W* has significantly higher power than W even when W's power estimates are inflated. Hence W* is preferred over W.
5. An examination of the power of F shows that besides increasing with n, it also varies directly with Var(X)I_F, although the relationship is not a deterministic one.

4.3 Applications to Real Data Sets

In this section, F is applied to several real data sets. Here, the estimated density is plotted against the normal density with the same sample mean and variance as the data. This graphic illustrates any deviation from normality and provides a ready explanation when normality is rejected.

Figure 4-4 Density estimate of Male Weights Data (n = 11) (axes: X, Density; legend: solid = estimated, dotted = normal)

4.3.1 Male Weights Data

Shapiro and Wilk (1965) used their test on a data set of adult male weights taken from Snedecor (1946). These are, in pounds, 148, 154, 158, 160, 161, 162, 166, 170, 182, 195, and 236. The resulting statistic F is 4.758, which is beyond the 99th percentile. This is consistent with the result given by W. The resulting density estimate in Figure 4-4 shows a prominent outlier at 236, with a peak in the right tail that accounts for the rejection of normality for this data set.

Figure 4-5 Density estimate of Mississippi River Data (n = 49) (axes: X, Density; legend: solid = estimated, dotted = normal; annotation: F = 5.63)

4.3.2 Mississippi River Data

Another example is taken from Gumbel (1943), which gives the maximum daily rates of discharge from the Mississippi River at Vicksburg in cubic feet per second for 49 years starting from 1890. Assuming that the data are independent, the resulting statistic F of 5.63 is between the 50th and 90th percentile. From Figure 4-5, it can be seen that the data do not deviate much from normality, which supports the contention that the data can be approximated by the normal distribution.

Figure 4-6 Density estimate of PCB Data (n = 65) (axes: X, Density; legend: solid = estimated, dotted = normal)

4.3.3 PCB Data

A third example is taken from Risebrough (1972), who was studying concentrations of polychlorinated biphenyl (PCB), an industrial pollutant, in the yolk lipids of pelican eggs. He had a sample size of 65 and the resulting F is 7.675, which is between the 95th and 97.5th percentile. Figure 4-6 shows the resulting density plot, which is close to normal except for two outlying points in the right tail. Rejection of normality using α of 0.05 is therefore not surprising.


More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract Goodess-Of-Fit For The Geeralized Expoetial Distributio By Amal S. Hassa stitute of Statistical Studies & Research Cairo Uiversity Abstract Recetly a ew distributio called geeralized expoetial or expoetiated

More information

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE TERRY SOO Abstract These otes are adapted from whe I taught Math 526 ad meat to give a quick itroductio to cofidece

More information

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation Metodološki zvezki, Vol. 13, No., 016, 117-130 Approximate Cofidece Iterval for the Reciprocal of a Normal Mea with a Kow Coefficiet of Variatio Wararit Paichkitkosolkul 1 Abstract A approximate cofidece

More information

STAT Homework 1 - Solutions

STAT Homework 1 - Solutions STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better

More information

Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Access to the published version may require journal subscription. Published with permission from: Elsevier.

Access to the published version may require journal subscription. Published with permission from: Elsevier. This is a author produced versio of a paper published i Statistics ad Probability Letters. This paper has bee peer-reviewed, it does ot iclude the joural pagiatio. Citatio for the published paper: Forkma,

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random Part III. Areal Data Aalysis 0. Comparative Tests amog Spatial Regressio Models While the otio of relative likelihood values for differet models is somewhat difficult to iterpret directly (as metioed above),

More information

Chapter 4. Fourier Series

Chapter 4. Fourier Series Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,

More information

Stat 319 Theory of Statistics (2) Exercises

Stat 319 Theory of Statistics (2) Exercises Kig Saud Uiversity College of Sciece Statistics ad Operatios Research Departmet Stat 39 Theory of Statistics () Exercises Refereces:. Itroductio to Mathematical Statistics, Sixth Editio, by R. Hogg, J.

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

GUIDELINES ON REPRESENTATIVE SAMPLING

GUIDELINES ON REPRESENTATIVE SAMPLING DRUGS WORKING GROUP VALIDATION OF THE GUIDELINES ON REPRESENTATIVE SAMPLING DOCUMENT TYPE : REF. CODE: ISSUE NO: ISSUE DATE: VALIDATION REPORT DWG-SGL-001 002 08 DECEMBER 2012 Ref code: DWG-SGL-001 Issue

More information

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS STRUCTURE OF EXAMINATION PAPER. There will be oe 2-hour paper cosistig of 4 questios.

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

Math 10A final exam, December 16, 2016

Math 10A final exam, December 16, 2016 Please put away all books, calculators, cell phoes ad other devices. You may cosult a sigle two-sided sheet of otes. Please write carefully ad clearly, USING WORDS (ot just symbols). Remember that the

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS 8.1 Radom Samplig The basic idea of the statistical iferece is that we are allowed to draw ifereces or coclusios about a populatio based

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

GG313 GEOLOGICAL DATA ANALYSIS

GG313 GEOLOGICAL DATA ANALYSIS GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9 Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes.

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes. Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

of the matrix is =-85, so it is not positive definite. Thus, the first

of the matrix is =-85, so it is not positive definite. Thus, the first BOSTON COLLEGE Departmet of Ecoomics EC771: Ecoometrics Sprig 4 Prof. Baum, Ms. Uysal Solutio Key for Problem Set 1 1. Are the followig quadratic forms positive for all values of x? (a) y = x 1 8x 1 x

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

SNAP Centre Workshop. Basic Algebraic Manipulation

SNAP Centre Workshop. Basic Algebraic Manipulation SNAP Cetre Workshop Basic Algebraic Maipulatio 8 Simplifyig Algebraic Expressios Whe a expressio is writte i the most compact maer possible, it is cosidered to be simplified. Not Simplified: x(x + 4x)

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advaced

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

11 THE GMM ESTIMATION

11 THE GMM ESTIMATION Cotets THE GMM ESTIMATION 2. Cosistecy ad Asymptotic Normality..................... 3.2 Regularity Coditios ad Idetificatio..................... 4.3 The GMM Iterpretatio of the OLS Estimatio.................

More information

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information