Data Processing Techniques



1 Universitas Gadjah Mada, Department of Civil and Environmental Engineering, Master in Engineering in Natural Disaster Management. Data Processing Techniques: Hypothesis Testing

2 Hypothesis Testing
Mathematical model versus measurement: the theoretical line (computed by the model) is compared with the measured values. If the computed values match the measured ones, the model is accepted. If the computed values do not fit the measured ones, the model is rejected. In many cases, however, comparing the computed and measured values gives no clear indication of whether to accept or reject the model. Hypothesis testing provides an analysis tool for this comparison.

3 Hypothesis Testing
Steps in making statistical tests:
1. Formulate the hypothesis to be tested.
2. Formulate an alternative hypothesis.
3. Define a test statistic.
4. Define the distribution of the test statistic.
5. Define the rejection region (critical region) of the test statistic.
6. Collect the data needed to calculate the test statistic.
7. Determine whether the calculated value of the test statistic falls in the rejection region of the distribution of the test statistic.

4 Errors in Hypothesis Testing
decision             hypothesis is true    hypothesis is false
accept hypothesis    correct decision      Type II error
reject hypothesis    Type I error          correct decision

5 Notation
H0 = null hypothesis (hypothesis being tested)
H1 = alternative hypothesis
1 − α = confidence level
α = level of significance
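The link between the Type I error and the level of significance α can be checked numerically. The sketch below is only an illustration under stated assumptions: it repeatedly draws samples for which H0 is true and applies the two-sided test on the mean with known σ (rejecting when |Z| exceeds z at 1 − α/2), so the fraction of rejections should come out close to α. The function name, sample size, number of trials and seed are hypothetical choices, not part of the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(mu0=0.0, sigma=1.0, n=20, alpha=0.05, trials=2000, seed=1):
    """Estimate how often a true H0: mu = mu0 is rejected (Type I error)."""
    rng = random.Random(seed)                     # fixed seed: repeatable runs
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    rejections = 0
    for _ in range(trials):
        # Hypothetical data: generated so that H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = sqrt(n) * (fmean(sample) - mu0) / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("estimated Type I error rate:", type_one_error_rate())
```

The estimate fluctuates from seed to seed, so it should be read as an approximation of α rather than an exact value.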

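6 Hypothesis Testing on Mean
Normal distribution, σ_X is known.
H0: μ = μ1
H1: μ = μ2
Test statistic:
$$ Z = \frac{\sqrt{n}\,(\bar{X} - \mu_1)}{\sigma_X} $$
has a normal distribution.
If μ1 > μ2, H0 is rejected if:
$$ \bar{X} \le \mu_1 - z_{1-\alpha}\,\frac{\sigma_X}{\sqrt{n}}, \quad \text{i.e.} \quad Z \le -z_{1-\alpha} $$
If μ2 > μ1, H0 is rejected if:
$$ \bar{X} \ge \mu_1 + z_{1-\alpha}\,\frac{\sigma_X}{\sqrt{n}}, \quad \text{i.e.} \quad Z \ge z_{1-\alpha} $$
[Figures: normal density of Z with a rejection region of area α in the lower tail below −z_{1−α} (first case) or in the upper tail above z_{1−α} (second case); the remaining area is 1 − α.]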
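The one-sided test with known σ_X can be sketched in a few lines of Python. This is a minimal illustration, assuming the rejection rules Z ≤ −z at 1 − α (when μ2 < μ1) and Z ≥ z at 1 − α (when μ2 > μ1); the function name, the `alternative` argument and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sided_z_test(sample, mu1, sigma, alpha=0.05, alternative="less"):
    """Test H0: mu = mu1 against a one-sided alternative, sigma known.

    alternative="less"    -> alternative mean mu2 < mu1, reject for small Z
    alternative="greater" -> alternative mean mu2 > mu1, reject for large Z
    """
    n = len(sample)
    z = sqrt(n) * (fmean(sample) - mu1) / sigma
    z_crit = NormalDist().inv_cdf(1 - alpha)  # z_{1-alpha}
    if alternative == "less":
        reject = z <= -z_crit
    else:
        reject = z >= z_crit
    return z, z_crit, reject


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    data = [6.0, 8.0, 7.0, 7.0]
    print(one_sided_z_test(data, mu1=9.0, sigma=1.0, alternative="less"))
```

The critical value comes from `statistics.NormalDist`, so no statistical table is needed for the normal case.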

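7 Hypothesis Testing on Mean
Normal distribution, σ_X is unknown.
H0: μ = μ1
H1: μ = μ2
Test statistic:
$$ T = \frac{\sqrt{n}\,(\bar{X} - \mu_1)}{s_X} $$
has a t distribution with n − 1 degrees of freedom.
If μ1 > μ2, H0 is rejected if:
$$ T \le -t_{1-\alpha,\,n-1} $$
If μ2 > μ1, H0 is rejected if:
$$ T \ge t_{1-\alpha,\,n-1} $$
[Figures: t density with a rejection region of area α in the lower tail below −t_{1−α,n−1} (first case) or in the upper tail above t_{1−α,n−1} (second case); the remaining area is 1 − α.]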

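8 Hypothesis Testing on Mean
Normal distribution, σ_X is known.
H0: μ = μ0
H1: μ ≠ μ0
Test statistic:
$$ Z = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sigma_X} $$
has a normal distribution.
H0 is rejected if:
$$ |Z| = \left| \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sigma_X} \right| > z_{1-\alpha/2} $$
[Figure: normal density of Z with rejection regions of area α/2 in each tail, below −z_{1−α/2} and above z_{1−α/2}; the central area is 1 − α.]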

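9 Hypothesis Testing on Mean
Normal distribution, σ_X is unknown.
H0: μ = μ0
H1: μ ≠ μ0
Test statistic:
$$ T = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s_X} $$
has a t distribution with n − 1 degrees of freedom.
H0 is rejected if:
$$ |T| = \left| \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s_X} \right| > t_{1-\alpha/2,\,n-1} $$
[Figure: t density with rejection regions of area α/2 in each tail, below −t_{1−α/2,n−1} and above t_{1−α/2,n−1}; the central area is 1 − α.]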
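A minimal sketch of the two-sided test with unknown σ_X follows. It assumes the critical value t at 1 − α/2 with n − 1 degrees of freedom is read from a t table and passed in as `t_crit`, because the Python standard library has no t quantile function. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def two_sided_t_test(sample, mu0, t_crit):
    """Test H0: mu = mu0 against H1: mu != mu0, sigma unknown.

    t_crit is t_{1-alpha/2, n-1}, looked up in a t table by the caller.
    """
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    t = sqrt(n) * (fmean(sample) - mu0) / s
    return t, abs(t) > t_crit


if __name__ == "__main__":
    # Hypothetical data; 2.776 is the tabulated t_{0.975, 4}.
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(two_sided_t_test(data, mu0=0.0, t_crit=2.776))
```

Passing the critical value in keeps the sketch dependency-free; a statistics package could supply the quantile instead.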

10 Hypothesis Testing on Mean
Result of a hypothesis test: "accept H0" should be read as "fail to reject H0".
Meaning, for H0: μ = μ0: accepting H0 means that we fail to reject H0. Based on the sample that we have, we can say that the population mean is not significantly different from μ0. We cannot say that the population mean really equals μ0, since we have not proved that μ = μ0.

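11 Test for Differences in Means of Two Normal Distributions
Normal distributions, var(X1) and var(X2) are known.
H0: μ1 − μ2 = δ
H1: μ1 − μ2 ≠ δ
Test statistic:
$$ Z = \frac{\bar{X}_1 - \bar{X}_2 - \delta}{\left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)^{1/2}} $$
has a standard normal distribution.
H0 is rejected if:
$$ |Z| > z_{1-\alpha/2} $$
[Figure: standard normal density with rejection regions of area α/2 in each tail, below −z_{1−α/2} and above z_{1−α/2}; the central area is 1 − α.]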

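12 Test for Differences in Means of Two Normal Distributions
Normal distributions, var(X1) and var(X2) are unknown.
H0: μ1 − μ2 = δ
H1: μ1 − μ2 ≠ δ
Test statistic:
$$ T = \frac{\bar{X}_1 - \bar{X}_2 - \delta}{\left\{ \dfrac{n_1 + n_2}{n_1 n_2} \cdot \dfrac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} \right\}^{1/2}} $$
has a t distribution with n1 + n2 − 2 degrees of freedom.
H0 is rejected if:
$$ |T| > t_{1-\alpha/2,\,n_1+n_2-2} $$
[Figure: t density with rejection regions of area α/2 in each tail, below −t_{1−α/2,n1+n2−2} and above t_{1−α/2,n1+n2−2}; the central area is 1 − α.]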
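The pooled statistic for two samples with unknown variances can be computed as in the sketch below. It is an illustration only, assuming the statistic is the difference of sample means minus δ, divided by the square root of (n1 + n2)/(n1 n2) times the pooled variance; the function name and the data are hypothetical, and the critical t value still has to come from a table.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_statistic(x1, x2, delta=0.0):
    """Return (T, degrees of freedom) for H0: mu1 - mu2 = delta."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    scale = sqrt((n1 + n2) / (n1 * n2) * pooled_var)
    t = (fmean(x1) - fmean(x2) - delta) / scale
    return t, n1 + n2 - 2


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    t, dof = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(t, dof)  # compare |t| with the tabulated t_{1-alpha/2, dof}
```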

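13 Test on Variance
Normal distribution.
H0: σ² = σ0²
H1: σ² ≠ σ0²
Test statistic:
$$ \chi_c^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma_0^2} $$
has a chi-square distribution with n − 1 degrees of freedom.
H0 is accepted if:
$$ \chi^2_{\alpha/2,\,n-1} < \chi_c^2 < \chi^2_{1-\alpha/2,\,n-1} $$
[Figure: chi-square density with rejection regions of area α/2 in each tail, below χ²_{α/2,n−1} and above χ²_{1−α/2,n−1}; the central area is 1 − α.]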
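The variance test reduces to one sum and one comparison, as the following sketch shows. It assumes the two chi-square critical values (at α/2 and 1 − α/2 with n − 1 degrees of freedom) are taken from a table and supplied by the caller; the function name and the data are hypothetical.

```python
from statistics import fmean


def variance_test(sample, sigma0_sq, chi2_lower, chi2_upper):
    """Test H0: sigma^2 = sigma0_sq against H1: sigma^2 != sigma0_sq.

    chi2_lower = chi-square quantile at alpha/2,     n - 1 dof (from a table)
    chi2_upper = chi-square quantile at 1 - alpha/2, n - 1 dof (from a table)
    """
    mean = fmean(sample)
    chi2_c = sum((x - mean) ** 2 for x in sample) / sigma0_sq
    accept = chi2_lower < chi2_c < chi2_upper
    return chi2_c, accept


if __name__ == "__main__":
    # Hypothetical data; 0.484 and 11.143 are the tabulated chi-square
    # quantiles at 0.025 and 0.975 for 4 degrees of freedom.
    print(variance_test([1.0, 2.0, 3.0, 4.0, 5.0], 2.5, 0.484, 11.143))
```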

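14 Test on Variance of Two Normal Distributions
Normal distributions.
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
Test statistic:
$$ F_c = \frac{s_1^2}{s_2^2} $$
has an F distribution with n1 − 1 and n2 − 1 degrees of freedom.
H0 is rejected if:
$$ F_c > F_{1-\alpha,\,n_1-1,\,n_2-1} $$
[Figure: F density with a rejection region of area α in the upper tail above F_{1−α,n1−1,n2−1}; the remaining area is 1 − α.]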
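A short sketch of the variance-ratio test, assuming the critical value of the F distribution at 1 − α with (n1 − 1, n2 − 1) degrees of freedom is read from an F table and passed in as `f_crit`. The function name and the data are hypothetical.

```python
from statistics import variance


def variance_ratio_test(x1, x2, f_crit):
    """Return (F_c, (dof1, dof2), reject) for H0: sigma1^2 = sigma2^2."""
    f_c = variance(x1) / variance(x2)  # ratio of sample variances
    dof = (len(x1) - 1, len(x2) - 1)
    return f_c, dof, f_c > f_crit


if __name__ == "__main__":
    # Hypothetical samples; 9.28 is the tabulated F_{0.95, 3, 3}.
    print(variance_ratio_test([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0], 9.28))
```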

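15 Test on Variance of Several Normal Distributions
Normal distributions.
H0: σ1² = σ2² = … = σk²
H1: σ1² ≠ σ2² ≠ … ≠ σk²
Test statistic: Q/h has a chi-square distribution with (k − 1) degrees of freedom, where
$$ Q = \left[ \sum_{i=1}^{k} (n_i - 1) \right] \ln\!\left( \frac{\sum_{i=1}^{k} (n_i - 1)\,s_i^2}{N - k} \right) - \sum_{i=1}^{k} (n_i - 1) \ln s_i^2 $$
$$ h = 1 + \frac{1}{3(k - 1)} \left( \sum_{i=1}^{k} \frac{1}{n_i - 1} - \frac{1}{N - k} \right) $$
$$ N = \sum_{i=1}^{k} n_i $$
H0 is rejected if:
$$ \frac{Q}{h} > \chi^2_{1-\alpha,\,k-1} $$
[Figure: chi-square density with a rejection region of area α in the upper tail above χ²_{1−α,k−1}; the remaining area is 1 − α.]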
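The statistic for several variances involves more bookkeeping than the other tests, so a small sketch may help. It assumes Q is (N − k) times the log of the pooled variance minus the sum of (n_i − 1) ln s_i², and h is 1 + [Σ 1/(n_i − 1) − 1/(N − k)] / [3(k − 1)], with Q/h compared against a tabulated chi-square value. The function name and the groups of data are hypothetical.

```python
from math import log
from statistics import variance


def bartlett_statistic(groups):
    """Return (Q, h, degrees of freedom) for H0: all group variances equal."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # s_i^2 with divisor n_i - 1
    big_n = sum(sizes)                         # N = sum of n_i
    # Note that sum(n_i - 1) equals N - k.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (big_n - k)
    q = (big_n - k) * log(pooled) - sum(
        (n - 1) * log(s2) for n, s2 in zip(sizes, variances)
    )
    h = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (big_n - k)) / (3 * (k - 1))
    return q, h, k - 1


if __name__ == "__main__":
    # Hypothetical groups, for illustration only.
    q, h, dof = bartlett_statistic([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
    print(q / h, dof)  # compare q / h with the tabulated chi-square value
```

When all sample variances are equal, Q is zero, which gives a quick sanity check of an implementation.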

16 Hypothesis Testing Exercises
Refer to the annual peak discharge of XYZ River. Test the hypotheses that the peak discharge of XYZ River has a mean value of 650 m³/s and a variance of 45,000 m⁶/s².
Refer to the file entitled "Exercises on hypothesis testing.pdf" and do these exercises.

17 Testing the Goodness of Fit of Data to Probability Distributions
Graphical (and visual) methods to judge whether or not a particular distribution adequately describes a set of observations:
- plot and compare the observed relative frequency curve with the theoretical relative frequency curve
- plot the observed data on appropriate probability paper and judge whether or not the resulting plot is a straight line
Statistical tests:
- chi-square goodness of fit test
- the Kolmogorov-Smirnov test

18 Annual Peak Discharge of XYZ River
[Figure: relative frequency versus discharge (m³/s), comparing the observed data with the theoretical distribution.]

19 [Figure: markers show the observed data; the line shows the theoretical distribution.]

20 Normal Distribution Paper

21 Chi-square Goodness of Fit Test
Method of test: comparison between the actual number of observations and the expected number of observations (expected according to the distribution under test) that fall in the class intervals. The expected numbers are calculated by multiplying the expected relative frequency by the total number of observations.
The test statistic is calculated from the following relationship:
$$ \chi_c^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$

22 Chi-square Goodness of Fit Test
The test statistic is calculated from the following relationship:
$$ \chi_c^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$
where:
- k is the number of class intervals
- O_i is the number of observations in the ith class interval
- E_i is the expected number of observations in the ith class interval according to the distribution being tested
χ_c² has a chi-square distribution with (k − p − 1) degrees of freedom, where p is the number of parameters estimated from the data.

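23 Chi-square Goodness of Fit Test
The test statistic is calculated from the following relationship:
$$ \chi_c^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$
The hypothesis that the data are from the specified distribution is rejected if:
$$ \chi_c^2 > \chi^2_{1-\alpha,\,k-p-1} $$
[Figure: chi-square density with a rejection region of area α in the upper tail above χ²_{1−α,k−p−1}; the remaining area is 1 − α.]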
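The chi-square goodness of fit statistic and its degrees of freedom can be sketched as follows. The observed and expected counts in the usage lines are hypothetical, and the critical value is assumed to come from a chi-square table.

```python
def chi_square_gof(observed, expected, n_params):
    """Return (chi2_c, degrees of freedom) for a goodness of fit test.

    observed : counts O_i per class interval
    expected : counts E_i per class interval under the tested distribution
    n_params : p, the number of parameters estimated from the data
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    chi2_c = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - n_params - 1  # k - p - 1
    return chi2_c, dof


if __name__ == "__main__":
    # Hypothetical counts for six classes; p = 2 as for a normal
    # distribution whose mean and variance are estimated from the data.
    chi2_c, dof = chi_square_gof([8, 12, 10, 10, 12, 8], [10.0] * 6, n_params=2)
    print(chi2_c, dof)  # reject if chi2_c exceeds the tabulated value, 7.815 for 3 dof at alpha = 0.05
```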

24 The Kolmogorov-Smirnov Test
Steps in the Kolmogorov-Smirnov test:
1. Let P_X(x) be the completely specified theoretical cumulative distribution function under the null hypothesis.
2. Let S_n(x) be the sample cumulative distribution function based on n observations. For any observed x, S_n(x) = k/n, where k is the number of observations less than or equal to x.
3. Determine the maximum deviation, D, defined by:
$$ D = \max \left| P_X(x) - S_n(x) \right| $$
4. If, for the chosen significance level, the observed value of D is greater than or equal to the tabulated critical value of the Kolmogorov-Smirnov statistic, the hypothesis is rejected.
Tables of the Kolmogorov-Smirnov test statistic are available in many books on statistics.
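The steps above can be sketched in Python. By default the function evaluates P_X(x) and S_n(x) = k/n at each observed point, as described in the steps. The optional flag `check_left_limit` is an addition not taken from the slides: it also compares P_X(x) with the value of S_n just before each observation, which some references use to capture the largest gap of the step function. The function name and the data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def ks_statistic(sample, cdf, check_left_limit=False):
    """Return D = max |P_X(x) - S_n(x)| over the observed points.

    cdf is the completely specified theoretical CDF, given as a callable.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for x in xs:
        p = cdf(x)
        s_n = bisect_right(xs, x) / n  # fraction of observations <= x
        d = max(d, abs(p - s_n))
        if check_left_limit:
            s_before = bisect_left(xs, x) / n  # fraction of observations < x
            d = max(d, abs(p - s_before))
    return d


if __name__ == "__main__":
    # Hypothetical data tested against a uniform distribution on [0, 1].
    data = [0.1, 0.4, 0.7, 0.9]
    print(ks_statistic(data, lambda x: x))
    print(ks_statistic(data, lambda x: x, check_left_limit=True))
```

For a normal null hypothesis with specified parameters, `statistics.NormalDist(mu, sigma).cdf` can be passed as `cdf`. The resulting D is then compared with the tabulated critical value for the chosen significance level and n.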

25 The Kolmogorov-Smirnov Test
Notes on the Kolmogorov-Smirnov test:
- The test can be conducted by calculating the quantities P_X(x) and S_n(x) at each observed point, or by plotting the data on probability paper and selecting the greatest deviation, on the probability scale, of a point from the theoretical line.
- The data should not be grouped for this test, i.e. plot each point of the data on the probability paper.

26 Chi-square Goodness of Fit Test and the Kolmogorov-Smirnov Test
Exercise: apply the chi-square goodness of fit test and the Kolmogorov-Smirnov test to the annual peak discharge of XYZ River against the normal distribution.

27 Chi-square Goodness of Fit Test and the Kolmogorov-Smirnov Test
Notes on both tests when testing hydrologic frequency distributions:
- Both tests are insensitive in the tails of the distributions. On the other hand, the tails are important in hydrologic frequency distributions.
- To increase the sensitivity of the chi-square test:
  - The expected number of observations in a class shall not be less than 3 (or 5).
  - Define the class intervals so that, under the hypothesis being tested, the expected number of observations in each class interval is the same. The class intervals will then be of unequal width, and the interval widths will be a function of the distribution being tested.
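Class intervals with equal expected counts can be obtained from the quantiles of the hypothesized distribution. The sketch below assumes a normal distribution and places the class boundaries at the cumulative probabilities j/k, so each of the k classes has an expected count of n/k. The function name and the data are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist


def equal_probability_classes(sample, dist, k):
    """Split the real line into k classes of equal probability under dist.

    Returns (boundaries, observed counts per class, expected count per class).
    A value equal to a boundary is counted in the class above it.
    """
    boundaries = [dist.inv_cdf(j / k) for j in range(1, k)]  # k - 1 interior cuts
    observed = [0] * k
    for x in sample:
        observed[bisect_right(boundaries, x)] += 1
    expected = len(sample) / k  # same expected count in every class
    return boundaries, observed, expected


if __name__ == "__main__":
    # Hypothetical data tested against a standard normal distribution.
    data = [-1.0, -0.5, 0.5, 1.0]
    print(equal_probability_classes(data, NormalDist(0.0, 1.0), k=4))
```

The number of classes k would be chosen so that n/k stays above the minimum expected count mentioned in the notes.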

28 Chi-square Goodness of Fit Test and the Kolmogorov-Smirnov Test
Exercise: redo the chi-square goodness of fit test and the Kolmogorov-Smirnov test for the annual peak discharge of XYZ River against the normal distribution. Define the class intervals so that the expected number of observations in each class interval is the same.

