Relating Graph to Matlab

Roy Horn
5 years ago

1 There are two related course documents on the web:
- Probability and Statistics Review: should be read by people without a statistics background, and is helpful as a review for those with prior statistics background.
- Hypothesis Testing and Matlab: should be read by everyone (except those who have previously done hypothesis testing with Matlab functions).
Handout

2 Relating Graph to Matlab
The quantity $t_{\alpha,\nu}$ corresponds to the c in the previous graph. This quantity can be looked up in statistics books or is available in Matlab as the function tinv. The reading on Hypothesis Testing and Matlab tells you how to compute the arguments of tinv.
The x (horizontal axis) in the previous graph is related to the values of the specific samples that you collect. To do the analysis we apply transformations to convert the variable into something with a mean near zero and a normalized variance. Corresponding to the x horizontal axis is the sample statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \sqrt{n}\,(\bar{x} - \mu_0)/s$

3 See the handout for calculating the test statistic and rejection regions.
Matlab in statistical testing: the following commands will be useful for statistical testing: ttest, ttest2, ranksum, mean, std

4 Hypothesis Testing: Details
Formal testing procedure
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \sqrt{n}\,(\bar{x} - \mu_0)/s$, appropriate when $\sigma$ is unknown. Here $s$ is the sample standard deviation and $\bar{x}$ is the sample mean; $t$ is dimensionless, which allows many problems to be formulated in a common framework.
If $H_0$ is true, then $T_{n-1}$ follows a (Student) t-distribution with $\nu = n - 1$ degrees of freedom.
Choice of Hypothesis
"Statistical tests are predisposed to accept $H_0$. A test is only effective if one collects sufficient data to reject the null hypothesis." Upon which hypothesis should the burden of proof be placed?

5 Hypothesis Testing: Details (1 of 2)
Decision Rules
The null hypothesis is $H_0: \mu = \mu_0$.
The test statistic value is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \sqrt{n}\,(\bar{x} - \mu_0)/s$.
We construct a rejection region such that the Type I error probability is controlled to a desired level, i.e., we select an $\alpha$.
Alternative hypothesis | Rejection region for a level $\alpha$ test
$H_a: \mu > \mu_0$ | $t \ge t_{\alpha,\nu}$
$H_a: \mu < \mu_0$ | $t \le -t_{\alpha,\nu}$
$H_a: \mu \ne \mu_0$ | $|t| \ge t_{\alpha/2,\nu}$
If $H_a$ is true, then the Type II error $\beta$ can be computed using the Type I error $\alpha$, the degrees of freedom $\nu$, and the standardized distance.
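The three decision rules can be written as one small function. This is only a sketch: the function name and the string labels for the alternatives are illustrative choices, not part of the course material, and the critical value is assumed to come from a t table (or from Matlab's tinv).

```python
def reject_null(t_stat, t_crit, alternative):
    """Return True when t_stat falls in the rejection region.

    t_crit is the positive table value: t_{alpha,nu} for the one-tailed
    tests and t_{alpha/2,nu} for the two-tailed test.
    The labels "greater", "less", "two-sided" are illustrative names.
    """
    if alternative == "greater":      # H_a: mu > mu_0
        return t_stat >= t_crit
    if alternative == "less":         # H_a: mu < mu_0
        return t_stat <= -t_crit
    if alternative == "two-sided":    # H_a: mu != mu_0
        return abs(t_stat) >= t_crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

Keeping the critical value as an argument separates the table lookup from the decision itself, so the same function serves one-sample, paired, and two-sample statistics.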

6 Hypothesis Testing: Example
On a national test the average is 75. I think Cornell students are smarter, so we randomly select 7 Cornell students and they take the test.
Results: $\bar{x} = 81.3$, $s_x = 6.83$, $n = 7$
Null Hypothesis: $H_0: \mu = 75$
Alternative Hypothesis: $H_a: \mu > 75$
Compute: $t = \sqrt{n}\,(\bar{x} - \mu_0)/s = \sqrt{7}\,(81.3 - 75)/6.83 \approx 2.44$
Use $\alpha = 1\%$ => $t_{0.01,6} = 3.143$
Because $t < t_{\alpha,\nu}$, we should not reject the Null Hypothesis.
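The arithmetic of this example can be checked with a few lines of Python. The sample numbers are those on the slide; the critical value 3.143 is the standard t-table entry for $\alpha = 0.01$ with 6 degrees of freedom and is hard-coded here as an assumption rather than computed.

```python
import math

n = 7          # number of students sampled
x_bar = 81.3   # sample mean
s = 6.83       # sample standard deviation
mu_0 = 75.0    # national average (null hypothesis value)

# Test statistic t = sqrt(n) * (x_bar - mu_0) / s
t_stat = math.sqrt(n) * (x_bar - mu_0) / s

# ASSUMPTION: table value t_{0.01, 6}, typed in by hand
T_CRIT = 3.143

# Upper-tailed test: reject H0 only if t >= t_{alpha, nu}
reject = t_stat >= T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject}")  # t = 2.44, reject H0: False
```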

7 Interpretation
Because $t < t_{\alpha,\nu}$, we should not reject the Null Hypothesis. What does this mean? The null hypothesis is $H_0$, which is the assumption that the mean is 75, the national average. Since you cannot reject the hypothesis that the Cornell mean is 75, you cannot statistically infer that the Cornell mean is better than the national average.

8 If the underlying distributions are assumed to be normal with unknown variance, then the standardized sample mean (the statistic $t$) has a Student t distribution. Under this additional assumption one of the following tests is applicable.

9 Important Information on Starting values for N trials
For methods (like SA, TS, DDS, etc.) that have ONE starting vector, Scurr, you should have N different starting values, e.g. Trial 1 starting value = (3,1); Trial 2 starting value = (7,2); Trial N starting value = (8,3).
When you compare two algorithms (e.g. SA versus TS), the ith trial should have the same starting value for each of the algorithms. Hence, the second trial for each of the algorithms would start with Scurr = (7,2) in the example above.
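One way to follow this rule is to generate the N starting vectors once and pass the same list to every algorithm. The sketch below assumes two-dimensional starting vectors drawn uniformly within bounds; `algo_a` and `algo_b` are hypothetical stand-ins for real optimizers such as SA or TS, and all names are illustrative.

```python
import random

def make_starting_values(n_trials, bounds, seed=0):
    """Draw one starting vector per trial, uniformly within the given bounds."""
    rng = random.Random(seed)  # fixed seed so the same starts can be regenerated
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            for _ in range(n_trials)]

def run_trials(algorithm, starts):
    """Run the algorithm once per starting vector; trial i uses starts[i]."""
    return [algorithm(s_curr) for s_curr in starts]

# Hypothetical placeholder "algorithms": each maps a starting vector to a
# best objective value. Real code would call SA, TS, DDS, etc. here.
def algo_a(s_curr):
    return sum(v * v for v in s_curr)

def algo_b(s_curr):
    return sum(abs(v) for v in s_curr)

starts = make_starting_values(5, [(0.0, 10.0), (0.0, 5.0)])
results_a = run_trials(algo_a, starts)  # trial i of A starts from starts[i]
results_b = run_trials(algo_b, starts)  # trial i of B starts from the same point
```

Because both result lists are indexed by the same starting vectors, they can be treated as paired data.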

10 Statistical Tests (Hypothesis testing)
Hypothesis tests are a formal statistical way of making a decision about data that exhibit variability. In our case, the performance criterion (the best objective function value, or the number of iterations required by an algorithm to reach a pre-specified low value) is usually variable between trials and between algorithms.
The following assumptions are usually made in applying statistical tests:
- The random variables (the performance criterion) are all identically distributed with the same shape and spread.
- The random samples obtained from each trial of the algorithm are independent.

11 Introduction To Random Variables
A random variable X(s) is a real-valued function which assigns a real number X(s) = x to every sample point $s \in S$.
In our algorithm examples the random variable can be:
- The best objective function value found in an algorithm trial
- The number of objective function evaluations in an algorithm trial needed to come within some percentage of the optimal value

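12 If $x_i$ are sample values:
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Alternate formula for sample variance: $s^2 = \frac{\sum_{i} x_i^2 - \left(\sum_{i} x_i\right)^2 / n}{n-1}$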
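Both variance formulas can be checked against each other in a short script. The data values below are made up purely for illustration, and Python's `statistics.variance` (which also divides by n - 1) is used as a cross-check.

```python
import statistics

def sample_mean(xs):
    """x_bar = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

def sample_variance_alt(xs):
    """Alternate form: (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample values

mean = sample_mean(data)
var_a = sample_variance(data)
var_b = sample_variance_alt(data)
var_lib = statistics.variance(data)  # library cross-check
```

The alternate formula needs only one pass over the data, but it can lose precision when the values are large relative to their spread, so the two-pass form is often the safer choice in floating point.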

13 Hypothesis Testing: Details
Decision Rules
The null hypothesis is $H_0: \mu = \mu_0$.
The test statistic value is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \sqrt{n}\,(\bar{x} - \mu_0)/s$.
We construct a rejection region such that the Type I error probability is controlled to a desired level, i.e., we select an $\alpha$.
Alternative hypothesis | Rejection region for a level $\alpha$ test
$H_a: \mu > \mu_0$ | $t \ge t_{\alpha,\nu}$
$H_a: \mu < \mu_0$ | $t \le -t_{\alpha,\nu}$
$H_a: \mu \ne \mu_0$ | $|t| \ge t_{\alpha/2,\nu}$
If $H_a$ is true, then the Type II error $\beta$ can be computed using the Type I error $\alpha$, the degrees of freedom $\nu$, and the standardized distance.

14 Hypothesis Testing
How to make decisions?
- Define your hypothesis.
- Compute a test statistic (explained later).
- Decide on an acceptable error probability (significance level, α).
- Obtain an appropriate critical value for the test statistic at the chosen significance level.
- If the absolute value of the test statistic is greater than the critical value, then reject the null hypothesis of equality of means at significance level α.

15 Hypothesis Testing
Type I error: incorrectly rejecting the null hypothesis when it is true. α = probability of Type I error = P(rejecting $H_0$ when $H_0$ is true). α is often referred to as the significance level of the test.
The p-value is the smallest value of the Type I error α such that the observed results would be sufficient to reject the null hypothesis. It is a convenient summary of the statistical significance of the observed result.
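For a t statistic, the p-value is a tail area of the Student t distribution. Python's standard library has no t-distribution CDF, so the sketch below integrates the t density numerically with Simpson's rule; this is a swapped-in technique for illustration, not the method used in the course (Matlab's ttest reports the p-value directly). The function names and the step count are assumptions.

```python
import math

def t_pdf(x, nu):
    """Density of the Student t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t, nu, steps=2000):
    """P(T >= t), via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    area = total * h / 3          # integral of the density from 0 to |t|
    tail = 0.5 - area             # by symmetry, P(T >= |t|)
    return tail if t >= 0 else 1.0 - tail

def p_value(t, nu, alternative="two-sided"):
    """p-value of an observed t statistic for the chosen alternative."""
    if alternative == "greater":
        return t_upper_tail(t, nu)
    if alternative == "less":
        return 1.0 - t_upper_tail(t, nu)
    return 2.0 * t_upper_tail(abs(t), nu)

# Upper-tailed p-value for a statistic of about 2.44 with 6 degrees of freedom
p = p_value(2.44, 6, "greater")
```

Rejecting $H_0$ when the p-value is at most α gives the same decision as comparing the statistic with the tabulated critical value.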

16 Hypothesis Testing
Paired t-test: This test is appropriate when the numbers of samples are equal and the conditions under which the computational experiments are performed allow the samples to be considered as paired data.
Conditions?
- Are the starting solutions the same in each trial for both algorithms? For example, in comparing SA and Greedy Search, it is possible to start both algorithms with the same starting solutions.
- Is the same number of trials performed for both algorithms?
Our example is suitable for a paired t-test if we assume that in the kth trial both algorithms have the same initial guess.

17 Hypothesis Testing
Paired t-test: Define Hypothesis
$H_0: D = 0$ (both algorithms are the same)
$H_a: D \ne 0$ (the algorithms are different; two-tailed test)
where $D = \mu_1 - \mu_2$ is the difference in means of the two algorithms. D has a Student t distribution.
Test statistic (for the example in the earlier slide; we use this number later):
$t = \frac{\bar{D}}{s/\sqrt{n}}$
where $\bar{D}$ is the mean of the sample differences $(X_i - Y_i)$ and $s$ is the standard deviation of the sample differences.
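A minimal sketch of the paired statistic follows. The two result lists are invented for illustration only (they are not the values from the course example), and the function name is an assumption.

```python
import math
import statistics

def paired_t(xs, ys):
    """Return (t, nu) for the paired t-test on samples xs and ys.

    t = D_bar / (s / sqrt(n)), where D_bar and s are the mean and sample
    standard deviation of the differences x_i - y_i, and nu = n - 1.
    """
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)   # divides by n - 1
    return d_bar / (s_d / math.sqrt(n)), n - 1

# HYPOTHETICAL best objective values from five paired trials
algo1 = [12.0, 15.0, 11.0, 14.0, 13.0]
algo2 = [10.0, 12.0, 10.0, 11.0, 12.0]

t_stat, nu = paired_t(algo1, algo2)
```

With these made-up numbers the differences are 2, 3, 1, 3, 1, so $\bar{D} = 2$, $s = 1$, and $t = 2\sqrt{5}$ with 4 degrees of freedom.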

18 Statistical Comparison of Algorithms: Example (repeated)
Consider the following table, which shows the objective function values for the best solution in each trial for two algorithms applied to the same problem (minimization).
[Table: per-trial values with Mean and Std Dev for Algorithm 1 and Algorithm 2; values not preserved]
Which algorithm is better? We'll use hypothesis testing to answer this question.

19 Hypothesis Testing on Example (assuming paired test is OK): Are Algorithms Different?
Paired t = 4.52 (from previous slide).
α = 5%, so $t_{\alpha/2,\nu} = 2.23$ (2.23 comes from tables, depending on the value of α/2 and ν = number of degrees of freedom = n - 1).
Hence the rejection region is $t \ge 2.23$ or $t \le -2.23$.
Since 4.52 > 2.23, reject the Null Hypothesis ($H_0: D = 0$, both algorithms are the same), so we can conclude the algorithms are different.

20 Paired t test: Which algorithm is better?
The next step is to decide which algorithm is better.
$H_0: D = 0$ (both algorithms are the same)
$H_a: D > 0$ (Algorithm 2 is better; upper-tailed test). Recall that Algorithm 2 had the lower mean for a minimization problem.
Everything stays the same as in the previous test except the rejection region: with α = 5% and the same ν = 10 degrees of freedom implied by the two-tailed value 2.23, $t_{\alpha,\nu} = 1.81$, so the rejection region is $t \ge 1.81$.
Conclusion? Paired t = 4.52 > 1.81. Hence you can reject $H_0$, so you can say Algorithm 2 is better based on the paired test.

21 Hypothesis Testing: 2 Sample t-test (if paired test is not justified)
2-sample t-test: Use this test if the numbers of samples (i.e., trials) for the two algorithms are not equal or the data for the two algorithms are not paired. For example, GA requires an initial population whereas SA requires only a single starting solution; in such a case it is not possible to consider the samples as paired.
Define Hypothesis
Null hypothesis $H_0: \mu_1 = \mu_2$ vs. alternative hypothesis $H_a: \mu_1 \ne \mu_2$

22 Hypothesis Testing: Two Sample Test (different)
Test statistic:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
The $S_i^2$ are the sample variances for Algorithms i = 1 or 2.
Choose α = 5%.
The t-distribution has one parameter, the degrees of freedom $\nu$.
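The two-sample statistic can be sketched as below. The slide's own formula for the degrees of freedom did not survive in this text, so the code uses the Welch-Satterthwaite approximation, which is one common choice for the unpooled statistic; treat that choice, the function name, and the example data as assumptions.

```python
import math
import statistics

def two_sample_t(xs, ys):
    """Return (t, nu) for the unpooled two-sample t statistic.

    t = (mean(xs) - mean(ys)) / sqrt(S1^2/n1 + S2^2/n2).
    ASSUMPTION: nu uses the Welch-Satterthwaite approximation; the
    source's formula for the degrees of freedom is not available.
    """
    n1, n2 = len(xs), len(ys)
    v1, v2 = statistics.variance(xs), statistics.variance(ys)
    se2 = v1 / n1 + v2 / n2                      # squared standard error
    t = (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(se2)
    nu = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, nu

# HYPOTHETICAL results from two unpaired sets of trials
sample1 = [1.0, 2.0, 3.0, 4.0, 5.0]
sample2 = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, nu = two_sample_t(sample1, sample2)
```

The approximate ν is generally not an integer; a common convention is to round it down before looking up a critical value in a table.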

23 Hypothesis Testing: Details
Decision Rules
The null hypothesis is $H_0: \mu = \mu_0$.
The two-sample test statistic value is given on the previous slide, with ν = d.f.
We construct a rejection region such that the Type I error probability is controlled to a desired level, i.e., we select an $\alpha$.
Alternative hypothesis | Rejection region for a level $\alpha$ test
$H_a: \mu > \mu_0$ | $t \ge t_{\alpha,\nu}$
$H_a: \mu < \mu_0$ | $t \le -t_{\alpha,\nu}$
$H_a: \mu \ne \mu_0$ | $|t| \ge t_{\alpha/2,\nu}$
If $H_a$ is true, then the Type II error $\beta$ can be computed using the Type I error $\alpha$, the degrees of freedom $\nu$, and the standardized distance.

24 Hypothesis Testing
Plugging in the numbers: D.F. = 18
For the two-tailed test, the critical value is t = 2.1. Rejection region: $t \ge 2.1$ or $t \le -2.1$. Cannot reject $H_0$ (so cannot say they are different).
For the upper-tailed test, the critical value is t = 1.73. Rejection region: $t \ge 1.73$. Cannot reject $H_0$.

25 Non-parametric tests
Normality assumption: large number of samples. For costly objective functions it may not be possible to perform a large number of trials.
Wilcoxon rank-sum test:
- It does not assume normality of the underlying population.
- It requires that the two populations have the same shape and spread.

26 Non-parametric tests: Rank Sum test
Test statistic: $w = \sum_{i=1}^{m} r_i$, where $r_i$ = rank of $X_i$ in the combined sample of m + n (X's and Y's).
Null Hypothesis $H_0: \mu_1 = \mu_2$ (samples from the same distribution)
Alternative hypothesis | Rejection region for a level $\alpha$ test
$H_a: \mu_1 \ne \mu_2$ (two-tailed test) | $w \ge c$ or $w \le m(m + n + 1) - c$
$H_a: \mu_1 < \mu_2$ (lower-tailed test) | $w \le m(m + n + 1) - c_1$
$H_a: \mu_1 > \mu_2$ (upper-tailed test) | $w \ge c_1$
where $P(W \ge c_1 \mid H_0 \text{ true}) = \alpha$ and $P(W \ge c \mid H_0 \text{ true}) = \alpha/2$.

27 Non-parametric tests
If the number of samples exceeds 8, w can be approximated with a normal distribution:
$z = \frac{w - m(m + n + 1)/2}{\sqrt{mn(m + n + 1)/12}}$
Alternative hypothesis | Rejection region for a level $\alpha$ test
$H_a: \mu_1 \ne \mu_2$ (two-tailed test) | $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$
$H_a: \mu_1 < \mu_2$ (lower-tailed test) | $z \le -z_{\alpha}$
$H_a: \mu_1 > \mu_2$ (upper-tailed test) | $z \ge z_{\alpha}$
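The rank sum and its normal approximation can be computed from two raw samples as sketched below. Tied values are given the average of the ranks they occupy, which is a common convention but an assumption here since the slides do not discuss ties; the variance formula is the untied one from the slide. The function name is illustrative.

```python
import math

def rank_sum(xs, ys):
    """Return (w, z): rank sum of xs in the combined sample, and its z score.

    ASSUMPTION: tied values share the mean of the ranks they span.
    """
    m, n = len(xs), len(ys)
    combined = sorted(list(xs) + list(ys))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # positions i..j-1 hold the same value; their 1-based ranks are i+1..j
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    w = sum(ranks[x] for x in xs)
    mean_w = m * (m + n + 1) / 2
    sd_w = math.sqrt(m * n * (m + n + 1) / 12)
    return w, (w - mean_w) / sd_w
```

For very small samples, such as three values each, the z score is only a rough guide; the slide recommends the normal approximation when the number of samples exceeds 8.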

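28 Non-parametric tests
[Table: rank, data value, and algorithm (1 or 2) for each observation in the combined sample; values not preserved]
$m = n = 10$
$w = \sum_{i=1}^{m} r_i = 118$
$Z = \frac{w - m(m + n + 1)/2}{\sqrt{mn(m + n + 1)/12}} = \frac{118 - 105}{\sqrt{175}} \approx 0.98$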

29 Non-parametric tests: Example
For the given data:
Test statistic w = 118, Z ≈ 0.98 (computed from w = 118 with m = n = 10)
p-value = (value not preserved)
Choose α = 5%: $H_0$ cannot be rejected.
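The example numbers can be plugged into the normal approximation directly. The values m = n = 10 and w = 118 are read from the preceding slide; the two-sided alternative is an assumption, since the text does not say which tail the original p-value refers to.

```python
import math
from statistics import NormalDist

m = n = 10   # sample sizes, as read from the slide
w = 118      # rank sum of the first sample

mean_w = m * (m + n + 1) / 2               # 105
sd_w = math.sqrt(m * n * (m + n + 1) / 12)  # sqrt(175)
z = (w - mean_w) / sd_w

# ASSUMPTION: two-sided alternative (H_a: mu_1 != mu_2)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject = p_two_sided <= alpha
print(f"z = {z:.2f}, reject H0: {reject}")  # z = 0.98, reject H0: False
```

The decision agrees with the slide: at α = 5% the null hypothesis cannot be rejected.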
