Outline for Today: Review of In-class Exercise; Bivariate Hypothesis Test 2: Difference of Means; Bivariate Hypothesis Test 3: Correlation


Outline for Today
1. Review of In-class Exercise
2. Bivariate hypothesis testing 2: difference of means
3. Bivariate hypothesis testing 3: correlation

Task for Next Week. Any questions?

In-class Exercise

Is regime a democracy?   GDP per capita (US$): 2 cats
               Low     High    Total
    -----------+----------------+------
    No          49       19       68
    Yes         40       69      109
    -----------+----------------+------
    Total       89       88      177

    Pearson chi2(1) = 20.9459   Pr = 0.000

Interpreting Stata Output
A p-value will always be between 0 and 1. When Stata gets a p-value smaller than 0.0005, it rounds the number and reports 0.000. So when Stata reports 0.000, what it actually means is 0 < p < 0.0005. When Stata reports 0.000, we say the p-value is smaller than 0.001.
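As a sanity check on what hides behind a rounded "0.000", the exact p-value for the exercise's chi2(1) = 20.9459 can be recovered by hand. A minimal Python sketch (Python rather than Stata, for illustration), using the fact that a chi-squared variable with 1 df is the square of a standard Normal:

```python
import math

def chi2_sf_1df(x):
    """Survival function P(chi2 >= x) for 1 degree of freedom.

    With df = 1, chi2 = Z^2 for a standard Normal Z, so
    P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

# The exact p-value behind Stata's rounded "Pr = 0.000"
p = chi2_sf_1df(20.9459)
print(p)   # roughly 5e-06, i.e. 0 < p < 0.0005, as the slide says
```

This confirms the interpretation above: the reported 0.000 stands for a p-value far below 0.0005.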

Bivariate Hypothesis Test 2: Difference of Means
Y is continuous but X is categorical. We follow the same logic and same steps as cross-tabulation analysis:
1. Form the null and alternative
2. Examine and describe the sample
3. Compare the observed and expected
4. Reject or not reject the null

Female Representation in Parliament

Female Representation and PR System
A proportional representation (PR) voting system, compared with a majority voting system, favours minority and under-represented groups in a society.
Hypothesis: Female representation is higher in countries that adopt the PR system than in countries that adopt the majority system.
What's Y? Female representation (% women in parliament)
What's X? Whether a country has a PR system or not (Yes or No)
The null hypothesis: there is no relationship between PR system and female representation

Graphical Summaries of Y by X
[Figure: percent women in lower house of parliament, 2009 (0 to 60), plotted by PR system (No vs Yes)]
[Figure: histograms (frequency 0 to 20) of percent women in lower house of parliament, 2009, by PR system (No vs Yes)]
[Figure: overlaid density plots of percent women in lower house of parliament, 2009, for the PR system and the majority system]

Numerical Summaries of Y by X

. bysort pr_sys: sum women09

-> pr_sys = No
    Variable |  Obs      Mean   Std. Dev.   Min    Max
    ---------+-----------------------------------------
     women09 |  114  14.15965   9.459815      0   43.2

-> pr_sys = Yes
    Variable |  Obs      Mean   Std. Dev.   Min    Max
    ---------+-----------------------------------------
     women09 |   66  22.38939   11.71783    5.8   56.3

. sum women09
    Variable |  Obs      Mean   Std. Dev.   Min    Max
    ---------+-----------------------------------------
     women09 |  180  17.17722   11.05299      0   56.3

Calculating the Test Statistic
It does seem that there is some relationship between X and Y in the sample. The next step is to see whether this observed relationship is sufficiently different from the relationship we would obtain if the null hypothesis were true. This involves calculating the test statistic:
In univariate analysis of the mean, we calculated the sample mean.
In cross-tabulation, we calculated the χ² statistic.
In the difference of means test, we calculate the t-statistic.

t-statistic
The t-statistic for the difference of means test is

    t = (Ȳ₁ − Ȳ₂) / se(Ȳ₁ − Ȳ₂),

where Ȳᵢ is the sample mean of Y for group i. In our example, Ȳ_PR is the mean percentage of women in parliament for countries with a PR system, and Ȳ_M is the mean percentage of women in parliament for countries with a majority system.
t is small (in absolute value) when the two mean values are similar.
The greater the se, the smaller the t-statistic, and the less confidence we have in rejecting the null.

Standard Error of the Difference
Recall that, in the univariate case, the se of the sample mean is:

    se(Ȳ) = s / √n.

In the bivariate case, the se of the difference of sample means is:

    se(Ȳ₁ − Ȳ₂) = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ] × √(1/n₁ + 1/n₂),

where nᵢ is the sample size for group i and sᵢ is the standard deviation for group i.

Standard Error of the Difference
From the numerical summaries above, let's say group 1 is the Majority System (pr_sys = No) and group 2 is the PR System (pr_sys = Yes):
n₁ = 114, Ȳ₁ = 14.15965, s₁ = 9.459815
n₂ = 66, Ȳ₂ = 22.38939, s₂ = 11.71783

    se(Ȳ₁ − Ȳ₂) = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ] × √(1/n₁ + 1/n₂)
                = √[ ((114 − 1)9.459815² + (66 − 1)11.71783²) / (114 + 66 − 2) ] × √(1/114 + 1/66)
                = 1.5995682

t-statistic
n₁ = 114, Ȳ₁ = 14.15965, s₁ = 9.459815
n₂ = 66, Ȳ₂ = 22.38939, s₂ = 11.71783
se(Ȳ₁ − Ȳ₂) = 1.5995682

    t = (Ȳ₁ − Ȳ₂) / se(Ȳ₁ − Ȳ₂) = (14.15965 − 22.38939) / 1.5995682 = −5.144976

We now need to determine how unusual (significant) the t-statistic of −5.145 is.
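The pooled standard error and t-statistic above can be reproduced directly. A minimal Python sketch (the helper function is for illustration; the numbers come from the Stata summaries above):

```python
import math

def diff_of_means_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance standard error and t-statistic for the
    difference of two means, as defined on the slides."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return se, t

# Group 1: Majority system (No), group 2: PR system (Yes)
se, t = diff_of_means_t(114, 14.15965, 9.459815,
                        66, 22.38939, 11.71783)
print(se, t)   # se ~ 1.5996, t ~ -5.145
```

Note the sign of t depends only on which group we label as group 1; swapping the groups flips the sign but not the conclusion.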

Signs of the t-statistic
As we calculated the t-statistic by setting the Majority System as group 1 and the PR system as group 2:
Our causal theory expects Ȳ₁ − Ȳ₂ to be negative, and hence the t-statistic to be < 0.
The null hypothesis expects Ȳ₁ − Ȳ₂ to be zero, and hence the t-statistic to be 0.
The smaller the t-statistic, the more confidence we have in our causal theory.
Stata does not know which alternative hypothesis you have:
Ȳ₁ > Ȳ₂
Ȳ₁ ≠ Ȳ₂
Ȳ₁ < Ȳ₂
so it reports all three results.

Sampling Distribution
Recall that the sampling distribution of the sample mean follows the Normal distribution, and the sampling distribution of the χ² statistic follows the χ² distribution. Similarly, the sampling distribution of the t-statistic follows (Student's) t-distribution. Let's learn a little bit about probability distributions.

Probability Distribution
A probability distribution is a list of probabilities assigned to possible outcomes. One way to describe a probability distribution is to identify its PMF or PDF:
Discrete (≈ categorical) variables: probability mass function (PMF)
  Bernoulli distribution: e.g., heads with probability p, tails with 1 − p
Continuous variables: probability density function (PDF)
  uniform distribution
  Normal distribution
  χ² distribution
  t distribution
The area under the curve represents the probabilities.

Coin Flips: Bernoulli Distribution
Let's say the outcome of a coin flip is our variable of interest:
Outcomes: Heads or Tails
Probabilities: Heads with probability p, Tails with 1 − p
When a variable has two possible outcomes, its probability distribution is called the Bernoulli distribution.
One probability distribution is
(1) Heads 0.5, Tails 0.5
Another probability distribution is
(2) Heads 0.1, Tails 0.9
Both (1) and (2) are proper PMFs, as they list all possible outcomes as well as the associated probabilities.
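A Bernoulli PMF like distribution (2) can be written down and simulated directly. A small Python sketch (illustrative only):

```python
import random

# A Bernoulli PMF is just the pair of probabilities for the two outcomes.
pmf = {"Heads": 0.1, "Tails": 0.9}           # distribution (2) from the slide
assert abs(sum(pmf.values()) - 1.0) < 1e-12  # a proper PMF sums to 1

# Simulate flips of this biased coin and compare the empirical
# frequency of Heads with p = 0.1.
random.seed(0)
n = 100_000
heads = sum(random.random() < pmf["Heads"] for _ in range(n))
print(heads / n)   # close to 0.1
```

With a large number of flips, the empirical frequency settles near the probability the PMF assigns, which is exactly what makes (1) and (2) "proper" probability distributions.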

Uniform Distribution
Let's say a random variable x is distributed continuously and uniformly from 0 to 10:
Outcomes: any number between 0 and 10.
As there are an infinite number of values between 0 and 10, we cannot list all values and their associated probabilities. Instead, we describe the probability distribution with a graph of the PDF.
[Figure: PDF of the Uniform(0, 10) distribution, flat at 0.10 between 0 and 10]

Uniform Distribution
With a graph of the PDF, we can calculate the probability that x takes a certain range of values by calculating the area under the curve.
What's the probability that 0 < x < 10?
What's the probability that x < 0?
What's the probability that x > 10?
What's the probability that x < 5?
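For a flat density, area under the curve is just the area of a rectangle, so all four questions can be answered with one small helper. A minimal Python sketch (the `uniform_prob` helper is hypothetical, for illustration):

```python
# For Uniform(0, 10) the density is flat at 1/10, so the probability of
# an interval is its width (clipped to the support [0, 10]) divided by 10.
def uniform_prob(a, b, lo=0.0, hi=10.0):
    """P(a < x < b) for x ~ Uniform(lo, hi)."""
    left = max(a, lo)
    right = min(b, hi)
    return max(right - left, 0.0) / (hi - lo)

print(uniform_prob(0, 10))   # 1.0  (the whole support)
print(uniform_prob(-5, 0))   # 0.0  (x < 0 is impossible)
print(uniform_prob(10, 15))  # 0.0  (x > 10 is impossible)
print(uniform_prob(0, 5))    # 0.5  (half the support)
```

These are exactly the answers to the quiz questions above: 1, 0, 0, and 0.5.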

Normal Distribution
Let's say a random variable x is distributed Normally with mean 0 and standard deviation 1:
Outcomes: any number between −∞ and ∞.
Once again, as there are an infinite number of values, we cannot list all values and their associated probabilities.
[Figure: PDF of the Normal distribution with mean 0 and standard deviation 1]

Normal Distribution
The area under the curve represents probabilities:
What's the probability that −∞ < x < ∞?
What's the probability that x < 0?
What's the probability that x > 0?
What's the probability that −1 < x < 1?
What's the probability that −2 < x < 2?
What's the probability that −3 < x < 3?

Normal Distribution
The probability that −∞ < x < ∞ is 1.
The probability that x < 0 is 0.5.
The probability that x > 0 is also 0.5.
The probability that −1 < x < 1 is approximately 0.68.
The probability that −2 < x < 2 is approximately 0.95.
The probability that −3 < x < 3 is approximately 0.99.
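The 0.68 / 0.95 / 0.99 figures can be checked with the error function, since the standard Normal CDF is Φ(z) = ½(1 + erf(z/√2)). A small Python sketch:

```python
import math

def std_normal_prob(a, b):
    """P(a < x < b) for a standard Normal variable, via the
    error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi(b) - phi(a)

print(round(std_normal_prob(-1, 1), 4))  # 0.6827
print(round(std_normal_prob(-2, 2), 4))  # 0.9545
print(round(std_normal_prob(-3, 3), 4))  # 0.9973
```

So the slide's "approximately 0.68, 0.95, 0.99" are the familiar 68-95-99.7 rule rounded to two decimals.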

Normal Distribution
We can also ask the question the other way around:
What is the interval of x outside which the probability of observing a value as extreme (i.e., further away from 0) is equal to 0.05?

χ² Distribution
Recall:
The χ² statistic is always positive.
χ² = 0 if the null hypothesis is true.
The shape of the χ² distribution depends on the degrees of freedom (df).
[Figure: χ² density curves for df = 1, 2, 3, 5, and 10]

Area Under the Curve = Probability
The χ² statistic is always ≥ 0, so the probability that χ² ≥ 0 is 1.
[Figure: χ² density with df = 2; the shaded area under the curve for χ² ≥ c represents the probability that χ² ≥ c, shown successively for c = 0, 1, 2, 3, 4, 5, and 6]
Area under the curve for χ² ≥ c = probability that χ² ≥ c = the p-value we obtain when the χ² statistic is equal to c.

Area Under the Curve = Probability
Statistical tables (e.g., page 295):
tell us the values of χ² at which the p-value is equal to certain values
tell us the range of p-values for given values of χ² we obtain
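Table entries can also be reproduced numerically. For df = 2 the χ² survival function has a simple closed form, P(χ² ≥ c) = exp(−c/2); a hedged Python sketch under that assumption (the closed form holds only for df = 2):

```python
import math

# For df = 2: P(chi2 >= c) = exp(-c / 2).
def chi2_sf_2df(c):
    return math.exp(-c / 2)

# The p-value at the familiar df = 2 critical value of about 5.99:
print(round(chi2_sf_2df(5.99), 4))       # about 0.05

# Inverting: the critical value at which the p-value equals alpha.
alpha = 0.05
print(round(-2 * math.log(alpha), 3))    # 5.991
```

This is exactly what a statistical table does in both directions: from a χ² value to a p-value, and from a target p-value back to a critical value.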

t-distribution
Let's go back to the example: we obtained a t-statistic of −5.145.
Unlike the χ² statistic, the t-statistic can take both negative and positive values.
Just like the χ² distribution, the shape of the t distribution depends on the degrees of freedom (df).
When df is greater than 30, it is almost indistinguishable from the Normal distribution with mean 0 and sd = 1.

t-distribution
[Figure: t distribution densities for df = 1, 2, 3, 5, 10, and 30, overlaid on the Normal distribution (in gray); as df grows, the t density becomes indistinguishable from the Normal]

t-statistic
We obtained a t-statistic of −5.145.
The degrees of freedom for a difference of means test is n₁ + n₂ − 2 = 114 + 66 − 2 = 178.

t-statistic
[Figure: t-distribution with degrees of freedom = 178]
The t-statistic is 0 if the null is true.

Quiz
How do we calculate the probability that t < −1? (How do we calculate the p-value when the t statistic is −1?)
The area under the curve for t < −1 gives the probability that t < −1. This is the p-value when the t statistic is −1.
The area under the curve for t < −2 gives the probability that t < −2. This is the p-value when the t statistic is −2.

The t-statistic was −5.145
The p-value is obtained by calculating the area under the curve for t < −5.145. You can't really see it in the graph, as it's so tiny.
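Since df = 178 is far above 30, the Normal approximation noted earlier lets us put a number on that tiny area without t tables. A hedged Python sketch (approximating the t distribution with the standard Normal):

```python
import math

# With df = 178 the t distribution is essentially Normal, so we
# approximate the one-sided p-value P(T < -5.145) with the standard
# Normal CDF, Phi(z) = 0.5 * erfc(-z / sqrt(2)).
def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

p_one_sided = normal_cdf(-5.145)
print(p_one_sided)   # on the order of 1e-07, far below 0.0001
```

This is why Stata reports Pr(T < t) = 0.0000 for this test: the exact area is minuscule.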

Reading Statistical Tables
Another way to obtain the p-value is to refer to the statistical table on page 296. This is a bit tricky, as the cell entries are represented in absolute values. For example, the condition |t| > 2 means the absolute value of t is greater than 2.

Reading Statistical Tables
Let's say our t-statistic were −2 (with df = 100):
What is the minimum p-value we can get?
What is the maximum level of confidence we can get?

Reading Statistical Tables
As our t-statistic is −5.145 (with df = 178):
What is the minimum p-value we can get?
What is the maximum level of confidence we can get?

Reading Stata Output

    Group |  Obs      Mean   Std. Err.  Std. Dev.  [95% Conf. Interval]
 ---------+------------------------------------------------------------
       No |  114  14.15965    .8859928   9.459815   12.40434   15.91496
      Yes |   66  22.38939    1.442365   11.71783   19.50879   25.26999
 ---------+------------------------------------------------------------
 combined |  180  17.17722    .8238416   11.05299   15.55153   18.80291
 ---------+------------------------------------------------------------
     diff |      -8.229745    1.599568             -11.3863   -5.073188

     diff = mean(No) - mean(Yes)                        t = -5.1450
     Ho: diff = 0                           degrees of freedom = 178

     Ha: diff < 0          Ha: diff != 0            Ha: diff > 0
  Pr(T < t) = 0.0000   Pr(|T| > |t|) = 0.0000   Pr(T > t) = 1.0000

Our causal theory expects diff to be negative. The t-statistic is −5.1450, which yields a p-value < 0.0001. We reject the null hypothesis that there is no difference in mean levels of female representation between the PR system and the majority system.

Bivariate Hypothesis Test 3: Correlation Analysis
Both Y and X are continuous. We follow the same logic and same steps as cross-tabulation analysis and the difference of means test:
1. Form the null and alternative
2. Examine and describe the sample
3. Compare the observed and expected
4. Reject or not reject the null

Labour Rights Protection and Unionisation
As more labourers join unions, unions can exert greater influence on the government. Therefore, the greater the level of unionisation, the better the protection of labour rights.
Hypothesis: There will be a positive relationship between levels of unionisation and labour rights protection.
Y: Labour rights protection (0-100)
X: Level of unionisation (% of labourers in unions: 0-100)
The null hypothesis: there is no relationship between the two

Describe the Relationship in the Sample: Do It Graphically
[Figure: scatter plot of the Heritage Foundation labor freedom rating (20 to 100) against union density (0 to 100)]

Describe the Relationship in the Sample: Do It Numerically
The covariance between X and Y is:

    cov_XY = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / n

(Xᵢ − X̄)(Yᵢ − Ȳ) is positive when (Xᵢ − X̄) and (Yᵢ − Ȳ) are both positive or both negative.
(Xᵢ − X̄)(Yᵢ − Ȳ) is negative when (Xᵢ − X̄) and (Yᵢ − Ȳ) have different signs.

Covariance

    cov_XY = ( Σ (Xᵢ − X̄)(Yᵢ − Ȳ) ) / n

[Figure: scatter plot of the Heritage Foundation labor freedom rating against union density, illustrating the sign of each (Xᵢ − X̄)(Yᵢ − Ȳ) term]

From Covariance to Correlation
Covariance is bounded:

    −√(s²_X s²_Y) < cov_XY < √(s²_X s²_Y)

Correlation coefficient:

    r = cov_XY / √(s²_X s²_Y),    −1 < r < 1

The t statistic for correlation is

    t = r √(n − 2) / √(1 − r²),   with df = n − 2.

A p-value for the correlation can then be obtained.
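The chain from covariance to r to its t statistic can be sketched end to end. A minimal Python example on toy data (the helper function and the data are hypothetical, for illustration only):

```python
import math

def correlation_t(x, y):
    """Pearson r and its t-statistic t = r*sqrt(n-2)/sqrt(1-r^2),
    built from the covariance formula on the slides."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    r = cov / (sx * sy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t

# Toy data with a clear positive association
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r, t = correlation_t(x, y)
print(round(r, 3), round(t, 3))   # r ~ 0.829, t ~ 2.96
```

Note that the n in the covariance cancels out of r, which is why r is a pure, unit-free measure of association.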

Notes on r
r is equal to 1 (and the p-value < 0.0001) when all the observations are aligned on a single line with a positive slope.
However, greater values of r (or smaller values of p) do not suggest a stronger relationship between X and Y. All of the following three samples generate r = 1.
[Figure: three scatter plots (w vs x, y vs x, z vs x, each on 0-1 axes); in each, all points lie exactly on a positively sloped line, and all three have r = 1]

Reading Stata Output

              | free_l~r   unions
 -------------+-------------------
   free_labor |   1.0000
       unions |   0.1781   1.0000
              |   0.0913

r = 0.1781; the p-value is 0.0913.
The correlation is positive (as expected by our causal theory) and statistically significant at the 10% significance level (we reject the null hypothesis at the 90% confidence level). We fail to reject the null hypothesis at the conventional 95% confidence level.

Summary
Bivariate hypothesis tests:
Cross-tabular analysis (χ² test)
Difference of means analysis (t test)
Correlation analysis (t test)
Steps:
1. Form the null and alternative hypotheses
2. Describe the pattern: calculate some statistic
3. Compare the obtained statistic with some threshold value
4. When the obtained statistic is greater/smaller than the threshold, we reject the null
Establishing a bivariate relationship is only the starting point!