Review of Statistics
Topics: Descriptive statistics (mean, variance); Probability (union and joint events); Random variables (discrete and continuous distributions, moments); Two random variables (covariance and correlation); Central limit theorem; Hypothesis testing (z-test, p-value); Simple linear regression
Statistical Methods Statistical Methods Descriptive Statistics Inferential Statistics
Descriptive Statistics Involves collecting data, presenting data, and characterizing data. Purpose: describe data. [Bar chart: quarterly values (1st-4th Qtr) by region (East, West, North)]
Inferential Statistics Involves Estimation Hypothesis Testing Population? Purpose Make Decisions About Population Characteristics
Descriptive Statistics
Mean Measure of central tendency. Acts as balance point. Affected by extreme values (outliers). Formula: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ = (X₁ + X₂ + … + Xₙ) / n
Median Measure of central tendency Middle value in ordered sequence If odd n, Middle Value of Sequence If even n, Average of 2 Middle Values Value that splits the distribution into two halves Not Affected by Extreme Values
Median (Example) Raw Data: 17 16 21 18 13 16 12 11 Ordered: 11 12 13 16 16 17 18 21 Position: 1 2 3 4 5 6 7 8 Median = (16 + 16) / 2 = 16
Mode Measure of Central Tendency Value That Occurs Most Often Not Affected by Extreme Values There May Be Several Modes Raw Data: 17 16 21 18 13 16 12 11 Ordered: 11 12 13 16 16 17 18 21
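The three measures of central tendency above can be computed with Python's standard statistics module; a quick sketch using the slide's raw data:

```python
import statistics

# Raw data from the slides
data = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(data)      # balance point; affected by outliers
median = statistics.median(data)  # middle of the ordered sequence
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 15.5 16.0 16
```

Note that the mean (15.5) is pulled toward the extremes, while the median and mode (both 16) are not.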
Sample Variance S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1) = [(X₁ − X̄)² + (X₂ − X̄)² + … + (Xₙ − X̄)²] / (n − 1). Note: n − 1 in the denominator! (Use n for the population variance.)
Sample Standard Deviation S = √S² = √[ Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1) ] = √{ [(X₁ − X̄)² + (X₂ − X̄)² + … + (Xₙ − X̄)²] / (n − 1) }
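The standard library implements exactly these sample formulas; a sketch on the same data, with a manual check against the slide's formula:

```python
import statistics

data = [17, 16, 21, 18, 13, 16, 12, 11]

# statistics.variance / statistics.stdev use the sample formulas with
# n - 1 in the denominator; pvariance / pstdev use n (population versions).
s2 = statistics.variance(data)
s = statistics.stdev(data)

# Manual check against the slide's formula
xbar = sum(data) / len(data)
manual = sum((x - xbar) ** 2 for x in data) / (len(data) - 1)
assert abs(s2 - manual) < 1e-9
```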
Probability
Event, Sample Space Event: one possible outcome. Sample space: the collection of all possible events, S = { }. Probability of an outcome: the proportion of times the outcome occurs in the long run. Complement of event A (symbol: Ā): all outcomes that are not part of event A.
Properties of an Event 1. Mutually exclusive: two outcomes that cannot occur at the same time. Experiment: observe the gender of one person. 2. Collectively exhaustive: one outcome in the sample space must occur.
Joint Events Joint event: event that has two or more characteristics. A ∩ B means the intersection of event (set) A and event (set) B. Example: A and B (A ∩ B): female AND under age 20.
Compound Events Union of event A and event B (A ∪ B): the total area of the two circles. A ∪ B contains all outcomes that are part of event (set) A, part of event (set) B, or part of both A and B. ∪ means the union of event A and event B.
Compound Probability Addition rule, used to get compound probabilities for unions of events: P(A OR B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B). For mutually exclusive events: P(A OR B) = P(A ∪ B) = P(A) + P(B).
Random variables Random variable numerical summary of a random outcome a function that assigns a numerical value to each simple event in a sample space Discrete or continuous random variables Discrete: only a discrete set of possible values => summarized by probability distribution: list of all possible values of the variables and the probability that each value will occur. Continuous: continuum of possible values => summarized by the probability density function (pdf)
Discrete Probability Distribution 1. A list of pairs [Xᵢ, P(Xᵢ)], where Xᵢ = value of the random variable (outcome) and P(Xᵢ) = probability associated with that value. 2. Mutually exclusive (no overlap). 3. Collectively exhaustive (nothing left out). 4. 0 ≤ P(Xᵢ) ≤ 1. 5. Σ P(Xᵢ) = 1.
Joint Probability Using Contingency Table

Event    B1           B2           Total
A1       P(A1 ∩ B1)   P(A1 ∩ B2)   P(A1)
A2       P(A2 ∩ B1)   P(A2 ∩ B2)   P(A2)
Total    P(B1)        P(B2)        1

The cell entries are joint probabilities (the joint distribution); the row and column totals are marginal probabilities (the marginal distributions). Conditional probability (conditional distribution): P(A1 | B1) = P(A1 ∩ B1) / P(B1), and similarly P(A2 | B1) = P(A2 ∩ B1) / P(B1).
Contingency Table Example Joint event: draw 1 card; note kind and color.

Type     Red     Black   Total
Ace      2/52    2/52    4/52
Non-Ace  24/52   24/52   48/52
Total    26/52   26/52   52/52

P(Ace) = 4/52 and P(Red) = 26/52 are marginal probabilities; P(Ace AND Red) = 2/52 is a joint probability.
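As a quick check, the addition rule applied to this table can be sketched with exact fractions (illustrative code, not from the slides):

```python
from fractions import Fraction

# Probabilities read off the contingency table
p_ace = Fraction(4, 52)
p_red = Fraction(26, 52)
p_ace_and_red = Fraction(2, 52)  # joint probability

# Addition rule: P(Ace OR Red) = P(Ace) + P(Red) - P(Ace AND Red)
p_ace_or_red = p_ace + p_red - p_ace_and_red
print(p_ace_or_red)  # 7/13, i.e. 28/52
```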
Moments Discrete Case Moment: summary of a certain aspect of a distribution. Mean (expected value): mean of the probability distribution; the weighted average of all possible values: μ = E(X) = Σ Xᵢ P(Xᵢ). Variance: weighted average squared deviation about the mean: σ² = E[(Xᵢ − μ)²] = Σ (Xᵢ − μ)² P(Xᵢ)
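These two formulas translate directly into code. A small sketch using a fair six-sided die as the distribution (an illustrative example, not from the slides):

```python
# Mean and variance of a discrete distribution: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# mu = E(X) = sum of X_i * P(X_i)
mu = sum(x * p for x, p in zip(values, probs))

# sigma^2 = E[(X - mu)^2] = sum of (X_i - mu)^2 * P(X_i)
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

print(mu, var)  # approximately 3.5 and 35/12 ≈ 2.9167
```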
Statistical Independence When the outcome of one event (B) does not affect the probability of occurrence of another event (A), the events A and B are said to be statistically independent. Example: toss a coin twice => no causality. Condition for independence: two events A and B are statistically independent if and only if (iff) P(A | B) = P(A)
Bayes' Theorem and Multiplication Rule Bayes' theorem: P(A | B) = P(A ∩ B) / P(B). The difficult part is P(A ∩ B); use the multiplication rule to derive it: P(A and B) = P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B). For independent events: P(A and B) = P(A ∩ B) = P(A) P(B)
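A numeric sketch of the multiplication rule and Bayes' theorem; the events and all three input probabilities below are invented purely for illustration:

```python
from fractions import Fraction

# Hypothetical inputs (assumptions, chosen only for this example)
p_b = Fraction(1, 100)             # P(B)
p_a_given_b = Fraction(9, 10)      # P(A | B)
p_a_given_notb = Fraction(5, 100)  # P(A | not B)

# Multiplication rule: P(A and B) = P(B) P(A | B)
p_a_and_b = p_b * p_a_given_b

# Total probability: P(A) = P(A and B) + P(A and not B)
p_a = p_a_and_b + (1 - p_b) * p_a_given_notb

# Bayes: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 2/13
```

Even with P(A | B) = 0.9, the posterior P(B | A) is only 2/13 ≈ 0.15 because B itself is rare; this is the standard base-rate effect.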
Covariance Measures the joint variability of two random variables: σ_XY = Σᵢ₌₁ᴺ (Xᵢ − μ_X)(Yᵢ − μ_Y) P(Xᵢ, Yᵢ). Can take any value in the real numbers; depends on units of measurement (e.g., dollars, cents, billions of dollars). Example: positive covariance means Y and X are positively related; when Y is above its mean, X tends to be above its mean, and when Y is below its mean, X tends to be below its mean.
Correlation Standardized covariance; takes values in [−1, 1] and does not depend on units of measurement. Correlation coefficient: ρ = Cov(X, Y) / (σ_X σ_Y) = σ_XY / (σ_X σ_Y). Covariance and correlation measure only linear dependence! Example: Cov(X, Y) = 0 does not necessarily imply that Y and X are independent; they may be non-linearly related. But if X and Y are jointly normally distributed and Cov(X, Y) = 0, then they are independent.
Sum of Two Random Variables Expected value of the sum of two random variables: E(X + Y) = E(X) + E(Y). Variance of the sum of two random variables: Var(X + Y) = σ²_{X+Y} = σ²_X + σ²_Y + 2σ_XY
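The covariance, correlation, and variance-of-a-sum formulas can be verified on sample data (the paired values below are invented for illustration; sample versions with n − 1 denominators are used):

```python
import math

# Illustrative paired samples
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance (n - 1 denominator) and correlation
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
rho = cov / (sx * sy)

# Check: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
s = [a + b for a, b in zip(x, y)]
ms = sum(s) / n
var_sum = sum((v - ms) ** 2 for v in s) / (n - 1)
assert abs(var_sum - (sx ** 2 + sy ** 2 + 2 * cov)) < 1e-9
```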
Continuous Probability Distributions - Normal Distribution Bell-shaped and symmetric; mean, median, and mode are equal; infinite range. About 68% of the data lie within 1 standard deviation of the mean and about 95% within 2 standard deviations. In the early 1800s the German mathematician and physicist Carl Friedrich Gauss used it to analyze astronomical data; it is therefore also known as the Gaussian distribution. [Figure: bell-shaped density f(x); mean = median = mode]
Normal Distribution Probability Density Function f(X) = (1 / (σ√(2π))) exp[ −(1/2)((X − μ)/σ)² ], where f(X) = density of the random variable X; π ≈ 3.14159; e ≈ 2.71828; σ = population standard deviation; X = value of the random variable (−∞ < X < ∞); μ = population mean
Effect of Varying Parameters (μ & σ) [Figure: three normal densities A, B, C showing how changing μ shifts the curve and changing σ widens or narrows it]
Normal Distribution Probability Probability is the area under the curve! P(c ≤ X ≤ d) = ∫_c^d f(x) dx
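The area under the normal curve has no closed form, but it is a standard result that the normal CDF can be written via the error function, so P(c ≤ X ≤ d) can be sketched as:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), via the standard error-function identity."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_prob(c, d, mu=0.0, sigma=1.0):
    """P(c <= X <= d): the area under f(x) between c and d."""
    return normal_cdf(d, mu, sigma) - normal_cdf(c, mu, sigma)

# Sanity checks against the 68% / 95% rules of thumb
print(round(normal_prob(-1, 1), 4))  # 0.6827
print(round(normal_prob(-2, 2), 4))  # 0.9545
```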
Infinite Number of Normal Distribution Tables Normal distributions differ by mean and standard deviation, so each distribution would require its own table. That's an infinite number!
Standardize the Normal Distribution Transform a normal distribution with mean μ and standard deviation σ via Z = (X − μ)/σ into the standardized normal distribution, which has μ_Z = 0 and σ_Z = 1. One table!
Standardizing Example Normal distribution with μ = 5, σ = 10: Z = (X − μ)/σ = (6.2 − 5)/10 = 0.12. The value X = 6.2 corresponds to Z = 0.12 on the standardized normal distribution (σ_Z = 1).
Moments: Mean, Variance (Continuous Case) Mean (expected value): mean of the probability distribution; weighted average of all possible values: μ = E(X) = ∫_{−∞}^{∞} x f(x) dx. Variance: weighted average squared deviation about the mean: σ² = E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² f(x) dx
Moments: Skewness, Kurtosis Skewness: S = E[(X − μ)³] / σ³. Measures asymmetry in the distribution: the larger the absolute size of the skewness, the more asymmetric the distribution. A large positive value indicates a long right tail, a large negative value indicates a long left tail, and zero indicates symmetry around the mean. Kurtosis: K = E[(X − μ)⁴] / σ⁴. Measures the thickness of the tails of a distribution: a kurtosis above three indicates fat tails, or leptokurtosis, relative to the normal, i.e. extreme events are more likely to occur.
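Sample analogues of these moments can be sketched as follows (the population-moment versions, without the small-sample bias corrections some packages apply; the sample data is invented):

```python
import math

def skewness_kurtosis(data):
    """Sample versions of S = E[(X-mu)^3]/sigma^3 and K = E[(X-mu)^4]/sigma^4."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    s = sum((x - mu) ** 3 for x in data) / (n * sigma ** 3)
    k = sum((x - mu) ** 4 for x in data) / (n * sigma ** 4)
    return s, k

# A symmetric sample has skewness 0
s, k = skewness_kurtosis([1, 2, 3, 4, 5])
print(s)  # 0.0
```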
Central Limit Theorem: Basic Idea As the sample size gets large (n ≥ 30), the sample mean X̄ will have an approximately normal distribution.
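A quick simulation illustrates the idea: sample means of draws from a decidedly non-normal distribution (uniform) cluster around the population mean with the spread the CLT predicts. The sample size and repetition count below are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Sample means of n = 30 draws from a uniform(0, 1) distribution
n, reps = 30, 2000
means = [statistics.mean(random.random() for _ in range(n)) for _ in range(reps)]

# By the CLT, the sample mean is approximately N(0.5, 1/(12 n)):
# uniform(0, 1) has mean 0.5 and variance 1/12.
grand_mean = statistics.mean(means)   # close to 0.5
spread = statistics.stdev(means)      # close to sqrt(1/(12*30)) ≈ 0.0527
```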
Important Continuous Distributions All derived from the normal distribution. χ² distribution: arises from squared normal random variables. t distribution: arises from ratios of normal and χ² variables. F distribution: arises from ratios of χ² variables. [Figures: χ² distribution; t distribution (red) vs. normal distribution (blue); F distribution]
Fundamentals of Hypothesis Testing
Identifying Hypotheses 1. Pose the question, e.g. test whether the population mean is equal to 3. 2. State the question statistically: H0: μ = 3. 3. State its opposite statistically: H1: μ ≠ 3. Hypotheses are mutually exclusive & exhaustive; sometimes it is easier to form the alternative hypothesis first. 4. Choose the level of significance α; typical values are 0.01, 0.05, 0.10. The rejection region of the sampling distribution contains the values of the sample statistic that are unlikely if the null hypothesis is true.
Identifying Hypotheses: Examples 1. Is the population average amount of TV viewing 12 hours? H0: μ = 12, H1: μ ≠ 12. 2. Is the population average amount of TV viewing different from 12 hours? H0: μ = 12, H1: μ ≠ 12.
Hypothesis Testing: Basic Idea It is unlikely that we would get a sample mean of, say, 20 if the population mean were in fact 50. Therefore, we reject the null hypothesis that μ = 50. [Figure: sampling distribution centered at μ = 50 under H0, with the observed sample mean of 20 far out in the tail]
Example: Z-test statistic (σ known) 1. Convert the sample statistic X̄ to a standardized Z variable: Z = (X̄ − μ_X̄) / σ_X̄ = (X̄ − μ) / (σ/√n). 2. Compare to the critical Z values: if the Z-test statistic falls in the critical region, reject H0; otherwise do not reject H0.
p-value Probability of obtaining a test statistic at least as extreme (≤ or ≥) as the actual sample value, given that H0 is true. It is the smallest value of α for which H0 can be rejected, and it is used to make the rejection decision: if p-value ≥ α, do not reject H0; if p-value < α, reject H0.
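The z-test and its two-sided p-value can be sketched using the error-function form of the normal CDF; the sample numbers in the call are invented for illustration:

```python
import math

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test with known population sigma."""
    # Z = (xbar - mu0) / (sigma / sqrt(n))
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via the error function
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    # Two-sided p-value: P(|Z| >= |z|) under H0
    p = 2.0 * (1.0 - phi)
    return z, p

# Illustrative numbers: xbar = 52, H0: mu = 50, sigma = 10, n = 100
z, p = z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
print(z)  # 2.0
# p ≈ 0.0455: reject H0 at alpha = 0.05, but not at alpha = 0.01
```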
One-Tailed Test: Rejection Region Left-tailed: H0: μ ≥ 0, H1: μ < 0; reject H0 in the lower tail of size α (the statistic must be significantly below μ). Right-tailed: H0: μ ≤ 0, H1: μ > 0; reject H0 in the upper tail of size α (here, small values don't contradict H0).
One-Tailed Z Test: Finding Critical Z Values What is Z given α = 0.025? The area between 0 and the critical value is 0.500 − 0.025 = 0.475; looking up 0.475 in the standardized normal probability table (σ_Z = 1) gives the critical value Z = 1.96.
Two-Tailed Test: Rejection Regions H0: μ = 0, H1: μ ≠ 0. The sampling distribution has two rejection regions, each of area α/2, beyond the critical values on either side of the H0 value; the nonrejection region between them has area 1 − α (the level of confidence).
t-test, F-test The test statistic may not be normally distributed, in which case the z-test is not applicable. Examples: the variance is unknown and must be estimated, or the hypothesis that the slope of a regression line differs significantly from zero => t-test. The hypothesis that the standard deviations of two normally distributed populations are equal => F-test.
Jarque-Bera test Assesses whether a given sample of data is normally distributed; aggregates information in the data about both skewness and kurtosis. It tests the hypothesis that S = 0 and K = 3, based on the sample estimates Ŝ and K̂. Test statistic: JB = (T/6) [ Ŝ² + (K̂ − 3)²/4 ], where T is the number of observations. Under the null hypothesis of independent normally distributed observations, the Jarque-Bera statistic is distributed in large samples as a χ² random variable with 2 degrees of freedom.
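A direct sketch of the statistic, reusing the sample skewness and kurtosis formulas from earlier:

```python
import math

def jarque_bera(data):
    """JB = (T/6) * (S^2 + (K - 3)^2 / 4), with sample skewness S and kurtosis K."""
    t = len(data)
    mu = sum(x for x in data) / t
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / t)
    s = sum((x - mu) ** 3 for x in data) / (t * sigma ** 3)
    k = sum((x - mu) ** 4 for x in data) / (t * sigma ** 4)
    return (t / 6.0) * (s ** 2 + (k - 3.0) ** 2 / 4.0)

jb = jarque_bera([1, 2, 3, 4, 5])  # symmetric toy sample
```

In practice the resulting JB value is compared with a χ²(2) critical value (5.99 at the 5% level); large JB values reject normality.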
Simple Linear Regression
Simple Linear Regression Model y_t = β₀ + β₁ x_t + ε_t, where y_t is the dependent (response) variable, x_t is the independent (explanatory) variable, β₀ is the y-intercept, β₁ is the slope, and ε_t is a random error with ε_t ~ iid(0, σ²).
Linear Regression Assumptions 1. x is exogenously determined. 2. ε_t are iid(0, σ²) (iid = independently and identically distributed): zero mean, independence of errors (no autocorrelation), constant variance (homoscedasticity). More things to think about: normality of ε_t (if not satisfied, inference procedures are only asymptotically valid); model specification (e.g. linearity, is β₁ constant over time?).
Simple Linear Regression Model y_t = β₀ + β₁ x_t + ε_t, where ε_t is the disturbance. The population regression line gives the conditional expectation E(y | x*) = β₀ + β₁ x*. [Figure: observed values scattered around the population regression line]
Sample Linear Regression Model y_i = b₀ + b₁ x_i + e_i, where e_i is the random error. The fitted line is ŷ_i = b₀ + b₁ x_i. [Figure: observed values and an unsampled observation scattered around the fitted line]
Ordinary Least Squares For the model y_t = β₀ + β₁ x_t + ε_t, OLS minimizes the sum of squared residuals: min over (β̂₀, β̂₁) of Σ_{t=1}^{T} (y_t − β̂₀ − β̂₁ x_t)² = Σ_{t=1}^{T} (y_t − ŷ_t)² = Σ_{t=1}^{T} e_t², where ŷ_t = β̂₀ + β̂₁ x_t is the predicted (fitted) value, the in-sample forecast. [Figure: residuals e₁, …, e₄ as vertical distances between the observations and the fitted line]
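The minimization above has the familiar closed-form solution, which can be sketched as (the data points are invented and lie exactly on y = 1 + 2x so the answer is obvious):

```python
def ols_fit(x, y):
    """OLS estimates b0, b1 for y = b0 + b1 * x (closed-form solution)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    # Intercept: line passes through the point of means
    b0 = my - b1 * mx
    return b0, b1

# On exactly linear data, OLS recovers the line y = 1 + 2x
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```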
On Thursday: Evaluating the Model 1. Examine variation measures: the coefficient of determination ("goodness of fit") and the standard error of the estimate. 2. Analyze the residuals e: serial correlation. 3. Test the coefficients β for significance. Fitted line: ŷ_t = β̂₀ + β̂₁ x_t.
Random Error Variation 1. Variation of actual Y from predicted Y. 2. Measured by the standard error of the estimate, the sample standard deviation of e, denoted S_YX. 3. Affects several factors: parameter significance and prediction accuracy.
Measures of Variation in Regression 1. Total sum of squares (SST): measures variation of the observed Yᵢ around the mean Ȳ. 2. Explained variation (SSR): variation due to the relationship between X and Y. 3. Unexplained variation (SSE): variation due to other factors.
Variation Measures [Figure: for each observation around the fitted line Ŷᵢ = b₀ + b₁Xᵢ, the total sum of squares Σ(Yᵢ − Ȳ)² decomposes into the explained sum of squares Σ(Ŷᵢ − Ȳ)² and the unexplained sum of squares Σ(Yᵢ − Ŷᵢ)²]
Coefficient of Determination Proportion of variation explained by the relationship between X and Y: r² = explained variation / total variation = SSR / SST = [ b₀ Σᵢ₌₁ⁿ Yᵢ + b₁ Σᵢ₌₁ⁿ XᵢYᵢ − n(Ȳ)² ] / [ Σᵢ₌₁ⁿ Yᵢ² − n(Ȳ)² ], with 0 ≤ r² ≤ 1.
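Equivalently, r² = 1 − SSE/SST, which is the easier form to compute; a sketch (the fitted coefficients passed in are assumed to come from an OLS fit, and the data is invented):

```python
def r_squared(x, y, b0, b1):
    """r^2 = SSR / SST = 1 - SSE / SST for a fitted line y_hat = b0 + b1 * x."""
    my = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - my) ** 2 for yi in y)            # total variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    return 1.0 - sse / sst

# A perfect linear fit explains all the variation
print(r_squared([1, 2, 3], [2, 4, 6], 0.0, 2.0))  # 1.0
```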
Coefficients of Determination (r²) and Correlation (r) [Figure: four scatter plots with fitted lines Ŷᵢ = b₀ + b₁Xᵢ: r² = 1, r = +1 (perfect positive fit); r² = 1, r = −1 (perfect negative fit); r² = 0.8, r = +0.9 (strong positive fit); r² = 0, r = 0 (no linear relationship)]
Standard Error of Estimate S_YX = √[ Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² / (n − 2) ] = √[ ( Σᵢ₌₁ⁿ Yᵢ² − b₀ Σᵢ₌₁ⁿ Yᵢ − b₁ Σᵢ₌₁ⁿ XᵢYᵢ ) / (n − 2) ]
Residual Analysis 1. Graphical analysis of residuals: plot residuals vs. Xᵢ values. Residuals are the estimated errors, the difference between actual Yᵢ and predicted Ŷᵢ. 2. Purposes: examine the functional form (linear vs. non-linear model) and evaluate violations of assumptions.
Test of Slope Coefficient for Significance 1. Tests whether there is a linear relationship between X and Y. 2. Hypotheses: H0: β₁ = 0 (no linear relationship); H1: β₁ ≠ 0 (linear relationship). 3. Test statistic: t = (b₁ − β₁) / S_b₁ with n − 2 degrees of freedom, where S_b₁ = S_YX / √( Σᵢ₌₁ⁿ Xᵢ² − n(X̄)² ).
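Putting the last two slides together, the t-statistic for H0: β₁ = 0 can be sketched as follows (the fitted coefficients are assumed to come from an OLS fit, and the data is invented):

```python
import math

def slope_t_stat(x, y, b0, b1):
    """t = b1 / S_b1 for H0: beta1 = 0, with n - 2 degrees of freedom."""
    n = len(x)
    mx = sum(x) / n
    y_hat = [b0 + b1 * xi for xi in x]
    # Standard error of the estimate: S_YX = sqrt(SSE / (n - 2))
    s_yx = math.sqrt(sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2))
    # Standard error of the slope: S_b1 = S_YX / sqrt(sum(x^2) - n * xbar^2)
    s_b1 = s_yx / math.sqrt(sum(xi ** 2 for xi in x) - n * mx ** 2)
    return b1 / s_b1

# Illustrative data with OLS fit b0 = 0.5, b1 = 1.4
t = slope_t_stat([1, 2, 3, 4], [2, 3, 5, 6], 0.5, 1.4)
```

The resulting t is then compared with the critical value of the t distribution with n − 2 degrees of freedom at the chosen α.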