STAT 4385 Topic 01: Introduction & Review

Xiaogang Su, Ph.D.
Department of Mathematical Science
University of Texas at El Paso
xsu@utep.edu
Spring, 2016

Outline
- Welcome
- What is Regression Analysis?
- Basics
  - Variable Types
  - Expectation and Variance
  - Basic Probability Rules
- Probability Distributions
  - Binomial Distribution
  - Normal Distribution
  - Sampling Distributions
- Statistical Inference
  - Estimation
    - Example: Confidence Interval for µ
  - Hypothesis Testing
    - Example: Hypothesis Testing on µ

Welcome Message and Reminders
- Self-introduction
- Class website: https://sites.google.com/site/xgsu00/stat4385
- Review syllabus
- Statistical computing with R: http://cran.us.r-project.org/
- Questions and concerns

What is Regression Analysis?
Regression Analysis in General
Regression analysis refers to a set of statistical methods and procedures designed to model the functional association or relationship between one (or several) variables (often called response, target, or dependent variables) and another group of variables (predictors or independent variables).
Examples:
- The sale price of a house vs. selected physical characteristics (e.g., square footage, listed price, location, etc.)
- Cigarette consumption vs. age, education, income, and the price of cigarettes

The Set-Up
The data $\{(y_i, x_{i1}, \ldots, x_{ip}) : i = 1, \ldots, n\}$ consist of $n$ i.i.d. copies of the variables $(Y; X_1, \ldots, X_p)$.
- The variable $Y$ is called the dependent variable, response, target, endpoint, or outcome variable, depending on the application setting.
- The variables $X_j$ (for $j = 1, \ldots, p$) are called the independent variables, predictors, inputs, attributes, or features.
Functional Model
- General model (assuming $Y$ is continuous): $Y = f(X_1, X_2, \ldots, X_p) + \varepsilon$, where $\varepsilon$ is an error term.
- Linear model: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$, where $\{\beta_0, \beta_1, \ldots, \beta_p\}$ are regression coefficients or parameters.

Example: The Real Estate Data
Suppose a property appraiser wants to model the relationship between the sale price ($Y$) of a residential property in a mid-size city and the following three predictors:
1. $X_1$: Appraised land value (in dollars)
2. $X_2$: Appraised improvements (in dollars)
3. $X_3$: Area (square feet)

Example: The Data

Property   Sale Price (y)   Land Value (x_1)   Improvements Value (x_2)   Area (x_3)
 1          68900            5960               44967                     1873
 2          48500            9000               27860                      928
 3          55500            9500               31439                     1126
 4          62000           10000               39592                     1265
 5         116500           18000               72827                     2214
 6          45000            8500               27317                      912
 7          38000            8000               29856                      899
 8          83000           23000               47752                     1803
 9          59000            8100               39117                     1204
10          47500            9000               29349                     1725
11          40500            7300               40166                     1080
12          40000            8000               31679                     1529
13          97000           20000               58510                     2455
14          45500            8000               23454                     1151
15          40900            8000               20897                     1173
16          80000           10500               56248                     1960
17          56000            4000               20859                     1344
18          37000            4500               22610                      988
19          50000            3400               35948                     1076
20          22400            1500                5779                      962
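As a rough illustration of where the course is heading, the real estate data can be entered in R and passed to `lm()` to fit a linear model of sale price on the three predictors. This is only a sketch: the data frame name `real_estate` and the column names are assumptions made here for illustration, and the slides do not report any fitted coefficients for this data set.

```r
# Real estate data transcribed from the table in the slides (20 properties).
# Column names are illustrative choices, not taken from the slides.
real_estate <- data.frame(
  price = c(68900, 48500, 55500, 62000, 116500, 45000, 38000, 83000, 59000, 47500,
            40500, 40000, 97000, 45500, 40900, 80000, 56000, 37000, 50000, 22400),
  land = c(5960, 9000, 9500, 10000, 18000, 8500, 8000, 23000, 8100, 9000,
           7300, 8000, 20000, 8000, 8000, 10500, 4000, 4500, 3400, 1500),
  improvements = c(44967, 27860, 31439, 39592, 72827, 27317, 29856, 47752, 39117, 29349,
                   40166, 31679, 58510, 23454, 20897, 56248, 20859, 22610, 35948, 5779),
  area = c(1873, 928, 1126, 1265, 2214, 912, 899, 1803, 1204, 1725,
           1080, 1529, 2455, 1151, 1173, 1960, 1344, 988, 1076, 962)
)

# Fit Y = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + error by least squares.
fit <- lm(price ~ land + improvements + area, data = real_estate)

# Estimated regression coefficients (intercept plus three slopes).
print(coef(fit))
```

Calling `summary(fit)` would additionally show standard errors and t statistics, which connect to the inference review later in this topic.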

The Data Layout in General
In general, the data available consist of $\{(y_i, x_{i1}, \ldots, x_{ip}) : i = 1, \ldots, n\}$, where $n$ is the sample size.

ID    Y        X_1       X_2       ...   X_p
1     y_1      x_11      x_12      ...   x_1p
2     y_2      x_21      x_22      ...   x_2p
...   ...      ...       ...       ...   ...
n     y_n      x_n1      x_n2      ...   x_np

Purposes of Regression
We want a functional form that models the relationship between $Y$ and the $X_j$'s. Two practical purposes of regression:
- Predict $Y$ via the $X_j$'s for prediction or forecasting purposes, e.g., stock market data
- Study the relationship between the response and the predictors, e.g., clinical trial data

Types of Regression
Several different ways of categorizing regression models exist.
- Approach: parametric vs. non-parametric
- Functional form: linear vs. nonlinear
- Number of Y's: univariate regression vs. multivariate regression
- Number of X's: simple vs. multiple regression
- Measurement types of Y: logistic regression, log-linear regression, survival analysis, etc.
- Dependence structures: longitudinal data analysis, time series, spatial statistics, etc.
In this course, attention is confined to linear models with a single continuous Y and one or multiple X's of mixed types. We will study simple and multiple linear regression models.

Steps in Regression Analysis
- Data Collection: statement of the problem or research question; selection of potentially relevant variables;
- Exploratory Data Analysis (EDA): numerical measures and graphical tools for describing and summarizing data and associations;
- Model Specification: model form, model assumptions;
- Model Fitting: estimation of the parameters involved in the model and statistical inference;
- Model Selection: variable selection techniques;
- Model Diagnostics: model assumption check and outlier detection;
- Model Validation and Deployment: using the final model to answer the initial scientific question.

Basics
Variable Types
[Figure: Types of Variables]

Expectation and Variance: Simple Facts
If $X$ and $Y$ are random variables and $a$ and $b$ are constants, then
(i) $E(X + Y) = E(X) + E(Y)$
(ii) $E(aX) = a\,E(X)$
(iii) $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{cov}(X, Y)$
(iv) $\mathrm{var}(aX) = a^2\,\mathrm{var}(X)$
(v) $\mathrm{cov}(aX, bY) = ab\,\mathrm{cov}(X, Y)$
(vi) $\mathrm{cov}(X, X) = \mathrm{var}(X)$

Expectation and Variance of Linear Combinations
If $X_1, X_2, \ldots, X_n$ are $n$ random variables and $a_1, \ldots, a_n$ are constants, then
$E\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i E(X_i)$
$\mathrm{var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2\,\mathrm{var}(X_i) + \sum_{i=1}^n \sum_{i' \neq i} a_i a_{i'}\,\mathrm{cov}(X_i, X_{i'})$
Example: Given $X_1, \ldots, X_n$ IID with mean $\mu$ and variance $\sigma^2$, let $\bar{X} = \sum_{i=1}^n X_i / n$ denote their average. Verify that $E(\bar{X}) = \mu$ and $\mathrm{var}(\bar{X}) = \sigma^2/n$.
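One way to carry out the verification is to apply the two linear-combination formulas with $a_i = 1/n$ for every $i$. The covariance terms vanish because the $X_i$ are independent. A worked sketch:

```latex
\begin{align*}
E(\bar{X})
  &= E\Big(\sum_{i=1}^n \tfrac{1}{n} X_i\Big)
   = \sum_{i=1}^n \tfrac{1}{n}\, E(X_i)
   = \tfrac{1}{n} \cdot n\mu
   = \mu, \\
\mathrm{var}(\bar{X})
  &= \sum_{i=1}^n \tfrac{1}{n^2}\, \mathrm{var}(X_i)
   + \sum_{i=1}^n \sum_{i' \neq i} \tfrac{1}{n^2}\, \mathrm{cov}(X_i, X_{i'})
   = \tfrac{1}{n^2} \cdot n\sigma^2 + 0
   = \tfrac{\sigma^2}{n}.
\end{align*}
```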

Basic Probability Rules
(i) $\Pr(\emptyset) = 0$
(ii) For any event $A$, $0 \le \Pr(A) \le 1$
(iii) If $A \subseteq B$ then $\Pr(A) \le \Pr(B)$
(iv) $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
(v) If $A$ and $B$ are mutually disjoint, i.e., $A \cap B = \emptyset$, then $\Pr(A \cup B) = \Pr(A) + \Pr(B)$.
(vi) For any event $A$, let $A^c$ or $\bar{A}$ denote its complement. Then $\Pr(A^c) = 1 - \Pr(A)$.

Basic Probability Rules (Continued)
(vii) Given events $B_1, B_2, \ldots, B_n$ that are mutually exclusive with $\bigcup_{i=1}^n B_i = \Omega$ (i.e., $\{B_i\}_{i=1}^n$ form a partition of the probability space), then, for any event $A$,
$\Pr(A) = \sum_{i=1}^n \Pr(A \cap B_i)$
(viii) For any events $\{A, B, C\}$, it follows that
$\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C)$

Bayes' Theorem
- Conditional probability: $\Pr(A \mid B) = \Pr(A \cap B) / \Pr(B)$.
- For any events $A$ and $B$,
  $\Pr(B \mid A) = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A)} = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A \mid B)\Pr(B) + \Pr(A \mid B^c)\Pr(B^c)}$
- Two events $A$ and $B$ are independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$ or, equivalently (when $\Pr(B) > 0$), $\Pr(A \mid B) = \Pr(A)$.
- Given that $\{B_i\}_{i=1}^n$ forms a partition of $\Omega$, then
  $\Pr(B_i \mid A) = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^n \Pr(A \mid B_j)\Pr(B_j)}$
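A small numerical sketch may help fix the two-event form of Bayes' theorem. The numbers below are hypothetical and not from the slides: they imagine a diagnostic test where $B$ is "has the condition" and $A$ is "tests positive".

```r
# Hypothetical inputs (assumptions for illustration only).
prior_B      <- 0.01   # Pr(B): prevalence of the condition
p_A_given_B  <- 0.95   # Pr(A | B): positive test given the condition
p_A_given_Bc <- 0.05   # Pr(A | B^c): positive test given no condition

# Law of total probability over the partition {B, B^c}.
p_A <- p_A_given_B * prior_B + p_A_given_Bc * (1 - prior_B)

# Bayes' theorem: Pr(B | A) = Pr(A | B) Pr(B) / Pr(A).
posterior_B <- p_A_given_B * prior_B / p_A
print(round(posterior_B, 4))
```

With these made-up inputs the posterior probability is roughly 0.16, far below the 0.95 sensitivity, because the prior probability of $B$ is small.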

Probability Distributions
- Discrete: Bernoulli trial, Binomial distribution, Poisson distribution, etc.
- Continuous: Uniform, Normal, and Multivariate Normal
- Sampling distributions: distributions of sample statistics, e.g., $\chi^2(\nu)$, $t(\nu)$, and $F(\nu_1, \nu_2)$

Binomial Distribution
A typical example: tossing a (fair) coin 100 times and recording the number of heads obtained.
Definition: An experiment consists of $n$ independent and identical trials, each trial having two possible outcomes, "success" or "failure", with Pr(getting a success) $= p$. Let $X$ denote the total number of successes obtained. Then $X$ is said to follow Binomial$(n, p)$.

Binomial Distribution: Facts
Given $X \sim \mathrm{Binomial}(n, p)$, it follows that
- $E(X) = np$ and $\mathrm{var}(X) = np(1 - p)$
- Possible values of $X$ are $0, 1, 2, \ldots, n$
- Probability distribution function:
  $\Pr(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ for $k = 0, 1, \ldots, n$, with $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$
- $X/n$ can be viewed as the sample average of $n$ IID Bernoulli trials, so the central limit theorem (CLT) applies.

Binomial Distribution: An Example
Consider an experiment of rolling a fair six-sided die 20 times.
- The probability $p$ of rolling a six on any roll is 1/6.
- The count $X$ of sixes has a Binomial(20, 1/6) distribution.
- The mean of this distribution is $20/6 = 3.33$, and the variance is $20 \times 1/6 \times 5/6 = 100/36 = 2.78$.
- The mean of the proportion of sixes in the 20 rolls ($X/20$) is equal to $p = 1/6 = 0.167$, and the variance of the proportion is equal to $(1/6 \times 5/6)/20 = 0.007$.
- Find the probability of obtaining at most 2 sixes.
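The last question asks for $\Pr(X \le 2)$ with $X \sim \mathrm{Binomial}(20, 1/6)$. A minimal sketch in R, using the built-in `pbinom()` (cumulative probability) and `dbinom()` (probability of a single value); the variable names are illustrative:

```r
n <- 20      # number of rolls
p <- 1 / 6   # probability of a six on one roll

# Pr(X <= 2) from the cumulative distribution function.
p_at_most_2 <- pbinom(2, size = n, prob = p)

# The same quantity as Pr(X = 0) + Pr(X = 1) + Pr(X = 2).
p_by_sum <- sum(dbinom(0:2, size = n, prob = p))

print(c(cdf = p_at_most_2, sum = p_by_sum))
```

Both approaches should agree, giving a probability of about 0.33.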

Normal Distribution
A random variable $X$ is said to follow $N(\mu, \sigma^2)$ if it has density function
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\,\frac{(x - \mu)^2}{\sigma^2}\right\}$.
The normal density curve is symmetric, centered about its mean $\mu$, with its spread determined by its standard deviation $\sigma$.

Normal Distribution: Facts
- Given that $X \sim N(\mu, \sigma^2)$, it follows that $a + bX \sim N(a + b\mu, b^2\sigma^2)$.
- In general, a linear combination of independent (or, more generally, jointly) normal variables also follows a normal distribution.
- Standard normal distribution: $Z \sim N(0, 1)$. We have $\frac{X - \mu}{\sigma} \sim N(0, 1)$ and $\mu + \sigma Z \sim N(\mu, \sigma^2)$.

Sample Average
Consider a random sample $\{X_1, \ldots, X_n\}$ taken from a population $P$ with mean $\mu$ and variance $\sigma^2$. Let $\bar{X} = \sum_{i=1}^n X_i / n$ denote the sample average.
- If $P$ is normal, then $\bar{X} \sim N(\mu, \sigma^2/n)$ exactly, no matter how small $n$ is.
- (Central Limit Theorem) When $n$ is large, one has, by the CLT, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, no matter what distribution $P$ has.
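The CLT statement can be checked informally by simulation. The sketch below draws repeated samples from a clearly non-normal population; the exponential distribution with rate 1 (mean 1, variance 1), the sample size, the number of replications, and the seed are all choices made here for illustration, not taken from the slides.

```r
set.seed(4385)   # arbitrary seed, for reproducibility only
n    <- 30       # size of each sample
reps <- 10000    # number of simulated samples

# Sample averages from an Exponential(rate = 1) population.
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Compare the simulated mean and variance of the sample average
# with the theoretical values mu = 1 and sigma^2 / n = 1 / 30.
print(c(mean_xbar = mean(xbar), var_xbar = var(xbar), target_var = 1 / n))
```

A histogram of `xbar` (e.g., `hist(xbar)`) should look roughly bell-shaped even though the exponential population itself is strongly skewed.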

Exercises on Normal Distributions
- Know how to find normal probabilities using
  - Tables
  - Standardization
  - Probability rules
- The normal approximation to the binomial distribution is an application of the CLT. Given $X \sim \mathrm{Binomial}(n, p)$, $X \sim N\{np, np(1 - p)\}$ approximately when $n$ is large.
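To see how well the normal approximation works, one can compare an exact binomial probability with its normal counterpart. The sketch below reuses the coin-tossing setting ($n = 100$, $p = 0.5$); the cutoff of 55 heads is a hypothetical choice, and the continuity correction (evaluating the normal CDF at 55.5 rather than 55) is a standard refinement that the slides do not mention.

```r
n <- 100
p <- 0.5
k <- 55   # hypothetical cutoff: at most 55 heads

# Exact binomial probability Pr(X <= 55).
exact <- pbinom(k, size = n, prob = p)

# Normal approximation with mean np and variance np(1 - p),
# using a continuity correction of 0.5 (an added assumption).
approx <- pnorm(k + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))

print(c(exact = exact, approx = approx))
```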

Example on Normal Distributions and Sampling Distribution
Suppose that the weights of milk bottles are normally distributed with a mean of 1.1 lbs and a standard deviation of $\sigma = 0.20$ lbs (note that normality is an assumption). Let $X$ denote the weight of a randomly selected milk bottle.
- What is the distribution of $X$?
  Solution: $X \sim N(\mu = 1.1, \sigma^2 = 0.20^2)$.
- What is the probability that a randomly selected milk bottle will weigh more than 0.99 lbs?
  Solution: $\Pr(X > 0.99) = \Pr\left(Z > \frac{0.99 - 1.1}{0.20}\right) = \Pr(Z > -0.55) = 1 - \Pr(Z \le -0.55) = 1 - 0.2912 = 0.7088$

Example on Normal Distributions and Sampling Distribution (Continued)
Consider a random sample of 5 milk bottles. Let $\bar{X}$ denote their average weight.
- What is the distribution of $\bar{X}$ and why?
  Solution: $\bar{X} \sim N(\mu = 1.1, \sigma^2/n = 0.20^2/5 = 0.008)$. The sample average from a normal population is always normally distributed, no matter how small the sample size is.
- What is the probability that the average weight of a random sample of 5 bottles will be greater than 0.99 lbs?
  Solution: $\Pr(\bar{X} > 0.99) = \Pr\left(Z > \frac{0.99 - 1.1}{0.20/\sqrt{5}}\right) = \Pr(Z > -1.23) = 1 - \Pr(Z \le -1.23) = 1 - 0.1093 = 0.8907$
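Both milk-bottle probabilities can be reproduced in R with `pnorm()` instead of a table. The only difference between the two calls is the standard deviation: $\sigma$ for a single bottle and $\sigma/\sqrt{n}$ for the average of $n = 5$ bottles. Variable names are illustrative.

```r
mu    <- 1.1    # population mean (lbs)
sigma <- 0.20   # population standard deviation (lbs)
n     <- 5      # sample size for the average

# Pr(X > 0.99) for a single randomly selected bottle.
p_single <- pnorm(0.99, mean = mu, sd = sigma, lower.tail = FALSE)

# Pr(Xbar > 0.99) for the average of 5 bottles.
p_mean <- pnorm(0.99, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

print(round(c(single = p_single, average = p_mean), 4))
```

Small differences in the fourth decimal place relative to the table-based answers are expected, since the table lookup rounds the z value to two decimals.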

Multivariate Normal Distribution
Definition: A random vector $X = (X_1, \ldots, X_n)^T \in \mathbb{R}^n$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu = (\mu_i) \in \mathbb{R}^n$ and covariance matrix $\Sigma = (\sigma_{ii'}) \succeq 0$ (meaning positive semidefinite), denoted $X \sim N(\mu, \Sigma)$, if its probability density function is given by
$f(x; \mu, \Sigma) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\left\{-\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2}\right\}$.
Note that this density requires $\Sigma$ to be invertible, i.e., positive definite.

[Figure: Bivariate Normal Distribution Illustration]

Properties of Multivariate Normal Distributions
Given $X \sim N(\mu, \Sigma)$,
- $E(X) = \mu$ and $\mathrm{cov}(X) = \Sigma$.
- Each component $X_i \sim N(\mu_i, \sigma_{ii})$ marginally.
- $AX \sim N(A\mu, A\Sigma A^T)$ for a constant matrix $A$ of appropriate dimension. This implies that linear combinations of any components of $X$ are normally distributed.
- For multivariate normal random variables, zero covariance (or correlation) implies independence.

Sampling Distributions
- A sampling distribution is the probability distribution of a sample statistic.
- A statistic is a numerical summary of sample data, such as a sample proportion or a sample mean. The sample statistic is used to estimate or make inferences about a parameter.
- With random sampling, the sampling distribution provides probabilities for all the possible values of the statistic.
- The sampling distribution is the key to telling how close a sample statistic falls to the corresponding unknown parameter. Its standard deviation is called the standard error.

[Figure: Sampling Distribution of the Sample Mean $\bar{X}$]

$\chi^2(\nu)$ Distribution
Definition: Given i.i.d. standard normal variables $\{Z_1, \ldots, Z_\nu\}$, $\chi^2 = \sum_{i=1}^{\nu} Z_i^2$ follows a (central) chi-squared distribution with $\nu$ degrees of freedom.
Facts: Given $X \sim \chi^2(\nu)$,
- $X \sim \mathrm{Gamma}(\nu/2, 1/2)$
- $E(X) = \nu$ and $\mathrm{var}(X) = 2\nu$.

$t(\nu)$ Distribution
Definition: Given that $Z \sim N(0, 1)$ and $\chi^2 \sim \chi^2(\nu)$ are independent, $t = \frac{Z}{\sqrt{\chi^2/\nu}}$ follows a (central) t distribution with $\nu$ degrees of freedom.
Facts: Given $t \sim t(\nu)$ with $\nu > 0$,
- $E(t) = 0$ for $\nu > 1$ and $\mathrm{var}(t) = \nu/(\nu - 2) > 1$ for $\nu > 2$.
- The t density is also bell-shaped, yet with more spread than $N(0, 1)$.
- As $\nu \to \infty$, $t$ approximates $N(0, 1)$.

$F(\nu_1, \nu_2)$ Distribution
Definition: Given that $\chi_1^2 \sim \chi^2(\nu_1)$ and $\chi_2^2 \sim \chi^2(\nu_2)$ are independent, $F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$ follows a (central) F distribution with numerator df $\nu_1$ and denominator df $\nu_2$.
Facts: Given $F \sim F(\nu_1, \nu_2)$, it follows that
- $t^2(\nu) = F(1, \nu)$
- $F(\nu_2, \nu_1) = 1/F(\nu_1, \nu_2)$
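The two distributional facts about F translate into identities between quantiles, which can be checked numerically with `qt()` and `qf()`. The degrees of freedom below (9, and the pair 3 and 7) are arbitrary values picked for illustration.

```r
nu <- 9

# t^2(nu) = F(1, nu): the squared two-sided 5% critical value of t(nu)
# equals the upper 5% critical value of F(1, nu).
t_squared <- qt(0.975, df = nu)^2
f_one_nu  <- qf(0.95, df1 = 1, df2 = nu)

# F(nu2, nu1) = 1 / F(nu1, nu2): the 95th percentile of F(3, 7)
# equals the reciprocal of the 5th percentile of F(7, 3).
f_upper     <- qf(0.95, df1 = 3, df2 = 7)
f_recip_low <- 1 / qf(0.05, df1 = 7, df2 = 3)

print(c(t_squared = t_squared, f_one_nu = f_one_nu))
print(c(f_upper = f_upper, f_recip_low = f_recip_low))
```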

Relevant R Functions
R has built-in functions for computing the density (d), probability (p), and quantile (q) for common distributions, as well as for simulating random data (r). Use the R help facilities to find out the details, e.g., help(rchisq).
Example:
    pnorm(q, mean = 0, sd = 1, lower.tail = TRUE)
    qf(p, df1, df2, ncp, lower.tail = TRUE)
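As a brief sketch of the d/p/q/r naming scheme, the calls below exercise all four prefixes for the normal distribution and one quantile call for the t distribution. The particular arguments (1.96, 0.975, 14 degrees of freedom, the seed) are illustrative choices.

```r
# d: density of N(0, 1) at 0, which equals 1 / sqrt(2 * pi).
dens_at_0 <- dnorm(0)

# p: cumulative probability Pr(Z <= 1.96), close to 0.975.
prob <- pnorm(1.96)

# q: 97.5th percentile of N(0, 1), close to 1.96.
quant <- qnorm(0.975)

# r: five simulated N(0, 1) values (seed chosen arbitrarily).
set.seed(1)
draws <- rnorm(5)

# The same scheme applies to other distributions, e.g. the
# 97.5th percentile of t with 14 degrees of freedom (about 2.145).
t_crit <- qt(0.975, df = 14)

print(c(dens_at_0 = dens_at_0, prob = prob, quant = quant, t_crit = t_crit))
```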

Statistical Inference
Statistical inference is the process of drawing conclusions about the population based on analysis of sampled data.
Two forms of inference:
- Estimation: point estimation and interval estimation
- Hypothesis testing
Types of inference:
- Parametric, nonparametric, and semiparametric inference
- Bayesian vs. frequentist inference

Statistical Inference: Estimation
- Given a parameter $\theta$, we want to find the best estimate or best guess $\hat{\theta}$ based on sample data.
- Criteria for defining "best" include unbiasedness, minimum variance, minimum mean squared error, ...
- The probability distribution of $\hat{\theta}$ is needed for reliability assessment.
- Confidence intervals (CI): look for an interval $(L, U)$ such that $\Pr(L < \theta < U) = (1 - \alpha) \times 100\%$.

One-Sample Inference on Mean µ
Suppose that a random sample $\{X_1, X_2, \ldots, X_n\}$ of size $n$ is taken from a population that has mean $\mu$ and variance $\sigma^2$. We want to make inferences about $\mu$.
- Point estimate: the sample average $\bar{X} = \sum_{i=1}^n X_i / n$, with $E(\bar{X}) = \mu$ and $\mathrm{var}(\bar{X}) = \sigma^2/n$.
- Sampling distribution:
  - When the original population is normal, $\bar{X} \sim N(\mu, \sigma^2/n)$ exactly, no matter how small or large the sample size $n$ is.
  - When $n$ is large, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, no matter what distribution the original population has.

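Confidence Interval for µ
To remove the unknown nuisance parameter $\sigma^2$, note that, for a normal population with sample variance $\hat{\sigma}^2$,
$\frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$ and $\frac{(n - 1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n - 1)$ are independent, which implies
$t = \frac{\bar{X} - \mu}{\sqrt{\hat{\sigma}^2/n}} \sim t(n - 1)$.
$(1 - \alpha) \times 100\%$ confidence interval for $\mu$:
$\bar{X} \pm t^{(n-1)}_{1 - \alpha/2}\, \frac{\hat{\sigma}}{\sqrt{n}}$,
where $t^{(n-1)}_{1 - \alpha/2}$ denotes the $(1 - \alpha/2) \times 100$th percentile of the t distribution with $(n - 1)$ degrees of freedom.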

Example: Confidence Interval for µ
We randomly select and measure the contents of 15 bottles of cough syrup. The results (in fluid ounces) are shown below:

4.211  4.246  4.269  4.241  4.260
4.293  4.189  4.248  4.220  4.239
4.253  4.209  4.300  4.256  4.290

Construct a 95% CI for the mean content of cough syrup.
Solution: $4.248 \pm 2.145 \sqrt{0.001032495/15}$, which leads to (4.2305, 4.2661).
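The interval can be reproduced in R in two ways: by plugging the sample mean, sample variance, and t quantile into the formula, or by calling `t.test()` and extracting its confidence interval. This is a sketch with illustrative variable names; the data are the 15 measurements listed in the example.

```r
syrup <- c(4.211, 4.246, 4.269, 4.241, 4.260,
           4.293, 4.189, 4.248, 4.220, 4.239,
           4.253, 4.209, 4.300, 4.256, 4.290)

n      <- length(syrup)
xbar   <- mean(syrup)            # sample mean, about 4.248
s2     <- var(syrup)             # sample variance, about 0.001032
t_crit <- qt(0.975, df = n - 1)  # about 2.145

# Formula-based 95% CI: xbar +/- t * sqrt(s2 / n).
ci_manual <- xbar + c(-1, 1) * t_crit * sqrt(s2 / n)

# The same interval from the built-in one-sample t procedure.
ci_ttest <- t.test(syrup, conf.level = 0.95)$conf.int

print(round(ci_manual, 4))
print(round(as.numeric(ci_ttest), 4))
```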

Statistical Inference: Hypothesis Testing
- In statistics, a hypothesis is a statement about the population.
  - The null hypothesis $H_0$
  - The alternative (research) hypothesis $H_a$
- Hypothesis testing is a statistical procedure used to make decisions on the validity of a hypothesis based on analysis of sample data.

Concepts in Hypothesis Testing

                       True State (Population)
Decision (Sample)      True H_0              False H_0
Fail to reject H_0     Correct (1 - α)       Type II error (β)
Reject H_0             Type I error (α)      Correct (1 - β)

- Which of the Type I and Type II errors is more severe?
- The size is the probability of making a Type I error.
- The significance level $\alpha$ is the maximum size tolerable, i.e., $\alpha = \max \Pr(\text{reject } H_0 \mid H_0)$.
- The power is the probability of rejecting the null $H_0$ if $H_0$ is false in reality, i.e., power $= \Pr(\text{reject } H_0 \mid H_a)$.

Steps in Hypothesis Testing
1. State the null and alternative hypotheses: $H_0$ and $H_a$;
2. Compute the value of the observed test statistic $T$;
   - The value of $T$ should be sensitive to whether the data support $H_0$ or $H_a$;
   - The probability distributions of $T$ under $H_0$ and $H_a$ are available.
3. Find the decision/rejection region;
   - Critical values, e.g., reject $H_0$ if $T$ is greater than some threshold.
   - Reject $H_0$ whenever P-value $< \alpha$.
4. Make conclusions and interpret them within the application context.

The P-Value
- The p-value is the probability of obtaining a value that is as extreme as, or more extreme than, the actually observed value of the test statistic, assuming that the null hypothesis $H_0$ is true.
- Reject $H_0$ whenever the p-value is smaller than the significance level $\alpha$.
- Common misconceptions: the p-value is NOT the probability that the null hypothesis is true; NOR is it the probability that $H_a$ is false; NOR is it the probability of falsely rejecting the null hypothesis.
- A small p-value is evidence against the null hypothesis, while a large p-value means little or no evidence against $H_0$. Note that little or no evidence against the null hypothesis is not the same as a lot of evidence for the null hypothesis.

Estimation vs. Hypothesis Testing
- The two types of inference are equivalent analytically. One may make a decision in a hypothesis testing problem by looking at the appropriate CI; conversely, a CI can be derived by inverting the hypothesis testing procedure.
- Nevertheless, CIs are sometimes preferable to hypothesis tests: a confidence interval gives the range within which the parameter is likely to fall, while a hypothesis test only tells you whether the parameter is likely to be the same as a pre-specified or hypothesized value.

One-Sample Inference: Hypothesis Testing on µ
Hypotheses:
- The null: $H_0: \mu = \mu_0$
- The alternative:
  - $H_a: \mu \neq \mu_0$ (two-sided);
  - $H_a: \mu > \mu_0$ (upper-sided);
  - $H_a: \mu < \mu_0$ (lower-sided);
  where $\mu_0$ is the hypothesized value.
Test statistic:
$t_{obs} = \frac{\bar{X} - \mu_0}{\sqrt{\hat{\sigma}^2/n}} \sim t(n - 1)$ only when $H_0: \mu = \mu_0$ is true.

One-Sample Inference: Hypothesis Testing on µ (Continued)
Rejection rule: reject $H_0$ at the significance level $\alpha$ if
- $|t_{obs}| > t^{(n-1)}_{1 - \alpha/2}$ (two-sided);
- $t_{obs} > t^{(n-1)}_{1 - \alpha}$ (upper-sided);
- $t_{obs} < t^{(n-1)}_{\alpha} = -t^{(n-1)}_{1 - \alpha}$ (lower-sided).
Compute the associated P-value:
- P-value $= 2 \Pr\left(t^{(n-1)} > |t_{obs}|\right)$ (two-sided);
- P-value $= \Pr\left(t^{(n-1)} > t_{obs}\right)$ (upper-sided);
- P-value $= \Pr\left(t^{(n-1)} < t_{obs}\right)$ (lower-sided).

Equivalence of Significance Testing and CI
Consider the two-sided test of $H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$. We reject $H_0$ at level $\alpha$ when
$|t_{obs}| > t^{(n-1)}_{1 - \alpha/2}$
$\iff \frac{|\bar{X} - \mu_0|}{\hat{\sigma}/\sqrt{n}} > t^{(n-1)}_{1 - \alpha/2}$
$\iff \mu_0 > \bar{X} + t^{(n-1)}_{1 - \alpha/2}\, \hat{\sigma}/\sqrt{n}$ or $\mu_0 < \bar{X} - t^{(n-1)}_{1 - \alpha/2}\, \hat{\sigma}/\sqrt{n}$.
Namely, we reject $H_0$ when $\mu_0$ falls outside of the $(1 - \alpha) \times 100\%$ CI for $\mu$.

Equivalence of Significance Testing and CI (Continued)
Consider the upper-sided test of $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$. We reject $H_0$ when
$t_{obs} > t^{(n-1)}_{1 - \alpha} \iff \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}} > t^{(n-1)}_{1 - \alpha}$
or, equivalently, $\mu_0 < \bar{X} - t^{(n-1)}_{1 - \alpha}\, \hat{\sigma}/\sqrt{n}$.
Namely, we reject $H_0$ at significance level $\alpha$ when $\mu_0$ is smaller than the lower bound of the $(1 - 2\alpha) \times 100\%$ CI for $\mu$.
Similarly, we reject $H_0$ at level $\alpha$ in testing $H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$ when $\mu_0$ is greater than the upper bound of the $(1 - 2\alpha) \times 100\%$ CI for $\mu$.
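The one-sided equivalence can be illustrated numerically. The sketch below reuses the 15 cough syrup measurements from the confidence interval example and tests against a hypothesized value $\mu_0 = 4.23$, which is a made-up number chosen only for illustration. It compares the decision from the upper-sided t test at $\alpha = 0.05$ with the decision from the lower bound of the 90% two-sided CI.

```r
syrup <- c(4.211, 4.246, 4.269, 4.241, 4.260,
           4.293, 4.189, 4.248, 4.220, 4.239,
           4.253, 4.209, 4.300, 4.256, 4.290)

mu0   <- 4.23   # hypothetical hypothesized mean (not from the slides)
alpha <- 0.05

# Decision from the upper-sided test H0: mu = mu0 vs. Ha: mu > mu0.
upper_test     <- t.test(syrup, mu = mu0, alternative = "greater")
reject_by_test <- isTRUE(upper_test$p.value < alpha)

# Decision from the (1 - 2 * alpha) * 100% = 90% two-sided CI:
# reject when mu0 lies below the lower bound.
ci90         <- t.test(syrup, conf.level = 1 - 2 * alpha)$conf.int
reject_by_ci <- isTRUE(mu0 < ci90[1])

print(c(reject_by_test = reject_by_test, reject_by_ci = reject_by_ci))
```

The two decisions should coincide for any choice of `mu0`, which is what the derivation asserts.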

Example: Hypothesis Testing on µ
The average sleep time ($\mu$) is supposed to be 8 hours a day. We think college students sleep a different amount, maybe more, maybe less. We survey ten students to see how much they sleep. The data are as follows (each value represents a student):

6  5  4  3  7  5  5  5  6  6

R output from function t.test():

         N     Mean    Std. Deviation
SLEEP    10    5.2     1.1353

         t         df    Sig. (2-tailed)
SLEEP    -7.799    9     <.0001

Example (Continued)
- It can be found that $t^{(9)}_{0.05} = -1.833$ and $t^{(9)}_{0.975} = 2.262$.
- The sample mean is 5.2. Compared against $\mu_0 = 8$, the difference is $-2.8$ hours.
- The difference is statistically significant: the observed $|t| = 7.799$ far exceeds the two-sided critical value 2.262, and the p-value is below 0.0001, so a difference this large is very unlikely to be due to chance alone.
- Even when the sample size $n$ is large, one can always use the t test, as the t distribution with large d.f. gets close to the standard normal anyway.
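The sleep example can be rerun directly with `t.test()`. This is a sketch: the vector name `sleep_hours` is an illustrative choice (picked to avoid masking R's built-in `sleep` data set), and the printed layout of R's own output differs from the summary table shown in the example.

```r
sleep_hours <- c(6, 5, 4, 3, 7, 5, 5, 5, 6, 6)

# Two-sided one-sample t test of H0: mu = 8 vs. Ha: mu != 8.
res <- t.test(sleep_hours, mu = 8, alternative = "two.sided")
print(res)

# Pieces of the result used in the discussion.
t_obs  <- unname(res$statistic)   # observed t value, about -7.799
df     <- unname(res$parameter)   # degrees of freedom, n - 1 = 9
p_val  <- res$p.value             # two-sided p-value
t_crit <- qt(0.975, df = df)      # two-sided critical value, about 2.262

# Reject H0 at the 5% level when |t_obs| exceeds the critical value.
print(abs(t_obs) > t_crit)
```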

Discussion
Thanks! Questions?