401 Review


Major topics of the course

1. Univariate analysis
2. Bivariate analysis
3. Simple linear regression
4. Linear algebra
5. Multiple regression analysis

Major analysis methods

1. Graphical analysis of one distribution.
2. Graphical comparison of two distributions.
3. One-sample and two-sample hypothesis tests.
4. Confidence intervals for the population mean.
5. Correlation analysis for bivariate data.
6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
7. Regression analysis: parameter estimation, CIs and PIs, hypothesis tests, model building.
8. Sums of squares in regression.
9. Diagnostics and transformations for all of the above.

Major Definitions

1. Random variable: A natural process whose outcome cannot be predicted with certainty.

2. Sample space: The set of all possible outcomes for a particular random variable.

3. Probability: A number between 0 and 1 which describes the likelihood that some event will occur.

4. Distribution: The sample space, along with the probability of observing each point in the sample space. The distribution is a complete description of a random variable.

5. Qualitative/quantitative (random variable): A random variable whose outcomes are numerical is quantitative (e.g. length, price, time). If this is not the case, the random variable is qualitative (e.g. gender, success/failure).

6. (Empirical) Cumulative Distribution Function (CDF/ECDF): The CDF of a random variable X is a function F(t) such that F(t) = P(X ≤ t). The ECDF is an estimate F̂(t) of the CDF based on a sample X_1, ..., X_n, where F̂(t) is the proportion of the X_i less than or equal to t.

7. Probability Density Function (PDF): The PDF f(t) of a random variable X is a function such that P(a ≤ X ≤ b) is the area under the graph of f between a and b.

8. Population vs. sample: The population describes all possible outcomes of an experiment or measurement (possibly infinite). A sample is a list of outcomes actually obtained by carrying out the experiment some finite number of times. The population is fixed, but the sample is random (if you repeat your experiment, you will get different values in your sample).

9. iid sample/SRS: iid stands for "independent and identically distributed"; SRS stands for "Simple Random Sample". Both terms refer to a set of measurements that can be viewed as arising independently from a single distribution. In practice, this means that the same instruments and experimental procedures are followed for every data point, and the observations are selected at random.

10. Inference: A statement made about the population based on statistical analysis of a sample from the population. Since the data are random, we can never be certain that an inference is correct. A key goal of statistics is to use the information in the data efficiently so that inferences are correct as often as possible.

11. Prediction: Suppose you observe a sample from a population, and based on this sample you are able to learn something about the distribution. This allows you to make a better guess of a future random value from the distribution than you would have been able to make had you not observed the sample. This guess is called a prediction.

12. Sampling variation: The variation of a statistic due to random variation in the sample used to compute it.

13. Quantile: For a quantitative random variable X, if 0 ≤ p ≤ 1, Q(p) is the point in the sample space such that P(X ≤ Q(p)) = p.

14. Histogram: An estimate of the PDF.

15. Order statistics: Given a sample X_1, X_2, ..., X_n, the order statistics are the data listed in increasing order. The notation for the i-th element of the sorted list (the i-th order statistic) is X_(i).

16. Resistant (estimator): A resistant estimator is not highly sensitive to changes in the value of a single data point. Specifically, regardless of how much a single data point is changed, a resistant estimator will only change by a bounded amount.
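The ECDF, empirical quantiles, and order statistics above can be sketched in a few lines of plain Python. The sample below is invented, and the quantile rule shown (smallest order statistic X_(i) with i/n ≥ p) is one of several common conventions:

```python
# Illustration of the ECDF, an empirical quantile, and order statistics,
# using a small made-up sample (plain Python, no libraries).
sample = [4.2, 1.7, 3.3, 2.8, 5.1, 2.8, 3.9]
n = len(sample)

# Order statistics: the data sorted in increasing order; X_(i) is order[i-1].
order = sorted(sample)

def ecdf(t):
    """F_hat(t): proportion of sample values less than or equal to t."""
    return sum(x <= t for x in sample) / n

def quantile(p):
    """One empirical Q(p) convention: smallest X_(i) with i/n >= p."""
    for i, x in enumerate(order, start=1):
        if i / n >= p:
            return x
    return order[-1]

print(ecdf(3.0))      # proportion of values <= 3.0
print(quantile(0.5))  # an empirical median
```

Note that F̂(t) jumps by 1/n at each order statistic and equals 1 at and beyond the sample maximum.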

17. Mean (population/sample): The mean is one way to measure the most typical value of a distribution (a measure of location). The population mean is a balancing point: if you multiply each point in the sample space by its probability, the sums of these values to the right and to the left of the mean are equal. The sample mean is simply the average of the data. The population mean may also be called the expected value or expectation.

18. Variance, standard deviation: These are measures of scale. The variance is a measure of how far random values tend to be from their mean. Specifically, it is the average squared distance to the mean (actually, not quite the average, since you divide by n - 1 rather than n). The standard deviation is the square root of the variance. The sample variance and standard deviation are estimates of the population variance and standard deviation.

19. Median, IQR: The median is the 0.5 quantile. Roughly speaking, the median is the point θ such that half the data are greater than θ, and half the data are less than θ. The IQR is the difference between the 75th and 25th percentiles. The median is a resistant measure of location, and the IQR is a resistant measure of scale.

20. Median center: Subtracting the sample median from all data points gives a data set with median 0, but other statistical properties (such as the variance) are unchanged.

21. Standardize: Subtracting the mean from all data points and then dividing each value by the standard deviation yields a set of points with mean 0 and standard deviation 1. In other respects these values resemble the original values.

22. Right/left skew, symmetric (distribution): A right-skewed distribution has more atypically large values than atypically small values. A left-skewed distribution has more atypically small values than atypically large values. A symmetric distribution has equal tendency to produce atypically large and atypically small values.

23. Thick/thin (tail): A thick tail produces a relatively greater number of extreme values. A thick right tail will produce a greater number of extreme large values, and a thick left tail will produce a greater number of extreme small values (here "small" means far out in the negative direction, not close to 0). Similarly, thin tails produce a relatively smaller number of extreme values. A right-skewed distribution has a thicker right tail and a thinner left tail. A left-skewed distribution has a thicker left tail and a thinner right tail. A symmetric distribution has equally thick right and left tails.

24. QQ (quantile/quantile) plot: A plot of the quantiles of one random variable against those of another. For example, if Q_X(0.75) is the 75th percentile of X and Q_Y(0.75) is the 75th percentile of Y, then the point (Q_X(0.75), Q_Y(0.75)) would be plotted, along with all other points obtained by replacing 0.75 with other numbers between 0 and 1. If the two random variables have the same distribution, a diagonal line results. The further the QQ points fall from the diagonal, the greater the level of difference between the two distributions. Based on a QQ plot, one can determine which of the two variables is larger on average, which is more variable, and which has a thicker right or left tail. QQ plots are often made with median-centered or standardized data, to highlight differences that are not obvious in other analyses.

25. Normal probability plot: A type of QQ plot in which a univariate sample is standardized, then the order statistics are plotted against the corresponding quantiles of the standard normal distribution. Its main application is to assess whether data are normal, since normality is an assumption of many statistical procedures.

26. Translation: The transform that adds a constant to each data point.

27. Scaling: The transform that multiplies each data point by a constant.

28. Invariant: A statistic is invariant to a certain transform if it doesn't change when the transform is applied to the data. For example, the variance is invariant to translations.

29. Power transform: If the data are transformed by a function of the form X^q, they have been power transformed. If the exponent q is close to 0, the power transform is very similar to a log transform.

30. Normal distribution: This distribution is often used to describe the variation in data. More importantly, the sample mean and many other statistics have approximately normal distributions even if the underlying data used to form the statistic are not normally distributed. The standard normal distribution has mean zero and variance one. This is the distribution provided in the normal table. To obtain other normal probabilities you must standardize.

31. t-distribution: This family of distributions describes the variation in certain statistics involving the variance, in which the sample variance has been substituted for the population variance. Due to the extra uncertainty in the sample variance, the result is more variable than a normal distribution.

32. Degrees of freedom: This number determines which particular t-distribution is used in a given problem. The larger the degrees of freedom, the closer the resulting t-distribution is to a normal distribution. Degrees of freedom also occur in the F-distribution (which has two degrees of freedom, one for the MSR and one for the MSE).

33. Null and alternative hypotheses: The standard approach to hypothesis testing requires stating two hypotheses that are compared to each other. These hypotheses are not handled symmetrically: if the evidence is not overwhelmingly in favor of the alternative hypothesis, the decision is made in favor of the null hypothesis. The hypotheses should be devised so that it is a more serious mistake to erroneously decide in favor of the alternative hypothesis than it is to erroneously decide in favor of the null hypothesis.

34. Test statistic: Given a sample X_1, ..., X_n from a population about which you wish to make some inference, a test statistic T is a function that compresses the data into a single number that contains all the relevant information for the inference.
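Standardization (definition 21) is easy to verify numerically: after subtracting the mean and dividing by the n - 1 standard deviation (definition 18), the transformed sample has mean 0 and standard deviation 1. The data below are made up:

```python
# A minimal sketch of standardizing a sample, using the sample mean and
# the n-1 version of the sample standard deviation.
import math

data = [12.0, 15.0, 9.0, 14.0, 10.0, 13.0]  # invented values
n = len(data)

mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / (n - 1)   # divide by n-1, not n
sd = math.sqrt(var)

z = [(x - mean) / sd for x in data]   # standardized values

z_mean = sum(z) / n
z_var = sum((x - z_mean) ** 2 for x in z) / (n - 1)
print(z_mean, z_var)   # should be (numerically) 0 and 1
```

Plotting the sorted z values against standard normal quantiles would give the normal probability plot of definition 25.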

35. Rejection region: The rejection region is the set of all test statistic values that are extreme enough that you may reject the null hypothesis. Typically this would be any test statistic value whose p-value is below the chosen significance level (usually 0.05).

36. One-sided, two-sided, right-tailed, left-tailed (tests): If the parameter being tested is θ and the alternative hypothesis is θ > c (right-tailed) or θ < c (left-tailed) for some constant c, we have a one-sided test. If the alternative hypothesis is θ ≠ c, we have a two-sided test.

37. Type I/II error, false negative, false positive: A type I error (false positive) is a decision in favor of the alternative hypothesis when the null hypothesis is true. A type II error (false negative) is a decision in favor of the null hypothesis when the alternative hypothesis is true. In the usual hypothesis testing situation, false positives are worse than false negatives.

38. p-value: The probability of observing a test statistic at least as extreme as the observed test statistic value under the null hypothesis.

39. Power: The power is the probability of rejecting the null hypothesis when the alternative hypothesis is true. To calculate the power, you must know the specific alternative hypothesis (i.e. θ > 0 is not adequate; you must have a specific value like θ = 1), and you must know the significance level at which the null will be rejected (usually 0.05 or 0.01).

40. Effect size: The effect size is the difference between the alternative and null values of the parameter being tested. This term is usually used in the context of power analysis, where one can state the smallest effect size that is detectable at a given power, or the smallest sample size at which a given effect size is detectable at a given power.

41. Point estimate: A numerical estimate of an unknown quantity. For example, the sample mean is a point estimate of the population mean. The point estimate differs from the true value due to random variation in the data.

42. Confidence and prediction intervals: These are intervals that are constructed to cover some unknown quantity with a given probability. CIs are constructed to cover unknown constants (parameters) such as the population mean. PIs are constructed to cover observations that will be made in the future from some distribution.

43. Coverage probability: The actual probability that a CI or PI will cover the value that it is designed to cover. If all assumptions are met, this will be the coverage for which the interval was constructed (i.e. a 95% PI will actually cover 95% of the time). If all assumptions are not met, then the coverage can be lower or higher.

44. Width (of a CI or PI): The difference between the upper and lower bounds of the interval. We prefer CIs and PIs to be as short as possible, since this leads to a more precise statement.
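Definitions 39-40 can be made concrete with a power calculation. The sketch below uses a one-sample, two-sided z-test under a normal approximation (rather than the exact t calculation); the effect size, σ, and n are invented illustration values:

```python
# Power of a one-sample two-sided z-test against a specific alternative
# mu = mu0 + effect, via the normal approximation. All numbers are made up.
import math

def phi(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sided(effect, sigma, n, z_crit=1.96):
    # z_crit ~ upper 0.025 standard normal quantile (alpha = 0.05).
    shift = effect / (sigma / math.sqrt(n))   # effect size in SE units
    # Reject when |Z| > z_crit; under the alternative, Z ~ N(shift, 1).
    return (1 - phi(z_crit - shift)) + phi(-z_crit - shift)

p = power_two_sided(effect=0.5, sigma=1.0, n=30)
print(p)   # probability of rejecting H0 when the effect is really 0.5
```

When the effect is 0, the same formula returns the significance level itself (about 0.05), which is a useful sanity check.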

45. One-sample/two-sample (hypothesis test): If two iid samples are observed from two possibly different populations, we are analyzing two-sample data. For this course the only analysis is a test of equality of the two population means. If one iid sample is observed from one population, we can test the population mean against a constant (usually zero).

46. Univariate/bivariate data: If one measurement is made per individual being studied, we are performing a univariate analysis. If two such measurements are made, we are performing a bivariate analysis. Be clear about the difference between bivariate data and two-sample data; they are very different things.

47. Scatterplot: A plot of bivariate data in which the (X_i, Y_i) values are plotted as points in the plane.

48. Positive/negative trend (association): For bivariate data (X, Y), if Y tends to increase when X increases (and hence X tends to increase when Y increases), then X and Y are positively associated. If Y tends to decrease when X increases (and hence X tends to decrease when Y increases), then X and Y are negatively associated. If neither relationship consistently holds, then X and Y have no association.

49. Correlation coefficient: A measure of the association between bivariate measurements X and Y. The correlation coefficient always falls between -1 and 1. Positive values indicate positive association, negative values indicate negative association, and values close to zero indicate no association.

50. Covariance: A measure of association between bivariate measurements. The scale of the covariance depends on the scale of the measurements, making it less useful for analysis. The correlation coefficient is a rescaled version of the covariance.

51. Fisher's (Z) transform: A transformation that stretches the correlation coefficient so that instead of falling between -1 and 1, it falls between -∞ and ∞. Values close to zero are only slightly changed, but values close to ±1 are substantially changed. The specific form of the Fisher transform is f(r) = log((1 + r)/(1 - r))/2, where log is the natural log. The Fisher transform produces a variable that has an approximately normal distribution with mean f(ρ), where ρ is the population correlation coefficient, and variance 1/(n - 3). It can be used to carry out hypothesis tests and calculate confidence intervals for correlation coefficients.

52. Conditional mean and variance: If (Y, X) are bivariate measurements, E(Y | X = x) is a function of x whose value is the average of all Y values paired with X values equal to x. Similarly, var(Y | X = x) is the variance of all Y values paired with X values equal to x, and SD(Y | X = x) is the standard deviation of all Y values paired with X values equal to x.

53. Heteroscedastic/homoscedastic: A heteroscedastic bivariate pair (Y, X) has the property that SD(Y | X = x) varies with x. For a homoscedastic pair, SD(Y | X = x) is constant as a function of x.
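The Fisher-transform recipe of definition 51 can be sketched end to end: compute r, map it with f, build a normal-theory interval with SD 1/√(n - 3), and map the endpoints back. The data pairs below are invented, and 1.96 gives an approximate 95% interval:

```python
# Approximate 95% CI for a correlation via the Fisher transform.
# The (x, y) pairs are made-up illustration data.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.0, 5.5, 6.1, 6.8, 8.3]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)            # sample correlation coefficient

z = 0.5 * math.log((1 + r) / (1 - r))     # Fisher transform f(r)
se = 1 / math.sqrt(n - 3)                 # SD of f(r) is about 1/sqrt(n-3)
lo, hi = z - 1.96 * se, z + 1.96 * se

# Back-transform the endpoints to the correlation scale (inverse of f).
inv = lambda t: (math.exp(2 * t) - 1) / (math.exp(2 * t) + 1)
ci = (inv(lo), inv(hi))
print(r, ci)
```

Because the inverse map is monotone and bounded by ±1, the back-transformed interval always stays inside (-1, 1).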

54. Simple linear regression: If we have bivariate data, assume E(Y | X) = α + βX (i.e. the mean of Y is linear in X), and the data are homoscedastic, we have simple linear regression.

55. Errors: In a regression model, the observed response values differ from the expected response values by a random error term ε. The error term always has expected value zero.

56. Fitted values: In any regression model, once we have the parameter estimates, we can estimate the expected response value at each X_i, denoted Ŷ_i, by plugging the parameter estimates into the mean function. For example, in simple linear regression, if we estimate α̂ and β̂, the fitted values are Ŷ_i = α̂ + β̂X_i.

57. Residuals: For any regression model, the residuals are the observed response values Y_i minus the fitted values Ŷ_i: r_i = Y_i - Ŷ_i.

58. Least squares: The process of estimating regression parameters by minimizing the sum of squares of the residuals.

59. Outlier: One of a small number of points that is dramatically different from the trend followed by the remaining points. Specifically, any observation i such that |r_i| is greater than 2 or 2.5 times the IQR of all r_i may be considered an outlier. It may be desirable to remove outliers during regression analysis, but they should still be considered as part of the overall analysis.

60. Diagnostic: Any method, especially a graphical method, that is designed to assess whether the assumptions of the linear model are approximately satisfied. The key diagnostics for simple linear regression are the scatterplot of residuals on fitted values (which should have no pattern), and the normal probability plot of the residuals (which should lie approximately on the 45° line).

61. Vector, matrix: A vector is a list of numbers; a matrix is a table of numbers.

62. Dimension (of a vector): The number of entries in a vector.

63. Linear combination: Starting with several vectors of the same dimension, if the vectors are scaled by (possibly different) constants and the resulting vectors are added, the final vector is a linear combination of the original vectors.

64. Dot product: Given two vectors of the same dimension, if corresponding elements are multiplied and the resulting products are summed, a single number results. This is the dot product (also called scalar product or inner product).

65. Perpendicular (orthogonal) vectors: Two vectors of the same dimension with zero dot product are perpendicular.

66. Linearly dependent: A set of vectors of the same dimension is linearly dependent if some linear combination (with at least one nonzero coefficient) of the vectors is zero. If a set of vectors is not linearly dependent, it is linearly independent.
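Definitions 54-58 fit together in a few lines: the least-squares slope and intercept have closed forms, the fitted values come from plugging into the mean function, and the residuals are the leftovers. The data below are invented:

```python
# Closed-form least-squares fit for simple linear regression,
# with fitted values and residuals. Data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x))        # slope estimate
alpha = my - beta * mx                           # intercept estimate

fitted = [alpha + beta * a for a in x]           # Y_hat_i
resid = [b - f for b, f in zip(y, fitted)]       # r_i = Y_i - Y_hat_i

print(beta, alpha)
```

A useful check of the least-squares property: with an intercept in the model, the residuals sum to zero and have zero dot product with the x values.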

67. Symmetric (matrix): A matrix A is symmetric if A_ij = A_ji for all indices i and j.

68. Square (matrix): A matrix is square if it has the same number of rows and columns. Otherwise it is rectangular. A rectangular matrix is tall and thin if it has more rows than columns. The design matrix in a regression problem is always tall and thin.

69. Matrix-vector product: For an m × n matrix A and an n-dimensional vector B, the matrix-vector product AB is an m-dimensional vector. One way to form AB is to take the dot product of each row of A with B, and place the results into a vector. A different, equivalent way to form AB is to construct a linear combination of the columns of A using the elements of B as coefficients.

70. Nullspace (of a matrix): The nullspace of a matrix A is the set of all coefficient vectors B such that AB = 0. The vector B = 0 is always in the nullspace. For some matrices, other nonzero vectors may be in the nullspace as well. If 0 is the only vector in the nullspace, the matrix is nonsingular; otherwise it is singular. A matrix with more columns than rows is always singular. A matrix with equally many, or fewer, columns than rows may be singular or nonsingular.

71. Matrix-matrix product: Two matrices A and B may be multiplied to form AB if the number of columns of A is equal to the number of rows of B. If A is m × n and B is n × r, AB is m × r. The i, j element of AB is the dot product between row i of A and column j of B. The matrix products X'X and XX' always exist. The former is called the column-wise inner product matrix, while the latter is called the row-wise inner product matrix.

72. Identity matrix: The identity matrix I is a square m × m matrix such that if A has m columns, AI = A, and if A has m rows, IA = A.

73. Matrix inverse: If A is a square m × m matrix, the inverse of A is a matrix A⁻¹ such that AA⁻¹ = A⁻¹A = I, where I is the m × m identity matrix. The inverse only exists if A is nonsingular.

74. Multiple regression: Data in which a single response measurement Y is paired with one or more predictor variables X_j can be analyzed using multiple linear regression. The mean function is E(Y | X) = α + β_1 X_1 + β_2 X_2 + ... + β_p X_p, and the data should be homoscedastic.

75. Design matrix: All predictor variable values for all observations in a multiple regression problem can be stored in the design matrix. The first column contains all 1's, and subsequent columns contain the predictor variable values. Each row contains the data for one observation, and each column contains the data for one predictor variable.

76. Proportion of explained variance (PVE): A number between zero and one, such that large values indicate that the predictor variables do a good job tracking the variation in the response values. The PVE is very interpretable, but has some technical drawbacks: it always increases as new variables are added, and it is not easy to do any inference with the PVE. Larger PVE values indicate a better model.
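The design matrix and normal-equations view of definitions 69-76 can be sketched with numpy (assumed available here). The data are invented, and the estimates come from solving (X'X)β = X'y rather than explicitly inverting X'X, which is the numerically preferred route:

```python
# Multiple regression via the normal equations, plus the PVE.
# All data values are made up for illustration.
import numpy as np

# Design matrix: first column all 1's, then one column per predictor.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 5.0, 2.0],
              [1.0, 7.0, 5.0],
              [1.0, 8.0, 3.0]])
y = np.array([6.1, 10.8, 13.2, 19.9, 20.0])

XtX = X.T @ X                          # column-wise inner product matrix X'X
beta = np.linalg.solve(XtX, X.T @ y)   # solves (X'X) beta = X'y

fitted = X @ beta
resid = y - fitted

# PVE: 1 - SSE/SST; here the fit is nearly perfect by construction.
sst = np.sum((y - y.mean()) ** 2)
pve = 1.0 - np.sum(resid ** 2) / sst
print(beta, pve)
```

As in the simple regression case, the least-squares residuals are orthogonal to every column of the design matrix, which is exactly the statement X'(y - Xβ̂) = 0.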

77. F statistic: The F statistic is MSR/MSE. Like the PVE, it is larger when the predictor variables do a good job tracking the variation in the response values. It ranges from 0 to ∞, and which values are considered large depends on the degrees of freedom. It is less interpretable than the PVE, but is easy to use in hypothesis testing since tables of the F-distribution are easy to construct. Larger F values indicate a better model.

78. Akaike Information Criterion (AIC): A measure of fit for a regression model that explicitly accounts for the number of variables and the sizes of the residuals. The positive effects of small residuals can be offset by the negative effects of a complex model with many predictor variables. Smaller AIC values indicate a better model.

79. Main effects: For a predictor variable X_j, the main effect is the term β_j X_j which appears in the regression function.

80. Interaction: In a multiple regression model, if the slope for one variable depends on the value of another variable, the two variables interact. The product term X_j X_k for two interacting variables can be included as a new variable in the regression model to account for this interaction.

81. Polynomial regression: If the relationship between Y and one of the predictor variables X_j is not linear, polynomial terms X_j², X_j³, etc. can be included as new variables in the regression model.

82. Full model: A multiple regression model in which main effects are included for every available predictor variable.

83. Forward/backward/all subsets selection: These are three ways to find the best model for a given dataset. The goal is to determine the population model, but like any inferential procedure the correct result will not always be obtained due to random variation in the data.

Key scaling properties:

1. Measures of location: The mean and median scale and translate in the same way that the underlying data are scaled or translated. So if the data are translated by c, the mean and the median are translated by c. If the data are scaled by c, the mean and median are scaled by c.

2. Measures of scale: The variance, IQR, and standard deviation are invariant to translations. The IQR and standard deviation scale with the magnitude of the scale factor: if the data are scaled by c, the IQR and standard deviation are scaled by |c|. The variance scales with the square of the scale factor: if the data are scaled by c, the variance is scaled by c².

3. Measures of association: The correlation and covariance are invariant to translations in both the X and Y variables. If either the X or Y variable is scaled by c, the correlation is scaled by sgn(c) = c/|c|, which is ±1. The covariance scales with the X values and Y values separately, so if the X values are scaled by c and the Y values are scaled by d, the covariance is scaled by c·d.

4. Slopes: For simple linear regression, if the Y values are scaled by c, β̂ is scaled by c. If the X values are scaled by c, β̂ is scaled by 1/c.

Key sampling distributions:

1. Sample mean:
   E(X̄) = E(X_i), var(X̄) = σ²/n, SD(X̄) = σ/√n

2. Correlation coefficient (* denotes the Fisher-transformed value):
   E(r*) ≈ ρ*, var(r*) ≈ 1/(n - 3), SD(r*) ≈ 1/√(n - 3)

3. Simple linear regression slope:
   E(β̂) = β, var(β̂) = σ²/((n - 1)σ_X²), SD(β̂) = σ/(√(n - 1) σ_X)

4. Multiple regression slopes:
   E(β̂_j) = β_j, var(β̂_j) = [σ²(X'X)⁻¹]_jj

Variance/Covariance Identities:

var(X) = cov(X, X)
var(X + Y) = var(X) + var(Y) + 2cov(X, Y)
var(X - Y) = var(X) + var(Y) - 2cov(X, Y)
If X and Y are independent: var(X + Y) = var(X - Y) = var(X) + var(Y).
cov(X, Y + Z) = cov(X, Y) + cov(X, Z)
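The variance/covariance identities above hold exactly for the sample versions as well (with the same n - 1 denominator throughout), so they can be checked directly on made-up paired data:

```python
# Numerical check of var(X) = cov(X, X) and the sum/difference identities,
# using sample (n-1) versions on invented data.
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def var(v):
    return cov(v, v)   # var(X) = cov(X, X)

X = [1.0, 4.0, 2.0, 8.0, 5.0]
Y = [3.0, 1.0, 6.0, 2.0, 9.0]
S = [a + b for a, b in zip(X, Y)]   # X + Y, pointwise
D = [a - b for a, b in zip(X, Y)]   # X - Y, pointwise

lhs_plus = var(S)
rhs_plus = var(X) + var(Y) + 2 * cov(X, Y)
lhs_minus = var(D)
rhs_minus = var(X) + var(Y) - 2 * cov(X, Y)
print(lhs_plus, rhs_plus, lhs_minus, rhs_minus)
```

The same functions also verify the key scaling properties, e.g. scaling the data by 3 multiplies the variance by 9.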


More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Remedial Measures, Brown-Forsythe test, F test

Remedial Measures, Brown-Forsythe test, F test Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

THE ROYAL STATISTICAL SOCIETY 2008 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE (MODULAR FORMAT) MODULE 4 LINEAR MODELS

THE ROYAL STATISTICAL SOCIETY 2008 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE (MODULAR FORMAT) MODULE 4 LINEAR MODELS THE ROYAL STATISTICAL SOCIETY 008 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE (MODULAR FORMAT) MODULE 4 LINEAR MODELS The Society provides these solutions to assist candidates preparing for the examinations

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

MATH4427 Notebook 4 Fall Semester 2017/2018

MATH4427 Notebook 4 Fall Semester 2017/2018 MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

2008 Winton. Statistical Testing of RNGs

2008 Winton. Statistical Testing of RNGs 1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Accelerated Advanced Algebra. Chapter 1 Patterns and Recursion Homework List and Objectives

Accelerated Advanced Algebra. Chapter 1 Patterns and Recursion Homework List and Objectives Chapter 1 Patterns and Recursion Use recursive formulas for generating arithmetic, geometric, and shifted geometric sequences and be able to identify each type from their equations and graphs Write and

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Statistical Inference

Statistical Inference Statistical Inference Bernhard Klingenberg Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Outline Estimation: Review of concepts

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

. a m1 a mn. a 1 a 2 a = a n

. a m1 a mn. a 1 a 2 a = a n Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

1 Least Squares Estimation - multiple regression.

1 Least Squares Estimation - multiple regression. Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Y i = η + ɛ i, i = 1,...,n.

Y i = η + ɛ i, i = 1,...,n. Nonparametric tests If data do not come from a normal population (and if the sample is not large), we cannot use a t-test. One useful approach to creating test statistics is through the use of rank statistics.

More information

Linear Algebra Review

Linear Algebra Review Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Regression Analysis: Exploring relationships between variables. Stat 251

Regression Analysis: Exploring relationships between variables. Stat 251 Regression Analysis: Exploring relationships between variables Stat 251 Introduction Objective of regression analysis is to explore the relationship between two (or more) variables so that information

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Linear Regression for Air Pollution Data

Linear Regression for Air Pollution Data UNIVERSITY OF TEXAS AT SAN ANTONIO Linear Regression for Air Pollution Data Liang Jing April 2008 1 1 GOAL The increasing health problems caused by traffic-related air pollution have caught more and more

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information