Ismor Fischer, 1/11/


7.4 Problems

1. In Problem 4.4/9, it was shown that important relations exist between population means, variances, and covariance. Specifically, we have the formulas that appear below left.

Population (left):
I. (A) $\mu_{X+Y} = \mu_X + \mu_Y$
   (B) $\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\sigma_{XY}$
II. (A) $\mu_{X-Y} = \mu_X - \mu_Y$
    (B) $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y - 2\sigma_{XY}$

Sample (right):
I. (A) $\overline{x+y} = \bar{x} + \bar{y}$
   (B) $s^2_{x+y} = s^2_x + s^2_y + 2s_{xy}$
II. (A) $\overline{x-y} = \bar{x} - \bar{y}$
    (B) $s^2_{x-y} = s^2_x + s^2_y - 2s_{xy}$

In this problem, we verify that these properties also hold for sample means, variances, and covariance, in two examples. For data values $\{x_1, x_2, \ldots, x_n\}$ and $\{y_1, y_2, \ldots, y_n\}$, recall that:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \quad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2.$

Now suppose that each value $x_i$ from the first sample is paired with exactly one corresponding value $y_i$ from the second sample. That is, we have the set of ordered pairs of data $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, with sample covariance given by $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$. Furthermore, we can label the pairwise sum $x + y$ as the dataset $(x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$, and likewise for the pairwise difference $x - y$. It can be shown (via basic algebra, or Appendix A) that for any such dataset of ordered pairs, the sample formulas on the right hold. (Note that these formulas generalize the properties found in Problem .5/4.)

For the following ordered data pairs, verify that the formulas in I and II hold. (In R, use mean, var, and cov.) Also, sketch the scatterplot.

x: 0 6 8
y: 3 3 5 9

Repeat for the following dataset. Notice that the values of $x_i$ and $y_i$ are the same as before, but the correspondence between them is different!

x: 0 6 8
y: 3 9 3 5
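As a quick numerical check of identities I and II, here is a small Python sketch standing in for R's mean, var, and cov. It uses the standard-library `statistics.mean` and `statistics.variance` (the latter divides by n − 1, as R's var does) and a hand-written helper `sample_cov`, a hypothetical name rather than a library function. The data are made up for illustration and are not the problem's values.

```python
from statistics import mean, variance


def sample_cov(x, y):
    """Sample covariance s_xy with the n - 1 denominator (hypothetical helper)."""
    xbar, ybar = mean(x), mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)


def check_identities(x, y, tol=1e-9):
    """Report whether identities I(A), I(B), II(A), II(B) hold for paired data."""
    s = [a + b for a, b in zip(x, y)]   # pairwise sums x_i + y_i
    d = [a - b for a, b in zip(x, y)]   # pairwise differences x_i - y_i
    sxy = sample_cov(x, y)
    vx, vy = variance(x), variance(y)   # n - 1 denominator
    return {
        "I(A)": abs(mean(s) - (mean(x) + mean(y))) < tol,
        "I(B)": abs(variance(s) - (vx + vy + 2 * sxy)) < tol,
        "II(A)": abs(mean(d) - (mean(x) - mean(y))) < tol,
        "II(B)": abs(variance(d) - (vx + vy - 2 * sxy)) < tol,
    }


# Hypothetical data (NOT the problem's values): same x and y values, two pairings
x = [1, 4, 5, 10]
for y in ([2, 7, 3, 8], [8, 3, 7, 2]):
    print(check_identities(x, y), "s_xy =", sample_cov(x, y))
```

With these made-up values, re-pairing the same y values flips $s_{xy}$ from 25/3 to −25/3, so $s^2_{x+y}$ and $s^2_{x-y}$ change even though $s_x^2$ and $s_y^2$ do not; the identities hold in both cases.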

2. Expiration dates that establish the shelf lives of pharmaceutical products are determined from stability data in drug formulation studies. In order to measure the rate of decomposition of a particular drug, it is stored under various conditions of temperature, humidity, light intensity, etc., and assayed for intact drug potency at FDA-recommended time intervals of every three months during the first year. In this example, the assay $Y$ (mg) of a certain 500 mg tablet formulation is determined at time $X$ (months) under ambient storage conditions.

X (months): 0 3 6 9 12
Y (mg): 500 490 470 430 350

(a) Graph these data points $(x_i, y_i)$ in a scatterplot, and calculate the sample correlation coefficient $r = s_{xy} / (s_x s_y)$. Classify the correlation as positive or negative, and as weak, moderate, or strong.
(b) Determine the equation of the least squares regression line for these data points, and include a 95% confidence interval for the slope $\beta$.
(c) Sketch a graph of this line on the same set of axes as part (a); also calculate and plot the fitted response values $\hat{y}_i$ and the residuals $e_i = y_i - \hat{y}_i$ on this graph.
(d) Complete an ANOVA table for this linear regression, including the F-ratio and corresponding p-value.
(e) Calculate the value of the coefficient of determination $r^2$, using the two following equivalent ways (and showing agreement of your answers), and interpret this quantity as a measure of fit of the regression line to the data, in a brief, clear explanation.
  - via squaring the correlation coefficient $r = s_{xy} / (s_x s_y)$ found in (a),
  - via the ratio $r^2 = SS_{Regression} / SS_{Total}$ of sums of squares found in (d).
(f) Test the null hypothesis of no linear association between $X$ and $Y$, either by using your answer in (a) on $H_0: \rho = 0$, or equivalently, by using your answers in (b) and/or (d) on $H_0: \beta = 0$.
(g) Calculate a point estimate of the mean potency when $X = 6$ months. Judging from the data, is this realistic? Determine a 95% confidence interval for this value.
(h) The FDA recommends that the expiration date be defined as the time when a drug contains 90% of the labeled potency. Using this definition, calculate the expiration date for this tablet formulation. Judging from the data, is this realistic?
(i) The residual plot of this model shows evidence of a nonlinear trend. (Check this!) To obtain a better regression model, first apply the linear transformations $\tilde{X} = X/3$ and $\tilde{Y} = 510 - Y$, then try fitting an exponential curve $\tilde{Y} = \alpha e^{\beta \tilde{X}}$. Use this model to determine the expiration date. Judging from the data, is this realistic?

(j) Redo this problem using the following R code:

    # See help(lm) or help(lsfit), and help(plot.lm) for details.

    # Compute Correlation Coefficient and Scatterplot
    X <- c(0, 3, 6, 9, 12)
    Y <- c(500, 490, 470, 430, 350)
    cor(X, Y)
    plot(X, Y, xlab = "X = Months", ylab = "Y = Assay (mg)", pch = 19)
    # Answer this.

    # Least Squares Fit, Regression Line Plot, ANOVA F-test
    regline <- lm(Y ~ X)
    summary(regline)
    abline(regline, col = "blue")
    # Exercise: Why does the p-value of 0.02049 appear twice?

    # Estimate Mean Potency at 6 Months
    new <- data.frame(X = 6)
    predict(regline, new, interval = "confidence")

    # Residual Plot (residuals vs. fitted values, points labeled by rounded residual)
    resids <- round(resid(regline), 2)
    plot(regline, which = 1, id.n = 5, labels.id = resids, pch = 19)

    # Log-Transformed Linear Regression
    Xtilde <- X / 3
    Ytilde <- 510 - Y
    V <- log(Ytilde)
    plot(Xtilde, V, xlab = "Xtilde", ylab = "ln(Ytilde)", pch = 19)
    regline.transf <- lm(V ~ Xtilde)
    summary(regline.transf)
    abline(regline.transf, col = "red")

    # Plot Transformed Model on the original scale
    coeffs <- coefficients(regline.transf)
    scale <- exp(coeffs[1])   # alpha = exp(intercept)
    shape <- coeffs[2]        # beta = slope
    Yhat <- function(X)(510 - scale * exp(shape * X / 3))
    plot(X, Y, xlab = "X = Months", ylab = "Y = Assay (mg)", pch = 19)
    curve(Yhat, col = "red", add = TRUE)
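For readers without R, the following Python sketch (standard library only) mirrors the R script for Problem 2 step by step. Several pieces are assumptions rather than part of the original: the critical value $t_{.025,\,3} = 3.182$ is hard-coded from a t-table; p-values come from the closed-form CDF of Student's t with exactly 3 degrees of freedom ($df = n - 2$ here) instead of R's `pt`/`pf`; and the transformation in (i) is taken as $\tilde{Y} = 510 - Y$, which turns the assays into 10, 20, 40, 80, 160. The helper names `ols` and `t3_two_sided_p` are hypothetical.

```python
import math

# Problem 2 data: time X (months) and assay Y (mg)
X = [0, 3, 6, 9, 12]
Y = [500, 490, 470, 430, 350]
T_CRIT = 3.182  # t-table value t_{.025} for df = n - 2 = 3 (assumed, not computed)


def ols(xs, ys):
    """Least-squares (intercept, slope) of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar - (sxy / sxx) * xbar, sxy / sxx


def t3_two_sided_p(t):
    """P(|T| > |t|) for Student's t with exactly 3 df, via its closed-form CDF."""
    theta = math.atan(abs(t) / math.sqrt(3))
    return 1 - (2 / math.pi) * (theta + math.sin(theta) * math.cos(theta))


n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)
SS_tot = sum((y - ybar) ** 2 for y in Y)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

# (a)-(c): correlation, least-squares line, residuals
r = Sxy / math.sqrt(Sxx * SS_tot)
a, b = ols(X, Y)
resid = [y - (a + b * x) for x, y in zip(X, Y)]

# (d)-(f): ANOVA, slope CI, and the t- and F-test p-values
SS_err = sum(e ** 2 for e in resid)
SS_reg = SS_tot - SS_err
MS_err = SS_err / (n - 2)
F = SS_reg / 1 / MS_err                  # df_Reg = 1
se_b = math.sqrt(MS_err / Sxx)
slope_ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)
p_t = t3_two_sided_p(b / se_b)           # slope t-test, H0: beta = 0
p_F = t3_two_sided_p(math.sqrt(F))       # P(F(1,3) > F) = P(|T_3| > sqrt(F))

# (g): point estimate and 95% CI for the mean response at X = 6
x0 = 6
y0 = a + b * x0
se_mean = math.sqrt(MS_err * (1 / n + (x0 - xbar) ** 2 / Sxx))
mean_ci = (y0 - T_CRIT * se_mean, y0 + T_CRIT * se_mean)

# (h): linear-model expiration date, where Y = 90% of 500 mg
target = 0.90 * 500
expiry_linear = (target - a) / b

# (i): exponential model Ytilde = alpha * exp(beta * Xtilde), fitted on the log scale
Xt = [x / 3 for x in X]
Yt = [510 - y for y in Y]                # assumed reading of the transformation
ln_alpha, beta = ols(Xt, [math.log(v) for v in Yt])
alpha = math.exp(ln_alpha)
expiry_exp = 3 * math.log((510 - target) / alpha) / beta

print(f"r = {r:.4f}, r^2 = {r * r:.4f} = {SS_reg / SS_tot:.4f}")
print(f"Y-hat = {a:.1f} + ({b:.1f}) X, 95% CI for slope: ({slope_ci[0]:.2f}, {slope_ci[1]:.2f})")
print(f"F = {F:.2f}, p (t-test) = {p_t:.5f}, p (F-test) = {p_F:.5f}")
print(f"mean at X = 6: {y0:.1f}, 95% CI ({mean_ci[0]:.1f}, {mean_ci[1]:.1f})")
print(f"expiration: linear {expiry_linear:.2f} months, exponential {expiry_exp:.2f} months")
```

Because $F = T^2$ when $df_{Reg} = 1$, the slope t-test and the ANOVA F-test return the same p-value, which is what the exercise in the R code points at.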

3. A Third Transformation. Suppose that two continuous variables $X$ and $Y$ are negatively correlated via the nonlinear relation $Y = \frac{1}{\alpha X + \beta}$, for some parameters $\alpha$ and $\beta$. This is algebraically equivalent to the relation $\frac{1}{Y} = \alpha X + \beta$, which can then be solved via simple linear regression. Use this reciprocal transformation on the data and corresponding scatterplot below to sketch a new scatterplot, and solve for sample-based estimates of the parameters $\alpha$ and $\beta$. (Hint: Finding the parameter values in this example should be straightforward, and should not require any least squares regression formulas.) Express the original response $Y$ in terms of $X$.

X: 0 1 2 3 4 5
Y: 60 30 20 15 12 10

[Figure: scatterplot of Y versus X, with blank axes for the transformed scatterplot of 1/Y versus X]

4. For this problem, recall that in simple linear regression, we have the following definitions:

$b = \frac{s_{xy}}{s_x^2}, \quad MS_{Err} = \frac{SS_{Err}}{n-2}, \quad r^2 = \frac{SS_{Reg}}{SS_{Tot}} = 1 - \frac{SS_{Err}}{SS_{Tot}}, \quad SS_{Tot} = (n-1)s_y^2, \quad S_{xx} = (n-1)s_x^2.$

(a) Formally prove that the T-score $T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ for testing the null hypothesis $H_0: \rho = 0$ is equal to the T-score $T = \frac{b - \beta}{\sqrt{MS_{Err}/S_{xx}}}$ for testing the null hypothesis $H_0: \beta = 0$.
(b) Formally prove that, in simple linear regression (where $df_{Reg} = 1$), the square of the T-score $T = \frac{b - \beta}{\sqrt{MS_{Err}/S_{xx}}}$ is equal to the F-ratio $F = \frac{MS_{Reg}}{MS_{Err}}$ for testing the null hypothesis $H_0: \beta = 0$.
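Problems 3 and 4 can both be sanity-checked numerically (this is not a proof). In the Python sketch below, the hypothetical helper `ols` fits a least-squares line. For Problem 3 it regresses $1/Y$ on $X$ for made-up data generated from $Y = 1/(0.5X + 2)$, so the slope should recover $\alpha$ and the intercept $\beta$. For Problem 4 it computes $T = r\sqrt{n-2}/\sqrt{1-r^2}$, $T = b/\sqrt{MS_{Err}/S_{xx}}$ (with $\beta = 0$), and $F = MS_{Reg}/MS_{Err}$ on arbitrary made-up data; the two T-scores should agree and $T^2$ should equal $F$.

```python
import math


def ols(xs, ys):
    """Least-squares (intercept, slope) of ys on xs (hypothetical helper)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar - (sxy / sxx) * xbar, sxy / sxx


# Problem 3 idea: if Y = 1/(alpha*X + beta), then 1/Y is linear in X.
# Made-up data generated with alpha = 0.5, beta = 2 (not the problem's data).
xs3 = [0, 1, 2, 3, 4]
ys3 = [1 / (0.5 * x + 2) for x in xs3]
beta_hat, alpha_hat = ols(xs3, [1 / y for y in ys3])  # intercept -> beta, slope -> alpha


# Problem 4 idea: compare T from r, T from b, and F on the same data.
def t_and_f(xs, ys):
    n = len(xs)
    a, b = ols(xs, ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * ss_tot)                  # sample correlation
    ss_err = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ms_err = ss_err / (n - 2)
    ms_reg = (ss_tot - ss_err) / 1                     # df_Reg = 1
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # test of H0: rho = 0
    t_b = b / math.sqrt(ms_err / sxx)                  # test of H0: beta = 0
    return t_r, t_b, ms_reg / ms_err


t_r, t_b, F = t_and_f([1, 2, 4, 5, 7], [3.1, 2.0, 5.5, 4.2, 8.0])  # made-up data
print(alpha_hat, beta_hat, t_r, t_b, F, t_b ** 2)
```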

5. In a study of binge eating disorders among dieters, the average weights ($Y$) of a group of overweight women of similar ages and lifestyles are measured at the end of every two months ($X$) over an eight-month period. The resulting data values, some accompanying summary statistics, and the corresponding scatterplot are shown below.

X: 0 2 4 6 8    $\bar{x} = 4$, $s_x^2 = 10$
Y: 00 90 0 80 0    $\bar{y} = 00$, $s_y^2 = 50$

(a) Compute the sample covariance $s_{xy}$ between the variables $X$ and $Y$.
(b) Compute the sample correlation coefficient $r$ between the variables $X$ and $Y$. Use it to classify the linear correlation as positive or negative, and as strong, moderate, or weak.
(c) Determine the equation of the least squares regression line for these data. Sketch a graph of this line on the scatterplot provided above. Please label clearly!
(d) Also calculate the fitted response values $\hat{y}_i$, and plot the residuals $e_i = y_i - \hat{y}_i$ on this same graph. Please label clearly!
(e) Calculate the coefficient of determination $r^2$, and interpret its value in the context of evaluating the fit of this linear model to the sample data. Be as clear as possible.
(f) Interpretation: Evaluate the overall adequacy of the linear model to these data, using as much evidence as possible. In particular, refer to at least two formal linear regression assumptions which may or may not be satisfied here, and explain why.
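Once $s_{xy}$ is known, parts (b), (c), and (e) need only the summary statistics. The Python sketch below (hypothetical helper name `line_from_summaries`) applies $b = s_{xy}/s_x^2$, $a = \bar{y} - b\bar{x}$, and $r = s_{xy}/(s_x s_y)$. The inputs $\bar{x} = 4$ and $s_x^2 = 10$ match $X = 0, 2, 4, 6, 8$; the Y-side inputs in the example call are placeholders, not the study's values.

```python
import math


def line_from_summaries(xbar, ybar, sx2, sy2, sxy):
    """Least-squares line and correlation from summary statistics (n - 1 convention)."""
    b = sxy / sx2                    # slope b = s_xy / s_x^2
    a = ybar - b * xbar              # the fitted line passes through (xbar, ybar)
    r = sxy / math.sqrt(sx2 * sy2)   # r = s_xy / (s_x s_y)
    return a, b, r, r * r            # r^2 = coefficient of determination


# xbar = 4, s_x^2 = 10 as for X = 0, 2, 4, 6, 8; ybar, s_y^2, s_xy are PLACEHOLDERS
a, b, r, r2 = line_from_summaries(xbar=4, ybar=150, sx2=10, sy2=40, sxy=12)
print(f"y-hat = {a:.2f} + {b:.2f} x, r = {r:.3f}, r^2 = {r2:.3f}")
```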

6. A pharmaceutical company wishes to evaluate the results of a new drug assay procedure, performed on $n = 5$ drug samples of different, but known, potency $X$. In a perfect error-free assay, the two sets of values would be identical, thus resulting in the ideal calibration line $Y = X$, i.e., $Y = 0 + 1X$. However, experimental variability generates the results shown below, along with some accompanying summary statistics: the sample means, variances, and covariance, respectively.

X (mg): 30 40 50 60 70    $\bar{x} = 50$, $s_x^2 = 250$
Y (mg): 32 39 53 65 71    $\bar{y} = 52$, $s_y^2 = 275$
$s_{xy} = 260$

(a) Graph these data points $(x_i, y_i)$ in a scatterplot.
(b) Compute the sample correlation coefficient $r$. Use it to determine whether or not $X$ and $Y$ are linearly correlated; if so, classify as positive or negative, and as weak, moderate, or strong.
(c) Determine the equation of the least squares regression line for these data. Sketch a graph of this line on the same set of axes as part (a). Also calculate and plot the fitted response values $\hat{y}_i$ and the residuals $e_i = y_i - \hat{y}_i$ on this same graph.
(d) Using all of this information, complete the following ANOVA table for this simple linear regression model. (Hints: $SS_{Total}$ and $df_{Total}$ can be obtained from $s_y^2$ given above; $SS_{Error}$ = residual sum of squares, and $df_{Error} = n - 2$.) Show all work.

Source     | df | SS | MS | F-ratio | p-value
Regression |    |    |    |         |
Error      |    |    |    |         |
Total      |    |    |    |         |

(e) Construct a 95% confidence interval for the slope $\beta$.
(f) Use the p-value in (d) and the 95% confidence interval in (e) to test whether the null hypothesis $H_0: \beta = 0$ can be rejected in favor of the alternative $H_A: \beta \neq 0$, at the $\alpha = .05$ significance level. Interpret your answer: What exactly has been demonstrated about any association that might exist between $X$ and $Y$? Be precise.
(g) Use the 95% confidence interval in (e) to test whether the null hypothesis $H_0: \beta = 1$ can be rejected in favor of the alternative $H_A: \beta \neq 1$, at the $\alpha = .05$ significance level. Interpret your answer in context: What exactly has been demonstrated about the new drug assay procedure? Be precise.

7. Refer to the posted Rcode folder for this problem. Please answer all questions.
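For Problem 6, here is a Python sketch (standard library only) of parts (b) through (g). It reads the Y values as 32, 39, 53, 65, 71, which reproduces $\bar{y} = 52$, $s_y^2 = 275$, and $s_{xy} = 260$; treat that reading as an assumption. The critical value $t_{.025,\,3} = 3.182$ is hard-coded from a t-table, and the tests in (f) and (g) are carried out by checking whether the hypothesized slope lies inside the 95% confidence interval.

```python
import math

# Problem 6 data (Y row read as 32, 39, 53, 65, 71; an assumption, see above)
X = [30, 40, 50, 60, 70]   # known potency (mg)
Y = [32, 39, 53, 65, 71]   # assayed potency (mg)
T_CRIT = 3.182             # t-table value t_{.025} for df = n - 2 = 3

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
sx2 = sum((x - xbar) ** 2 for x in X) / (n - 1)
sy2 = sum((y - ybar) ** 2 for y in Y) / (n - 1)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / (n - 1)

b = sxy / sx2                       # slope
a = ybar - b * xbar                 # intercept
r = sxy / math.sqrt(sx2 * sy2)      # correlation coefficient
resid = [y - (a + b * x) for x, y in zip(X, Y)]

# ANOVA pieces, following the hints: SS_Total = (n - 1) s_y^2, df_Error = n - 2
ss_tot = (n - 1) * sy2
ss_err = sum(e ** 2 for e in resid)
ss_reg = ss_tot - ss_err
ms_err = ss_err / (n - 2)
F = (ss_reg / 1) / ms_err           # df_Reg = 1

# 95% CI for the slope, then the two CI-based tests at alpha = .05
se_b = math.sqrt(ms_err / ((n - 1) * sx2))   # S_xx = (n - 1) s_x^2
ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)
reject_beta_0 = not (ci[0] <= 0 <= ci[1])    # H0: beta = 0
reject_beta_1 = not (ci[0] <= 1 <= ci[1])    # H0: beta = 1 (ideal calibration)

print(f"xbar={xbar}, ybar={ybar}, s_x^2={sx2}, s_y^2={sy2}, s_xy={sxy}")
print(f"r = {r:.4f}; Y-hat = {a:.2f} + {b:.2f} X; residuals = {[round(e, 2) for e in resid]}")
print(f"SS_Reg = {ss_reg:.1f}, SS_Err = {ss_err:.1f}, F = {F:.2f}")
print(f"95% CI for beta: ({ci[0]:.3f}, {ci[1]:.3f}); "
      f"reject beta=0: {reject_beta_0}; reject beta=1: {reject_beta_1}")
```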