THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5


THE ROYAL STATISTICAL SOCIETY

GRADUATE DIPLOMA EXAMINATION

NEW MODULAR SCHEME introduced from the examinations in 2009

MODULE 5

SOLUTIONS FOR SPECIMEN PAPER B

THE QUESTIONS ARE CONTAINED IN A SEPARATE FILE

The time for the examination is 3 hours. The paper contains eight questions, of which candidates are to attempt five. Each question carries 20 marks. An indicative mark scheme is shown within the questions, by giving an outline of the marks available for each part-question. The pass mark for the paper as a whole is 50%.

The solutions should not be seen as "model answers". Rather, they have been written out in considerable detail and are intended as learning aids. For this reason, they do not carry mark schemes. Please note that in many cases there are valid alternative methods and that, in cases where discussion is called for, there may be other valid points that could be made.

While every care has been taken with the preparation of the questions and solutions, the Society will not be responsible for any errors or omissions. The Society will not enter into any correspondence in respect of the questions or solutions.

Note. In accordance with the convention used in all the Society's examination papers, the notation log denotes logarithm to base e. Logarithms to any other base are explicitly identified, e.g. $\log_{10}$.

© RSS 2008

Graduate Diploma Module 5, Specimen Paper B. Question 1

(i) The original variables, the answers to the questions, are likely to be highly correlated. Principal component analysis (PCA) gives linear combinations of the variables that are uncorrelated. The first PC accounts for the largest amount of variation in the data, the second for the next largest, and so on. If the questions form themselves into relatively distinct clusters then PCs are useful to define subsets, and possibly to suggest ways of combining scores. PCs are only strictly valid for numeric data, but the data here are nearer to being categorical, at best ordinal. However, PCA is often used for data such as these.

(ii) A cluster analysis could be useful, using correlations (or absolute values of them); perhaps indications of the grouping of questions would be given.

(iii) PCA only works on complete records. If a respondent's answer to one question is missing, that whole set of responses will be omitted. Because PCA is based on analysis of variability in data, missing values cannot easily be imputed. The choice in this case is between analysing a large number of responses on a small number of questions and a small number of responses on a large number of questions. The strategy proposed seems sensible.

(iv) The first three eigenvalues add to 5.14, i.e. 5.14/6 or 85.7% of the total variation, and should be enough.

The first PC (54% of total variation) is an overall score of concern about cost; note that the "direction" of questions […], 3, 4 is opposite to that of […], 5, 6.

The second PC (23% of total variation) measures the tendency of respondents to answer all questions in the same way, i.e. with similar scores.

The third PC (9% of total variation and so relatively much less important) is dominated by question 4, perhaps contrasting its answers with those for question […], perhaps also taking question 5 into account.

The first two PCs therefore give most of the useful, easily understood, information.

(v) The two unsatisfactory features of the data are the large amount of missing information, leading to 9 of the 15 questions being discarded, and the suggestion from the second PC that the respondents do not complete the form validly. Hence these results are not reliable. A fresh start is needed, with reworded questions and boxes to tick as in a survey.

Graduate Diploma Module 5, Specimen Paper B. Question 2

(i) Linear discriminant analysis can be used to produce a classification rule where the groups are known a priori, and data are described by several variables. Linear combinations of these variables $x_i$ can show up relations not obvious from separate, univariate, analyses. Classifications so found can be applied to the new sites.

(ii) Multivariate Normal variance-covariance matrices are required to be equal for each group (but locations will be different). This is not easy to check; although formal tests exist, they are sensitive to non-Normality. Also, relatively small sample sizes do not help. Univariate Normality for each measurement can be checked in the usual ways (e.g. histograms, stem-and-leaf plots, Normal probability plots); univariate Normality is a necessary but not sufficient condition for multivariate Normality.

(iii) The variance-covariance matrices are apparently different, with changes in sign as well as size of individual entries. Normality cannot be checked on the information given. The means of x […] and $x_4$ (and possibly $x_5$) appear different for the two groups.

(iv) Method 1

After constructing and applying the discriminant function, 4/7 () and /5 () are found to have been correctly classified. This is good, but is likely to be an overestimate of the future success rate (since the same data have been used to construct the function and to "check" it).

Cross-validation may be carried out by, for example, a jack-knife method: calculate the function omitting one observation, and use the function to predict class membership of that item; repeat this for each item in turn and observe the number of correct predictions. [In a large data-set, the discriminant function would be calculated on some of the data and then used to check the success rate of the remainder. Here we do not have enough data for that.] This gave /7 () and 9/5 () correct.

Method 2

Note that $x_4$ was identified in (iii) as a useful variate. This method correctly classifies /7 () and /5 (), and the numbers on cross-validation are the same.

This seems the better method. With these sample sizes, using 5 variables (Method 1) may be over-fitting. The univariate (as it has turned out) Method 2 is more successful.

Graduate Diploma Module 5, Specimen Paper B. Question 3

If $f(t)$ is the probability density function and $F(t)$ the cumulative distribution function for the lifetime, then the hazard function $h(t)$ is defined by $h(t) = f(t)/(1 - F(t))$.

The hazard function in this case is

$$h(t) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2)$$

where $h_0(t)$ is the baseline hazard function.

(i) The log likelihood (log L) is not given for the null model, but the coefficient for the single explanatory variable in model A is large relative to its standard error; further, the hazard ratio is high. These suggest that this variable is important.

The difference in log L between models B and C is small and certainly not statistically significant. Thus it seems likely that there is not an interaction effect (the interaction term is the only difference between the two models).

Twice the difference in log L between models A and B is $2(-171.518 + 175.640) = 8.244$. This is significant as an observation from $\chi^2_1$.

So we conclude that the best model is model B.

(ii) Hazard ratio = exp(coefficient). For the fault variable in model B, a 95% confidence interval for the coefficient is given by $1.172 \pm (1.96 \times 0.429)$, i.e. it is (0.331, 2.013). The corresponding 95% confidence interval for the hazard ratio is from exp(0.331) to exp(2.013), i.e. from 1.393 to 7.484.

(iii) The fault reduces the expected lifetime. Given two items, both with the same level of the chemical, the one with the fault is about 3 [exp(1.172)] times as likely to fail at any time.

(iv) The linear predictors for the two items are

$(1.528 \times 1.4) + 1.172 = 3.3112$

$(1.528 \times 2.9) + 0 = 4.4312$

So the second is more likely to fail first.

(v) If the proportional hazards assumption is not valid then the model is deficient. In particular the interpretation of the hazard ratios is invalid, and the answers to parts (iii) and (iv) may be inaccurate. The consequences depend to some extent on the type and seriousness of any departures from the assumptions.
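The interval in part (ii) can be checked numerically. The following is a minimal sketch, assuming a coefficient of 1.172 with standard error 0.429 for the fault variable in model B; the function name `hazard_ratio_ci` is illustrative only and is not part of the original solution.

```python
import math


def hazard_ratio_ci(coef, se, z=1.96):
    """Return the hazard ratio, the CI for the coefficient and the CI for the hazard ratio.

    The interval is formed on the log (coefficient) scale and then exponentiated.
    """
    lower = coef - z * se
    upper = coef + z * se
    return math.exp(coef), (lower, upper), (math.exp(lower), math.exp(upper))


# Assumed inputs: coefficient and standard error for the fault variable in model B.
hr, coef_ci, hr_ci = hazard_ratio_ci(1.172, 0.429)

print(f"hazard ratio          : {hr:.3f}")
print(f"95% CI for coefficient: ({coef_ci[0]:.3f}, {coef_ci[1]:.3f})")
print(f"95% CI for hazard ratio: ({hr_ci[0]:.3f}, {hr_ci[1]:.3f})")
```

Working on the coefficient scale first is the usual choice because the estimated coefficient, not the hazard ratio, is approximately Normally distributed.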

Graduate Diploma Module 5, Specimen Paper B. Question 4

(i) The survival time of an individual is censored when the end-point of interest (death in this example) has not been observed, either because the trial is terminated before the end-point took place or because the individual has been lost to the trial for some reason (e.g. does not respond to follow-up). The phrase right-censoring refers to the censoring occurring after (i.e. to the right of, in natural time order) the last known survival time.

(ii) The Kaplan-Meier survival curve is constructed as follows. The word "death" is used in this description generically; here it does in fact refer to death, but in other examples it might be healing, general recovery, etc. We seek the estimated cumulative survival function $\hat{S}(t)$.

The Kaplan-Meier method requires the ordered death times $t_{(1)}, t_{(2)}, \ldots, t_{(r)}$ to be considered. For $j = 1, 2, \ldots, r$, let $n_{(j)}$ be the number of individuals alive just before time $t_{(j)}$, and let $d_{(j)}$ be the number of deaths at $t_{(j)}$.

An estimate of the probability of survival from $t_{(j)}$ to $t_{(j+1)}$ is $\dfrac{n_{(j)} - d_{(j)}}{n_{(j)}}$.

Thus (assuming independence) the probability of surviving through all the intervals up to $t_{(k+1)}$ is estimated by

$$\hat{S}(t) = \prod_{j=1}^{k} \frac{n_{(j)} - d_{(j)}}{n_{(j)}},$$

and this is the Kaplan-Meier estimate.

If the largest survival time [$t_{(r)}$] is censored, the method above is used to give estimates up to and including the next largest, the value for which is then assumed to apply for all times onward. If the largest survival time is not censored, the estimate drops to zero at that point. In the present example, $t_{(r)}$ is not censored.

There are 24 patients. The calculation is shown in detail in the table below. Some of the detail might be omitted in practice, and is often not shown in computer output. Rows for the censored observations have been omitted, but care must be taken to ensure that $n_{(j)}$ is always correct. Users of this solution should carefully verify the values in the table by reference to the data in the question.

In the table, $n_{(j)}$ is as defined in the text above [i.e. number remaining just before time $t_{(j)}$], $d_{(j)}$ is as defined in the text above [i.e. number of events at time $t_{(j)}$], and the last column is the cumulative survival estimate $\hat{S}(t)$ at each $t_{(j)}$.

Time t(j)    n(j)    d(j)    (n(j) - d(j))/n(j)    S(t)
6            23      4       19/23                 0.826
8            19      2       17/19                 0.739
12           17      2       15/17                 0.652
20           10      1       9/10                  0.5870
24           8       1       7/8                   0.5136
30           4       1       3/4                   0.3852
4[…]         […]     […]     0                     0

Greenwood's formula for the standard error for the Kaplan-Meier estimate at 12 months follow-up (corresponding to the third row in the calculation above) is

$$SE = \hat{S}(12)\left[\sum_{j=1}^{3} \frac{d_{(j)}}{n_{(j)}\,(n_{(j)} - d_{(j)})}\right]^{1/2} = 0.652\left[\frac{4}{23 \times 19} + \frac{2}{19 \times 17} + \frac{2}{17 \times 15}\right]^{1/2}$$

$$= 0.652\,(0.009153 + 0.006192 + 0.007843)^{1/2} = 0.652\sqrt{0.023188} = 0.0993.$$

(iii) The interval is $0.652 \pm (1.96 \times 0.0993) = 0.652 \pm 0.1946$, i.e. (0.46, 0.85).

(iv) From the graph, the median survival time is 30 months.

(v) A log rank test could be used for this purpose.
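The product-limit calculation and Greenwood's formula can be reproduced with a few lines of code. This is a minimal sketch, not the method used to prepare the solution: the function name `kaplan_meier` is illustrative, and the rows are assumed to be the six death times 6, 8, 12, 20, 24 and 30 months with the numbers at risk and deaths given in the table above. The final death time is left out because Greenwood's formula is undefined once the estimate reaches zero.

```python
import math


def kaplan_meier(rows):
    """Kaplan-Meier estimate with Greenwood standard errors.

    rows: (time, number at risk just before time, deaths at time),
          one entry per distinct death time, in increasing time order.
    Returns a list of (time, survival estimate, standard error).
    """
    survival = 1.0
    greenwood_sum = 0.0
    results = []
    for time, n_at_risk, deaths in rows:
        survival *= (n_at_risk - deaths) / n_at_risk
        greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
        results.append((time, survival, survival * math.sqrt(greenwood_sum)))
    return results


# Assumed data: censored observations only reduce the numbers at risk.
rows = [(6, 23, 4), (8, 19, 2), (12, 17, 2), (20, 10, 1), (24, 8, 1), (30, 4, 1)]
table = kaplan_meier(rows)

for time, surv, se in table:
    print(f"t = {time:2d}  S(t) = {surv:.4f}  SE = {se:.4f}")

# 95% confidence interval at 12 months (third row).
_, s12, se12 = table[2]
ci12 = (s12 - 1.96 * se12, s12 + 1.96 * se12)
print(f"95% CI at 12 months: ({ci12[0]:.2f}, {ci12[1]:.2f})")
```

The median survival time is the first death time at which the estimate falls to 0.5 or below, which for these rows is 30 months.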

Graduate Diploma Module 5, Specimen Paper B. Question 5

(a) The infant mortality rate is

(number of deaths between birth and one year, excluding fetal deaths and stillbirths) / (total number of live births in the same year).

The neonatal mortality rate is as above but only including deaths up to 28 days.

The perinatal mortality rate is

(number of fetal deaths and neonatal deaths) / (total number of live births)

[sometimes divided by the total number of live births and fetal deaths, there being no generally accepted convention for computing this rate].

The maternal mortality rate is

(number of deaths from puerperal causes) / (total number of live births).

All these rates are usually multiplied by 1000.

(b) [Figure: population pyramid. Vertical axis: age (0 to 100); males on the left, females on the right. Horizontal axis: totals (thousands) in ten-yearly age groups.]

The 0 frequencies are multiplied by 10; those for 1–4 by 10/4; the choice of upper limit for the 85+ group is made so as not to distort the pyramid.

[NOTE. The accuracy of representation on the diagram is constrained by the limits of electronic reproduction.]

(c) (i) The sex-age-specific death rate for a country is

(number of deaths for that sex in the age-range of interest during the year) / (average number of persons of that sex and age living during the year) × 1000.

This is calculated separately for males and females, using suitable age ranges. "Average" (mid-year) is usually the mean of beginning and end figures for that year.

(ii) The rates for U are generally higher than for D up to 44, and then become lower. The rates for males are generally higher than for females, in both countries. Females thus have longer expectation of life than males, and inhabitants of D have longer expectation of life than those of U.

Graduate Diploma Module 5, Specimen Paper B. Question 6

(i) This is because a case-control study is retrospective, whereas relative risk is measured in a prospective study.

A retrospective study takes affected persons and explores in detail the history of events that may have led to the condition. For example, it may thereby enquire into whether most of those affected by a cholera epidemic have consumed water from the same source. A retrospective study relies on having fairly complete and reliable data on a variety of topics.

A prospective study begins with unaffected persons (e.g. those without lung cancer), notes various characteristics (e.g. smoking habits, occupation, place of residence) and studies future development of the condition in relation to those characteristics. Thus it may enquire into whether the condition develops more frequently in some groups than in others.

(ii) $$\text{Odds} = \frac{P(\text{event happens})}{1 - P(\text{event happens})}.$$

The odds ratio is the ratio of odds of disease in the exposed group of patients (i.e. here the smokers) to that in the unexposed group, i.e. here

$$\text{odds ratio} = \frac{P(\text{disease, smoker}) \,/\, P(\text{no disease, smoker})}{P(\text{disease, non-smoker}) \,/\, P(\text{no disease, non-smoker})}.$$

(iii) The combined data are as follows.

            Smokers    Non-smokers    Total
Cases       89         394            483
Controls    13         434            447
Total       102        828            930

Using the relative frequencies from this table, the odds ratio may be calculated as

$$\frac{89 \times 434}{13 \times 394} = 7.54$$

which is substantially greater than 1 and indicates greater prevalence of cancer among smokers.

(iv) The Mantel-Haenszel method is a simple way of adjusting for another factor, in this case sex. [Note. Other methods for doing this are used in some computer programs.]

Representing each table by

a   c
b   d

with $a + b + c + d = n$, and keeping the two sexes separate as in the question, the Mantel-Haenszel estimate of the odds ratio is

$$\frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$$

where $i = 1, 2$ for males, females. This gives

$$\frac{\dfrac{58 \times 271}{580} + \dfrac{31 \times 163}{350}}{\dfrac{6 \times 245}{580} + \dfrac{7 \times 149}{350}} = \frac{41.537}{5.514} = 7.53.$$

This is virtually the same as for the pooled data (7.54). This often turns out to happen when both subsets of the data are large and of the same order of size; also, in this case, we might suppose that sex is not in fact an important factor.

To obtain a 95% confidence interval for the odds ratio, we work via logarithms and first use the pooled data to obtain the standard error of the log odds ratio using the formula

$$\text{Var(log of odds ratio)} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} \quad (= 0.09300 \text{ here})$$

so that the standard error is $\sqrt{0.09300} = 0.305$.

The log of the Mantel-Haenszel estimate of the odds ratio is log(41.537/5.514) = 2.0193, so the 95% confidence interval for the logarithm is given by $2.0193 \pm (1.96 \times 0.305) = 2.0193 \pm 0.5978$, i.e. it is (1.421(5), 2.617), and thus the interval for the odds ratio itself is (4.14, 13.69).

(v) The confidence interval does not contain 1.00, so we may reject the null hypothesis that smoking status and occurrence of lung cancer are unrelated. There is definite evidence of an association. As mentioned above, the odds ratio strongly indicates greater prevalence of cancer among smokers than among non-smokers. It does not appear that sex is an important factor in this.
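The pooled and Mantel-Haenszel calculations can be sketched in code as follows. This is an illustrative check under stated assumptions rather than the original working: each stratum is written as a tuple (a, b, c, d) with a = cases who smoke, b = controls who smoke, c = cases who do not smoke and d = controls who do not smoke, and the per-sex counts (58, 6, 245, 271 for males; 31, 7, 149, 163 for females) are assumed values chosen to agree with the pooled table. The function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio ad/(bc) for one 2x2 table laid out as (a, b, c, d)."""
    return (a * d) / (b * c)


def mantel_haenszel(strata):
    """Mantel-Haenszel odds ratio: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Assumed stratum counts: (smoker cases, smoker controls,
#                          non-smoker cases, non-smoker controls).
males = (58, 6, 245, 271)
females = (31, 7, 149, 163)
pooled = tuple(m + f for m, f in zip(males, females))  # (89, 13, 394, 434)

or_pooled = odds_ratio(*pooled)
or_mh = mantel_haenszel([males, females])

# Standard error of the log odds ratio, taken from the pooled table
# as in the solution above.
se_log_or = math.sqrt(sum(1 / count for count in pooled))
log_lo = math.log(or_mh) - 1.96 * se_log_or
log_hi = math.log(or_mh) + 1.96 * se_log_or
ci = (math.exp(log_lo), math.exp(log_hi))

print(f"pooled OR = {or_pooled:.2f}, Mantel-Haenszel OR = {or_mh:.2f}")
print(f"SE(log OR) = {se_log_or:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Using the pooled standard error with the stratified point estimate follows the approach in the solution; dedicated software would normally use a variance estimate designed for the Mantel-Haenszel estimator.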

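Graduate Diploma Module 5, Specimen Paper B. Question 7

Part (i)

(a) $$\mathrm{Cov}(\hat{R}, \bar{x}) = E(\hat{R}\bar{x}) - E(\hat{R})E(\bar{x}) = E(\bar{y}) - E(\hat{R})E(\bar{x}) = \bar{Y} - \bar{X}E(\hat{R}).$$

This gives

$$E(\hat{R}) = \frac{\bar{Y}}{\bar{X}} - \frac{\mathrm{Cov}(\hat{R}, \bar{x})}{\bar{X}}, \quad \text{i.e.} \quad E(\hat{R}) - R = -\frac{\mathrm{Cov}(\hat{R}, \bar{x})}{\bar{X}}.$$

(b) $f = n/N$ and $\hat{R} = \bar{y}/\bar{x}$, so that $\hat{R}\bar{x} = \bar{y}$ or $\bar{y} - \hat{R}\bar{x} = 0$.

Hence the estimator of $\mathrm{Var}(\hat{R})$ given in the question is

$$\frac{1-f}{n\bar{x}^2} \cdot \frac{\sum\{(y_i - \bar{y}) - \hat{R}(x_i - \bar{x})\}^2}{n-1}$$

$$= \frac{1-f}{n\bar{x}^2} \cdot \frac{\sum\{(y_i - \bar{y})^2 - 2\hat{R}(y_i - \bar{y})(x_i - \bar{x}) + \hat{R}^2(x_i - \bar{x})^2\}}{n-1}$$

$$= \frac{1-f}{n\bar{x}^2}\left(s_Y^2 - 2\hat{R}\hat{\rho}\,s_X s_Y + \hat{R}^2 s_X^2\right)$$

in which $s_Y^2$, $s_X^2$ are the estimated variances of Y and X, and $\hat{\rho}$ is the estimated correlation coefficient for X and Y.

Part (ii)

The ratio method works well when Y is proportional to X, with the relation passing through the origin. It will not be better than a simple random sample when $\rho$ is less than 0 or when the relation does not pass through the origin (in which case a regression estimator is required instead).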

Part (iii)

(a) The sugar content of an individual fruit should be roughly proportional to its weight, in fruit from the same source and batch.

(b) Since we are not told N, the total number of oranges, a ratio estimator is used rather than regression. Counting the whole batch would take a very long time for what might be a very small improvement in precision.

$\sum x = 1975$, $\sum y = 110.9$. $X_T = 820$.

$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum y}{\sum x} = 0.05615. \qquad \hat{Y}_T = \hat{R}X_T = 46.045 \text{ (kg)}.$$

We have $\mathrm{Var}(\hat{Y}_T) = X_T^2\,\mathrm{Var}(\hat{R})$.

Also, on neglecting f which will be very small (as n is only 10), we have that the value of the estimator of $\mathrm{Var}(\hat{R})$ is

$$\frac{1}{10 \times 9\,\bar{x}^2}\sum\left(y_i^2 - 2\hat{R}x_i y_i + \hat{R}^2 x_i^2\right) = \frac{1268.69 - (2 \times 0.05615 \times 22194.8) + (0.05615^2 \times 392389)}{90 \times 197.5^2}$$

$$= \frac{1268.69 - 2492.476 + 1237.133}{90 \times 197.5^2} = \frac{13.34687}{90 \times 197.5^2}$$

which on multiplying by $(820)^2$ gives that the value of the estimator of $\mathrm{Var}(\hat{Y}_T)$ is 25.564, i.e. the standard error is 5.056.

(c) The half-width of the interval, $ts/\sqrt{n}$, is to be less than […]. Thus $s/\sqrt{n}$ < […] and 5 oranges will achieve this approximately.

Graduate Diploma Module 5, Specimen Paper B. Question 8

(i) As is clear from the data, the six strata split into three with fairly low density of caribou and three with much higher density. There are also some variations in the values of s between the six strata. Stratified sampling ensures that all these six strata will be represented adequately, and that an estimate of the total number of animals will have a smaller standard deviation than for simple random sampling.

(ii) The estimated total is $\hat{Y}_{st} = N\bar{y}_{st} = \sum_{i=1}^{L} N_i \bar{y}_i$ (where there are L strata), so

$$\hat{Y}_{st} = (400 \times 24.1) + (40 \times 25.6) + (100 \times 267.6) + (40 \times 179.0) + (70 \times 293.7) + (120 \times 33.2) = 69127.$$

The estimated variance of $\hat{Y}_{st}$ is given by

$$\sum_{i=1}^{L} \frac{N_i(N_i - n_i)\,s_i^2}{n_i} = \frac{400\,(400 - 98)\,74.7^2}{98} + \ldots = 84368.3,$$

so the estimated standard error is 97.9.

(iii) $N_i$ is the true number in stratum i. $S_i$ is the true standard deviation in stratum i. $w_i$ is $n_i/n$, the proportion of the whole sample that comes from stratum i. V is the value specified for $\mathrm{Var}(\hat{Y}_{st})$.

(iv) Optimal allocation minimises the variance $\mathrm{Var}(\hat{Y}_{st})$ (equivalently, $\mathrm{Var}(\bar{y}_{st})$) for fixed total sample size n. As well as allocating more sampling to strata with larger population sizes, it allocates more to those with larger standard deviations, so the precision is comparable with those having lower variability. In the present survey, there are wide variations among the stratum sizes and standard deviations; proportional allocation with the same sample size as optimal allocation is likely to lead to a considerably larger value of $\mathrm{Var}(\hat{Y}_{st})$.

(v) We use the formula quoted in part (iii) of the question, taking the estimates $s_i$ from the preliminary aerial survey as though they were the true values $S_i$.

Optimal allocation with constant cost of sampling any unit has $w_i$ (= $n_i/n$) given by

$$w_i = \frac{N_i S_i}{\sum_{i=1}^{L} N_i S_i}.$$

We have $\sum N_i S_i = 133903$, using the preliminary survey values.

Further, we see that

$$\frac{N_i^2 S_i^2}{w_i} = N_i S_i \sum_{i=1}^{L} N_i S_i, \quad \text{so that} \quad \sum_{i=1}^{L} \frac{N_i^2 S_i^2}{w_i} = \left(\sum_{i=1}^{L} N_i S_i\right)^2.$$

Also we have $\sum N_i S_i^2 = 47882186$ (this appears in the denominator of the formula).

Finally, we need V. The criterion of d = 8000 with (one-sided) tail probability 0.025 gives $V = (8000/1.96)^2$.

$$n = \frac{(133903)^2}{\left(\dfrac{8000}{1.96}\right)^2 + 47882186} = 277.804.$$

So we take n = 278.

The allocation in each stratum is then given by

$$n_i = 278\,w_i = 278\,\frac{N_i S_i}{\sum N_i S_i},$$

which gives $n_1 = 62.03$, $n_2 = 5.29$, $n_3 = 122.39$, $n_4 = 12.54$, $n_5 = 51.08$, $n_6 = 24.66$.

However, the total size of stratum 3, $N_3$, is only 100; so we must take $n_3 = 100$. The remaining 178 are then allocated in the same ratios as before, by multiplying each by 178/155.6. This gives $n_1 = 70.96$, $n_2 = 6.05$, $n_4 = 14.34$, $n_5 = 58.43$, $n_6 = 28.21$.

Finally, $n_1 = 71$, $n_2 = 6$, $n_3 = 100$, $n_4 = 14$, $n_5 = 58$, $n_6 = 28$.

With this allocation, the estimated variance of $\hat{Y}_{st}$ is given by

$$\sum_{i=1}^{L} \frac{N_i(N_i - n_i)\,s_i^2}{n_i} = \frac{400\,(400 - 71)\,74.7^2}{71} + \ldots = 18610118.04$$

(note there is a ZERO contribution to the sum from stratum 3, where we have a 100% sample), so the estimated standard error is 4313.9.
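The sample-size formula and the re-allocation step after capping stratum 3 can be sketched as below. This is an illustrative check only: the totals 133903 and 47882186, the stratum 3 size of 100 and the initial allocations are taken as assumed inputs, and the variable names are not from the original solution.

```python
import math

# Assumed inputs from the preliminary survey.
sum_NS = 133903.0          # sum over strata of N_i * S_i
sum_NS2 = 47882186.0       # sum over strata of N_i * S_i**2
V = (8000 / 1.96) ** 2     # target variance: d = 8000, 2.5% in one tail

# Total sample size under optimal (Neyman) allocation.
n_exact = sum_NS ** 2 / (V + sum_NS2)
n_total = math.ceil(n_exact)

# Initial allocation n_i = n * N_i * S_i / sum(N_i * S_i), as quoted above.
initial = {1: 62.03, 2: 5.29, 3: 122.39, 4: 12.54, 5: 51.08, 6: 24.66}

# Stratum 3 contains only 100 units, so it is sampled completely and the
# rest of the sample is shared among the other strata in the same ratios.
N3 = 100
others = {stratum: size for stratum, size in initial.items() if stratum != 3}
scale = (n_total - N3) / sum(others.values())
final = {stratum: round(size * scale) for stratum, size in others.items()}
final[3] = N3

print(f"n = {n_exact:.3f}, rounded up to {n_total}")
print("final allocation:", dict(sorted(final.items())))
```

Rounding each stratum separately gives a total of 277 rather than 278, which matches the final allocation quoted above.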