DESIGN-BASED RANDOM PERMUTATION MODELS WITH AUXILIARY INFORMATION. Wenjun Li. Division of Preventative and Behavioral Medicine

Size: px
Start display at page:

Download "DESIGN-BASED RANDOM PERMUTATION MODELS WITH AUXILIARY INFORMATION. Wenjun Li. Division of Preventative and Behavioral Medicine"

Transcription

1 DESG-BASED RADOM PERMUTATO MODELS WTH AUXLARY FORMATO Wenjun Li Division of Preventative and Behavioral Medicine Universit of Massachusetts Medical School Worcester MA 0655 Edward J. Stanek Department of Public Health Universit of Massachusetts Amherst MA 0003 Corresponding address: Wenjun Li Ph.D. Biostatistics Research Center Division of Preventive and Behavioral Medicine Universit of Massachusetts Medical School Shaw Building S Lake Avenue orth Worcester MA 0655 The content of this article is a part of the first author s dissertation conducted at the Department of Biostatistics and Epidemiolog Universit of Massachusetts Amherst Massachusetts. This research was partiall supported b a H grant (H-PHS-R0-HD36848). The authors wish to thank Drs. Julio Singer John Buonaccorsi and Carol Bigelow for their constructive comments. Corresponding author address: Wenjun.Li@UMASSMED.EDU LiW_05_m_v5.doc 9/9/005 i

2 ABSTRACT We develop design-based estimators of the population mean using auiliar information under simple random without replacement sampling b etending the random permutation model of Stanek Singer and Lencina (004) and Stanek and Singer (004). A ke step in the development is representing the population mean defined as a non-stochastic average of unit values as the sum of random variables constructed from a random permutation of the population. Using methods similar to those in model-based approaches the random variable representing the sum of non-sampled units is predicted to form a design-based estimator of the population mean. The random permutation model is etended b the joint permutation of the response and auiliar variables with known auiliar information incorporated through centering the auiliar variables on their respective means. The estimators are required to be linear functions of the sample unbiased and have minimum mean squared error. The estimators are identical to model-assisted and calibration estimators. However the results provide the building blocks for etending the design-based random permutation model theor to include covariates in more complicated sample designs and bridging the theoretical results to practical applications. The concordance of results with other less sstematic approaches is an added appeal of the methodolog. Ke words: auiliar variable design-based inference prediction finite sampling random permutation model simultaneous permutation Running title: Random Permutation models with auiliar information LiW_05_m_v5.doc 9/9/005 ii

3 . TRODUCTO Estimators can be made more precise b accounting for auiliar information such as gender age income and chronic disease-bearing histor that are partiall or completel known in a population. Methods of improving estimation with auiliar information have been discussed in a model-based approach (Bolfarine and Zacks 99; Valliant et al. 000) and using modelassisted or calibration methods (Cassel et al. 977; Cochran 977; Särndal and Wright 984; Deville and Särndal 99; Särndal et al. 99). n a model-based approach the result is the best linear unbiased predictor (BLUP) (Ghosh and Rao 994; Rao 997). The model-assisted approach combines ideas from model-based and design-based approaches resulting in generalized regression (GREG) estimators. The calibration approach produces a weighed estimator with weights that minimize the distance to benchmark weights while calibrated to known population quantities on some set of auiliar variables. The actual sample design plas no role in inference based on the model-based approach. Additional superpopulation model assumptions beond the sample design are required for development of GREG estimators. The calibration approach lacks an integrated theoretical framework. All of these approaches require additional assumptions or lack an integrated theor that builds simpl on the sample design. We develop a design-based estimator of the population mean that accounts for auiliar information. The development etends use of the random permutation model for simple random sampling (SRS) and two stage cluster sampling (Stanek Singer and Lencina (004) Stanek and Singer (004)) to account for auiliar variables. The method avoids the limitations of the superpopulation model approach but takes advantage of some of the optimization tools used in developing predictors with superpopulation model assumptions. Beginning with a vector of responses for each subject in a finite population a SRS of subjects is represented b a random permutation probabilit model that permutes the subject s response vectors. Auiliar LiW_05_m_v5.doc 9/9/005

4 information is incorporated through a simple linear transformation. The results provide a direct design-based method to account for auiliar information when estimating the population mean. This paper is organized as follows. We first present definitions and notation and introduce the random permutation model. We net present a simultaneous random permutation model in a simple setting with a response and one auiliar variable and use it to derive the best linear unbiased estimator (BLUE) of the population mean. Subsequentl we etend the results to scenarios with multiple auiliar variables. We conclude b illustrating the method with an eample and include results from a small scale simulation evaluating empirical estimators.. DEFTOS AD OTATO Let the population consist of subjects labeled s = and assume that the labels are non-informative serving onl to identif the subjects. Associated with subject s is a non-stochastic potentiall observable vector ( s s ) where s denotes the response of ( ks ) interest and = ( ) mean b the vector is a p vector of auiliar variables. We represent the population s ( µ µ ) where µ is associated with the response and µ = (( µ )) is a k p vector associated with the eplanator variables. The population covariance structure is defined b the ( p+ ) ( p+ ) matri σ σ Σ where Σ = σ Σ ( ) p ( ) k k σ = σ σ σ and Σ = ( σ ) is a p p matri. ndividual terms are defined in the usual manner such that µ = s µ k s= s= = σ = s µ ks ( ) s= σ = µ k ( ) ks and σ ( )( ) k = k s µ ks µ k s=. s= LiW_05_m_v5.doc 9/9/005

5 3. THE RADOM PERMUTATO MODEL We define a random permutation model as the set of all possible equall likel permutations of subjects in the population. We represent a random permutation b the sequence of random variables Y Y where the subscript refers to the position occupied in the random permutation. Following Stanek Singer and Lencina (004) we eplicitl define these random variables in terms of a set of indicator random variables U i = that is have a value of one if subject s is in position i in a permutation and zero otherwise. Using this notation the response for the subject in position i in a permutation is represented b the random variable Y i = Uiss ; using a vector notation Y i i s= = U where U = ( ) U U U i i i i ( s ) represents a vector of random variables that selects a subject in position i and = ( ) represents the population response vector. Defining ( ) U = U U U the random variables representing a permutation of response are given b Y = U. Similar vectors can be defined to represent a permutation of auiliar variables Xk = U k k =... p. This notation simplifies representation of the problem and development of the epected value and of the covariance structure of the vectors. We use properties of the indicator random variables to evaluate the epected value and covariance structure of the random variables. Taking epectation over all possible permutations ( ) EU is = for all i... ; s.... As a result = = EY ( ) E( ) = U = = µ for all i i EUU ( ) i =... where is an vector. n a similar manner is i s = ( ) when i i and s s EU ( is ) = when i= i and s s = and ( ) EUU = otherwise. Using is i s 0 LiW_05_m_v5.doc 9/9/005 3

6 these epressions we can show that ( ) = σ when i i. ( Y Y ) cov i i var Yi = σ for all i =... and that We combine these results to epress the epected value and the covariance structure of the random permutation using vector notation. As a result ( ) ( ) E Y = µ ; and Y = σ P var where P = b J is an identit matri and J =. Similar epressions ab a a cov YX k P k = σ appl to the auiliar variables. n addition ( ) and for k k ( ) = σ cov X X P k k k k. 4. SMULTAEOUS RADOM PERMUTATO MODEL We net consider the random variables representing the random permutation of response and auiliar variables simultaneousl. For simplicit we first assume that there is a single auiliar variable (i.e. p = ) corresponding to the subject s age (which we represent b s ). Age is assumed to be known for all subjects in the population. The simultaneous random permutation model is defined b concatenating the permutation vectors for response and the auiliar variables. The resulting model is an eample of a seemingl unrelated regression model (Zellner 963) with both regressions corresponding to simple mean models such that Y 0 µ = + E X 0 µ Z= Gµ + E () ( ) where Z= Y X = G = ( ) µ µ µ and where denotes the Kronecker product (Grabill 976). n this model E ( Z) = Gµ and var ( ) = Z Σ P. The LiW_05_m_v5.doc 9/9/005 4

7 mean age µ is known since age is assumed to be known for all subjects. This implies that the random variables arising from permuting the auiliar variable are constrained b X i i = µ =. We eliminate µ from model () b subtracting 0 µ from both sides (which is equivalent to multipling each term in the model b 0 R = ). Representing the 0 P elements of X b X = X µ the transformed model is given b i i Y = µ + E X 0 () where Z = RZ and G = RG. Since Z = G µ + E P is idempotent ( ) = E µ Z G cov( ) Model () is equivalent to model () with the auiliar variate centered at zero. Z = Σ P. 5. SAMPLG AD PARTTOG THE RADOM VARABLES We represent random variables for a SRS of size n b the first n elements of the random permutation vectors. The sampled ( Z ) and remaining ( ) ( ) Z portions of Z can be n 0 n ( n) obtained b pre-multiplication b Z K = i.e. KZ = ( ( n) n n). Since K is 0 Z non-stochastic it follows that ( Z) ( n 0 n) E ( Z ) = ( µ n 0 n ) E = µ and Z V V cov = where Z V V V = Σ P V = Σ P n ( n) V = V = Σ J n ( n) where ( ) = n n n n reflects SRS sampling can be represented as J. Consequentl the partitioned model that LiW_05_m_v5.doc 9/9/005 5

8 Z G = µ + E (3) Z G where ( ) G = 0 and G = ( 0 ). n n n n 6. PARAMETER OF TEREST We assume that the parameter of interest is the population mean of the response variate which is defined as a linear function of the permuted random variables namel i µ n = c Y i i + c = i = n+ Y where c = or equivalentl as = CZ = C Z + C Z (4) µ C ( 0 ) = ( ) where = c C c 0 and C = c( 0 ). After sampling onl n n n n C Z will be unknown; thus estimating µ is equivalent to finding a predictor of C Z = c Y. i = n + i We develop the best linear unbiased predictor (BLUP) of C Z and refer to the estimator of µ as the BLUE of µ. 7. BLUE OF A LEAR FUCTO We require the predictor of C Z to be a linear function of the sample wz n n = wy i i + wx i i to be unbiased i.e. ( E wz ) ( ) = E C Z and have minimum i= i= mean squared error (MSE). The estimator of µ is given b n n n ( C w ) Z i= i ( i= i i i= i i ) P= + = c Y + w Y + w X. (5) The unbiased constraint implies that wg C G = 0. The variance of P is ( ) var P = wvw wv C + C V C. With such a setup the prediction theorem of Roall LiW_05_m_v5.doc 9/9/005 6

9 (976) can be applied. The minimum variance unbiased estimator is obtained b minimizing the function ( ) { } Φ w = wv w w V C + wg C G (6) λ where λ is a Lagrangian multiplier resulting in After simplification (7) reduces to where f and ( ) ( ) ˆ w = V V + G G V G G G V V C. (7) f wˆ = n (8) n β = n β = σ σ. Consequentl the estimator and its variance are n ˆ β P = Y + Y X ( µ ) i i= i= n+ f β = fy + f Y X f ( ) ( µ ) ( X ) = Y β µ (9) ( ˆ σ var P) = ( ρ )( f ) (0) n where ρ = σ σσ is the correlation coefficient of Y on X and Y and X are the sample mean of the response variable and the auiliar variable respectivel. This epression is similar to the epressions commonl seen in linear regression estimators but includes a finite population correction factor. The epressions in equation (9) have interesting interpretations. The first epression is divided into the sum of random variables in the sample and a predictor of the random variables in the remainder. Since the weighted total of the sample and remainder random variables equals LiW_05_m_v5.doc 9/9/005 7

10 the population mean it is clear that for a realized sample we simpl substitute the values observed for the sample random variables in the estimator. Rather than predicting the value of each remaining random variable b the sample mean Y we include an adjustment that accounts for the auiliar variable. The adjustment shrinks the observed difference between the auiliar sample mean and the population auiliar mean. For eample if the value observed for the auiliar sample mean X eceeds the population mean µ and the response and auiliar random variables are positivel correlated we ma anticipate that the sample mean Y will also eceed the population mean µ. The predictor of the remaining random variables compensates for the over estimation of µ b the sample random variables. The correction is proportional to the auiliar difference X µ and weighted b the regression coefficient β inflated b one over the finite population correction factor f. The second epression in equation (9) is ver similar to the BLUP of a realized cluster mean based on a mied model fit to a two stage cluster sample. The similarit provides an intuition connecting BLUP in the two stage sampling contet to adjustments for auiliar variables in SRS. n the contet of two stage cluster sampling with equal size clusters the BLUP of a realized cluster mean under a two stage random permutation model is given b ˆ T = fy + f Y Y Y i i i f i ( β ) ( ) where f = m/ M is the common sampling fraction of units in clusters Y is the sample mean of i the realized cluster Y is the average of these means over sampled clusters and β = mσ ( f ) σ e + ( f ) σ e is a function of the variance between clusters σ and the average variance within clusters σ e (see Table Stanek and Singer (004 p5)). The predictor is LiW_05_m_v5.doc 9/9/005 8

11 the sum of two terms. The first term substitutes the values observed for the sample units for the realized cluster in the predictor. The second term predicts the contribution of the remaining units in the realized cluster b adjusting the realized cluster sample mean. The adjustment is proportional to the difference between the realized cluster sample mean and the overall mean. The similarit of the BLUP of a realized cluster mean in two stage cluster sampling to the estimator of the population mean with auiliar information in SRS is self-evident. The difference between the sample mean for the auiliar variable and the population mean in SRS is replaced b the difference between the realized cluster sample mean and the overall mean in two-stage cluster sampling. The regression coefficient relating the response to the auiliar variable in SRS is replaced b a coefficient that is a function of the within and between cluster variance components. 8. EXTESO TO CASES WTH MULTPLE COVARATES Etension of the above results to scenarios with multiple auiliar variables is straightforward. Suppose p auiliar variables are available for use in estimation. The matri of random variables representing a joint permutation of a response variable and the p covariates is given b U( p ) = ( Y X X p). A simultaneous random permutation model can be defined similar to () with E ( ) = ( p ) and = ( p+ ) Z = vec Y X X G. t follows that and that cov Z = Σ P. When auiliar means µ k k = p are known a Z Gµ ( ) 0 similar centering transformation can be applied to Z such that Z = Z. A 0 p P reparameterized model incorporating known auiliar information has a form identical to Model ( ) () but where G = 0. With these differences a linear unbiased minimum variance p LiW_05_m_v5.doc 9/9/005 9

12 estimator can be derived following the same steps as in Section 7 (Appendi ). The unique solution for the vector of coefficients w is f wˆ = n n β β Σ σ k = p. Accordingl the estimator and its where = X X = ( β β βp) variance are ˆ β P = fy + f Y f ( ) ( X µ ) ( ) = Y β X µ () and ( ˆ σ var P) = ( ρ X )( f ) () n where Y is the sample mean of the response variable and X is the vector of sample means of the p auiliar variable and ρ = σ Σ σ σ is the squared multiple correlation coefficient X X X X of Y on X coefficient. otice that the onl difference between () and (0) is the multiple correlation ρ X instead of ρ. This epression is similar to the epressions commonl seen in multiple linear regression models (Grabill 976) but includes a finite population correction factor. Results () and () are also the same as difference estimators with optimal coefficients (Montanari 987) and the GREG estimator (Särndal et al. 99). 9. EMPRCAL ESTMATES The estimators in equation (9) and () are functions of the regression coefficients which in turn are functions of variance components in the population. n practice although Σ X ma be known σ X will need to be estimated. Särndal et al. (99 p9) recommends LiW_05_m_v5.doc 9/9/005 0

13 estimating both terms. When p= the estimate of β is given b b ˆ ˆ = σ / σ where ˆ = n n ( Y Y )( X X ) and ˆ ( i σ n i n = σ = i i = Xi X) uses the known population variance of X to estimate β b b ˆ = σ / σ.. An alternative estimator We conducted a small scale simulation stud to compare different empirical estimators in the contet of a simple problem where there is interest in estimating the smoking rate µ = π in a population of size based on a SRS with both smoking status and gender recorded on sample subjects. We assume that the response variable is an indicator of smoking status ( = s if subject s is a smoker and zero otherwise) and the auiliar variable is an indicator of male gender ( = if subject s is male and zero otherwise). We also assume that the proportion of s males in the population µ = π is known. B including the gender status of the sample subjects theoreticall we can reduce the variance of the estimate of smoking prevalence in the population over the simple sample proportion with the percent reduction given b 00( ρ ). n practice since the regression parameter is not known some of the reduction in the MSE is lost due to uncertaint in the regression coefficient. The simulation stud generated a series of hpothetical populations of sizes and 600; each with an auiliar variable representing 50% men ( π = 0.5 ). n all populations the overall prevalence of smokers is assumed to be 40% ( π = 0.4 ). The prevalence of smoking in men ranged from 40% to 68% translating into a relative risk of smoking for men ranging from to We evaluated the estimators b comparing the average MSE over 5000 independent simple random samples. To compare the MSE of the estimators we epressed the MSE of estimators that account for gender as a percentage of the MSE of the estimator corresponding to the simple sample smoking prevalence. Three estimators are compared differing b whether b or b β is used for the regression coefficient. LiW_05_m_v5.doc 9/9/005

14 The simulation results are illustrated in Figures and and summarize the reduction in MSE that ma result b accounting for auiliar information. Figure presents results when =00 and n=5. Results were ver similar to those in Figure with larger and larger size samples. Figure presents results when =50 and n=5. n both figures notice that a reduction in the MSE will almost alwas occur when the regression coefficient is known after accounting for auiliar information. n the simulation the MSE of the estimator based on b was slightl lower but basicall equivalent to the estimator based on b in all settings. Using the empirical estimators a reduction in the MSE will not occur unless there is a sufficientl large difference (measured here b the relative risk of smoking for males) in response b the auiliar variable. From Figure a reduction in the MSE will occur when RR>.75 while from Figure a reduction will occur when RR> EXAMPLE As a simple eample suppose there is interest in estimating the smoking rate µ = π in a population of size based on a SRS with both smoking status and gender recorded on sample subjects. We assume that the proportion of males in the population µ = π is known and represent the sample estimate of the proportion smoking as Y = ˆ π the proportion male in the sample as X = ˆ π and the proportion of male smokers in the sample as ˆ π. With this ˆ ˆ ˆ σ π π notation ( ) ˆ ˆ ˆ σ = π π σ = π( π) and = ( ) ˆ σ = ˆ π ˆ ππˆ. Using these estimators we estimate β b ˆ π b = ˆ π ˆ ππˆ ( ˆ π ) or LiW_05_m_v5.doc 9/9/005

15 b ˆ π = π ˆ ππˆ ( π ) and estimate ρ b ˆ ρ = ˆ π ˆ ππˆ ( ) ˆ ( ˆ ) π π π π. Substituting these epressions into the estimator in (9) using b results in { ( ) ( π ) } Pˆ ˆ ˆ ˆ = n n b π + π π ˆ π ˆ ˆ ππ = ˆ π π π π( π) ( ˆ ) ˆ σ with variance estimated b var ˆ ( Pˆ ) ( ˆ ρ )( f ) =. n As a particular application suppose that = 00 and n = 0 so that f = 0.0 and assume that half the population is male i.e. π = 0.5. Among the sample suppose further that thirt percent smoke ( ˆ π = 0.3 ) sit percent are male ( ˆ π = 0.6 ) and that twent-five percent of the sample subjects are male smokers ( ˆ π = 0.5). Using this information b ( ) ( ) = = ( 0.3) ( ) ( ) ˆ ρ = = and the proportion of smokers in the population is estimated b P ˆ = { ( ) ( ) ( ) } = + 00 = + 00 = 0.7 { 6 4 8( 0.) } { 6.} Thus rather than using the sample estimate of the smoking prevalence (i.e. ˆ π = 0.3 ) the estimate of smoking prevalence accounting for gender is given b. ˆ 0.7 P =. To form this estimate the predicted number of smokers among the non-sampled subjects is reduced from 4 LiW_05_m_v5.doc 9/9/005 3

16 to. to account for the larger percentage of male subjects in the sample (relative to the percentage of males in the population). The same estimate can be obtained b directl appling the regression results ˆ = ˆ π ( ˆ π π ) or ( ) P b P ˆ = = 0.7 but this epression does not reveal the essential prediction of non-sampled random variables underling the method development. Using b to estimate β P ˆ = 0.7 a result that corresponds to the simple weighted average prevalence of smoking using population gender weights. These estimators are identical to post-stratified estimators of prevalence rate as discussed in calibration (Deville and Särndal 99) and model-assisted approaches (Särndal et al. 99). The estimated standard error is given b ( ˆ ) var ˆ P = The standard error is ˆ ρ = 0.95 of the estimated standard error based solel on the sample (given b ( ˆ π ) = ( f ) var ˆ ( ˆ ) ˆ π π n ).. DSCUSSO The surve sampling literature has struggled to reconcile design-based and model-based theories of estimation/prediction. Model based methods recentl popularized b Valiant Dorfman and Roall (000) have a theoretical structure based on the prediction-based methods developed b Roall (973976). This structure is important since it allows methods to be etended relativel easil to different applications with increasing compleit. The limitation of the theor is that it does not account for the sample design. A similar theor has not previousl emerged using design-based methods. nstead a miture of approaches such as GREG or calibration approaches (Särndal et al. 99) have been developed. These approaches combine model-based and design-based ideas or begin with ad-hoc functional forms of estimators and optimize them in special settings. These approaches have been successful in addressing man practical problems in a design-based LiW_05_m_v5.doc 9/9/005 4

17 framework. However the have not led to an approach that provides the consistent conceptual and theoretical base or that can be readil etended to applications with increasing compleit. We have illustrated how the design-based random permutation model theor can be etended to include auiliar variables in a straight forward manner. These results etend the scope of the random permutation model theor to a broader class of problems. Previous developments of the theor have identified subtleties in interpreting random effects in simple random sampling (Stanek Singer and Lencina (004)) and developed predictors of realized random effects in balanced two stage sampling problems with response error (Stanek and Singer 004). Current research is etending these results to clustered population settings where clusters are of different size and there is unequal probabilit sampling and to settings where there is missing data. n each case the same basic approach is considered with estimators (or predictors) developed based on a clear optimization theor. The present results illustrate how the theor can be used to account for covariates. The fact that the results coincide with those developed b GREG or calibration approaches strengthens the appeal of the random permutation model approach as a design-based competitor to the model-based superpopulation theor. Still much more work is needed to etend the methods. Etensions are being developed to more comple settings including two stage designs with cluster and unit covariates longitudinal studies and settings where randomization of units to treatments. We consider the basic results developed here to provide a foundation for additional work in these directions. Reference Bolfarine H. and S. Zacks (99). Prediction Theor for Finite Populations. ew York Springer- Verlag. Brewer K. R. W. (995). Combining design-based and model-based inference (Chapter 30). Business Surve Methods. B. B. Co DA; Chinnappa B; Christianson A; Colledge MJ; Kott PS. ew York John Wile & Sons nc.: Brewer K. R. W. (999). "Cosmetic calibration with unequal probabilit sampling." Surve Methodolog 5: 05-. Brewer K. R. W. (999). "Design-based or prediction-based inference? Stratified random vs stratified balanced sampling." nternational Statistical Review 67(): LiW_05_m_v5.doc 9/9/005 5

18 Brewer K. R. W. M. Hanif et al. (988). "How earl Can Model-based Prediction and Designbased Estimation Be Reconciled?" Journal of the American Statistical Association 83: 8-3. Cassel C. M. C.-E. Särndal et al. (977). Foundations of nference in Surve Sampling. ew York Y Wile. Cassel C. M. C. E. Särndal et al. (976). "Some Results on Generalized Difference Estimation and Generalized Regression Estimation for Finite Populations." Biometrika 63: Cochran W. G. (977). Sampling Techniques. ew York John Wile and Sons. Deville J. C. and C. E. Särndal (99). "Calibration Estimators in Surve Sampling." Journal of the American Statistical Association 87: Ghosh M. and J.. K. Rao (994). "Small area estimation: an appraisal." Statistical Science 9(): Grabill F. A. (976). Theor and application of the linear model. Belmont CA Wadsworth Publishing Compan nc. Horvitz D. G. and D. J. Thompson (95). "A generalization of sampling without replacement from a finite universe." Journal of the American Statistical Association 47(60): Montanari G. E. (987). "Post-sampling Efficient QR-prediction in Large-sample Surves." nternational Statistical Review 55: 9-0. Rao C. R. (967). Least square theor using an estimated dispersion matri and its applications to measurement of signals. Proceedings of teh 5th Berkele Smposium on Mathematical Statistics and Probabilit Berkele CA. Rao J.. K. (997). "Developments in sample surve theor: an appraisal." Canadian Journal of Statistics 5(): -. Rao J.. K. (999). "Some Current Trends in Sample Surve Theor and Methods." Sankha: The ndian Journal of Statistics: Series B 6(): -57. Roall R.M. (973). The prediction approach to finite population sampling theor: application to the hospital discharge surve. Rockville Md. DHEW o. (HSM) 73-39:-3. Roall R. M. (976). "The linear least-squares prediction approach to two-stage sampling." Journal of the American Statistical Association 7: Särndal C. E. B. Swensson et al. (99). Model assisted surve sampling. ew York Springer- Verlag. Särndal C. E. and R. L. Wright (984). "Cosmetic Form of Estimators in Surve Sampling." Scandinavian Journal of Statistics : Stanek E. J. J. d. M. Singer et al. (004). "A unified approach to estimation and prediction under simple random sampling." Journal of Statistical Planning and nference : Stanek E. J. and J. M. Singer (004). "Predicting Random Effects from Finite population clustered samples with response error." Journal of the American Statistical Association 99(468): Valliant R. A. H. Dorfman et al. (000). Finite Population Sampling and nference a Prediction Approach. ew York John Wile & Sons. Wu C. B. and R. R. Sitter (00). "A model-calibration approach to using complete auiliar information from surve data." Journal of the American Statistical Association 96(453): Zellner A. (963). "Estimators of Seemingl Unrelated Regression Equations: Some Eact Finite Sample Results." Journal of the American Statistical Association 58: LiW_05_m_v5.doc 9/9/005 6

19 Figure. Percent Change in MSE of Smoking Prevalence Estimates Accounting for Gender b Relative Risk of Smoking (=00 n=5) 0 Relative Risk of Smoking for Males % Change in MSE relative to Sample Prevalence Estimate all Var Comp.: b Estimate XY Cov onl:b Assume all Var Known:Beta LiW_05_m_v5.doc 9/9/005 7

20 Figure. Percent Change in MSE of Smoking Prevalence Estimates Accounting for Gender b Relative Risk of Smoking (=50 n=5) Relative Risk of Smoking for Males % Change in MSE relative to Sample Prevalence Estimate all Var Comp.: b Estimate XY Cov onl:b Assume all Var Known:Beta LiW_05_m_v5.doc 9/9/005 8

21 Appendi Proof of result () and () When there are p auiliar variables a simultaneous random permutation model can be defined similar to () with Z = vec ( Y X Xp ) and = ( p+ ) G. When µ X is known let X = P X where X = ( X X Xp ). The transformation can be represented as Z = E = 0 RZ where R = and = 0 p P ( p+) ( ) G 0 and RG G 0 where = ( p) RE. The reparameterized model that incorporates the constraints thus has a form identical to Model (). To represent the SRS sampling let ( ( )) ( ( n) n ) portioned simultaneous permutation model takes the same form of (3) p+ n 0 n n K = and thus the p+ 0 n Z G µ = + E Z G where ( ) G = 0 n np and ( ( ) ). G = 0 n n p A linear function of the random variables can be defined similar to (4) with C= ( c 0 p ) ( ) ( C = c 0 np and C = c 0 ( n) p ). The estimator of C Z can then be defined as a linear function of the sample ( p) is a ( p ) w w w wz where = ( ) w = w w w + n vector of coefficients. With these a linear unbiased minimum variance estimation can be derived b minimizing the function ( ) { } Φ w = wv w w V C + wg C G. λ LiW_05_m_v5.doc 9/9/005 9

22 Specificall differentiating the above function with respect to w and λ and setting the derivatives to zeros results in the following estimating equations V G wˆ V C = ˆ λ G 0 G C where ( G = ( n 0 np) and G = n 0 ( n) p). The unique solution is ( ) ( ) ˆ w = V V + G G V G G G V V C. (A.) Since V = Σ P n V = Σ J n ( n) Pn n= n and n n n n G V G u Σ u with n P n n n = we have the following useful identities ( ) = ( ) u = ( 0 p ) and V V = p+ J n ( n). Subsequentl (A.) can be simplified as n follows wˆ = J ( ) + ( u Σ u ) ( ) ( n ) n n Σ uu P C n (( ) ) c = u n + uσ u Σ u n. n n p+ n n n n Since Σ ( σ σ XΣX σx ) σ σ X ( ΣX σxσ σ X ) ( σ ) ( σ ) σ σ X = = σx ΣX ΣXσ X σxσxσx ΣX σx σx ( ) ( ) ( u Σ u Σ u σ σxσxσx) ( σ σ XΣXσX) ( σ ) = = = Σ σ σ Σ σ ΣXσX β X X X X X where β= Σ σ X X. Therefore we have nc wˆ = u n + n n n. β LiW_05_m_v5.doc 9/9/005 0

23 Consequentl Pˆ = C + w Z ( ) = cy + c Y β X µ n = fy + f Y f ( ) n ( ) β ( X µ ) ( ) = Y β X µ and var ( Pˆ ) ( ) ( = C + w V C + w f = ( ρx ) n σ ) where ρ = σ Σ σ σ is the multiple correlation coefficient of Y on X. X X X X 0 LiW_05_m_v5.doc 9/9/005

Random permutation models with auxiliary variables. Design-based random permutation models with auxiliary information. Wenjun Li

Random permutation models with auxiliary variables. Design-based random permutation models with auxiliary information. Wenjun Li Running heads: Random permutation models with auxiliar variables Design-based random permutation models with auxiliar information Wenjun Li Division of Preventive and Behavioral Medicine Universit of Massachusetts

More information

Division of Preventative and Behavioral Medicine. University of Massachusetts Medical School, Worcester, MA 01655

Division of Preventative and Behavioral Medicine. University of Massachusetts Medical School, Worcester, MA 01655 USE OF AUXLARY FORMATO A DESG-BASED RADOM PERMUTATO MODEL Wenjun Li Division of Preventative Behavioral Medicine Universit of Massachusetts Medical School, Worcester, MA 0655 Edward J. Stanek Department

More information

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek

Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/21/03) Ed Stanek Comments on Design-Based Prediction Using Auxilliary Information under Random Permutation Models (by Wenjun Li (5/2/03) Ed Stanek Here are comments on the Draft Manuscript. They are all suggestions that

More information

Correlation. Bret Hanlon and Bret Larget. Department of Statistics University of Wisconsin Madison. December 6, Correlation 1 / 25

Correlation. Bret Hanlon and Bret Larget. Department of Statistics University of Wisconsin Madison. December 6, Correlation 1 / 25 Correlation Bret Hanlon and Bret Larget Department of Statistics Universit of Wisconsin Madison December 6, 2 Correlation / 25 The Big Picture We have just completed a length series of lectures on ANOVA

More information

Domain estimation under design-based models

Domain estimation under design-based models Domain estimation under design-based models Viviana B. Lencina Departamento de Investigación, FM Universidad Nacional de Tucumán, Argentina Julio M. Singer and Heleno Bolfarine Departamento de Estatística,

More information

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Copright, 8, RE Kass, EN Brown, and U Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Chapter 6 Random Vectors and Multivariate Distributions 6 Random Vectors In Section?? we etended

More information

Estimators in simple random sampling: Searls approach

Estimators in simple random sampling: Searls approach Songklanakarin J Sci Technol 35 (6 749-760 Nov - Dec 013 http://wwwsjstpsuacth Original Article Estimators in simple random sampling: Searls approach Jirawan Jitthavech* and Vichit Lorchirachoonkul School

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators

Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators. Calibration Estimators Implications of Ignoring the Uncertainty in Control Totals for Generalized Regression Estimators Jill A. Dever, RTI Richard Valliant, JPSM & ISR is a trade name of Research Triangle Institute. www.rti.org

More information

Estimation of Finite Population Variance Under Systematic Sampling Using Auxiliary Information

Estimation of Finite Population Variance Under Systematic Sampling Using Auxiliary Information Statistics Applications {ISS 5-7395 (online)} Volume 6 os 08 (ew Series) pp 365-373 Estimation of Finite Population Variance Under Sstematic Samplin Usin Auiliar Information Uzma Yasmeen Muhammad oor-ul-amin

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Computation of Csiszár s Mutual Information of Order α

Computation of Csiszár s Mutual Information of Order α Computation of Csiszár s Mutual Information of Order Damianos Karakos, Sanjeev Khudanpur and Care E. Priebe Department of Electrical and Computer Engineering and Center for Language and Speech Processing

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson Department of Statistics Stockholm University Introduction

More information

Linear regression Class 25, Jeremy Orloff and Jonathan Bloom

Linear regression Class 25, Jeremy Orloff and Jonathan Bloom 1 Learning Goals Linear regression Class 25, 18.05 Jerem Orloff and Jonathan Bloom 1. Be able to use the method of least squares to fit a line to bivariate data. 2. Be able to give a formula for the total

More information

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total. Abstract

NONLINEAR CALIBRATION. 1 Introduction. 2 Calibrated estimator of total.   Abstract NONLINEAR CALIBRATION 1 Alesandras Pliusas 1 Statistics Lithuania, Institute of Mathematics and Informatics, Lithuania e-mail: Pliusas@tl.mii.lt Abstract The definition of a calibrated estimator of the

More information

CONTINUOUS SPATIAL DATA ANALYSIS

CONTINUOUS SPATIAL DATA ANALYSIS CONTINUOUS SPATIAL DATA ANALSIS 1. Overview of Spatial Stochastic Processes The ke difference between continuous spatial data and point patterns is that there is now assumed to be a meaningful value, s

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson 1 Introduction When planning the sampling strategy (i.e.

More information

6. Vector Random Variables

6. Vector Random Variables 6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following

More information

Statistics in Medicine. Prediction with measurement errors: do we really understand the BLUP?

Statistics in Medicine. Prediction with measurement errors: do we really understand the BLUP? Prediction with measurement errors: do we really understand the BLUP? Journal: Manuscript ID: SIM-0-00 Wiley - Manuscript type: Paper Date Submitted by the Author: 0-Apr-00 Complete List of Authors: Singer,

More information

A GENERAL FAMILY OF DUAL TO RATIO-CUM- PRODUCT ESTIMATOR IN SAMPLE SURVEYS

A GENERAL FAMILY OF DUAL TO RATIO-CUM- PRODUCT ESTIMATOR IN SAMPLE SURVEYS STATISTIS I TASITIO-new series December 0 587 STATISTIS I TASITIO-new series December 0 Vol. o. 3 pp. 587 594 A GEEAL FAMILY OF DUAL TO ATIO-UM- ODUT ESTIMATO I SAMLE SUVEYS ajesh Singh Mukesh Kumar ankaj

More information

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Statistica Sinica 8(1998), 1165-1173 A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR Phillip S. Kott National Agricultural Statistics Service Abstract:

More information

Comparison of Estimators in Case of Low Correlation in Adaptive Cluster Sampling. Muhammad Shahzad Chaudhry 1 and Muhammad Hanif 2

Comparison of Estimators in Case of Low Correlation in Adaptive Cluster Sampling. Muhammad Shahzad Chaudhry 1 and Muhammad Hanif 2 ISSN 684-8403 Journal of Statistics Volume 3, 06. pp. 4-57 Comparison of Estimators in Case of Lo Correlation in Muhammad Shahad Chaudhr and Muhammad Hanif Abstract In this paper, to Regression-Cum-Eponential

More information

Covariance and Correlation Class 7, Jeremy Orloff and Jonathan Bloom

Covariance and Correlation Class 7, Jeremy Orloff and Jonathan Bloom 1 Learning Goals Covariance and Correlation Class 7, 18.05 Jerem Orloff and Jonathan Bloom 1. Understand the meaning of covariance and correlation. 2. Be able to compute the covariance and correlation

More information

BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING

BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Statistica Sinica 22 (2012), 777-794 doi:http://dx.doi.org/10.5705/ss.2010.238 BIAS-ROBUSTNESS AND EFFICIENCY OF MODEL-BASED INFERENCE IN SURVEY SAMPLING Desislava Nedyalova and Yves Tillé University of

More information

Dependence and scatter-plots. MVE-495: Lecture 4 Correlation and Regression

Dependence and scatter-plots. MVE-495: Lecture 4 Correlation and Regression Dependence and scatter-plots MVE-495: Lecture 4 Correlation and Regression It is common for two or more quantitative variables to be measured on the same individuals. Then it is useful to consider what

More information

Biostatistics in Research Practice - Regression I

Biostatistics in Research Practice - Regression I Biostatistics in Research Practice - Regression I Simon Crouch 30th Januar 2007 In scientific studies, we often wish to model the relationships between observed variables over a sample of different subjects.

More information

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Keywords: Survey sampling, finite populations, simple random sampling, systematic

More information

Regression-Cum-Exponential Ratio Type Estimators for the Population Mean

Regression-Cum-Exponential Ratio Type Estimators for the Population Mean Middle-East Journal of Scientific Research 19 (1: 1716-171, 014 ISSN 1990-933 IDOSI Publications, 014 DOI: 10.589/idosi.mejsr.014.19.1.635 Regression-um-Eponential Ratio Tpe Estimats f the Population Mean

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging

More information

LINEAR REGRESSION ANALYSIS

LINEAR REGRESSION ANALYSIS LINEAR REGRESSION ANALYSIS MODULE V Lecture - 2 Correcting Model Inadequacies Through Transformation and Weighting Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technolog Kanpur

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

8.1 Exponents and Roots

8.1 Exponents and Roots Section 8. Eponents and Roots 75 8. Eponents and Roots Before defining the net famil of functions, the eponential functions, we will need to discuss eponent notation in detail. As we shall see, eponents

More information

Square Estimation by Matrix Inverse Lemma 1

Square Estimation by Matrix Inverse Lemma 1 Proof of Two Conclusions Associated Linear Minimum Mean Square Estimation b Matri Inverse Lemma 1 Jianping Zheng State e Lab of IS, Xidian Universit, Xi an, 7171, P. R. China jpzheng@idian.edu.cn Ma 6,

More information

UNCORRECTED SAMPLE PAGES. 3Quadratics. Chapter 3. Objectives

UNCORRECTED SAMPLE PAGES. 3Quadratics. Chapter 3. Objectives Chapter 3 3Quadratics Objectives To recognise and sketch the graphs of quadratic polnomials. To find the ke features of the graph of a quadratic polnomial: ais intercepts, turning point and ais of smmetr.

More information

Estimation of Population Mean on Recent Occasion under Non-Response in h-occasion Successive Sampling

Estimation of Population Mean on Recent Occasion under Non-Response in h-occasion Successive Sampling Journal of Modern Applied Statistical Metods Volume 5 Issue Article --06 Estimation of Population Mean on Recent Occasion under Non-Response in -Occasion Successive Sampling Anup Kumar Sarma Indian Scool

More information

Superpopulations and Superpopulation Models. Ed Stanek

Superpopulations and Superpopulation Models. Ed Stanek Superpopulations and Superpopulation Models Ed Stanek Contents Overview Background and History Generalizing from Populations: The Superpopulation Superpopulations: a Framework for Comparing Statistics

More information

From the help desk: It s all about the sampling

From the help desk: It s all about the sampling The Stata Journal (2002) 2, Number 2, pp. 90 20 From the help desk: It s all about the sampling Allen McDowell Stata Corporation amcdowell@stata.com Jeff Pitblado Stata Corporation jsp@stata.com Abstract.

More information

Demonstrate solution methods for systems of linear equations. Show that a system of equations can be represented in matrix-vector form.

Demonstrate solution methods for systems of linear equations. Show that a system of equations can be represented in matrix-vector form. Chapter Linear lgebra Objective Demonstrate solution methods for sstems of linear equations. Show that a sstem of equations can be represented in matri-vector form. 4 Flowrates in kmol/hr Figure.: Two

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

10.2 The Unit Circle: Cosine and Sine

10.2 The Unit Circle: Cosine and Sine 0. The Unit Circle: Cosine and Sine 77 0. The Unit Circle: Cosine and Sine In Section 0.., we introduced circular motion and derived a formula which describes the linear velocit of an object moving on

More information

Mathematical Statistics. Gregg Waterman Oregon Institute of Technology

Mathematical Statistics. Gregg Waterman Oregon Institute of Technology Mathematical Statistics Gregg Waterman Oregon Institute of Technolog c Gregg Waterman This work is licensed under the Creative Commons Attribution. International license. The essence of the license is

More information

A new approach to weighting and inference in sample surveys

A new approach to weighting and inference in sample surveys Biometria (2008), 95, 3,pp. 539 553 C 2008 Biometria Trust Printed in Great Britain doi: 10.1093/biomet/asn028 A new approach to weighting and inference in sample surves BY JEAN-FRANÇOIS BEAUMONT Statistics

More information

Strain Transformation and Rosette Gage Theory

Strain Transformation and Rosette Gage Theory Strain Transformation and Rosette Gage Theor It is often desired to measure the full state of strain on the surface of a part, that is to measure not onl the two etensional strains, and, but also the shear

More information

EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1 UNIT 2 - NQF LEVEL 4 OUTCOME 4 - STATISTICS AND PROBABILITY TUTORIAL 3 LINEAR REGRESSION

EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1 UNIT 2 - NQF LEVEL 4 OUTCOME 4 - STATISTICS AND PROBABILITY TUTORIAL 3 LINEAR REGRESSION EDEXCEL AALYTICAL METHODS FOR EGIEERS H1 UIT - QF LEVEL 4 OUTCOME 4 - STATISTICS AD PROBABILITY TUTORIAL 3 LIEAR REGRESSIO Tabular and graphical form: data collection methods; histograms; bar charts; line

More information

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction

More information

Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D.

Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D. Research Design - - Topic 15a Introduction to Multivariate Analses 009 R.C. Gardner, Ph.D. Major Characteristics of Multivariate Procedures Overview of Multivariate Techniques Bivariate Regression and

More information

Section 1.5 Formal definitions of limits

Section 1.5 Formal definitions of limits Section.5 Formal definitions of limits (3/908) Overview: The definitions of the various tpes of limits in previous sections involve phrases such as arbitraril close, sufficientl close, arbitraril large,

More information

a 2 x y 1 x 1 y SOL AII.1a

a 2 x y 1 x 1 y SOL AII.1a SOL AII.a The student, given rational, radical, or polnomial epressions, will a) add, subtract, multipl, divide, and simplif rational algebraic epressions; Hints and Notes Rules for fractions: ) Alwas

More information

Characterization of the Skew-Normal Distribution Via Order Statistics and Record Values

Characterization of the Skew-Normal Distribution Via Order Statistics and Record Values International Journal of Statistics and Probabilit; Vol. 4, No. 1; 2015 ISSN 1927-7032 E-ISSN 1927-7040 Published b Canadian Center of Science and Education Characterization of the Sew-Normal Distribution

More information

2: Distributions of Several Variables, Error Propagation

2: Distributions of Several Variables, Error Propagation : Distributions of Several Variables, Error Propagation Distribution of several variables. variables The joint probabilit distribution function of two variables and can be genericall written f(, with the

More information

Let X denote a random variable, and z = h(x) a function of x. Consider the

Let X denote a random variable, and z = h(x) a function of x. Consider the EE385 Class Notes 11/13/01 John Stensb Chapter 5 Moments and Conditional Statistics Let denote a random variable, and z = h(x) a function of x. Consider the transformation Z = h(). We saw that we could

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analsis of Longitudinal Data Patrick J. Heagert PhD Department of Biostatistics Universit of Washington 1 Auckland 2008 Session Three Outline Role of correlation Impact proper standard errors Used to weight

More information

A comparison of estimation accuracy by the use of KF, EKF & UKF filters

A comparison of estimation accuracy by the use of KF, EKF & UKF filters Computational Methods and Eperimental Measurements XIII 779 A comparison of estimation accurac b the use of KF EKF & UKF filters S. Konatowski & A. T. Pieniężn Department of Electronics Militar Universit

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

P 0 (x 0, y 0 ), P 1 (x 1, y 1 ), P 2 (x 2, y 2 ), P 3 (x 3, y 3 ) The goal is to determine a third degree polynomial of the form,

P 0 (x 0, y 0 ), P 1 (x 1, y 1 ), P 2 (x 2, y 2 ), P 3 (x 3, y 3 ) The goal is to determine a third degree polynomial of the form, Bezier Curves While working for the Renault automobile compan in France, an engineer b the name of P. Bezier developed a sstem for designing car bodies based partl on some fairl straightforward mathematics.

More information

Sample Problems For Grade 9 Mathematics. Grade. 1. If x 3

Sample Problems For Grade 9 Mathematics. Grade. 1. If x 3 Sample roblems For 9 Mathematics DIRECTIONS: This section provides sample mathematics problems for the 9 test forms. These problems are based on material included in the New York Cit curriculum for 8.

More information

An Information Theory For Preferences

An Information Theory For Preferences An Information Theor For Preferences Ali E. Abbas Department of Management Science and Engineering, Stanford Universit, Stanford, Ca, 94305 Abstract. Recent literature in the last Maimum Entrop workshop

More information

Review of Probability

Review of Probability Review of robabilit robabilit Theor: Man techniques in speech processing require the manipulation of probabilities and statistics. The two principal application areas we will encounter are: Statistical

More information

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Yi-Hau Chen Institute of Statistical Science, Academia Sinica Joint with Nilanjan

More information

Parameterized Joint Densities with Gaussian and Gaussian Mixture Marginals

Parameterized Joint Densities with Gaussian and Gaussian Mixture Marginals Parameterized Joint Densities with Gaussian and Gaussian Miture Marginals Feli Sawo, Dietrich Brunn, and Uwe D. Hanebeck Intelligent Sensor-Actuator-Sstems Laborator Institute of Computer Science and Engineering

More information

nm nm

nm nm The Quantum Mechanical Model of the Atom You have seen how Bohr s model of the atom eplains the emission spectrum of hdrogen. The emission spectra of other atoms, however, posed a problem. A mercur atom,

More information

F. Jay Breidt Colorado State University

F. Jay Breidt Colorado State University Model-assisted survey regression estimation with the lasso 1 F. Jay Breidt Colorado State University Opening Workshop on Computational Methods in Social Sciences SAMSI August 2013 This research was supported

More information

Inference about the Slope and Intercept

Inference about the Slope and Intercept Inference about the Slope and Intercept Recall, we have established that the least square estimates and 0 are linear combinations of the Y i s. Further, we have showed that the are unbiased and have the

More information

Admissible Estimation of a Finite Population Total under PPS Sampling

Admissible Estimation of a Finite Population Total under PPS Sampling Research Journal of Mathematical and Statistical Sciences E-ISSN 2320-6047 Admissible Estimation of a Finite Population Total under PPS Sampling Abstract P.A. Patel 1* and Shradha Bhatt 2 1 Department

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

LESSON #1 - BASIC ALGEBRAIC PROPERTIES COMMON CORE ALGEBRA II

LESSON #1 - BASIC ALGEBRAIC PROPERTIES COMMON CORE ALGEBRA II 1 LESSON #1 - BASIC ALGEBRAIC PROPERTIES COMMON CORE ALGEBRA II Mathematics has developed a language all to itself in order to clarif concepts and remove ambiguit from the analsis of problems. To achieve

More information

Uncertainty and Parameter Space Analysis in Visualization -

Uncertainty and Parameter Space Analysis in Visualization - Uncertaint and Parameter Space Analsis in Visualiation - Session 4: Structural Uncertaint Analing the effect of uncertaint on the appearance of structures in scalar fields Rüdiger Westermann and Tobias

More information

COMPRESSIVE (CS) [1] is an emerging framework,

COMPRESSIVE (CS) [1] is an emerging framework, 1 An Arithmetic Coding Scheme for Blocked-based Compressive Sensing of Images Min Gao arxiv:1604.06983v1 [cs.it] Apr 2016 Abstract Differential pulse-code modulation (DPCM) is recentl coupled with uniform

More information

Scatter Plot Quadrants. Setting. Data pairs of two attributes X & Y, measured at N sampling units:

Scatter Plot Quadrants. Setting. Data pairs of two attributes X & Y, measured at N sampling units: Geog 20C: Phaedon C Kriakidis Setting Data pairs of two attributes X & Y, measured at sampling units: ṇ and ṇ there are pairs of attribute values {( n, n ),,,} Scatter plot: graph of - versus -values in

More information

Unit 12 Study Notes 1 Systems of Equations

Unit 12 Study Notes 1 Systems of Equations You should learn to: Unit Stud Notes Sstems of Equations. Solve sstems of equations b substitution.. Solve sstems of equations b graphing (calculator). 3. Solve sstems of equations b elimination. 4. Solve

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

A Review on the Theoretical and Empirical Efficiency Comparisons of Some Ratio and Product Type Mean Estimators in Two Phase Sampling Scheme

A Review on the Theoretical and Empirical Efficiency Comparisons of Some Ratio and Product Type Mean Estimators in Two Phase Sampling Scheme American Journal of Mathematics and Statistics 06, 6(): 8-5 DOI: 0.59/j.ajms.06060.0 A Review on the Theoretical and Empirical Efficienc omparisons of Some Ratio and Product Tpe Mean Estimators in Two

More information

Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Cross-validation for detecting and preventing overfitting A Regression Problem = f() + noise Can we learn f from this data? Note to other teachers and users of these slides. Andrew would be delighted if

More information

An elementary model for the validation of flamelet approximations in non-premixed turbulent combustion

An elementary model for the validation of flamelet approximations in non-premixed turbulent combustion Combust. Theor Modelling 4 () 89. Printed in the UK PII: S364-783()975- An elementar model for the validation of flamelet approimations in non-premied turbulent combustion A Bourliou and A J Majda Département

More information

Structural Equation Modeling Lab 5 In Class Modification Indices Example

Structural Equation Modeling Lab 5 In Class Modification Indices Example Structural Equation Modeling Lab 5 In Class Modification Indices Example. Model specifications sntax TI Modification Indices DA NI=0 NO=0 MA=CM RA FI='E:\Teaching\SEM S09\Lab 5\jsp6.psf' SE 7 6 5 / MO

More information

A GENERAL FAMILY OF ESTIMATORS FOR ESTIMATING POPULATION MEAN USING KNOWN VALUE OF SOME POPULATION PARAMETER(S)

A GENERAL FAMILY OF ESTIMATORS FOR ESTIMATING POPULATION MEAN USING KNOWN VALUE OF SOME POPULATION PARAMETER(S) A GENERAL FAMILY OF ESTIMATORS FOR ESTIMATING POPULATION MEAN USING KNOWN VALUE OF SOME POPULATION PARAMETER(S) Dr. M. Khoshnevisan GBS, Giffith Universit, Australia-(m.khoshnevisan@griffith.edu.au) Dr.

More information

Exercise solutions: concepts from chapter 5

Exercise solutions: concepts from chapter 5 1) Stud the oöids depicted in Figure 1a and 1b. a) Assume that the thin sections of Figure 1 lie in a principal plane of the deformation. Measure and record the lengths and orientations of the principal

More information

INF Anne Solberg One of the most challenging topics in image analysis is recognizing a specific object in an image.

INF Anne Solberg One of the most challenging topics in image analysis is recognizing a specific object in an image. INF 4300 700 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter -6 6inDuda and Hart: attern Classification 303 INF 4300 Introduction to classification One of the most challenging

More information

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys

Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys The Canadian Journal of Statistics Vol.??, No.?,????, Pages???-??? La revue canadienne de statistique Generalized Pseudo Empirical Likelihood Inferences for Complex Surveys Zhiqiang TAN 1 and Changbao

More information

3.1-Quadratic Functions & Inequalities

3.1-Quadratic Functions & Inequalities 3.1-Quadratic Functions & Inequalities Quadratic Functions: Quadratic functions are polnomial functions of the form also be written in the form f ( ) a( h) k. f ( ) a b c. A quadratic function ma Verte

More information

The first change comes in how we associate operators with classical observables. In one dimension, we had. p p ˆ

The first change comes in how we associate operators with classical observables. In one dimension, we had. p p ˆ VI. Angular momentum Up to this point, we have been dealing primaril with one dimensional sstems. In practice, of course, most of the sstems we deal with live in three dimensions and 1D quantum mechanics

More information

Lévy stable distribution and [0,2] power law dependence of. acoustic absorption on frequency

Lévy stable distribution and [0,2] power law dependence of. acoustic absorption on frequency Lév stable distribution and [,] power law dependence of acoustic absorption on frequenc W. Chen Institute of Applied Phsics and Computational Mathematics, P.O. Box 89, Division Box 6, Beijing 88, China

More information

Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG

Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG Supplement-Sample Integration for Prediction of Remainder for Enhanced GREG Abstract Avinash C. Singh Division of Survey and Data Sciences American Institutes for Research, Rockville, MD 20852 asingh@air.org

More information

Estimating Realized Random Effects in Mixed Models

Estimating Realized Random Effects in Mixed Models Etimating Realized Random Effect in Mixed Model (Can parameter for realized random effect be etimated in mixed model?) Edward J. Stanek III Dept of Biotatitic and Epidemiology, UMASS, Amhert, MA USA Julio

More information

Factor Analysis. Qian-Li Xue

Factor Analysis. Qian-Li Xue Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale

More information

Perturbation Theory for Variational Inference

Perturbation Theory for Variational Inference Perturbation heor for Variational Inference Manfred Opper U Berlin Marco Fraccaro echnical Universit of Denmark Ulrich Paquet Apple Ale Susemihl U Berlin Ole Winther echnical Universit of Denmark Abstract

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

An example of Lagrangian for a non-holonomic system

An example of Lagrangian for a non-holonomic system Uniersit of North Georgia Nighthaks Open Institutional Repositor Facult Publications Department of Mathematics 9-9-05 An eample of Lagrangian for a non-holonomic sstem Piotr W. Hebda Uniersit of North

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

Songklanakarin Journal of Science and Technology SJST R3 LAWSON

Songklanakarin Journal of Science and Technology SJST R3 LAWSON Songklanakarin Journal of Science and Technology SJST-0-00.R LAWSON Ratio Estimators of Population Means Using Quartile Function of Auxiliary Variable in Double Sampling Journal: Songklanakarin Journal

More information

Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems

Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems Deep Nonlinear Non-Gaussian Filtering for Dnamical Sstems Arash Mehrjou Department of Empirical Inference Ma Planck Institute for Intelligent Sstems arash.mehrjou@tuebingen.mpg.de Bernhard Schölkopf Department

More information

Class 11 Maths Chapter 15. Statistics

Class 11 Maths Chapter 15. Statistics 1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class

More information

MI-2 Prob. Set #4 DUE: Tuesday, Feb 15, 2011 Spring 11. 4) Write a matrix equation and give result:

MI-2 Prob. Set #4 DUE: Tuesday, Feb 15, 2011 Spring 11. 4) Write a matrix equation and give result: MI- Prob. Set #4 DUE: Tuesda, Feb 1, 011 Spring 11 1) Dr. Condie has sport coats, pairs of slacks, 4 shirts and 6 ties that he can choose to wear on an da. a) How man combinations of these four pieces

More information

Simple design-efficient calibration estimators for rejective and high-entropy sampling

Simple design-efficient calibration estimators for rejective and high-entropy sampling Biometrika (202), 99,, pp. 6 C 202 Biometrika Trust Printed in Great Britain Advance Access publication on 3 July 202 Simple design-efficient calibration estimators for rejective and high-entropy sampling

More information

A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples

A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples A Model-Over-Design Integration for Estimation from Purposive Supplements to Probability Samples Avinash C. Singh, NORC at the University of Chicago, Chicago, IL 60603 singh-avi@norc.org Abstract For purposive

More information

INF Introduction to classifiction Anne Solberg

INF Introduction to classifiction Anne Solberg INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300

More information

Linear Equation Theory - 2

Linear Equation Theory - 2 Algebra Module A46 Linear Equation Theor - Copright This publication The Northern Alberta Institute of Technolog 00. All Rights Reserved. LAST REVISED June., 009 Linear Equation Theor - Statement of Prerequisite

More information

A GENERAL FAMILY OF ESTIMATORS FOR ESTIMATING POPULATION MEAN USING KNOWN VALUE OF SOME POPULATION PARAMETER(S)

A GENERAL FAMILY OF ESTIMATORS FOR ESTIMATING POPULATION MEAN USING KNOWN VALUE OF SOME POPULATION PARAMETER(S) A GENERAL FAMILY OF ESTIMATORS FOR ESTIMATING POPULATION MEAN USING KNOWN VALUE OF SOME POPULATION PARAMETER(S) Dr. M. Khoshnevisan, GBS, Griffith Universit, Australia (m.khoshnevisan@griffith.edu.au)

More information

Chapter 3. Expectations

Chapter 3. Expectations pectations - Chapter. pectations.. Introduction In this chapter we define five mathematical epectations the mean variance standard deviation covariance and correlation coefficient. We appl these general

More information