Inferences for the Ratio: Fieller's Interval, Log Ratio, and Large Sample Based Confidence Intervals
Michael Sherman
Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843, USA
sherman@stat.tamu.edu

Arnab Maity
Department of Statistics, North Carolina State University, 2311 Stinson Drive, Raleigh, North Carolina 27695, U.S.A.
amaity@ncsu.edu

Suojin Wang
Department of Statistics, 3143 TAMU, Texas A&M University, College Station, Texas 77843, USA
sjwang@stat.tamu.edu

Abstract

In sample surveys and many other areas of application, the ratio of variables is often of great importance. This often occurs when one variable is available at the population level while another variable of interest is available for sample data only. In this case, using the sample ratio, we can often gather valuable information on the variable of interest for the unsampled observations. In many other studies, the ratio itself is of interest, for example when estimating proportions from a random number of observations. In this note we compare three confidence intervals for the population ratio: a large sample interval, a log based version of the large sample interval, and Fieller's interval. This is done through data analysis and through a small simulation experiment. The Fieller method has often been proposed as a superior interval for small sample sizes. We show through a data example and simulation experiments that Fieller's method often gives nonsensical and uninformative intervals when the observations are noisy relative to the mean of the data. The large sample interval does not similarly suffer and thus can be a more reliable method for small and large samples.

Some key words: Fieller's interval, ratio estimation, variance estimation, sample surveys, small sample inference.
1 Introduction

In sample surveys and many other areas of application, the ratio of variables is often of great importance. This often occurs when one variable is available at the population level while another variable of interest is available for sample data only. In this case, if the ratio of the two variables is estimable, we gather valuable information on the variable of interest for the unsampled observations. In other studies, the ratio itself is of interest. One such example is the estimation of proportions where the number of observations is random, as is often the case in cluster sampling. Other examples include applications to cost effectiveness in economics (Jiang, Wu, and Williams, 2000), studying the ratio of regression coefficients (Hirschberg and Lye, 2007) and comparing health outcomes across spatial domains (Beyene and Moineddin, 2005). For these reasons estimation of the true ratio between variables is of great interest.

2 Methodological Development

There are two basic frameworks in which we draw inferences: the infinite population and the finite population settings. For the former, we consider our observations as coming from a bivariate distribution function, $F(x, y)$, with correlation coefficient $\rho$. In the latter, our data observations come from a finite population $(X, Y)$, $X = (x_1, \ldots, x_N)$, $Y = (y_1, \ldots, y_N)$, of size $N$. In the former case the population ratio is defined to be $R = E(Y)/E(X) = \mu_y/\mu_x$, where the expectations, $\mu_y$ and $\mu_x$, are the population means of the variables. In the latter case the population ratio is defined to be $R = \sum_{i=1}^{N} y_i / \sum_{i=1}^{N} x_i = \bar{Y}/\bar{X}$. Our results are not specific to the finite or infinite population case, and in either case we denote a random sample by the sample elements chosen as $S = (i_1, \ldots, i_n)$. Then the sample data are $(x, y)$, where $x = x(S)$, $y = y(S)$.
For either the infinite population or finite population setting, we consider the sample ratio to be the ratio of sample means: $r = \sum_{i \in S} y_i / \sum_{i \in S} x_i = \bar{y}/\bar{x}$, where $\bar{x}$, $\bar{y}$ are the sample means of the observations $x = x(S)$ and $y = y(S)$.
We desire inferences on the population ratio, $R$, based on the sample ratio $r$. Define the sample standard deviations of $x = x(S)$ and $y = y(S)$ as $s_x$ and $s_y$, and the sample covariance as $s_{xy}$. Further, define

$$c_{\bar{x}\bar{x}} = n^{-1} s_x^2/\bar{x}^2, \qquad c_{\bar{y}\bar{y}} = n^{-1} s_y^2/\bar{y}^2, \qquad c_{\bar{y}\bar{x}} = n^{-1} s_{xy}/(\bar{x}\bar{y}).$$

Large sample approximations, e.g., Cochran (1977) or Lohr (2009), show that

$$\hat{\sigma}_C^2 = \sum_{i \in S} (y_i - r x_i)^2 / \{n(n-1)\bar{x}^2\} = r^2 [c_{\bar{y}\bar{y}} + c_{\bar{x}\bar{x}} - 2c_{\bar{y}\bar{x}}]$$

is a consistent estimator of the variance of $r$, so that a large sample confidence interval is given by

$$I_C = \left(r \pm t_{\alpha/2,n-1}\, r \sqrt{c_{\bar{y}\bar{y}} + c_{\bar{x}\bar{x}} - 2c_{\bar{y}\bar{x}}}\right),$$

where $t_{\alpha,n-1}$ denotes the $(1-\alpha)100\%$ quantile of a $t_{n-1}$ distribution. Note that in the finite population situation we have assumed that the sampling fraction $n/N$ is small enough that the finite population correction, $\mathrm{fpc} = 1 - n/N$, can be taken to be equal to one. We assume this throughout the present study. If this is not the case then all variance estimates can be adjusted accordingly.

As an alternative to the previous large sample interval, one can use the log ratio as a pivotal quantity, construct a confidence interval around the estimated log ratio and exponentiate both end points to obtain an interval for the actual ratio. Specifically, using a Taylor expansion we derive

$$\mathrm{var}\{\log(r)\} = \mathrm{var}\{\log(\bar{y}) - \log(\bar{x})\} \approx \mathrm{var}\{\log(\bar{Y}) - \log(\bar{X}) + (\bar{y} - \bar{Y})/\bar{Y} - (\bar{x} - \bar{X})/\bar{X}\}$$
$$= \mathrm{var}(\bar{y} - \bar{Y})/\bar{Y}^2 + \mathrm{var}(\bar{x} - \bar{X})/\bar{X}^2 - 2\,\mathrm{cov}\{(\bar{y} - \bar{Y}), (\bar{x} - \bar{X})\}/(\bar{Y}\bar{X}).$$

Hence we have the variance estimator $\widehat{\mathrm{var}}\{\log(r)\} = c_{\bar{y}\bar{y}} + c_{\bar{x}\bar{x}} - 2c_{\bar{y}\bar{x}}$. Thus the log ratio based interval is

$$I_{LR} = \left(r \exp\left\{-t_{\alpha/2,n-1} \sqrt{c_{\bar{y}\bar{y}} + c_{\bar{x}\bar{x}} - 2c_{\bar{y}\bar{x}}}\right\},\; r \exp\left\{t_{\alpha/2,n-1} \sqrt{c_{\bar{y}\bar{y}} + c_{\bar{x}\bar{x}} - 2c_{\bar{y}\bar{x}}}\right\}\right).$$

One motivation for this interval is that positive variables are often skewed to the right and the distribution of the sample ratio, $r$, is also often skewed to the right. The interval $I_C$ is by definition symmetric around $r$ while the interval $I_{LR}$ allows for asymmetric behavior.
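The two intervals above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ratio_intervals`, the data, and the choice to pass the critical value `t_crit` as an argument are all hypothetical and not part of the original study; the data below are made up and are not the Table 1 sample.

```python
import math
from statistics import mean, variance


def ratio_intervals(x, y, t_crit):
    """Return (r, I_C, I_LR) for paired samples x and y.

    t_crit is the upper alpha/2 quantile of the t distribution with
    n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    r = ybar / xbar
    # Sample covariance with the usual n - 1 divisor.
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    c_xx = variance(x) / (n * xbar ** 2)
    c_yy = variance(y) / (n * ybar ** 2)
    c_yx = s_xy / (n * xbar * ybar)
    # Estimated relative standard error of r; max() guards against a tiny
    # negative value caused by floating point rounding.
    rel_se = math.sqrt(max(c_yy + c_xx - 2.0 * c_yx, 0.0))
    half = t_crit * rel_se
    i_c = (r * (1.0 - half), r * (1.0 + half))            # symmetric about r
    i_lr = (r * math.exp(-half), r * math.exp(half))      # symmetric on the log scale
    return r, i_c, i_lr


# Hypothetical illustrative data (n = 8), not taken from the paper.
x = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0]
y = [2.1, 2.3, 1.7, 2.4, 2.5, 1.9, 2.6, 2.0]
T_CRIT_7 = 2.3646  # t_{0.025, 7}

r, i_c, i_lr = ratio_intervals(x, y, T_CRIT_7)
print(r, i_c, i_lr)
```

Passing the critical value in keeps the sketch free of third-party dependencies. For positive $r$, both endpoints of $I_{LR}$ lie to the right of the corresponding endpoints of $I_C$, which reflects the asymmetry discussed above.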
For small samples, another better known suggested improvement on $I_C$ is the so called Fieller interval, suggested initially by Fieller (1932). This interval has often been used in applications, for example in Heitjan (2000), Hirschberg and Lye (2007), Jiang, Wu, and Williams (2000), and Beyene and Moineddin (2005). To explore the behavior of this interval, it is natural to assume that the framework for sampling is simple random sampling from an infinite population. This is mainly for simplicity and clarity in our analytical discussions. The Fieller approach, however, is often used in the finite population setting where data are typically not assumed to come from a specific distribution.

The Fieller interval assumes that the joint distribution of the infinite population, $F(x, y)$, is bivariate normal. In this case, the pivotal statistic

$$T = n^{1/2}(\bar{y} - R\bar{x})/(s_y^2 - 2Rs_{xy} + R^2 s_x^2)^{1/2}$$

has an exact t-distribution. This is easily seen by defining the variables $z_i = y_i - Rx_i$, $i \in S$. Then the mean of the $z$ variables is 0, and $T = n^{1/2}\bar{z}/s_z$ has a t-distribution with $n - 1$ degrees of freedom, where $s_z$ is the sample standard deviation of the $z_i$. A confidence interval for the ratio is given by the set of all $R$ such that

$$-t_{\alpha/2,n-1} < T < t_{\alpha/2,n-1}. \qquad (1)$$

Using this we can form the inequality $t_{\alpha/2,n-1}^2 > T^2$. This expression is easily written as a quadratic inequality in $R$:

$$aR^2 + bR + c < 0 \qquad (2)$$

with

$$a = 1 - t_{\alpha/2,n-1}^2 c_{\bar{x}\bar{x}}, \qquad b = -2r(1 - t_{\alpha/2,n-1}^2 c_{\bar{y}\bar{x}}), \qquad c = r^2(1 - t_{\alpha/2,n-1}^2 c_{\bar{y}\bar{y}}).$$

Suppose there are two real valued solutions, $d_1 \le d_2$, to the equation obtained by changing the inequality in (2) into an equality. If the coefficient $a$ is such that $a > 0$ then the solution to the inequality (1) for $R$ is $(d_1, d_2)$. On the other hand, if $a < 0$, the solution in $R$ is $(-\infty, d_1) \cup (d_2, \infty)$. Using the quadratic formula, and solving for $R$, we find the two roots, $d_1$ and $d_2$, as functions of our sample data. This gives the endpoints of the Fieller interval. The endpoints of the interval are given by

$$I_F = r\,\frac{(1 - t_{\alpha/2,n-1}^2 c_{\bar{y}\bar{x}}) \pm t_{\alpha/2,n-1}\{(c_{\bar{x}\bar{x}} + c_{\bar{y}\bar{y}} - 2c_{\bar{y}\bar{x}}) - t_{\alpha/2,n-1}^2 (c_{\bar{x}\bar{x}}c_{\bar{y}\bar{y}} - c_{\bar{y}\bar{x}}^2)\}^{1/2}}{1 - t_{\alpha/2,n-1}^2 c_{\bar{x}\bar{x}}}.$$

In a small sample ($n = 8$) of observations, Efron and Tibshirani (1993) seek to estimate the population ratio in a study on bioequivalence. The observations are roughly compatible with normality and the Fieller interval is called exact and the gold standard in this case. This seems appropriate, and several nonparametric intervals based on resampling are shown to be close to the Fieller interval, with the better intervals closer to the Fieller interval. In the following section we show that qualitatively different behavior of the Fieller interval, $I_F$, can occur.

3 Data Example

In contrast to the Efron and Tibshirani example from Section 2, we show there can be quite different behavior of the Fieller interval. Consider the $n = 8$ observations from Lehtonen and Pahkinen (2004), p.103, of two variables (ue91, hou85), the number of unemployed people and the number of households in several provinces of central Finland. The data are given in Table 1. The goal is to estimate the true ratio of the ue91 to the hou85 variables. Although the sampling fraction is $n/N = 8/32$ in this case, for simplicity we assume in our illustration that fpc = 1.

Using the formula for the Fieller interval we find the two solutions to the quadratic equation to be […] and […]. The implied interval seems to be reasonably precise. Note, however, that the sample ratio is $r$ = […], and that this value is not included in the interval. Closer inspection shows that the right endpoint from the formula is given by […] and the left endpoint from the formula is given by […]. This is a curious situation. If we seek to find the interval from (1), we find that the interval is given by $(-\infty,$ […]$) \cup (0.1510, \infty)$, the complement of the interval $(0.1379,$ […]$)$. This suggests that we cannot simply switch the endpoints to obtain an appropriate interval. Further, we see that the correct Fieller interval gives a very uninformative and completely nonsensical interval (which is actually the union of two intervals).

[Table 1: columns ue91 and hou85; the data values are not preserved in this transcription.]
Table 1: A simple random sample without replacement of size n = 8 from the Province'91 population data as presented in Lehtonen and Pahkinen (2004), p.103. Presented are the two study variables: the number of unemployed persons (ue91) and the number of households (hou85).

Inspection of the formula shows that we have completely nonsensical intervals when either

$$(c_{\bar{x}\bar{x}} + c_{\bar{y}\bar{y}} - 2c_{\bar{y}\bar{x}}) - t_{\alpha/2,n-1}^2 (c_{\bar{x}\bar{x}}c_{\bar{y}\bar{y}} - c_{\bar{y}\bar{x}}^2) < 0$$

and/or $1 - t_{\alpha/2,n-1}^2 c_{\bar{x}\bar{x}} < 0$. The second situation occurs when the sample variance of the $x$'s is large relative to the square of the sample mean. Note, also, that this is the case where the usual confidence interval for $\mu_x$ contains zero. In this data set we find that $c_{\bar{x}\bar{x}} = 0.37$ and this in turn leads to a value of $1 - t_{\alpha/2,n-1}^2 c_{\bar{x}\bar{x}}$ = […].

Note that in stark contrast to the strange behavior of the Fieller interval, the usual large sample interval is given by $I_C = (0.1478,$ […]$)$ and the log ratio interval is given by $I_{LR} = (0.1483,$ […]$)$. Both of these intervals give informative inferences on the population ratio.
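The case analysis for the quadratic inequality (2) can be made concrete with a short Python sketch. It assumes the coefficients $a$, $b$, $c$ as defined in Section 2; the function name `fieller_set`, the returned labels, and both data sets are hypothetical illustrations (the Table 1 values are not used), and the critical value is passed in by the caller.

```python
import math
from statistics import mean, variance


def fieller_set(x, y, t_crit):
    """Solve a R^2 + b R + c < 0 and describe the solution set.

    Returns (kind, d1, d2) where kind is one of
      "interval"   : the set is (d1, d2)                  (a > 0, real roots)
      "complement" : the set is (-inf, d1) U (d2, inf)    (a < 0, real roots)
      "none"       : no real roots, d1 = d2 = None
      "degenerate" : a == 0, the inequality is linear, d1 = d2 = None
    """
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    r = ybar / xbar
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    c_xx = variance(x) / (n * xbar ** 2)
    c_yy = variance(y) / (n * ybar ** 2)
    c_yx = s_xy / (n * xbar * ybar)
    t2 = t_crit ** 2
    a = 1.0 - t2 * c_xx
    b = -2.0 * r * (1.0 - t2 * c_yx)
    c = r ** 2 * (1.0 - t2 * c_yy)
    if a == 0.0:
        return "degenerate", None, None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return "none", None, None
    root = math.sqrt(disc)
    # Sort so that d1 <= d2 regardless of the sign of a.
    d1, d2 = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    return ("interval" if a > 0.0 else "complement"), d1, d2


T_CRIT_7 = 2.3646  # t_{0.025, 7}
y = [2.1, 2.3, 1.7, 2.4, 2.5, 1.9, 2.6, 2.0]
# Hypothetical x with a small coefficient of variation: a > 0.
x_precise = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0]
# Hypothetical x that is noisy relative to its mean: a < 0.
x_noisy = [1.0, 1.0, 2.0, 40.0, 1.0, 2.0, 30.0, 3.0]

print(fieller_set(x_precise, y, T_CRIT_7))
print(fieller_set(x_noisy, y, T_CRIT_7))
```

Because $T = 0$ at $R = r$, the sample ratio always satisfies (1). When the noisy sample yields the "complement" case, $r$ therefore falls outside $(d_1, d_2)$, which is the same pattern as in the data example above.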
4 Simulation

We now investigate how the three intervals, Fieller, log ratio and large sample, perform in a simulation experiment. We are particularly interested in the small sample behavior of Fieller's interval, where it has been assumed to be particularly suited. The model is

$$X \sim F_X, \qquad Y = h(X)\beta + \epsilon, \quad \text{where } \epsilon \sim F_\epsilon.$$

We consider two choices for $h(\cdot)$, namely, $h(x) = x$ and $h(x) = x^2$. We set $F_X$ and $F_\epsilon$ to be one of the three distributions (1) Normal(2,1), (2) Gamma(shape=3) and (3) Gamma(shape=1). Also we set $\beta$ = 0, 0.5 and 1. For each of these cases, we consider sample sizes n = 5, n = 10 and n = 30. We calculate all three intervals and evaluate the coverage and mean interval length based on 10,000 data sets for each setting.

To give some idea of what might be expected, we give the details of one particular simulation experiment. In this case, we set n = 5, and set $F_X$ and $F_\epsilon$ to be the Normal(2,1) distribution with $\beta = 0$, so that in this case the t-distribution for $T$ is exact. In one simulation we find that the right endpoint is less than the left in 921 cases. Further, in 17 cases we find that

$$(c_{\bar{x}\bar{x}} + c_{\bar{y}\bar{y}} - 2c_{\bar{y}\bar{x}}) - t_{\alpha/2,n-1}^2 (c_{\bar{x}\bar{x}}c_{\bar{y}\bar{y}} - c_{\bar{y}\bar{x}}^2) < 0,$$

leading to the square root of a negative number. In these cases the interval is undefined. Using the interval $(d_1, d_2)$ in all cases we find the coverage to be .857. However, including the 921 intervals in which this interval does not include the true ratio but $(-\infty, d_1) \cup (d_2, \infty)$ does contain $R$, we find a coverage of .949, which is within simulation error of the nominal .950 (as it must be, since the t-distribution is exact). It is not clear what to do in the 17 cases with a negative square root. In these cases, the left hand side of (2) is either > 0 or < 0 for all real values of $R$, which means that either there is no solution for $R$ in (2) or the solution for $R$ is the whole real line. By convention, we say there is no solution, and in this case the interval fails to capture the true ratio.
However, the practical impact of this situation is small in this case. The difficulty is that, in a significant minority of data sets, the correct intervals needed to attain the nominal coverage are the absurd disjoint intervals of the form $(-\infty, d_1) \cup (d_2, \infty)$. These intervals are nonsensical and would likely be interpreted inadvertently to actually be $(d_1, d_2)$, which is not the correct interval. Note, in particular, that in the data example in Section 3 the interval obtained by switching the endpoints is not comparable to either the large sample interval or the log interval.

A natural question is how likely it is to obtain the nonsensical interval. Recall that this occurs when the coefficient $a = 1 - t_{\alpha/2,n-1}^2 c_{\bar{x}\bar{x}}$ is negative. In the case of bivariate normality where both $X$ and $Y$ follow Normal(2,1) distributions, we can explicitly give the probability of $a < 0$. Note that

$$P[a < 0] = P[t_{\alpha/2,n-1}^2 c_{\bar{x}\bar{x}} > 1] = P[n\bar{x}^2/s_x^2 < t_{\alpha/2,n-1}^2].$$

Further, $n\bar{x}^2/\sigma_x^2$ has a noncentral $\chi^2$ distribution with 1 degree of freedom and noncentrality parameter $n\mu_x^2/\sigma_x^2$. Now, $(n-1)s_x^2/\sigma_x^2$ has a (central) $\chi^2$ distribution with $n - 1$ degrees of freedom. Thus, $n\bar{x}^2/s_x^2$ has a noncentral F-distribution with numerator degrees of freedom 1, denominator degrees of freedom $n - 1$, and noncentrality parameter $n\mu_x^2/\sigma_x^2$. In our simulation experiment $n = 5$, $\mu_x = 2$ and $\sigma_x^2 = 1$, so that $n\mu_x^2/\sigma_x^2 = 20$, and for $\alpha = 0.05$ we find $P[F(1, 4, 20) < 7.71]$ = […], where $F(a, b, c)$ denotes a random variable with the F-distribution with $a$ and $b$ numerator and denominator degrees of freedom and noncentrality parameter $c$. This value is close to our empirical result in the simulation experiment, where we observed 921/10000 = .0921 negative denominators in the Fieller interval.

For other sample sizes, Figure 1 displays the chance of a negative denominator in the Fieller interval when sampling from a bivariate normal population where both $X$ and $Y$ follow Normal(2,1) distributions. We see from the left plot in Figure 1 that once the sample size is larger than n = 10 the chance is quite small.
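The probability $P[a < 0]$ can also be checked by direct simulation of the event $n\bar{x}^2/s_x^2 < t_{\alpha/2,n-1}^2$. The following Python sketch is an illustration only, not the authors' code: the function name `prob_negative_a`, the seed, and the number of replications are assumptions, and the critical value $t_{0.025,4} = 2.7764$ (whose square is about 7.71) is hard coded.

```python
import random


def prob_negative_a(n, mu, sigma, t_crit, reps=20000, seed=1):
    """Monte Carlo estimate of P[a < 0] = P[n * xbar^2 / s_x^2 < t_crit^2]
    for independent Normal(mu, sigma^2) observations."""
    rng = random.Random(seed)
    t2 = t_crit ** 2
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
        if n * xbar ** 2 / s2 < t2:
            hits += 1
    return hits / reps


T_CRIT_4 = 2.7764  # t_{0.025, 4}

# cv = 0.5: X ~ Normal(2, 1); cv = 1.0: X ~ Normal(1, 1).
p_cv_half = prob_negative_a(n=5, mu=2.0, sigma=1.0, t_crit=T_CRIT_4)
p_cv_one = prob_negative_a(n=5, mu=1.0, sigma=1.0, t_crit=T_CRIT_4)
print(p_cv_half, p_cv_one)
```

With a noncentral F routine available, the same quantity could be obtained from its distribution function instead; the simulation is used here only to avoid depending on a particular library.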
The sample size needed to be safe from the sign switch is in fact dependent on the inverse of the coefficient of variation ($\mu_x/\sigma_x$), as $n\mu_x^2/\sigma_x^2$ is the noncentrality of the F distribution. In the left plot in Figure 1 cv = 0.5, while in the right plot both $X$ and $Y$ follow Normal(1,1) distributions so cv = 1.0. The results are much more severe in the right plot. For larger values of cv the Fieller interval performs badly even for larger sample sizes. We note that a potential problem with bigger cv is that the log ratio method can also become non-applicable for large cv's as $\bar{x}$ itself may become negative.

[Figure 1: two panels, each plotting "% of times a < 0" against "Sample size (n)".]
Figure 1: Results from simulation study. Displayed is the chance of a negative denominator in the Fieller interval when sampling from a bivariate normal population with coefficient of variation 0.5 (left) and 1.0 (right), respectively.

Table 2 summarizes our simulation results. In all cases when two distinct roots $d_1$ and $d_2$ were found in the construction of the Fieller interval, the interval was taken to be $(d_1, d_2)$. We see that the Fieller interval, often motivated for good small sample behavior, behaves very poorly for n = 5 and moderately badly for n = 10. The reason for the undercoverage of Fieller's interval under bivariate normality is as discussed above. Fieller intervals tend to be the widest but in many cases still do not have coverage closest to nominal of the three intervals. For the larger sample size of n = 30 all three intervals perform well across all situations considered. We see that under gamma distributions the comparisons of the three intervals are qualitatively similar.
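The single setting described in detail in this section (n = 5, $X$ and $\epsilon$ both Normal(2,1), $\beta = 0$, so $R = 1$) can be mimicked with the following Python sketch. It is an assumption-laden illustration rather than the authors' code: the function name `fieller_coverage`, the seed and the number of replications are hypothetical, and the quadratic is written in the unnormalized form $n(\bar{y} - R\bar{x})^2 - t^2(s_y^2 - 2Rs_{xy} + R^2 s_x^2) < 0$, which is (2) multiplied by the positive factor $n\bar{x}^2$ and so has the same roots and the same sign of the leading coefficient.

```python
import math
import random


def fieller_coverage(n=5, reps=20000, t_crit=2.7764, true_ratio=1.0, seed=1):
    """Return (naive, exact) coverage when X and Y are independent Normal(2, 1).

    naive: always report (d1, d2).
    exact: report (d1, d2) if a > 0 and its complement if a < 0.
    Samples without real roots count as misses for both, following the
    convention described in the text.
    """
    rng = random.Random(seed)
    t2 = t_crit ** 2
    naive = exact = 0
    for _ in range(reps):
        x = [rng.gauss(2.0, 1.0) for _ in range(n)]
        y = [rng.gauss(2.0, 1.0) for _ in range(n)]  # beta = 0, so Y = epsilon
        xbar, ybar = sum(x) / n, sum(y) / n
        sxx = sum((v - xbar) ** 2 for v in x) / (n - 1)
        syy = sum((v - ybar) ** 2 for v in y) / (n - 1)
        sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / (n - 1)
        a = n * xbar ** 2 - t2 * sxx
        b = -2.0 * (n * xbar * ybar - t2 * sxy)
        c = n * ybar ** 2 - t2 * syy
        disc = b * b - 4.0 * a * c
        if a == 0.0 or disc < 0.0:
            continue
        root = math.sqrt(disc)
        d1, d2 = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
        inside = d1 < true_ratio < d2
        if inside:
            naive += 1
        if (a > 0.0 and inside) or (a < 0.0 and not inside):
            exact += 1
    return naive / reps, exact / reps


naive_cov, exact_cov = fieller_coverage()
print(naive_cov, exact_cov)
```

The gap between the two returned values corresponds to the data sets in which the correct Fieller set is the disjoint union rather than $(d_1, d_2)$.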
5 Conclusion

We have studied the Fieller interval for a population ratio, pointing out that care must be used when using the Fieller interval. We can obtain nonsensical answers from the formula. We see that this is not a rare occurrence. For bivariate normal observations Fieller's formula gives nonsensical results in approximately 10 percent of data sets when n = 5 and cv = 0.5. This sample size is small, but such sample sizes are common in biological applications. The larger the coefficient of variation, the larger the proportion of nonsensical and uninformative intervals we obtain using Fieller's interval. The large sample approximations leading to the Cochran and log ratio intervals perform more stably in general for small samples. Although the log ratio method requires positive means to be usable, it appears to perform generally better when applicable than Cochran's large sample method, especially when the sample size is small.

The bootstrap is a common alternative to large sample methods for small to moderate sample sizes. The analysis of a ratio, however, is a particularly difficult problem for the bootstrap and great care is necessary to draw reasonable inferences. See, e.g., Chapter 25 of Efron and Tibshirani (1993), where bias correction and calibration are necessary to make the intervals perform adequately.

References

Beyene, J. and Moineddin, R. (2005), Methods for Confidence Interval Estimation of a Ratio Parameter with Application to Location Quotients, BMC Medical Research Methodology, 5, 32.

Cochran, W. G. (1977), Sampling Techniques, New York: Wiley.

Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman & Hall.

Fieller, E.C. (1932), The Distribution of the Index in a Bivariate Normal Distribution, Biometrika, 24.

Heitjan, D.F. (2000), Fieller's Method and Net Health Benefits, Health Economics, 9.

Hirschberg, J.G. and Lye, J.N. (2007), Providing Intuition to the Fieller Method with Two Geometric Representations Using STATA and EVIEWS, preprint.

Jiang, G., Wu, J., and Williams, G.R. (2000), Fieller's Interval and the Bootstrap-Fieller Interval for the Incremental Cost-Effectiveness Ratio, Health Services and Outcomes Research Methodology, 1.

Lehtonen, R. and Pahkinen, E. (2004), Practical Methods for Design and Analysis of Complex Surveys, 2nd Edition, New York: Wiley.

Lohr, S.L. (2009), Sampling: Design and Analysis, 2nd Edition, Pacific Grove: Brooks/Cole.
Table 2: Results from the simulation study. For each setting (β = 0, 0.5 and 1) the coverage probability (first column) and interval length (second column) is reported. The nominal coverage is 95%.

[Table 2: the numeric entries are not preserved in this transcription. Columns: n = 5, n = 10 and n = 30, each split into β = 0, β = 0.5 and β = 1. Rows: Cochran, Fieller and log ratio within each of the following cases.]

case: X ~ Normal(2, 1), Y = Xβ + Normal(2, 1)
case: X ~ Normal(2, 1), Y = X²β + Normal(2, 1)
case: X ~ Normal(2, 1), Y = Xβ + Gamma(shape = 3)
case: X ~ Normal(2, 1), Y = X²β + Gamma(shape = 3)
case: X ~ Normal(2, 1), Y = Xβ + Gamma(shape = 1)
case: X ~ Normal(2, 1), Y = X²β + Gamma(shape = 1)
case: X ~ Gamma(shape = 3), Y = Xβ + Normal(2, 1)
case: X ~ Gamma(shape = 3), Y = X²β + Normal(2, 1)
case: X ~ Gamma(shape = 3), Y = Xβ + Gamma(shape = 3)
case: X ~ Gamma(shape = 3), Y = X²β + Gamma(shape = 3)
case: X ~ Gamma(shape = 3), Y = Xβ + Gamma(shape = 1)
case: X ~ Gamma(shape = 3), Y = X²β + Gamma(shape = 1)
case: X ~ Gamma(shape = 1), Y = Xβ + Normal(2, 1)
case: X ~ Gamma(shape = 1), Y = X²β + Normal(2, 1)
case: X ~ Gamma(shape = 1), Y = Xβ + Gamma(shape = 3)
case: X ~ Gamma(shape = 1), Y = X²β + Gamma(shape = 3)
case: X ~ Gamma(shape = 1), Y = Xβ + Gamma(shape = 1)
case: X ~ Gamma(shape = 1), Y = X²β + Gamma(shape = 1)
More informationBootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205
Bootstrap (Part 3) Christof Seiler Stanford University, Spring 2016, Stats 205 Overview So far we used three different bootstraps: Nonparametric bootstrap on the rows (e.g. regression, PCA with random
More informationA GEOMETRIC APPROACH TO CONFIDENCE SETS FOR RATIOS: FIELLER S THEOREM, GENERALIZATIONS AND BOOTSTRAP
Statistica Sinica 19 (2009), 1095-1117 A GEOMETRIC APPROACH TO CONFIDENCE SETS FOR RATIOS: FIELLER S THEOREM, GENERALIZATIONS AND BOOTSTRAP Ulrike von Luxburg and Volker H. Franz Max Planck Institute for
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationAsymptotic distribution of the sample average value-at-risk
Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample
More information1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments
/4/008 Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University C. A Sample of Data C. An Econometric Model C.3 Estimating the Mean of a Population C.4 Estimating the Population
More informationA Significance Test for the Lasso
A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More information2008 Winton. Statistical Testing of RNGs
1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed
More informationWeek 2: Review of probability and statistics
Week 2: Review of probability and statistics Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED
More informationSimple Linear Regression (Part 3)
Chapter 1 Simple Linear Regression (Part 3) 1 Write an Estimated model Statisticians/Econometricians usually write an estimated model together with some inference statistics, the following are some formats
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationConfidence Intervals for the Ratio of Two Exponential Means with Applications to Quality Control
Western Kentucky University TopSCHOLAR Student Research Conference Select Presentations Student Research Conference 6-009 Confidence Intervals for the Ratio of Two Exponential Means with Applications to
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationA Bootstrap Test for Conditional Symmetry
ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School
More informationn =10,220 observations. Smaller samples analyzed here to illustrate sample size effect.
Chapter 7 Parametric Likelihood Fitting Concepts: Chapter 7 Parametric Likelihood Fitting Concepts: Objectives Show how to compute a likelihood for a parametric model using discrete data. Show how to compute
More informationLecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf
Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26. Estimation: Regression and Least Squares
CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26 Estimation: Regression and Least Squares This note explains how to use observations to estimate unobserved random variables.
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationA noninformative Bayesian approach to domain estimation
A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationApplied Microeconometrics (L5): Panel Data-Basics
Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics
More informationSimulating Uniform- and Triangular- Based Double Power Method Distributions
Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions
More informationCourse information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.
Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),
More informationMonte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics
Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,
More informationHypothesis Testing hypothesis testing approach
Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we
More informationLecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2
Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationAn Introduction to Parameter Estimation
Introduction Introduction to Econometrics An Introduction to Parameter Estimation This document combines several important econometric foundations and corresponds to other documents such as the Introduction
More informationOn robust and efficient estimation of the center of. Symmetry.
On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationBootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.
Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling
More informationUNIVERSITÄT POTSDAM Institut für Mathematik
UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam
More informationLearning Objectives for Stat 225
Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:
More informationChapter 3. Comparing two populations
Chapter 3. Comparing two populations Contents Hypothesis for the difference between two population means: matched pairs Hypothesis for the difference between two population means: independent samples Two
More informationThe Distribution of F
The Distribution of F It can be shown that F = SS Treat/(t 1) SS E /(N t) F t 1,N t,λ a noncentral F-distribution with t 1 and N t degrees of freedom and noncentrality parameter λ = t i=1 n i(µ i µ) 2
More informationThe Delta Method and Applications
Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order
More informationMeasuring the fit of the model - SSR
Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do
More informationSimple linear regression
Simple linear regression Prof. Giuseppe Verlato Unit of Epidemiology & Medical Statistics, Dept. of Diagnostics & Public Health, University of Verona Statistics with two variables two nominal variables:
More informationSummary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)
Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ
More informationThe Prediction of Monthly Inflation Rate in Romania 1
Economic Insights Trends and Challenges Vol.III (LXVI) No. 2/2014 75-84 The Prediction of Monthly Inflation Rate in Romania 1 Mihaela Simionescu Institute for Economic Forecasting of the Romanian Academy,
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationMathematical statistics
November 15 th, 2018 Lecture 21: The two-sample t-test Overview Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation
More informationInstitute of Actuaries of India
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationOverall Plan of Simulation and Modeling I. Chapters
Overall Plan of Simulation and Modeling I Chapters Introduction to Simulation Discrete Simulation Analytical Modeling Modeling Paradigms Input Modeling Random Number Generation Output Analysis Continuous
More informationSTT 843 Key to Homework 1 Spring 2018
STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ
More informationON THE NUMBER OF BOOTSTRAP REPETITIONS FOR BC a CONFIDENCE INTERVALS. DONALD W. K. ANDREWS and MOSHE BUCHINSKY COWLES FOUNDATION PAPER NO.
ON THE NUMBER OF BOOTSTRAP REPETITIONS FOR BC a CONFIDENCE INTERVALS BY DONALD W. K. ANDREWS and MOSHE BUCHINSKY COWLES FOUNDATION PAPER NO. 1069 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS YALE UNIVERSITY
More informationExercises and Answers to Chapter 1
Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean
More informationConfidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean
Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard
More informationDouble Bootstrap Confidence Interval Estimates with Censored and Truncated Data
Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 22 11-2014 Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data Jayanthi Arasan University Putra Malaysia,
More informationLecture 1: August 28
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More information