A Semi-Parametric Approach to Account for Complex. Designs in Multiple Imputation

Size: px
Start display at page:

Download "A Semi-Parametric Approach to Account for Complex. Designs in Multiple Imputation"

Transcription

1 A Sei-Paraetric Approach to Account for Coplex Designs in ultiple Iputation Hanzhi Zhou, Trivellore E. Raghunathan and ichael R. Elliott Progra in Survey ethodology, University of ichigan Departent of iostatistics, University of ichigan ultiple iputation (I) has becoe one of leading approaches in dealing with issing data in survey research. However, existing software packages and procedures typically do not incorporate coplex saple design features in the iputation process. Researcher has deonstrated that ipleentation of I based on siple rando sapling (SRS) assuption can cause severe bias in estiation and hence invalid inferences, especially when the design features are highly related to survey variables of interest (Reiter et al. 2006). Recent work to accoodate coplex saple designs in iputation has focused on odel-based ethods which directly odel the coplex design features in the forulation of the iputation odel. In this paper, we propose a sei-paraetric procedure as an alternative approach to incorporate coplex saple designs in I. Specifically, we divide the iputation process into two stages: the coplex feature of the survey design (including weights and clusters) is fully accounted for at the first stage, which is accoplished by applying a nonparaetric ethod to generate a series of synthetic datasets; we then perfor conventional paraetric I for issing data at the second stage using readily available iputation software designed for an SRS saple. A new cobining rule for the point and variance estiates is derived to ake valid inferences based on the two-stage procedure. Using health survey data fro the ehavior Risk Factor Surveillance Syste, we evaluate the proposed ethod with a siulation study and copare it with the odel-based ethod with respect to coplete data analysis. Results show that the proposed ethod yields saller bias and is ore efficient than the odel-based ethod. Keywords: data issing data, coplex saple design, ultiple iputation, ayesian ootstrap, synthetic 1. Introduction and Research Question ultiple iputation (I) is a principled ethod in dealing with issing data in survey research and has been adopted by federal statistical agencies in recent years. A very iportant point underlying the I theory is that the ethod was designed for coplex saple surveys hence requires the iputation to be ade conditional on saple designs. The purpose is to ake the issing data echanis ignorable: since design features are usually related to survey variables of interest in real survey data, severe bias on the estiates can be avoided when they are properly accounted for in the process (Reiter et al. 2006). However, survey practitioners usually assue siple rando sapling (SRS) when they re perforing I, largely due to the inadequacy of standard software packages in handling coplex saple designs. A typical exaple is the Sequential Regression ultivariate 1 / 18

2 Iputation procedure (SRI) using IVEware (Raghunathan et al.2001), which has been gaining increasing popularity in handling ultivariate issing data in large scale surveys. Applications such as I for issing incoe data in NHIS (Schenker et al. 2006) focused on the strategies of odeling coplicated data structure and failed to recognize the iportance of incorporating coplex saple design features, reflecting an inconsistency between theory and practice. The question is then: how do we fully incorporate the saple designs in I to achieve valid statistical inferences? Reiter et al. (2006) proposed a fixed effect odeling ethod in addressing the proble where they included design variables as predictors in the iputation odel and it outperfors SRS scenario in ters of correcting the bias. Their conclusion thus supports the general advice of including coplex saple designs in I procedure. However, they did not look at survey weight as another iportant design variable. Treating weight as scalar suary of the design inforation in the iputation odel ay not work well for inference beyond eans or totals, since interactions between the probabilities of selection and the population paraeters of interest will not be accounted for (Elliott 2007). esides, their results for real data application did not show significant gains of incorporating designs over ignoring designs as in their siulation study. Little following work has been done to replicate their results with real survey datasets or to investigate other potential ethods in this regard, say, is there a way to incorporate design inforation within the I fraework other than including the as covariates in the iputation odel? The goal of this paper is to propose a sei-paraetric two-step ethod as an alternative to the existing fully odel-based ethods. Specifically, we divide the iputation process into two steps: the coplex feature of the survey design (including weights and clusters) is fully accounted for at the first step, which is accoplished by applying a nonparaetric ethod to generate a series of synthetic datasets; we then perfor conventional paraetric I for issing data at the second step using readily available iputation software designed for an SRS saple. A new cobining rule for the point and variance estiates is derived to ake valid inferences based on the two-step procedure. The rest of this paper is structured as follows: Section 2 describes the proposed ethod in detail. We first lay out the conceptual idea and then deonstrate the theoretical results. Section 3 provides the results fro a siulation study in a PPS sapling design setting to epirically evaluate the new ethod. Section 4 concludes with discussion and directions for future research. 2. Two-step I Procedure In this section, we propose a two-step procedure to perfor ultiple iputation with which accounting for coplex designs and ultiply iputing issing data are divided into two separate steps. The basic idea is to uncoplex the coplex designs before ipleenting standard I procedure. y uncoplex, we ean a statistical procedure which akes the design features irrelevant at the analysis stage, i.e. turning a dataset into one that is self-weighting, so that we can treat the uncoplexed populations as if they were siple rando saples fro a superpopulation or the true population. Thus, siple estiation forulae for self-weighting saples directly apply. Specifically in our case, this is achieved by generating synthetic populations through a nonparaetric resapling procedure adapted fro ayesian ootstrap (Rubin 1981) i.e. the Finite Population ayesian ootstrap (FP) which will be introduced in section 2.2, such that coplication of odeling those designs in the iputation can be avoided. The standard practice of I assuing SRS can then apply directly to 2 / 18

3 the generated populations which are free of sapling designs. Two ajor assuptions are ade for ipleenting the proposed ethod: 1) The ethod is proposed to treat ite nonresponse proble under I fraework, rather than unit nonresponse. Therefore all input weights are assued to be final weights after unit nonresponse adjustent and calibration. 2) issing at rando (AR) as a ore practical issing data echanis is assued for all types of analysis under study. We do not consider CAR (issing Copletely At Rando) which is too ideal for real world surveys, neither do we consider NAR (Not issing At Rando) which is another scope of research topic Conceptual Idea Figure 1 and Figure 2 together illustrate the conceptual idea of the procedure. Figure 1 shows the proposed procedure to account for coplex designs in I. Denote Q as the population paraeter we are interested in. Denote the actual saple survey data as D ( Y, Y ), is obs where Y is represents the portion of ite issing data and Y obs represents the portion of observed data. The first step of the synthetic data generation approach creates FP synthetic populations D D D D (1) (2) ( ) {,,..., }, where D ( Y, Y ), b 1,2,...,. is obs The second step of ultiple iputation creates iputed datasets for each of the FP synthetic population generated fro the first step, D { D, D,..., D }, for b 1,2,...,. 1 2 Thus we end up with D { D, D,..., D, D, D,..., D,..., D, D,..., D } which contains all the (1) (1) (1) (2) (2) (2) ( ) ( ) ( ) iputed synthetic population datasets generated by the two-step procedure. Figure 2 (a)-(c) shows the evolution of data structure in the proposed procedure. Suppose we have three survey variables of interest, Y1, Y2 and Y3, and a survey saple of size n was drawn fro of a target population of size N through soe type of coplex sapling design. The character in pink square denotes the issing part for each survey variable in both saple data and synthetic population data. At the first step, we can think of the unobserved eleents of the population as issing by design and we treat issing values as a separate category for each variable. y applying the adapted FP ethod, a synthetic population is created, ideally a plausible reflection of the target population. Note that in this process, the issing data are also brought up to the population level. At the second step, conventional SRI assuing SRS is applied to fill in those ite issing data and we end up with a coplete dataset which we call iputed synthetic population. The whole process will be replicated for by ties. 3 / 18

4 Figure1. Proposed Procedure to Account for Coplex Designs in I Original Saple with issing Data Nonparaetric Approach for Uncoplexing the Design: Use FP to Generate Synthetic Populations D (1) D (2) D () Paraetric Approach for Iputing the issing Data: Standard ultiple Iputation by SRI D (1) 1,.. D (2) 1,.. D () 1,.. D (1) D (2) D () Cobining Rule for Valid Inference: f(q ) Figure2. Data Structure Evolution (a) (b) (c) Y1 Y2 Y3 Y1 Y2 Y3 Y1 Y2 Y n N N Actual Saple Dataset Synthetic Dataset Iputed Synthetic Dataset 4 / 18

5 2.2. ethods Now we deonstrate how the adapted-fp ethod for unequal probability sapling (Cohen 1997) can be applied to uncoplex weight as one design feature and how this ethod can be adapted to uncoplex cluster as another, hence ore coplicated designs such as stratified ultistage cluster sapling can be handled. SRI is illustrated as one option to perfor the conventional I once the coplex designs have been appropriately accounted for in the first place First step---nonparaetric approach to Uncoplex Designs Per the liitations of fully odel-based ethods stated in section1, we propose using a nonparaetric ethod, i.e. the adapted-fp by Cohen (1997) to account for coplex saple designs in I. Specifically, the adapted-fp serves as a procedure to restore the existing coplex survey saple back to soe SRS-type/self-weighting data structure. This will be realized by generating populations fro the coplex saple repeatedly in a spirit siilar to synthetic population generation ethod in the context of I for disclosure risk liitation (Raghunathan et al. 2003). Resapling technique is used in order to fully capture the uncertainty in the original saple. The nonparaetric approach has iniu assuption of the distributional for of rando effects and is robust to odel isspecification that usually poses probles to the odel-based ethods. Additional to that, FP is a ethod developed fro the conventional bootstrap whose ayesian nature akes it fit to the I fraework well Finite population ayesian ootstrap: Pólya s Urn Schee: Denote an urn containing finite nuber of balls as { }. A ball is randoly drawn fro the urn and a sae ball fro outside of the urn is added back to the urn along with the originally picked one. Repeat such selection process until balls have been selected as a saple, call this saple Pólya saple of size. Figure 3 is a flow chart of how a Pólya Saple of size =N-n is drawn. Figure3. A Flow Chart of Drawing a Pólya Saple of size =N-n. n 1 st draw n+1 2 nd draw n+2 (N-n)th draw n+(n-n) Original Urn Final Urn Adapted-FP ethod: ased on the Pólya s urn schee described above, the adapted-fp populations can be generated following the three-step procedure: Step1: Take a Pólya saple of size N-n, denoted by * * * y1, y2,..., y( N n) fro the urn { y1, y2,..., y n }. In this process, each y i in the urn is selected with probability: N n wi 1 li,k 1 n, (1) N n N n k 1 n 5 / 18

6 where w is the case weight for the ith unit and lik, 1 i is the nuber of bootstrap selections of unit i up to (k-1) th selection, setting li,0 0. Step2: For the FP population * * * y1 y2 yn y1 y2 y,,,,,,, N n so that the FP population has exact size N. Step3: Repeat the previous steps a large nuber of ties, say ties, to obtain FP populations Relating FP to ultiple Iputation for Ite issing Data: If we think of the sapling process and responding process as one cobined process, in other words, treating the responding process as another level of sapling of responding units given the original sapled units, then it s easy to see the connection between FP and standard I. FP tries to bootstrap the saple to the entire population while I tries to bootstrap the responding pool to the coplete saple. A typical exaple is the approxiate ayesian bootstrap (A) suggested by Rubin & Schenker (1986) as a way of generating I when the original saple can be regarded as IID and the response echanis is ignorable. Now that Cohen s FP extends to unequal probability selection, we ay well think of the unsapled part of population as issing and ultiply ipute this part using the adapted-fp procedure described above. Hence the essential of our approach---to carry out two levels of ultiple iputation: 1) I for unit nonresponse by Pólyaing up a coplex saple onto a population where issing values are treated as a separate category for each variable with issing data, and 2) I for ite nonresponse by SRI for the entire population Uncoplex Weights: The adapted-fp can be applied directly to weights in a sapling design such as probability proportional to size (PPS) sapling. Note that in practice, the input weight in Forula (1) should be the final weight or poststratified weight after all types of adjustent including unit nonresponse adjustent and calibration for undercoverage, etc. The reason is obvious: if we use the base/design weight instead in applying the adapted FP for generating synthetic populations, the potential probles of undercoverage or unit nonresponse existing in the original coplex saple would be brought up to the population level without being adjusted at all. Since ultiate analyses are based on such iputed synthetic populations, a direct consequence would be biased inference even if the procedure by itself is efficient. Intuitive interpretation by assuing an SRS saple design: Forula (1) for adjusting bootstrap selection probability based on case weight is interpreted as follows: Let k 1,2,..., N n 1, i 1,2,..., n, before aking any bootstrap selection of y, y,, y unobserved units in the population fro the observed original coplex saple 1 2 n, i.e. when k 1 and l, 1 l,0 0 i k i, the probability of selecting unit i with sapling weight i w is ( w 1) / ( N n). To ake it sipler to understand, suppose we have a siple rando sapling of n i 6 / 18

7 units in the first place, then w / i N n for all sapled units, each representing N / n units in the population, then the probability of that any one unit fro the SRS saple is selected before any N bootstrap selection is ( 1) / ( N n), which is the selection probability of any units aong all the n rest N n units in the population, and this exactly equals 1/n. As we proceed with the bootstrap selection, we adjust this selection probability according to the nuber of ties each unit aong y, y,, y was selected during the FP procedure, each unit now represents ( N n) / n 1 2 n aong the N n units to be selected during one bootstrap whenever it is selected once. After each selection, the denoinator of the prior probability function needs to be inflated to reflect the total units being represented during all the bootstrap selections so far, while the nuerator also needs to be inflated to reflect the total units represented by unit i in the process. Therefore we obtain the probability as in forula (1) Uncoplex Sapling Error Codes (i.e. stratu and clusters): Now we will show how the adapted FP also works when clusters are involved in the saple design. Suppose we have a stratified two-stage clustering design, with the probability of selection for each priary sapling unit (PSU)/cluster being proportional to its population size (PPS) within strata. As usual we treat each stratu independently and apply the ethod separately within each. Now we have two layers of bootstrap selection---one at the cluster level and the other at the eleent level. Suppose there are C h clusters in th h stratu in the population, aong which c h were sapled, denote the as z hj where j 1,2,..., ch. Treating each cluster as the sapling unit, we can apply the sae procedure as in previous section where only eleents are involved. That is, we want to bootstrap selecting Ch ch clusters out of 1 2 z, z,..., z to for a population of clusters: h h hc h z, z,..., z, z, z,..., z. Accordingly, we need to change the corresponding ters in * * * h1 h2 hch h1 h2 h( Ch ch ) forula (1) to ake it a cluster-level selection probability. Let w, j 1,2,..., c hj be the sapling h weight for th j cluster, forula (1) thus can be adapted as forulae (2) and (2)', corresponding to the cluster-level selection and eleent-level selection, respectively. Ch c h Nch n ch whj 1 li,k 1 wci 1 li,k 1 ch nch, (2) and, (2)' Ch c h Nch n ch Ch ch k 1 Nh c nch k 1 ch nch Once the clusters have been selected by adapted FP procedure using forula (2), we can further 7 / 18

8 select eleents using the sae ethod within each selected cluster using forula (2)'. Notice that at the eleent level of selection, each selected cluster should now be treated as a population therefore the sapling weight for eleents in the original saple cannot be used anyore, instead we need to derive a new set of weights for eleents conditional on cluster being selected. First, we need to obtain the conditional probability of selection for eleent given cluster (forula (3) ), then inverse it to get the corresponding weight w ci in forula (2)'. Pr( ith eleent selected jth cluster selected) Pr( ith eleent selected & jth cluster selected) = Pr( jth cluster selected) Pr( ith eleent selected in the original saple) phji Pr( jth cluster selected) p, (3) hj Where is the selection probability of cluster j aong all clusters in stratu h, is the selection probability of unit i aong all units in stratu h Second Step---I for issing Data Using SRI: Now that we have uncoplexed the sapling designs, we are in a good position to proceed with perforing conventional ultiple iputation. SRI as a popular technique for coplex survey data structure is one option. Without the need to include design variables in the iputation odel due to a self-weighting FP population generated fro previous step, our task should now be concentrated on correctly odeling the covariate variables as well as interactions aong the whenever necessary Theoretical Results Rubin (1987) s standard I rule for cobining point and variance estiation does not fit the two-step I procedure. A new cobining rule is developed accordingly, which accoodates two sources of variability due to synthesizing populations by a nonparaetric ethod at the first step and ultiply iputing issing data by SRI at the second step. The validity of the new cobining rule is to be justified both fro a ayesian perspective and fro a repeated sapling perspective. Cobining rule for the point estiate: q q q b1 1 b1 1, (4) 8 / 18

9 Cobining rule for the variance: T (1 ) ( q q ) U 1 2 b1 b (1 ) ( ) (1 ) ( ) q q q q b1 1 b1 1 b (1 ) ( ) (1 ) ( q q q b1 1 b1 1 b (1 ) V (1 ) V b1 1 T T, (5) q ) 2 When n, and are large, the inference can be approxiated by noral distribution, thus the 95% confidence interval can be coputed as [ q z0.975 T, q z0.975 T ]. (See Appendix A for detailed derivation and notation) ayesian Proof of the New Cobining Rules for Inference: With iputed FP synthetic datasets generated fro the proposed two-step procedure, we need to find a way to cobine inferences fro both steps. Now we show how the cobining rules for noral approxiation is derived, assuing large saples. Let the coplete data be D ( D, D ), where D ( X, Y, R, I), X is covariate obs is obs obs inc atrix, Y obs is the observed part of survey variable with issing data, R inc is the response indicator for all sapled units, and I is the sapling indicator. Let the FP synthetic population be D ( D, D ), b=1,2,...,. obs is Let the iputed synthetic population be D ( D, D ), =1,2,..., and b=1,2,...,. ( ) obs is( ) The posterior ean and variance of Q are iediate using the rules for finding unconditional oents fro conditional oents (according to Result 3.2 of Rubin 1987). In our case, we shall condition on two layers of observed data due to the two-step I procedure: Posterior ean: E( Q D ) E{ E[ E( Q D ) D ] D } E{ Q D } Q b obs obs ( ) obs obs, (10) Where Qˆ Qb E Q D note that ˆ ˆ li ( obs ), 1 Q is the coplete data statistic for the 9 / 18

10 synthetic population and Qˆ { Qˆ,..., Qˆ } is repeated values of the posterior distribution 1 of Q ˆ. Q ˆ b Q li li ( ˆ E Q Dobs), note that ˆQ is the coplete data statistic b 1 1 for the original population and Qˆ { Qˆ,..., Qˆ,..., Qˆ,..., Qˆ }. b Where Posterior Variance: V ( Q D ) obs V{ E[ E( Q D ) D ] D } E{ V[ E( Q D ) D ] D } ( ) obs obs ( ) obs obs V{ Q D } E{ T D } b obs obs 1 T T, (11) T Q Q T Qˆ Q 2 2 li(1 ) ( b ), li (1 ) ( b ). 1 b Siulation Study A siulation study was designed to investigate the properties of inference based on the proposed ethod. In particular, we are interested to see how the two-step I procedure perfors in coparison with the existing alternative ethods including: (1) coplete case analysis, (2) ignore designs in the iputation odel, and (3) include designs as fixed effect in the iputation odel Data: Data fro real surveys were anipulated to serve as our population. Specifically, RFSS 2009 in the state of ichigan used a disproportionate stratified sapling design (DSS) with no PSU (cluster level) involved therefore is suitable for our purpose of looking at weight as a single design variable in ultiple iputation. There are four strata in ichigan and for siplicity we only chose one stratu as the basis of our siulation study. Eight categorical variables were selected which we thought would be potentially correlated with survey weight. Table 1 shows the recoded variables we ve chosen for analysis. After soe data cleaning, we ended up with a coplete dataset of N=1323 cases which will be serving as our population. 10 / 18

11 Table1. Survey Variable under Analysis Survey Variable Coding Race 1: Whites 2: Non-Whites Whether or Not Have Health Plan 1: Yes 2: No Incoe Level 1: Low 2: ediu 3: High Eployent Status 1: Eployed 2: Uneployed 3: Other arital Status 1: arried 2: Unarried Education 1: Lower Than High School 2: High School and Higher Whether or Not Have Diabete 1: Yes 2: No Whether or Not Have Astha 1: Yes 2: No 3.2. Design: Our strategies of evaluating the perforance of proposed ethod in coparison with alternative ethods consist of the following steps: Step1: ake design variable related to survey variables: Since we want to exaine the new ethod assuing the designs are relevant to issingness on survey variables, we achieved the assuption by regressing weights as the dependent variable on all other survey variables as predictors and obtained the predicted values of weight to be used for siulation. The purpose is to ake sure that weight as a design variable is at least oderately related to survey variables. Step2: Draw 100 saples/replicates with probability proportionate to the inverse of predicted weights, each of size 200 (we will call it before-deletion saples): In this way, we obtained the probability proportional to size (PPS) sapling weights ( ) which are directly related to the predicted weights ( ) fro the previous step, since we were iplicitly treating the predicted weights as kind of a easure of size. Now becoes our target design variable to be exained with the new ethod. Step3: Ipose issingness on the Coplete Data under AR echanis we used a deletion function taking the for,where is a binary indicator for issingness, represents Race, represents unit s values of other survey variables (except arital status and education ) of which we purposefully delete values. Thus we obtained 100 after-deletion saples. Table 2 shows the fractions of issing by race for each survey variable of interest. The fractions of issingness by race are in a range of 15%~40%, where the issingness for incoe was ade to be generally higher than all other variables. Table2. Fractions of issingness on Survey Variables of Interest by Race Race Eployent Health Status Plan Diabete Astha Incoe Whites 15% 15% 15% 15% 20% Non-Whites 25% 25% 25% 25% 40% 11 / 18

12 Step4: Generate =100 FP synthetic populations for each replicate saple, with the PPS sapling weights as input weights, using Cohen (1997) s adapted FP ethod. In the process, we ade the population size five ultiples of the original PPS saple size thus each FP population is of size Step5: Create =5 ultiply iputed datasets by SRI procedure for each FP population with each replicate saple, thus I obtained 5*100*100=50000 iputed datasets each of size Step6: Obtain ultiply iputed synthetic population estiates for the ean of each survey variable of interest. For each replicate saple, we use forulae (4) and (5) as our new cobining rules to cobine the estiated eans and variances fro 500 iputed synthetic datasets Results: Three critical statistics were exained for coparison across the four ethods. They are absolute relative bias, root ean square error, and epirical noinal 95% confidence interval coverage rate. All are calculated based on 100 replicate saples. Also note that except for the new ethod all estiates under the other three ethods as well as the actual saples before deletion are design-based. Figure4 displays the Q-Q plot atrix of 100 pairs of estiated proportions fro the actual saples before deletion versus that fro the corresponding iputed synthetic populations under proposed ethod, for each survey variable by level. The plots deonstrate a nearly perfect 45-degree straight line for all the variable levels. This indicates that the distributions of the iputed synthetic populations practically atch the actual saples before deletion. Table3 gives the detailed results fro the siulation study. For siplicity, we only display the results for three variables, eployent status and health plan, which has higher correlations with the design variable weight, and incoe, whose fraction of issingness is highest aong all. According to table3, the new ethod has uch saller bias than all its copetitors. In ters of RSE, although the gain is not as substantial as that in the case of absolute relative bias, the new ethod perfors generally the best. Noinal 95% CI coverage of the point estiate under the proposed ethod is in a range of 86%-96%, considering categorical nature of all the survey variables, this is a reasonable result. We can see that except for coplete case analysis which has lowest CI coverage in general, there sees no uch difference between the new ethod and the other two odel-based I ethods. 12 / 18

13 Figure4. Q-Q Plot atrix for Estiated Proportions: Actual Saple efore Deletion versus FP+SRI 13 / 18

14 Table3. Absolute Relative ias, RSE and 95% CI Coverage Rates Copared across Four ethods Variables FP+SRI Include Weights in the Iputation odel Do Not Include Weights in the Iputation odel Coplete Case Analysis EPLOY_ Relbias RSE 95% CI cov. Relbias RSE 95% CI cov % 3.99E-02 96% 1.03% 3.59E-02 95% 2.10% 3.93E-02 96% 3.23% 7.34E-02 95% % 5.41E-02 87% 19.15% 5.76E-02 86% 8.73% 5.67E-02 86% 5.98% 5.97E-02 79% % 7.69E-02 90% 3.66% 6.80E-02 90% 3.58% 7.69E-02 90% 1.02% 8.43E-02 92% INCOE_ % 5.54E-02 88% 1.38% 5.27E-02 91% 1.22% 5.63E-02 89% 6.77% 8.94E-02 88% % 6.16E-02 88% 0.25% 5.99E-02 85% 0.63% 6.20E-02 87% 3.17% 8.39E-02 93% % 3.02E-02 96% 5.93% 3.33E-02 95% 6.68% 3.35E-02 94% 13.53% 5.48E-02 90% HLTHPLAN_ % 6.68E-02 86% 4.59% 7.88E-02 87% 1.51% 6.96E-02 85% 0.80% 7.11E-02 80% % 6.68E-02 86% 32.70% 7.88E-02 87% 10.80% 6.96E-02 85% 2.40% 1.12E-01 80% Relbias RSE 95% CI cov. Relbias RSE 95% CI cov. 14 / 18

15 4. Discussions Our priary goal was to propose a new ethod to account for coplex saple design features in ultiple iputation for ite issing data and to evaluate the perforance of the new ethod in a PPS saple setting. There are several advantages of the proposed two-step I procedure: first, it relaxes the usually strong distributional assuptions of rando effects in paraetric odels; second, it potentially protects against odel isspecification, for exaple, wrongly inclusion or exclusion of interactions between design variables and other covariates in the iputation odel. eanwhile, it retains the nice features of SRI in handling coplex data structure and various types of issing variable. Another advantage is that both steps of the procedure are of a ayesian flavor and iplies a proper iputation ethod. This fits into the standard I paradig which requires ayesian derived iputation ethod to attain randoization validity. A further advantage lies in that unlike the fully odel-based ethods which include designs in the iputation odel and still require coplex survey packages to analyze the iputed datasets, the new ethod fully accounts for the designs by uncoplexing the and restoring a population in a separate step, therefore only siple, unweighted coplete-data analysis techniques are needed for inferences with the newly developed cobining rules. This reduces a lot burden on data users. Our findings in the siulation study suggest that the new ethod can bring about significant gains in bias relative to the existing odel-based ethods without losing any efficiency. Therefore, even for categorical variables, the noinal 95% confidence interval coverage rates under the new ethod are quite reasonable. A direct application of the proposed approach is in the context of I for disclosure risk liitation context. The approach of using I fraework to generate synthetic populations has becoe popular in protecting the confidentiality of respondents in the survey world, yet ost are odel-based or based on nonparaetric ethod alone (i.e. ayesian ootstrap not accounting for designs), the sei-paraetric approach as a cobining use of the two serves as a good alternative, especially for coplex survey designs. Our current work focuses on ore coprehensive siulation studies to assess the general perforance of the proposed ethod, including its robustness to different degrees of odel isspecification. We also ai to extend the application of our ethod in ore coplex saple design settings where both unequal probabilities of selection and clustering are involved. 15 / 18

16 Appendix A. Direct Derivation for the Two-step Cobining Rules This is achieved by constructing an approxiate posterior distribution of Q given D in analogy with the standard theory of ultiple iputation for issing data. The conceptual fraework of the proposed procedure suggests that we have the following decoposition: f ( Q D ) f ( Q D, D, V, V ) f ( D, V D, V ) f ( V D ) dd dv dv f ( Q D, V ) f ( D, V D, V ) f ( V D ) dd dv dv Where V is variance of the posterior ean q for each D obtained when, let V be the variance of the q obtained when. Then V is the average of the ( b ) V obtained when. Step1: Cobining rule fro Synthesizing data by adapted FP: f ( Q D, V ) Let Q f ( Y) be a scalar population quantity we are interested in. For exaple, it can be the population ean of survey variable Y, or it can also be a regression coefficient. Let q and u be the point and variance estiates for Q based on the actual saple data, (1) (2) ( ) q { q, q,..., q } denotes the point estiates fro all FP populations. Adapted fro the cobining rules developed by Raghunathan et al.(2003) for fully synthetic data, the posterior ean and variance can be estiated as: q 1 q b 1 (6) T etween synthetic variance + Within synthetic variance (1 ) V U (1 ) ( q q ) U (7) b1 b1 b1 Where U T which is the between iputation variance for FP population b, and will be defined later in step 2. Step2: Cobining rules fro ultiply iputing issing data: f ( D, V D, V) f ( V D ) or Let f Q D ( ) q { q, q,..., q } denote the point estiates fro all ultiple iputed datasets for 1 2 FP b. Adapted fro Rubin (1987) s conventional cobining rules for ultiple iputation, the posterior ean and variance estiates can be written as: 16 / 18

17 q 1 q 1 (8) T U etween iputation variance Within iputation variance (1 ) V U (1 ) V (1 ) ( q q ) (9) The within iputation variance disappears because at this step we are actually ultiply iputing issing data for a population (FP is treated as a population) therefore no sapling variance is involved here, i.e. U 0. Two-step cobining rules for inference: f ( Q D ) When we were deriving f ( Q D ) at the first step, q was viewed as the sufficient suaries of FP b, after constructing the unbiased estiator q for each ( b ) q fro generating ultiply iputed datasets nested within each FP population, we need to approxiate f ( Q D ) with f ( Q D ) by substituting q with their estiators q to obtain the ( b ) posterior ean and variance for the two-step procedure as in forulae (4) and (5). Where V and V represents the between synthetic population variance and between ultiple iputation variance, respectively. Notice that the within variance coponent at the first step is actually represented by the between variance fro the second step. References: 1. Cohen, ichael P. (1997). The ayesian bootstrap and ultiple iputation for unequal probability saple designs. ASA Proceedings of the Section on Survey Research ethods, Dong, Qi. (2011). Cobining Inforation fro ultiple Coplex Surveys. University of ichigan. Unpublished Dissertation. 3. Efron,. (1979). ootstrap ethods: another look at the jackknife. Annals of Statistics, 7, Elliott,.R. (2007). ayesian Weight Triing for Generalized Linear Regression odels. Survey ethodology 33(1): Gross, S. (1980). edian estiation in saple surveys, presented at the 1980 Joint Statistical eetings. 6. Ki, J.K. (2004). Finite Saple Properties of ultiple Iputation Estiator. Ann. Statist. 32(2): Ki, Jae Kwang, ichael rick, J., Fuller, Wayne A. and Kalton, Graha (2006). On the bias of the ultiple-iputation variance estiator in survey sapling. Journal of the Royal Statistical Society, Series : Statistical ethodology, 68, Little, R.A. and Rubin, D.. (2002). Statistical Analysis with issing Data (Second Edition), New 17 / 18

18 York: J Wiley & Sons, New York. 9. Lo, Albert Y. (1988). A ayesian ootstrap for a Finite Population. The Annals of Statistics 16(4): eng, Xiao-Li. (1994). ultipl-iputation Inferences with Uncongenial Sources of Iput. Statistical Science 9(4): Raghunathan, T.E. Lepkowski, J.. Van Hoewyk, J. Solenberger, P. (2001). A ultivariate technique for ultiply iputing issing values using a sequence of regression odels. Survey ethodology 27(1): Raghunathan, T.E. Reiter, J.P. and Rubin, D.. (2003). ultiple Iputation for Statistical Disclosure Liitation. Journal of Official Statistics 19(1): Rao, J.N.K. and Wu, C.F.J. (1988) Resapling Inference with Coplex Survey Data. Journal of the Aerican Statistical Association 83: Reiter, J.P. Raghunathan, T.E. and Kinney, Satkartar K. (2006). The Iportance of odeling the Sapling Design in ultiple Iputation for issing Data. Survey ethodology 32(2): Reiter, J.P. (2004). Siultaneous use of ultiple iputation for issing data and disclosure liitation. Survey ethodology 30(2): Rubin, D. (1981). The ayesian bootstrap. Annals of Statistics, 9, Rubin, D.. and Schenker, N. (1986). ultiple Iputation for Interval Estiation fro Siple Rando Saples with Ignorable Nonresponse. Journal of the Aerican Statistical Association, 81(394): Rubin, D. (1987). ultiple Iputation for Nonresponse in Surveys. New York: Wiley. 19. Rubin, D. (1996). ultiple Iputation After 18+ Years. Journal of the Aerican Statistical Association, 91(434): Schafer, J.L. (1999). ultiple Iputation: A Prier. Statistical ethods in edical Research 8: Schafer, J.L. Ezzati-Rice, T.. Johnson, W. Khare,. Little, R.J.A. and Rubin, D.. (1997). The Nhanes III ultiple Iputation Project. 22. Schenker, Nathaniel, Raghunathan, T.E. Chiu, Pei-Lu, akuc, D.. Zhang, Guangyu and Cohen, A.J. (2006). ultiple Iputation of issing Incoe Data in the National Health Interview Survey. Journal of the Aerican Statistical Association 101(475): Yu, andi (2008). Disclosure Risk Assessents and Control. Dissertation. The University of ichigan. 18 / 18

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Testing equality of variances for multiple univariate normal populations

Testing equality of variances for multiple univariate normal populations University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Inforation Sciences 0 esting equality of variances for ultiple univariate

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Meta-Analytic Interval Estimation for Bivariate Correlations

Meta-Analytic Interval Estimation for Bivariate Correlations Psychological Methods 2008, Vol. 13, No. 3, 173 181 Copyright 2008 by the Aerican Psychological Association 1082-989X/08/$12.00 DOI: 10.1037/a0012868 Meta-Analytic Interval Estiation for Bivariate Correlations

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are, Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction

C na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Bayesian Approach for Fatigue Life Prediction from Field Inspection

Bayesian Approach for Fatigue Life Prediction from Field Inspection Bayesian Approach for Fatigue Life Prediction fro Field Inspection Dawn An and Jooho Choi School of Aerospace & Mechanical Engineering, Korea Aerospace University, Goyang, Seoul, Korea Srira Pattabhiraan

More information

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL

More information

Correcting a Significance Test for Clustering in Designs With Two Levels of Nesting

Correcting a Significance Test for Clustering in Designs With Two Levels of Nesting Institute for Policy Research Northwestern University Working Paper Series WP-07-4 orrecting a Significance est for lustering in Designs With wo Levels of Nesting Larry V. Hedges Faculty Fellow, Institute

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

E. Alpaydın AERFAISS

E. Alpaydın AERFAISS E. Alpaydın AERFAISS 00 Introduction Questions: Is the error rate of y classifier less than %? Is k-nn ore accurate than MLP? Does having PCA before iprove accuracy? Which kernel leads to highest accuracy

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Estimation of the Mean of the Exponential Distribution Using Maximum Ranked Set Sampling with Unequal Samples

Estimation of the Mean of the Exponential Distribution Using Maximum Ranked Set Sampling with Unequal Samples Open Journal of Statistics, 4, 4, 64-649 Published Online Septeber 4 in SciRes http//wwwscirporg/ournal/os http//ddoiorg/436/os4486 Estiation of the Mean of the Eponential Distribution Using Maiu Ranked

More information

OBJECTIVES INTRODUCTION

OBJECTIVES INTRODUCTION M7 Chapter 3 Section 1 OBJECTIVES Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and

More information

Data-Driven Imaging in Anisotropic Media

Data-Driven Imaging in Anisotropic Media 18 th World Conference on Non destructive Testing, 16- April 1, Durban, South Africa Data-Driven Iaging in Anisotropic Media Arno VOLKER 1 and Alan HUNTER 1 TNO Stieltjesweg 1, 6 AD, Delft, The Netherlands

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS

AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS Statistica Sinica 6 016, 1709-178 doi:http://dx.doi.org/10.5705/ss.0014.0034 AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS Nilabja Guha 1, Anindya Roy, Yaakov Malinovsky and Gauri

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Estimation of the Population Mean Based on Extremes Ranked Set Sampling

Estimation of the Population Mean Based on Extremes Ranked Set Sampling Aerican Journal of Matheatics Statistics 05, 5(: 3-3 DOI: 0.593/j.ajs.05050.05 Estiation of the Population Mean Based on Extrees Ranked Set Sapling B. S. Biradar,*, Santosha C. D. Departent of Studies

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Examining the Effects of Site Selection Criteria for Evaluating the Effectiveness of Traffic Safety Countermeasures

Examining the Effects of Site Selection Criteria for Evaluating the Effectiveness of Traffic Safety Countermeasures Exaining the Effects of Site Selection Criteria for Evaluating the Effectiveness of Traffic Safety Countereasures Doinique Lord* Associate Professor and Zachry Developent Professor I Zachry Departent of

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

arxiv: v1 [stat.ot] 7 Jul 2010

arxiv: v1 [stat.ot] 7 Jul 2010 Hotelling s test for highly correlated data P. Bubeliny e-ail: bubeliny@karlin.ff.cuni.cz Charles University, Faculty of Matheatics and Physics, KPMS, Sokolovska 83, Prague, Czech Republic, 8675. arxiv:007.094v

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Comparing Probabilistic Forecasting Systems with the Brier Score

Comparing Probabilistic Forecasting Systems with the Brier Score 1076 W E A T H E R A N D F O R E C A S T I N G VOLUME 22 Coparing Probabilistic Forecasting Systes with the Brier Score CHRISTOPHER A. T. FERRO School of Engineering, Coputing and Matheatics, University

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters

The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters journal of ultivariate analysis 58, 96106 (1996) article no. 0041 The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Paraeters H. S. Steyn

More information

Example A1: Preparation of a Calibration Standard

Example A1: Preparation of a Calibration Standard Suary Goal A calibration standard is prepared fro a high purity etal (cadiu) with a concentration of ca.1000 g l -1. Measureent procedure The surface of the high purity etal is cleaned to reove any etal-oxide

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

A Comparative Study of Parametric and Nonparametric Regressions

A Comparative Study of Parametric and Nonparametric Regressions Iranian Econoic Review, Vol.16, No.30, Fall 011 A Coparative Study of Paraetric and Nonparaetric Regressions Shahra Fattahi Received: 010/08/8 Accepted: 011/04/4 Abstract his paper evaluates inflation

More information

An Introduction to Meta-Analysis

An Introduction to Meta-Analysis An Introduction to Meta-Analysis Douglas G. Bonett University of California, Santa Cruz How to cite this work: Bonett, D.G. (2016) An Introduction to Meta-analysis. Retrieved fro http://people.ucsc.edu/~dgbonett/eta.htl

More information

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe PROPERTIES OF MULTIVARIATE HOMOGENEOUS ORTHOGONAL POLYNOMIALS Brahi Benouahane y Annie Cuyt? Keywords Abstract It is well-known that the denoinators of Pade approxiants can be considered as orthogonal

More information

Nonlinear Log-Periodogram Regression for Perturbed Fractional Processes

Nonlinear Log-Periodogram Regression for Perturbed Fractional Processes Nonlinear Log-Periodogra Regression for Perturbed Fractional Processes Yixiao Sun Departent of Econoics Yale University Peter C. B. Phillips Cowles Foundation for Research in Econoics Yale University First

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Handwriting Detection Model Based on Four-Dimensional Vector Space Model

Handwriting Detection Model Based on Four-Dimensional Vector Space Model Journal of Matheatics Research; Vol. 10, No. 4; August 2018 ISSN 1916-9795 E-ISSN 1916-9809 Published by Canadian Center of Science and Education Handwriting Detection Model Based on Four-Diensional Vector

More information

Best Procedures For Sample-Free Item Analysis

Best Procedures For Sample-Free Item Analysis Best Procedures For Saple-Free Ite Analysis Benjain D. Wright University of Chicago Graha A. Douglas University of Western Australia Wright s (1969) widely used "unconditional" procedure for Rasch saple-free

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

AN EFFICIENT CLASS OF CHAIN ESTIMATORS OF POPULATION VARIANCE UNDER SUB-SAMPLING SCHEME

AN EFFICIENT CLASS OF CHAIN ESTIMATORS OF POPULATION VARIANCE UNDER SUB-SAMPLING SCHEME J. Japan Statist. Soc. Vol. 35 No. 005 73 86 AN EFFICIENT CLASS OF CHAIN ESTIMATORS OF POPULATION VARIANCE UNDER SUB-SAMPLING SCHEME H. S. Jhajj*, M. K. Shara* and Lovleen Kuar Grover** For estiating the

More information

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS. Introduction When it coes to applying econoetric odels to analyze georeferenced data, researchers are well

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Using a De-Convolution Window for Operating Modal Analysis

Using a De-Convolution Window for Operating Modal Analysis Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis

More information

Bootstrapping clustered data

Bootstrapping clustered data J. R. Statist. Soc. B (2007) 69, Part 3, pp. 369 390 Bootstrapping clustered data C. A. Field Dalhousie University, Halifax, Canada A. H. Welsh Australian National University, Canberra, Australia [Received

More information

Physics 139B Solutions to Homework Set 3 Fall 2009

Physics 139B Solutions to Homework Set 3 Fall 2009 Physics 139B Solutions to Hoework Set 3 Fall 009 1. Consider a particle of ass attached to a rigid assless rod of fixed length R whose other end is fixed at the origin. The rod is free to rotate about

More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

a a a a a a a m a b a b

a a a a a a a m a b a b Algebra / Trig Final Exa Study Guide (Fall Seester) Moncada/Dunphy Inforation About the Final Exa The final exa is cuulative, covering Appendix A (A.1-A.5) and Chapter 1. All probles will be ultiple choice

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Inference in the Presence of Likelihood Monotonicity for Polytomous and Logistic Regression

Inference in the Presence of Likelihood Monotonicity for Polytomous and Logistic Regression Advances in Pure Matheatics, 206, 6, 33-34 Published Online April 206 in SciRes. http://www.scirp.org/journal/ap http://dx.doi.org/0.4236/ap.206.65024 Inference in the Presence of Likelihood Monotonicity

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

A proposal for a First-Citation-Speed-Index Link Peer-reviewed author version

A proposal for a First-Citation-Speed-Index Link Peer-reviewed author version A proposal for a First-Citation-Speed-Index Link Peer-reviewed author version Made available by Hasselt University Library in Docuent Server@UHasselt Reference (Published version): EGGHE, Leo; Bornann,

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Modeling the Structural Shifts in Real Exchange Rate with Cubic Spline Regression (CSR). Turkey

Modeling the Structural Shifts in Real Exchange Rate with Cubic Spline Regression (CSR). Turkey International Journal of Business and Social Science Vol. 2 No. 17 www.ijbssnet.co Modeling the Structural Shifts in Real Exchange Rate with Cubic Spline Regression (CSR). Turkey 1987-2008 Dr. Bahar BERBEROĞLU

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS

DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS N. van Erp and P. van Gelder Structural Hydraulic and Probabilistic Design, TU Delft Delft, The Netherlands Abstract. In probles of odel coparison

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network 565 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 07 Guest Editors: Zhuo Yang, Junie Ba, Jing Pan Copyright 07, AIDIC Servizi S.r.l. ISBN 978-88-95608-49-5; ISSN 83-96 The Italian Association

More information

CHAPTER 19: Single-Loop IMC Control

CHAPTER 19: Single-Loop IMC Control When I coplete this chapter, I want to be able to do the following. Recognize that other feedback algoriths are possible Understand the IMC structure and how it provides the essential control features

More information

INTELLECTUAL DATA ANALYSIS IN AIRCRAFT DESIGN

INTELLECTUAL DATA ANALYSIS IN AIRCRAFT DESIGN INTELLECTUAL DATA ANALYSIS IN AIRCRAFT DESIGN V.A. Koarov 1, S.A. Piyavskiy 2 1 Saara National Research University, Saara, Russia 2 Saara State Architectural University, Saara, Russia Abstract. This article

More information

Generalized Queries on Probabilistic Context-Free Grammars

Generalized Queries on Probabilistic Context-Free Grammars IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 1, JANUARY 1998 1 Generalized Queries on Probabilistic Context-Free Graars David V. Pynadath and Michael P. Wellan Abstract

More information

Statistical Logic Cell Delay Analysis Using a Current-based Model

Statistical Logic Cell Delay Analysis Using a Current-based Model Statistical Logic Cell Delay Analysis Using a Current-based Model Hanif Fatei Shahin Nazarian Massoud Pedra Dept. of EE-Systes, University of Southern California, Los Angeles, CA 90089 {fatei, shahin,

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION A eshsize boosting algorith in kernel density estiation A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osewenkhae Departent of Matheatics, University

More information

RAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5

RAFIA(MBA) TUTOR S UPLOADED FILE Course STA301: Statistics and Probability Lecture No 1 to 5 Course STA0: Statistics and Probability Lecture No to 5 Multiple Choice Questions:. Statistics deals with: a) Observations b) Aggregates of facts*** c) Individuals d) Isolated ites. A nuber of students

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Modified Systematic Sampling in the Presence of Linear Trend

Modified Systematic Sampling in the Presence of Linear Trend Modified Systeatic Sapling in the Presence of Linear Trend Zaheen Khan, and Javid Shabbir Keywords: Abstract A new systeatic sapling design called Modified Systeatic Sapling (MSS, proposed by ] is ore

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information