eMERGE Network: CERC Survey
Survey Sampling Data Preparation

Overview
Not all patients use inpatient and outpatient clinic services at the same rate, and racial and ethnic subpopulations are not represented proportionately among clinic users. If the clinic population is used as the pool of potential survey participants, simple random sampling will likely skew the survey results toward the opinions of an older, sicker, and whiter portion of the population. We will therefore use demographic and geocode-based data to develop a stratified sampling scheme. This scheme will give greater representation to demographic groups that are less likely to come to clinic and are less often studied.

The Sampling Strategy Workgroup will centrally develop a sampling plan for use by the entire Network. The plan will be based on patient-specific demographic data, combined with data from the 2012 American Community Survey (ACS), that the sites will send to the Coordinating Center (CC).

Population definitions

Adult populations
All patients who had at least one inpatient or outpatient clinic visit at one of the participating sites between October 1, 2012 and October 1, 2014, who were at least 18 years old at the time of the last visit, who have a valid home address in the United States of America, and who are not known to be deceased.

Pediatric populations
Parents of all patients who had at least one inpatient or outpatient clinic visit at one of the participating sites between October 1, 2012 and October 1, 2014, who were younger than 18 years old at the time of the last visit, who have a valid home address in the United States of America, and who are not known to be deceased.

Data transfers
There will be four population data transfers relevant to the sampling strategy: two from the sites to the CC and two from the CC to the sites. The first site-to-CC transfer corresponds to the pilot study and will be used, in part, to identify any challenges.
The second site-to-CC transfer will be on a relatively short timeline and will be used to identify participants for the survey itself.

Data transfer #1 (May 1, 2014 to June 30, 2014) Site → CC:
Sites will send the CC data on their entire population of patients who qualify to participate in this study, but will limit the date of the most recent clinic visit to October 1, 2012 through May 1, 2014.

Data transfer #2 (September 1, 2014) CC → Site:
The CC will return the population data to the sites with five added columns:
1) MetInclusion (1 = met the phase 1 and phase 2 checks*),
2) SurveyCode (A, B, C, or NA),
3) Passcode (a unique 6-character code used to access the website; values are lowercase and do not include "o" or "l"),
4) InstitutionCode (a 2-digit code defined in the Scantron document), and
5) CustomerContactID (a concatenation of the institution code and the patient ID).

* Phase 1 identified subjects who had a missing ID, a duplicate ID, missing census information, a missing address ID or household ID, a HouseholdID shared by 20 or more patients, a date issue, or an address that was not geocodable (if available). Phase 2 then identified subjects with a missing age or sex, or with an age conflict (e.g., age >= 18 but seen at a pediatric center).

Data transfer #3 (November 1, 2014 to November 15, 2014) Site → CC:
Sites will update the population datasets using the entire date range, October 1, 2012 to October 1, 2014. The update will contain refreshed information (including updated age, geocode, and other data) for patients included in Data transfer #1, as well as new patients seen between May 1, 2014 and October 1, 2014.

Data transfer #4 (January 15, 2015) CC → Site:
The CC will return the population data to the sites along with the five columns outlined in Data transfer #2.

Data transfer protocol
Data may be transferred from each institution to the Coordinating Center using the institution's own secure data transfer program or service. If one is not available, Data Hippo (https://data.vanderbilt.edu/data-hippo/) may be used. All data returned from the Coordinating Center will be sent through Vanderbilt's subscription to Accellion.
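The three CC-generated columns described in Data transfer #2 (MetInclusion, Passcode, CustomerContactID) can be illustrated with a short Python sketch. This is not the CC's actual code. The passcode alphabet (lowercase letters only), the patient IDs, the institution code "07", and the helper names `make_passcodes`, `customer_contact_id`, and `met_inclusion` are all hypothetical. `met_inclusion` implements only a subset of the phase 1 and phase 2 checks; the census, date, and geocoding checks are omitted.

```python
import secrets
import string
from collections import Counter

# Hypothetical alphabet: lowercase letters without "o" and "l". The protocol does
# not say whether digits are allowed; this sketch assumes letters only.
PASSCODE_ALPHABET = "".join(c for c in string.ascii_lowercase if c not in "ol")


def make_passcodes(n, length=6):
    """Return n distinct random passcodes drawn from PASSCODE_ALPHABET."""
    codes = set()
    while len(codes) < n:
        codes.add("".join(secrets.choice(PASSCODE_ALPHABET) for _ in range(length)))
    return sorted(codes)


def customer_contact_id(institution_code, patient_id):
    """CustomerContactID: the institution code concatenated with the patient ID."""
    return f"{institution_code}{patient_id}"


def met_inclusion(records, pediatric_site=False, household_cap=20):
    """Return a 1/0 MetInclusion flag for each record (subset of checks only).

    Phase 1 subset: missing ID, duplicate ID, missing HouseholdID, or a
    HouseholdID shared by household_cap or more records.
    Phase 2: missing age or sex, or age >= 18 at a pediatric site.
    """
    id_counts = Counter(r.get("id") for r in records)
    hh_counts = Counter(r.get("HouseholdID") for r in records if r.get("HouseholdID"))
    flags = []
    for r in records:
        pid, hh, age = r.get("id"), r.get("HouseholdID"), r.get("age")
        phase1 = (bool(pid) and id_counts[pid] == 1
                  and bool(hh) and hh_counts[hh] < household_cap)
        phase2 = (age is not None and bool(r.get("sex"))
                  and not (pediatric_site and age >= 18))
        flags.append(1 if phase1 and phase2 else 0)
    return flags


# Made-up records: one passes, two share an ID, one is missing age.
records = [
    {"id": "XX00000001", "age": 45, "sex": "F", "HouseholdID": "H000001"},
    {"id": "XX00000002", "age": 50, "sex": "M", "HouseholdID": "H000001"},
    {"id": "XX00000002", "age": 12, "sex": "F", "HouseholdID": "H000002"},
    {"id": "XX00000004", "age": None, "sex": "M", "HouseholdID": "H000003"},
]
print(met_inclusion(records))                        # [1, 0, 0, 0]
print(customer_contact_id("07", records[0]["id"]))   # 07XX00000001
```

Records that share a duplicate ID are all flagged, not just the later copies. This reflects the reading that a duplicate ID makes every affected record ambiguous.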
Questions regarding the transfer of data should be directed to Nate Mercaldo (nate.mercaldo@vanderbilt.edu) and Jonathan Schildcrout (jonathan.schildcrout@vanderbilt.edu).

Process description

1. Send Data to CC
Send the CC a CSV file that contains the demographic variables and the census-derived variables at the block group level (see Data to be sent to the CC below). The table will have N rows, where N is the number of unique patients. Please use the variable names (case-sensitive) provided below, but please DO NOT include the GEOID variable in the dataset. See DemographicsExample.csv for an example of the file that should be sent to the CC.

2. CC will create sampling strata
The CC will cross-tabulate the demographic data at each site using the following stratification variables: age, sex, race, ethnicity, urban/rural (census block group), median income (census block group), and educational attainment for adults aged 25 years and older (census block group). For patients with missing race or ethnicity, we will define strata using the race and ethnicity information at the block group level.

3. Sampling Strategy Workgroup develops sampling plan
The CC will work with the Sampling Strategy Workgroup to develop a sampling plan. The plan will take into account how many people from each demographic subpopulation of interest are available across the Network. The goal is the simplest approach (i.e., the fewest sampling strata) that still permits adequate representation of the subpopulations of interest. The plan can only be detailed after the demographic data have been observed.

4. Data returned to site
After the sampling plan has been identified, the CC will return the population data to the sites as described above in Data transfers.

Data to be sent to the CC
The following is a data dictionary for the demographics dataset. This dataset is to be joined with both the block-group-based census data (including urban/rural) and the ACS 2008-2012 five-year summary file estimates. The CC created a national shapefile and state-specific shapefiles that include all relevant census and ACS data. Join the demographics dataset with the shapefile using the block group variable, GEOID. Important note: Do not include GEOID in the file that is sent to the CC.
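On the site side, step 1 involves three operations: selecting the eligible population, joining each patient to the block group attributes by GEOID, and dropping GEOID before export. The Python sketch below illustrates that flow using only the standard library.

The sketch makes several assumptions:
- The shapefile attributes have already been read into a dictionary keyed by a 12-character GEOID string.
- The visit window endpoints are inclusive.
- The function names and all values are hypothetical.

It is not the CC's or the Workgroup's procedure.

```python
import csv
import io
from datetime import date

# Visit window from the population definitions (endpoints treated as inclusive,
# an assumption). Data transfer #1 ends the window at May 1, 2014 instead.
WINDOW_START = date(2012, 10, 1)
WINDOW_END = date(2014, 10, 1)


def population(last_visit, age_at_last_visit, has_us_address, known_deceased,
               start=WINDOW_START, end=WINDOW_END):
    """Return 'adult', 'pediatric', or None (not eligible) for one patient."""
    if known_deceased or not has_us_address or not (start <= last_visit <= end):
        return None
    return "adult" if age_at_last_visit >= 18 else "pediatric"


def build_export(demographics, census_by_geoid, census_fields):
    """Join each patient row to its block group attributes by GEOID, then drop GEOID.

    Patients whose GEOID has no match get blank census fields, which corresponds
    to the "missing census information" condition in the phase 1 checks.
    """
    rows = []
    for row in demographics:
        census = census_by_geoid.get(row["GEOID"], {})
        out = {k: v for k, v in row.items() if k != "GEOID"}
        out.update({f: census.get(f, "") for f in census_fields})
        rows.append(out)
    return rows


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with the given (case-sensitive) column names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up example: one adult patient in a made-up block group.
print(population(date(2013, 3, 15), 45, True, False))                        # adult
print(population(date(2014, 7, 1), 45, True, False, end=date(2014, 5, 1)))   # None
demo = [{"id": "XX00000001", "age": 45, "sex": "F", "GEOID": "470370101011"}]
census = {"470370101011": {"LSAD10": "75", "B19013_001": "52000"}}
rows = build_export(demo, census, ["LSAD10", "B19013_001"])
print(to_csv(rows, ["id", "age", "sex", "LSAD10", "B19013_001"]))
```

Dropping GEOID inside `build_export`, rather than at write time, means the exported rows never carry it. `csv.DictWriter` would raise an error if a stray GEOID key reached a writer whose `fieldnames` omit it.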
Name | Format | Definition
id | character(10) | Patient identifier. The first two characters correspond to the eMERGE site ID.
age | numeric | Patient age on the date of the most recent clinic visit, rounded to the nearest year (a whole number). If age cannot be calculated this way, please indicate how it was calculated.
sex | character(1) | M = male, F = female, O = other, U = unknown.
race | character(2) | Patient race. Include race categories that are at least as finely detailed as those listed in NIH Planned Enrollment Report.docx (in the dropbox folder). Attach a detailed and comprehensive data dictionary.
ethnicity | character(1) | Hispanic or Latino: Y = yes; N = no; U = unknown.
biobank | numeric | Is the patient a participant in the biobank? 1 = yes; 2 = no (declined participation); 3 = unknown.
monthyear | character(7) | Month and year of the most recent inpatient or outpatient contact with the EMR. Format: mm/yyyy.
AddressID | character(7) | Coded so that each unique street address can be identified. Patients who live at the same street address should receive the same AddressID, including patients who live in different apartments at that address.
HouseholdID | character(7) | Coded so that each unique household can be identified based on the street address and, where available, the apartment number. Patients who live at the same street address but in different apartments should receive different values of HouseholdID.
GEOID | numeric | Census block group: a 12-digit concatenation of state (2 digits), county (3 digits), census tract (6 digits), and block group (1 digit). Not to be confused with GEOID_1 and GEOID_TIGR.

The following is a data dictionary of the census and ACS shapefiles that are to be joined with the demographics dataset. For a data dictionary of the ACS five-year summary tables, see ACS2012_5-year_TableShells.xls in the dropbox folder. To understand how the census and ACS shapefiles were developed, see emergegeocodedata.docx in the dropbox folder. Finally, see shapefile_download_instructions.docx for instructions on downloading the state-specific and national shapefiles.

Name | Format | Definition
GEOID | numeric | Census block group: a 12-digit concatenation of state (2), county (3), census tract (6), and block group (1). Not to be confused with GEOID_1 and GEOID_TIGR.
LSAD10 | numeric | 75 = urbanized area (50,000 or more people), 76 = urban cluster (2,500 to 50,000 people), missing = rural. From the 2010 Census urban and rural classification and urban area criteria.
B19013_001 and B19013A_00 through B19013I_00 | numeric | Median household income for the census block group, overall and by race/ethnicity (White alone; Black or African American alone; American Indian/Alaska Native alone; Asian alone; Native Hawaiian/Other Pacific Islander alone; Some other race alone; Two or more races; White alone, not Hispanic or Latino; Hispanic or Latino). Table B19013 from the 5-year estimate summary file (ACS 2008-2012).
B03002_001 through B03002_021 | numeric | Total number of people in the census block group, and the number in each combination of ethnicity (Hispanic or Latino, not Hispanic or Latino) and race (White alone; Black or African American alone; American Indian/Alaska Native alone; Asian alone; Native Hawaiian/Other Pacific Islander alone; Some other race alone; Two or more races; Two races including some other race; Two races excluding some other race, and three or more races). Table B03002 from the 5-year estimate summary file (ACS 2008-2012).
B15002_001 through B15002_035 | numeric | Number of people aged 25 or older in the census block group in each combination of sex and educational attainment. The educational attainment groups are: no schooling; nursery to 4th grade; 5th and 6th grade; 7th and 8th grade; 9th grade; 10th grade; 11th grade; 12th grade, no diploma; high school graduate, GED, or alternative; some college, less than 1 year; some college, 1 or more years, no degree; associate's degree; bachelor's degree; master's degree; professional school degree; and doctorate. Table B15002 from the 5-year estimate summary file (ACS 2008-2012).
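Several of the block-group stratification variables listed in step 2 can be derived from these shapefile fields. The Python sketch below illustrates one possible approach. It restores leading zeros to a numeric GEOID, maps LSAD10 to an urban/rural label, computes the share of adults 25 and older with a bachelor's degree or higher from B15002, and counts patients per stratum.

The sketch relies on two assumptions:
- Field names have the form B15002_NNN.
- The columns follow the standard B15002 order, with male categories at _003 to _018 and female categories at _020 to _035.

The bachelor's-or-higher summary and all function names are illustrative choices, not the CC's definitions.

```python
from collections import Counter


def split_geoid(geoid):
    """Split a block group GEOID into state, county, tract, and block group.

    GEOID is stored as numeric, so a leading zero (e.g. state FIPS "01") can be
    lost; the value is zero-padded back to 12 digits first.
    """
    g = str(int(geoid)).zfill(12)
    return {"state": g[:2], "county": g[2:5], "tract": g[5:11], "block_group": g[11]}


def urban_rural(lsad10):
    """Map LSAD10 (75, 76, or missing) to an urban/rural label."""
    if lsad10 in (None, ""):
        return "rural"
    return {75: "urbanized area", 76: "urban cluster"}.get(int(lsad10), "unknown")


# Assumed B15002 layout: _001 total, male categories _003-_018, female _020-_035;
# bachelor's through doctorate are _015-_018 (male) and _032-_035 (female).
BACHELORS_PLUS = [f"B15002_{i:03d}" for i in (*range(15, 19), *range(32, 36))]


def share_bachelors_plus(bg):
    """Share of adults 25+ in a block group with a bachelor's degree or higher."""
    total = bg.get("B15002_001") or 0
    if total == 0:
        return None
    return sum(bg[c] for c in BACHELORS_PLUS) / total


def crosstab(patients, keys):
    """Count patients in each combination of the given stratification variables."""
    return Counter(tuple(p[k] for k in keys) for p in patients)


print(split_geoid(10010201001))
# {'state': '01', 'county': '001', 'tract': '020100', 'block_group': '1'}
print(urban_rural(None), urban_rural("76"))  # rural urban cluster
```

Treating a missing LSAD10 as rural follows the dictionary above. Any other code is labeled "unknown" rather than guessed.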