TECHNICAL APPENDIX WITH ADDITIONAL INFORMATION ON METHODS AND APPENDIX EXHIBITS


Goetzel RZ, Pei X, Tabrizi MJ, Henke RM, Kowlessar N, Nelson CF, Metz RD. Ten modifiable health risk factors are linked to more than one-fifth of employer-employee health care spending. Health Aff (Millwood). 2012;31(11).

Additional Information on Methods

Health Risks Definition

Ten health risks in this and the previous study were dichotomized as high versus lower risk using standard definitions documented in the original HERO analysis. Employees classified at high risk for physical inactivity exercised fewer than three days in a typical week. High risk for alcohol consumption meant, for men, consuming three or more drinks per day or 15 or more per week, and, for women, two or more drinks per day or eight or more drinks per week. Workers at high risk for poor nutrition/eating habits consumed fewer than five fruits and vegetables a day. High-risk tobacco users were individuals who smoked cigarettes or used other tobacco products (i.e., pipe, cigar, snuff, or smokeless tobacco). Those classified as high risk for stress-related problems reported that their life was very stressful and that they were not effective in coping with that stress. Respondents were termed at high risk for depression if they felt unhappy or hopeless and unwilling to ask for help, or if they reported being depressed almost always.

Employees at high risk for biometric measures had BMI values of 30 and above; total cholesterol levels greater than 239 mg/dl; systolic blood pressure greater than 159 mm Hg, or diastolic blood pressure greater than 99 mm Hg; and blood glucose levels greater than 115 mg/dl. Minimum and maximum values for biometric measures were set by the researchers based on advice from clinicians on what constitutes a reasonable range for each value studied. Values outside those ranges were excluded from the analysis and set to missing.

More Details about the Regression Model

All the regression equations used a generalized linear model (GLM) to estimate risk-expenditure relationships. The outcome for all models was total medical expenditures after the health risk assessment. The predictor variables included indicators for each health risk and all confounding variables described above. A log dependent variable was used to accommodate the skewed nature of medical expenditure data. Differences between employees at high and lower risk levels were estimated by comparing adjusted costs from the model with and without the risk, while holding all other variables constant at their average value for the at-risk population. The incremental cost associated with being at high risk was calculated as a percent difference in costs compared to the reference category (lower risk). The software package employed in these studies was STATA.

Confounders in the Regression Analyses

The regression models in this study controlled for confounding variables that might also influence costs. Confounders included employee age; gender; type of health plan in which the employee was enrolled (fee for service [COMP], point of service [POS], health maintenance organization [HMO], preferred provider organization [PPO], consumer driven health plan [CDHP]); location (Northeast, North Central, South, or West region of the country); industry type of employer (manufacturing durable goods, manufacturing non-durable goods, service, oil and gas); employment category (salaried or hourly); number of months followed in the database; and the specific employer contributing data (1-7). The employer indicator adjusted for employer-specific differences in benefit plan design, culture, or other factors not accounted for by the other variables in the model.

Binary indicators were used to denote when information on health risks was missing, and these were controlled for in the analysis. Missing data occurred when respondents failed to answer one or more of the HRA questions or when biometric screening data were not collected by the employer. Including the missing data indicators allowed the statistical tests to be completed without excluding any observations.

Additional Information on Sensitivity Analyses

We conducted several sensitivity analyses to control for having an illness at baseline and for having varying degrees of disease comorbidity. Baseline disease severity was measured during the index year in which the health risk assessment was completed and was reported using the Charlson Comorbidity Index (CCI) and Psychiatric Diagnosis Groups (PDG) values, respectively. These indices were entered into the regression models as part of the sensitivity analysis that controlled for baseline physical or mental illness, an enhancement over the original HERO study methods. In the first sensitivity analysis, baseline CCI and PDG scores were added to the regression models, and in the second, outlier cases were excluded.

Estimating Expenditures for Individuals with Multiple Risks

In addition to calculating the incremental cost of each risk factor, estimates were developed for combinations of risk factors representing employees having multiple risks for heart disease, stroke, and psychosocial problems. This was accomplished by comparing predicted mean expenditures for employees with specific risk factors for a given health outcome to those without these risk factors. The multiple risk results were not simply the sum of costs for individual risks. Rather, high risk for each multiple risk cluster was newly defined, and the results showed the effect of having multiple risks simultaneously.

The seven risk factors (risk cluster) used to predict the occurrence of heart disease were: 1) tobacco use, 2) high blood pressure, 3) high total cholesterol, 4) physical inactivity, 5) high blood glucose, 6) obesity, and 7) high stress. The cluster indicating high risk for a stroke meant the employee had the following risks: 1) tobacco use, 2) high blood pressure, 3) high total cholesterol, and 4) high stress. Finally, having a psychosocial risk cluster was defined as having 1) high stress and 2) depression. These multiple risk analyses replicated the approach used in the original HERO study.
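The percent-difference calculation described under the regression model can be illustrated with a minimal sketch. It assumes a log link, so that predicted cost is the exponential of the linear predictor; the coefficient values and variable names below are hypothetical placeholders, not estimates from the study, and the authors' actual Stata code is not shown in the appendix.

```python
import math

# Hypothetical log-scale coefficients for illustration only;
# these are NOT the study's estimates.
COEFFICIENTS = {
    "intercept": 8.0,
    "high_blood_pressure": 0.25,
    "age_45_54": 0.10,
}


def predicted_cost(covariates):
    """Predicted expenditure under a log link: exp(intercept + sum(beta * x))."""
    linear = COEFFICIENTS["intercept"]
    for name, value in covariates.items():
        linear += COEFFICIENTS[name] * value
    return math.exp(linear)


def percent_difference(risk, other_covariates):
    """Percent difference in predicted cost with vs. without one risk,
    holding the other covariates fixed at the supplied values."""
    with_risk = predicted_cost({**other_covariates, risk: 1})
    without_risk = predicted_cost({**other_covariates, risk: 0})
    return (with_risk - without_risk) / without_risk * 100.0


# Other covariates held at an assumed average value (0.3 is a placeholder).
pct = percent_difference("high_blood_pressure", {"age_45_54": 0.3})
print(round(pct, 2))
```

Under a log link the other covariates cancel out of the ratio, so in this sketch the percent difference reduces to (exp(coefficient) - 1) x 100 regardless of the values at which they are held.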

APPENDIX EXHIBIT 1 Description of the Study Sample

Study Sample (n=92,486)

Measure; Value; Percent; Count

Age
, , , ,766

Gender
Female ,746
Male ,740

Health plan
COMP (fee for service) 8.3 7,656
HMO (health maintenance organization) ,273
POS (point of service) 3.8 3,470
PPO (preferred provider organization) ,302
CDHP (consumer driven health plan) 1.4 1,263
Unknown

Employment
Salary ,492
Hourly ,439
Other and Unknown

Industry
Oil & Gas Extraction 6.7 6,163
Manufacturing, Durable Goods ,400
Manufacturing, Nondurable Goods ,883
Services ,040

Region of Residence
Northeast 8.9 8,240
North Central ,679
South ,569
West 7.5 6,893
Unknown/missing

Risk Category*; CURRENT Percent (N=92,486); HERO Percent (N=46,026)

Obesity
High Total Cholesterol
High Blood Glucose
High Blood Pressure
Poor Nutrition/Eating Habits
Physical Inactivity
Tobacco Use
High Alcohol Consumption
High Stress
Depression

Age, Severity Indicators, and Costs; Mean; Standard Error

Age
Total eligible years since HRA was taken
Baseline Charlson Comorbidity Index (CCI)
Baseline Number of Psychiatric Diagnosis Groups (PDGs)
Total Medical and Drug Expenditures (2009 $) $

SOURCE Authors' analysis of data from 92,486 individuals from seven different organizations in the MarketScan Commercial Claims and Encounter Database.

NOTES * The percentages in the Risk Category compare the values from the current study to the values from the 1998 HERO study (2). The Charlson Comorbidity Index predicts the one-year mortality for a patient who may have a range of co-morbid conditions. The Number of Psychiatric Diagnostic Groups measures the severity of mental illness using a group of identified mental disease diagnoses. Total Medical and Drug Expenditure, including employer and employee portions of medical and drug payments, was standardized to 2009 dollars, adjusted for inflation using the Medical Care Services Consumer Price Index (CPI) for medical care costs and the Medical Care Commodities CPI for pharmaceuticals.
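The standardization to 2009 dollars mentioned in the notes can be sketched as a simple index ratio. This is only an illustration under stated assumptions: the function name is hypothetical and the index values are placeholders, not actual Medical Care Services or Medical Care Commodities CPI figures.

```python
def to_2009_dollars(amount, index_in_year, index_2009):
    """Rescale a nominal amount by the ratio of the 2009 price index
    to the price index of the year in which the cost was incurred."""
    return amount * index_2009 / index_in_year


# Placeholder index values for illustration only; not actual CPI figures.
adjusted = to_2009_dollars(1000.0, index_in_year=90.0, index_2009=100.0)
print(round(adjusted, 2))
```

In practice medical payments and drug payments would each be rescaled with their own index series, as the notes describe, before being summed.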

APPENDIX EXHIBIT 2 Adjusted Medical Expenditures (in 2009 Dollars) for High-Risk vs. Lower Risk Employees, Controlling for Charlson Comorbidity Index and the Number of Psychiatric Diagnostic Groups, and Excluding Outliers, and Differences between Each Risk Level with 95% Intervals

Risk Measures; Risk Level; Controlling for CCI and NPDG; Excluding Outliers; %Difference Controlling for CCI and NPDG; %Difference Excluding Outliers

Depression
High $12, $6,
Lower $9, $4,
% 42.60%
Interval (23.7, 43.2) (24.9, 36.0)

Blood glucose
High $12, $6,
Lower $12, $4,
% 34.09%
Interval (-6.8, 7.1) (24.2, 34.6)

Blood pressure
High $8, $5,
Lower $6, $4,
% 28.35%
Interval (17.6, 36.0) (19.5, 30.0)

Obesity
High $7, $4,
Lower $6, $3,
% 27.18%
Interval (18.7, 27.4) (21.5, 26.5)

Tobacco Use
High $7, $3,
Lower $7, $3,
% 16.13%
Interval (1.1, 10.4) (4.3, 9.7)

Physical inactivity
High $7, $4,
Lower $6, $3,
% 13.05%
Interval (3.2, 11.4) (2.6, 7.2)

Stress
High $8, $4,
Lower $8, $4,
% 7.54%
Interval (-3.7, 8.9) (0.4, 7.8)

Total Cholesterol
High $7, $4,
Lower $7, $4,
% -1.08%
Interval (-8.7, 6.0) (-5.9, 2.6)

Nutrition/Eating Habits
High $4, $3,
Lower $4, $3,
% -4.55%
Interval (-5.5, 3.2) (-4.5, 0.7)

Alcohol Consumption
High $7, $3,706.36
Lower $7, $4,
% -8.95%
Interval (-9.6, 8.4) (-9.2, 0.8)

SOURCE Results for % difference controlling for Charlson Comorbidity Index and the Number of Psychiatric Diagnostic Groups are from the authors' analysis of data from 92,486 individuals from seven different organizations in the MarketScan Commercial Claims and Encounter Database. Results for % difference excluding outliers are from the authors' analysis of data from 87,862 individuals from seven different organizations in the MarketScan Commercial Claims and Encounter Database.

NOTES Medical expenditures include employer and employee portions of medical payments. Outlier costs are defined as greater than or equal to the 95th percentile of the total medical cost. C.I. stands for confidence intervals.
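The outlier rule in the notes (costs at or above the 95th percentile are dropped) can be sketched as follows. The appendix does not say which percentile interpolation method was used, so the default of Python's `statistics.quantiles` is an assumption here, and the cost values are synthetic.

```python
import statistics


def exclude_outliers(costs):
    """Drop costs at or above the 95th percentile of the cost distribution.

    statistics.quantiles(..., n=100) returns 99 cut points; index 94 is the
    95th percentile. The interpolation method is an assumption, since the
    source does not specify one.
    """
    cutoff = statistics.quantiles(costs, n=100)[94]
    kept = [c for c in costs if c < cutoff]
    return kept, cutoff


# Synthetic costs 1..100 for illustration only.
costs = list(range(1, 101))
kept, cutoff = exclude_outliers(costs)
print(len(kept), max(kept))
```

The reduced sample of 87,862 individuals reported in the source line is about 95 percent of the full 92,486, which is consistent with a cut at the 95th percentile.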

APPENDIX EXHIBIT 3 Estimated Annual Medical Expenditures (in 2009 $) for Employees With and Without Selected Multiple Risk Factors

Coexisting Multiple Risk Factors Leading to; With multiple risk factors; Without any of the risk factors; %Difference

High risk for heart disease $10,134 $3, %
High risk for stroke $6,137 $3, %
High risk for psychosocial problems $6,165 $3, %

SOURCE Authors' analysis of data from 92,486 individuals from seven different organizations in the MarketScan Commercial Claims and Encounter Database.

NOTES Medical expenditures include employer and employee portions of medical payments. A risk-free individual is estimated to have medical expenditures of $3,207. Annual medical expenditures were estimated using regression models assuming average values for other risk categories and covariates. High risk for heart disease is defined as tobacco use, or high blood pressure, or high blood glucose, or high total cholesterol, or physical inactivity, or obesity, or high stress. High risk for stroke is defined as tobacco use, or high blood pressure, or high total cholesterol, or high stress. High risk for psychosocial problems is defined as high stress or depression.
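The cluster definitions in the notes to this exhibit can be written down as a small lookup. This sketch follows the "or" wording of the notes (an employee is flagged when any risk in the cluster is present); the dictionary keys and function name are hypothetical and are not taken from the authors' data set.

```python
# Risk clusters as listed in the notes to Appendix Exhibit 3.
# Key names are hypothetical labels chosen for this sketch.
CLUSTERS = {
    "heart_disease": [
        "tobacco_use", "high_blood_pressure", "high_blood_glucose",
        "high_total_cholesterol", "physical_inactivity", "obesity",
        "high_stress",
    ],
    "stroke": [
        "tobacco_use", "high_blood_pressure", "high_total_cholesterol",
        "high_stress",
    ],
    "psychosocial": ["high_stress", "depression"],
}


def cluster_flags(risks):
    """Return, for each cluster, whether any of its member risks is present.

    `risks` maps risk names to True/False; absent names count as False.
    """
    return {
        cluster: any(risks.get(name, False) for name in members)
        for cluster, members in CLUSTERS.items()
    }


flags = cluster_flags({"tobacco_use": True, "depression": False})
print(flags)
```

An employee for whom every flag in a cluster is False corresponds to the "without any of the risk factors" comparison group for that cluster.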

APPENDIX EXHIBIT 4 Regression Model Estimates Corresponding to Exhibit 1 in the Manuscript

Dependent Variable: Total Medical Expenditure

Independent Variable; Coefficient; Relative Risk; Std. Err.; z; P>z; [95% Conf. Interval]

Obesity
Total cholesterol
Blood pressure
Blood glucose
Tobacco use
Alcohol consumption
Nutrition/Eating habits
Physical inactivity
Stress
Depression
Age (35-44)
Age (45-54)
Age (55-64)
Female
Months
Salary
Hourly
Northeast
North central
South
West
Capitated plan
Employer
Employer
Employer
Employer
Employer
Employer
Missing bprisk
Missing chrisk
Missing bgrisk
Missing alrisk
Missing tbrisk
Missing exrisk
Missing nurisk
Missing strisk
Missing derisk

SOURCE Authors' analysis of data from 92,486 individuals from seven companies in the MarketScan Commercial Claims and Encounter Database. Nine missing indicators were added to the model to indicate whether each health risk was missing.
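The missing-indicator approach noted above can be sketched as a small data-preparation step. The nine short names follow the "Missing ..." row labels in this exhibit; setting the missing risk value itself to 0 is an assumption made for the sketch, since the appendix does not say how the underlying risk variable was coded when missing.

```python
# Short risk names taken from the "Missing ..." rows of Appendix Exhibit 4.
RISKS_WITH_INDICATORS = [
    "bprisk", "chrisk", "bgrisk", "alrisk", "tbrisk",
    "exrisk", "nurisk", "strisk", "derisk",
]


def add_missing_indicators(record):
    """Return model inputs with one binary missing flag per health risk.

    `record` maps risk names to 1 (high risk), 0 (lower risk), or None
    (missing). Coding a missing risk as 0 is an assumption of this sketch.
    """
    out = {}
    for risk in RISKS_WITH_INDICATORS:
        value = record.get(risk)
        is_missing = value is None
        out[risk] = 0 if is_missing else int(value)
        out["missing_" + risk] = int(is_missing)
    return out


# Hypothetical respondent: blood pressure measured, cholesterol not screened.
row = add_missing_indicators({"bprisk": 1, "chrisk": None, "tbrisk": 0})
print(row["bprisk"], row["missing_bprisk"], row["chrisk"], row["missing_chrisk"])
```

Because every record receives a value for each risk and each flag, no observation has to be dropped from the regression for incomplete HRA or screening data.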

APPENDIX EXHIBIT 5 Regression Model Estimates After Controlling for Charlson Comorbidity Index (CCI) and Number of Psychiatric Diagnosis Groups (PDG)

Dependent Variable: Total Medical Expenditure

Independent Variables; Coefficient; Relative Risk; Std. Err.; z; P>z; [95% Conf. Interval]

Obesity
High total cholesterol
High blood pressure
High blood glucose
Tobacco use
High alcohol consumption
Poor nutrition/eating habits
Physical inactivity
High stress
Depression
Age (35-44)
Age (45-54)
Age (55-64)
Female
Months
Salary
Hourly
Northeast
North central
South
West
Capitated Plan
CCI
NPDG
Employer
Employer
Employer
Employer
Employer
Employer
Missing bprisk
Missing chrisk
Missing bgrisk
Missing alrisk
Missing tbrisk
Missing exrisk
Missing nurisk
Missing strisk
Missing derisk

SOURCE Authors' analysis of data from 92,486 individuals from seven companies in the MarketScan Commercial Claims and Encounter Database. Nine missing indicators were added to the model to indicate whether each health risk was missing.

APPENDIX EXHIBIT 6 Regression Model Results After Excluding Outliers

Dependent Variable: Total Medical Expenditure

Independent Variables; Coefficients; Relative Risk; Std. Err.; z; P>z; [95% Conf. Interval]

Obesity
High total cholesterol
High blood pressure
High blood glucose
Tobacco use
High alcohol consumption
Poor nutrition/eating habits
Physical inactivity
High stress
Depression
Age (35-44)
Age (45-54)
Age (55-64)
Female
Months
Salary
Hourly
Northeast
North central
South
West
Capitated Plan
Employer
Employer
Employer
Employer
Employer
Employer
Missing bprisk
Missing chrisk
Missing bgrisk
Missing alrisk
Missing tbrisk
Missing exrisk
Missing nurisk
Missing strisk
Missing derisk

SOURCE Authors' analysis of data from 92,486 individuals from seven companies in the MarketScan Commercial Claims and Encounter Database. Nine missing indicators were added to the model to indicate whether each health risk was missing.

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

Online supplement. Absolute Value of Lung Function (FEV 1 or FVC) Explains the Sex Difference in. Breathlessness in the General Population

Online supplement. Absolute Value of Lung Function (FEV 1 or FVC) Explains the Sex Difference in. Breathlessness in the General Population Online supplement Absolute Value of Lung Function (FEV 1 or FVC) Explains the Sex Difference in Breathlessness in the General Population Table S1. Comparison between patients who were excluded or included

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Inference for Distributions Inference for the Mean of a Population

Inference for Distributions Inference for the Mean of a Population Inference for Distributions Inference for the Mean of a Population PBS Chapter 7.1 009 W.H Freeman and Company Objectives (PBS Chapter 7.1) Inference for the mean of a population The t distributions The

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Lecture 12: Interactions and Splines

Lecture 12: Interactions and Splines Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326 page 35 8 Statistics are around us both seen and in ways that affect our lives without us knowing it. We have seen data organized into charts in magazines, books and newspapers. That s descriptive statistics!

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Inferences Based on Two Samples

Inferences Based on Two Samples Chapter 6 Inferences Based on Two Samples Frequently we want to use statistical techniques to compare two populations. For example, one might wish to compare the proportions of families with incomes below

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Data Analysis 1 LINEAR REGRESSION. Chapter 03 Data Analysis 1 LINEAR REGRESSION Chapter 03 Data Analysis 2 Outline The Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression Other Considerations in Regression Model Qualitative

More information

Lecture 1 Introduction to Multi-level Models

Lecture 1 Introduction to Multi-level Models Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course

More information

Published by the Stationery Office, Dublin, Ireland.

Published by the Stationery Office, Dublin, Ireland. An Phríomh-Oifig Staidrimh Central Statistics Office Published by the Stationery Office, Dublin, Ireland. Available from the: Central Statistics Office, Information Section, Skehard Road, Cork October

More information

Low-Income African American Women's Perceptions of Primary Care Physician Weight Loss Counseling: A Positive Deviance Study

Low-Income African American Women's Perceptions of Primary Care Physician Weight Loss Counseling: A Positive Deviance Study Thomas Jefferson University Jefferson Digital Commons Master of Public Health Thesis and Capstone Presentations Jefferson College of Population Health 6-25-2015 Low-Income African American Women's Perceptions

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Exploring, summarizing and presenting data. Berghold, IMI, MUG

Exploring, summarizing and presenting data. Berghold, IMI, MUG Exploring, summarizing and presenting data Example Patient Nr Gender Age Weight Height PAVK-Grade W alking Distance Physical Functioning Scale Total Cholesterol Triglycerides 01 m 65 90 185 II b 200 70

More information

MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA DIRECÇÃO NACIONAL DE ESTATÍSTICA ECONOMICAS E SOCIAIS

MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA DIRECÇÃO NACIONAL DE ESTATÍSTICA ECONOMICAS E SOCIAIS MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA DIRECÇÃO NACIONAL DE ESTATÍSTICA ECONOMICAS E SOCIAIS CPI SERI 2 EDITION ONE ISSUE www.dne.mof.gov.tl MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA

More information

Distribution-free ROC Analysis Using Binary Regression Techniques

Distribution-free ROC Analysis Using Binary Regression Techniques Distribution-free Analysis Using Binary Techniques Todd A. Alonzo and Margaret S. Pepe As interpreted by: Andrew J. Spieker University of Washington Dept. of Biostatistics Introductory Talk No, not that!

More information

Quantitative Bivariate Data

Quantitative Bivariate Data Statistics 211 (L02) - Linear Regression Quantitative Bivariate Data Consider two quantitative variables, defined in the following way: X i - the observed value of Variable X from subject i, i = 1, 2,,

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

Practice problems from chapters 2 and 3

Practice problems from chapters 2 and 3 Practice problems from chapters and 3 Question-1. For each of the following variables, indicate whether it is quantitative or qualitative and specify which of the four levels of measurement (nominal, ordinal,

More information

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables) 3. Descriptive Statistics Describing data with tables and graphs (quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables) Bivariate descriptions

More information

Sociology 593 Exam 2 March 28, 2002

Sociology 593 Exam 2 March 28, 2002 Sociology 59 Exam March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably means that

More information

MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA DIRECÇÃO NACIONAL DE ESTATÍSTICA ECONOMICAS E SOCIAIS

MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA DIRECÇÃO NACIONAL DE ESTATÍSTICA ECONOMICAS E SOCIAIS MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA DIRECÇÃO NACIONAL DE ESTATÍSTICA ECONOMICAS E SOCIAIS CPI SERI 2 EDITION THIRD ISSUE www.dne.mof.gov.tl MINISTÉRIO DAS FINANÇAS DIRECÇÃO GERAL DE ESTATÍSTICA

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

[ z = 1.48 ; accept H 0 ]

[ z = 1.48 ; accept H 0 ] CH 13 TESTING OF HYPOTHESIS EXAMPLES Example 13.1 Indicate the type of errors committed in the following cases: (i) H 0 : µ = 500; H 1 : µ 500. H 0 is rejected while H 0 is true (ii) H 0 : µ = 500; H 1

More information

6.3 Use Normal Distributions. Page 399 What is a normal distribution? What is standard normal distribution? What does the z-score represent?

6.3 Use Normal Distributions. Page 399 What is a normal distribution? What is standard normal distribution? What does the z-score represent? 6.3 Use Normal Distributions Page 399 What is a normal distribution? What is standard normal distribution? What does the z-score represent? Normal Distribution and Normal Curve Normal distribution is one

More information

Jun Tu. Department of Geography and Anthropology Kennesaw State University

Jun Tu. Department of Geography and Anthropology Kennesaw State University Examining Spatially Varying Relationships between Preterm Births and Ambient Air Pollution in Georgia using Geographically Weighted Logistic Regression Jun Tu Department of Geography and Anthropology Kennesaw

More information

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback University of South Carolina Scholar Commons Theses and Dissertations 2017 Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback Yanan Zhang University of South Carolina Follow

More information

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points EEP 118 / IAS 118 Elisabeth Sadoulet and Kelly Jones University of California at Berkeley Fall 2008 Introductory Applied Econometrics Final examination Scores add up to 125 points Your name: SID: 1 1.

More information

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX The following document is the online appendix for the paper, Growing Apart: The Changing Firm-Size Wage

More information

Press Release Consumer Price Index October 2017

Press Release Consumer Price Index October 2017 Consumer Price Index, base period December 2006 October 2017 The Central Bureau of Statistics presents the most important findings for the Consumer Price Index (CPI) for the month of October 2017. The

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Measuring community health outcomes: New approaches for public health services research

Measuring community health outcomes: New approaches for public health services research Research Brief March 2015 Measuring community health outcomes: New approaches for public health services research P ublic Health agencies are increasingly asked to do more with less. Tough economic times

More information

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1 MAT 2379, Introduction to Biostatistics Sample Calculator Problems for the Final Exam Note: The exam will also contain some problems

More information

Chapter 2. Mean and Standard Deviation

Chapter 2. Mean and Standard Deviation Chapter 2. Mean and Standard Deviation The median is known as a measure of location; that is, it tells us where the data are. As stated in, we do not need to know all the exact values to calculate the

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

More information

Job Training Partnership Act (JTPA)

Job Training Partnership Act (JTPA) Causal inference Part I.b: randomized experiments, matching and regression (this lecture starts with other slides on randomized experiments) Frank Venmans Example of a randomized experiment: Job Training

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Classification & Regression. Multicollinearity Intro to Nominal Data
