Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Similar documents
Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

General Regression Model

Introduction to Statistical Analysis

BIOS 312: Precision of Statistical Inference

Stat 5101 Lecture Notes

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Review of Statistics 101

Lecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018

STAT 4385 Topic 01: Introduction & Review

Subject CS1 Actuarial Statistics 1 Core Principles

Statistical Questions: Classification Modeling Complex Dose-Response

Statistics in medicine

Experimental Design and Data Analysis for Biologists

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Machine Learning Linear Classification. Prof. Matteo Matteucci

Lecture 10: Introduction to Logistic Regression

Parameter Estimation and Fitting to Data

Lecture Outline Biost 518 / Biost 515 Applied Biostatistics II / Biostatistics II. Linear Predictors Modeling Complex Dose-Response

Ex: Cubic Relationship. Transformations of Predictors. Ex: Threshold Effect of Dose? Ex: U-shaped Trend?

APPENDIX B Sample-Size Calculation Methods: Classical Design

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Categorical and Zero Inflated Growth Models

Statistical Modelling in Stata 5: Linear Models

New Developments in Econometrics Lecture 16: Quantile Estimation

Semiparametric Generalized Linear Models

Generalized Linear Models for Non-Normal Data

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

What s New in Econometrics? Lecture 14 Quantile Methods

Bios 6649: Clinical Trials - Statistical Design and Monitoring

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

L6: Regression II. JJ Chen. July 2, 2015

Dose-response modeling with bivariate binary data under model uncertainty

Institute of Actuaries of India

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Lecture 2 Machine Learning Review

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Correlation and Simple Linear Regression

Correlation and Regression Theory 1) Multivariate Statistics

Exam details. Final Review Session. Things to Review

Chapter 1 Statistical Inference

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

ECON3150/4150 Spring 2015

Psychology 282 Lecture #4 Outline Inferences in SLR

Foundations of Probability and Statistics

Semiparametric Regression

AP Statistics Cumulative AP Exam Study Guide

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Contents. Part I: Fundamentals of Bayesian Inference 1

Correlation and Regression Bangkok, 14-18, Sept. 2015

y response variable x 1, x 2,, x k -- a set of explanatory variables

Introduction to mtm: An R Package for Marginalized Transition Models

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

ECE521 week 3: 23/26 January 2017

Classification & Regression. Multicollinearity Intro to Nominal Data

Practical Biostatistics

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Lecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Stat 5102 Final Exam May 14, 2015

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

Lecture 12: Effect modification, and confounding in logistic regression

36-463/663: Multilevel & Hierarchical Models

POL 681 Lecture Notes: Statistical Interactions

General Linear Model (Chapter 4)

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Part III Measures of Classification Accuracy for the Prediction of Survival Times

STATISTICS-STAT (STAT)

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Part 6: Multivariate Normal and Linear Models

Econometrics. 4) Statistical inference

TMA 4275 Lifetime Analysis June 2004 Solution

Stat 587: Key points and formulae Week 15

A Discussion of the Bayesian Approach

Prentice Hall Mathematics, Geometry 2009 Correlated to: Connecticut Mathematics Curriculum Framework Companion, 2005 (Grades 9-12 Core and Extended)

Linear Regression Models P8111

Glossary for the Triola Statistics Series

3 Joint Distributions 71

Ph.D. course: Regression models

Multivariate Survival Analysis

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

BIOSTATISTICAL METHODS

Part 8: GLMs and Hierarchical LMs and GLMs

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

Review of Statistics

Lecture 12: Interactions and Splines

Final Exam. Name: Solution:

A general mixed model approach for spatio-temporal regression data

Plausible Values for Latent Variables Using Mplus

Transcription:

Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest sense of the word) Science is about proving things to people (The validity of any proof rests solely on the willingness of the audience to believe it) March 9, 007 00, 003, 005 Scott S. Emerson, M.D., Ph.D. First Stage of Scientific Investigation Further Stages of Scientific Investigation Hypothesis generation Observation Measurement of existing populations Disadvantages: Confounding Limited ability to establish cause and effect 3 Refinement and confirmation of hypotheses Experiment Intervention Elements of experiment Overall goal Specific aims (hypotheses) Materials and methods Collection of data Analysis Interpretation; Refinement of hypotheses 4

Statistical Questions Statistical Tasks Clustering of observations Clustering of variables Quantification of distributions Comparing distributions Prediction of individual observations 5 Understand overall goal Refine specific aims (stat hypotheses) Materials and methods: Study design Collection of data: Advise on QC Analysis Describe sample (materials and methods) Analyses to address specific aims Interpretation 6 Statistical Analysis Purpose of Descriptive Statistics Descriptive statistics (Sampling plan) Materials and methods Address scientific question Inferential statistics Point estimates Interval estimates (quantify precision) Decision analysis (hypothesis tests) 7 Identify errors in measurement, data collection Characterize materials and methods Assess validity of assumptions needed for analysis Straightforward estimates to address scientific question Hypothesis generation 8

Descriptive Methods Type of Measurement Binary, nominal, ordered, continuous, censored Type of description Univariate Location, spread, skewness, kurtosis Bivariate, trivariate Methods Numerical, graphical Statistical Inference Use the sample to make inference about the entire population Inferential estimates Quantify the uncertainty in the estimates computed from the sample To what extent does the random variation inherent in sampling affect our ability to draw conclusions? 9 0 Statistical Role Population Parameters Experimental results are subject to variability Statistics provides Framework in which to describe general trends Estimates of treatment effect Framework in which to describe our level of confidence in the conclusions drawn from the experiment Measures of the precision of our estimates Estimates of the generalizability of the results Scientific questions are typically answered by making inference about some population parameter, e.g. Mean Geometric mean Median Proportion above threshold Odds above threshold Hazard

Measures of Association Most often: difference or ratio of univariate parameters Difference (or ratio) of means Ratio of geometric mean Ratio (or difference) of medians Difference (or ratio) of proportions Odds ratios Hazard ratio (or difference) Criteria for Summary Measure In order of importance Scientifically (clinically) relevant Also reflects current state of knowledge Is likely to vary across levels of the factor of interest Ability to detect variety of changes Statistical precision Only relevant if all other things are equal 3 4 Inference Generalizations from sample to population Estimation Point estimates Interval estimates Decision analysis (testing) Quantifying strength of evidence Frequentist or Bayesian methods exist Approximate Sampling Distn Most often we choose estimators that are asymptotically normally distributed ˆ V For large n : θ ~ & N mean θ, var n V is related to average "statistical information" from each observation Often : V depends on the value of θ 5 6

Typical Method for 00(-α)% CI Computing P values using Z When estimate is approximately normal 00( α)% confidenceintervalis θ = ˆ θ z L θ = ˆ θ + z U α / α / seˆ seˆ ( ˆ θ ) ( ˆ θ ) ( estimate) ± ( crit val) ( std error) ( θ, θ ) L U 7 Standardized statistic Stata commands Lower one -sided P value Upper one -sided P value Two -sided P value Z est hyp = = std err ˆ θ θ ~ N seˆ 0 ( ˆ θ ) ( 0,) ˆ θ θ0 norm ( ) seˆ ˆ θ ˆ θ θ0 - norm ( ) seˆ ˆ θ ˆ ( ) θ θ0 norm abs ˆ seˆ θ 8 Aside: Comparing Estimates Aside: Correlated Estimates Comparisons across strata or studies This is easy, if estimates are independent and approximately normally distributed For independent ˆ θ + ˆ θ ~ & N ˆ θ ˆ θ ~ & N ˆ ˆ θ / θ ~ & N ˆ θ ~ & N ( θ, se ); ˆ θ ~ & N ( θ, se ) ( θ + θ, se + se ) ( θ θ, se + se ) θ, θ se θ θ + se θ 9 If estimates are correlated and approximately normally distributed For correlated ˆ θ ~ & N θ ˆ, se ; θ ~ & N θ, se ω = corr ˆ θ, ˆ θ ˆ θ + ˆ θ ~ & N ˆ θ ˆ θ ~ & N ( ) ( ) ( ) ( θ + θ, se + se + ω se se ) ( θ θ, se + se ω se se ) 0

Two Variable Setting Uses of Regression Many statistical problems consider the association between two variables Response variable (outcome, dependent variable) Grouping variable (predictor, independent variable) Two major uses of regression Borrow information to address sparse data in some groups E.g., 68 and 70 year olds provide information about 69 year olds Question: How far away do you want to go? Provide a statistical contrast to compare distribution of response across groups Think of a slope as an average comparison of summary measures per unit difference in the grouping variable Regression Inference Regression Models Estimates Slope: (average) contrasts across groups Fitted values: estimated summary measure in a group Standard errors Confidence intervals P values testing for Intercept of zero (who cares?) Slope of zero (test for linear trend in summary measures) 3 According to the parameter compared across groups Means Linear regression Geom Means Linear regression on logs Odds Logistic regression Rates Poisson regression Hazards Proportional Hazards regr Quantiles Parametric survival regr 4

Regression vs Two Samples Maximal Assumptions When used with a binary grouping variable common regression models reduce to the corresponding two variable methods Linear regression with a binary predictor Classical: t test with equal variance Robust SE: t test with unequal variance (approx) Logistic regression with a binary predictor Score test: Chi squared test for association Cox regression with a binary predictor Score test: Logrank test 5 Independence with robust SE: between identified clusters Sufficient sample sizes for asymptotic distributions to be a good approximation Variance appropriate to the model with robust SE: relaxed for associations Regression model accurately describes summary measures across groups Necessary for prediction of summary measures Shape of distribution same in each group Necessary for prediction (forecast) intervals 6 Transformations of Predictors Flexible Modeling of Predictors We transform predictors to provide more flexible description of complex associations between the response and some scientific measure Threshold effects Exponentially increasing effects U-shaped functions S-shaped functions etc. 7 We do have methods that can fit a wide variety of curve shapes Dummy variables A step function with tiny steps Polynomials If high degree: allows many patterns of curvature Splines Piecewise linear or piecewise polynomial Fractional polynomial 8

Choice of Transformation Scientific Questions Based on the following criteria Scientific issues Scientific question to be addresses Role of predictors in the scientific question Ease of interpretation Statistical issues Accuracy of model Precision of model Parsimony Most times: Comparing distribution of response across groups defined by predictor of interest Very often, other variables also need to be considered because Comparison is different in strata Groups being compared differ in other ways Less variability of response if we control for other variables Overfitting of data (inflate type I error) 9 30 Statistical Role Covariates other than the POI are included in the model as Effect modifiers Confounders Precision variables Model Building Criteria for inclusion of covariates Scientific question Statistical precision But sometimes doing data exploration 3 3

Where do YOU go from here? What a long, strange trip it s been Truckin, Grateful Dead 33 34 Possible Biostat Courses Well [I ve] worn out [my] welcome with random precision Shine on You Crazy Diamond, Pink Floyd Biost 54: Design of Medical Studies The scientific (and some statistical) issues involved in designing clinical trials Biost 57-58 satisfies prerequisites Taught by Tom Fleming in Spring 07 35 36

Possible Biostat Courses Biost 536: Analysis of Categorical Data More experience dealing with categorical response Possible Biostat Courses Biost 537: Analysis of Survival Data More experience dealing with censored time to event data Biost 57-58 satisfies prerequisites Offered in Autumn Biost 57-58 satisfies prerequisites Offered in Winter 37 38 Possible Biostat Courses Biost 540: Analysis of Correlated Response Data Dealing with longitudinal data analysis and other clustered observations Biost 57-58 is adequate preparation If you are planning to take Biost 536, 537, and 54, it might be best to take Biost 540 next year Taught by Patrick Heagerty in Spr 07 39