Building a score from a database. Ch. Mélot, MD, PhD, MSciBiostat, Intensive Care Department, Hôpital Universitaire Erasme

1 Building a score from a database. Ch. Mélot, MD, PhD, MSciBiostat, Intensive Care Department, Hôpital Universitaire Erasme. ESP, 20 February 2006

2 METHODOLOGY

3 Scientific experiment: Biological hypothesis -> Clinical trial or experiment -> Collection of data -> Statistical analysis -> Proof of the hypothesis
Data fishing, data dredging, data snooping: Collection of data -> Search for statistical significance -> Building of hypotheses

4 Data mining, data dredging. Data mining, also known as knowledge discovery in databases (KDD), has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and as "the science of extracting useful information from large data sets or databases". Used in the technical context of data warehousing and analysis, the term data mining is neutral. Data dredging (data fishing, data snooping) refers to the inappropriate (sometimes deliberately so) search for statistically significant relationships in large quantities of data. A key point is that one should not formulate a hypothesis as a result of seeing the data, at least not if the same data are then used as proof of the hypothesis.

5 Data mining, data dredging. If you want to work from data to hypotheses while avoiding the problems of data dredging, you need to collect a data set, then partition it into two subsets, A and B, with data items randomly placed in the two subsets. One subset (say, subset B) is examined for interesting hypotheses. Once a hypothesis has been formulated by examining subset B, the hypothesis can be tested on subset A, since subset A was not used to construct the hypothesis. Only where such a hypothesis is also supported by subset A is it reasonable to believe that the hypothesis might be valid.
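The random partition described on this slide can be sketched as follows. This is a minimal illustration, not the procedure used for the study: the function name `split_ab`, the 50/50 fraction and the seed are assumptions introduced here.

```python
import random


def split_ab(records, fraction_b=0.5, seed=2006):
    """Randomly partition records into subset A (testing) and subset B (exploration).

    fraction_b and seed are illustrative assumptions, not values from the slides.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(records)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_b = int(round(len(shuffled) * fraction_b))
    subset_b = shuffled[:n_b]          # examined to formulate hypotheses
    subset_a = shuffled[n_b:]          # kept aside to test those hypotheses
    return subset_a, subset_b


# Hypothetical patient identifiers 1..20
patients = list(range(1, 21))
subset_a, subset_b = split_ab(patients)
```

Every record lands in exactly one subset, so a hypothesis built on subset B is tested on data that played no part in constructing it.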

6 MODELS

7 MULTIVARIABLE REGRESSION

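8 MULTIVARIABLE REGRESSION
If y = continuous variable: multiple regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
If y = dichotomous variable: multivariable logistic regression
$y = \dfrac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}$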

9 MULTIVARIABLE REGRESSION
If y = count of events during a given period of time: multivariable Poisson regression
$y = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$
If y = time to event: multivariable Cox regression
$y = h_0(t)\, e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$

10 Expression of the results
If y = continuous variable: multiple regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$\beta_1$ = "slope" for the risk factor $x_1$

11 Expression of the results
If y = dichotomous variable: multivariable logistic regression
$\mathrm{Logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$e^{\beta_1}$ = odds ratio for the risk factor $x_1$

12 Expression of the results
If y = count of events during a given period of time ($t_i$): multivariable Poisson regression
$\ln(y / t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$e^{\beta_1}$ = relative risk of the occurrence of the event during the period of time

13 Expression of the results
If y = time to event: multivariable Cox regression
$\ln(y / h_0(t)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$e^{\beta_1}$ = hazard ratio for the risk factor $x_1$
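For the logistic, Poisson and Cox models, the reported effect measure (odds ratio, relative risk, hazard ratio) is the exponential of the coefficient. The sketch below shows that conversion. The helper name `ratio_from_coefficient`, the example coefficient and its standard error are hypothetical, and the Wald-type 95 % interval is a common convention assumed here rather than something stated on the slides.

```python
import math


def ratio_from_coefficient(beta, se=None, z=1.96):
    """Return exp(beta) and, when a standard error is given, a Wald-type interval.

    exp(beta) is an odds ratio (logistic), a relative risk (Poisson) or a
    hazard ratio (Cox), depending on the model that produced beta.
    """
    estimate = math.exp(beta)
    if se is None:
        return estimate, None
    # Interval computed on the coefficient scale, then exponentiated (assumption)
    return estimate, (math.exp(beta - z * se), math.exp(beta + z * se))


# Hypothetical coefficient: beta = ln(2) corresponds to a ratio of 2
ratio, interval = ratio_from_coefficient(math.log(2.0), se=0.25)
```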

14 MULTIVARIATE REGRESSION

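15 MULTIVARIATE REGRESSION
$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} \beta_{01} & \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{02} & \beta_{12} & \beta_{22} & \beta_{32} \\ \beta_{03} & \beta_{13} & \beta_{23} & \beta_{33} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}$
where $\beta_{ij}$ is the coefficient of predictor $x_i$ for the outcome $y_j$.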

16 MULTIVARIATE ANALYSIS [Figure: plot positioning countries (Belgium-Luxembourg, Italy, Germany, Austria, Finland, Denmark, Holland, Portugal, Sweden, UK, Ireland, Switzerland, Spain, Norway, France) relative to drugs (PROPOFOL, SUFENTANIL, MORPHINE, FENTANYL, MIDAZOLAM).] Soliman H.M., Mélot C., et al. Br. J. Anaesth. 2001;87:

17 THE MULTICOLLINEARITY PROBLEM
Presence of multicollinearity is suggested when:
- the parameter estimates and the associated t- and p-values are highly unstable when variables are added to the model (lack of precision of the parameter estimates)
- selectivity bias is present (i.e., the parameter estimate for a variable differs depending on its order of entry in the model)
- examination of the correlation matrix reveals a high degree of correlation between some variables and a number of significant correlation coefficients (r > 0.8, i.e. r² > 0.64)

18 THE MULTICOLLINEARITY PROBLEM
Determinant of the correlation matrix (Det): lies in the interval [0, 1]; equals zero if perfect multicollinearity is present, and unity if there is no multicollinearity.
Bartlett's transformation converts this determinant into a χ² statistic which tests H0: Det = 1
$\chi^2 = -\left[ T - 1 - \dfrac{2K + 5}{6} \right] \ln(\mathrm{Det})$ with $K(K-1)/2$ degrees of freedom
T = number of observations, K = number of variables
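The determinant and Bartlett's χ² statistic can be computed as in the sketch below. It is an illustration only: the 3 x 3 correlation matrix is made up, the function names are introduced here, and reusing T = 353 (the size of the building data set) with this toy matrix is an assumption.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0                      # singular: perfect multicollinearity
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_statistic(corr, n_obs):
    """Return (Det, chi-square, degrees of freedom) for a correlation matrix."""
    k = len(corr)
    det = determinant(corr)
    df = k * (k - 1) // 2
    if det <= 0.0:
        return det, math.inf, df            # ln(Det) is undefined at Det = 0
    chi2 = -(n_obs - 1 - (2 * k + 5) / 6.0) * math.log(det)
    return det, chi2, df


# Hypothetical correlation matrix with one strongly correlated pair (r = 0.9)
corr = [[1.0, 0.9, 0.2],
        [0.9, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
det, chi2, df = bartlett_statistic(corr, n_obs=353)
```

A determinant close to 0 gives a large χ² and argues against H0: Det = 1; an identity matrix gives Det = 1 and χ² = 0.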

19 THE MULTICOLLINEARITY PROBLEM
What can be done in the presence of multicollinearity?
- Supplementation of the original data with more information (additional observations)
- Combination of variables into a single variable
- Reduction of the dimensionality of the data: deletion from the equation of variables whose parameter estimates are affected by multicollinearity (loss of relevant predictors -> dangerous procedure), or an ad hoc statistical procedure: principal components analysis

20 AN EXAMPLE OF CORRELATION MATRIX [Table: correlation coefficients r between AGE, SEX, Height, Weight, BMP, FEV1, RV, FRC and TLC; numeric values not recoverable. Highlighted entries: r > 0.8, r² > 0.64.]

21 HOW TO BUILD AND VALIDATE A CLINICAL SCORE?

22 AN EXAMPLE QUESTION: How can we build an infection score usable in the intensive care setting? How can we validate the new score?

23 BUILDING STRATEGY 1st STEP: Criteria to define infection. Garner JS, Jarvis WR, Emori TG, Horan TC, Hughes JM. CDC definitions for nosocomial infections, 1988. Am J Infect Control 1988;16:

24 BUILDING STRATEGY 2nd STEP: Putative predictors easily collected in the intensive care setting: CRP, WBC, RR, HR, TEMP, SOFA, APACHE II, DayMV, DayHD

25 BUILDING STRATEGY 3rd STEP: Collecting the data and creating the database (353 patients) Continuous variables: AGE, CRP, WBC, RR, HR, TEMP, DayMV, DayHD, SOFA, APACHE II,... Discrete variables: GENDER, INFECTION,...

26 BUILDING STRATEGY 4th STEP: Searching for predictors of infection (1 = yes, 0 = no) Simple logistic regression

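27 SIMPLE LOGISTIC REGRESSION
$\mathrm{Infect} = \dfrac{e^{\beta_0 + \beta_1\,\mathrm{CRP}}}{1 + e^{\beta_0 + \beta_1\,\mathrm{CRP}}}$ (fitted values of $\beta_0$ and $\beta_1$ not recoverable)
[Table: one row per predictor, with columns Constant, Variable (coefficient) and p; numeric values not recoverable. Predictors: CRP, mg/100 ml; WBC, count/mm³; RR, cycle/min; HR, beat/min; TEMP, °C; SOFA, points; APACHE II, points; AGE, yrs; DayMV, days; DayHD, days.]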

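28 LOGISTIC REGRESSION [Figure: proportion of infection against CRP (mg/100 ml), with the fitted curve $\pi(x) = \dfrac{e^{\beta_0 + \beta_1\,\mathrm{CRP}}}{1 + e^{\beta_0 + \beta_1\,\mathrm{CRP}}}$; coefficient values not recoverable.]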

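29 LOGISTIC TRANSFORMATION
$\pi(x) = \dfrac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
$\mathrm{Logit}[\pi(x)] = \ln\left[ \dfrac{\pi(x)}{1 - \pi(x)} \right] = \beta_0 + \beta_1 x$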
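The two formulas on this slide are inverses of each other, which the short sketch below checks numerically. The coefficient values are hypothetical (the fitted values from the study are not available here), and the function names are introduced for illustration.

```python
import math


def logistic(beta0, beta1, x):
    """pi(x) = e^(b0 + b1 x) / (1 + e^(b0 + b1 x))."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1.0 + math.exp(z))


def logit(p):
    """Logit[p] = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and predictor value, for illustration only
beta0, beta1, crp = -2.0, 0.1, 10.0
probability = logistic(beta0, beta1, crp)
linear_predictor = logit(probability)   # recovers beta0 + beta1 * crp
```

The logit turns the S-shaped probability curve into a straight line in x, which is why the coefficients are estimated and interpreted on the logit scale.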

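30 LOGISTIC REGRESSION [Figure: logit of the proportion of infection against CRP (mg/100 ml), with the fitted line $\mathrm{Logit}[\pi(x)] = \beta_0 + \beta_1\,\mathrm{CRP}$; coefficient values not recoverable.]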

31 BUILDING STRATEGY 5th STEP: Defining cut-off points for the continuous variables: LOWESS smoothing. Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. J Am Stat Assoc 1979;74:

32 LOWESS Smoothing. A locally weighted regression method for smoothing bivariate scattergrams. Its tension parameter indicates what percentage of the dataset's values should be included in each window for the smoothing. A higher number produces a tighter smooth (with less response to local variations). A lower number produces a looser smooth (more strongly influenced by local variations). Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. J Am Stat Assoc 1979;74:

33 LOWESS SMOOTHING with SCATTERPLOT [Figure: infection against temperature (°C), LOWESS smooth with tension = 66; a temperature cut-off is marked, value not recoverable here.]

34 LOWESS SMOOTHING with SCATTERPLOT [Figure: infection against WBC (count/mm³), LOWESS smooth with tension = 66; three zones are marked: WBC < 5,000, 5,000-12,000 and > 12,000.]

35 BUILDING STRATEGY 6th STEP: Creating iso-weighted dummy variables for each cut-off value of each variable

36 CREATING DUMMY VARIABLES
WBC range: WBC_1, WBC_2
WBC < 5,000: 1, 0
WBC 5,000-12,000: 0, 0
WBC > 12,000: 0, 1
Number of levels - 1 = number of dummy variables
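The coding of the three WBC levels into two dummy variables can be sketched as follows. The cut-offs (WBC_1 for < 5,000/mm³, WBC_2 for > 12,000/mm³) are those listed for the multiple logistic regression; the function name `wbc_dummies` is introduced here for illustration.

```python
def wbc_dummies(wbc):
    """Code a WBC count (cells/mm3) as the pair (WBC_1, WBC_2).

    The middle range 5,000-12,000 is the reference level and is coded (0, 0),
    so three levels need only two dummy variables.
    """
    wbc_1 = 1 if wbc < 5000 else 0
    wbc_2 = 1 if wbc > 12000 else 0
    return wbc_1, wbc_2


low = wbc_dummies(3000)
reference = wbc_dummies(8000)
high = wbc_dummies(15000)
```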

37 BUILDING STRATEGY 7th STEP: Multiple logistic regression with all variables set to 1 or 0. Each coefficient given by the logistic regression is a measure of the relative weight of the level of the variable while controlling for all other variables retained in the model.

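38 MULTIPLE LOGISTIC REGRESSION
$\mathrm{Infect} = \dfrac{e^{z}}{1 + e^{z}}$, with $z = \beta_0 + \beta_1\,\mathrm{TEMP\_1} + \beta_2\,\mathrm{CRP\_1} + \beta_3\,\mathrm{WBC\_1} + \beta_4\,\mathrm{WBC\_2} + \beta_5\,\mathrm{HR\_1} + \beta_6\,\mathrm{HR\_2} + \beta_7\,\mathrm{RR\_1} + \beta_8\,\mathrm{SOFA\_1}$ (fitted coefficient values not recoverable)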

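39 MULTIPLE LOGISTIC REGRESSION
[Table: columns Dummy variable, Cut-off, Coefficient, SE, p; coefficient, SE and p values not recoverable.]
CRP_1: > 6 mg/100 ml
WBC_1: < 5,000/mm³
WBC_2: > 12,000/mm³
RR_1: > 25 c/min
HR_1: [cut-off not recoverable] b/min
HR_2: > 140 b/min
TEMP_1: > 37.5 °C
SOFA_1: > 5 points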

40 BUILDING STRATEGY 8th STEP: Creating the INFECTION PROBABILITY SCORE. Each coefficient given by the logistic regression is transformed into a natural number.

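41 CREATING THE SCORE: LEVEL OF CERTAINTY OF INFECTION [Table: for TEMP, CRP, WBC, HR, RR and SOFA, the logistic regression coefficients, the rounded coefficients and the resulting POINTS; numeric values and the scaling factors not recoverable.]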

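42 INFECTION PROBABILITY SCORE (IPS)
[Table: points attributed to each level; point values not recoverable.]
TEMP, °C: ≤ 37.5; > 37.5
CRP, mg/100 ml: ≤ 6; > 6
WBC, cells/mm³: 5,000-12,000; > 12,000; < 5,000
HR, beats/min: [lower ranges not recoverable]; > 140
RR, cycles/min: ≤ 25; > 25
SOFA, points: ≤ 5; > 5
IPS varies from 0 to 26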

43 BUILDING STRATEGY 9th STEP: Performance of the score. Discrimination: correct prediction by the score. Calibration/reliability.

44 DISCRIMINATION: ROC CURVE ON THE BUILDING DATA SET (353 patients)
[Figure: ROC curve, sensitivity against specificity; area under the curve and its confidence interval not recoverable.]
Cut-off IPS > 14: SENSITIVITY = 73.6 %, SPECIFICITY = 77.9 %
PREVALENCE OF INFECTION: 91 / 353 = 25.8 %
POSITIVE PREDICTIVE VALUE (IPS > 14) = 53.6 %
NEGATIVE PREDICTIVE VALUE (IPS ≤ 14) = 89.5 %
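The four figures on this slide all derive from one 2 x 2 table at the chosen cut-off. The sketch below shows the arithmetic. The cell counts (67, 24, 58, 204) are not given on the slide: they are an assumption, reconstructed here so that they reproduce the reported percentages and the 91 / 353 prevalence.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, predictive values and prevalence, in percent."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": 100.0 * tp / (tp + fn),   # infected patients above the cut-off
        "specificity": 100.0 * tn / (tn + fp),   # non-infected patients at or below it
        "ppv": 100.0 * tp / (tp + fp),           # infected among those above the cut-off
        "npv": 100.0 * tn / (tn + fn),           # non-infected among those at or below it
        "prevalence": 100.0 * (tp + fn) / total,
    }


# Assumed counts: 91 infected (67 + 24) and 262 non-infected (58 + 204) patients
metrics = diagnostic_metrics(tp=67, fn=24, fp=58, tn=204)
rounded = {name: round(value, 1) for name, value in metrics.items()}
```

Sensitivity and specificity are properties of the score at a given cut-off, whereas the predictive values also depend on how common infection is in the population studied.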

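45 CALIBRATION [Table: infected patients per group of predicted probability, with columns n, range of probability, mean probability, expected number, observed number, observed event rate, expected event rate and C statistic; numeric values not recoverable.] Hosmer-Lemeshow test: C = 9.4, df = 8, p = 0.308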

46 CALIBRATION [Figure: observed event rate against expected event rate.] Hosmer-Lemeshow test (C = 9.4, p = 0.308)

47 PREDICTIVE VALUES OF IPS [Figure: positive and negative predictive values (%) against prevalence (%).]
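How predictive values change with prevalence follows from Bayes' theorem. The sketch below is an illustration under the assumption that sensitivity and specificity stay fixed at the values reported for the building data set (73.6 % and 77.9 %); the function name and the grid of prevalences are introduced here.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' theorem.

    All arguments and results are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values reported for the building data set, expressed as proportions
ppv, npv = predictive_values(0.736, 0.779, 0.258)

# Illustrative grid of prevalences: PPV rises and NPV falls as prevalence grows
curve = [(p, predictive_values(0.736, 0.779, p)) for p in (0.05, 0.25, 0.50, 0.75)]
```

At the observed prevalence this reproduces, up to rounding, the predictive values reported for the building data set.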

48 VALIDATION SET 140 patients

49 ROC CURVE ON THE VALIDATION DATA SET (140 patients)
[Figure: ROC curve, sensitivity against specificity; area under the curve and its confidence interval not recoverable.]
Cut-off IPS > 13: SENSITIVITY = 90.2 %, SPECIFICITY = 76.8 %
PREVALENCE OF INFECTION: 41 / 140 = 29.3 %
POSITIVE PREDICTIVE VALUE (IPS > 15) = 61.7 %
NEGATIVE PREDICTIVE VALUE (IPS ≤ 15) = 95.0 %

50 Relationship between discrimination and calibration.

51 Diamond GA, J Clin Epidemiol 1992;45:85-89

52 Diamond GA, J Clin Epidemiol 1992;45:85-89


The Changing Nature of Gender Selection into Employment: Europe over the Great Recession The Changing Nature of Gender Selection into Employment: Europe over the Great Recession Juan J. Dolado 1 Cecilia Garcia-Peñalosa 2 Linas Tarasonis 2 1 European University Institute 2 Aix-Marseille School

More information

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health Meta-analysis 21 May 2014 www.biostat.ku.dk/~pka Per Kragh Andersen, Biostatistics, Dept. Public Health pka@biostat.ku.dk 1 Meta-analysis Background: each single study cannot stand alone. This leads to

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Annotated Exam of Statistics 6C - Prof. M. Romanazzi

Annotated Exam of Statistics 6C - Prof. M. Romanazzi 1 Università di Venezia - Corso di Laurea Economics & Management Annotated Exam of Statistics 6C - Prof. M. Romanazzi March 17th, 2015 Full Name Matricola Total (nominal) score: 30/30 (2/30 for each question).

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

UNIVERSITY OF OTAGO EXAMINATIONS 2008

UNIVERSITY OF OTAGO EXAMINATIONS 2008 UNIVERSITY OF OTAGO EXAMINATIONS 2008 STATISTICS Paper STAT 242/342 Multivariate Methods (TIME ALLOWED: THREE HOURS) This examination paper comprises 25 pages. Candidates should answer questions as follows:

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

MATH ASSIGNMENT 2: SOLUTIONS

MATH ASSIGNMENT 2: SOLUTIONS MATH 204 - ASSIGNMENT 2: SOLUTIONS (a) Fitting the simple linear regression model to each of the variables in turn yields the following results: we look at t-tests for the individual coefficients, and

More information

Weighted Voting Games

Weighted Voting Games Weighted Voting Games Gregor Schwarz Computational Social Choice Seminar WS 2015/2016 Technische Universität München 01.12.2015 Agenda 1 Motivation 2 Basic Definitions 3 Solution Concepts Core Shapley

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Psych 230. Psychological Measurement and Statistics

Psych 230. Psychological Measurement and Statistics Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State

More information

Sample Size/Power Calculation by Software/Online Calculators

Sample Size/Power Calculation by Software/Online Calculators Sample Size/Power Calculation by Software/Online Calculators May 24, 2018 Li Zhang, Ph.D. li.zhang@ucsf.edu Associate Professor Department of Epidemiology and Biostatistics Division of Hematology and Oncology

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Extensions of Cox Model for Non-Proportional Hazards Purpose

Extensions of Cox Model for Non-Proportional Hazards Purpose PhUSE Annual Conference 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Author: Jadwiga Borucka PAREXEL, Warsaw, Poland Brussels 13 th - 16 th October 2013 Presentation Plan

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Shortfalls of Panel Unit Root Testing. Jack Strauss Saint Louis University. And. Taner Yigit Bilkent University. Abstract

Shortfalls of Panel Unit Root Testing. Jack Strauss Saint Louis University. And. Taner Yigit Bilkent University. Abstract Shortfalls of Panel Unit Root Testing Jack Strauss Saint Louis University And Taner Yigit Bilkent University Abstract This paper shows that (i) magnitude and variation of contemporaneous correlation are

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

Interval-Based Composite Indicators

Interval-Based Composite Indicators University of Rome Niccolo Cusano Conference of European Statistics Stakeholders 22 November 2014 1 Building Composite Indicators 2 (ICI) 3 Constructing ICI 4 Application on real data Composite Indicators

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Chapter 10 Logistic Regression

Chapter 10 Logistic Regression Chapter 10 Logistic Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Logistic Regression Extends idea of linear regression to situation where outcome

More information

Logistic Regression in R. by Kerry Machemer 12/04/2015

Logistic Regression in R. by Kerry Machemer 12/04/2015 Logistic Regression in R by Kerry Machemer 12/04/2015 Linear Regression {y i, x i1,, x ip } Linear Regression y i = dependent variable & x i = independent variable(s) y i = α + β 1 x i1 + + β p x ip +

More information

The Information Content of Capacity Utilisation Rates for Output Gap Estimates

The Information Content of Capacity Utilisation Rates for Output Gap Estimates The Information Content of Capacity Utilisation Rates for Output Gap Estimates Michael Graff and Jan-Egbert Sturm 15 November 2010 Overview Introduction and motivation Data Output gap data: OECD Economic

More information

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between 7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation

More information

10/27/2015. Content. Well-homogenized national datasets. Difference (national global) BEST (1800) Difference BEST (1911) Difference GHCN & GISS (1911)

10/27/2015. Content. Well-homogenized national datasets. Difference (national global) BEST (1800) Difference BEST (1911) Difference GHCN & GISS (1911) Content Is the global mean temperature trend too low? Victor Venema, Phil Jones, Ralf Lindau, Tim Osborn and numerous collaborators @VariabilityBlog variable-variability.blogspot.com 1. Comparison trend

More information

ISO INTERNATIONAL STANDARD. Thermal bridges in building construction Linear thermal transmittance Simplified methods and default values

ISO INTERNATIONAL STANDARD. Thermal bridges in building construction Linear thermal transmittance Simplified methods and default values INTERNATIONAL STANDARD ISO 14683 First edition 1999-06-15 Thermal bridges in building construction Linear thermal transmittance Simplified methods and default values Points thermiques dans les bâtiments

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information