Building a score from a database. Ch. Mélot, MD, PhD, MSciBiostat, Intensive Care Department, Hôpital Universitaire Erasme
1 Building a score from a database. Ch. Mélot, MD, PhD, MSciBiostat, Intensive Care Department, Hôpital Universitaire Erasme. ESP, 20 February 2006
2 METHODOLOGY
3 Scientific experiment: Biological hypothesis → Clinical trial or experiment → Collection of data → Statistical analysis → Proof of the hypothesis. Data fishing, data dredging, data snooping: Collection of data → Search for statistical significance → Building of hypotheses.
4 Data mining, data dredging. Data mining, also known as knowledge discovery in databases (KDD), has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases". Used in the technical context of data warehousing and analysis, the term data mining is neutral. Data dredging (data fishing, data snooping) refers to the inappropriate (sometimes deliberately so) search for statistically significant relationships in large quantities of data. A key point is that one should not formulate a hypothesis as a result of seeing the data, at least not if the same data are then used as proof of the hypothesis.
5 Data mining, data dredging. If you want to work from data to hypotheses while avoiding the problems of data dredging, collect a data set and partition it into two subsets, A and B, placing the data items randomly in the two subsets. One subset (say, subset B) is examined for interesting hypotheses. Once a hypothesis has been formulated from subset B, it can be tested on subset A, since subset A was not used to construct it. Only when the hypothesis is also supported by subset A is it reasonable to believe that it might be valid.
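A minimal Python sketch of the random A/B partition described above. It assumes the records fit in memory and that an even split is acceptable (the slide does not fix the proportions); the function name and the patient IDs are hypothetical.

```python
import random

def split_for_hypothesis_testing(records, seed=None):
    """Randomly partition records into subset A (held back for testing)
    and subset B (explored to generate hypotheses).

    Assumption: an even split; the slide only requires random placement.
    """
    rng = random.Random(seed)
    shuffled = list(records)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    subset_a = shuffled[:half]   # confirmation set: not looked at while exploring
    subset_b = shuffled[half:]   # exploration set: used to formulate hypotheses
    return subset_a, subset_b

patients = list(range(1, 11))   # hypothetical patient IDs
a, b = split_for_hypothesis_testing(patients, seed=42)
print(len(a), len(b))
```

Fixing the seed makes the split reproducible, so the same subset A can be reserved before anyone looks at subset B.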
6 MODELS
7 MULTIVARIABLE REGRESSION
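8 MULTIVARIABLE REGRESSION If y = continuous variable: multiple regression, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$. If y = dichotomous variable: multivariable logistic regression, $y = \dfrac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}$, where y is the probability of the event.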
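9 MULTIVARIABLE REGRESSION If y = count of events during a given period of time: multivariable Poisson regression, $y = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$. If y = time to event: multivariable Cox regression, $h(t) = h_0(t)\, e^{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$, where $h(t)$ is the hazard and $h_0(t)$ the baseline hazard (the Cox model has no separate intercept: it is absorbed into $h_0(t)$).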
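10 Expression of the results If y = continuous variable: multiple regression, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$; $\beta_1$ = "slope" for the risk factor $x_1$ (change in y per unit increase in $x_1$, the other variables being held constant).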
11 Expression of the results If y = dichotomous variable: multivariable logistic regression, $\operatorname{logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$; $e^{\beta_1}$ = odds ratio for the risk factor $x_1$.
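A small numerical check of why $e^{\beta_1}$ is an odds ratio: with the other covariate held fixed, raising $x_1$ by one unit multiplies the odds of the event by $e^{\beta_1}$. The coefficients below are hypothetical, chosen only for illustration.

```python
import math

def logistic(z):
    """Probability e^z / (1 + e^z) for the linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients (not estimated from any data in these slides)
beta0, beta1, beta2 = -2.0, 0.7, 0.3
x2 = 1.0  # second covariate held constant

p_x1_0 = logistic(beta0 + beta1 * 0.0 + beta2 * x2)   # x1 = 0
p_x1_1 = logistic(beta0 + beta1 * 1.0 + beta2 * x2)   # x1 = 1
odds_ratio = odds(p_x1_1) / odds(p_x1_0)

print(f"odds ratio = {odds_ratio:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

The two printed numbers agree because the odds equal $e^{\text{linear predictor}}$, so the ratio of odds reduces to $e^{\beta_1}$ whatever the values of $\beta_0$, $\beta_2$ and $x_2$.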
12 Expression of the results If y = count of events during a given period of time ($t_i$): multivariable Poisson regression, $\ln(y / t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$; $e^{\beta_1}$ = relative risk of the occurrence of the event during the period of time for the risk factor $x_1$.
13 Expression of the results If y = time to event: multivariable Cox regression, $\ln\big(h(t) / h_0(t)\big) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$; $e^{\beta_1}$ = hazard ratio for the risk factor $x_1$.
14 MULTIVARIATE REGRESSION
15 MULTIVARIATE REGRESSION $y_j = \beta_{0j} + \beta_{1j} x_1 + \beta_{2j} x_2 + \beta_{3j} x_3$, for $j = 1, 2, 3$ (one equation for each dependent variable $y_1$, $y_2$, $y_3$).
16 MULTIVARIATE ANALYSIS [Figure: countries (Belgium-Luxembourg, Italy, Germany, Austria, Finland, Denmark, Holland, Portugal, Sweden, UK, Ireland, Switzerland, Spain, Norway, France) plotted on the same map as sedative and analgesic drugs (PROPOFOL, SUFENTANIL, MORPHINE, FENTANYL, MIDAZOLAM).] Soliman H.M., Mélot C., et al. Br. J. Anaesth. 2001;87:
17 THE MULTICOLLINEARITY PROBLEM Presence of multicollinearity is suggested when: the parameter estimates and the associated t- and p-values are highly unstable when variables are added to the model (lack of precision of the parameter estimates); selectivity bias is present (i.e., the parameter estimate for a variable differs depending on its order of entry into the model); examination of the correlation matrix reveals a high degree of correlation between some variables and a number of significant correlation coefficients (r > 0.8, i.e. r² > 0.64).
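The correlation-matrix criterion can be checked mechanically. This pure-Python sketch computes Pearson's r for every pair of columns and flags the pairs above the slide's threshold (r > 0.8, i.e. r² > 0.64). Using |r|, so that strong negative correlations are flagged too, is an assumption; the variable names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.8):
    """Return (name1, name2, r) for every pair of columns with |r| > threshold."""
    flagged = []
    for (n1, x), (n2, y) in combinations(columns.items(), 2):
        r = pearson_r(x, y)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged

data = {  # hypothetical values for five patients
    "height": [160, 170, 175, 180, 165],
    "weight": [55, 68, 72, 80, 60],
    "age": [70, 45, 60, 30, 55],
}
for n1, n2, r in collinear_pairs(data):
    print(f"{n1}-{n2}: r = {r:.2f}, r^2 = {r * r:.2f}")
```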
18 THE MULTICOLLINEARITY PROBLEM Determinant of the correlation matrix: lies in the interval [0, 1]; equals zero if perfect multicollinearity is present, and one if there is no multicollinearity. Bartlett's transformation converts this determinant into a χ² statistic that tests H0: Det = 1: $\chi^2 = -\left[T - 1 - \dfrac{2K + 5}{6}\right] \ln(\text{Det})$, with $K(K-1)/2$ degrees of freedom, where T = number of observations and K = number of variables.
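A sketch of Bartlett's test as stated on the slide, in pure Python: the determinant comes from Gaussian elimination and the χ² tail probability from the series for the regularized incomplete gamma function. These numerical methods are our choice, not the slide's, and the correlation matrix is hypothetical. Because the tail is computed as one minus the lower tail, extremely small p-values may come out as 0.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a          # series for the lower incomplete gamma
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def bartlett_sphericity(corr, n_obs):
    """Bartlett's test of H0: Det = 1 for a K x K correlation matrix."""
    k = len(corr)
    det = determinant(corr)
    if det <= 0.0:
        raise ValueError("determinant is 0: perfect multicollinearity")
    stat = -(n_obs - 1 - (2 * k + 5) / 6.0) * math.log(det)
    df = k * (k - 1) // 2
    return det, stat, df, chi2_sf(stat, df)

# Hypothetical correlation matrix of three predictors, T = 50 observations
corr = [[1.0, 0.9, 0.2],
        [0.9, 1.0, 0.1],
        [0.2, 0.1, 1.0]]
det, stat, df, p = bartlett_sphericity(corr, n_obs=50)
print(f"Det = {det:.3f}, chi2 = {stat:.1f}, df = {df}, p = {p:.2g}")
```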
19 THE MULTICOLLINEARITY PROBLEM What can be done in the presence of multicollinearity? Supplement the original data with more information (additional observations); combine variables into a single variable; reduce the dimensionality of the data: delete from the equation the variables whose parameter estimates are affected by multicollinearity (loss of relevant predictors, hence a dangerous procedure); use an ad hoc statistical procedure: principal components analysis.
20 AN EXAMPLE OF CORRELATION MATRIX [Table: correlation coefficients r between AGE, SEX, Height, Weight, BMP, FEV1, RV, FRC and TLC (diagonal = 1), with the threshold r > 0.8 (r² > 0.64) indicated.]
21 HOW TO BUILD AND VALIDATE A CLINICAL SCORE?
22 AN EXAMPLE QUESTION: How can we build an infection score usable in the intensive care setting? How can we validate the new score?
23 BUILDING STRATEGY 1st STEP: Criteria to define infection. Garner JS, Jarvis WR, Emori TG, Horan TC, Hughes JM. CDC definitions for nosocomial infections, 1988. Am J Infect Control 1988;16:
24 BUILDING STRATEGY 2nd STEP: Putative predictors easily collected in the intensive care setting: CRP (C-reactive protein), WBC (white blood cell count), RR (respiratory rate), HR (heart rate), TEMP (temperature), SOFA, APACHE II, DayMV, DayHD
25 BUILDING STRATEGY 3rd STEP: Collecting the data and creating the database (353 patients) Continuous variables: AGE, CRP, WBC, RR, HR, TEMP, DayMV, DayHD, SOFA, APACHE II,... Discrete variables: GENDER, INFECTION,...
26 BUILDING STRATEGY 4th STEP: Searching for predictors of infection (1 = yes, 0 = no) Simple logistic regression
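Screening each candidate predictor with simple logistic regression means fitting $\operatorname{logit}(p) = \beta_0 + \beta_1 x$ one variable at a time. A minimal sketch, assuming maximum likelihood by Newton-Raphson and a Wald test for $\beta_1$ (the slides do not say which software or test was used). The CRP values and outcomes are hypothetical, and the data must not be perfectly separated, otherwise the estimates diverge.

```python
import math

def fit_simple_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return b0, b1, SE(b1), Wald p."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # observed information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve information * step = score for the 2 x 2 case
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Var(b1) from the inverse information of the last iteration (converged)
    se1 = math.sqrt(h00 / det)
    z = b1 / se1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided Wald test
    return b0, b1, se1, p_value

crp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]        # hypothetical CRP, mg/100 ml
infection = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]   # hypothetical outcome (1 = yes)
b0, b1, se1, p = fit_simple_logistic(crp, infection)
print(f"constant = {b0:.3f}, CRP coefficient = {b1:.3f} (SE {se1:.3f}), p = {p:.3f}")
```

Running the same function on each column (WBC, RR, HR, ...) gives one constant, coefficient and p value per predictor, which is the layout of the screening table.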
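27 SIMPLE LOGISTIC REGRESSION $P(\text{Infect}) = \dfrac{e^{\beta_0 + \beta_1\,\text{CRP}}}{1 + e^{\beta_0 + \beta_1\,\text{CRP}}}$ [Table columns: Predictor, Constant, Variable, p; predictors: CRP (mg/100 ml), WBC (count/mm³), RR (cycles/min), HR (beats/min), TEMP (°C), SOFA (points), APACHE II (points), AGE (yrs), DayMV (days), DayHD (days).]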
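28 LOGISTIC REGRESSION [Figure: proportion of infection versus CRP (mg/100 ml), with the fitted curve $\pi(x) = \dfrac{e^{\beta_0 + \beta_1\,\text{CRP}}}{1 + e^{\beta_0 + \beta_1\,\text{CRP}}}$.]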
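29 LOGISTIC TRANSFORMATION $\pi(x) = \dfrac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$, so that $\operatorname{logit}[\pi(x)] = \ln\left[\dfrac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x$.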
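The logistic function and the logit are inverses of each other, which is why the S-shaped curve for the proportion becomes a straight line on the logit scale. A short sketch; the coefficients are hypothetical.

```python
import math

def logistic(z):
    """pi = e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """ln[p / (1 - p)], defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

beta0, beta1 = -3.0, 0.5   # hypothetical coefficients
for x in (0.0, 6.0, 12.0):
    z = beta0 + beta1 * x
    p = logistic(z)
    # logit(p) gives back the linear predictor z = beta0 + beta1 * x
    print(f"x = {x:4.1f}  pi(x) = {p:.3f}  logit = {logit(p):.3f}  z = {z:.3f}")
```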
30 LOGISTIC REGRESSION [Figure: logit of the proportion of infection versus CRP (mg/100 ml), with the fitted line $\operatorname{logit}[\pi(x)] = \beta_0 + \beta_1\,\text{CRP}$.]
31 BUILDING STRATEGY 5th STEP: Defining cut-off points for the continuous variables: LOWESS smoothing. Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. J Am Stat Assoc 1979;74:
32 LOWESS Smoothing: a locally weighted regression method for smoothing bivariate scattergrams. Its tension parameter indicates what percentage of the dataset's values should be included in each window for the smoothing: a higher number produces a tighter smooth (less responsive to local variation); a lower number produces a looser smooth (more strongly influenced by local variation). Cleveland WS. Robust Locally Weighted Regression and Smoothing Scatterplots. J Am Stat Assoc 1979;74:
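A simplified LOWESS in pure Python: for each point, a weighted straight-line fit over its nearest `frac × n` neighbours with Cleveland's tricube weights. Reading "tension = 66" as `frac = 0.66` is an assumption, this sketch omits the robustness iterations of Cleveland's full method, and the temperature data are hypothetical.

```python
import math

def lowess(x, y, frac=0.66):
    """Local linear regression with tricube weights (no robustness iterations).
    frac is the share of points used in each local window."""
    n = len(x)
    q = max(2, min(n, math.ceil(frac * n)))   # points per local window
    fitted = []
    for x0 in x:
        # half-width of the window: distance to the q-th nearest point
        h = sorted(abs(xi - x0) for xi in x)[q - 1] or 1e-12
        w = []
        for xi in x:
            u = abs(xi - x0) / h
            w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)   # tricube weight
        sw = sum(w)   # > 0 because x0 itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))   # local line evaluated at x0
    return fitted

temps = [36.5, 36.8, 37.0, 37.2, 37.4, 37.6, 37.9, 38.2, 38.5, 39.0]  # hypothetical
infected = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]                             # hypothetical
for t, s in zip(temps, lowess(temps, infected, frac=0.66)):
    print(f"{t:.1f} C -> smoothed proportion {s:.2f}")
```

With a 0/1 outcome the smooth estimates the local proportion of infection; a cut-off can then be read where the curve rises. Local linear fits may stray slightly outside [0, 1] at the ends.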
33 LOWESS SMOOTHING with SCATTERPLOT (tension = 66) [Figure: infection versus TEMPERATURE (°C) with the LOWESS smooth, used to define a temperature cut-off.]
34 LOWESS SMOOTHING with SCATTERPLOT (tension = 66) [Figure: infection versus WBC (count/mm³) with the LOWESS smooth, defining three levels: 1 = WBC < 5,000, 2 = 5,000-12,000, 3 = > 12,000.]
35 BUILDING STRATEGY 6th STEP: Creating iso-weighted dummy variables for each cut-off value of each variable
36 CREATING DUMMY VARIABLES WBC < 5,000: WBC_1 = 1, WBC_2 = 0; WBC 5,000-12,000: WBC_1 = 0, WBC_2 = 0 (reference level); WBC > 12,000: WBC_1 = 0, WBC_2 = 1. Number of levels - 1 = number of dummy variables.
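The WBC coding in Python. Treating exactly 5,000 and 12,000 as part of the 5,000-12,000 reference level is an assumption, and the function name is hypothetical.

```python
def wbc_dummies(wbc):
    """Code WBC (cells/mm3) into two dummy variables.

    Three levels need 3 - 1 = 2 dummies; 5,000-12,000 is the reference
    level, coded WBC_1 = 0 and WBC_2 = 0.
    """
    wbc_1 = 1 if wbc < 5000 else 0     # leukopenia
    wbc_2 = 1 if wbc > 12000 else 0    # leukocytosis
    return {"WBC_1": wbc_1, "WBC_2": wbc_2}

for value in (3500, 8000, 15000):
    print(value, wbc_dummies(value))
```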
37 BUILDING STRATEGY 7th STEP: Multiple logistic regression with all variables coded as 1 or 0. Each coefficient given by the logistic regression measures the relative weight of that level of the variable, while controlling for all other variables retained in the model.
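38 MULTIPLE LOGISTIC REGRESSION $P(\text{Infect}) = \dfrac{e^{z}}{1 + e^{z}}$, with $z = \beta_0 + \beta_1\,\text{TEMP\_1} + \beta_2\,\text{CRP\_1} + \beta_3\,\text{WBC\_1} + \beta_4\,\text{WBC\_2} + \beta_5\,\text{HR\_1} + \beta_6\,\text{HR\_2} + \beta_7\,\text{RR\_1} + \beta_8\,\text{SOFA\_1}$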
39 MULTIPLE LOGISTIC REGRESSION [Table: coefficient, SE and p for each dummy variable, with its cut-off: CRP_1 > 6 mg/100 ml; WBC_1 < 5,000/mm³; WBC_2 > 12,000/mm³; RR_1 > 25 cycles/min; HR_1 (beats/min); HR_2 > 140 beats/min; TEMP_1 > 37.5 °C; SOFA_1 > 5 points.]
40 BUILDING STRATEGY 8th STEP: Creating the INFECTION PROBABILITY SCORE. Each coefficient given by the logistic regression is transformed into a natural number.
41 CREATING THE SCORE [Table: LEVEL OF CERTAINTY OF INFECTION; rows: logistic regression coefficients, rounded coefficients, POINTS; columns: TEMP, CRP, WBC, HR, RR, SOFA.]
42 INFECTION PROBABILITY SCORE (IPS) [Table: POINTS assigned to each level: TEMP (°C) ≤ 37.5 or > 37.5; CRP (mg/100 ml) ≤ 6 or > 6; WBC (cells/mm³) 5,000-12,000, > 12,000 or < 5,000; HR (beats/min), top level > 140; RR (cycles/min) ≤ 25 or > 25; SOFA (points) ≤ 5 or > 5.] IPS varies from 0 to 26.
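The slides do not state the exact rule used to turn coefficients into points, so the sketch below assumes one common approach: multiply every coefficient by the same scale factor and round to the nearest integer, then add the points of each dummy variable equal to 1. The function names, the scale factor and the coefficients are hypothetical and are NOT the values estimated for the IPS.

```python
def coefficients_to_points(coefficients, scale=2.0):
    """Integer points from logistic-regression coefficients.

    Assumption: points = round(coefficient * scale) for every variable.
    """
    return {name: round(beta * scale) for name, beta in coefficients.items()}

def total_score(points, patient):
    """Sum the points of every dummy variable equal to 1 for this patient."""
    return sum(points[name] for name, value in patient.items() if value == 1)

# Hypothetical coefficients, for illustration only
coefs = {"TEMP_1": 0.9, "CRP_1": 1.6, "WBC_2": 0.6}
points = coefficients_to_points(coefs)
print(points)
print(total_score(points, {"TEMP_1": 1, "CRP_1": 0, "WBC_2": 1}))
```

A common scale keeps the relative weights of the levels roughly proportional to their coefficients, which is the property the score relies on.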
43 BUILDING STRATEGY 9th STEP: Performance of the score. Discrimination: correct prediction by the score. Calibration (reliability): agreement between predicted and observed event rates.
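Discrimination is usually summarized by the area under the ROC curve, which equals the probability that a randomly chosen infected patient has a higher score than a randomly chosen non-infected patient (ties counted as one half). A pure-Python sketch of that pairwise definition; the IPS values are hypothetical.

```python
def roc_auc(scores_infected, scores_not_infected):
    """ROC area as the proportion of (infected, non-infected) pairs in which
    the infected patient scores higher; ties count 1/2."""
    wins = 0.0
    for s_pos in scores_infected:
        for s_neg in scores_not_infected:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_infected) * len(scores_not_infected))

infected = [18, 15, 12, 20, 9]          # hypothetical IPS values
not_infected = [6, 10, 12, 4, 8, 14]    # hypothetical IPS values
print(f"ROC area = {roc_auc(infected, not_infected):.3f}")
```

The double loop is O(n × m), which is fine for a few hundred patients; a rank-based computation would scale better.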
44 DISCRIMINATION: ROC CURVE ON THE BUILDING DATA SET (353 patients) [Figure: ROC curve, sensitivity (%) versus specificity (%), with the area under the curve and its CI.] At IPS > 14: SENSITIVITY = 73.6 %, SPECIFICITY = 77.9 %. PREVALENCE OF INFECTION: 91 / 353 = 25.8 %. POSITIVE PREDICTIVE VALUE (IPS > 14) = 53.6 %. NEGATIVE PREDICTIVE VALUE (IPS ≤ 14) = 89.5 %.
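The figures for the building set can be reproduced from a 2 × 2 table. The counts below (67 true positives, 24 false negatives, 58 false positives, 204 true negatives) are not printed on the slide: they are back-calculated as the only whole numbers consistent with 91 infected patients out of 353 and the stated sensitivity, specificity and predictive values.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2 x 2 table measures for a score dichotomized at a cut-off."""
    return {
        "sensitivity": tp / (tp + fn),   # infected patients above the cut-off
        "specificity": tn / (tn + fp),   # non-infected patients at or below it
        "ppv": tp / (tp + fp),           # infected among those above the cut-off
        "npv": tn / (tn + fn),           # non-infected among those at or below it
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
    }

# Counts back-calculated from the reported percentages for IPS > 14
m = diagnostic_metrics(tp=67, fn=24, fp=58, tn=204)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f} %")
```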
45 CALIBRATION [Table columns: Infected patients, n, Range Prob, Mean Prob, Expected number, Observed number, Observed event rate, Expected event rate, C stat.] Hosmer-Lemeshow test: C = 9.4, df = 8, p = 0.308
46 CALIBRATION [Figure: observed event rate versus expected event rate; Hosmer-Lemeshow test (C = 9.4, p = 0.308).]
47 PREDICTIVE VALUES OF IPS [Figure: positive and negative predictive values (%) as a function of prevalence (%).]
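Predictive values depend on prevalence even when sensitivity and specificity are fixed, which is what a plot of PPV and NPV against prevalence shows. A Bayes'-theorem sketch using the sensitivity (73.6 %) and specificity (77.9 %) reported for IPS > 14 on the building set; the other prevalences are arbitrary illustration points.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV of a dichotomized score at a given prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence                # true positives per patient
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.10, 0.258, 0.50):
    ppv, npv = predictive_values(0.736, 0.779, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At the building-set prevalence of 25.8 % this gives values within a few tenths of a percent of the reported 53.6 % and 89.5 % (the inputs are rounded percentages); as prevalence rises, PPV increases and NPV falls.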
48 VALIDATION SET 140 patients
49 ROC CURVE ON THE VALIDATION DATA SET (140 patients) [Figure: ROC curve, sensitivity (%) versus specificity (%), with the area under the curve and its CI.] At IPS > 13: SENSITIVITY = 90.2 %, SPECIFICITY = 76.8 %. PREVALENCE OF INFECTION: 41 / 140 = 29.3 %. POSITIVE PREDICTIVE VALUE (IPS > 15) = 61.7 %. NEGATIVE PREDICTIVE VALUE (IPS ≤ 15) = 95.0 %.
50 Relationship between discrimination and calibration.
51 Diamond GA, J Clin Epidemiol 1992;45:85-89
52 Diamond GA, J Clin Epidemiol 1992;45:85-89
More informationExtensions of Cox Model for Non-Proportional Hazards Purpose
PhUSE Annual Conference 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Author: Jadwiga Borucka PAREXEL, Warsaw, Poland Brussels 13 th - 16 th October 2013 Presentation Plan
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationShortfalls of Panel Unit Root Testing. Jack Strauss Saint Louis University. And. Taner Yigit Bilkent University. Abstract
Shortfalls of Panel Unit Root Testing Jack Strauss Saint Louis University And Taner Yigit Bilkent University Abstract This paper shows that (i) magnitude and variation of contemporaneous correlation are
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More informationTrendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues
Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +
More informationInterval-Based Composite Indicators
University of Rome Niccolo Cusano Conference of European Statistics Stakeholders 22 November 2014 1 Building Composite Indicators 2 (ICI) 3 Constructing ICI 4 Application on real data Composite Indicators
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationChapter 10 Logistic Regression
Chapter 10 Logistic Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Logistic Regression Extends idea of linear regression to situation where outcome
More informationLogistic Regression in R. by Kerry Machemer 12/04/2015
Logistic Regression in R by Kerry Machemer 12/04/2015 Linear Regression {y i, x i1,, x ip } Linear Regression y i = dependent variable & x i = independent variable(s) y i = α + β 1 x i1 + + β p x ip +
More informationThe Information Content of Capacity Utilisation Rates for Output Gap Estimates
The Information Content of Capacity Utilisation Rates for Output Gap Estimates Michael Graff and Jan-Egbert Sturm 15 November 2010 Overview Introduction and motivation Data Output gap data: OECD Economic
More information7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between
7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation
More information10/27/2015. Content. Well-homogenized national datasets. Difference (national global) BEST (1800) Difference BEST (1911) Difference GHCN & GISS (1911)
Content Is the global mean temperature trend too low? Victor Venema, Phil Jones, Ralf Lindau, Tim Osborn and numerous collaborators @VariabilityBlog variable-variability.blogspot.com 1. Comparison trend
More informationISO INTERNATIONAL STANDARD. Thermal bridges in building construction Linear thermal transmittance Simplified methods and default values
INTERNATIONAL STANDARD ISO 14683 First edition 1999-06-15 Thermal bridges in building construction Linear thermal transmittance Simplified methods and default values Points thermiques dans les bâtiments
More informationBIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY
BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1
More information