Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System


1 PREP Course #13: Introduction to Exploratory Data Analysis and Data Transformations (Part 2) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

2 CME Disclosure Statement
The North Shore-LIJ Health System adheres to the ACCME's new Standards for Commercial Support. Any individuals in a position to control the content of a CME activity, including faculty, planners, and managers, are required to disclose all financial relationships with commercial interests. All identified potential conflicts of interest are thoroughly vetted by North Shore-LIJ for fair balance and scientific objectivity and to ensure appropriateness of patient care recommendations. Course Director and Course Planners Kevin Tracey, MD, Cynthia Hahn, Emmelyn Kim, MPH, and Tina Chuck, MPH have nothing to disclose. Martin L. Lesser, PhD, EMT-CC has nothing to disclose.

3 Why Transform* Data?
1. Classical Inference
   a. To achieve homoscedasticity (ANOVA and the t-test do not work well with unequal variances)
   b. To achieve normality
   c. To straighten out plots
   d. To conform to known physical laws
2. Exploratory Data Analysis (EDA)
   a. To symmetrize/normalize
   b. To explore data
   c. To compare distributions
   d. To linearize plots
   e. To create confusion (??)
* EDAers use the word "re-express".

4 Variance Stabilization
Initial Problem: To compare locations of several distributions. However, this can be problematic when the spreads of the distributions are different. We would like spread to be (fairly) independent of location. We can use transformations to create distributions where location and spread are unrelated.

5 Variance as a Function of the Mean
Suppose $\operatorname{Var} X = f(\mu)$ and let $Y = \int^{X} \frac{dt}{\sqrt{f(t)}}$. Then $\operatorname{Var} Y \approx \frac{\operatorname{Var} X}{f(\mu)} = 1$.*
* Bartlett, 1947; Snedecor and Cochran, p. 325

6 Well Known Examples
1. $\operatorname{Var} X = c\mu$ (e.g. Poisson): $Y = \sqrt{X}$
2. $\operatorname{Var} X = c\mu^2$ (or, $\operatorname{sd}(X) = k\mu$): $Y = \log X$
3. $\operatorname{Var} X = c\mu(A - \mu)$ (e.g. binomial): $Y = \sin^{-1}\sqrt{X/A}$
* Such transformations render the variance independent of μ.
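A minimal sketch of the Poisson case, assuming we simply sum the Poisson probabilities to get exact moments (the function names are illustrative, not from the slides). It shows that Var(X) grows with the mean while Var(√X) stays near 1/4.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def variance_under(transform, lam):
    """Variance of transform(X) for X ~ Poisson(lam), by direct summation."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)  # far enough into the tail
    mean = 0.0
    second_moment = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, lam)
        t = transform(k)
        mean += p * t
        second_moment += p * t * t
    return second_moment - mean * mean

for lam in (5, 20, 80):
    raw = variance_under(float, lam)        # variance on the original scale
    root = variance_under(math.sqrt, lam)   # variance after the square root
    print(f"mean={lam:3d}  Var(X)={raw:7.3f}  Var(sqrt X)={root:.3f}")
```

The raw variance tracks the mean, while the square-root scale gives roughly the same variance at every mean, which is what "variance independent of μ" means in practice.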

7 Investigating Heteroscedasticity
Pictures are most informative (and easy to produce)! For instance, plot: s vs. mean; s² vs. mean; median vs. IQR.
Example 5: Four studies of VPC frequency (VPC rate is given as VPC frequency per 100,000 beats).
[Table: VPC Rates By Study. Columns: Study A, B, C, D. Rows: MEAN, VAR, SD. Values not preserved in the transcription.]

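8 VPC Rates among 4 Studies
[Table: MEAN, VAR, SD for Studies A, B, C, D. Plots: $S^2$ vs. $\bar{X}$ (VAR against MEAN) and $S$ vs. $\bar{X}$ (SD against MEAN). Values not preserved in the transcription.]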

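9 LOG (VPC + .002) among 4 Studies
[Table: MEAN, VAR, SD for Studies A, B, C, D. Plots: $S^2$ vs. $\bar{X}$ (VAR against MEAN) and $S$ vs. $\bar{X}$ (SD against MEAN). Values not preserved in the transcription.]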
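The before/after comparison can be sketched as follows. The numbers are hypothetical and chosen only so that SD grows with the mean; they are NOT the VPC data from the slides. The shift of .002 matches the LOG (VPC + .002) re-expression, and the helper names are illustrative.

```python
import math
import statistics

# Hypothetical rates for illustration only; NOT the VPC data from the slides.
studies = {
    "A": [1.0, 2.0, 4.0],
    "B": [10.0, 20.0, 40.0],
    "C": [100.0, 200.0, 400.0],
}

def summarize(values):
    """Return (mean, variance, sd) using the sample (n - 1) formulas."""
    return (statistics.mean(values),
            statistics.variance(values),
            statistics.stdev(values))

def log_shift(values, shift=0.002):
    """Re-express each value as log(value + shift)."""
    return [math.log(v + shift) for v in values]

raw_sd = {name: summarize(vals)[2] for name, vals in studies.items()}
log_sd = {name: summarize(log_shift(vals))[2] for name, vals in studies.items()}

for name in studies:
    mean, var, sd = summarize(studies[name])
    print(f"Study {name}: mean={mean:8.2f} var={var:10.2f} sd={sd:7.2f} "
          f"sd(log)={log_sd[name]:.3f}")
```

On the raw scale the SD rises with the mean; on the log scale the SDs are nearly equal, so locations can be compared without spread getting in the way.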

10 Straightening X-Y Plots: Regression and Correlation
Transforming X and/or Y to yield a straight-line relationship makes analysis simpler.
1. Interpolation is simpler
2. Interpretation is (usually) simpler
3. Departures from fit are more clearly detected
Shapes of curves of the form $y = x^p$: [Figure: curves for p > 1 and p < 1]

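11 Which Transformation Straightens the Plot? (How to choose p)
Recall the Ladder of Powers: $y = x^p$, with p = ..., -3, -2, -1, -1/2, (0), 1, 2, 3, ... (p = 0 denotes the log).
The Bulging Rule
[Figure: four-quadrant diagram of bulge directions. Upper left: y up, x down. Upper right: y up, x up. Lower left: y down, x down. Lower right: y down, x up.]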
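A rough sketch of walking the ladder in code. The screening criterion (largest absolute Pearson correlation) is an assumption made for illustration, not necessarily how the lecture chooses p, and the data are hypothetical.

```python
import math

LADDER = [-3, -2, -1, -0.5, 0, 1, 2, 3]  # rungs listed on the slide; 0 means log

def re_express(x, p):
    """Apply rung p of the ladder of powers to a positive value x."""
    if x <= 0:
        raise ValueError("the ladder of powers needs positive values")
    return math.log(x) if p == 0 else x ** p

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_rung(xs, ys, ladder=LADDER):
    """Rung for x whose re-expression is most nearly linear in y."""
    return max(ladder,
               key=lambda p: abs(pearson_r([re_express(x, p) for x in xs], ys)))

# Hypothetical data with y = x^2, so moving x up the ladder to p = 2 straightens it.
xs = [float(i) for i in range(1, 11)]
ys = [x ** 2 for x in xs]
print("best rung for x:", best_rung(xs, ys))
```

A correlation screen like this is only a starting point; the plot of the re-expressed data should still be inspected.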

12 Investigating the Bulge
Some plots show a bulge clearly:
[Figure: Y vs. X plot with the bulge toward "Y up, X down"]
e.g. $Y = X^{1/2}$ might work

13 Investigating the Bulge (cont'd)
Some plots don't show a bulge so clearly:
[Figure: Y vs. X plot with no obvious bulge]
How do we find the bulge? Use half-slopes.

14 Half-Slopes
1. Divide X-values into (approx.) thirds.
2. For each third, compute median(X) and median(Y): $(X_L, Y_L)$, $(X_M, Y_M)$, $(X_R, Y_R)$ (L = Left, M = Middle, R = Right).
3. Compute half-slopes: $b_L = \frac{Y_M - Y_L}{X_M - X_L}$, $b_R = \frac{Y_R - Y_M}{X_R - X_M}$.
4. Look at $R = \frac{b_L}{b_R}$. If $R \approx 1$, then the XY relationship is straight. If $R \neq 1$, then a transformation may help.
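The four steps above can be sketched as below. The split rule (equal-sized outer thirds, with any leftover points going to the middle) is an assumption, since the slide only says "approx. thirds", and the data are hypothetical.

```python
import statistics

def half_slopes(xs, ys):
    """Return (b_L, b_R, R) where R = b_L / b_R, following the slide's steps."""
    pairs = sorted(zip(xs, ys))          # order the points by x
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three points")
    k = n // 3                           # size of the left and right thirds
    thirds = (pairs[:k], pairs[k:n - k], pairs[n - k:])

    # Median x and median y within each third.
    summary = [(statistics.median(p[0] for p in part),
                statistics.median(p[1] for p in part)) for part in thirds]
    (x_l, y_l), (x_m, y_m), (x_r, y_r) = summary

    b_left = (y_m - y_l) / (x_m - x_l)   # fails if the median x values coincide
    b_right = (y_r - y_m) / (x_r - x_m)
    return b_left, b_right, b_left / b_right

xs = list(range(1, 10))
straight = half_slopes(xs, [2 * x + 1 for x in xs])   # linear relationship
curved = half_slopes(xs, [x ** 2 for x in xs])        # bulging relationship
print("straight:", straight)
print("curved:  ", curved)
```

For the linear data the ratio is 1; for the quadratic data the left half-slope is clearly smaller than the right, signalling that a re-expression may help.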

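15 Example of Half-Slopes
[Figure: plot divided into left (L), middle (M), and right (R) thirds, with $x_L, x_M, x_R$ and $y_L, y_M, y_R$ marked on the axes.]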

16 Interpretation and Reporting
Although a transformation may help to analyze the data, it may not be worth the difficulty of explaining or understanding it. Some transformations are easier to understand than others (e.g. $\log x$, $10^x$, $x^2$, $\sqrt{x}$, $1/x$). Suppose we choose $\sqrt[4]{x}$; how do we explain this?
1. We could invert the transform. e.g. $f^{-1}(\text{quantile}(f(x))) = \text{quantile}(x)$ is OK for monotone $f$. But $f^{-1}(\text{mean}(f(x))) \neq \text{mean}(x)$ in general, e.g. $\sqrt[n]{x_1 x_2 \cdots x_n} \neq \frac{x_1 + \cdots + x_n}{n}$.
Note: $e^{\frac{1}{n}\sum_i \log x_i} = (x_1 x_2 \cdots x_n)^{1/n}$ = geometric mean.
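A small numerical check of the inversion point, using hypothetical values and a base-10 log as the transform. An odd sample size is used so that the median is an actual data value.

```python
import math
import statistics

# Hypothetical positive data, for illustration only.
values = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logs = [math.log10(v) for v in values]

# Back-transforming a quantile recovers the quantile on the original scale.
back_median = 10 ** statistics.median(logs)

# Back-transforming the mean of the logs gives the geometric mean,
# not the arithmetic mean.
back_mean = 10 ** statistics.mean(logs)
arithmetic_mean = statistics.mean(values)

print("median(x):            ", statistics.median(values))
print("10 ** median(log x):  ", back_median)
print("mean(x):              ", arithmetic_mean)
print("10 ** mean(log x):    ", back_mean)
```

When reporting, a back-transformed mean should therefore be labelled as a geometric mean (for the log) rather than as "the mean".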

17 Inversion: An Example
Peak Common Bile Duct Pressure
Recall that $y = -10/\sqrt{\text{peak}}$ seemed fairly normal.
Usual 95% CI for median(y) is Median ± 1.96 [...]
Note: This assumes that Median ≈ Mean and that S ≈ H-spread / [...], where H-spread / [...] is an estimate of σ.
CI = [...] (1.96) (1.48) [...] = (-10.08, [...])
Inverting: $\text{peak} = 100/y^2$
CI for median(peak) = (.98, 5.46)
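A sketch of the inversion step, assuming the re-expression on this slide is y = -10/√peak with inverse peak = 100/y² (reconstructed from the garbled transcription and consistent with the reported endpoints). Because the transform is increasing for peak > 0, interval endpoints map to interval endpoints in the same order.

```python
import math

def to_y(peak):
    """Re-expression assumed from the slide: y = -10 / sqrt(peak)."""
    return -10.0 / math.sqrt(peak)

def to_peak(y):
    """Inverse of to_y: peak = 100 / y^2."""
    return 100.0 / (y * y)

# Lower CI endpoint on the y scale, as given on the slide.
lower_peak = to_peak(-10.08)
print(f"lower endpoint for median(peak): {lower_peak:.2f}")

# The upper y endpoint is missing from the transcription; the y value
# corresponding to the reported upper peak endpoint can be recovered as:
print(f"y corresponding to peak = 5.46: {to_y(5.46):.2f}")
```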

18 What about using nonparametric procedures and forgetting about transformations?
1. EDA is not as concerned with inference as it is with description
2. For inference, non-parametrics may be fine, but description is still needed
3. Efficiency questions


20 Transformations in Everyday Experience, by David C. Hoaglin. Chance, Vol. 1, No. 4, 1988, Springer-Verlag, New York.
Scales in Music
Octave: If a note with frequency f2 is an octave above a note with frequency f1, then f2 = 2*f1. Thus, in base 2 units: $\log_2 f_2 - \log_2 f_1 = \log_2 2 = 1$ octave. There are 12 notes in an octave, all equispaced, resulting in 12 intervals of size 1/12 octave.
Richter Scale for Earthquakes
Richter scale: Strength of an earthquake is expressed in log base 10 units. The seismic energy E (in ergs) released by an earthquake of magnitude M can be estimated as $\log_{10} E = 11.8 + 1.5M$. Thus, an earthquake of magnitude 7 releases about $10^3$ times as much energy as one of magnitude 5 ($10^{1.5(7-5)} = 1{,}000$).
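Both scales reduce to differences of logarithms, which the short sketch below illustrates. The function names are illustrative, and only the slope 1.5 from the slide's energy relation is used, so the result does not depend on the intercept.

```python
import math

def octaves_between(f1, f2):
    """Musical distance from f1 to f2 in octaves (log base 2 units)."""
    return math.log2(f2 / f1)

def energy_ratio(m_high, m_low, slope=1.5):
    """Ratio of seismic energies implied by log10 E = const + slope * M."""
    return 10 ** (slope * (m_high - m_low))

semitone = 2 ** (1 / 12)   # frequency ratio of one of the 12 equal steps

print("octaves from 220 Hz to 440 Hz:", octaves_between(220.0, 440.0))
print("12 semitones multiply frequency by:", semitone ** 12)
print("energy ratio, magnitude 7 vs 5:", energy_ratio(7, 5))
```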

21 Decibels
Sound is actually measured by pressure (dynes/square cm). An increase of 20 dB corresponds to a tenfold increase in sound pressure. $L_p = 20 \log_{10}(p/0.0002)$ dB, where 0.0002 is the internationally accepted minimum audible sound pressure at 1,000 Hz (i.e., 0.0002 dyne/cm², rms).
Average Speed in Auto Races
Measure speed in mph. Race officials, however, measure elapsed time. Thus, average speed is the reciprocal of elapsed time × distance around track × number of laps completed.
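The decibel formula on this slide can be checked directly. A minimal sketch, with the reference pressure taken from the slide and an illustrative function name:

```python
import math

P_REF = 0.0002  # reference sound pressure in dyne/cm^2 (rms), from the slide

def sound_pressure_level_db(p):
    """L_p = 20 * log10(p / P_REF), with p in dyne/cm^2."""
    return 20 * math.log10(p / P_REF)

for pressure in (0.0002, 0.002, 0.02):
    print(f"p = {pressure:<7} dyne/cm^2  ->  {sound_pressure_level_db(pressure):5.1f} dB")
```

Each tenfold increase in pressure adds 20 dB, so equal steps on the decibel scale correspond to equal ratios of pressure.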

22 Gasoline Consumption
Usually expressed as miles per gallon. Sometimes (e.g., in consumer testing) the number of gallons per trip (i.e., the reciprocal) is given.
pH
$\text{pH} = -\log_{10}[\text{H}^+]$
Lenses and Cameras
f-stops on aperture setting: f = ratio of focal length of lens (L) to the diameter of the aperture (d). Suppose a camera has f-stops of f = 1.4, 2.0, 2.8, 4.0, 5.6, 8.0, 11.0, ... This is a geometric progression, where each term is sqrt(2) times the preceding one. So a 1 f-stop change from 5.6 to 8.0 halves the area of the aperture (because $A = \pi (d/2)^2 = \pi (L/(2f))^2$). The f-stop involves several transformations.
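The halving claim follows from the area formula, as this sketch shows. The 50 mm focal length is an arbitrary illustrative choice; the ratio does not depend on it.

```python
import math

def aperture_area(focal_length, f_number):
    """Aperture area A = pi * (L / (2 f))^2, in squared units of focal_length."""
    diameter = focal_length / f_number
    return math.pi * (diameter / 2) ** 2

L = 50.0  # hypothetical focal length in mm

# An exact sqrt(2) step in f-number halves the area.
exact = aperture_area(L, 4.0) / aperture_area(L, 4.0 * math.sqrt(2))

# The marked stops 5.6 and 8.0 are rounded, so the ratio is only close to 2.
marked = aperture_area(L, 5.6) / aperture_area(L, 8.0)

print(f"area ratio for an exact sqrt(2) step: {exact:.3f}")
print(f"area ratio from f/5.6 to f/8.0:       {marked:.3f}")
```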

23 Field Position in Football
The 100-yard football field is normally described in terms of how far the line of scrimmage is from the 50-yard line, rather than how far, in absolute yards, the line is from a fixed goal line. We don't say that "the Giants are 80 yards from the goal line"; we say "the Giants are on the Redskins' 20-yard line."
Time of Goal in Hockey
The time clock in hockey starts at 20 minutes and counts down to 0 minutes so that one always knows how many minutes are left in the period. However, when a goal is scored (or a penalty is assessed), the official time is announced as how many minutes into the period the goal was scored. Example: "Time of the goal is 5 minutes and 30 seconds into the second period." One would have to convert this time into 14 minutes and 30 seconds left in the second period.

24 Copper Wire Gauge
The larger the gauge, the smaller the wire's diameter. Gauges run from 0000, 000, 00, 0, 1, 2, ..., 56. By definition, $d_{\text{gauge}} = d_0 r^{\text{gauge}}$, and $d_0$ can be shown to equal [...] mils.
Shotgun Size
Shotgun size is usually described by its gauge: 10-gauge, 12-gauge, 20-gauge, for example. Gauge (which represents the bore diameter in millimeters) was originally derived as the number of lead balls with that same diameter that together weighed one pound. Example: a bore-sized lead ball for a 12-gauge shotgun weighed 1/12 pound. Thus, a close approximation for gauge is that it is proportional to the reciprocal cube of the bore diameter.
[Table: Gauge vs. Bore Diameter. Values not preserved in the transcription.]
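The reciprocal-cube relationship for shotgun gauge can be worked through numerically. This is a sketch under stated assumptions: the lead density of 11.34 g/cm³ and the pound-to-gram factor are standard reference values not given on the slide, and real bores may differ slightly from the ideal sphere calculation.

```python
import math

LEAD_DENSITY_G_PER_CM3 = 11.34   # assumed density of lead (not from the slide)
GRAMS_PER_POUND = 453.59237

def bore_diameter_mm(gauge):
    """Diameter of a lead ball weighing 1/gauge pound, in millimeters."""
    mass_g = GRAMS_PER_POUND / gauge
    volume_cm3 = mass_g / LEAD_DENSITY_G_PER_CM3
    diameter_cm = (6 * volume_cm3 / math.pi) ** (1 / 3)   # V = pi * d^3 / 6
    return 10 * diameter_cm

for gauge in (10, 12, 20):
    d = bore_diameter_mm(gauge)
    # gauge * d^3 is the same for every gauge: gauge is proportional to 1 / d^3.
    print(f"{gauge:2d}-gauge: d = {d:5.2f} mm, gauge * d^3 = {gauge * d ** 3:8.1f}")
```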

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System PREP Course #10: Introduction to Exploratory Data Analysis and Data Transformations (Part 1) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

More information

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight AP Statistics Chapter 9 Re-Expressing data: Get it Straight Objectives: Re-expression of data Ladder of powers Straight to the Point We cannot use a linear model unless the relationship between the two

More information

PREP Course 13: Radiation Safety for Laboratory Research. William Robeson Radiology Service Line Physicist

PREP Course 13: Radiation Safety for Laboratory Research. William Robeson Radiology Service Line Physicist PREP Course 13: Radiation Safety for Laboratory Research William Robeson Radiology Service Line Physicist CME Disclosure Statement The North Shore LIJ Health System adheres to the ACCME s new Standards

More information

7.2 Trapezoidal Approximation

7.2 Trapezoidal Approximation 7. Trapezoidal Approximation NOTES Write your questions here! Riemann Sum f(x) = x + 1 [1,3] f(x) = x + 1 The Definite Integral b f(x)dx a Trapezoidal Approximation for interval [1,3] with n subintervals

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

More information

MATH 2200 PROBABILITY AND STATISTICS M2200FL081.1

MATH 2200 PROBABILITY AND STATISTICS M2200FL081.1 MATH 2200 PROBABILITY AND STATISTICS M2200FL081.1 In almost all problems, I have given the answers to four significant digits. If your answer is slightly different from one of mine, consider that to be

More information

P8130: Biostatistical Methods I

P8130: Biostatistical Methods I P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Order of Operations Expression Variable Coefficient

More information

Predicted Y Scores. The symbol stands for a predicted Y score

Predicted Y Scores. The symbol stands for a predicted Y score REGRESSION 1 Linear Regression Linear regression is a statistical procedure that uses relationships to predict unknown Y scores based on the X scores from a correlated variable. 2 Predicted Y Scores Y

More information

16.1 Properties of Logarithms

16.1 Properties of Logarithms Name Class Date 16.1 Properties of Logarithms Essential Question: What are the properties of logarithms? A2.5.C Rewrite exponential equations as their corresponding logarithmic equations and logarithmic

More information

SOLUTIONS FOR PROBLEMS 1-30

SOLUTIONS FOR PROBLEMS 1-30 . Answer: 5 Evaluate x x + 9 for x SOLUTIONS FOR PROBLEMS - 0 When substituting x in x be sure to do the exponent before the multiplication by to get (). + 9 5 + When multiplying ( ) so that ( 7) ( ).

More information

Correlation & Linear Regression. Slides adopted fromthe Internet

Correlation & Linear Regression. Slides adopted fromthe Internet Correlation & Linear Regression Slides adopted fromthe Internet Roadmap Linear Correlation Spearman s rho correlation Kendall s tau correlation Linear regression Linear correlation Recall: Covariance n

More information

Algebra II Vocabulary Cards

Algebra II Vocabulary Cards Algebra II Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers Complex Number (examples)

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Algebra II Vocabulary Cards

Algebra II Vocabulary Cards Algebra II Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers Complex Number (examples)

More information

Algebra, Functions, and Data Analysis Vocabulary Cards

Algebra, Functions, and Data Analysis Vocabulary Cards Algebra, Functions, and Data Analysis Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Complex Numbers

More information

Why Data Transformation? Data Transformation. Homoscedasticity and Normality. Homoscedasticity and Normality

Why Data Transformation? Data Transformation. Homoscedasticity and Normality. Homoscedasticity and Normality Objectives: Data Transformation Understand why we often need to transform our data The three commonly used data transformation techniques Additive effects and multiplicative effects Application of data

More information

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear

More information

Chapter 2: Rocket Launch

Chapter 2: Rocket Launch Chapter 2: Rocket Launch Lesson 2.1.1. 2-1. Domain:!" # x # " Range: 2! y! " y-intercept! y = 2 no x-intercepts 2-2. a. Time Hours sitting Amount Earned 8PM 1 $4 9PM 2 $4*2hrs = $8 10PM 3 $4*3hrs = $12

More information

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

More information

A Discussion of the Bayesian Approach

A Discussion of the Bayesian Approach A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248) AIM HIGH SCHOOL Curriculum Map 2923 W. 12 Mile Road Farmington Hills, MI 48334 (248) 702-6922 www.aimhighschool.com COURSE TITLE: Statistics DESCRIPTION OF COURSE: PREREQUISITES: Algebra 2 Students will

More information

TRANSFORMATIONS TO OBTAIN EQUAL VARIANCE. Apply to ANOVA situation with unequal variances: are a function of group means µ i, say

TRANSFORMATIONS TO OBTAIN EQUAL VARIANCE. Apply to ANOVA situation with unequal variances: are a function of group means µ i, say 1 TRANSFORMATIONS TO OBTAIN EQUAL VARIANCE General idea for finding variance-stabilizing transformations: Response Y µ = E(Y)! = Var(Y) U = f(y) First order Taylor approximation for f around µ: U " f(µ)

More information

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information

Inferences About the Difference Between Two Means

Inferences About the Difference Between Two Means 7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

More information

If you need more room, use the backs of the pages and indicate that you have done so.

If you need more room, use the backs of the pages and indicate that you have done so. Math 125 Final Exam Winter 2018 Your Name Your Signature Student ID # Quiz Section Professor s Name TA s Name Turn off and stow away all cell phones, watches, pagers, music players, and other similar devices.

More information

Intro to Stats Lecture 11

Intro to Stats Lecture 11 Outliers and influential points Intro to Stats Lecture 11 Collect data this week! Midterm is coming! Terms X outliers: observations outlying the overall pattern of the X- variable Y outliers: observations

More information

6.7 Variation and Problem Solving. OBJECTIVES 1 Solve Problems Involving Direct Variation. 2 Solve Problems Involving Inverse Variation.

6.7 Variation and Problem Solving. OBJECTIVES 1 Solve Problems Involving Direct Variation. 2 Solve Problems Involving Inverse Variation. 390 CHAPTER 6 Rational Epressions 66. A doctor recorded a body-mass inde of 7 on a patient s chart. Later, a nurse notices that the doctor recorded the patient s weight as 0 pounds but neglected to record

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Central Limit Theorem Confidence Intervals Worked example #6. July 24, 2017

Central Limit Theorem Confidence Intervals Worked example #6. July 24, 2017 Central Limit Theorem Confidence Intervals Worked example #6 July 24, 2017 10 8 Raw scores 6 4 Mean=71.4% 2 0 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90+ Scaling is to add 3.6% to bring mean

More information

COMPARING GROUPS PART 1CONTINUOUS DATA

COMPARING GROUPS PART 1CONTINUOUS DATA COMPARING GROUPS PART 1CONTINUOUS DATA Min Chen, Ph.D. Assistant Professor Quantitative Biomedical Research Center Department of Clinical Sciences Bioinformatics Shared Resource Simmons Comprehensive Cancer

More information

MODULE 7 UNIVARIATE EDA - QUANTITATIVE

MODULE 7 UNIVARIATE EDA - QUANTITATIVE MODULE 7 UNIVARIATE EDA - QUANTITATIVE Contents 7.1 Interpreting Shape........................................ 46 7.2 Interpreting Outliers....................................... 47 7.3 Comparing the Median

More information

Chapter 2 Linear Motion

Chapter 2 Linear Motion Chapter 2 Linear Motion Conceptual Questions 2.1 An object will slow down when its acceleration vector points in the opposite direction to its velocity vector. Recall that acceleration is the change in

More information

Chapter 10 Re-expressing Data: Get It Straight!

Chapter 10 Re-expressing Data: Get It Straight! Chapter 0 Re-expressing Data: Get It Straight! 23 Chapter 0 Re-expressing Data: Get It Straight!. s. a) The residuals plot shows no pattern. No re-expression is needed. b) The residuals plot shows a curved

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 4 Problems with small populations 9 II. Why Random Sampling is Important 10 A myth,

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Sem. 1 Review Ch. 1-3

Sem. 1 Review Ch. 1-3 AP Stats Sem. 1 Review Ch. 1-3 Name 1. You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of variables you have measured is a. 1463; all quantitative. b.

More information

MATHEMATICS METHODS. Calculator-assumed. Sample WACE Examination Marking Key

MATHEMATICS METHODS. Calculator-assumed. Sample WACE Examination Marking Key MATHEMATICS METHODS Calculator-assumed Sample WACE Examination 016 Marking Key Marking keys are an explicit statement about what the examiner expects of candidates when they respond to a question. They

More information

3 Multiple Discrete Random Variables

3 Multiple Discrete Random Variables 3 Multiple Discrete Random Variables 3.1 Joint densities Suppose we have a probability space (Ω, F,P) and now we have two discrete random variables X and Y on it. They have probability mass functions f

More information

Biostatistics 4: Trends and Differences

Biostatistics 4: Trends and Differences Biostatistics 4: Trends and Differences Dr. Jessica Ketchum, PhD. email: McKinneyJL@vcu.edu Objectives 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret

More information

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam Statistics 203 Introduction to Regression Models and ANOVA Practice Exam Prof. J. Taylor You may use your 4 single-sided pages of notes This exam is 7 pages long. There are 4 questions, first 3 worth 10

More information

CSCI2244-Randomness and Computation First Exam with Solutions

CSCI2244-Randomness and Computation First Exam with Solutions CSCI2244-Randomness and Computation First Exam with Solutions March 1, 2018 Each part of each problem is worth 5 points. There are actually two parts to Problem 2, since you are asked to compute two probabilities.

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Stat 500 Midterm 2 12 November 2009 page 0 of 11

Stat 500 Midterm 2 12 November 2009 page 0 of 11 Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed

More information

MSc / PhD Course Advanced Biostatistics. dr. P. Nazarov

MSc / PhD Course Advanced Biostatistics. dr. P. Nazarov MSc / PhD Course Advanced Biostatistics dr. P. Nazarov petr.nazarov@crp-sante.lu 2-12-2012 1. Descriptive Statistics edu.sablab.net/abs2013 1 Outline Lecture 0. Introduction to R - continuation Data import

More information

The following formulas related to this topic are provided on the formula sheet:

The following formulas related to this topic are provided on the formula sheet: Student Notes Prep Session Topic: Exploring Content The AP Statistics topic outline contains a long list of items in the category titled Exploring Data. Section D topics will be reviewed in this session.

More information

COMPOSITE AND INVERSE FUNCTIONS & PIECEWISE FUNCTIONS

COMPOSITE AND INVERSE FUNCTIONS & PIECEWISE FUNCTIONS Functions Modeling Change: A Preparation or Calculus, 4th Edition, 2011, Connally 2.4 COMPOSITE AND INVERSE FUNCTIONS & PIECEWISE FUNCTIONS Functions Modeling Change: A Preparation or Calculus, 4th Edition,

More information

Probability Distribution

Probability Distribution Economic Risk and Decision Analysis for Oil and Gas Industry CE81.98 School of Engineering and Technology Asian Institute of Technology January Semester Presented by Dr. Thitisak Boonpramote Department

More information

Everything is not normal

Everything is not normal Everything is not normal According to the dictionary, one thing is considered normal when it s in its natural state or conforms to standards set in advance. And this is its normal meaning. But, like many

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Derivatives of Trig and Inverse Trig Functions

Derivatives of Trig and Inverse Trig Functions Derivatives of Trig and Inverse Trig Functions Math 102 Section 102 Mingfeng Qiu Nov. 28, 2018 Office hours I m planning to have additional office hours next week. Next Monday (Dec 3), which time works

More information

DISTANCE, RATE, AND TIME 7.1.1

DISTANCE, RATE, AND TIME 7.1.1 DISTANCE, RATE, AND TIME 7.1.1 Distance (d) equals the product of the rate of speed (r) and the time (t). This relationship is shown below in three forms: d = r!t!!!!!!!!!r = d t!!!!!!!!!t = d r It is

More information

Math 4 Review for Quarter 1 Cumulative Test

Math 4 Review for Quarter 1 Cumulative Test Math 4 Review for Quarter 1 Cumulative Test Name: I. Unit Conversion Units are important in describing the world around us To convert between units: o Method 1: Multiplication/Division Converting to a

More information

Inference for the mean of a population. Testing hypotheses about a single mean (the one sample t-test). The sign test for matched pairs

Inference for the mean of a population. Testing hypotheses about a single mean (the one sample t-test). The sign test for matched pairs Stat 528 (Autumn 2008) Inference for the mean of a population (One sample t procedures) Reading: Section 7.1. Inference for the mean of a population. The t distribution for a normal population. Small sample

More information

Math 147 Lecture Notes: Lecture 12

Math 147 Lecture Notes: Lecture 12 Math 147 Lecture Notes: Lecture 12 Walter Carlip February, 2018 All generalizations are false, including this one.. Samuel Clemens (aka Mark Twain) (1835-1910) Figures don t lie, but liars do figure. Samuel

More information

Continuous Random Variables

Continuous Random Variables MATH 38 Continuous Random Variables Dr. Neal, WKU Throughout, let Ω be a sample space with a defined probability measure P. Definition. A continuous random variable is a real-valued function X defined

More information

Volume vs. Diameter. Teacher Lab Discussion. Overview. Picture, Data Table, and Graph

Volume vs. Diameter. Teacher Lab Discussion. Overview. Picture, Data Table, and Graph 5 6 7 Middle olume Length/olume vs. Diameter, Investigation page 1 of olume vs. Diameter Teacher Lab Discussion Overview Figure 1 In this experiment we investigate the relationship between the diameter

More information

You may use your calculator and a single page of notes. The room is crowded. Please be careful to look only at your own exam.

You may use your calculator and a single page of notes. The room is crowded. Please be careful to look only at your own exam. LAST NAME (Please Print): KEY FIRST NAME (Please Print): HONOR PLEDGE (Please Sign): Statistics 111 Midterm 1 This is a closed book exam. You may use your calculator and a single page of notes. The room

More information





