Statistics Chapter 4

"There are three kinds of lies: lies, damned lies, and statistics." Benjamin Disraeli, 1895 (British statesman)

Gaussian Distribution, 4-1

If a measurement is repeated many times, a statistical treatment of the data can provide an indication of the reliability of the results. If the errors associated with the measurement are completely random (the sum of many small, independent contributions), then the Central Limit Theorem assures that the data will approximately follow a mathematical form called a Gaussian distribution (bell-shaped curve). In the limit of an infinite number of measurements the histogram below becomes the population distribution, denoted by the solid line. The histogram on the right was drawn to have the same mean, standard deviation, and area as the smooth curve. This is not true for a finite number of measurements; only in the infinite limit will they be the same.

The population distribution (infinite $n$) is characterized by a population mean, $\mu$ (the average, at the center of the symmetric distribution), and a population standard deviation, $\sigma$ (a measure of the width of the distribution). Another useful measure is the square of the standard deviation, $\sigma^2$, known as the variance.

$$\mu = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \sigma^2 = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^2$$

A Gaussian curve, $y_{\text{Gaussian}}$, can be expressed in terms of these variables:

$$y_{\text{Gaussian}} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
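A minimal Python sketch of the Gaussian curve defined above, using only the standard library; the function name `gaussian_pdf` is illustrative and not from the source. Evaluating it at $x = \mu$ and $x = \mu + \sigma$ shows how the peak height depends on $\sigma$.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the Gaussian curve y_Gaussian at x, for population mean mu
    and population standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Peak height is 1/(sigma*sqrt(2*pi)): doubling sigma halves the peak,
# while the area under the curve stays equal to 1.
for sigma in (1.0, 2.0):
    peak = gaussian_pdf(0.0, 0.0, sigma)
    one_sd = gaussian_pdf(sigma, 0.0, sigma)
    print(f"sigma={sigma}: peak={peak:.4f}, height at mu+sigma={one_sd:.4f}")
```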

Obtaining the Probability

As all measurements contain experimental error, no result is completely certain. However, by a judicious use of statistics one can associate a probability with the result. To convert the Gaussian distribution into a standard form that no longer depends on $\mu$ and $\sigma$, a new variable $z$ is introduced, defined as $z = (x-\mu)/\sigma$. The Gaussian curve then becomes the normal Gaussian error curve, or $z$ distribution:

$$y = f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

$$f(0) = \frac{1}{\sqrt{2\pi}} = 0.3989 \qquad f(1) = \frac{1}{\sqrt{2\pi}}\, e^{-1/2} = 0.2420$$

The points $x = \mu \pm \sigma$ ($z = \pm 1$) are the inflection points of the curve.

Integration of the population distribution between two limits, for example from $x = \mu - z\sigma$ to $x = \mu + z\sigma$, gives the probability of obtaining a value of $x$ between these limits. The limits that enclose 95% of the area under the population distribution (a probability of 0.95) are the 95% confidence limits; because the Gaussian distribution is symmetric, these limits are equidistant from $\mu$. The integral of a Gaussian between finite limits has no closed form in elementary functions (it is expressed through the error function, erf), so the area under the curve is given in tables. (In the area table, $z$ is really the absolute value of $z$, $|z|$.)

The sample data is not infinite! How can one characterize its reliability? First one needs to obtain the mean and standard deviation for finite data. The finite distribution is characterized by a sample mean (sample average), $\langle x\rangle$, and a sample standard deviation, $s$. Again, another useful measure is the square of the standard deviation, $s^2$, known as the sample variance. The quantity $n-1$ below is called the degrees of freedom.

$$\langle x\rangle = \frac{\sum_i x_i}{n} \qquad s = \sqrt{\frac{\sum_i (x_i-\langle x\rangle)^2}{n-1}}$$
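Instead of reading a printed table, the area can be computed from the error function, since the area under the $z$ curve up to $z$ is $\tfrac12[1+\operatorname{erf}(z/\sqrt2)]$. The sketch below (standard library only; the function names are hypothetical) does that, and also computes $\langle x\rangle$ and $s$ with $n-1$ degrees of freedom via `statistics.stdev`.

```python
import math
import statistics

def fraction_between(x_low, x_high, mu, sigma):
    """Area under the Gaussian between x_low and x_high, i.e. the expected
    fraction of the population in that range. Use math.inf for an open limit."""
    def area_below(x):
        # Convert to z = (x - mu)/sigma, then Phi(z) = (1 + erf(z/sqrt(2)))/2.
        z = (x - mu) / sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return area_below(x_high) - area_below(x_low)

def sample_mean_and_s(data):
    """Sample mean <x> and sample standard deviation s (n - 1 in the denominator)."""
    return statistics.mean(data), statistics.stdev(data)

# Fractions of a Gaussian population within mu +/- sigma and mu +/- 1.96 sigma.
print(fraction_between(-1.0, 1.0, 0.0, 1.0))
print(fraction_between(-1.96, 1.96, 0.0, 1.0))
```

For a question such as EX 1a, pass the bulb mean and $s$ with `x_low=1005.3` and `x_high=math.inf`; for EX 1b, use the two stated limits.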

EX 1. For the data on the bulbs given in the histogram, where $\langle x\rangle$ = 845.2 hr and $s$ = 94.2 hr:
a) What fraction of bulbs is expected to have a lifetime greater than 1005.3 hr?
b) What fraction of bulbs is expected to have a lifetime between 798.1 and 901.7 hr?

Comparison of Standard Deviations with F Test, 4-2

To examine whether two standard deviations are statistically different, determine

$$F_{\text{calculated}} = \frac{s_1^2}{s_2^2}$$

where $s_1$ is the larger standard deviation, so that $F \ge 1$. If $F_{\text{calculated}} > F_{\text{table}}$ the difference is significant and the two standard deviations are statistically different.

Hypothesis testing is based upon assuming that the null hypothesis is true and asking how probable the observed result would then be; the cutoff is generally chosen to be 5%. The null hypothesis is accepted (retained) if the probability of obtaining a result at least this extreme by chance alone is > 5%, and rejected if that probability is < 5%.

Null hypothesis for the F test: the two sets of measurements are taken from populations with the same population standard deviation; all differences arise only from random variations in measurement.
accept: $F_{\text{calculated}} < F_{\text{table}}$ ⇒ standard deviations are not statistically different
reject: $F_{\text{calculated}} > F_{\text{table}}$ ⇒ standard deviations are statistically different
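A small sketch of the F test in Python; `f_statistic` and `sds_differ` are hypothetical names. The critical value is not computed here: `f_table` has to be read from an F table for the degrees of freedom of the two data sets ($n-1$ each, the numerator belonging to the larger $s$) at the chosen confidence level.

```python
def f_statistic(s1: float, s2: float) -> float:
    """F_calculated with the larger variance on top, so F >= 1."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return larger ** 2 / smaller ** 2

def sds_differ(s1: float, s2: float, f_table: float) -> bool:
    """True when F_calculated > F_table, i.e. the null hypothesis of equal
    population standard deviations is rejected."""
    return f_statistic(s1, s2) > f_table

# The two standard deviations of EX 4 (0.53 and 0.42 mg/dL):
print(f"F_calculated = {f_statistic(0.53, 0.42):.2f}")
```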

Confidence Intervals, 4-3 (inferences based on small samples)

From a limited number of measurements one wants an estimate of the uncertainty of the measurement. One can use a confidence interval,

$$\text{confidence interval} = \langle x\rangle \pm \frac{ts}{\sqrt{n}}$$

where $t$ is Student's $t$ for $n-1$ degrees of freedom at the chosen confidence level (level of certainty). The interpretation is: if one were to repeat the set of $n$ measurements many times, the interval computed each time would contain the true population mean in a fraction of the sets given by the confidence level. A 95% confidence interval would therefore include the true population mean in 95% of these sets of $n$ measurements. Note that this implies that the uncertainty in the sample mean is reduced by making more measurements, by a factor of $1/\sqrt{n}$.

Null hypothesis for the t test: the two sets of measurements are taken from populations with the same mean; all differences arise only from random variations in measurement.
accept: $t_{\text{calculated}} < t_{\text{table}}$ ⇒ means are not statistically different
reject: $t_{\text{calculated}} > t_{\text{table}}$ ⇒ means are statistically different

EX 2. The percentage of an additive in gasoline was measured six times with the following results: 0.13, 0.12, 0.16, 0.17, 0.20 and 0.11%. Find the 90% and 99% confidence intervals for the percentage of the additive.
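A Python sketch of the confidence interval applied to the EX 2 data; `confidence_interval` is a hypothetical helper. The $t$ values (2.015 and 4.032, two-tailed, for 5 degrees of freedom) are standard Student's $t$ table entries typed in by hand; nothing here computes them.

```python
import math
import statistics

def confidence_interval(data, t):
    """Return (<x>, half_width) for the interval <x> +/- t*s/sqrt(n).

    t is Student's t for n - 1 degrees of freedom at the desired
    confidence level, taken from a t table."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # uses n - 1 in the denominator
    return mean, t * s / math.sqrt(n)

additive = [0.13, 0.12, 0.16, 0.17, 0.20, 0.11]  # EX 2 data, wt %
# Two-tailed Student's t for 5 degrees of freedom: 2.015 (90%), 4.032 (99%).
for level, t in (("90%", 2.015), ("99%", 4.032)):
    mean, half = confidence_interval(additive, t)
    print(f"{level}: {mean:.3f} +/- {half:.3f} %")
```

The half-width grows with the confidence level because a larger $t$ is needed to capture the true mean more often.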

Comparison of Means with Student's t, 4-4 (always use 95% confidence intervals)

Case 1: Comparison to a Known or Standard Value

EX 3. A Standard Reference Material is certified to contain 94.6 ppm of an organic contaminant in soil. You analyze the reference compound five times, obtaining $\langle x\rangle$ = 97.00 and $s$ = 1.66. Do your results differ from the expected result at the 95% confidence level?

Case 2: Comparison of Replicate Measurements (apply F test first)

For this comparison one first obtains a pooled standard deviation, then uses it to calculate a value of $t$, which is compared with the tabulated value of Student's $t$. If the calculated $t$ is greater than the 95% confidence level value of $t$, the two sets of replicates are considered different. For two sets of data with $n_1$ and $n_2$ measurements, means $\langle x_1\rangle$ and $\langle x_2\rangle$, and standard deviations $s_1$ and $s_2$:

If $F_{\text{calculated}} < F_{\text{table}}$ (standard deviations not significantly different):

$$t_{\text{calculated}} = \frac{|\langle x_1\rangle - \langle x_2\rangle|}{s_{\text{pooled}}}\sqrt{\frac{n_1 n_2}{n_1+n_2}} \qquad s_{\text{pooled}} = \sqrt{\frac{s_1^2(n_1-1) + s_2^2(n_2-1)}{n_1+n_2-2}}$$

with $n_1 + n_2 - 2$ degrees of freedom.

If $F_{\text{calculated}} > F_{\text{table}}$ (standard deviations significantly different):

$$t_{\text{calculated}} = \frac{|\langle x_1\rangle - \langle x_2\rangle|}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad \text{degrees of freedom} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}$$

EX 4. A trainee in a medical lab will be released to work on her own when her results agree with those of an experienced worker at the 95% confidence level. Considering the results for blood urea nitrogen analysis given below, should the trainee be released to work alone?
trainee: $\langle x\rangle$ = 14.57 mg/dL, $s$ = 0.53 mg/dL, $n$ = 6 samples
experienced worker: $\langle x\rangle$ = 13.95 mg/dL, $s$ = 0.42 mg/dL, $n$ = 5 samples
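The Case 1 and Case 2 $t$ statistics written as Python functions (hypothetical names, standard library only). The Case 1 expression, $t = |\text{known} - \langle x\rangle|\sqrt{n}/s$ with $n-1$ degrees of freedom, is not spelled out in the text; it is the usual form for comparing a mean with a known value and is included here as an assumption. Each result must still be compared with a tabulated $t$.

```python
import math

def t_known_value(mean, s, n, known):
    """Case 1 (assumed standard form): t = |known - <x>| * sqrt(n) / s."""
    return abs(known - mean) * math.sqrt(n) / s

def t_pooled(mean1, s1, n1, mean2, s2, n2):
    """Case 2 when F_calculated < F_table: returns (t, n1 + n2 - 2)."""
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    t = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2

def t_unequal(mean1, s1, n1, mean2, s2, n2):
    """Case 2 when F_calculated > F_table: returns (t, degrees of freedom)
    using the Welch-Satterthwaite expression."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = abs(mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, dof

# EX 3: certified 94.6 ppm; <x> = 97.00, s = 1.66, n = 5 (4 degrees of freedom)
print(t_known_value(97.00, 1.66, 5, 94.6))
# EX 4: assumes the F test found the standard deviations not significantly different
print(t_pooled(14.57, 0.53, 6, 13.95, 0.42, 5))
```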

Case 3: Comparison of Individual Differences with Paired t Test

Each sample is measured once by each method, and the differences $d_i$, the average value of the differences $\langle d\rangle$, and the standard deviation of the differences $s_d$ are determined in order to calculate $t$:

$$s_d = \sqrt{\frac{\sum_i (d_i - \langle d\rangle)^2}{n-1}} \qquad t_{\text{calculated}} = \frac{|\langle d\rangle|}{s_d}\sqrt{n}$$

EX 5. The Ti content (wt%) of five different ore samples (each with different Ti content) was measured by each of two methods. Do the two techniques give results that are significantly different at the 95% confidence level?

Sample   Method 1   Method 2   d         d - <d>    (d - <d>)^2
A        0.0134     0.0135     -0.0001   +0.0006    3.6 × 10^-7
B        0.0144     0.0156     -0.0012   -0.0005    2.5 × 10^-7
C        0.0126     0.0137     -0.0011   -0.0004    1.6 × 10^-7
D        0.0125     0.0137     -0.0012   -0.0005    2.5 × 10^-7
E        0.0137     0.0136     +0.0001   +0.0008    6.4 × 10^-7

(<d> = -0.0007)

Grubbs Test for Outliers, 4-6 (use 95% confidence)

To determine whether a particular data point can be excluded because its veracity is questionable, form the Grubbs statistic, $G$:

$$G_{\text{calculated}} = \frac{|x_{\text{questionable}} - \langle x\rangle|}{s}$$

If $G_{\text{calculated}} > G_{\text{table}}$, the point can be excluded with the chosen confidence level (here 95%). The mean and standard deviation will then need to be recalculated. Hint: generally do not exclude a data point unless you are certain that an error occurred in its measurement. Never exclude more than one point. Always use a value of $G$ for at least a 95% confidence level.

NOTE: For the F, t, and G statistics, if the calculated value is less than the table value, the null hypothesis is retained: you do nothing!
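A Python sketch of the paired $t$ statistic and the Grubbs statistic (function names hypothetical). `statistics.stdev` supplies $s_d$ and $s$ with $n-1$ in the denominator, and the differences are taken as method 1 minus method 2, as in the EX 5 table. The table values of $t$ and $G$ still have to be looked up.

```python
import math
import statistics

def paired_t(method1, method2):
    """Case 3: t = |<d>| * sqrt(n) / s_d with d_i = method1_i - method2_i."""
    d = [a - b for a, b in zip(method1, method2)]
    s_d = statistics.stdev(d)  # n - 1 in the denominator
    return abs(statistics.mean(d)) * math.sqrt(len(d)) / s_d

def grubbs_g(data, questionable):
    """G = |x_questionable - <x>| / s, with <x> and s computed from all of
    the data, including the questionable point."""
    return abs(questionable - statistics.mean(data)) / statistics.stdev(data)

# EX 5 (wt % Ti, samples A to E); compare the result with t_table for 4 degrees of freedom.
method1 = [0.0134, 0.0144, 0.0126, 0.0125, 0.0137]
method2 = [0.0135, 0.0156, 0.0137, 0.0137, 0.0136]
print(f"t_calculated = {paired_t(method1, method2):.2f}")
```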

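Method of Linear Least Squares, 4-7

For a set of $n$ data points $(x_i, y_i)$ one wants to find the "best" straight line through the data, $y = mx + b$. Each $y_i$ deviates from the line by

$$d_i = y_i - \hat{y}_i = y_i - (mx_i + b)$$

where $\hat{y}_i$ is the value on the line when $x = x_i$. To minimize the deviations from linearity irrespective of their sign, one considers the squares of the deviations:

$$d_i^2 = (y_i - mx_i - b)^2 = y_i^2 - 2mx_iy_i + m^2x_i^2 + 2mbx_i - 2by_i + b^2$$

In the method of least squares one minimizes the sum of the squares of all the deviations:

$$\mathrm{SSE} = \sum_i d_i^2 = \sum y_i^2 - 2m\sum x_iy_i + m^2\sum x_i^2 + 2mb\sum x_i - 2b\sum y_i + nb^2$$

The values of $m$ and $b$ that minimize SSE are those for which

$$\left(\frac{\partial\,\mathrm{SSE}}{\partial m}\right)_b = 0 \qquad \left(\frac{\partial\,\mathrm{SSE}}{\partial b}\right)_m = 0$$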
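Carrying out the two derivatives is a short calculus exercise. Written out from the SSE expansion, it gives the two "normal equations"; solving them (for example by Cramer's rule) yields the slope and intercept expressions listed as Eqs. 4-16 and 4-17 among the regression equations.

```latex
\begin{aligned}
\left(\frac{\partial\,\mathrm{SSE}}{\partial m}\right)_b &= -2\sum x_i y_i + 2m\sum x_i^2 + 2b\sum x_i = 0\\
\left(\frac{\partial\,\mathrm{SSE}}{\partial b}\right)_m &= -2\sum y_i + 2m\sum x_i + 2nb = 0\\[4pt]
\Rightarrow\quad m\sum x_i^2 + b\sum x_i &= \sum x_i y_i, \qquad m\sum x_i + nb = \sum y_i\\[4pt]
m &= \frac{n\sum x_i y_i - \sum x_i\sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad
b = \frac{\sum x_i^2\sum y_i - \sum x_i\sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}
\end{aligned}
```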

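Linear Regression Equations

Each quantity is given first as the equation in Harris, then as the variant used in the sample spreadsheet.

slope, $m$ (4-16):

$$m = \frac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - (\sum x_i)^2} \qquad m = \frac{\sum(x_i-\langle x\rangle)(y_i-\langle y\rangle)}{\sum(x_i-\langle x\rangle)^2}$$

y-intercept, $b$ (4-17):

$$b = \frac{\sum x_i^2\sum y_i - \sum x_i\sum x_iy_i}{n\sum x_i^2 - (\sum x_i)^2} \qquad b = \frac{\sum y_i - m\sum x_i}{n}$$

variance of the regression, $s_y^2$ (its square root, $s_y$, is the standard error of the regression) (4-20):

$$s_y^2 = \frac{\sum(y_i - mx_i - b)^2}{n-2}$$

variance of the slope, $s_m^2$ (its square root, $s_m$, is the standard error of the slope) (4-21):

$$s_m^2 = \frac{n\sum(y_i-mx_i-b)^2}{(n-2)\left[n\sum x_i^2 - (\sum x_i)^2\right]} \qquad s_m^2 = \frac{\sum(y_i-mx_i-b)^2}{(n-2)\sum(x_i-\langle x\rangle)^2}$$

variance of the intercept, $s_b^2$ (its square root, $s_b$, is the standard error of the intercept) (4-22):

$$s_b^2 = \frac{\sum(y_i-mx_i-b)^2\sum x_i^2}{(n-2)\left[n\sum x_i^2 - (\sum x_i)^2\right]} \qquad s_b^2 = \frac{\sum x_i^2\sum(y_i-mx_i-b)^2}{n(n-2)\sum(x_i-\langle x\rangle)^2}$$

correlation coefficient, $R$ (5-2):

$$R = \frac{\sum(x_i-\langle x\rangle)(y_i-\langle y\rangle)}{\sqrt{\sum(x_i-\langle x\rangle)^2\sum(y_i-\langle y\rangle)^2}} \qquad R = \frac{\sum x_iy_i - \sum x_i\sum y_i/n}{\sqrt{\sum(x_i-\langle x\rangle)^2\sum(y_i-\langle y\rangle)^2}}$$

Abbreviations used in the spreadsheet in addition to SSE (similar expressions in y):
Sx = $\sum x_i$
SSx = $\sum x_i^2$
SSDx = $\sum (x_i-\langle x\rangle)^2$
Sxy = $\sum x_iy_i$
SDxSDy = $\sum (x_i-\langle x\rangle)(y_i-\langle y\rangle)$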
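The Harris-column formulas translated into Python as a sketch (standard library only; `linear_least_squares` is a hypothetical name). Intermediate variable names mirror the spreadsheet abbreviations Sx, SSx, Sxy, SSDx and SDxSDy, and each line is tagged with the equation number it implements.

```python
import math

def linear_least_squares(x, y):
    """Return m, b, s_y, s_m, s_b and R for the line y = m*x + b."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points (n - 2 degrees of freedom)")
    sx, sy = sum(x), sum(y)                        # Sx, Sy
    ssx = sum(xi * xi for xi in x)                 # SSx
    sxy = sum(xi * yi for xi, yi in zip(x, y))     # Sxy
    d = n * ssx - sx ** 2                          # n*SSx - Sx^2
    m = (n * sxy - sx * sy) / d                    # Eq. 4-16
    b = (ssx * sy - sx * sxy) / d                  # Eq. 4-17
    sse = sum((yi - m * xi - b) ** 2 for xi, yi in zip(x, y))
    s_y2 = sse / (n - 2)                           # Eq. 4-20
    s_m2 = s_y2 * n / d                            # Eq. 4-21
    s_b2 = s_y2 * ssx / d                          # Eq. 4-22
    x_mean, y_mean = sx / n, sy / n
    ssdx = sum((xi - x_mean) ** 2 for xi in x)     # SSDx
    ssdy = sum((yi - y_mean) ** 2 for yi in y)     # SSDy
    sdxsdy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))  # SDxSDy
    r = sdxsdy / math.sqrt(ssdx * ssdy)            # Eq. 5-2
    return {"m": m, "b": b, "s_y": math.sqrt(s_y2),
            "s_m": math.sqrt(s_m2), "s_b": math.sqrt(s_b2), "R": r}
```

Because the two columns of the table are algebraically equivalent, the spreadsheet variant `sdxsdy / ssdx` should agree with `m` up to rounding; the Harris forms are used here only to follow the equation numbers.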

Calibration Curves, 4-8

[Calibration data: standard solutions and blank solutions; questionable point: 0.392]

NOTE: Propagation of uncertainty for a calibration curve follows Eq. 4-27.
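The text only cites Eq. 4-27 for propagating uncertainty through a calibration curve. As an assumption, the sketch below uses the standard expression for the uncertainty $s_x$ of an unknown read from a least-squares line, $s_x = \frac{s_y}{|m|}\sqrt{\frac1k + \frac1n + \frac{(y-\langle y\rangle)^2}{m^2\sum(x_i-\langle x\rangle)^2}}$, with $k$ replicate measurements of the unknown and $n$ calibration points; check it against Eq. 4-27 before relying on it. The function name is hypothetical.

```python
import math

def unknown_from_calibration(y_unknown, m, b, s_y, n, k, y_mean, ssdx):
    """Return (x, s_x) for an unknown read from the line y = m*x + b.

    y_unknown: blank-corrected mean signal of k replicate measurements
    m, b, s_y: slope, intercept and standard error of the regression
    n:         number of calibration points
    y_mean:    mean signal <y> of the calibration standards
    ssdx:      SSDx = sum of (x_i - <x>)**2 over the standards
    """
    x = (y_unknown - b) / m
    # Assumed form of the propagation-of-uncertainty expression (see lead-in).
    s_x = (s_y / abs(m)) * math.sqrt(
        1.0 / k + 1.0 / n + (y_unknown - y_mean) ** 2 / (m ** 2 * ssdx)
    )
    return x, s_x
```

A least-squares fit of the blank-corrected standards supplies `m`, `b`, `s_y`, `y_mean` and `ssdx`; `k` is the number of replicate readings averaged for the unknown. The uncertainty is smallest when the unknown's signal lies near the center of the calibration data.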