

EXST30 Background material

From the textbook The Statistical Sleuth

Mean [0]: In your text the word mean denotes a population mean (µ), while the word average denotes a sample average ($\bar{Y}$).

Variance [0]: The variance is a measure of the dispersion, or spread, of the members of a population. The population variance is denoted σ² and its square root (σ) is called the standard deviation.

Standard deviation [0] for a sample is calculated as $s = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$, the square root of the sample variance.

Sample means have their own variance and standard deviation [33], which depend on the sample size. The standard deviation of the means is called the standard error, and it is calculated as the sample standard deviation divided by the square root of n: $s_{\bar{Y}} = s / \sqrt{n}$.

Parameter (population summary number, denoted by Greek letters) versus Statistic (sample summary number). All hypotheses are stated in terms of population parameters.

Idea behind a statistical test

All statistical tests depend on some statistic following a known statistical distribution. The more common distributions used in statistical tests are the Z, t, Chi square (χ²) and F distributions.

Z distribution [34]: used to test a mean against a hypothesized value (H0: µ = µ0) or the difference between two means against a hypothesized value (H0: µ1 − µ2 = δ), where δ is often 0. Use of the Z distribution is appropriate when the variance (σ²) is known or the sample size is very large. The Z distribution is also called the standard normal distribution.

Chi square distribution [559]: used to test a variance (σ²) against a hypothesized value (H0: σ² = σ0²). The variance (σ²) is estimated from a sample (s²).

t distribution [34]: used to test a mean against a hypothesized value (H0: µ = µ0) or the difference between two means against a hypothesized value (H0: µ1 − µ2 = δ), where δ is often 0. Use of the t distribution is appropriate for any sample size when the variance (σ²) is unknown and is estimated from the sample (s²). The t statistic is the ratio of a normally distributed quantity ($\bar{Y}$ in the numerator) to a quantity based on a chi square distribution (s in the denominator, since s² follows a scaled chi square distribution).

F distribution [5]: used to test two variance estimates (s1² and s2², estimating σ1² and σ2²) for equality (H0: σ1² = σ2²).
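The sample variance, standard deviation and standard error defined above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_summary` and the data values are made up for the example and do not come from the textbook.

```python
import math


def sample_summary(values):
    """Return (average, variance, standard deviation, standard error).

    Hypothetical helper for illustration; uses the n - 1 divisor
    for the sample variance, as in the formula for s.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    average = sum(values) / n
    variance = sum((y - average) ** 2 for y in values) / (n - 1)
    std_dev = math.sqrt(variance)
    # Standard error: sample standard deviation over the square root of n.
    std_err = std_dev / math.sqrt(n)
    return average, variance, std_dev, std_err


# Made-up sample, not data from the course.
data = [4.0, 6.0, 8.0, 10.0, 12.0]
average, variance, std_dev, std_err = sample_summary(data)
print(average, variance, std_dev, std_err)
```

Note that the standard error shrinks as n grows even when the spread of the individual observations stays the same.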

For a test to be valid the distribution must hold under the null hypothesis. The distribution will not be the same if the null hypothesis is not true, but this is not important for the test of hypothesis.

For the t distribution [44] the test statistic is either $t = \frac{\bar{Y} - \mu_0}{s_{\bar{Y}}}$ or $t = \frac{(\bar{Y}_1 - \bar{Y}_2) - \delta}{s_{\bar{Y}_1 - \bar{Y}_2}}$. If the null hypothesis is true, then the t test statistic should follow one of the bell-shaped curves of the t distribution (there is one for each possible degrees of freedom). The t test statistic is calculated, and values that would be unusual under the null hypothesis cause rejection of the null hypothesis. Commonly, t values that would occur only 5% of the time under the null hypothesis are considered unusual.

There are some assumptions that must also hold for this to be true. The assumptions are (1) that the variable of interest ($Y_i$) is normally distributed and (2) that the error terms are independent.

For the second case, which tests the difference between two means with $t = \frac{(\bar{Y}_1 - \bar{Y}_2) - \delta}{s_{\bar{Y}_1 - \bar{Y}_2}}$, if we calculate a combined variance from the two variances (one for each mean) then there is an additional assumption, (3) that the variances are the same.

Also for the second case the variance [39] of the difference is $s^2_{\bar{Y}_1 - \bar{Y}_2} = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$. However, the degrees of freedom for this variance are difficult to determine unless the variances are equal and can be pooled. So, we first test the variances for equality using the F test of equal variances. If they are equal we pool the variances, weighting by the degrees of freedom. The calculation is $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)}$. Using the pooled variance, the calculation for the standard error is $s_{\bar{Y}_1 - \bar{Y}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.

In this process we have made several assumptions. First, we have assumed that the data are normally distributed. This assumption is needed to use the t test, which is based on a normal distribution. The second assumption is that the variances are homogeneous. If the two variances are not the same then we should not pool them into a single estimate.

Central limit theorem [33, 60]: states that sums (and therefore averages) of a large number of independent variables with finite variance will be approximately normally distributed. Many real processes do in fact have distributions with finite variance, which results in the common use of the normal distribution.

P values [3, 46] are simply the probability, when the null hypothesis is true, of observing a given value, or a larger value, of a statistic. For example, if we calculate a t value and it is .7, what is the probability that we get a value of .7 or larger for a t test? For a two-tailed t test, what is the probability that we get an absolute value of .7 or larger for a t test? To obtain this probability refer to tables of the t distribution.
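As a rough sketch of the two-sample calculation described above (pool the variances weighting by degrees of freedom, form the standard error, then compute t), the following Python uses only the standard library. The function names and the two samples are hypothetical, and the sketch assumes the equal-variance check has already been done. It stops at the t statistic and its degrees of freedom, since the probability is then looked up in a t table.

```python
import math


def average_and_variance(values):
    """Sample average and sample variance (n - 1 divisor)."""
    n = len(values)
    average = sum(values) / n
    variance = sum((y - average) ** 2 for y in values) / (n - 1)
    return average, variance


def pooled_t(sample1, sample2, delta=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = delta.

    Hypothetical helper; assumes the two variances were already
    judged equal, so pooling is appropriate.
    """
    n1, n2 = len(sample1), len(sample2)
    avg1, var1 = average_and_variance(sample1)
    avg2, var2 = average_and_variance(sample2)
    df = (n1 - 1) + (n2 - 1)
    # Pool the variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two averages.
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = ((avg1 - avg2) - delta) / std_err
    return t_stat, df, pooled_var, std_err


# Made-up samples, for illustration only.
group1 = [4.0, 6.0, 8.0, 10.0, 12.0]
group2 = [1.0, 3.0, 5.0, 7.0, 9.0]
t_stat, df, pooled_var, std_err = pooled_t(group1, group2)
print(t_stat, df, pooled_var, std_err)
```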

Confidence intervals [45] are intervals that will include the true population parameter 100(1 − α)% of the time. For normally distributed variables like means we use the t distribution to construct confidence intervals. Variances follow a Chi square distribution, so confidence intervals for variances are calculated differently.

For a mean, or a mean difference, the confidence interval is calculated as the parameter estimate ± $t_{\alpha/2}$ × the standard error of the estimate.

For a mean the parameter estimate is $\bar{Y}$ and the standard error is $s_{\bar{Y}}$. The confidence interval is given by $\bar{Y} \pm t_{\alpha/2}\, s_{\bar{Y}}$ and is usually expressed as $P(\bar{Y} - t_{\alpha/2}\, s_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha/2}\, s_{\bar{Y}}) = 1 - \alpha$.

For the difference between two means the parameter estimate is $\bar{Y}_1 - \bar{Y}_2$ and the standard error is $s_{\bar{Y}_1 - \bar{Y}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p^2$ is the pooled variance estimate. The confidence interval is given by $(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2}\, s_{\bar{Y}_1 - \bar{Y}_2}$ and is expressed as $P\big((\bar{Y}_1 - \bar{Y}_2) - t_{\alpha/2}\, s_{\bar{Y}_1 - \bar{Y}_2} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t_{\alpha/2}\, s_{\bar{Y}_1 - \bar{Y}_2}\big) = 1 - \alpha$.

Similar calculations are used to calculate confidence intervals for treatment means [45] in Analysis of Variance, for differences between treatment means [45] in ANOVA, and for confidence intervals on regression coefficients (intercepts and slopes) in linear regression [86]. All of these estimates are usually normally distributed, and the confidence interval is calculated as parameter estimate ± $t_{\alpha/2}$ × standard error of the estimate.
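The interval for a single mean can be sketched as follows. Because the Python standard library has no t distribution, this illustration takes the critical value as an argument, looked up in a t table; the value 2.776 is the two-sided 95% critical value for 4 degrees of freedom. The function name and the data are made up for the example.

```python
import math


def mean_confidence_interval(values, t_crit):
    """Return (lower, upper) = average -/+ t_crit * standard error.

    Hypothetical helper; t_crit must be the tabled t value for the
    chosen alpha and n - 1 degrees of freedom.
    """
    n = len(values)
    average = sum(values) / n
    variance = sum((y - average) ** 2 for y in values) / (n - 1)
    std_err = math.sqrt(variance / n)
    half_width = t_crit * std_err
    return average - half_width, average + half_width


# Made-up sample with n = 5, so the degrees of freedom are 4.
data = [4.0, 6.0, 8.0, 10.0, 12.0]
T_CRIT_95_DF4 = 2.776  # from a t table: alpha/2 = 0.025, df = 4
lower, upper = mean_confidence_interval(data, T_CRIT_95_DF4)
print(lower, upper)
```

The same pattern applies to a difference between two means: replace the average with the difference of the averages, the standard error with the pooled standard error, and use the pooled degrees of freedom when looking up the critical value.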

An Introduction to SAS Programming

SAS programs consist of two major types of steps.

The DATA step: used to create or modify a SAS dataset [Contents > SAS Products > Base SAS > SAS Language Concepts > DATA Step Concepts]

  SAS dataset: a file containing a collection of similar information
    Can be visualized as a two-dimensional array (table) that looks like a spreadsheet (e.g. like EXCEL)
    Observation: each row represents the information for a single item
    Variable: each column contains one type of item
    In addition to data values, variable names and types, lengths, labels and formats are stored in this file

  Source of data for the DATA step
    Convert a raw data file to a SAS data set. The raw data can be either stored in a separate file or included in the SAS program itself
    The SAS datasets are initially created from some source of data. The SAS dataset, once created, can be stored as a permanent file or recreated each time SAS is run on that dataset.
    When the SAS dataset is created we can assign formats and labels
    When the SAS dataset is created we can perform modifications to the data, transformations and calculations

The PROC step (PROC is from PROCedure): used to process SAS data sets. Allows
  File manipulation: SORT
  Report preparation: PRINT and tabulation procedures
  Analysis: MEANS, FREQ, various statistical analyses
  Graphics: CHART, PLOT, TIMEPLOT
  Utilities: CONTENTS, DATASETS, FORMAT, file import and compare utilities

SAS Display Manager (in SAS help see Using SAS Software in Your Operating Environment > Using SAS in Windows > Running programs in the SAS Windowing environment). Here there are descriptions of the SAS interface, including graphics of SAS windows and menu options.
  Program editor window (= Editor): type, edit, save and submit SAS programs
  Log window (= Log): displays the SAS log (notes and messages produced when programs run)
  Output window (= Output): displays output from SAS program runs

General rules about SAS programs

  SAS statements can begin on any line.
  SAS statements can be continued on another line as long as no word is split.
  More than one statement can be written on a single line.
  At least one blank must separate each word or item in a SAS statement, except for mathematical operators (e.g. +, -, *, /, =).
  Each SAS statement must end with a semicolon (;).
  In PC SAS it is good practice to end each DATA step or PROC step with a RUN; statement.
  Text that starts with /* and ends with */ can be included anywhere in a SAS statement that a blank could appear. The enclosed section is a comment. This can also be used to turn any segment of a SAS program into an inactive comment section.
  The order of many statements is not important. This is true in both DATA steps and PROC steps. There are some logical exceptions. For example, you cannot process data until after an INPUT statement.

Introduction to the SAS DATA step: creating a SAS data set from raw data

The DATA step starts with the word DATA followed by a data-set-name. There are some limits on what can be used as the data-set-name. The SAS help (9.1.3) states that a SAS name can be up to eight characters long. The first character must be a letter (A, B, C, ..., Z) or underscore (_). Subsequent characters can be letters, numbers (0 to 9), or underscores. Note that no blanks are allowed. Two names (_N_ and _ERROR_) are reserved by the SAS System. Names longer than eight characters are acceptable in SAS 9.1.3, up to 32 characters.
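The naming rules above are mechanical enough to check with a short script. The sketch below is written in Python rather than SAS and is purely illustrative: `is_valid_sas_name` is a hypothetical helper, not part of SAS, and it assumes the 32-character limit for data set names with the two reserved names excluded.

```python
import re

# First character: letter or underscore; the rest: letters, digits, underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Names the handout lists as reserved by the SAS System.
_RESERVED = {"_N_", "_ERROR_"}


def is_valid_sas_name(name, max_length=32):
    """Return True if name follows the rules listed above.

    Hypothetical checker for illustration; max_length=32 is an
    assumption (use 8 for the older eight-character rule).
    """
    if len(name) > max_length:
        return False
    if name.upper() in _RESERVED:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["height_cm", "_temp1", "2nd_try", "my data", "_N_"]:
    print(candidate, is_valid_sas_name(candidate))
```

Passing `max_length=8` applies the older eight-character limit instead.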

Lengths of variables and names in SAS

Maximum Length of SAS Names

SAS Application                                          Max Length
Arrays                                                   32
CALL routines                                            16
Catalog entries                                          32
DATA step statement labels                               32
DATA step variable labels                                256
DATA step variables                                      32
DATA step windows                                        32
Engines                                                  8
Filerefs                                                 8
Formats, character                                       31
Formats, numeric                                         32
Functions                                                16
Generation data sets                                     28
Informats, character                                     30
Informats, numeric                                       31
Librefs                                                  8
Macro variables                                          32
Macro windows                                            32
Macros                                                   32
Members of SAS data libraries (SAS data sets, views,     32
  catalogs, indexes) except for generation data sets
Passwords                                                8
Procedure names (first 8 characters must be unique,      16
  and may not begin with "SAS")
SCL variables                                            32