Statistics Independent (X) you can choose and manipulate. Usually on x-axis

Transcription:

Statistics-6000

Variable: a characteristic that can take on different values with respect to person, time, and place. The types of variables are as follows:
o Independent (X): the variable you can choose and manipulate. Usually on the x-axis.
o Dependent (Y): what you measure in the experiment and what is affected during the experiment. Usually on the y-axis.
o Intermediate: a variable in a causal pathway that causes variation in the dependent variable and is itself caused to vary by the independent variable.
o Confounder: an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to account for these variables, either through true experimental designs, in which case one achieves control, or through statistical means. (Internal validity)

Discrete variable: a whole-number, countable variable. Ordinal (ranking type) or nominal (classificatory, categorical type). (Qualitative variable)

Continuous or measurable variable: the values have no gaps between them. They have decimal points and units. (Quantitative variable)

Why statistics in general: collection of data, summarization and analysis of the data set, evaluation, conducting research, and finally drawing a conclusion (testing hypotheses).

Specific goals of statistics: define a normal range (μ and σ), correlation study (relationship), regression study (prediction), association (chi-square test) and agreement testing (Cronbach's alpha and Cohen's kappa), hypothesis testing (z, t, F), and quality control (L G chart).

Sample (n): a small random group of individuals or observations chosen for study from the population. A sample is a part of the population.

Random sample: a sample selected such that every member of the population has an equal chance of being included in it.

Sampling unit: a part of the population, e.g. an individual, household, school, section, or village.

Sampling frame: a complete list of the sampling units in the population.

Why we need a sample study:
o Less time
o Less personnel
o Less resources
o Less money
o For in-depth study

Sample size: the number of individuals or observations under study. ($n \ge 30$)

Sampling methods:
o Simple random sampling: each unit has an equal probability of being included in the sample (lottery method), using tables of random numbers. Used when the study elements of the population are homogeneous. (N) is small.
o Stratified sampling: the study elements of the population are heterogeneous. (N) is larger. (Stratum.) Precision (1/SE) of the estimate will be high (SE will be less).
o Systematic sampling (convenience): (N) is very large. $K = N/n$ is the sampling interval. One number (X) is chosen randomly from 1 to K. The units $X + 0K, X + 1K, X + 2K, X + 3K, \ldots, X + (n-1)K$ are included in the sample. Precision of the estimate will be less.
o Cluster sampling: (N) is large and it is not possible to get a complete listing of the population units. Precision of the estimate will be less.
o Multistage sampling: (N) is very large. Sampling is done in stages. Precision of the estimate will be less.
o Quota sampling (sampling of convenience): (n) is fixed, and this is not a probability sampling method. Units are not randomly selected. Results cannot be generalized; they are applicable to that area only. Not a good sampling method.

Population (N): the aggregate of subjects under consideration. The whole group is representative.

Parameters ($\mu$ and $\sigma$). Statistics ($\bar{x}$ and SD or s).

Statistical methods: descriptive method and inference method.
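The systematic sampling rule above (interval K = N/n, random start X between 1 and K) can be sketched in Python as follows. This is only an illustrative sketch: the function name `systematic_sample`, the integer division used when N is not a multiple of n, and the sampling frame of 100 units are assumptions, not part of the notes.

```python
import random

def systematic_sample(units, n, rng=None):
    """Pick n units by systematic sampling with interval K = N // n (assumed rounding)."""
    rng = rng or random.Random()
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= N")
    K = N // n                 # sampling interval
    X = rng.randint(1, K)      # random start, 1-based, chosen from 1..K
    # Positions X + 0K, X + 1K, ..., X + (n - 1)K, converted to 0-based indexes
    return [units[X - 1 + j * K] for j in range(n)]

frame = list(range(1, 101))    # hypothetical sampling frame, N = 100
sample = systematic_sample(frame, 10, random.Random(0))
print(sample)
```

Every selected unit is exactly K positions after the previous one, which is why a single random number fixes the whole sample.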

Descriptive method: frequency tables, diagrams, graphs (bar chart, pie chart, pictogram, histogram, frequency polygon, and curves for linearity), arithmetic, geometric, or weighted mean, median, mode, range, quartile deviation (IQR), mean deviation, standard deviation (SD), coefficient of variation (CV%), correlation coefficient (r, the Pearson product-moment correlation), and regression analysis, which is used for prediction.

Inference analysis: used to generalize the results obtained from the random sample to the population from which the representative sample was selected. The two main components of the inference method are:
o Estimation of parameters (population values)
o Testing the statistical significance of the hypothesis

Measures of location: mean, mode, and median. Each is a single value that represents the distribution. When these values describe a population they are called parameters. If they describe a sample they are referred to as statistics.

Mean: $\bar{x} = \frac{\sum x}{n}$ (sample) or $\mu = \frac{\sum x}{N}$ (population), i.e. the sum of the values divided by their number.

Median: the middlemost value of the arranged data set (continuous distribution). Its value is not affected by extreme values, so the median is preferred to the mean when there are extreme values or when the sample is not normally distributed.

Mode: the most frequent observation of the data or distribution. A distribution may have more than one mode.

There are 2 types of data: grouped data and ungrouped data (very rich).

Why do we group the data? Grouping the actual data collected loses some of the richness of the data set's actual values, but sometimes we need to hide the actual data from the public and from competitors, or to simplify the data when we are handling a large data set.

$\sum f = n$ or $N$; the total frequency equals the number of observations (sample size).

Number of classes or groups (k) needed to make a histogram: $2^k \ge n$ or $N$.

Class interval size $= \frac{\text{Maximum} - \text{Minimum}}{k}$; this is the increment value that would be added.

For grouped data, the arithmetic mean is $\bar{x} = \frac{\sum m f}{\sum f}$, where m is the mid-value of the class interval.

Mid-value $= \frac{L_1 + L_2}{2}$, where $L_1$ is the lower limit and $L_2$ the upper limit; these L are real limits only.

$\sum (x - \bar{x}) = 0$, always.

Variance for grouped data: $SD^2$ or $\sigma^2 = \frac{\sum f m^2}{\sum f} - \bar{x}^2$

While computing the arithmetic mean for a given grouped frequency distribution, it is assumed that all values falling in a particular group or class are located at the midpoint of the group.

For grouped data, median $= L_1 + \frac{L_2 - L_1}{f} \times \left(\frac{N}{2} - C\right)$, where f is the frequency of the median class and C is the cumulative frequency of the class before it. (Law of next.)

If the given class limits are score limits, convert them to real limits.

The cumulative frequency of the last group $= N$ or $n$ or $\sum f$.

For grouped data, mode $= L_1 + (L_2 - L_1) \times \frac{f - f_1}{2f - f_1 - f_2}$, where f is the frequency of the class with maximum frequency and $f_1$, $f_2$ are the frequencies of the classes before and after it.

Quartiles and percentiles: the values in the continuous distribution showing the proportion or percentage of observations lying below (or up to) the given value.

$Q_i = L_1 + \frac{L_2 - L_1}{f} \times \left(\frac{i \times N}{4} - C\right)$; $i = 1, 2, 3$ (looks very much like the median formula)

Interquartile range (IQR): reflects the variability among the middle 50% of the observations. Better than the range, which uses the extreme values only.

$Q_1$ (25%), $Q_2$ (50%), and $Q_3$ (75%). IQR $= Q_3 - Q_1$, covering 75% - 25% = 50% of the observations.

$P_{50} = Q_2$ = median of a continuous data distribution.

Real class limits are used with grouped data for the median, mode, quartiles, and percentiles.
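The grouped-data formulas above (mean from class mid-values, median and mode by interpolation inside a class) can be checked with a short Python sketch. The frequency table used here is made up for illustration, and the function names are hypothetical. Each class is given as (lower real limit, upper real limit, frequency).

```python
def grouped_mean(classes):
    """Mean = sum(m * f) / sum(f), with m the class mid-value."""
    total_f = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total_f

def grouped_median(classes):
    """Median = L1 + (L2 - L1) / f * (N/2 - C), C = cumulative frequency before the class."""
    N = sum(f for _, _, f in classes)
    C = 0
    for lo, hi, f in classes:
        if C + f >= N / 2:          # first class whose cumulative frequency reaches N/2
            return lo + (hi - lo) / f * (N / 2 - C)
        C += f

def grouped_mode(classes):
    """Mode = L1 + (L2 - L1) * (f - f1) / (2f - f1 - f2) for the modal class."""
    i = max(range(len(classes)), key=lambda k: classes[k][2])
    lo, hi, f = classes[i]
    f1 = classes[i - 1][2] if i > 0 else 0                  # class before
    f2 = classes[i + 1][2] if i < len(classes) - 1 else 0   # class after
    return lo + (hi - lo) * (f - f1) / (2 * f - f1 - f2)

# Hypothetical frequency table with real class limits
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
print(grouped_mean(table), grouped_median(table), grouped_mode(table))
```

For this table N = 30, so the median class is the one where the cumulative frequency first reaches 15, namely 20 to 30.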

$P_i = L_1 + \frac{L_2 - L_1}{f} \times \left(\frac{i \times N}{100} - C\right)$; $i = 1, 2, 3, \ldots, 99$ (looks very much like the median formula)

The rule of next is used to locate the class interval from the cumulative frequency distribution.

Measures of variability: range, IQR, variance, SD, and coefficient of variation. They measure the scatter or dispersion of the data around the mean.

Range = largest observation - smallest observation.

Variance of ungrouped data: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ or $SD^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Computational form: $SD^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$; there is no need to compute $\bar{x}$ first. For grouped data, $\bar{x} = \frac{\sum f m}{\sum f}$.

$\sigma = +\sqrt{\sigma^2}$ or $SD = +\sqrt{SD^2}$; the unit of SD is the same as that of the observations.

$CV = \frac{SD}{\bar{x}} \times 100$; it is a unitless quantity. CV% is used to compare variation between variables of the same sample or of different samples.

An event = an outcome.

Probability of (A): the proportion of times the outcome would occur in a very long series of repetitions (all events are equally likely).

$P(A) = \frac{m}{n}$ $(0 \le m \le n)$, when the n possible outcomes are exhaustive, mutually exclusive, and equally likely, and m of them are favorable to A.
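The variance, SD, and CV definitions above can be verified numerically. The sketch below is illustrative only: the data values and function names are assumptions. It shows that the deviation form and the computational form of the sample variance agree.

```python
import math

def sample_variance(xs):
    """SD^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """SD^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1); the mean is not needed."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def cv_percent(xs):
    """CV% = SD / mean * 100, a unitless measure of relative variation."""
    return math.sqrt(sample_variance(xs)) / (sum(xs) / len(xs)) * 100

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical ungrouped observations
print(sample_variance(data), sample_variance_shortcut(data), cv_percent(data))
```

Because CV% divides by the mean, it only makes sense for data measured on a scale where the mean is not zero.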

Independent events: two events are said to be independent if the presence or absence of one does not alter the chances of the other being present, or if the occurrence of one does not alter the chance of occurrence of the other. (This means that they can occur together.)

Mutually exclusive events: events that cannot both occur together or be present at the same time. There is no overlap between the outcomes, e.g. a coin flip giving head or tail.

Additive rule (mutually exclusive events): the probability of occurrence of 2 or more mutually exclusive events is the sum of the probabilities of each outcome.
$P(A \text{ or } B) = P(A) + P(B)$, e.g. throwing a die for odd numbers (mutually exclusive events).

Multiplicative rule (independent events): the probability of simultaneous occurrence of events A and B in a series of independent trials (i.e. the chance of one outcome occurring is not affected by knowledge of whether or not the other occurred) is the product of their probabilities.
$P(A \text{ and } B) = P(A) \times P(B)$ for independent events.

General additive rule: if the 2 events are not mutually exclusive, then the probability that either event A or B occurs is:
$P(A \text{ or } B \text{ or both}) = P(A) + P(B) - P(A \text{ and } B)$

Discrete probability distribution (DPD): the sum of the p(x) values is 1, the probability of each outcome is between 0 and 1, and the outcomes are mutually exclusive.
$\mu = \sum x_i \, p(x_i)$ and $\sigma^2 = \sum (x_i - \mu)^2 \, p(x_i)$ for a discrete probability distribution.

Conditional probability. Joint probability: $P(A \cap B) = P(A) \times P(B)$ for independent events (the multiplicative rule).

Binomial distribution: has two outcomes only, one or zero. It is a discrete distribution.
$p(x) = C^n_x \, p^x q^{n-x}$ $(0 \le x \le n)$; $C^n_x$ is called the binomial coefficient.
$C^n_0 = 1$ and $C^n_n = 1$ and $0! = 1$ and $(p + q) = 1$. p is the parameter and n is the degree of the binomial distribution; n and p are fixed, the trials are independent, and 2 outcomes are possible.
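As a numerical check on the discrete-distribution formulas above, the following Python sketch builds a binomial distribution and computes its mean and variance directly from the definitions of μ and σ². The values n = 10 and p = 0.3 and the function names are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def dpd_mean_var(pmf):
    """Mean and variance of a discrete distribution given as {x: p(x)}."""
    mu = sum(x * px for x, px in pmf.items())
    var = sum((x - mu) ** 2 * px for x, px in pmf.items())
    return mu, var

n, p = 10, 0.3                                     # hypothetical parameters
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
total = sum(pmf.values())                          # should be 1 for a valid DPD
mu, var = dpd_mean_var(pmf)
print(total, mu, var)
```

With these parameters the computed mean and variance come out at n·p and n·p·q respectively, which is a useful sanity check when building a distribution by hand.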

It applies when the population is dichotomized, i.e. divided into 2 classes only. (p) is the probability of success and (q) is the probability of failure. $(p + q) = 1$

The mean of the binomial distribution (expected value) $= \sum x \, p(x) = np$

The variance of the binomial distribution: $V(x)$ or $\sigma^2 = npq$; if $npq \ge 10$ we can use the normal distribution to approximate the binomial.

Wording used in the questions:
o At least 10: $P(10 \le x \le n)$
o At most 10: $P(0 \le x \le 10)$
o At least one will return: $1 - p(x = 0)$ in the binomial distribution

The Poisson distribution: a discrete distribution; the trials are independent, p is very small, n is very large, and the events are very rare.
$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$; $x = 0, 1, 2, \ldots$
$\lambda$ (the average) $= np$ is the parameter (mean = variance). $e = 2.7183$

Normal distribution: for a continuous distribution with a large number of observations. The curve is bell-shaped and symmetrical about the mean, mean = mode = median, the total area under the curve is 1 square unit, and it approximates the histogram (frequency polygon).

The mean of all possible sample means is equal to the population mean; therefore the sample mean is called an unbiased estimator of the population mean.
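To illustrate the Poisson case above (p very small, n very large, λ = n·p), the sketch below compares "at least one" computed from the binomial with the same probability from the Poisson formula. The values n = 1000 and p = 0.002 are assumptions picked only to show how close the two are.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.002          # hypothetical rare-event setting
lam = n * p                 # lambda = n * p

# "At least one" = 1 - P(x = 0) under each distribution
at_least_one_binomial = 1 - (1 - p) ** n
at_least_one_poisson = 1 - poisson_pmf(0, lam)
print(at_least_one_binomial, at_least_one_poisson)
```

The two results differ only in the fourth decimal place here, which is why the Poisson formula is convenient for rare events with large n.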

Empirical rule for the bell-shaped curve (area under the curve):
o $\mu \pm 1\,SD = 0.6826$
o $\mu \pm 2\,SD = 0.9544$
o $\mu \pm 3\,SD = 0.9973$

The degree of flatness or peakedness of the curve is determined by the value of $\sigma$ or SD.

Standard normal distribution (Z): $\mu = 0$, $\sigma^2 = 1$, $\sigma = 1$.
$Z$ or $Z(\lambda) = \frac{X - \mu}{\sigma}$, where $\lambda$ is the area under the curve after the transformation process and $Z(\lambda)$ is a point on the horizontal axis.

Estimation of sample size for discrete data: $n = \frac{Z^2 \, p \, q}{L^2}$, with $Z = 1.96$ (95% CI), 2.58 (99% CI), or 3.29 (99.9% CI).

L is the permissible error on either side of the estimate (2L is the width of the interval). If the permissible error on either side of the estimate is given in %, L is calculated as $\frac{\#}{100} \times p$ (do a pilot study to estimate p).

The population proportion of the characteristic is expected to lie in the interval $(p - L, \; p + L)$.
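The areas quoted in the empirical rule can be reproduced with the error function from Python's standard library. This is a sketch under the assumption of an exactly normal distribution; the helper names are hypothetical. The values in the notes are rounded from a printed Z table, so the last digit can differ slightly from the computed ones.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Standardize a value: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

Any normal variable can be reduced to this standard case by first applying `z_score`, which is the transformation described in the notes.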

Estimation of sample size for continuous data: $n = \frac{Z^2 \, SD^2}{d^2}$, with $Z = 1.96$ (95% CI), 2.58 (99% CI), or 3.29 (99.9% CI).

If the permissible error on either side of the estimate is given in %, d is calculated as $\frac{\#}{100} \times \bar{x}$.

95% confidence interval for a mean: $\bar{x} \pm 1.96 \, SE(\bar{x})$, where $SE(\bar{x}) = \frac{SD}{\sqrt{n}}$

95% confidence interval for a proportion: $p \pm 1.96 \, SE(p)$, where $SE(p) = \sqrt{\frac{p \, q}{n}}$

For a proportion, $SD^2 = p \, q$. The prevalence rate counts old and new cases together.

$V(p) = \frac{p \, q}{n}$. Since $SE(\bar{x}) = \frac{SD}{\sqrt{n}}$, it follows that $SE(p) = \sqrt{\frac{p \, q}{n}}$ for the prevalence rate of the population.

SD: the average amount of deviation of the different sample values from the mean value.
SE: the average amount of deviation of the different means (of different samples) from the population mean.

Average mean deviation $= \frac{\sum |x - \bar{x}|}{n}$

Positive skew of the curve: mean > median, and the curve is skewed to the right (positive).

Geometric mean $= \sqrt[n]{\text{product of all } n \text{ values}}$, or, for an average rate of change, $\sqrt[n]{\frac{\text{value at end}}{\text{value at beginning}}} - 1$

Weighted mean $= \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

An experiment: the observation of some activity or the act of taking some measurement (e.g. having 3 children from 3 pregnancies).
An outcome: a particular result of an experiment. All the outcomes (BBB, BBG, ...) = 8 outcomes.
An event: a collection (subset) of one or more outcomes, e.g. Boy-Girl-Boy.

Combinations: $C^n_r = \frac{n!}{(n - r)! \, r!}$; this is used in binomial probability. From A, B, C taken 2 at a time: AB, BC, AC = 3.
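The two sample-size formulas (Z²pq/L² for a proportion and Z²SD²/d² for a mean) can be worked through as below. The inputs are invented for illustration, and rounding the result up to the next whole subject is an assumption of this sketch rather than something stated in the notes.

```python
import math

def n_for_proportion(p, L, z=1.96):
    """n = Z^2 * p * q / L^2, rounded up (rounding is an assumption)."""
    return math.ceil(z ** 2 * p * (1 - p) / L ** 2)

def n_for_mean(sd, d, z=1.96):
    """n = Z^2 * SD^2 / d^2, rounded up (rounding is an assumption)."""
    return math.ceil(z ** 2 * sd ** 2 / d ** 2)

# Hypothetical inputs: pilot prevalence 50%, permissible error 5 percentage points
print(n_for_proportion(0.5, 0.05))   # 384.16 before rounding
# Hypothetical inputs: SD = 15 units, permissible error d = 3 units
print(n_for_mean(15, 3))             # 96.04 before rounding
```

Passing `z=2.58` or `z=3.29` gives the sample sizes for the 99% and 99.9% confidence levels listed in the notes.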

Permutations: $P^n_r = \frac{n!}{(n - r)!}$; from A, B, C taken 2 at a time: AB, AC, BA, BC, CA, CB = 6.

Simple random sample: each unit or item has an equal chance of being selected.

Sampling error = a sample statistic - the population parameter.

We reject the null hypothesis when P < 0.05 in significance testing with the t-distribution.
We do not reject (accept) the null hypothesis when P > 0.05 in significance testing with the t-distribution.

The P-value is compared with $\alpha$ (5%, 1%, or 0.1%); $\alpha$ is the rejection area (tail area).

$V(\bar{x}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n} = SE(\bar{x})^2$

Central limit theorem: the mean of all possible sample means is equal to the population mean. Therefore the sample mean is called an unbiased estimator of the population mean.

$V(\bar{x}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n}$ if the population is finite.
$V(\bar{x}) = \frac{\sigma^2}{n} = (SE)^2$ if the population is infinite (unlimited).

Chi-square test: $\chi^2 = \sum \frac{(O - E)^2}{E}$; df = (number of columns - 1) × (number of rows - 1).
If the calculated value is greater than the tabular value, then there is an association.
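A minimal Python sketch of the chi-square calculation above, using a made-up 2 × 2 table of observed counts. Expected counts are taken as (row total × column total) / grand total, the usual choice for a test of association, which is an assumption of this sketch since the notes do not state how E is obtained. The tabular value 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(observed[0]) - 1) * (len(observed) - 1)          # (columns - 1)(rows - 1)
    return chi2, df

table = [[20, 30],
         [30, 20]]            # hypothetical observed counts
chi2, df = chi_square(table)
associated = chi2 > 3.841     # compare with the tabular value for df = 1 at 5%
print(chi2, df, associated)
```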

One-tailed t-test: $H_0: \mu = 0$ and $H_1: \mu > 0$ or $H_1: \mu < 0$

P-value: presuming $H_0$ is true, the probability that chance variation yields a t-statistic more extreme than -2.01 on either side of 0 (since the $H_1$ direction is both high and low) is .11. Conclusion: since the P-value > .05, we do not reject $H_0$.

Two-tailed t-test: $H_0: \mu = 0$ and $H_1: \mu \ne 0$

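One-sample test: comparison of a sample mean with the population mean. Degrees of freedom = n - 1 for the t-test, which is a distribution of differences. If the calculated value of t is greater than the table value, we reject the null hypothesis.

$H_0: \mu = \mu_0 = \#$ (no difference, i.e. they are the same and equal). Rejecting a true $H_0$ is a type I error.
$H_1: \mu \ne \mu_0$ or $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$

$Z = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$; here n < 30, under the assumption that SD = $\sigma$.
$t = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$; here n < 30, where SD $\ne \sigma$, even when the population (N) is normally distributed.

Unpaired two-sample test: comparison of two independent sample means.
$H_0: \mu_1 = \mu_2$ $(\mu_1 - \mu_2 = 0)$: the samples come from the same population.

$z = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$; $n \ge 30$
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$; $n \ge 30$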

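$t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$; n < 30 (Student's t-distribution)
$SE(\bar{x}_1 - \bar{x}_2) = s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$; n < 30
$s = \sqrt{\frac{(n_1 - 1) SD_1^2 + (n_2 - 1) SD_2^2}{n_1 + n_2 - 2}}$; n < 30
Degrees of freedom $= (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$

Paired sample test: comparison of the means of two correlated samples. The same subjects are in both groups. Under the null hypothesis the mean difference of the values is zero.
$H_0: \mu_d = 0$ (the mean of the differences in the population is zero)
$\bar{D} = \frac{\sum d_i}{n}$ and $SD_d = \sqrt{\frac{\sum (d_i - \bar{D})^2}{n - 1}}$
Degrees of freedom = n - 1
$t = \frac{\bar{D}}{SE(\bar{D})}$, where $SE(\bar{D}) = \frac{SD_d}{\sqrt{n}}$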
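The pooled two-sample t and the paired t described above can be computed as in the following Python sketch. The summary statistics and the before/after measurements are invented for illustration, and the function names are hypothetical. The resulting t would still have to be compared with a table value for the stated degrees of freedom.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired t with pooled SD; returns (t, degrees of freedom)."""
    s = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    se = s * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, n1 + n2 - 2

def paired_t(before, after):
    """Paired t on the differences d = after - before; returns (t, degrees of freedom)."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    sd_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    return d_bar / (sd_d / math.sqrt(n)), n - 1

# Hypothetical summary statistics for two independent samples
t_unpaired, df_unpaired = pooled_t(10, 2, 10, 8, 2, 10)
# Hypothetical measurements on the same four subjects
t_paired, df_paired = paired_t([10, 12, 9, 11], [12, 14, 10, 14])
print(t_unpaired, df_unpaired, t_paired, df_paired)
```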

If the P-value is lower than or equal to $\alpha$, the null ($H_0$) must go (be rejected).

Inference for proportions: $H_0: P = P_0$
$Z = \frac{p - P_0}{SE(p)}$, $SE(p) = \sqrt{\frac{P_0 \times Q_0}{n}}$, and $p = \frac{m}{n}$, where m is the number of cases (prevalence) and $Q_0 = 1 - P_0$ (remember that this is the population proportion). p is calculated from the sample (n).

The two-sample test of proportions is as follows: $H_0: P_1 = P_2$ $(P_1 - P_2 = 0)$
$z = \frac{p_A - p_B}{SE(p_A - p_B)}$, for a 2-sample test of proportions with any sample size (n)
$p = \frac{r_1 + r_2}{n_1 + n_2}$, the weighted average, with $q = 1 - p$
$SE(p_A - p_B) = \sqrt{p \, q \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

Correlation of (X, Y): df = n - 2
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$
If the calculated t-value is greater than the table t-value, then X and Y are significantly related to each other.

Regression: a is the y-intercept and b is the slope.
$Y = a + bX$
Percentage of total variation in Y explained by X $= 100 \times r^2$
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$; if t (calculated) > t (table), then the variables (X, Y) are related to each other.
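The correlation and regression quantities above can be tied together in one short Python sketch. The five (x, y) pairs are invented for illustration. The least-squares formulas for a and b are the standard ones and are an assumption here, since the notes give only the form Y = a + bX.

```python
import math

def correlation_and_regression(xs, ys):
    """Return (r, a, b): Pearson r, intercept a and slope b of Y = a + bX."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx                 # least-squares slope (assumed method)
    a = y_bar - b * x_bar         # least-squares intercept
    return r, a, b

def t_for_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

xs = [1, 2, 3, 4, 5]              # hypothetical data
ys = [2, 4, 5, 4, 5]
r, a, b = correlation_and_regression(xs, ys)
explained = 100 * r ** 2          # percentage of variation in Y explained by X
t = t_for_r(r, len(xs))
print(r, a, b, explained, t)
```

For these data about 60% of the variation in Y is explained by X, and the computed t is compared with the table value at n - 2 = 3 degrees of freedom.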