Statistics-6000

Variable: a characteristic that can take on different values with respect to persons, time, and place. The types of variables are as follows:
o Independent (X): what you can choose and manipulate. Usually on the x-axis.
o Dependent (Y): what you measure in the experiment and what is affected during the experiment. Usually on the y-axis.
o Intermediate: a variable in a causal pathway that causes variation in the dependent variable and is itself caused to vary by the independent variable.
o Confounder: an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to account for these variables, either through true experimental designs, in which case one achieves control, or through statistical means. (Internal validity)

Discrete variable: a whole-number, countable variable. Ordinal (ranking) type or nominal (classificatory, categorical) type. (Qualitative variable)
Continuous or measurable variable: the values have no gaps between them. They have decimal points and units. (Quantitative variable)

Why statistics in general: collection of data, summarization and analysis of the data set, evaluation, conducting research, and finally drawing conclusions (testing hypotheses).
Specific goals of statistics: define a normal range (μ and σ), correlation study (relationship), regression study (prediction), association (Chi-Square Test) and agreement testing (Cronbach's alpha and Cohen's kappa), hypothesis testing (z, t, F), and quality control (L G Chart).

Sample (n): a small random group of individuals or observations chosen for study from the population. A sample is a part of the population.
Random sample: a sample selected such that every member of the population has an equal chance of being included in the sample.
Sampling unit: a part of the population, e.g. an individual, household, school, section, or village.
Sampling frame: a complete list of the sampling units in the population.
Why we need a sample study:
o Less time
o Less personnel
o Less resources
o Less money
o For in-depth study
Sample size (n): the number of individuals or observations under study ($n \ge 30$).
Sampling methods:
o Simple random sampling: each unit has an equal probability of being included in the sample (lottery sample), e.g. by using tables of random numbers. Used when the study elements of the population are homogeneous and N is small.
o Stratified sampling: used when the study elements of the population are heterogeneous and N is larger. The population is divided into strata. The precision (1/SE) of the estimate will be high (SE will be less).
o Systematic sampling (convenience): N is very large. $K = N/n$ is the sampling interval. One number X is chosen randomly from 1 to K, and the units $X + 0K, X + 1K, X + 2K, X + 3K, \ldots, X + (n-1)K$ are included in the sample. Precision of the estimate will be less.
o Cluster sampling: N is large and it is not possible to get a complete listing of the population units. Precision of the estimate will be less.
o Multi-stage sampling: N is very large. Sampling is done in stages. Precision of the estimate will be less.
o Quota sampling (sampling of convenience): n is fixed, and it is not a probability sampling method. Units are not randomly selected. Results cannot be generalized and are applicable to that area only. Not a good sampling method.

Population (N): the aggregate of subjects under consideration. The whole group is representative.
Parameters (μ and σ) describe the population; statistics ($\bar{x}$ and SD or s) describe the sample.
Statistical methods: the descriptive method and the inference method.
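The systematic sampling steps listed under the sampling methods (interval K = N/n, random start X between 1 and K, then every K-th unit) can be sketched as follows. This is only an illustration: the function name `systematic_sample`, the 100-unit frame, and the seed are assumptions made for the example, not part of the notes.

```python
import random

def systematic_sample(units, n, rng=random):
    """Return n units: a random start X in 1..K, then every K-th unit."""
    N = len(units)
    k = N // n                      # sampling interval K = N/n (integer part)
    x = rng.randint(1, k)           # random start X chosen from 1..K
    # 1-based positions X + 0K, X + 1K, ..., X + (n-1)K, shifted to 0-based indices
    return [units[x - 1 + j * k] for j in range(n)]

frame = list(range(1, 101))         # hypothetical sampling frame, N = 100
sample = systematic_sample(frame, 10, random.Random(0))
print(sample)                       # 10 units spaced K = 10 apart
```

Only the starting point is random here, which is why the whole sample is fixed once X is drawn.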
Descriptive method: frequency tables, diagrams, graphs (bar chart, pie chart, pictogram, histogram, frequency polygon, and curves for linearity), arithmetic, geometric or weighted mean, median, mode, range, quartile deviation (IQR), mean deviation, standard deviation (SD), coefficient of variation (CV%), correlation coefficient (r, the Pearson product-moment correlation), and regression analysis, which is used for prediction.
Inference analysis: used to generalize the results obtained from the random sample to the population from which the representative sample was selected. The two main components of the inference method are:
o Estimation of parameters (population values)
o Testing the statistical significance of the hypothesis

Measures of location: mean, mode, and median. Each is a single value that represents the distribution. When these values describe a population they are called parameters; when they describe a sample they are referred to as statistics.
Mean: $\bar{x} = \frac{\sum x}{n}$ (sample) or $\mu = \frac{\sum x}{N}$ (population), i.e. the sum of the values divided by their number.
Median: the middle value of the arranged data set (continuous distribution). Its value is not affected by extreme values, so the median is preferred to the mean when there are extreme values or when the sample is not normally distributed.
Mode: the most frequent observation in the data/distribution. A distribution may have more than one mode.

There are two types of data: grouped data and ungrouped data (very rich).
Why do we group the data? Grouping the collected data loses the richness of the actual values, but sometimes we need to hide the actual data from the public and from competitors, or to simplify the data when handling a large data set.
$\sum f = n$ or $N$; the total frequency equals the number of observations (sample size).
Number of classes or groups (k) needed to make a histogram: $2^k \ge n$ (or N).
Class interval size $= \frac{\text{Maximum} - \text{Minimum}}{k}$; this is the increment added to go from one class to the next.
For grouped data, the arithmetic mean is $\bar{x} = \frac{\sum mf}{\sum f}$, where m is the mid-value of the class interval.
Mid-value: $m = \frac{L_1 + L_2}{2}$, where $L_1$ is the lower limit and $L_2$ the upper limit (real limits only).
$\sum (x - \bar{x}) = 0$, always.
Variance for grouped data: $SD^2$ or $\sigma^2 = \frac{\sum f m^2}{\sum f} - \bar{x}^2$.
While computing the arithmetic mean for a given grouped frequency distribution, it is assumed that all values falling in a particular group or class are located at the midpoint of the group.
Grouped median $= L_1 + \frac{L_2 - L_1}{f}\left(\frac{N}{2} - C\right)$, where f is the frequency of the median class and C is the cumulative frequency before it. The law of next is used to locate the median class.
If the given class limits are score limits, convert them to real limits.
The last cumulative frequency $= N$ or $n$ or $\sum f$.
Grouped mode $= L_1 + (L_2 - L_1)\frac{f - f_1}{2f - f_1 - f_2}$, where f is the frequency of the modal class (the class with maximum frequency) and $f_1$, $f_2$ are the frequencies of the classes before and after it.

Quartiles and percentiles: values in the continuous distribution showing the proportion/percentage of observations lying below (or up to) the given value.
$Q_i = L_1 + \frac{L_2 - L_1}{f}\left(\frac{iN}{4} - C\right)$; i = 1, 2, 3 (very similar to the median formula).
Interquartile range (IQR): reflects the variability among the middle 50% of the observations. Better than the range, which uses extreme values only.
$Q_1$ (25%), $Q_2$ (50%), $Q_3$ (75%).
IQR $= Q_3 - Q_1$, covering 75% − 25% = 50% of the observations.
$P_{50} = Q_2$ = median of a continuous data distribution.
Real limits are used for grouped data for: median, mode, quartiles, and percentiles.
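The grouped median and quartile formulas share one pattern: find the class where the cumulative frequency reaches the target (N/2 or iN/4), then interpolate inside it. A minimal sketch, assuming a hypothetical frequency table given as (lower real limit, upper real limit, frequency) tuples; the helper name `grouped_quantile` and the data are invented for illustration.

```python
def grouped_quantile(classes, fraction):
    """classes: list of (L1, L2, f) with real limits, in ascending order.
    fraction: 0.5 for the median, 0.25 / 0.75 for Q1 / Q3."""
    N = sum(f for _, _, f in classes)
    target = fraction * N            # N/2 for the median, i*N/4 for Q_i
    cum = 0                          # C: cumulative frequency before the class
    for l1, l2, f in classes:
        if cum + f >= target:        # first class whose cumulative frequency reaches the target
            return l1 + (l2 - l1) / f * (target - cum)
        cum += f
    raise ValueError("fraction must be between 0 and 1")

# Hypothetical grouped data with real limits, N = 30
classes = [(9.5, 19.5, 5), (19.5, 29.5, 8), (29.5, 39.5, 12), (39.5, 49.5, 5)]

median = grouped_quantile(classes, 0.50)
q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
iqr = q3 - q1
print(round(median, 3), round(q1, 3), round(q3, 3), round(iqr, 3))
```

With these numbers the median class is 29.5 to 39.5 (cumulative frequency 25 is the first at or above N/2 = 15), so the median is 29.5 + (10/12)(15 − 13).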
$P_i = L_1 + \frac{L_2 - L_1}{f}\left(\frac{iN}{100} - C\right)$; i = 1, 2, 3, ..., 99 (very similar to the median formula).
Rule of next: used to locate the class interval from the cumulative frequency distribution.

Measures of variability: range, IQR, variance, SD, and coefficient of variation. They describe the scatter or dispersion of the data around the mean.
Range = largest observation − smallest observation.
Variance of ungrouped data: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ or $SD^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.
Grouped data: $\sigma^2$ or $SD^2 = \frac{\sum f m^2}{\sum f} - \bar{x}^2$, with $\bar{x} = \frac{\sum fm}{\sum f}$; no need for $n - 1$.
$\sigma$ or SD $= +\sqrt{\sigma^2}$ or $+\sqrt{SD^2}$; the unit of SD is the same as that of the observations.
CV $= \frac{SD}{\bar{x}} \times 100$; it has no unit (a unitless quantity).
CV% is used to compare variation between variables in the same sample or in different samples.

An event = an outcome.
Probability of A: the proportion of times the outcome would occur in a very long series of repetitions (all events are equally likely).
$P(A) = \frac{m}{n}$ ($0 \le m \le n$), when the n outcomes are exhaustive, mutually exclusive and equally likely, and A is possible in m of them.
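The ungrouped variance, SD and CV formulas in the measures of variability above can be checked numerically. The six values below are made up for illustration; the sketch contrasts the population divisor N with the sample divisor n − 1.

```python
import math

data = [4, 8, 6, 5, 3, 7]            # hypothetical observations
n = len(data)
mean = sum(data) / n                 # arithmetic mean

sum_sq_dev = sum((x - mean) ** 2 for x in data)
pop_variance = sum_sq_dev / n        # sigma^2: divide by N
sample_variance = sum_sq_dev / (n - 1)   # SD^2: divide by n - 1

sd = math.sqrt(sample_variance)      # positive square root, same unit as the data
cv_percent = sd / mean * 100         # unitless, allows comparison across variables

print(mean, pop_variance, sample_variance, round(sd, 4), round(cv_percent, 2))
```

The deviations from the mean sum to zero here, as the notes state, which is why squared deviations are used.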
Independent events: two events are said to be independent if the presence or absence of one does not alter the chance of the other being present, i.e. the occurrence of one does not alter the chance of occurrence of the other (this means that they can occur together).
Mutually exclusive events: events that cannot both occur together or be present at the same time. There is no overlap between the outcomes, e.g. flipping a coin gives head or tail.
Additive rule (mutually exclusive events): the probability of occurrence of 2 or more mutually exclusive events is the sum of the probabilities of each outcome.
$P(A \text{ or } B) = P(A) + P(B)$, e.g. throwing a die for odd numbers (mutually exclusive events).
Multiplicative rule (independent events): the probability of simultaneous occurrence of events A and B in a series of independent trials (i.e. the chance of one outcome occurring is not affected by knowledge of whether or not the other occurred) is the product of their probabilities.
$P(A \text{ and } B) = P(A) \times P(B)$ for independent events.
General additive rule: if the 2 events are not mutually exclusive, then the probability that either event A or B occurs is $P(A \text{ or } B \text{ or both}) = P(A) + P(B) - P(A \,\&\, B)$.

Discrete probability distribution (DPD): the sum of the p(x) values = 1, the probability of each outcome is between 0 and 1, and the outcomes are mutually exclusive.
$\mu = \sum x_i\,p(x_i)$ and $\sigma^2 = \sum (x_i - \mu)^2\,p(x_i)$ for a discrete probability distribution.
Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, the probability of A given that B has occurred.
Joint probability: $P(A \cap B) = P(A) \times P(B)$ for independent events (the multiplicative rule).

Binomial distribution: has two outcomes only, one or zero. It is a discrete distribution.
$p(x) = {}^{n}C_{x}\,p^{x} q^{n-x}$, $0 \le x \le n$; ${}^{n}C_{x}$ is called the binomial coefficient.
${}^{n}C_{0} = 1$, ${}^{n}C_{n} = 1$, $0! = 1$ and $p + q = 1$. n and p are the parameters, and n is the degree of the binomial distribution. n and p are fixed, the trials are independent, and 2 outcomes are possible.
Its application: when the population is dichotomized, i.e. divided into 2 classes only. p is the probability of success and q is the probability of failure; $p + q = 1$.
Mean of the binomial distribution (expected value): $\sum x\,p(x) = np$.
Variance of the binomial distribution: $V(x)$ or $\sigma^2 = npq$; if $npq \ge 10$ we can use the normal distribution to approximate the binomial.
Wording in the questions:
o "At least 10" means $P(10 \le x \le n)$
o "At most 10" means $P(0 \le x \le 10)$
o "At least one will return" means $1 - p(x = 0)$ in the binomial distribution

The Poisson distribution: a discrete distribution; trials are independent, p is very small, n is very large, and events are very rare.
$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$; x = 0, 1, 2, ...; $\lambda$ (average) $= np$ is the parameter (mean = variance); e = 2.7183.

Normal distribution: for a continuous distribution with a large number of observations. The curve is bell-shaped and symmetrical about the mean, mean = mode = median, the total area under the curve = 1 square unit, and it approximates the histogram (frequency polygon).
The mean of all possible sample means is equal to the population mean; therefore the sample mean is called an unbiased estimate of the population mean.
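The binomial and Poisson probabilities described above can be sketched with the standard library alone. The parameter values (n = 10, p = 0.3, λ = 2) are hypothetical and chosen only to illustrate "at least one", "at most" and the mean and variance formulas.

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n, p = 10, 0.3                       # hypothetical binomial parameters
at_least_one = 1 - binomial_pmf(0, n, p)                     # 1 - p(x = 0)
at_most_two = sum(binomial_pmf(x, n, p) for x in range(3))   # P(0 <= x <= 2)

# Mean and variance computed from the distribution itself
binom_mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
binom_var = sum((x - binom_mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

lam = 2.0                            # hypothetical Poisson average
poisson_zero = poisson_pmf(0, lam)   # probability of no events

print(round(at_least_one, 4), round(at_most_two, 4), round(poisson_zero, 4))
```

The computed mean and variance agree with np and npq, which is a quick way to check a hand calculation.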
Empirical rule (bell-shaped curve), area $Z(\lambda)$:
o µ ± 1 SD = 0.6826
o µ ± 2 SD = 0.9544
o µ ± 3 SD = 0.9973
The degree of flatness or peakedness of the curve is determined by the value of σ or SD.
Standard normal distribution (Z): $\mu = 0$, $\sigma^2 = 1$, $\sigma = 1$. $Z$ or $Z(\lambda) = \frac{X - \mu}{\sigma}$; λ = the area under the curve after the transformation process. $Z(\lambda)$ is a point on the horizontal line.

Estimation of sample size for a discrete variable: $n = \frac{Z^2\,p\,q}{L^2}$, with Z = 1.96 (95% CI) or 2.58 (99% CI) or 3.29 (99.9% CI).
L is the permissible error on either side of the estimate (2L is the width of the interval).
If the permissible error on either side of the estimate is given in %, L is calculated as $\frac{\%}{100} \times p$ (do a pilot study to estimate p).
The population proportion of the characteristic is expected to lie in the interval $(p - L,\; p + L)$.
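As a worked illustration of the sample size formula for a proportion, the sketch below evaluates $n = Z^2 pq / L^2$ for an assumed pilot estimate p = 0.20. Both the pilot value and the two error settings are hypothetical, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_proportion(p, L, z=1.96):
    """n = Z^2 * p * q / L^2, rounded up to the next whole subject."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / L ** 2)

p = 0.20                              # hypothetical pilot-study proportion

# Permissible error given as an absolute value: L = 0.05
n_absolute = sample_size_proportion(p, L=0.05)

# Permissible error given as 10% of p: L = (10 / 100) * p
n_relative = sample_size_proportion(p, L=10 / 100 * p)

print(n_absolute, n_relative)
```

Expressing the error as a percentage of p gives a much smaller L, which is why the second sample size is several times larger.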
Estimation of sample size for a continuous variable: $n = \frac{Z^2\,SD^2}{d^2}$, with Z = 1.96 (95% CI) or 2.58 (99% CI) or 3.29 (99.9% CI).
If the permissible error on either side of the estimate is given in %, d is calculated as $\frac{\%}{100} \times \bar{x}$.
95% confidence interval for a mean: $\bar{x} \pm 1.96\,SE(\bar{x})$, with $SE(\bar{x}) = \frac{SD}{\sqrt{n}}$.
95% confidence interval for a proportion: $p \pm 1.96\,SE(p)$, with $SE(p) = \sqrt{\frac{pq}{n}}$.
For a proportion, $SD^2 = pq$ and $V(p) = \frac{pq}{n}$. Since $SE(\bar{x}) = \frac{SD}{\sqrt{n}}$, it follows that $SE(p) = \sqrt{\frac{pq}{n}}$ for the prevalence rate of the population. (Prevalence rate means old and new cases together.)
SD: the average amount of deviation of the different sample values from the mean value.
SE: the average amount of deviation of the different means (of different samples) from the population mean.
Average mean deviation $= \frac{\sum |x - \bar{x}|}{n}$.
Positive skew of the curve: mean > median, and the curve is skewed to the right (positive).
Geometric mean $= \sqrt[n]{x_1 x_2 \cdots x_n}$ (the n-th root of the product of all % values), or $= \sqrt[n]{\frac{\text{value at end}}{\text{value at beginning}}} - 1$.
Weighted mean $= \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.

An experiment: the observation of some activity or the act of taking some measurement, e.g. having 3 children from 3 pregnancies.
An outcome: a particular result of an experiment. All of (BBB, BBG, ...) = 8 outcomes.
An event: a collection (subset) of one or more outcomes, e.g. Boy-Girl-Boy.
Combinations: ${}^{n}C_{r} = \frac{n!}{(n-r)!\,r!}$; this is used in binomial probability. From A, B, C taken 2 at a time: AB, BC, AC = 3.
Permutations: ${}^{n}P_{r} = \frac{n!}{(n-r)!}$; from A, B, C taken 2 at a time: AB, AC, BA, BC, CA, CB = 6.
Simple random sample: each unit or item has an equal chance of being selected.
Sampling error = a sample statistic − the population parameter.
We reject the null hypothesis if P < 0.05 when testing significance with the t-distribution.
We accept (fail to reject) the null hypothesis if P > 0.05 when testing significance with the t-distribution.
Significance level α (5% or 1% or 0.1%) = rejection area = tail area; the P-value is compared with α.
Central limit theorem: the mean of all possible sample means is equal to the population mean. Therefore the sample mean is called an unbiased estimate of the population mean.
$V(\bar{x}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n}$ if the population is finite.
$V(\bar{x}) = \frac{\sigma^2}{n}$ if the population is infinite (unlimited). In both cases $V(\bar{x}) = (SE(\bar{x}))^2$.

Chi-Square Test: $\chi^2 = \sum \frac{(O - E)^2}{E}$; df = (number of columns − 1) × (number of rows − 1).
If the calculated value is greater than the tabular value, then there is an association.
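The chi-square statistic and its degrees of freedom can be computed directly from a contingency table. This is a small sketch with an invented 2 × 2 table; expected counts are taken as (row total × column total) / grand total, the usual choice for a test of association, and 3.841 is the tabular value for df = 1 at the 5% level.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total   # E under no association
            stat += (observed - expected) ** 2 / expected      # (O - E)^2 / E
    df = (len(table) - 1) * (len(table[0]) - 1)                # (rows - 1)(columns - 1)
    return stat, df

observed = [[20, 30],                 # hypothetical counts: exposed, outcome yes / no
            [30, 20]]                 # hypothetical counts: unexposed, outcome yes / no
stat, df = chi_square(observed)

critical_5pct_df1 = 3.841             # tabular chi-square value, df = 1, alpha = 0.05
print(stat, df, stat > critical_5pct_df1)
```

Every expected count is 25 in this table, so each cell contributes 1 to the statistic.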
One-tailed t-test: $H_0 = 0$ and $H_1 > 0$ or $H_1 < 0$.
P-value: presuming $H_0$ is true, the likelihood of chance variation yielding a t-statistic more extreme than −2.01 on either side of 0 (since the $H_1$ direction is both high and low) is .11.
Conclusion: since the P-value > .05, we do not reject $H_0$.
Two-tailed t-test: $H_0 = 0$ and $H_1 \ne 0$.
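One-sample test: comparison of a sample mean with the population mean.
Degrees of freedom = n − 1 for the t-test, which is a distribution of differences.
If the calculated value of t > the table value, we reject the null hypothesis.
$H_0: \mu = \mu_0$ = a given number (no difference; they are the same and equal). Rejecting a true $H_0$ is a Type I error.
$H_1: \mu \ne \mu_0$ or $\mu > \mu_0$ or $\mu < \mu_0$.
$Z = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$; here n < 30 under the assumption that SD = σ.
$t = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$; here n < 30 where SD ≠ σ, even if the population (N) is normally distributed.

Unpaired two-sample test: comparison of two independent sample means.
$H_0: \mu_1 = \mu_2$ ($\mu_1 - \mu_2 = 0$): they come from the same population; the samples are taken from the population.
$z = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$; $n \ge 30$.
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$; $n \ge 30$.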
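$t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$; n < 30 (Student t-distribution).
$SE(\bar{x}_1 - \bar{x}_2) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$; n < 30.
$s = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}$; n < 30.
Degrees of freedom $= (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.

Paired sample test: comparison of the means of two correlated samples. The same subjects are in both groups. Under the null hypothesis the mean difference of the values is zero.
$H_0: \mu_d = 0$ (the mean of the differences in the population is zero).
$\bar{d} = \frac{\sum d_i}{n}$ and $SD_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$.
Degrees of freedom = n − 1.
$t = \frac{\bar{d}}{SE(\bar{d})}$, with $SE(\bar{d}) = \frac{SD_d}{\sqrt{n}}$.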
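The small-sample unpaired and paired t statistics above can be sketched as two short functions. The data are invented for illustration, and the sketch stops at the t value and degrees of freedom; the comparison with a table value is left to the reader, as in the notes.

```python
import math
import statistics

def unpaired_t(a, b):
    """Pooled two-sample t statistic and df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)   # SD1^2, SD2^2 (n - 1 divisor)
    s = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))  # pooled SD
    se = s * math.sqrt(1 / n1 + 1 / n2)
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def paired_t(first, second):
    """Paired t statistic on the differences d_i and df = n - 1."""
    d = [x - y for x, y in zip(first, second)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)      # SE of the mean difference = SD_d / sqrt(n)
    return statistics.mean(d) / se, n - 1

# Hypothetical data
group_a, group_b = [5, 7, 9], [1, 3, 5]
before, after = [10, 12, 14, 16], [8, 11, 11, 14]

t_unpaired, df_unpaired = unpaired_t(group_a, group_b)
t_paired, df_paired = paired_t(before, after)
print(round(t_unpaired, 4), df_unpaired, round(t_paired, 4), df_paired)
```

The paired version works on the per-subject differences only, so between-subject variation does not enter its standard error.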
If the P-value is low (less than or equal to α), the null ($H_0$) must go (be rejected).

Inference for proportions: $H_0: P = P_0$.
$Z = \frac{p - P_0}{SE(p)}$, with $SE(p) = \sqrt{\frac{P_0 Q_0}{n}}$ and $p = \frac{m}{n}$, where m is the prevalence (number of cases).
$Q_0 = 1 - P_0$ (remember that this is the population proportion); p is calculated from the sample (n).

The two-sample test of proportions is as follows: $H_0: P_1 = P_2$ ($P_1 - P_2 = 0$).
$z = \frac{p_A - p_B}{SE(p_A - p_B)}$, for a two-sample test of proportions with any sample size n.
$p = \frac{r_1 + r_2}{n_1 + n_2}$; the weighted average for a two-sample test of proportions.
$SE(p_A - p_B) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, with $q = 1 - p$.

Correlation of (X, Y): df = n − 2.
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$
If the calculated t-value is greater than the table t-value, then X and Y are significantly related to each other.
Regression: $Y = a + bX$, where a is the y-intercept and b is the slope.
Percentage of total variation in Y explained by X $= 100 \times r^2$.
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$; if t (calculated) > t (table), then the variables (X, Y) are related to each other.
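To tie the regression line, the correlation coefficient and its t statistic together, here is a minimal sketch on five invented (x, y) pairs. It assumes the usual least-squares estimates for a and b, which the notes do not spell out, and the function name `simple_regression` is hypothetical.

```python
import math

def simple_regression(x, y):
    """Return intercept a, slope b, correlation r, t statistic and df = n - 2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx                         # least-squares slope (assumed estimator)
    a = mean_y - b * mean_x               # y-intercept
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return a, b, r, t, n - 2

x = [1, 2, 3, 4, 5]                       # hypothetical data
y = [2, 4, 5, 4, 5]
a, b, r, t, df = simple_regression(x, y)

explained_percent = 100 * r ** 2          # % of variation in Y explained by X
print(round(a, 2), round(b, 2), round(r, 4), round(t, 4), df, round(explained_percent, 1))
```

The t value would then be compared with the table value at df = n − 2 to decide whether X and Y are related.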