Descriptive Statistics
- Julius Hawkins
- 6 years ago
Chapter 200

Introduction

This procedure summarizes variables both statistically and graphically. Information about the location (center), spread (variability), and distribution is provided. The procedure provides a large variety of statistical information about a single variable.

Kinds of Research Questions

The use of this module for a single variable is generally appropriate for one of four purposes: numerical summary, data screening, outlier identification (which sometimes is incorporated into data screening), and distributional shape. We will briefly discuss each of these now.

Numerical Descriptors

The numerical descriptors of a sample are called statistics. These statistics may be categorized as location, spread, shape indicators, percentiles, and interval estimates.

Location or Central Tendency

One of the first impressions that we like to get from a variable is its general location. You might think of this as the center of the variable on the number line. The average (mean) is a common measure of location. When investigating the center of a variable, the main descriptors are the mean, median, mode, and the trimmed mean. Other averages, such as the geometric and harmonic mean, have specialized uses. We will now briefly compare these measures.

If the data come from the normal distribution, the mean, median, mode, and the trimmed mean are all equal. If the mean and median are very different, most likely there are outliers in the data or the distribution is skewed. If this is the case, the median is probably a better measure of location. The mean is very sensitive to extreme values and can be seriously contaminated by just one observation.

A compromise between the mean and median is given by the trimmed mean (where a predetermined number of observations are trimmed from each end of the data distribution). This trimmed mean is more robust than the mean but more sensitive than the median. Comparison of the trimmed mean to the median should show the trimmed mean approaching the median as the degree of trimming increases.
If the trimmed mean converges to the median for a small degree of trimming, say 5 or 10%, the number of outliers is relatively few.
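The comparison of location measures described above can be sketched in a few lines of Python. This is only an illustration: the sample values are hypothetical, and the helper `trimmed_mean` (which drops a whole number `k` of observations from each end) is a simplification introduced here, not the procedure's own routine.

```python
import statistics

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations."""
    data = sorted(values)
    kept = data[k:len(data) - k] if k > 0 else data
    return statistics.fmean(kept)

# Hypothetical sample: nineteen plausible values plus one gross outlier.
sample = [64, 65, 65, 66, 66, 67, 67, 67, 68, 68,
          68, 69, 69, 70, 70, 71, 71, 72, 73, 700]

mean_value = statistics.fmean(sample)      # pulled far upward by the outlier
median_value = statistics.median(sample)   # barely affected by it
trim_5 = trimmed_mean(sample, 1)           # 5% of 20 = 1 value per end
trim_10 = trimmed_mean(sample, 2)          # 10% of 20 = 2 values per end

print(mean_value, median_value, trim_5, trim_10)
```

With a single outlier, one round of trimming already brings the trimmed mean close to the median, which is the pattern the text associates with relatively few outliers.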
Variability, Dispersion, or Spread

After establishing the center of a variable's values, the next question is how closely the data fall about this center. The pattern of the values around the center is called the spread, dispersion, or variability. There are numerous measures of variability: range, variance, standard deviation, interquartile range, and so on. All of these measures of dispersion are affected by outliers to some degree, but some do much better than others.

The standard deviation is one of the most popular measures of dispersion. Unfortunately, it is greatly influenced by outlying observations and by the overall shape of the distribution. Because of this, various substitutes for it have been developed. It will be up to you to decide which is best in a given situation.

Shape

The shape of the distribution describes the pattern of the values along the number line. Are there a few unique values that occur over and over, or is there a continuum? Is the pattern symmetric or asymmetric? Are the data bell shaped? Do they seem to have a single center or are there several areas of clumping? These are all aspects of the shape of the distribution of the data.

Two of the most popular measures of shape are skewness and kurtosis. Skewness measures the direction and lack of symmetry. The more skewed a distribution is, the greater the need for using robust estimators, such as the median and the interquartile range. Positive skewness indicates a long-tailedness to the right while negative skewness indicates long-tailedness to the left. Kurtosis measures the heaviness of the tails. A kurtosis value less than three indicates lighter tails than a normal distribution. Kurtosis values greater than three indicate heavier tails than a normal distribution.

The measures of shape require more data to be accurate. For example, a reasonable estimate of the mean may require only ten observations in a random sample. The standard deviation will require at least thirty.
A reasonably detailed estimate of the shape (especially if the tails are important) will require several hundred observations.

Percentiles

Percentiles are extremely useful for certain applications as well as for cases when the distribution is very skewed or contaminated by outliers. If the distribution of the variable is skewed, you might want to use the exact interval estimates for the percentiles.

Confidence Limits or Interval Estimates

An interval estimate of a statistic gives a range of its possible values. Confidence limits are a special type of interval estimate that have, under certain conditions, a level of confidence or probability attached to them.

If the assumption of normality is valid, the confidence intervals for the mean, variance, and standard deviation are valid. However, the standard error of each of these intervals depends on the sample standard deviation and the sample size. If the sample standard deviation is inaccurate, these other measures will be also. The bottom line is that outliers not only affect the standard deviation but also all confidence limits that use the sample standard deviation. It should be obvious then that the standard deviation is a critical measure of dispersion in parametric methods.
Data Screening

Data screening involves missing data, data validity, and outliers. If these issues are not dealt with prior to the use of descriptive statistics, errors in interpretations are very likely.

Missing Data

Whenever data are missing, questions need to be asked.

1. Is the missingness due to incomplete data collection? If so, try to complete the data collection.
2. Is the missingness due to nonresponse from a survey? If so, attempt to collect data from the nonresponders.
3. Are the missing data due to a censoring of data beyond or below certain values? If so, some different statistical tools will be needed.
4. Is the pattern of missingness random? If only a few data points are missing from a large data set and the pattern of missingness is random, there is little to be concerned with. However, if the data set is small or moderate in size, any degree of missingness could cause bias in interpretations.

Whenever missing values occur without answers to the above questions, there is little that can be done. If the distributional shape of the variable is known and there are missing data for certain percentiles, estimates could be made for the missing values. If there are other variables in the data set as well and the pattern of missingness is random, multiple regression and multivariate methods can be used to estimate the missing values.

Data Validity

Data validity needs to be confirmed prior to any statistical analysis, but it usually begins after a univariate descriptive analysis. Extremes or outliers for a variable could be due to a data entry error, to an incorrect or inappropriate specification of a missing code, to sampling from a population other than the intended one, or due to a natural abnormality that exists in this variable from time to time. The first two cases of invalid data are easily corrected. The latter two require information about the distribution form and necessitate the use of regression or multivariate methods to re-estimate the values.
Outliers

Outliers in a univariate data set are defined as observations that appear to be inconsistent with the rest of the data. An outlier is an observation that sticks out at either end of the data set.

The visualization of univariate outliers can be done in three ways: with the stem-and-leaf plot, with the box plot, and with the normal probability plot. In each of these informal methods, the outlier is far removed from the rest of the data. A word of caution: the box plot and the normal probability plot evaluate the potentiality of an outlier assuming the data are normally distributed. If the variable is not normally distributed, these plots may indicate many outliers. You must be careful about checking what distributional assumptions are behind the outliers you may be looking for.

Outliers can completely distort descriptive statistics. For instance, if one suspects outliers, a comparison of the mean, median, mode, and trimmed mean should be made. If the outliers are only to one side of the mean, the median is a better measure of location. On the other hand, if the outliers are equally divergent on each side of the center, the mean and median will be close together, but the standard deviation will be inflated. The interquartile range is the only measure of variation not greatly affected by outliers. Outliers may also contaminate measures of skewness and kurtosis as well as confidence limits.

This discussion has focused on univariate outliers, in a simplistic way. If the data set has several variables, multiple regression and multivariate methods must be used to identify these outliers.
Normality

A primary use of descriptive statistics is to determine whether the data are normally distributed. If the variable is normally distributed, you can use parametric statistics that are based on this assumption. If the variable is not normally distributed, you might try a transformation on the variable (such as the natural log or square root) to make the data normal. If a transformation is not a viable alternative, nonparametric methods that do not require normality should be used.

NCSS provides seven tests to formally test for normality. If a variable fails a normality test, it is critical to look at the box plot and the normal probability plot to see if an outlier or a small subset of outliers has caused the nonnormality. A pragmatic approach is to omit the outliers and rerun the tests to see if the variable now passes the normality tests.

Always remember that a reasonably large sample size is necessary to detect normality. Only extreme types of nonnormality can be detected with samples less than fifty observations.

There is a common misconception that a histogram is always a valid graphical tool for assessing normality. Since there are many subjective choices that must be made in constructing a histogram, and since histograms generally need large sample sizes to display an accurate picture of normality, preference should be given to other graphical displays such as the box plot, the density trace, and the normal probability plot.

Data Structure

The data are contained in a single variable.

Height dataset (subset)

Height

Procedure Options

This section describes the options available in this procedure. To find out more about using a procedure, turn to the Procedures chapter. Following is a list of the procedure's options.

Variables Tab

The options on this panel specify which variables to use.

Data Variables

Variable(s)
Specify a list of one or more variables upon which the univariate statistics are to be generated. You can double-click the field or single click the button on the right of the field to bring up the Variable Selection window.
Frequency Variable

Frequency Variable
This optional variable specifies the number of observations that each row represents. When omitted, each row represents a single observation. If your data is the result of a previous summarization, you may want certain rows to represent several observations. Note that negative values are treated as a zero weight and are omitted. This is one way of weighting your data.

Grouping Variables

Group (1-5) Variable
You can select up to five categorical variables. When one or more of these are specified, a separate set of reports is generated for each unique set of values for these variables.

Data Transformation Options

Exponent
Occasionally, you might want to obtain a statistical report on the square root or square of your variable. This option lets you specify an on-the-fly transformation of the variable. The form of this transformation is $X = Y^A$, where Y is the original value, A is the selected exponent, and X is the value that is summarized.

Additive Constant
Occasionally, you might want to obtain a statistical report on a transformed version of a variable. This option lets you specify an on-the-fly transformation of the variable. The form of this transformation is $X = Y + B$, where Y is the original value, B is the selected value, and X is the value that is summarized. Note that if you apply both the Exponent and the Additive Constant, the form of the transformation is $X = (Y + B)^A$.

Reports Tab

The options on this panel control the format of the report.

Select Reports

Summary Section, Percentile Section
Each of these options indicates whether to display the indicated report.

Alpha Level
The value of alpha for the confidence limits and rejection decisions. Usually, this number will be 0.1 or smaller. The default value of 0.05 results in 95% confidence limits.

Stem and Leaf

Stem Leaf
Specify whether to include the stem and leaf plot.
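The way the frequency variable and the two transformation options combine can be sketched as follows. This is a hedged illustration rather than NCSS code: the function name `transformed_mean` and its parameters are hypothetical, and only the count and mean are computed.

```python
def transformed_mean(values, freqs=None, exponent=1.0, additive=0.0):
    """Return (count, mean) of X = (Y + B) ** A using per-row frequencies.

    Rows whose frequency is zero or negative carry no weight and are
    skipped, mirroring the rule that negative frequencies are treated
    as zero. Names and signature are assumptions made for illustration.
    """
    if freqs is None:
        freqs = [1] * len(values)   # each row is a single observation
    count = 0.0
    weighted_sum = 0.0
    for y, f in zip(values, freqs):
        if f <= 0:
            continue
        x = (y + additive) ** exponent
        count += f
        weighted_sum += f * x
    if count == 0:
        raise ValueError("no rows with positive frequency")
    return count, weighted_sum / count

# Square-root report (A = 0.5, B = 0); the third row has a negative frequency.
count, mean = transformed_mean([4, 9, 16], freqs=[2, 1, -3], exponent=0.5)
print(count, mean)
```

The additive constant is applied before the exponent, matching the combined form X = (Y + B)^A given above.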
Report Options

Precision
Specify the precision of numbers in the report. A single-precision number will show seven-place accuracy, while a double-precision number will show thirteen-place accuracy. Note that the reports were formatted for single precision. If you select double precision, some numbers may run into others. Also note that all calculations are performed in double precision regardless of which option you select here. This is for reporting purposes only.

Value Labels
This option applies to the Group Variable(s). It lets you select whether to display data values, value labels, or both. Use this option if you want the output to automatically attach labels to the values (like 1=Yes, 2=No, etc.). See the section on specifying Value Labels elsewhere in this manual.

Variable Names
This option lets you select whether to display only variable names, variable labels, or both.

Report Options - Decimal Places

Values, Means, Probabilities
Specify the number of decimal places when displaying this item. Select General to display all possible decimal places.

Report Options - Percentiles

Percentile Type
This selects from five methods used to calculate the 100p-th percentile, $Z_p$. The first option, AveXp(n+1), gives the common value of the median. In each method, $X_{[k]}$ is the k-th observation when the data are sorted from lowest to highest. These options are:

AveXp(n+1)
The 100p-th percentile is computed as

$$Z_p = (1-g)\,X_{[k_1]} + g\,X_{[k_2]}$$

where $k_1$ equals the integer part of $p(n+1)$, $k_2 = k_1 + 1$, and $g$ is the fractional part of $p(n+1)$.

AveXp(n)
The 100p-th percentile is computed as

$$Z_p = (1-g)\,X_{[k_1]} + g\,X_{[k_2]}$$

where $k_1$ equals the integer part of $np$, $k_2 = k_1 + 1$, and $g$ is the fractional part of $np$.

Closest to np
The 100p-th percentile is computed as

$$Z_p = X_{[k_1]}$$

where $k_1$ equals the integer that is closest to $np$.
EDF
The 100p-th percentile is computed as

$$Z_p = X_{[k_1]}$$

where $k_1$ equals the integer part of $np$ if $np$ is exactly an integer or the integer part of $np + 1$ if $np$ is not exactly an integer. $X_{[k]}$ is the k-th observation when the data are sorted from lowest to highest. Note that EDF stands for empirical distribution function.

EDF w/ave
The 100p-th percentile is computed as

$$Z_p = \frac{X_{[k_1]} + X_{[k_2]}}{2}$$

where $k_1$ and $k_2$ are defined as follows: if $np$ is an integer, $k_1 = k_2 = np$. If $np$ is not exactly an integer, $k_1$ equals the integer part of $np$ and $k_2 = k_1 + 1$. $X_{[k]}$ is the k-th observation when the data are sorted from lowest to highest.

Smallest Percentile
By default, the smallest percentile displayed is the 1st percentile. This option lets you change this value to any value between 0 and 100. For example, you might enter 2.5 to see the 2.5th percentile.

Largest Percentile
By default, the largest percentile displayed is the 99th percentile. This option lets you change this value to any value between 0 and 100. For example, you might enter 97.5 to see the 97.5th percentile.

Plots Tab

These options specify the plots.

Select Plots

Histogram and Probability Plot
Specify whether to display the indicated plots. Click the plot format button to change the plot settings.
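The five percentile definitions can be compared side by side with a short Python sketch. It follows the definitions as worded in this chapter; the function name `percentile`, the method labels, the clamping of out-of-range ranks to the smallest or largest observation, and the rounding of exact ties upward in the "closest" method are all assumptions made for illustration.

```python
import math

def _split(position):
    """Integer and fractional parts of a rank, absorbing float noise."""
    nearest = round(position)
    if abs(position - nearest) < 1e-9:
        return nearest, 0.0
    whole = math.floor(position)
    return whole, position - whole

def percentile(values, p, method="ave_n_plus_1"):
    """100p-th percentile (0 < p < 1) under one of five definitions."""
    x = sorted(values)
    n = len(x)

    def at(k):
        # 1-based order statistic; clamping the rank is an assumption.
        return x[min(max(k, 1), n) - 1]

    if method == "ave_n_plus_1":
        k1, g = _split(p * (n + 1))
        return (1 - g) * at(k1) + g * at(k1 + 1)
    if method == "ave_n":
        k1, g = _split(p * n)
        return (1 - g) * at(k1) + g * at(k1 + 1)
    if method == "closest":
        return at(math.floor(p * n + 0.5))  # exact ties round up (assumption)
    k1, g = _split(p * n)
    if method == "edf":
        return at(k1 if g == 0.0 else k1 + 1)
    if method == "edf_ave":
        return at(k1) if g == 0.0 else (at(k1) + at(k1 + 1)) / 2
    raise ValueError(f"unknown method: {method}")

data = list(range(1, 10))  # nine hypothetical observations: 1, 2, ..., 9
for name in ("ave_n_plus_1", "ave_n", "closest", "edf", "edf_ave"):
    print(name, percentile(data, 0.5, name))
```

For small samples the methods can disagree noticeably, which is why the reported quartiles and median depend on the percentile type selected.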
Example 1 - Running Descriptive Statistics

This section presents a detailed example of how to run a descriptive statistics report on the Height variable in the Height dataset. To run this example, take the following steps (note that step 1 is not necessary if the Height dataset is open).

You may follow along here by making the appropriate entries or load the completed template Example 1 by clicking on Open Example Template from the File menu of the Descriptive Statistics window.

1 Open the Height dataset.
From the File menu of the NCSS Data window, select Open Example Data.
Click on the file Height.NCSS.
Click Open.

2 Open the Descriptive Statistics window.
Using the Analysis menu or the Procedure Navigator, find and select the Descriptive Statistics procedure.
On the menus, select File, then New Template. This will fill the procedure with the default template.

3 Specify the Height variable.
On the Descriptive Statistics window, select the Variables tab. (This is the default.)
Double-click in the Variables text box. This will bring up the variable selection window.
Select Height from the list of variables and then click Ok. The word Height will appear in the Variables box. Remember that you could have entered a 1 here signifying the first (left-most) variable on the dataset.

4 Run the procedure.
From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

The following reports and charts will be displayed in the Output window.

Descriptive Statistics Report

This report is rather large and complicated, so we will define each section separately. Usually, you will focus on only a few items from this report. Unfortunately, each user wants a different few items, so we had to include much more than any one user needs!

Several of the formulas involve both raw and central moments. The raw moments are defined as:

$$m'_r = \frac{1}{n}\sum_{i=1}^{n} x_i^{\,r}$$

The central moments are defined as:

$$m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^{r}$$
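The raw and central moments can be computed directly from their definitions. The following is a minimal sketch; the function names and the sample values are hypothetical.

```python
def raw_moment(values, r):
    """r-th raw moment: the average of x_i ** r."""
    return sum(x ** r for x in values) / len(values)

def central_moment(values, r):
    """r-th central moment: the average of (x_i - mean) ** r."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5

m1_raw = raw_moment(sample, 1)      # the mean itself
m2 = central_moment(sample, 2)      # divides by n, not n - 1
m2_from_raw = raw_moment(sample, 2) - m1_raw ** 2

print(m1_raw, m2, m2_from_raw)
```

Note that the second central moment divides by n, so it is slightly smaller than the sample variance, which divides by n-1.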
Large sample estimates of the standard errors are provided for several statistics. These are based on the following formulas from Kendall and Stuart (1987):

$$\mathrm{Var}(m_r) = \frac{m_{2r} - m_r^2 + r^2 m_2 m_{r-1}^2 - 2r\,m_{r-1} m_{r+1}}{n}$$

$$\mathrm{Var}(g(x)) = \left(\frac{dg}{dx}\right)^{2} \mathrm{Var}(x)$$

Summary Section

Summary Section of Height
Columns: Count, Mean, Standard Deviation, Standard Error, Minimum, Maximum, Range

Count
This is the number of nonmissing values. If no frequency variable was specified, this is the number of nonmissing rows.

Mean
This is the average of the data values. (See Means Section below.)

Standard Deviation
This is the standard deviation of the data values. (See Variation Section below.)

Standard Error
This is the standard error of the mean. (See Means Section below.)

Minimum
The smallest value in this variable.

Maximum
The largest value in this variable.

Range
The difference between the largest and smallest values for a variable. If the data for a given variable is normally distributed, a quick estimate of the standard deviation can be made by dividing the range by six.

Count Section

Counts Section of Height
Columns: Rows, Sum of Frequencies, Missing Values, Distinct Values, Sum, Total Sum Squares, Adjusted Sum Squares

Rows
This is the total number of rows available in this variable.
Sum of Frequencies
This is the number of nonmissing values. If no frequency variable was specified, this is the number of nonmissing rows.

Missing Values
The number of missing (empty) rows.

Distinct Values
This is the number of unique values in this variable. This value is useful for finding data entry errors and for determining if a variable is continuous or discrete.

Sum
This is the sum of the data values.

Total Sum Squares
This is the sum of the squared values of the variable. It is sometimes referred to as the unadjusted sum of squares. It is reported for its usefulness in calculating other statistics and is not interpreted directly.

$$\text{sum squares} = \sum_{i=1}^{n} x_i^2$$

Adjusted Sum Squares
This is the sum of the squared differences from the mean.

$$\text{adjusted sum squares} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Means Section

Means Section of Height
Columns: Parameter, Mean, Median, Geometric Mean, Harmonic Mean, Sum, Mode
Rows: Value, Std Error, 95% LCL, 95% UCL, T-Value, Prob Level, Count

The geometric mean confidence interval assumes that the ln(y) are normally distributed. The harmonic mean confidence interval assumes that the 1/y are normally distributed.

Mean
This is the average of the data values.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Std Error (Mean)
This is the standard error of the mean. This is the estimated standard deviation for the distribution of sample means for an infinite population.

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

95% LCL and 95% UCL of the Mean
These are the lower and upper values of a 100(1-α)% interval estimate for the mean based on a t distribution with n-1 degrees of freedom. This interval estimate assumes that the population standard deviation is not known and that the data for this variable are normally distributed.

$$\bar{x} \pm t_{\alpha/2,\,n-1}\; s_{\bar{x}}$$

T-Value (Mean)
This is the t-test value for testing that the sample mean is equal to zero versus the alternative that it is not. The degrees of freedom for this t-test are n-1. The variable that is being tested must be approximately normally distributed for this test to be valid.

$$t = \frac{\bar{x}}{s_{\bar{x}}}$$

Prob Level (Mean)
This is the significance level of the above t-test, assuming a two-tailed test. Generally, this p-value is compared to the level of significance, .05 or .01, chosen by the researcher. If the p-value is less than the pre-determined level of significance, the sample mean is judged to be significantly different from zero.

Median
The value of the median. The median is the 50th percentile of the data set. It is the point that splits the data base in half. The value of the percentile depends upon the percentile method that was selected.

95% LCL and 95% UCL of the Median
These are the values of an exact confidence interval for the median. These exact confidence intervals are discussed in the Percentile Section.

Geometric Mean
The geometric mean (GM) is an alternative type of mean that is used for business, economic, and biological applications. Only nonnegative values are used in the computation. If one of the values is zero, the geometric mean is defined to be zero. One example of when the GM is appropriate is when a variable is the product of many small effects combined by multiplication instead of addition.

$$GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

An alternative form, showing the GM's relationship to the arithmetic mean, is:

$$GM = \exp\!\left(\frac{1}{n}\sum_{i=1}^{n} \ln(x_i)\right)$$

Count for Geometric Mean
The number of positive numbers used in computing the geometric mean.
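The logarithmic form of the geometric mean is the one usually preferred in code, because multiplying many values can overflow or underflow. The sketch below follows the rules stated above (negative values are left out, and any zero makes the result zero); the function name and the sample values are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean via the log form; negatives are ignored (see text)."""
    used = [x for x in values if x >= 0]
    if not used:
        raise ValueError("no nonnegative values")
    if any(x == 0 for x in used):
        return 0.0   # defined to be zero when a zero is present
    return math.exp(sum(math.log(x) for x in used) / len(used))

gm = geometric_mean([1, 10, 100])             # cube root of 1000
product_form = (1 * 10 * 100) ** (1 / 3)      # same quantity, product form

print(gm, product_form)
```

The two forms agree up to floating-point rounding, so comparisons between them should use a small tolerance rather than exact equality.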
Harmonic Mean
The harmonic mean is used to average rates. For example, suppose we want the average speed of a bus that travels a fixed distance every day at speeds $s_1$, $s_2$, and $s_3$. The average speed, found by dividing the total distance by the total time, is equal to the harmonic mean of the three speeds. The harmonic mean is appropriate when the distance is constant from trial to trial and the time required was variable. However, if the times were constant and the distances were variable, the arithmetic mean would have been appropriate. Only nonzero values may be used in its calculation.

$$HM = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$$

Count for the Harmonic Mean
The number of nonzero numbers used in computing the harmonic mean.

Sum
This is the sum of the data values. The standard error and confidence limits are found by multiplying the corresponding values for the mean by the sample size, n.

Std Error of Sum
This is the standard deviation of the distribution of sums. With this standard error, confidence intervals and hypothesis testing can be done for the sum. The assumptions for the interval estimate of the mean must also hold here.

$$s_{\text{sum}} = n\, s_{\bar{x}}$$

Mode
This is the most frequently occurring value in the data.

Mode Count
This is a count of the most frequently occurring value, i.e., frequency.

Variation Section

Variation Section of Height
Columns: Parameter, Variance, Standard Deviation, Unbiased Std Dev, Std Error of Mean, Interquartile Range, Range
Rows: Value, Std Error, 95% LCL, 95% UCL

Variance
The sample variance, $s^2$, is a popular measure of dispersion. It is an average of the squared deviations from the mean.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
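The bus example in the Harmonic Mean entry above can be checked numerically. In this sketch the three speeds and the distance are hypothetical numbers chosen so the arithmetic is easy to follow, and `harmonic_mean` is an illustrative helper that leaves out zeros as the text requires.

```python
def harmonic_mean(values):
    """Harmonic mean of the nonzero values."""
    nonzero = [x for x in values if x != 0]
    return len(nonzero) / sum(1 / x for x in nonzero)

speeds = [30.0, 40.0, 60.0]   # hypothetical speeds on three days
distance = 120.0              # the same fixed distance each day

total_time = sum(distance / s for s in speeds)        # 4 + 3 + 2 hours
average_speed = len(speeds) * distance / total_time   # total distance / total time

hm = harmonic_mean(speeds)
arithmetic = sum(speeds) / len(speeds)

print(average_speed, hm, arithmetic)
```

The true average speed matches the harmonic mean and is lower than the arithmetic mean, because more time is spent travelling on the slow days.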
Std Error of Variance
This is a large sample estimate of the standard error of $s^2$ for an infinite population.

LCL of the Variance
This is the lower value of a 100(1-α)% interval estimate for the variance based on the chi-squared distribution with n-1 degrees of freedom. This interval estimate assumes that the variable is normally distributed.

$$LCL = \frac{s^2 (n-1)}{\chi^2_{\alpha/2,\,n-1}}$$

UCL of the Variance
This is the upper value of a 100(1-α)% interval estimate for the variance based on the chi-squared distribution with n-1 degrees of freedom. This interval estimate assumes that the variable is normally distributed.

$$UCL = \frac{s^2 (n-1)}{\chi^2_{1-\alpha/2,\,n-1}}$$

Standard Deviation
The sample standard deviation, s, is a popular measure of dispersion. It measures the average distance between a single observation and its mean. The use of n-1 in the denominator instead of the more natural n is often of concern. It turns out that if n (instead of n-1) were used, a biased estimate of the population variance would result. The use of n-1 corrects for this bias in the variance. Unfortunately, s is inordinately influenced by outliers. For this reason, you must always check for outliers in your data before you use this statistic. Also, even with n-1 in the denominator, s itself is a biased estimator of the population standard deviation. An unbiased estimate, calculated by adjusting s, is given under the heading Unbiased Std Dev.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Another form of the above formula shows that the standard deviation is proportional to the difference between each pair of observations. Notice that the sample mean does not enter into this second formulation.

$$s = \sqrt{\frac{\sum_{i<j} (x_i - x_j)^2}{n(n-1)}}$$

where the sum is over all pairs i, j with i < j.

Std Error of Standard Deviation
This is a large sample estimate of the standard error of s for an infinite population.

LCL of Standard Deviation
This is the lower value of a 100(1-α)% interval estimate for the standard deviation based on the chi-squared distribution with n-1 degrees of freedom. This interval estimate assumes that the variable is normally distributed.

$$LCL = \sqrt{\frac{s^2 (n-1)}{\chi^2_{\alpha/2,\,n-1}}}$$
UCL of Standard Deviation
This is the upper value of a 100(1-α)% interval estimate for the standard deviation based on the chi-squared distribution with n-1 degrees of freedom. This interval estimate assumes that the variable is normally distributed.

$$UCL = \sqrt{\frac{s^2 (n-1)}{\chi^2_{1-\alpha/2,\,n-1}}}$$

Unbiased Std Dev
This is an unbiased estimate of the standard deviation. If the data come from a normal distribution, the sample variance, $s^2$, is an unbiased estimate of the population variance. Unfortunately, the sample standard deviation, s, is a biased estimate of the population standard deviation. This bias is usually overlooked, but division of s by a correction factor, $c_4$, will correct for this bias. This is frequently done in quality control applications. The formula for $c_4$ is:

$$c_4 = \sqrt{\frac{2}{n-1}}\;\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}$$

where

$$\Gamma(n) = \int_0^{\infty} t^{\,n-1} e^{-t}\, dt$$

Std Error of Mean
This is an estimate of the standard error of the mean. This is an estimate of the precision of the sample mean. It, its standard error, and its confidence limits are calculated by dividing the corresponding Standard Deviation values by the square root of n.

Interquartile Range
This is the interquartile range (IQR). It is the difference between the third quartile and the first quartile (between the 75th percentile and the 25th percentile). This represents the range of the middle 50 percent of the distribution. It is a very robust (not affected by outliers) measure of dispersion. In fact, if the data are normally distributed, a robust estimate of the sample standard deviation is IQR/1.35. If a distribution is very concentrated around its mean, the IQR will be small. On the other hand, if the data are widely dispersed, the IQR will be much larger.

Range
The difference between the largest and smallest values for a variable. If the data for a given variable is normally distributed, a quick estimate of the standard deviation can be made by dividing the range by six.

Skewness and Kurtosis Section

Skewness and Kurtosis Section of Height
Columns: Parameter, Skewness, Kurtosis, Fisher's g1, Fisher's g2, Coefficient of Variation, Coefficient of Dispersion
Rows: Value, Std Error
Skewness
This statistic measures the direction and degree of asymmetry. A value of zero indicates a symmetrical distribution. A positive value indicates skewness (long-tailedness) to the right while a negative value indicates skewness to the left. Values between -3 and +3 are typical of samples from a normal distribution. For an alternative measure of skewness, see Fisher's g1, below.

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}}$$

Std Error of Skewness
This is a large sample estimate of the standard error of skewness for an infinite population.

Kurtosis
This statistic measures the heaviness of the tails of a distribution. The usual reference point in kurtosis is the normal distribution. If this kurtosis statistic equals three and the skewness is zero, the distribution matches the normal in these two respects. Unimodal distributions that have kurtosis greater than three have heavier or thicker tails than the normal. These same distributions also tend to have higher peaks in the center of the distribution (leptokurtic). Unimodal distributions whose tails are lighter than the normal distribution tend to have a kurtosis that is less than three. In this case, the peak of the distribution tends to be broader than the normal (platykurtic). Be forewarned that this statistic is an unreliable estimator of kurtosis for small sample sizes. For an alternative measure of kurtosis, see Fisher's g2, below.

$$b_2 = \frac{m_4}{m_2^{2}}$$

Std Error of Kurtosis
This is a large sample estimate of the standard error of kurtosis for an infinite population.

Fisher's g1
Fisher's g1 measure is an alternative measure of skewness.

$$g_1 = \frac{\sqrt{n(n-1)}}{n-2}\,\sqrt{b_1}$$

Fisher's g2
Fisher's g2 measure is an alternative measure of kurtosis.

$$g_2 = \frac{(n+1)(n-1)}{(n-2)(n-3)}\left[b_2 - \frac{3(n-1)}{n+1}\right]$$

Coefficient of Variation
The coefficient of variation is a relative measure of dispersion. It is most often used to compare the amount of variation in two samples. It can be used for the same data over two time periods or for the same time period but two different places.
It is the standard deviation divided by the mean:

$$cv = \frac{s}{\bar{x}}$$

Std Error of Coefficient of Variation
This is a large sample estimate of the standard error of the estimated coefficient of variation.
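The moment-based shape statistics and Fisher's adjusted versions can be computed together. The sketch below is a plain transcription of the skewness, kurtosis, g1, and g2 formulas of this section; the function name `shape_statistics` and the sample are hypothetical, and at least four observations with some variation are assumed.

```python
import math

def shape_statistics(values):
    """Return (sqrt_b1, b2, g1, g2) from the central moments m2, m3, m4."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    sqrt_b1 = m3 / m2 ** 1.5                      # skewness
    b2 = m4 / m2 ** 2                             # kurtosis (normal = 3)
    g1 = math.sqrt(n * (n - 1)) / (n - 2) * sqrt_b1
    g2 = ((n + 1) * (n - 1) / ((n - 2) * (n - 3))
          * (b2 - 3 * (n - 1) / (n + 1)))
    return sqrt_b1, b2, g1, g2

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample
right_tail = [1, 1, 1, 1, 10]      # hypothetical sample with a long right tail

print(shape_statistics(symmetric))
print(shape_statistics(right_tail))
```

Because g2 is centered so that the normal distribution gives a value near zero rather than three, the two kurtosis columns of the report are not directly comparable.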
Coefficient of Dispersion
The coefficient of dispersion is a robust, relative measure of dispersion. It is frequently used in real estate or tax assessment applications.

$$COD = \frac{1}{\text{median}}\left(\frac{\sum_{i=1}^{n} |x_i - \text{median}|}{n}\right)$$

Trimmed Section

Trimmed Section of Height
Columns: Parameter, 5% Trimmed, 10% Trimmed, 15% Trimmed, 25% Trimmed, 35% Trimmed, 45% Trimmed
Rows: Trim-Mean, Trim-Std Dev, Count

% Trimmed
We call 100g the trimming percentage, the percent of data that is trimmed from each side of the sorted data. Thus, if g = 5%, for a sample size of 200, 10 observations are ignored from each side of the sorted array of data values. Note that our formulation allows fractional data values. Different trimming percentages are available, but 5% and 10% are the most common in practice. In the formulas below, the trimming proportion is written α, and g denotes the integer part of nα.

Trim-Mean
These are the alpha-trimmed means discussed by Hoaglin (1983, page 311). These are useful for quickly assessing the impact of outliers. You would like to see stability in these trimmed means after a small degree of trimming. The formula for the trimmed mean for 100α% trimming is

$$\bar{x}_{\alpha} = \frac{1}{n(1-2\alpha)}\left[(1-r)\left(X_{(g+1)} + X_{(n-g)}\right) + \sum_{i=g+2}^{n-g-1} X_{(i)}\right]$$

where $g = [\alpha n]$ and $r = \alpha n - g$.

Trim-Std Dev
This is the standard deviation of the observations that remain after the trimming. It can be used to evaluate changes in the standard deviation for different degrees of trimming. The formula for the trimmed standard deviation for 100α% trimming is the standard formula for a weighted average using the weights given below.

$$a_i = \begin{cases} 0 & \text{if } i \le g \text{ or } i \ge n-g+1 \\[4pt] \dfrac{1-r}{n-2\alpha n} & \text{if } i = g+1 \text{ or } i = n-g \\[8pt] \dfrac{1}{n-2\alpha n} & \text{if } g+2 \le i \le n-g-1 \end{cases}$$

Count
This is the number of observations remaining after the trimming operation. Note that this may be a fractional amount under alpha-trimming.
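The alpha-trimmed mean with fractional trimming can be sketched as follows. The code transcribes the Trim-Mean formula of this section (g is the integer part of αn, and r is the leftover fraction that partially down-weights the two boundary observations); the function name, the small tolerance used when taking the integer part, and the sample values are assumptions made for illustration.

```python
import math

def alpha_trimmed_mean(values, alpha):
    """Trimmed mean for 100*alpha% trimming per side (0 <= alpha < 0.5)."""
    x = sorted(values)
    n = len(x)
    g = math.floor(alpha * n + 1e-12)   # integer part, guarded against float noise
    r = max(alpha * n - g, 0.0)         # fractional part still to be trimmed
    # X(g+1) and X(n-g) are x[g] and x[n-g-1] in 0-based indexing.
    ends = (1 - r) * (x[g] + x[n - g - 1])
    middle = sum(x[g + 1:n - g - 1])    # X(g+2) through X(n-g-1)
    return (ends + middle) / (n * (1 - 2 * alpha))

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]    # hypothetical data with one outlier

plain_mean = sum(sample) / len(sample)
trim_10 = alpha_trimmed_mean(sample, 0.10)   # drops exactly one value per side
trim_25 = alpha_trimmed_mean(sample, 0.25)   # drops 2.5 values per side

print(plain_mean, trim_10, trim_25)
```

With α = 0.25 and n = 10, the effective count n(1-2α) is 5 even though six observations contribute, which is why the Count row of the report can be fractional.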
Mean-Deviation Section

Mean-Deviation Section of Height
Columns: Parameter, |X-Mean|, |X-Median|, (X-Mean)^2, (X-Mean)^3, (X-Mean)^4
Rows: Average, Std Error

Average of |X-Mean|
This is a measure of dispersion, called the mean deviation or the mean absolute deviation. It is not affected by outliers as much as the standard deviation, since the differences from the mean are not squared. If the distribution for the variable of interest is normal, the mean deviation is approximately equal to 0.8 standard deviations.

$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

Std Error of |X-Mean|
This is an estimate of the standard error of the mean deviation.

$$SE_{MAD} = \sqrt{\frac{2 s^2 (n-1)}{\pi n^2}\left[\frac{\pi}{2} + \sqrt{n(n-2)} - n + \arcsin\!\left(\frac{1}{n-1}\right)\right]}$$

Average of |X-Median|
This is an alternate formulation of the mean deviation above that is more robust to outliers since the median is used as the center point of the distribution.

$$MAD_{Robust} = \frac{\sum_{i=1}^{n} |x_i - \text{median}|}{n}$$

Average of (X-Mean)^2
This is the second moment about the mean, $m_2$.

Std Error of (X-Mean)^2
This is the estimated standard deviation of the second moment.

Average of (X-Mean)^3
This is the third moment about the mean, $m_3$.

Std Error of (X-Mean)^3
This is the estimated standard deviation of the third moment.

Average of (X-Mean)^4
This is the fourth moment about the mean, $m_4$.

Std Error of (X-Mean)^4
This is the estimated standard deviation of the fourth moment.
Quartile Section

This gives the value of the j-th percentile. Of course, the 25th percentile is called the first (lower) quartile, the 50th percentile is the median, and the 75th percentile is called the third (upper) quartile.

Quartile Section of Height
Columns: Parameter, 10th Percentile, 25th Percentile, 50th Percentile, 75th Percentile, 90th Percentile
Rows: Value, 95% LCL, 95% UCL

Value
These are the values of the specified percentiles. Note that the definition of a percentile depends on the type of percentile that was specified.

95% LCL and 95% UCL
These give an exact, 100(1-α)% confidence interval for the population percentile. This confidence interval does not assume normality. Instead, it only assumes a random sample of n items from a continuous distribution. The interval is based on the equation:

$$1 - \alpha = I_p(r,\, n-r+1) - I_p(n-r+1,\, r)$$

Here $I_p(a,b)$ is the integral of the incomplete beta function:

$$I_q(n-r+1,\, r) = \sum_{k=0}^{r-1} \binom{n}{k} p^k (1-p)^{n-k}$$

and $q = 1-p$ and $I_p(a,b) = 1 - I_{1-p}(b,a)$.

Normality Test Section

Normality Test Section of Height
Columns: Test Name, Test Value, Prob Level, 10% Critical Value, 5% Critical Value, Decision (5%)
Shapiro-Wilk W: Can't reject normality
Anderson-Darling: Can't reject normality
Martinez-Iglewicz: Can't reject normality
Kolmogorov-Smirnov: Can't reject normality
D'Agostino Skewness: Can't reject normality
D'Agostino Kurtosis: Can't reject normality
D'Agostino Omnibus: Can't reject normality

Normality Tests
This section displays the results of seven tests of the hypothesis that the data come from the normal distribution. The Shapiro-Wilk and Anderson-Darling tests are usually considered as the best. The Kolmogorov-Smirnov test is included because of its historical popularity, but is bettered in almost every way by the other tests. Unfortunately, these tests have small statistical power (probability of detecting nonnormal data) unless the sample sizes are large, say over 100. Hence, if the decision is to reject, you can be reasonably certain that the data are not normal. However, if the decision is to accept, the situation is not as clear.
If you have a sample size of 100 or more, you can reasonably assume that the actual distribution is closely approximated by the normal distribution. If your sample size is less than 100, all you know is that there was not enough evidence in your data to reject the normality assumption. In other words, the data might be nonnormal; you just could not prove it. In this case, you must rely on the graphics and past experience to justify the normality assumption.
Shapiro-Wilk W Test
This test for normality has been found to be the most powerful test in most situations. It is the ratio of two estimates of the variance of a normal distribution based on a random sample of n observations. The numerator is proportional to the square of the best linear estimator of the standard deviation. The denominator is the sum of squares of the observations about the sample mean. The test statistic W may be written as the square of the Pearson correlation coefficient between the ordered observations and a set of weights which are used to calculate the numerator. Since these weights are asymptotically proportional to the corresponding expected normal order statistics, W is roughly a measure of the straightness of the normal quantile-quantile plot. Hence, the closer W is to one, the more normal the sample is.

The test was developed by Shapiro and Wilk (1965) for samples up to 20. NCSS uses the approximations suggested by Royston (1992) and Royston (1995), which allow unlimited sample sizes. Note that Royston only checked the results for sample sizes up to 5000, but indicated that he saw no reason larger sample sizes should not work. The probability values for W are valid for samples greater than 3. W may not be as powerful as other tests when ties occur in your data. The test is not calculated when a frequency variable is specified.

Anderson-Darling Test
This test, developed by Anderson and Darling (1954), is the most popular normality test that is based on EDF statistics. In some situations, it has been found to be as powerful as the Shapiro-Wilk test. The test is not calculated when a frequency variable is specified.

Martinez-Iglewicz
This test for normality, developed by Martinez and Iglewicz (1981), is based on the median and a robust estimator of dispersion. They have shown that this test is very powerful for heavy-tailed symmetric distributions as well as a variety of other situations. A value of the test statistic that is close to one indicates that the distribution is normal.
This test is recommended for exploratory data analysis by Hoaglin (1983). The formula for this test is:

$$I = \frac{\sum_{i=1}^{n} (x_i - \tilde{x})^2}{(n-1)\, s_{bi}^2}$$

where $\tilde{x}$ is the sample median and $s_{bi}^2$ is a biweight estimator of scale.

Martinez-Iglewicz (10% Critical and 5% Critical)
The 10% and 5% critical values are given here. If the value of the test statistic is greater than this value, reject normality at that level of significance.

Martinez-Iglewicz Decision (5%)
This reports the outcome of this test at the 5% significance level.

Kolmogorov-Smirnov
This test for normality is based on the maximum difference between the observed distribution and the expected cumulative-normal distribution. Since it uses the sample mean and standard deviation to calculate the expected normal distribution, the Lilliefors adjustment is used. The smaller the maximum difference, the more likely that the distribution is normal. This test has been shown to be less powerful than the other tests in most situations. It is included because of its historical popularity.
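The Kolmogorov-Smirnov statistic described above is just the largest gap between the empirical distribution and a normal distribution fitted with the sample mean and standard deviation. The following is a minimal sketch using only the Python standard library; the function name `ks_normal_statistic` is hypothetical, and the Lilliefors critical values from Dallal (1986) are not reproduced here, so the code returns the statistic only.

```python
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(x):
    """Maximum distance between the empirical CDF and the fitted normal CDF."""
    xs = sorted(x)
    n = len(xs)
    # Expected normal distribution uses the sample mean and standard deviation.
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = fitted.cdf(v)
        # The empirical CDF jumps from (i - 1)/n to i/n at each ordered value,
        # so both sides of the jump are compared with the fitted CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


sample = [1, 2, 3, 4, 5]  # hypothetical data
print(ks_normal_statistic(sample))
```

A smaller value means the sample tracks the fitted normal curve more closely; the decision itself still requires the Lilliefors-adjusted critical values.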
Kolmogorov-Smirnov (10% Critical and 5% Critical)
The 10% and 5% critical values are given here. These are the Lilliefors adjusted values as given by Dallal (1986). If the value of the test statistic is greater than the critical value, normality is rejected at that level of significance.

Kolmogorov-Smirnov Decision (5%)
This reports the outcome of this test at the 5% significance level.

D'Agostino Skewness
D'Agostino (1990) describes a normality test based on the skewness coefficient, $\sqrt{b_1}$. Recall that because the normal distribution is symmetrical, $\sqrt{b_1}$ is equal to zero for normal data. Hence, a test can be developed to determine if the value of $\sqrt{b_1}$ is significantly different from zero. If it is, the data are obviously nonnormal. The statistic, $z_s$, is, under the null hypothesis of normality, approximately normally distributed. The computation of this statistic, which is restricted to sample sizes n > 8, is

$$z_s = d \ln\!\left( \frac{T}{a} + \sqrt{ \left( \frac{T}{a} \right)^2 + 1 } \right)$$

where

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}}$$

$$T = \sqrt{b_1} \sqrt{ \frac{(n+1)(n+3)}{6(n-2)} }$$

$$C = \frac{3 (n^2 + 27n - 70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)}$$

$$W^2 = -1 + \sqrt{2(C-1)}$$

$$a = \sqrt{ \frac{2}{W^2 - 1} }$$

$$d = \frac{1}{\sqrt{\ln(W)}}$$

Skewness Test (Prob Level)
This is the two-tail significance level for this test. Reject the null hypothesis of normality if this value is less than a pre-determined value, say 0.05.

Skewness Test Decision (5%)
This reports the outcome of this test at the 5% significance level.

D'Agostino Kurtosis
D'Agostino (1990) describes a normality test based on the kurtosis coefficient, $b_2$. Recall that for the normal distribution, the theoretical value of $b_2$ is 3. Hence, a test can be developed to determine if the value of $b_2$ is significantly different from 3. If it is, the data are obviously nonnormal. The statistic, $z_k$, is, under the null hypothesis of normality, approximately normally distributed for sample sizes n > 20. The calculation of this test proceeds as follows:
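The skewness test can be traced step by step in code. The sketch below assumes the standard D'Agostino (1990) transformation, with T, C, W^2, a, and d defined as in the skewness formulas; the function name `dagostino_skewness_test` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist


def dagostino_skewness_test(x):
    """Return (sqrt_b1, z_s, two_tail_prob) for the skewness test."""
    n = len(x)
    if n <= 8:
        raise ValueError("the approximation is restricted to n > 8")
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m3 = sum((v - xbar) ** 3 for v in x) / n
    sqrt_b1 = m3 / m2 ** 1.5

    t = sqrt_b1 * math.sqrt((n + 1) * (n + 3) / (6 * (n - 2)))
    c = (3 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
         / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1 + math.sqrt(2 * (c - 1))
    a = math.sqrt(2 / (w2 - 1))
    d = 1 / math.sqrt(math.log(math.sqrt(w2)))  # ln(W), with W = sqrt(W^2)

    z_s = d * math.log(t / a + math.sqrt((t / a) ** 2 + 1))
    prob = 2 * (1 - NormalDist().cdf(abs(z_s)))  # two-tail level
    return sqrt_b1, z_s, prob


skewed = [1, 1, 1, 2, 2, 2, 3, 3, 4, 10]  # hypothetical right-skewed data
print(dagostino_skewness_test(skewed))
```

A perfectly symmetric sample gives a statistic of zero and a probability level of one, while a long right tail pushes the statistic positive.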
$$z_k = \frac{ \left( 1 - \dfrac{2}{9A} \right) - \left[ \dfrac{1 - \dfrac{2}{A}}{1 + G \sqrt{\dfrac{2}{A-4}}} \right]^{1/3} }{ \sqrt{\dfrac{2}{9A}} }$$

where

$$b_2 = \frac{m_4}{m_2^2}$$

$$G = \frac{ b_2 - \dfrac{3(n-1)}{n+1} }{ \sqrt{ \dfrac{24 n (n-2)(n-3)}{(n+1)^2 (n+3)(n+5)} } }$$

$$E = \frac{6 (n^2 - 5n + 2)}{(n+7)(n+9)} \sqrt{ \frac{6 (n+3)(n+5)}{n (n-2)(n-3)} }$$

$$A = 6 + \frac{8}{E} \left[ \frac{2}{E} + \sqrt{ 1 + \frac{4}{E^2} } \right]$$

Prob Level of Kurtosis Test
This is the two-tail significance level for this test. Reject the null hypothesis of normality if this value is less than a pre-determined value, say 0.05.

Decision of Kurtosis Test
This reports the outcome of this test at the 5% significance level.

D'Agostino Omnibus
D'Agostino (1990) describes a normality test that combines the tests for skewness and kurtosis. The statistic, $K^2$, is approximately distributed as a chi-square with two degrees of freedom. After calculating $z_s$ and $z_k$, calculate $K^2$ as follows:

$$K^2 = z_s^2 + z_k^2$$

Prob Level D'Agostino Omnibus
This is the significance level for this test. Reject the null hypothesis of normality if this value is less than a pre-determined value, say 0.05.

Decision of D'Agostino Omnibus Test
This reports the outcome of this test at the 5% significance level.
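The kurtosis and omnibus calculations can be sketched the same way. This assumes the standard D'Agostino (1990) expressions for G, E, and A; the function names are hypothetical. The omnibus probability level uses the fact that the upper tail of a chi-square distribution with two degrees of freedom is exp(-K^2/2), so no statistical library is needed.

```python
import math


def dagostino_kurtosis_z(x):
    """Return (b2, z_k) for the kurtosis test; intended for n > 20."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m4 = sum((v - xbar) ** 4 for v in x) / n
    b2 = m4 / m2 ** 2

    g = ((b2 - 3 * (n - 1) / (n + 1))
         / math.sqrt(24 * n * (n - 2) * (n - 3)
                     / ((n + 1) ** 2 * (n + 3) * (n + 5))))
    e = (6 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
         * math.sqrt(6 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6 + (8 / e) * (2 / e + math.sqrt(1 + 4 / e ** 2))

    ratio = (1 - 2 / a) / (1 + g * math.sqrt(2 / (a - 4)))
    # Real cube root that also works when the ratio is negative.
    cube_root = math.copysign(abs(ratio) ** (1 / 3), ratio)
    z_k = ((1 - 2 / (9 * a)) - cube_root) / math.sqrt(2 / (9 * a))
    return b2, z_k


def omnibus_k2(z_s, z_k):
    """Return (K2, prob_level); K2 is referred to chi-square with 2 df."""
    k2 = z_s ** 2 + z_k ** 2
    return k2, math.exp(-k2 / 2)


flat = list(range(1, 31))  # hypothetical flat (short-tailed) sample, n = 30
b2, z_k = dagostino_kurtosis_z(flat)
print(b2, z_k, omnibus_k2(0.0, z_k))
```

A flat, short-tailed sample has a kurtosis coefficient well below 3, so its statistic comes out negative; heavy tails push it positive.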
Histogram Plot
The following plot shows a histogram of the data.

Histogram
The histogram is a traditional way of displaying the shape of a group of data. It is constructed from a frequency distribution, where choices on the number of bins and bin width have been made. These choices can drastically affect the shape of the histogram. The ideal shape to look for in the case of normality is a bell-shaped distribution.

Normal Probability Plot
This is a plot of the inverse of the standard normal cumulative versus the ordered observations. If the underlying distribution of the data is normal, the points will fall along a straight line. Deviations from this line correspond to various types of nonnormality. Stragglers at either end of the normal probability plot indicate outliers. Curvature at both ends of the plot indicates long or short distribution tails. Convex, or concave, curvature indicates a lack of symmetry. Gaps, plateaus, or segmentation in the plot indicate certain phenomena that need closer scrutiny.

Confidence bands serve as a visual reference for departures from normality. If any of the observations fall outside the confidence bands, the data are not normal. The numerical normality tests will usually confirm this fact statistically. If only one observation falls outside the confidence limits, it may be an outlier. Note that these confidence bands are based on large sample formulas. They may not be accurate for small samples (less than 30).
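The coordinates of a normal probability plot can be generated by pairing each ordered observation with an inverse standard normal value. The sketch below is an illustration only: the plotting position (i - 0.375)/(n + 0.25) is an assumed, commonly used choice and is not taken from the text, and the function name `normal_probability_points` is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(x):
    """Pair expected standard normal quantiles with the ordered observations."""
    xs = sorted(x)
    n = len(xs)
    std_normal = NormalDist()
    points = []
    for i, v in enumerate(xs, start=1):
        # Assumed plotting position; other conventions exist.
        prob = (i - 0.375) / (n + 0.25)
        points.append((std_normal.inv_cdf(prob), v))
    return points


for z, value in normal_probability_points([64, 71, 58, 66, 69]):
    print(f"{z:7.3f}  {value}")
```

If the plotted pairs fall close to a straight line, the sample is consistent with a normal distribution; curvature or stragglers point to the departures listed above.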
Percentile Section

Percentile Section of Height
Columns: Percentile, Value, 95% LCL, 95% UCL, Exact Conf. Level
Percentile Formula: Ave X(p[n+1])

This section gives a larger set of percentiles than was included in the Quartile Section. Use it when you need a less common percentile.

Percentile
This is the percentage amount that you want the percentile of.

Value
This gives the value of the pth percentile. Note that the percentile method used is listed at the bottom of the report.

95% LCL and 95% UCL
These give an exact, 100(1-α)% confidence interval for the population percentile. This confidence interval does not assume normality. Instead, it only assumes a random sample of n items from a continuous distribution. The interval is based on the equation:

$$1 - \alpha = I_p(r,\, n-r+1) - I_p(n-r+1,\, r)$$

Here $I_p(a,b)$ is the integral of the incomplete beta function:

$$I_q(n-r+1,\, r) = \sum_{k=0}^{r-1} \binom{n}{k} p^k (1-p)^{n-k}$$

where $q = 1-p$ and $I_p(a,b) = 1 - I_{1-p}(b,a)$.

Exact Conf. Level
Because of the discrete nature of the confidence interval constructed above, NCSS finds an interval whose actual alpha is less than the specified alpha level. This column gives the actual confidence coefficient of the interval.
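The exact interval and its "Exact Conf. Level" can be illustrated with binomial sums, since the incomplete beta terms in the equation above reduce to binomial tail probabilities. The sketch below is an assumption-laden illustration, not the NCSS search procedure: the function names are hypothetical, and it only considers the symmetric pair of order statistics X(r) and X(n-r+1) that appears in the equation.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def coverage(n, p, r, s):
    """Exact probability that the p-th quantile lies between X(r) and X(s)."""
    return binom_cdf(s - 1, n, p) - binom_cdf(r - 1, n, p)


def symmetric_interval(n, p, conf=0.95):
    """Largest r whose pair (r, n - r + 1) still covers with probability >= conf."""
    best = None
    for r in range(1, n // 2 + 1):
        c = coverage(n, p, r, n - r + 1)
        if c >= conf:
            best = (r, n - r + 1, c)
    return best


# Median (p = 0.5) of a sample of 10: order statistics 2 and 9 are selected.
print(symmetric_interval(10, 0.5))  # -> (2, 9, 0.978515625)
```

The result shows why an exact level is reported: with ten observations, no pair of order statistics gives exactly 95% coverage for the median, so the interval actually achieved has a confidence coefficient of about 97.9%.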
Stem-and-Leaf Plot Section

Stem-and-Leaf Plot Section of Height
Columns: Depth, Stem, Leaves
Unit = 1   Example: 1 |2 Represents 12

The stem-leaf plot is a type of histogram which retains much of the identity of the original data. It is useful for finding data-entry errors as well as for studying the distribution of a variable.

Depth
This is the cumulative number of leaves, counting in from the nearest end.

Stem
The stem is the first digit of the actual number. For example, the stem of the number 523 is 5. This is modified appropriately if the batch contains numbers of different orders of magnitude. The largest order of magnitude is used in determining the stem. Depending upon the number of leaves, a stem may be divided into two or more sub-stems. A special set of symbols is then used to mark the stems. The star (*) represents numbers in the range of zero to four, while the period (.) represents numbers in the range of five to nine.

Leaf
The leaf is the second digit of the actual number. For example, the leaf of the number 523 is 2. This is modified appropriately if the batch contains numbers of different orders of magnitude. The largest order of magnitude is used in determining the leaf.

Unit
This line at the bottom indicates how the data were scaled to make the plot.
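The stem and leaf split can be shown with a small sketch. It is deliberately simplified and is not the NCSS algorithm: it assumes non-negative values, treats every digit above the leaf as the stem, omits the Depth column and the sub-stem symbols, and skips stems that have no leaves. The function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, unit=1):
    """Group non-negative values into {stem: [leaves]} with leaves in order."""
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // unit)  # drop digits smaller than the leaf unit
        rows[scaled // 10].append(scaled % 10)
    return dict(rows)


plot = stem_and_leaf([53, 51, 64, 67, 60, 72])
for stem in sorted(plot):
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in plot[stem])}")
```

With a unit of 1, the value 53 lands on stem 5 with leaf 3, so each printed row can be read back as the original two-digit numbers.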
NCSS Statistical Software. Tolerance Intervals
Chapter 585 Itroductio This procedure calculates oe-, ad two-, sided tolerace itervals based o either a distributio-free (oparametric) method or a method based o a ormality assumptio (parametric). A two-sided
More informationChapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers
Chapter 4 4-1 orth Seattle Commuity College BUS10 Busiess Statistics Chapter 4 Descriptive Statistics Summary Defiitios Cetral tedecy: The extet to which the data values group aroud a cetral value. Variatio:
More informationChapter 2 Descriptive Statistics
Chapter 2 Descriptive Statistics Statistics Most commoly, statistics refers to umerical data. Statistics may also refer to the process of collectig, orgaizig, presetig, aalyzig ad iterpretig umerical data
More informationt distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference
EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More informationMEASURES OF DISPERSION (VARIABILITY)
POLI 300 Hadout #7 N. R. Miller MEASURES OF DISPERSION (VARIABILITY) While measures of cetral tedecy idicate what value of a variable is (i oe sese or other, e.g., mode, media, mea), average or cetral
More informationAnna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2
Aa Jaicka Mathematical Statistics 18/19 Lecture 1, Parts 1 & 1. Descriptive Statistics By the term descriptive statistics we will mea the tools used for quatitative descriptio of the properties of a sample
More informationENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics. None at all!
ENGI 44 Probability ad Statistics Faculty of Egieerig ad Applied Sciece Problem Set Solutios Descriptive Statistics. If, i the set of values {,, 3, 4, 5, 6, 7 } a error causes the value 5 to be replaced
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationMedian and IQR The median is the value which divides the ordered data values in half.
STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationCHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements.
CHAPTER 2 umerical Measures Graphical method may ot always be sufficiet for describig data. You ca use the data to calculate a set of umbers that will covey a good metal picture of the frequecy distributio.
More informationBinomial Distribution
0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals
More informationA goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality
A goodess-of-fit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State,
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationChapter 23: Inferences About Means
Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For
More informationContinuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised
Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for
More informationEconomics 250 Assignment 1 Suggested Answers. 1. We have the following data set on the lengths (in minutes) of a sample of long-distance phone calls
Ecoomics 250 Assigmet 1 Suggested Aswers 1. We have the followig data set o the legths (i miutes) of a sample of log-distace phoe calls 1 20 10 20 13 23 3 7 18 7 4 5 15 7 29 10 18 10 10 23 4 12 8 6 (1)
More informationµ and π p i.e. Point Estimation x And, more generally, the population proportion is approximately equal to a sample proportion
Poit Estimatio Poit estimatio is the rather simplistic (ad obvious) process of usig the kow value of a sample statistic as a approximatio to the ukow value of a populatio parameter. So we could for example
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More informationNumber of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day
LECTURE # 8 Mea Deviatio, Stadard Deviatio ad Variace & Coefficiet of variatio Mea Deviatio Stadard Deviatio ad Variace Coefficiet of variatio First, we will discuss it for the case of raw data, ad the
More informationSample Size Determination (Two or More Samples)
Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie
More information7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals
7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationBecause it tests for differences between multiple pairs of means in one test, it is called an omnibus test.
Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal
More informationACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics
ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER 1 018/019 DR. ANTHONY BROWN 8. Statistics 8.1. Measures of Cetre: Mea, Media ad Mode. If we have a series of umbers the
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationOverview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions
Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples
More informationExpectation and Variance of a random variable
Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationElementary Statistics
Elemetary Statistics M. Ghamsary, Ph.D. Sprig 004 Chap 0 Descriptive Statistics Raw Data: Whe data are collected i origial form, they are called raw data. The followig are the scores o the first test of
More informationData Description. Measure of Central Tendency. Data Description. Chapter x i
Data Descriptio Describe Distributio with Numbers Example: Birth weights (i lb) of 5 babies bor from two groups of wome uder differet care programs. Group : 7, 6, 8, 7, 7 Group : 3, 4, 8, 9, Chapter 3
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationSummarizing Data. Major Properties of Numerical Data
Summarizig Data Daiel A. Meascé, Ph.D. Dept of Computer Sciece George Maso Uiversity Major Properties of Numerical Data Cetral Tedecy: arithmetic mea, geometric mea, media, mode. Variability: rage, iterquartile
More informationRead through these prior to coming to the test and follow them when you take your test.
Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationChapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo
More informationLecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying.
Lecture Mai Topics: Defiitios: Statistics, Populatio, Sample, Radom Sample, Statistical Iferece Type of Data Scales of Measuremet Describig Data with Numbers Describig Data Graphically. Defiitios. Example
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationSTA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:
STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio
More informationREGRESSION (Physics 1210 Notes, Partial Modified Appendix A)
REGRESSION (Physics 0 Notes, Partial Modified Appedix A) HOW TO PERFORM A LINEAR REGRESSION Cosider the followig data poits ad their graph (Table I ad Figure ): X Y 0 3 5 3 7 4 9 5 Table : Example Data
More informationGG313 GEOLOGICAL DATA ANALYSIS
GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationFinal Examination Solutions 17/6/2010
The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationComparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading
Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual
More informationData Analysis and Statistical Methods Statistics 651
Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio
More informationStatisticians use the word population to refer the total number of (potential) observations under consideration
6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More informationII. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation
II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio
More informationStatistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
More informationChapter 6. Sampling and Estimation
Samplig ad Estimatio - 34 Chapter 6. Samplig ad Estimatio 6.. Itroductio Frequetly the egieer is uable to completely characterize the etire populatio. She/he must be satisfied with examiig some subset
More informationProbability and statistics: basic terms
Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample
More informationRecall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.
Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed
More informationAP Statistics Review Ch. 8
AP Statistics Review Ch. 8 Name 1. Each figure below displays the samplig distributio of a statistic used to estimate a parameter. The true value of the populatio parameter is marked o each samplig distributio.
More informationANALYSIS OF EXPERIMENTAL ERRORS
ANALYSIS OF EXPERIMENTAL ERRORS All physical measuremets ecoutered i the verificatio of physics theories ad cocepts are subject to ucertaities that deped o the measurig istrumets used ad the coditios uder
More informationTests of Hypotheses Based on a Single Sample (Devore Chapter Eight)
Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notatio Math 113 - Itroductio to Applied Statistics Name : Use Word or WordPerfect to recreate the followig documets. Each article is worth 10 poits ad ca be prited ad give to the istructor
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationComputing Confidence Intervals for Sample Data
Computig Cofidece Itervals for Sample Data Topics Use of Statistics Sources of errors Accuracy, precisio, resolutio A mathematical model of errors Cofidece itervals For meas For variaces For proportios
More informationActivity 3: Length Measurements with the Four-Sided Meter Stick
Activity 3: Legth Measuremets with the Four-Sided Meter Stick OBJECTIVE: The purpose of this experimet is to study errors ad the propagatio of errors whe experimetal data derived usig a four-sided meter
More informationError & Uncertainty. Error. More on errors. Uncertainty. Page # The error is the difference between a TRUE value, x, and a MEASURED value, x i :
Error Error & Ucertaity The error is the differece betwee a TRUE value,, ad a MEASURED value, i : E = i There is o error-free measuremet. The sigificace of a measuremet caot be judged uless the associate
More informationGoodness-Of-Fit For The Generalized Exponential Distribution. Abstract
Goodess-Of-Fit For The Geeralized Expoetial Distributio By Amal S. Hassa stitute of Statistical Studies & Research Cairo Uiversity Abstract Recetly a ew distributio called geeralized expoetial or expoetiated
More informationChapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more
More information6 Sample Size Calculations
6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence 1, 1, 2, 2, 3, 3,... is a function from the positive integers when we write the first sequence element
Access to the published version may require journal subscription. Published with permission from: Elsevier.
This is an author produced version of a paper published in Statistics and Probability Letters. This paper has been peer-reviewed, it does not include the journal pagination. Citation for the published paper: Forkman,
1 Review of Probability & Statistics
1 Review of Probability & Statistics a. In a group of 000 people, it has been reported that there are: 61 smokers 670 over 5 960 people who imbibe (drink alcohol) 86 smokers who imbibe 90 imbibers over 5
BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9
BIOS 4110: Introduction to Biostatistics Breheny Lab #9 The Central Limit Theorem is very important in the realm of statistics, and today's lab will explore the application of it in both categorical and continuous
Chapter 10: Power Series
Chapter 10: Power Series Chapter Overview: Power Series The reason series are part of a Calculus course is that there are functions which cannot be integrated. All power series, though, can be integrated because
Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008
Chapter 6 Part 5 Confidence Intervals t distribution chi square distribution October 23, 2008 There will be no help session on Monday, October 27. Goal: To clearly understand the link between probability and confidence
Measures of Variation
Chapter : Measures of Variation from Statistical Analysis in the Behavioral Sciences by James Raymondo Second Edition 97814669676 01 Copyright Property of Kendall Hunt Publishing CHAPTER Measures of Variation Key
32 estimating the cumulative distribution function
32 estimating the cumulative distribution function 4.6 types of confidence intervals/bands Let F be a class of distribution functions F and let θ be some quantity of interest, such as the mean of F or the whole function
Lecture 24 Floods and flood frequency
Lecture 24 Floods and flood frequency One of the things we want to know most about rivers is what's the probability that a flood of size will happen this year? In 100 years? There are two ways to do this empirically,
Power and Type II Error
Statistical Methods I (EXST 7005) Power and Type II Error Since we don't actually know the value of the true mean (or we wouldn't be hypothesizing something else), we cannot know in practice the type II error
Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Point Estimator Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Parameter, Estimator, and Estimate The normal probability density function is fully characterized by two constants: population
Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS
Lecture 5: Parametric Hypothesis Testing: Comparing Means GENOME 560, Spring 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a confidence interval? 2 Review from last week What is a confidence
DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 100: Principles and Techniques of Data Science Date: April 13, 2018 Name: Hypothesis Testing Discussion #10 1. Define these terms below as they relate to hypothesis testing. a) Data Generation Model: Solution: A set
WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT
WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Honolulu, HI ABSTRACT Most coastal locations have few if any records of tsunami wave heights obtained over various time periods. Still
If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ
STATISTICAL INFERENCE INTRODUCTION Statistical inference is that branch of Statistics in which one typically makes a statement about a population based upon the results of a sample. In one-sample testing, we essentially
Statistical Fundamentals and Control Charts
Statistical Fundamentals and Control Charts 1. Statistical Process Control Basics Chance causes of variation: unavoidable causes of variations. Assignable causes of variation: large variations related to machines, materials,
Math 140 Introductory Statistics
8.2 Testing a Proportion Math 140 Introductory Statistics Professor B. Abrego Lecture 15 Sections 8.2 People often make decisions with data by comparing the results from a sample to some predetermined standard. These
Understanding Samples
Will Monroe CS 109 Sampling and Bootstrapping Lecture Notes #17 August 2, 2017 Based on a handout by Chris Piech In this chapter we are going to talk about statistics calculated on samples from a population. We
MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.
XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advanced
Lecture 7: Properties of Random Samples
Lecture 7: Properties of Random Samples 1 Continued From Last Class Theorem 1.1. Let X_1, X_2, ..., X_n be a random sample from a population with mean µ and variance σ
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence Section 8.2 The Practice of Statistics, 4th edition For AP* STARNES, YATES, MOORE Chapter 8 Estimating with Confidence 8.1 Confidence Intervals: The Basics 8.2 8.3 Estimating
11 Correlation and Regression
11 Correlation and Regression 11.1 Multivariate Data Often we look at data where several variables are recorded for the same individuals or sampling units. For example, at a coastal weather station, we might record
Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis
Section 9.2 Tests About a Population Proportion PHANTOMS: Parameters, Hypothesis, Assess Conditions, Name the Test, Test Statistic (Calculate), Obtain P value, Make a decision, State conclusion Section 9.2 Tests
BUSINESS STATISTICS (PART-9) AVERAGE OR MEASURES OF CENTRAL TENDENCY: THE GEOMETRIC AND HARMONIC MEANS
BUSINESS STATISTICS (PART-9) AVERAGE OR MEASURES OF CENTRAL TENDENCY: THE GEOMETRIC AND HARMONIC MEANS. INTRODUCTION We have so far discussed three measures of central tendency, viz. The Arithmetic Mean, Median
Circle the single best answer for each multiple choice question. Your choice should be made clearly.
TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice questions.