Lecture 1: Simple descriptive statistics I


L1. Example: Number of egg clutches received by male sticklebacks

The three-spined stickleback, Gasterosteus aculeatus, reproduces as follows. Male sticklebacks build nests. A female is then attracted to the nest, where she lays a clutch of eggs. As soon as the eggs are laid she leaves the nest. The male enters and fertilizes the eggs. He then chases the female away and begins hunting for new mates. The data below give the number of egg clutches received by one hundred male sticklebacks.

[Data listing: 100 clutch counts; values not preserved in this transcription.]

It is not easy to size up these data quickly, so summarise the information in a frequency distribution as tabulated below.

[Table: number of clutches received against frequency; values not preserved.]

It can be seen that a few males received many clutches, yet a third of the males received no clutches. The number of clutches with the highest frequency is zero. This is called the mode of the frequency distribution. We can illustrate these data using a line chart, the heights of the lines being proportional to the frequencies.

[Figure: line chart of frequency against number of clutches received.]

A second sample of forty sticklebacks gave the results below.

[Table: number of clutches received against frequency for the second sample; values not preserved.]

Suppose we want to compare the two samples. Because of the different sample sizes it makes sense to plot line graphs showing the relative frequency, or proportion, at each clutch size. For example, for clutches of size two in the second sample the relative frequency is 8/40 = 0.20, whereas for the first sample the corresponding frequency is divided by 100 rather than 40.

[Figure: relative frequency against number of clutches received, for the first sample of 100 males and the second sample of 40 males.]

What is the average number of clutches received by the sample of 100 males?

Average number = (sum of clutches) / (100 males in sample) = 205/100 = 2.05 clutches.

Mathematically this can be written as follows: denote the numbers of clutches for the $n$ males in the sample by $x_1, x_2, \ldots, x_n$, and the average or sample mean by $\bar{x}$ (read as "x-bar"). Then

sample mean $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$.

The mean could also be evaluated using the frequencies presented in the frequency distribution. Here zero clutches were observed 35 times, and so on for the other clutch sizes:

$\bar{x} = \dfrac{\text{sum of clutches}}{100 \text{ males in sample}} = \dfrac{(35 \times 0) + \cdots}{100} = \dfrac{205}{100} = 2.05$ clutches.

Mathematically, suppose there are $k$ distinct numbers $x_1, x_2, \ldots, x_k$, which are observed with frequencies $f_1, f_2, \ldots, f_k$ respectively. Then

$\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \dfrac{1}{n}\sum_{i=1}^{k} f_i x_i$.

Notice that the total frequency is just $\sum_i f_i = n$.

The sample mean provides a measure of location about which the sampled values are spread. For this example the values have a mean of 2.05 clutches.

An alternative measure of location is the sample median. This is the middle value if we imagine ordering the sample values from smallest to largest. Suppose we re-order the 100 values from smallest to largest. The middle pair of ordered values, the 50th and the 51st, are 2 and 2 respectively, and so the median is $\tfrac{1}{2}(2 + 2) = 2$ clutches.

The mean and median are examples of summary statistics. They provide summary information about the data set examined.

The stickleback data suggest that while the average number of egg clutches per male is about two, some males receive a large number of clutches, whereas a sizeable proportion of nests had no females spawning with their owners. Why are so many males unable to attract females to spawn in their nests? Why are some males very successful? Answering these questions leads zoologists on to further studies, throwing up more data for the statistician to analyse. If you want answers to these questions turn to the pages of Scientific American, April.

L1. Example: Book sentence length

The numbers of words in the first 35 sentences of Emily Brontë's Wuthering Heights, ranked from smallest to largest, are:

3, 4, 5, 6, 6, 9, 14, 15, 19, 20, 20, 22, 23, 25, 25, 26, 26, 27, 27, 32, 34, 34, 35, 35, 42, 42, 43, 45, 45, 48, 51, 54, 60, 65, 93.

The sample mean is

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{1080}{35} = 30.857$ words.

The sample median, the 18th value when the values are ranked in order, is 27.

One problem with the sample mean $\bar{x}$ is that it can be sensitive to outliers or extreme values in the sample. For example, removing the sentence with 93 words from the sample changes the sample mean for the remaining 34 sentences to 987/34 = 29.0 words.

What about the median? Remove the sentence with 93 words and order the remaining 34 sentences from smallest to largest. The middle pair of ordered values, the 17th and 18th, are 26 and 27 respectively, and so the median is $\tfrac{1}{2}(26 + 27) = 26.5$. The median is relatively robust to outliers.

Although the median is more robust to outliers, the sample mean is still the most widely used measure of location.

There are on average about thirty words per sentence, with some spread about this mean. How can we measure this spread?

A simple way to measure the spread of values would be to use the sample range, given by

Range = Maximum value − Minimum value.

For this example, the range is 93 − 3 = 90 words. Unfortunately, the range will be severely affected by outliers and is of limited use. We still want to know how much the data values are spread or dispersed about the mean.

We want a measure of dispersion about the sample mean $\bar{x}$. For any value $x_i$ the distance or deviation $(x_i - \bar{x})$ shows how much $x_i$ differs from $\bar{x}$.

[Figure: number line marking $\bar{x}$, $x_i$ and the deviation $x_i - \bar{x}$.]

It is no good using the average of $(x_i - \bar{x})$ as a measure of dispersion. It is always equal to zero. Mathematically,

$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$.

We could measure dispersion using the mean of the absolute deviations $|x_i - \bar{x}|$,

mean absolute dispersion $= \dfrac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$.

Unfortunately this is analytically difficult to use.
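The effect of the 93-word sentence on the mean and the median can be checked numerically. The sketch below is only an illustration: it assumes the ranked sentence lengths of this example with the dropped digits restored (3, 4, 5, ..., 65, 93), a list that reproduces the quoted total of 35 sentences and the mean of 29.0 words once the outlier is removed.

```python
from statistics import mean, median

# Ranked sentence lengths (words) for the 35 sentences of the example.
lengths = [3, 4, 5, 6, 6, 9, 14, 15, 19, 20, 20, 22, 23, 25, 25, 26, 26, 27,
           27, 32, 34, 34, 35, 35, 42, 42, 43, 45, 45, 48, 51, 54, 60, 65, 93]

# The same sample with the single extreme sentence removed.
without_outlier = [x for x in lengths if x != 93]

mean_all = mean(lengths)                   # 1080 / 35
median_all = median(lengths)               # 18th ranked value
mean_trimmed = mean(without_outlier)       # 987 / 34
median_trimmed = median(without_outlier)   # average of 17th and 18th values
value_range = max(lengths) - min(lengths)

# Deviations about the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean_all for x in lengths)

print(f"mean {mean_all:.3f} -> {mean_trimmed:.3f}")
print(f"median {median_all} -> {median_trimmed}")
print(f"range {value_range}, sum of deviations {deviation_sum:.1e}")
```

Removing one value moves the mean by almost two words but the median by only half a word, which is the sense in which the median is the more robust of the two.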

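Consider instead the squared deviation of $x_i$ about the mean $\bar{x}$, given by $(x_i - \bar{x})^2$. The average squared deviation about the sample mean is $\dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$. For reasons which are discussed in a later lecture it is better to use divisor $(n - 1)$ and define the sample variance $s^2$,

sample variance $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

For data sets closely clustered about $\bar{x}$ the values $(x_i - \bar{x})^2$ will be small and $s^2$ will in turn be small. For data values more widely spread about $\bar{x}$ we would expect $s^2$ to be large. Notice that $s^2$, being made up of squared terms $(x_i - \bar{x})^2$, can never be negative.

It is easier in practice to evaluate $s^2$ using the formula

$s^2 = \dfrac{1}{n-1}\left\{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right\}$.

This holds because

$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2)$
$\quad = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$
$\quad = \sum x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2$
$\quad = \sum x_i^2 - n\bar{x}^2$,

giving the formula above on dividing by $(n - 1)$.

For the sentence length data $n = 35$, $\bar{x} = 30.857$ and $\sum x_i^2 = 46366$, so

$s^2 = \dfrac{1}{n-1}\left\{ \sum x_i^2 - n\bar{x}^2 \right\} = \dfrac{1}{34}\left\{ 46366 - 35 \times (30.857)^2 \right\} = \dfrac{13040.3}{34} = 383.5$ (words)$^2$.

Since $s^2$ is measured in units of (words)$^2$ whereas $\bar{x}$ is measured in units of (words), define, as an alternative measure of dispersion, the sample standard deviation $s$, given by

$s = +\sqrt{s^2}$.

For this example, $s = \sqrt{383.5} = 19.58$ words.

As a comparison, in Douglas Adams's The Hitch Hiker's Guide to the Galaxy the first twenty-five sentences have sample mean 5.5, median 4, and sample standard deviation [value not preserved; the printed mean and median may also have lost digits].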
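The equivalence of the defining formula and the shortcut formula for $s^2$ can be confirmed on the sentence length data. This is a minimal sketch, assuming the ranked list of 35 sentence lengths with dropped digits restored; it reproduces the quoted sum of squares 46366.

```python
from math import sqrt
from statistics import variance

# Ranked sentence lengths (words) for the 35 sentences of the example.
lengths = [3, 4, 5, 6, 6, 9, 14, 15, 19, 20, 20, 22, 23, 25, 25, 26, 26, 27,
           27, 32, 34, 34, 35, 35, 42, 42, 43, 45, 45, 48, 51, 54, 60, 65, 93]

n = len(lengths)
xbar = sum(lengths) / n
sum_sq = sum(x * x for x in lengths)      # sum of x_i^2

# Definition: squared deviations about the mean, divisor (n - 1).
s2_definition = sum((x - xbar) ** 2 for x in lengths) / (n - 1)

# Shortcut: {sum x_i^2 - n * xbar^2} / (n - 1), using the unrounded mean.
s2_shortcut = (sum_sq - n * xbar ** 2) / (n - 1)

s = sqrt(s2_shortcut)

print(f"sum of squares = {sum_sq}")
print(f"s^2 (definition) = {s2_definition:.2f}, s^2 (shortcut) = {s2_shortcut:.2f}")
print(f"s = {s:.2f} words")
```

The standard library function `statistics.variance` also uses the divisor $(n - 1)$, so it can serve as a third check on the same list.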

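L1. Example: Size of casual groups

Two researchers recently studied the frequency distribution of the size of casual groups of people at cocktail parties, at shopping centres, of children at play, and so on. One such distribution of 2423 groups, obtained on a Spring afternoon in Portland, Oregon, is given below.

[Table: size of group against frequency; only the entries quoted in the calculation below are preserved.]

The sample mean is found using

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{k} f_i x_i = \dfrac{(1486 \times 1) + (694 \times 2) + (195 \times 3) + \cdots}{2423} = \dfrac{3663}{2423} = 1.51$.

The actual value of 3663/2423 is 1.51176. It is sensible to round this as 1.51 or 1.5.

For this data set, with $k$ distinct values $x_1, x_2, \ldots, x_k$ observed with frequencies $f_1, f_2, \ldots, f_k$, the sample variance $s^2$ is defined using

$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2$.

In practice evaluate the sample variance using the equivalent formula

$s^2 = \dfrac{1}{n-1}\left\{ \sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2 \right\}$.

Can you prove this follows from the previous definition of $s^2$?

It is often easier when evaluating $\bar{x}$ and $s^2$ by hand to display the calculations in a table.

[Table: columns $x_i$, $f_i$, $f_i x_i$, $f_i x_i^2$, with totals; values not preserved.]

$\bar{x} = \dfrac{1}{n}\sum f_i x_i = \dfrac{3663}{2423} = 1.51176$,

$s^2 = \dfrac{1}{n-1}\left\{ \sum f_i x_i^2 - n\bar{x}^2 \right\} = \dfrac{1}{2422}\left\{ \sum f_i x_i^2 - 2423 \times (1.51176)^2 \right\}$, and $s = \sqrt{s^2}$.

In calculating $s^2$ do not use the rounded value of $\bar{x}$. For example, using $\bar{x} \approx 1.51$ in place of 1.51176 changes the value of $s^2$, using $\bar{x} \approx 1.5$ changes it further, and using $\bar{x} \approx 2$ gives $s^2 = -1.155$! In practice it is better not to round $\bar{x}$ in calculating $s^2$, and so use $\bar{x}^2 = (3663/2423)^2$ in the formula for $s^2$.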
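The sensitivity of the shortcut formula to a rounded mean can be illustrated as follows. This is a sketch under a stated assumption: the frequencies for group sizes 1 to 3 are the ones quoted in the calculation of the mean, while the frequencies for sizes 4 to 6 are assumed values, chosen because they reproduce the quoted totals of 2423 groups and 3663 people.

```python
sizes = [1, 2, 3, 4, 5, 6]
# Frequencies for sizes 1-3 are quoted in the text; those for sizes 4-6
# are ASSUMED so that the totals match n = 2423 and sum f*x = 3663.
freqs = [1486, 694, 195, 37, 10, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, sizes))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, sizes))


def s2_with_mean(xbar):
    """Shortcut formula {sum f x^2 - n * xbar^2} / (n - 1) for a given xbar."""
    return (sum_fx2 - n * xbar ** 2) / (n - 1)


exact = s2_with_mean(sum_fx / n)
print(f"n = {n}, sum f x = {sum_fx}, unrounded mean = {sum_fx / n:.5f}")
print(f"s^2 with unrounded mean: {exact:.4f}")
for rounded in (1.51, 1.5, 2):
    print(f"s^2 with mean rounded to {rounded}: {s2_with_mean(rounded):.4f}")
```

With the mean rounded to 2 the formula returns a negative number, which is impossible for a variance and shows why the unrounded value 3663/2423 should be carried through the calculation.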

An alternative way to present the data is to show the cumulative frequencies. The cumulative frequency at any value $x$ satisfies

Cumulative frequency at $x$ = Number of observations with value $\le x$.

[Table: size of group $x$, frequency, and cumulative frequency at $x$; values not preserved.]

The cumulative frequencies can be plotted on a graph. Since the size of group is a discrete quantity, only taking integer values here, the cumulative frequency plot is a step-function. This is easily seen. For example, for all values $x$ satisfying $1 \le x < 2$ the cumulative frequency is 1486.

[Figure: step plot of cumulative frequency against size of group.]

The cumulative frequency plot increases monotonically between 0 and $n$, the total frequency. Sometimes we plot the cumulative percentage, which increases from 0% to 100%, or the cumulative relative frequency, which increases from 0 to 1.

L1. Example: Lengths of cotton yarn

The following are lengths per unit weight (hanks of 840 yd/lb) of one hundred test specimens from a batch of cotton yarn.

[Data listing: 100 recorded lengths; values not preserved in this transcription.]

To construct a frequency distribution giving the frequencies for each of the values from 34.9 to 39.6 would not provide a good summary of these data. To better summarise these data we can group the observations into classes and record the number of observations in each class.

[Table: length class against frequency; values not preserved.]

The class 39.0-39.9 has collected together all observations recorded as 39.0, 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9. Similarly for the other classes. No observation can lie in more than one class.

The smallest and largest possible recorded values in the class 39.0-39.9 are 39.0 and 39.9 respectively. These are the class limits. They define the class.

Since the data values are recorded rounded to the nearest 0.1 unit, the class 39.0-39.9 has collected all observations with outcomes between 38.95 and 39.95. These two values are the lower and upper class boundaries.

The mid-point of the class 39.0-39.9 is 39.45. This is called the class mark.

The distance between the lower and upper class boundaries is the class width, and equals one unit here.

Some detail has been lost in grouping the actual data values into classes, but we have gained a better impression of the way the data are distributed.

Notice that these data are an example of continuous data. The sample values could take any value in some interval, even though we may only record them to the nearest whole number, or here to the nearest 0.1 unit.

A histogram can be used to display these data. On each class interval erect a block whose area is proportional to the class frequency. The class boundaries 33.95, 34.95, ... can be approximated as 34, 35, ... for clarity.

[Figure: histogram of frequency per unit length against length.]

To calculate the sample mean and variance for these grouped data use the class marks as the $x_i$ values and the class frequencies as the frequencies $f_i$.

[Table: length class, class mark $x_i$, frequency $f_i$, $f_i x_i$, $f_i x_i^2$, with totals; values not preserved.]

$\bar{x} = \dfrac{1}{n}\sum f_i x_i = 37.26$ units,

$s^2 = \dfrac{1}{n-1}\left\{ \sum f_i x_i^2 - n\bar{x}^2 \right\} = \dfrac{1}{99}\left\{ \sum f_i x_i^2 - 100 \times (37.26)^2 \right\}$, and $s = \sqrt{s^2}$ units.

The sample standard deviation is often quoted to one more significant figure than the sample mean.

If the data had not been grouped into classes but the original 100 values used to calculate the sample mean and variance, they would have given $\bar{x} = 37.229$ units, with $s^2$ and $s = \sqrt{s^2}$ evaluated from the same formulae using $(37.229)^2$ in place of $(37.26)^2$.

By grouping the data some fine detail has been lost, but an overall impression of the way the data behave has been gained.

L1. Example: Failure of aircraft air conditioning equipment (NOT examined!)

The following data, reported by Proschan, summarise the intervals in service hours between failures of the air conditioning in one Boeing 720 jet aircraft.

[Table: time between failures (hours) against frequency; values not preserved.]

The sample mean and median provide summaries of location. The sample variance and standard deviation provide summaries of the dispersion. Can we summarise other aspects of these data?

[Figure: histogram of frequency per 50 hour class against time between failures (hours).]

The histogram shows that the frequency distribution has a long tail on the right; it is skewed to the right. How can skewness be measured?

Suppose there are $k$ distinct values $x_1, x_2, \ldots, x_k$, which are observed with frequencies $f_1, f_2, \ldots, f_k$ respectively, so that there are $n = \sum_i f_i$ observations in total. Define

skewness $= m_3 = \dfrac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^3$.

The quantity $m_3$ measures the symmetry of a distribution. If a data set is symmetric about the sample mean $\bar{x}$ then there will be as many positive values of $(x_i - \bar{x})^3$ as negative values, so that cancellation occurs and $m_3 = 0$. Note that though a symmetric distribution has $m_3 = 0$, it does not necessarily follow that a data set with $m_3 = 0$ is symmetric.

In practice it is easier to evaluate $m_3$ using an alternative formula:

$m_3 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3 = \dfrac{1}{n}\sum f_i (x_i^3 - 3x_i^2\bar{x} + 3x_i\bar{x}^2 - \bar{x}^3)$
$\quad = \dfrac{1}{n}\left\{\sum f_i x_i^3\right\} - 3\bar{x}\,\dfrac{1}{n}\left\{\sum f_i x_i^2\right\} + 3\bar{x}^2\,\dfrac{1}{n}\left\{\sum f_i x_i\right\} - \bar{x}^3\,\dfrac{1}{n}\left\{\sum f_i\right\}$
$\quad = \dfrac{1}{n}\left\{\sum f_i x_i^3\right\} - 3\bar{x}\,\dfrac{1}{n}\left\{\sum f_i x_i^2\right\} + 3\bar{x}^3 - \bar{x}^3$
$\quad = \dfrac{1}{n}\left\{\sum f_i x_i^3\right\} - 3\bar{x}\,\dfrac{1}{n}\left\{\sum f_i x_i^2\right\} + 2\bar{x}^3$.

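For calculations done by hand, display the results in a table.

[Table: class, class mark $x_i$, frequency $f_i$, $f_i x_i$, $f_i x_i^2$, $f_i x_i^3$, with totals; values not preserved.]

$\bar{x} = \dfrac{1}{n}\sum f_i x_i = 65$ hours,

$s^2 = \dfrac{1}{n-1}\left\{ \sum f_i x_i^2 - n\bar{x}^2 \right\}$ hours$^2$, evaluated with $\bar{x}^2 = (65)^2$, and $s = \sqrt{s^2}$ hours.

Skewness $m_3 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3 = \dfrac{1}{n}\left\{\sum f_i x_i^3\right\} - 3\bar{x}\,\dfrac{1}{n}\left\{\sum f_i x_i^2\right\} + 2\bar{x}^3$ hours$^3$, evaluated with $\bar{x} = 65$.

Since the dimension of $m_3$ depends on the units of measurement, define a coefficient of skewness $b_1$ which is a dimensionless constant:

Coefficient of skewness $b_1 = \left\{ \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3 \right\} \Big/ \left\{ \dfrac{1}{n}\sum f_i (x_i - \bar{x})^2 \right\}^{1.5}$.

Since

$\dfrac{1}{n}\sum f_i (x_i - \bar{x})^2 = \left(\dfrac{n-1}{n}\right) s^2$,

this example, with $n = 30$, gives

$\dfrac{1}{n}\sum f_i (x_i - \bar{x})^2 = 4400$,

so the coefficient of skewness is $b_1 = m_3/(4400)^{1.5} = 1.9$, indicating a positive skewness. (We refer to positive or negative skewness, not right or left skewness.)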
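The shortcut formula for $m_3$ and the coefficient of skewness can be checked on a grouped table. The sketch below rests on an assumption: the class marks and frequencies are assumed values for 50-hour classes, chosen because they reproduce the quantities that are quoted in this example ($n = 30$, $\bar{x} = 65$ hours, and a second moment about the mean of 4400).

```python
# Class marks (hours) for 50-hour classes and their frequencies.
# ASSUMED table: it is consistent with n = 30, xbar = 65 and m2 = 4400.
marks = [25, 75, 125, 175, 225, 275]
freqs = [18, 7, 2, 0, 2, 1]

n = sum(freqs)


def raw_moment(r):
    """r-th moment about the origin, (1/n) * sum f_i * x_i^r."""
    return sum(f * x ** r for f, x in zip(freqs, marks)) / n


xbar = raw_moment(1)
m2 = raw_moment(2) - xbar ** 2

# Shortcut: m3 = (1/n) sum f x^3 - 3 xbar (1/n) sum f x^2 + 2 xbar^3.
m3_shortcut = raw_moment(3) - 3 * xbar * raw_moment(2) + 2 * xbar ** 3

# Definition: m3 = (1/n) sum f (x - xbar)^3.
m3_direct = sum(f * (x - xbar) ** 3 for f, x in zip(freqs, marks)) / n

b1 = m3_shortcut / m2 ** 1.5

print(f"n = {n}, xbar = {xbar}, m2 = {m2}")
print(f"m3 shortcut = {m3_shortcut}, m3 direct = {m3_direct}")
print(f"coefficient of skewness b1 = {b1:.2f}")
```

Both routes to $m_3$ agree, and dividing by $m_2^{1.5}$ removes the units of hours$^3$, leaving a dimensionless coefficient.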

Lecture 2: Simple descriptive statistics II

L2. Example: Midday world temperatures

The data below give the midday temperatures on 21st/22nd December at 8 locations around the world to the nearest degree Celsius.

[Data listing: midday temperatures; values not preserved in this transcription.]

A simple frequency distribution is given below.

[Table: temperature against frequency; values not preserved.]

The corresponding histogram, with classes of width one degree Celsius, is given below.

[Figure: histogram of frequency per class against temperature (°C).]

As a summary display of the data it may be felt that this has too many class intervals to show clearly the behaviour of the data distribution. Perhaps the graphical summary might be improved by grouping some values together. Suppose temperature is grouped into classes of width ten degrees Celsius.

[Table: frequencies for the temperature classes −24 to −15, −14 to −5, −4 to +5, +6 to +15, +16 to +25, +26 to +35 (°C); frequencies not preserved.]

[Figure: histogram of frequency per 10° class against temperature (°C).]

As a summary of the data this is not too bad, but perhaps it might be felt to have oversummarised the data. The histogram above could have been plotted with the same vertical scale as before to facilitate comparison. Now try grouping the data into classes of width three degrees Celsius.

[Table: frequencies for temperature classes of width three degrees, starting at −24 to −22; values not preserved.]

The corresponding histogram is given below.

[Figure: histogram of frequency per 3° class against temperature (°C).]

Perhaps this is the ideal graphical summary. Between five and fifteen intervals often provides a good display of the data. The choice of number of intervals will depend on the total number of data values and on their distribution.

Sometimes an open-ended class may be used, such as "30 and above" or "0 and below". One way to handle such cases in plotting histograms is to assume that the open-ended class has the same width as its neighbouring class.

L2. Example: Depths of earthquakes in Fiji

The statistical package R is freely available and can be downloaded from the internet. The third column of data set quakes within R gives the depths in kilometres for 1000 earthquakes in the Tonga trench off Fiji having magnitude greater than 4.0. The R commands

    data(quakes)
    hist(quakes[,3])

give the following histogram.

[Figure: histogram of frequency per 50 km class against depth (km).]

The histogram above suggests the data are bimodal. One large group of earthquakes has modal class centred at 75 km and the other at 575 km.

Consider now a histogram with class width 10 km and first class starting at 0 km. Values $x$ satisfying $40 \le x < 50$ are put into the class 40-50. This can be written as the interval [40, 50), a closed interval at the bottom and an open interval at the top.

[Figure: histogram of frequency per 10 km class against depth (km).]

It can be seen that there is really a peak near 40 km, the minimum depth at which these earthquakes occur. The shape of a histogram is affected by both the choice of class width and the starting point of the classes. The R command used here was:

    hist(quakes[,3], breaks=c(0:68)*10, right=FALSE)   # Gives open interval on right.

By default R gives closed intervals on the right (right=TRUE).

L2. Example: Diameters of stone circles

Between the middle of the Neolithic and the Middle Bronze Age, 3300 BC to 1500 BC, large numbers of stone circles were constructed in the British Isles. In The Stone Circles of the British Isles Aubrey Burl lists the diameters for 80 stone circles in England. The data are given below.

[Table: diameter (feet) against frequency; values not preserved.]

The class 150-200 contains all diameters $x$ satisfying $150 \le x < 200$, so $x \in [150, 200)$.

[Figure: histogram of frequency against diameter (feet) for the English circles.]

The histogram above is wrong! Recall that area is proportional to frequency. If the vertical scale represents frequency per 10 feet class, the five observations in the interval 150-200 are equivalent to one observation in each of the intervals 150-160, 160-170, ..., 190-200. The height of the histogram for the interval 150-200 should be 1.0. The area of this block is then height × width = 1 × 5 = 5 observations as required, where the width is five units of 10 feet. Similarly the height for the other wide interval should be 0.8. The correct histogram is below.

[Figure: corrected histogram of frequency per 10 feet class against diameter (feet) for the English circles.]
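The rule that area, not height, is proportional to frequency can be written as a one-line calculation. The sketch below uses the one block whose numbers are stated in this example, five circles in the class [150, 200) drawn on a scale of frequency per 10 feet; the function name `block_height` is a hypothetical helper introduced for illustration.

```python
def block_height(frequency, class_width, unit_width):
    """Height of a histogram block, as frequency per `unit_width` of the variable."""
    return frequency / (class_width / unit_width)


# Five observations in the class [150, 200), vertical scale per 10 feet.
lower, upper, frequency = 150, 200, 5
unit = 10

height = block_height(frequency, upper - lower, unit)

# The block's area, measured in units of 10 feet, recovers the frequency.
area = height * (upper - lower) / unit

print(f"height = {height}, area = {area}")
```

Plotting the raw frequency 5 as the height would make this class look five times as dense as it is, which is the error in the first histogram.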

For 286 stone circles in Scotland the diameters are shown below.

[Table: diameter (feet) against frequency for the Scottish circles; values not preserved.]

Because the total frequency for the Scottish stone circles is not the same as for the English circles, the two data sets are best compared by plotting the relative frequency in each interval. For example, for a given interval in Scotland, the relative frequency is the frequency in that interval divided by 286. If the vertical scale is made relative frequency per one foot class, then the height of the corresponding histogram block is this relative frequency divided by the class width in feet. The corresponding relative frequency for the [20, 30) interval for the English circles is 0.1, so that the height of the corresponding histogram block is here 0.01. The total area under each histogram becomes unity.

[Figures: relative frequency per class against diameter (feet), one panel for the English circles and one for the Scottish circles.]

It can be seen that England has slightly fewer very small circles than Scotland. This may be because there were originally fewer small circles in England than in Scotland. Alternatively it could be that over the years more small circles have been destroyed in England than in Scotland.

L2. Example: Examination marks

Two hundred students were examined and awarded a mark out of 100. The actual marks were 5, 36, 63, 82, 62, and so on. The data were grouped as follows.

[Table: examination marks class against frequency; values not preserved.]

The class 30-39 has collected all examination marks between 30 and 39 inclusive. Imagine a student's mark as being rounded to the nearest whole number, so that this class has boundaries 29.5 and 39.5 with class mark 34.5. Similarly for the other classes.

Calculate the sample mean and variance as before by displaying the results in a table.

[Table: class, class mark $x_i$, frequency $f_i$, $f_i x_i$, $f_i x_i^2$, with totals; values not preserved.]

$\bar{x} = \dfrac{1}{n}\sum f_i x_i = 58.15$ marks,

$s^2 = \dfrac{1}{n-1}\left\{ \sum f_i x_i^2 - n\bar{x}^2 \right\} = \dfrac{1}{199}\left\{ \sum f_i x_i^2 - 200 \times (58.15)^2 \right\}$, and $s = \sqrt{s^2}$ marks.

Although setting out the calculations in a table has helped us avoid making some arithmetic errors, it is still tedious to evaluate the large products and sums involved! Can we simplify calculations still further?

One way to simplify calculations is to code the data, find the mean and variance of the coded data, and then decode to give the mean and variance of the original data.

Suppose the data values are denoted by $x_1, x_2, \ldots, x_n$. Code the data using the transformation

$z_i = \dfrac{x_i - m}{c}$,

where $m$ and $c$ are arbitrary constants. Typically the value of $m$ will be some value of $x$ close to the centre of the data distribution and $c$ will be the class width.

Mean of coded values $= \bar{z} = \dfrac{1}{n}\sum z_i$.

Sample variance of coded values $= s_z^2 = \dfrac{1}{n-1}\sum (z_i - \bar{z})^2 = \dfrac{1}{n-1}\left\{ \sum z_i^2 - n\bar{z}^2 \right\}$.

If $z_i = (x_i - m)/c$, then $x_i = m + cz_i$, and the sample mean of the original data is

$\bar{x} = \dfrac{1}{n}\sum x_i = \dfrac{1}{n}\sum (m + cz_i) = m + c\,\dfrac{1}{n}\sum z_i = m + c\bar{z}$.

To calculate the sample variance $s_x^2$ of the original data note that $x_i = m + cz_i$ and $x̄ = m + c\bar{z}$, so

$(x_i - \bar{x}) = (m + cz_i) - (m + c\bar{z}) = c(z_i - \bar{z})$.

Then

$s_x^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2 = \dfrac{1}{n-1}\sum \{c(z_i - \bar{z})\}^2 = c^2\,\dfrac{1}{n-1}\sum (z_i - \bar{z})^2 = c^2 s_z^2$.

To calculate the sample mean and variance for the examination data using coding we can choose $m = 54.5$ and $c = 10$. Our table of calculations is given below.

[Table: class, class mark $x_i$, frequency $f_i$, $z_i = (x_i - 54.5)/10$, $f_i z_i$, $f_i z_i^2$, with totals; values not preserved.]

Mean of coded values is

$\bar{z} = \dfrac{1}{n}\sum f_i z_i = 0.365$.

Similarly

$s_z^2 = \dfrac{1}{n-1}\left\{ \sum f_i z_i^2 - n\bar{z}^2 \right\} = \dfrac{1}{199}\left\{ \sum f_i z_i^2 - 200 \times (0.365)^2 \right\}$.

Thus

Sample mean $\bar{x} = m + c\bar{z} = 54.5 + 10 \times 0.365 = 58.15$ marks.

Similarly

Sample variance $s_x^2 = c^2 s_z^2 = 100\, s_z^2$.

Notice how we use the notation $s_x^2$ to denote the sample variance of the x-values and $s_z^2$ to denote the sample variance of the z-values. The subscripts help to emphasize the particular data we are looking at. This is of importance when we might be studying several different data sets, with labels X, Y, Z, and so on.
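The decoding rules $\bar{x} = m + c\bar{z}$ and $s_x^2 = c^2 s_z^2$ can be verified numerically. The sketch below keeps the constants of this example ($m = 54.5$, $c = 10$), but the class marks and frequencies are hypothetical values made up purely for illustration, since the examination table itself is not reproduced here.

```python
def mean_and_variance(values, freqs):
    """Sample mean and variance (divisor n - 1) for values with frequencies."""
    n = sum(freqs)
    mean = sum(f * v for f, v in zip(freqs, values)) / n
    var = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values)) / (n - 1)
    return mean, var


# HYPOTHETICAL class marks and frequencies, for illustration only.
marks = [34.5, 44.5, 54.5, 64.5, 74.5]
freqs = [2, 5, 8, 4, 1]

m, c = 54.5, 10                      # coding constants used in the example

# Coded values z_i = (x_i - m) / c are small integers: -2, -1, 0, 1, 2.
coded = [(x - m) / c for x in marks]

zbar, s2_z = mean_and_variance(coded, freqs)
xbar, s2_x = mean_and_variance(marks, freqs)

decoded_mean = m + c * zbar          # should equal xbar
decoded_variance = c ** 2 * s2_z     # should equal s2_x

print(f"zbar = {zbar}, decoded mean = {decoded_mean}, direct mean = {xbar}")
print(f"decoded variance = {decoded_variance:.4f}, direct variance = {s2_x:.4f}")
```

Choosing $m$ as a central class mark and $c$ as the class width turns every coded value into a small integer, which is what makes the hand calculation lighter.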

L2. Example: Price-earnings ratios of British general retail companies

The price-earnings ratios of sixty general retail companies listed on the London Stock Exchange on a certain date in September 200 are given below.

[Table: P/E ratio class against frequency; values not preserved apart from those quoted below.]

Suppose we want to evaluate the cumulative frequencies. There are 21 observations in the first class and these must lie below the class upper boundary 10.0. Thus the cumulative frequency at 10.0 is 21. Similarly there are 32 observations in the 10-20 class. There are 53 observations less than, or equal to, 20.0, and so on.

[Table: class, upper boundary, frequency, cumulative frequency at upper boundary; values not preserved apart from those quoted above.]

Plot the cumulative frequencies in a cumulative frequency polygon. Construct this by joining the cumulative frequencies at each class upper boundary by straight lines. This is equivalent to drawing histograms with horizontal tops. In both cases we are assuming that data values in each class are uniformly spread throughout that class.

[Figures: histogram of frequency per 10 units against P/E ratio, and cumulative frequency polygon against P/E ratio with the median M marked.]

The median $M$ is the middle value when the data are ordered. If there are $n$ observations in total, the median corresponds to a cumulative frequency of $\tfrac{1}{2}n$. In this example $n = 60$, so the median corresponds to a cumulative frequency of 30. From the above cumulative frequency polygon it looks as though $M \approx 12$. Can we determine $M$ more precisely?

Yes! First find the class in which the median lies, the median class, and then use linear interpolation to obtain the median. By inspection, the median class is the 10-20 class. At $x = 10$ the cumulative frequency is 21. At $x = 20$ the cumulative frequency is 53. There are 32 observations spread over the 10-20 class. By interpolation, the median equals

$M = 10 + \{(30 - 21)/32\} \times 10 = 10 + 2.8 = 12.8$.

[Figure: cumulative frequency polygon against P/E ratio, with M = 12.8 marked.]

The median $M$ divides the data into two equal parts. Half the data values lie below the median, and half above. Define quartiles $Q_1$, $Q_2$ and $Q_3$, which divide the data into four equal parts. A quarter of the observations lie below $Q_1$; a quarter between $Q_1$ and $Q_2$; a quarter between $Q_2$ and $Q_3$; and a quarter lie above $Q_3$. Thus $Q_1$ corresponds to a cumulative frequency of $\tfrac{1}{4}n = 15$ here, $Q_3$ to a cumulative frequency of $\tfrac{3}{4}n = 45$ here, while $Q_2 = M$. Determine $Q_1$ and $Q_3$ by interpolation in like manner as for the median.

[Figure: cumulative frequency polygon against P/E ratio, with $Q_1$, $M$ and $Q_3$ marked.]

We know that the sample mean can be affected by extreme observations. The same is true of the sample standard deviation $s$. An alternative measure of dispersion might be the distance between $Q_1$ and $Q_3$. In practice use half this distance, and define the semi-interquartile range,

Semi-interquartile range $= \tfrac{1}{2}(Q_3 - Q_1)$.

For this example

$Q_1 = 0 + \{15/21\} \times 10 = 7.14$,
$Q_3 = 10 + \{(45 - 21)/32\} \times 10 = 17.50$,
Semi-interquartile range $= \tfrac{1}{2}(Q_3 - Q_1) = \tfrac{1}{2}(17.50 - 7.14) = 5.18$.
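The interpolation used for the median and quartiles can be sketched as a short function acting on the cumulative frequency polygon. The function name `interpolate_quantile` is a hypothetical helper; the inputs are the values quoted in this example, namely cumulative frequencies 0, 21 and 53 at the boundaries 0, 10 and 20, with $n = 60$. Only the first two classes are needed, since all three target cumulative frequencies fall below 53.

```python
def interpolate_quantile(target, boundaries, cumulative):
    """Value at which the cumulative frequency polygon reaches `target`.

    `boundaries` are class boundaries in increasing order and `cumulative`
    holds the cumulative frequency at each boundary (starting from 0).
    """
    for i in range(1, len(boundaries)):
        if cumulative[i] >= target:
            lower, upper = boundaries[i - 1], boundaries[i]
            below = cumulative[i - 1]
            class_freq = cumulative[i] - below
            # Assume values are spread uniformly within the class.
            return lower + (target - below) / class_freq * (upper - lower)
    raise ValueError("target exceeds the largest cumulative frequency supplied")


boundaries = [0, 10, 20]
cumulative = [0, 21, 53]
n = 60

q1 = interpolate_quantile(n / 4, boundaries, cumulative)
med = interpolate_quantile(n / 2, boundaries, cumulative)
q3 = interpolate_quantile(3 * n / 4, boundaries, cumulative)
siqr = (q3 - q1) / 2

print(f"Q1 = {q1:.2f}, M = {med:.2f}, Q3 = {q3:.2f}, SIQR = {siqr:.2f}")
```

The same function handles classes of unequal width, because the class width enters only through the difference between neighbouring boundaries.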

L2. Example: July rainfall at Anuradhapura, Sri Lanka

The following data give the July rainfall, in inches, for a forty year period at a location in Sri Lanka.

[Table: rainfall in inches against frequency (years); values not preserved apart from those quoted below.]

It can be seen that the July rainfall has been recorded to the nearest inch. The successive classes have boundaries 0.0-0.5, 0.5-1.5, 1.5-2.5, 2.5-3.5, 3.5-4.5, and so on. The first class has class width 0.5 inches and all other classes have class width 1.0 inches.

In drawing the histogram, recall that the area of each block is proportional to the class frequency. Suppose we make the vertical axis of the histogram frequency per class interval of one inch. Then a frequency of 2 for a class of width 1.0 will have height two units. The area of this block will be 2 × 1.0 = 2. For the class 0.0-0.5 we have 20 observations in an interval of width 0.5, which is equivalent to forty values in an interval of width 1.0. We make the height of the block for the class 0.0-0.5 equal to forty units. The area of this block will be 40 × 0.5 = 20 as required.

[Figure: histogram of frequency per class against rainfall (inches).]

What about calculating the sample mean and variance? The class 0.5-1.5 has mid-point 1.0, so the class mark is 1.0. Similarly the class 1.5-2.5 has class mark 2.0, and so on. However, the class 0.0-0.5 has mid-point 0.25, so that this value is the class mark.

[Table: rainfall class, class mark $x_i$, frequency $f_i$, $f_i x_i$, $f_i x_i^2$, with totals; values not preserved.]

$\bar{x} = \dfrac{1}{n}\sum f_i x_i = \dfrac{129.0}{40} = 3.225$ inches,

$s^2 = \dfrac{1}{n-1}\left\{ \sum f_i x_i^2 - n\bar{x}^2 \right\} = \dfrac{1}{39}\left\{ \sum f_i x_i^2 - 40 \times (3.225)^2 \right\}$, and $s = \sqrt{s^2}$ inches.

What about calculating the median and quartiles? There is no real problem here and these are found as before. The only thing to notice is that the class boundaries are successively 0.0, 0.5, 1.5, 2.5, 3.5, 4.5, and so on.

The median class is the 0.0-0.5 class, where a cumulative frequency of 20 corresponds with the upper boundary. We have here $n = 40$ observations in total, so the median, corresponding to a cumulative frequency of $\tfrac{1}{2}n = 20$, is given by $M = 0.5$ inches. For this example perhaps this is more representative of the quantity of rain to be expected in any July!

The lower quartile $Q_1$ lies in the 0.0-0.5 class also. There are twenty observations lying in the class, so a cumulative frequency of $\tfrac{1}{4}n = 10$ will intuitively correspond with the class mid-point. Using interpolation gives, as expected,

$Q_1 = 0.0 + (10/20) \times 0.5 = 0.25$.

There are thirty observations less than or equal to the upper boundary of the 3.5-4.5 class, so the upper quartile is $Q_3 = 4.5$.

Semi-interquartile range $= \tfrac{1}{2}(Q_3 - Q_1) = \tfrac{1}{2}(4.5 - 0.25) = 2.125$ inches.

L2. Example: Gestational ages of 53 infants (NOT examined!)

Human gestational age is measured from the first day of a woman's last menstrual period until birth. The data below give the gestational ages in weeks for 53 births at St. George's Hospital, London, over an eighteen month period.

[Table: age (weeks) against number of births; values not preserved.]

We have seen several measures of location and dispersion and a measure of skewness. We can generate further summary statistics for the data. Suppose that we have $k$ distinct values $x_1, x_2, \ldots, x_k$, which are observed with frequencies $f_1, f_2, \ldots, f_k$ respectively, so that there are $n = \sum_i f_i$ observations in total. Define, for $r = 1, 2, 3, \ldots$,

$r$th sample moment about the mean $m_r = \dfrac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^r$.

Notice that $m_1 = 0$, that $m_2 = \left(\dfrac{n-1}{n}\right) s^2$ so that $s^2 \approx m_2$, and that $m_3$ = skewness.

Now define, for $r = 1, 2, 3, \ldots$,

$r$th sample moment about the origin $m'_r = \dfrac{1}{n}\sum_{i=1}^{k} f_i x_i^r$.

Notice that $m'_1 = \bar{x}$, the sample mean. We use the $m'_r$ to evaluate the $m_r$ values more easily. For example,

$m_2 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^2 = \dfrac{1}{n}\left\{\sum f_i x_i^2\right\} - \bar{x}^2 = m'_2 - (m'_1)^2$,

$m_3 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3 = \dfrac{1}{n}\left\{\sum f_i x_i^3\right\} - 3\bar{x}\,\dfrac{1}{n}\left\{\sum f_i x_i^2\right\} + 2\bar{x}^3 = m'_3 - 3m'_1 m'_2 + 2(m'_1)^3$.

We have seen that $m_2 \approx s^2$ and so $m_2$ is also a measure of dispersion. We have also seen that $m_3$ measures skewness but, because it depends upon the units of measurement, we defined a coefficient of skewness $b_1$ given by

coefficient of skewness $b_1 = \left\{ \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3 \right\} \Big/ \left\{ \dfrac{1}{n}\sum f_i (x_i - \bar{x})^2 \right\}^{1.5}$.

We can see that $b_1 = m_3 / m_2^{1.5}$.

The moment $m_4$ is sometimes called the kurtosis. Again, because $m_4$ depends upon the units of measurement, define a coefficient of kurtosis $b_2$ given by

coefficient of kurtosis $b_2 = \dfrac{m_4}{m_2^2}$.

Note that some textbooks define skewness by $\sqrt{b_1} = m_3 / m_2^{1.5}$ and kurtosis by $b_2 = (m_4 / m_2^2 - 3)$.

For these data we can derive various summary statistics; most of the numerical values are not preserved in this transcription.

Sample mean $\bar{x} = \dfrac{1}{n}\sum f_i x_i$ weeks.

Treating the data as grouped about each given mid-point, the median (in weeks) is found by interpolation within the median class.

Sample variance $s^2 = \dfrac{1}{n-1}\sum f_i (x_i - \bar{x})^2$ weeks$^2$.

Sample standard deviation $s = \sqrt{s^2}$ weeks.

Other summary statistics can be obtained.

$m_2 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^2$ weeks$^2$.

Skewness $= m_3 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^3 = -2.8$ weeks$^3$.

Kurtosis $= m_4 = \dfrac{1}{n}\sum f_i (x_i - \bar{x})^4$ weeks$^4$.

Coefficient of skewness $b_1 = \dfrac{m_3}{m_2^{1.5}}$. The gestational age exhibits negative skewness.

Coefficient of kurtosis $b_2 = \dfrac{m_4}{m_2^2}$.

You met the normal distribution in MATH75. This distribution has skewness zero and kurtosis equal to three. Indeed, one way to test whether a frequency distribution comes from a normal distribution is to see whether $b_1 \approx 0$ and $b_2 \approx 3$. There is strong evidence that these data cannot be modelled using a normal distribution.

One problem with this data set is that it is not clear whether a gestational age of $x$ weeks means

an age between $x - \tfrac{1}{2}$ and $x + \tfrac{1}{2}$, rounded to be $x$, or
an age of $x$ weeks measured in completed weeks, so being between $x$ and $x + 1$ with mid-point $x + \tfrac{1}{2}$.

We have assumed the former and denoted the values by $x_i$. Suppose in fact the latter definition had been used. Denote these mid-points by $y_i$, where $y_i = x_i + \tfrac{1}{2}$. From what we know on coding, $\bar{y} = \bar{x} + \tfrac{1}{2}$, so $y_i - \bar{y} = x_i - \bar{x}$, and clearly the moments $m_r$ for the x and y values are the same. The variance, skewness and kurtosis of the x and y values are the same. These summary statistics are said to be invariant to a shift of location.

Similarly, the coefficients of skewness $b_1$ and kurtosis $b_2$ are invariant to a change of scale. For, suppose we re-scale the x values using $z = x/c$. Then

$m_r(\text{x values}) = c^r\, m_r(\text{z values})$,

so that cancellation of the $c$ values occurs in calculating $b_1$ and $b_2$.
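The invariance of $b_1$ and $b_2$ to a shift of location and a change of scale can be illustrated numerically. Because the gestational age table is not reproduced here, the sketch below borrows the 35 ranked sentence lengths from the Lecture 1 example (with dropped digits restored) as a convenient skewed data set; the shift of one half and the scale factor $c = 7$ are arbitrary choices made for illustration.

```python
def central_moment(values, r):
    """r-th sample moment about the mean, (1/n) * sum (x_i - xbar)^r."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** r for x in values) / n


def coefficients(values):
    """Return (b1, b2) = (m3 / m2^1.5, m4 / m2^2)."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    m4 = central_moment(values, 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Ranked sentence lengths (words) from the Lecture 1 example.
x = [3, 4, 5, 6, 6, 9, 14, 15, 19, 20, 20, 22, 23, 25, 25, 26, 26, 27,
     27, 32, 34, 34, 35, 35, 42, 42, 43, 45, 45, 48, 51, 54, 60, 65, 93]

c = 7                                   # arbitrary scale factor
y = [xi + 0.5 for xi in x]              # shift of location: y = x + 1/2
z = [xi / c for xi in x]                # change of scale:   z = x / c

b1_x, b2_x = coefficients(x)
b1_y, b2_y = coefficients(y)
b1_z, b2_z = coefficients(z)

print(f"x: b1 = {b1_x:.4f}, b2 = {b2_x:.4f}")
print(f"y: b1 = {b1_y:.4f}, b2 = {b2_y:.4f}")
print(f"z: b1 = {b1_z:.4f}, b2 = {b2_z:.4f}")

# The moments themselves scale as m_r(x) = c^r * m_r(z).
print(central_moment(x, 3), c ** 3 * central_moment(z, 3))
```

The three pairs of coefficients agree up to floating-point rounding, while the raw third moments differ by the factor $c^3$, as the scaling argument predicts.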


More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

multiplies all measures of center and the standard deviation and range by k, while the variance is multiplied by k 2.

multiplies all measures of center and the standard deviation and range by k, while the variance is multiplied by k 2. Lesso 3- Lesso 3- Scale Chages of Data Vocabulary scale chage of a data set scale factor scale image BIG IDEA Multiplyig every umber i a data set by k multiplies all measures of ceter ad the stadard deviatio

More information

Measures of Spread: Variance and Standard Deviation

Measures of Spread: Variance and Standard Deviation Lesso 1-6 Measures of Spread: Variace ad Stadard Deviatio BIG IDEA Variace ad stadard deviatio deped o the mea of a set of umbers. Calculatig these measures of spread depeds o whether the set is a sample

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

MEASURES OF DISPERSION (VARIABILITY)

MEASURES OF DISPERSION (VARIABILITY) POLI 300 Hadout #7 N. R. Miller MEASURES OF DISPERSION (VARIABILITY) While measures of cetral tedecy idicate what value of a variable is (i oe sese or other, e.g., mode, media, mea), average or cetral

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

6 Integers Modulo n. integer k can be written as k = qn + r, with q,r, 0 r b. So any integer.

6 Integers Modulo n. integer k can be written as k = qn + r, with q,r, 0 r b. So any integer. 6 Itegers Modulo I Example 2.3(e), we have defied the cogruece of two itegers a,b with respect to a modulus. Let us recall that a b (mod ) meas a b. We have proved that cogruece is a equivalece relatio

More information

Lecture 24 Floods and flood frequency

Lecture 24 Floods and flood frequency Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically,

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Census. Mean. µ = x 1 + x x n n

Census. Mean. µ = x 1 + x x n n MATH 183 Basic Statistics Dr. Neal, WKU Let! be a populatio uder cosideratio ad let X be a specific measuremet that we are aalyzig. For example,! = All U.S. households ad X = Number of childre (uder age

More information

CURRICULUM INSPIRATIONS: INNOVATIVE CURRICULUM ONLINE EXPERIENCES: TANTON TIDBITS:

CURRICULUM INSPIRATIONS:  INNOVATIVE CURRICULUM ONLINE EXPERIENCES:  TANTON TIDBITS: CURRICULUM INSPIRATIONS: wwwmaaorg/ci MATH FOR AMERICA_DC: wwwmathforamericaorg/dc INNOVATIVE CURRICULUM ONLINE EXPERIENCES: wwwgdaymathcom TANTON TIDBITS: wwwjamestatocom TANTON S TAKE ON MEAN ad VARIATION

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

1 Lesson 6: Measure of Variation

1 Lesson 6: Measure of Variation 1 Lesso 6: Measure of Variatio 1.1 The rage As we have see, there are several viable coteders for the best measure of the cetral tedecy of data. The mea, the mode ad the media each have certai advatages

More information

3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials

3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials Math 60 www.timetodare.com 3. Properties of Divisio 3.3 Zeros of Polyomials 3.4 Complex ad Ratioal Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered

More information

CHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements.

CHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements. CHAPTER 2 umerical Measures Graphical method may ot always be sufficiet for describig data. You ca use the data to calculate a set of umbers that will covey a good metal picture of the frequecy distributio.

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for

More information

NCSS Statistical Software. Tolerance Intervals

NCSS Statistical Software. Tolerance Intervals Chapter 585 Itroductio This procedure calculates oe-, ad two-, sided tolerace itervals based o either a distributio-free (oparametric) method or a method based o a ormality assumptio (parametric). A two-sided

More information

Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying.

Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying. Lecture Mai Topics: Defiitios: Statistics, Populatio, Sample, Radom Sample, Statistical Iferece Type of Data Scales of Measuremet Describig Data with Numbers Describig Data Graphically. Defiitios. Example

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Department of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution

Department of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution Departmet of Civil Egieerig-I.I.T. Delhi CEL 899: Evirometal Risk Assessmet HW5 Solutio Note: Assume missig data (if ay) ad metio the same. Q. Suppose X has a ormal distributio defied as N (mea=5, variace=

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

(# x) 2 n. (" x) 2 = 30 2 = 900. = sum. " x 2 = =174. " x. Chapter 12. Quick math overview. #(x " x ) 2 = # x 2 "

(# x) 2 n. ( x) 2 = 30 2 = 900. = sum.  x 2 = =174.  x. Chapter 12. Quick math overview. #(x  x ) 2 = # x 2 Chapter 12 Describig Distributios with Numbers Chapter 12 1 Quick math overview = sum These expressios are algebraically equivalet #(x " x ) 2 = # x 2 " (# x) 2 Examples x :{ 2,3,5,6,6,8 } " x = 2 + 3+

More information

NATIONAL SENIOR CERTIFICATE GRADE 12

NATIONAL SENIOR CERTIFICATE GRADE 12 NATIONAL SENIOR CERTIFICATE GRADE 1 MATHEMATICS P FEBRUARY/MARCH 014 MARKS: 150 TIME: 3 hours This questio paper cosists of 1 pages, 3 diagram sheets ad 1 iformatio sheet. Please tur over Mathematics/P

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

Economics Spring 2015

Economics Spring 2015 1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information

µ and π p i.e. Point Estimation x And, more generally, the population proportion is approximately equal to a sample proportion

µ and π p i.e. Point Estimation x And, more generally, the population proportion is approximately equal to a sample proportion Poit Estimatio Poit estimatio is the rather simplistic (ad obvious) process of usig the kow value of a sample statistic as a approximatio to the ukow value of a populatio parameter. So we could for example

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

STP 226 EXAMPLE EXAM #1

STP 226 EXAMPLE EXAM #1 STP 226 EXAMPLE EXAM #1 Istructor: Hoor Statemet: I have either give or received iformatio regardig this exam, ad I will ot do so util all exams have bee graded ad retured. PRINTED NAME: Siged Date: DIRECTIONS:

More information

MATH/STAT 352: Lecture 15

MATH/STAT 352: Lecture 15 MATH/STAT 352: Lecture 15 Sectios 5.2 ad 5.3. Large sample CI for a proportio ad small sample CI for a mea. 1 5.2: Cofidece Iterval for a Proportio Estimatig proportio of successes i a biomial experimet

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Topic 10: Introduction to Estimation

Topic 10: Introduction to Estimation Topic 0: Itroductio to Estimatio Jue, 0 Itroductio I the simplest possible terms, the goal of estimatio theory is to aswer the questio: What is that umber? What is the legth, the reactio rate, the fractio

More information

Estimating the Population Mean - when a sample average is calculated we can create an interval centered on this average

Estimating the Population Mean - when a sample average is calculated we can create an interval centered on this average 6. Cofidece Iterval for the Populatio Mea p58 Estimatig the Populatio Mea - whe a sample average is calculated we ca create a iterval cetered o this average x-bar - at a predetermied level of cofidece

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Final Review for MATH 3510

Final Review for MATH 3510 Fial Review for MATH 50 Calculatio 5 Give a fairly simple probability mass fuctio or probability desity fuctio of a radom variable, you should be able to compute the expected value ad variace of the variable

More information

GG313 GEOLOGICAL DATA ANALYSIS

GG313 GEOLOGICAL DATA ANALYSIS GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

September 2012 C1 Note. C1 Notes (Edexcel) Copyright - For AS, A2 notes and IGCSE / GCSE worksheets 1

September 2012 C1 Note. C1 Notes (Edexcel) Copyright   - For AS, A2 notes and IGCSE / GCSE worksheets 1 September 0 s (Edecel) Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright

More information

(6) Fundamental Sampling Distribution and Data Discription

(6) Fundamental Sampling Distribution and Data Discription 34 Stat Lecture Notes (6) Fudametal Samplig Distributio ad Data Discriptio ( Book*: Chapter 8,pg5) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye 8.1 Radom Samplig: Populatio:

More information

The Random Walk For Dummies

The Random Walk For Dummies The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli

More information

Analysis of Experimental Measurements

Analysis of Experimental Measurements Aalysis of Experimetal Measuremets Thik carefully about the process of makig a measuremet. A measuremet is a compariso betwee some ukow physical quatity ad a stadard of that physical quatity. As a example,

More information

Stat 139 Homework 7 Solutions, Fall 2015

Stat 139 Homework 7 Solutions, Fall 2015 Stat 139 Homework 7 Solutios, Fall 2015 Problem 1. I class we leared that the classical simple liear regressio model assumes the followig distributio of resposes: Y i = β 0 + β 1 X i + ɛ i, i = 1,...,,

More information

AP Statistics Review Ch. 8

AP Statistics Review Ch. 8 AP Statistics Review Ch. 8 Name 1. Each figure below displays the samplig distributio of a statistic used to estimate a parameter. The true value of the populatio parameter is marked o each samplig distributio.

More information

Polynomial Functions and Their Graphs

Polynomial Functions and Their Graphs Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Zeros of Polynomials

Zeros of Polynomials Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree

More information

Chapter 23: Inferences About Means

Chapter 23: Inferences About Means Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

ANALYSIS OF EXPERIMENTAL ERRORS

ANALYSIS OF EXPERIMENTAL ERRORS ANALYSIS OF EXPERIMENTAL ERRORS All physical measuremets ecoutered i the verificatio of physics theories ad cocepts are subject to ucertaities that deped o the measurig istrumets used ad the coditios uder

More information

( ) = p and P( i = b) = q.

( ) = p and P( i = b) = q. MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of

More information

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Introducing Sample Proportions

Introducing Sample Proportions Itroducig Sample Proportios Probability ad statistics Aswers & Notes TI-Nspire Ivestigatio Studet 60 mi 7 8 9 0 Itroductio A 00 survey of attitudes to climate chage, coducted i Australia by the CSIRO,

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008 Chapter 6 Part 5 Cofidece Itervals t distributio chi square distributio October 23, 2008 The will be o help sessio o Moday, October 27. Goal: To clearly uderstad the lik betwee probability ad cofidece

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Chapter 4. Fourier Series

Chapter 4. Fourier Series Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

Analytic Continuation

Analytic Continuation Aalytic Cotiuatio The stadard example of this is give by Example Let h (z) = 1 + z + z 2 + z 3 +... kow to coverge oly for z < 1. I fact h (z) = 1/ (1 z) for such z. Yet H (z) = 1/ (1 z) is defied for

More information

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1 October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 1 Populatio parameters ad Sample Statistics October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 2 Ifereces

More information

MA131 - Analysis 1. Workbook 2 Sequences I

MA131 - Analysis 1. Workbook 2 Sequences I MA3 - Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n, CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe

More information

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis Recursive Algorithms Recurreces Computer Sciece & Egieerig 35: Discrete Mathematics Christopher M Bourke cbourke@cseuledu A recursive algorithm is oe i which objects are defied i terms of other objects

More information

2: Describing Data with Numerical Measures

2: Describing Data with Numerical Measures : Describig Data with Numerical Measures. a The dotplot show below plots the five measuremets alog the horizotal axis. Sice there are two s, the correspodig dots are placed oe above the other. The approximate

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information