Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2


1. Descriptive Statistics

By the term descriptive statistics we will mean the tools used for quantitative description of the properties of a sample (a given set of information or data, coming from a larger population). These tools are purely arithmetical (they do not use methods based on the theory of probability), and they are aimed at summarizing or visualizing the properties of the data. The preferred tools and measures depend on the characteristics of the variables that are to be described, and will vary from case to case.

Variables studied with the use of statistical tools may be divided into two main groups: measurable and categorical. The latter group consists of variables which take on values from a limited set of values, representing categories (such as eye color, level of education, sex etc.). The first group consists of variables which take on meaningful, numerical values (that can be measured, such as height, weight etc.). (Numbers are also possible descriptions of the classes of categorical variables; for example, one could code the possible outcomes of a coin toss, heads or tails, with two distinct numbers. In this case, however, the values assigned to the two categories are not meaningful and could be changed without loss of our understanding of the phenomenon.)

Measurable variables can be further decomposed into continuous variables (when the value may be any number from a range of real numbers if measured with infinite precision, for example velocity) and count variables (when the possible values are discrete). Typical cases of count variables are the number of children or the number of students enrolled in a class. Some variables which at first sight resemble continuous variables are, strictly speaking, also discrete: for example, wages can be measured only up to 1 currency unit, and not with infinite precision. Such variables, called quasi-continuous variables, are treated as continuous in all practical applications.

1.1. Visualizing the data

We will start our presentation of descriptive statistical tools with count variables, to which one can apply basically all tools. Assume that we look at the grades from a Mathematical Statistics course, recorded in no particular order: ten values, each equal to 2, 3, 3.5, 4, 4.5 or 5. One would need to spend some time in order to be able to say something general about the grades for this course, and we only have 10 students.

It would help a little bit if we arranged the numbers in order: 2, 2, 2, 3, 3, 3, 3.5, 4, 4.5, 5. But even now, and all the more so in the case of larger data sets, it is evident that just enumerating the values will not be enough to determine at first glance whether, say, it is hard or easy to pass this course. What we can do to better comprehend the properties of the variable under study is to group it. In the case of count variables (and also in the case of categorical variables), the easiest way of grouping is grouping by the precise value of the variable. In the case of student grades, we have 6 intuitive groups, corresponding to the following possible outcomes: 2, 3, 3.5, 4, 4.5 and 5. Therefore, we could summarize the course outcome with the use of the following table:

Grade   Number of students   Frequency
2       3                    30%
3       3                    30%
3.5     1                    10%
4       1                    10%
4.5     1                    10%
5       1                    10%
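The grouping by precise value described above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is our own choice, and the list of grades is the sorted series implied by the frequency table of the example.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, relative frequency) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, c / n) for v, c in sorted(counts.items())]


# Sorted grades of the 10 students from the example.
grades = [2, 2, 2, 3, 3, 3, 3.5, 4, 4.5, 5]

for value, count, freq in frequency_table(grades):
    # One line per group: grade, number of students, frequency in percent.
    print(f"{value:>4}  {count:>2}  {freq:.0%}")
```

The same function works unchanged for categorical data, for example a list of "fail" and "pass" labels, because it only relies on the values being sortable and hashable.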

The properties of the data under study become more apparent now: we see that 30% of the students fail, and that another 30% of the students obtain the lowest possible passing grade. We could also visualize the proportions of the particular outcomes graphically. The most commonly used graphs in such cases are the bar chart (with counts, such as the graph with red bars, or with frequencies, such as the graph with blue bars) and the pie chart.

[Figures: bar chart of counts (red), bar chart of frequencies (blue) and pie chart of the grades]

Note that we could have also grouped the data differently, for example as

Outcome   Number of students   Frequency
fail      3                    30%
pass      7                    70%

if, say, we were only interested in the failure rate for the course. Note also that this latter representation is also a grouping for the following series of categorical data: fail, fail, fail, pass, pass, pass, pass, pass, pass, pass. In the case of categorical data, we can also visualize it graphically by means of bar charts (for numbers or frequencies) and pie charts.

[Figures: bar charts of counts and of frequencies, and pie chart, for the fail/pass outcomes]

Let us now look at a different example: a continuous variable. Let us assume that we analyze the surface area of 100 apartments available for sale in a given vicinity (in square meters):

.45,.1, 4.6, 5.78, 7.79, 8.54, 8.91, 8.96, 9.5, 9.67, 9.8, 41.45, 41.55, 4.7, 4.4, 4.45, 44.5, 44.5, 44.7, 44.8, 44.9, 45.1, 45.9, 46.5, 47.65, 48.1, 48.55, 48.9, 49, 49.4, 49.55, 49.65, 49.7, 49.9, 5.9, 51.4, 51.5, 51.65, 51.7, 51.8, 51.98, 5, 5.1, 5., 5.65, 5.89, 5.9, 54, 54.1, 55., 55., 55.56, 55.6, 56, 56.7, 56.8, 56.9, 56.95, 57.1, 57.45, 57.7, 57.9, 58, 58.5, 58.67, 58.8, 59., 6.4, 6.7, 64., 64., 64.6, 65, 66.9, 66.78, 67.8, 68.9, 69, 69.5, 7., 76.8, 77.1, 77.8, 78.9, 79.5, 8.7, 8.4, 84.5, 84.9, 85, 86, 89.1, 89.6, 9, 96.7, 98.78, 1, 17.9, 11.7,

This time, a sensible visual analysis of the raw series is impossible to conduct. Constructing a simple frequency table for single values would not lead to a better understanding of the phenomenon, as there are no repeated values in the series.
In this case, in order to better see the properties, we need to group the series into class intervals. The choice of intervals for grouping is often not an easy one. First of all, it is optimal if the intervals are meaningful: if the government subsidizes the acquisition of apartments not exceeding 75 square meters, it would be preferable that the value of 75 be a bound of an interval, etc. Second, it is better if the interval ranges are of similar length (say, 10 m² each). Third, in some cases it is better (for computational reasons) if the frequencies in the particular classes are balanced (i.e. we should avoid groupings such that one class has 95 elements and the other has 5 elements, etc.). We should also avoid groupings which are too detailed or not detailed enough. And, last but not least, both for computational ease and visual clarity, it is preferable that the classes have neat values.

In the case of our apartment size example, it seems reasonable to group the series into intervals of width equal to 10, starting from 30. In this case, all intervals have the same length and all have neat centers (or class marks).

Interval    Class mark     Number of apartments   Frequency   Cumulative number   Cumulative frequency
            $\bar{c}_i$    $n_i$                  $f_i$       $cn_i$              $cf_i$
(30,40]     35             11                     0.11        11                  0.11
(40,50]     45             23                     0.23        34                  0.34
(50,60]     55             33                     0.33        67                  0.67
(60,70]     65             12                     0.12        79                  0.79
(70,80]     75             6                      0.06        85                  0.85
(80,90]     85             8                      0.08        93                  0.93
(90,100]    95             3                      0.03        96                  0.96
(100,110]   105            2                      0.02        98                  0.98
(110,120]   115            2                      0.02        100                 1
Total                      100                    1

Based on the grouping, we can now visualize the data with the use of a histogram (the difference between a histogram and the bar chart lies in the horizontal axis: the bars are adjacent to each other, unlike in the previous case, where the categories were separate). One can construct both a histogram of numbers (counts) and a histogram of frequencies.

[Figures: histogram of counts (vertical axis: Number) and histogram of frequencies (vertical axis: Frequency) of the apartment surface areas]

Note that in the case of continuous variables it is preferable to use histograms instead of simple bar charts, and pie charts are usually not a good choice (unless we categorize the variable). One can also visualize the distribution by means of a cumulative frequency histogram or the empirical CDF.

1.2. Characteristics of the data

In the case of categorical variables, there is not much more that we can do in terms of descriptive statistics to visualize the data. In the case of measurable variables, we have a whole array of arithmetical tools that can be used to describe the properties of the data set. There are two basic distinctions for the characteristics describing the studied variable. The first distinction is based on the feature of the distribution that we want to describe: whether it is the overall magnitude (how large, on average, the values of the variable are), the variability, the asymmetry etc. The second distinction is based on the values that we will be using for the description: whether we will be using different moments of the distribution (in which case we will be talking about classical measures) or measures of position (such as the minimum, maximum, median etc.).
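Returning briefly to the class-interval table of the apartment example: the grouping into right-closed intervals and the cumulative columns can be sketched as follows. The function names `group_into_intervals` and `interval_table` are our own, and the short `sample` list is a made-up illustration, not the lecture's data; only the list of class counts is taken from the table.

```python
import math
from itertools import accumulate


def group_into_intervals(values, start, width, k):
    """Count values in k right-closed intervals (start + j*width, start + (j+1)*width]."""
    counts = [0] * k
    for x in values:
        # ceil() assigns a value lying exactly on a bound to the interval it closes.
        j = math.ceil((x - start) / width) - 1
        if not 0 <= j < k:
            raise ValueError(f"{x} lies outside the grouping range")
        counts[j] += 1
    return counts


def interval_table(counts, start, width):
    """Build rows with class mark, count, frequency and cumulative columns."""
    n = sum(counts)
    rows = []
    for j, (n_j, cn_j) in enumerate(zip(counts, accumulate(counts))):
        lower = start + j * width
        rows.append({
            "interval": (lower, lower + width),
            "mark": lower + width / 2,
            "count": n_j,
            "freq": n_j / n,
            "cum_count": cn_j,
            "cum_freq": cn_j / n,
        })
    return rows


# Hypothetical mini-sample (NOT the lecture's data), used only to exercise the grouping.
sample = [31.5, 40.0, 40.1, 47.2, 52.0, 58.9, 60.0, 75.3]
print(group_into_intervals(sample, start=30, width=10, k=5))  # [2, 2, 3, 0, 1]

# Class counts taken from the table of apartment surface areas.
apartment_counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]
for row in interval_table(apartment_counts, start=30, width=10):
    print(row["interval"], row["mark"], row["count"], row["cum_count"])
```

Note that the values 40.0 and 60.0 fall into the intervals they close, (30,40] and (50,60], which matches the right-closed convention used in the table.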

1.2.1. Measures of central tendency

Measures of central tendency are those which tell us where, on average, the middle of the distribution is located. The basic measures are: the arithmetic mean (the average, a classical measure) and the median and the mode (positional measures). Other measures of position (not necessarily describing the middle of the distribution) include other quantiles (such as quartiles, deciles, percentiles etc.).

If $X_1, X_2, \ldots, X_n$ are the sample values of the variable under study (for example, the raw data for the surface areas of apartments), then the arithmetic mean can be calculated as
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
The average value of the surface area of apartments, for the data presented above, is obtained in this way as $\bar{X} = \frac{1}{100} \sum_{i=1}^{100} X_i$.

If we are dealing with grouped data (for example, like in the case of student grades above), we can simplify the calculations from the above formula by avoiding summing identical values and using multiplication instead:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{k} X_i n_i,$$
where $k$ is the number of groups into which we have divided our data, $X_i$ are the values in the groups, and $n_i$ are the counts of these groups. In our example of student grades, we could calculate the average as
$$\bar{X} = \frac{1}{10} \left( 2 \cdot 3 + 3 \cdot 3 + 3.5 \cdot 1 + 4 \cdot 1 + 4.5 \cdot 1 + 5 \cdot 1 \right) = 3.2.$$

In both of these examples, the arithmetic mean was calculated precisely. In some cases, however, we may be faced with a situation where we do not have exact data at our disposal, but only some approximations, as is the case if we are provided with data aggregated into class intervals. In such situations, we will not be able to calculate the true value of the mean, but we will be able to calculate an approximation of the average. This is achieved by treating all observations from a given class as being equal to the middle of the class interval (the so-called class mark, see the headers in the table above), and applying the formula for grouped data, i.e.
$$\bar{X} \approx \frac{1}{n} \sum_{i=1}^{k} \bar{c}_i n_i.$$
In the apartment surface area example, the approximation of the mean, calculated based on class interval data, would be equal to
$$\bar{X} \approx \frac{1}{100} \left( 35 \cdot 11 + 45 \cdot 23 + 55 \cdot 33 + 65 \cdot 12 + 75 \cdot 6 + 85 \cdot 8 + 95 \cdot 3 + 105 \cdot 2 + 115 \cdot 2 \right) = 58.7.$$
This result is different from the exact value calculated from the raw data. Obviously, the narrower the class intervals used for grouping the data, the less information is lost and the more accurate the approximation is. If raw data are available, they should always be used for calculating characteristics in order to avoid precision losses.

The mean is a good measure to approximate the center of the distribution governing the data, provided that the distribution has an expected value (and, as we know from probability calculus, not all distributions do). Also, if there are outliers (very high or very low values, or erroneous observations) in the data, the average will be affected by these observations. A measure of central tendency which does not have these flaws is the median, or the middle observation: (any) number such that at least half of the observations are less than or equal to it and at least half of the observations are greater than or equal to it.
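The grouped-data formula for the mean can be checked numerically with a short sketch. The helper name `grouped_mean` is our own; the grade counts and the class marks with their counts are taken from the two tables of the examples.

```python
def grouped_mean(values, counts):
    """Mean of grouped data: sum(value * count) / sum(count)."""
    n = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / n


# Student grades: the result is exact, because every observation in a group
# really is equal to the value of the group.
grade_values = [2, 3, 3.5, 4, 4.5, 5]
grade_counts = [3, 3, 1, 1, 1, 1]
print(grouped_mean(grade_values, grade_counts))  # 3.2

# Apartment areas: the result is only an approximation, because each
# observation is replaced by the class mark of its interval.
class_marks = [35, 45, 55, 65, 75, 85, 95, 105, 115]
class_counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]
print(grouped_mean(class_marks, class_counts))  # 58.7
```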

In order to calculate the median (as well as any other measure based on rank), we will need to rearrange our observations in ascending order. We will adopt the following notation: $X_{i:n}$ will be the $i$-th smallest value of the $n$-element sample (the $i$-th order statistic). In this notation, $X_{1:n}$ is the smallest value (minimum) in the sample, and $X_{n:n}$ is the largest value in the sample (maximum). As for the median, the calculations will depend on whether the sample size is odd or even. If it is odd, then there exists a single middle observation; if it is even, there exist two observations in the middle, and we will take the average of these two as the median value. Therefore,
$$\mathrm{Med} = \begin{cases} X_{\frac{n+1}{2}:n} & \text{if } n \text{ is odd}, \\ \frac{1}{2} \left( X_{\frac{n}{2}:n} + X_{\frac{n}{2}+1:n} \right) & \text{if } n \text{ is even}. \end{cases}$$

Going back to our examples, we can see that:

In the case of the apartment surface area raw data, we have $n = 100$, which is even, so the median will be the average of the 50th and 51st observations: $\mathrm{Med} = \frac{1}{2} \left( X_{50:100} + X_{51:100} \right)$, a value slightly above 55.

In the case of grades (grouped data), we need to find the class with the fifth and sixth observations; in our case, it is going to be the class of the grade 3; therefore, $\mathrm{Med} = \frac{1}{2} (3 + 3) = 3$.

In the case of grouped class interval data, the situation becomes more complicated. We will not be able to provide a specific value, but only the range into which the median should fall (this is the interval where the cumulative frequency reaches 0.5 for the first time). If we are interested in a single value, rather than an interval, we will provide an approximation. In order to derive the formula, please note the following. If we know how many observations there are in the sample in the classes before the class of the median, we will know how many additional observations from the median class we should take in order to reach the middle observation.
Then, knowing how many observations are actually in the median class, and assuming observations are uniformly spaced in the class they fall into, we should have that the median value is proportionally as far into the class interval as the ratio of the number of observations we need to reach it to the number of observations we have in the class. Therefore, we will use the following approximation:
$$\mathrm{Med} \approx c_L + b \cdot \frac{\frac{n}{2} - \sum_{i=1}^{M-1} n_i}{n_M},$$
where $M$ is the number of the class interval of the median, $c_L$ is the lower bound of the class interval of the median, $b$ is the length of the class interval of the median and $n_i$ is the number of observations in the $i$-th class interval.

In our surface area example, we would have the following: the class in which the 0.5 threshold is reached is the 3rd class, i.e. the (50,60] class. This is going to be the class interval of the median (i.e., $M = 3$). The lower bound of this class is 50, so $c_L = 50$. In the classes before the third class, we have $11 + 23 = 34$ observations, so we need the $\frac{100}{2} - 34 = 16$-th observation from the 3rd class (this is going to be our median). Since the length of the class is $b = 10$, and there are $n_M = 33$ observations overall in this class, the median can be approximated as
$$\mathrm{Med} \approx 50 + 10 \cdot \frac{16}{33} \approx 54.85.$$
Please note that this number, again, differs from the true value (slightly above 55), which means that the approximate formula should be used only in cases where raw data are not available.

In some cases, we may be interested in describing the middle of the distribution with the most frequent observation in the sample. This is not always possible: there are distributions which do not have the property of a single most frequent value (for example, if the histogram has several "peaks"). Therefore, the most frequent value, called the mode, is usually only calculated if the data come from a distribution with a "standard" shape, i.e. when the histogram has a single local maximum. The mode is then equal to this single maximum (the most frequent observation in the sample). In the student grades example, there are two equally frequent groups (the grades 2 and 3). We would not define a mode in this case.

Please note that for continuous variables, it is not possible to calculate the sample mode unless we group the data, because for continuous distributions we will not see two observations which are equal to each other, and thus each observation has the same frequency. But in such cases, it is possible to approximate the mode if we group the data first. If we are interested in calculating the mode, we should make sure that the intervals (at least in the middle of the distribution) have equal lengths (otherwise, the results of the calculations would be biased; please note that wider intervals will naturally have more observations). Once we have intervals of equal length, we can calculate the mode using the following formula:
$$\mathrm{Mo} \approx c_L + b \cdot \frac{n_{Mo} - n_{Mo-1}}{(n_{Mo} - n_{Mo-1}) + (n_{Mo} - n_{Mo+1})},$$
where, similarly to the formula for the median, we take $c_L$, the lower bound of the class of the mode, and add to it the appropriate fraction of the length of the class of the mode ($b$). In this case, the appropriate fraction is calculated as the ratio of the difference between the counts of the class of the mode ($n_{Mo}$) and the class adjacent to the left (with a count of $n_{Mo-1}$), to the sum of the differences between the count of the class of the mode and the counts of the classes adjacent to the left and to the right (the latter has a count equal to $n_{Mo+1}$). This means that if the classes adjacent to the mode are equally less frequent than the class of the mode, we should take the midpoint of the interval as the approximation of the mode. If the distribution is shifted to the left, i.e. smaller observations are more frequent, then the mode should not be in the middle of the interval but more to the left; if the distribution is shifted to the right, i.e. larger observations are more frequent, then the mode should be shifted to the right.

In the case of our surface area example, all class intervals have equal length ($b = 10$), so we can calculate the mode. The class with the largest number of observations is the third class, so we have $c_L = 50$, $n_{Mo} = n_3 = 33$, $n_{Mo-1} = n_2 = 23$, $n_{Mo+1} = n_4 = 12$, and
$$\mathrm{Mo} \approx 50 + 10 \cdot \frac{33 - 23}{(33 - 23) + (33 - 12)} = 50 + \frac{100}{31} \approx 53.23.$$

1.2.2. Other measures of location

Additional characteristics, which may be calculated in order to show where the values of the distribution (not necessarily the center of the distribution) are located, include quantiles other than the median. For example, if we calculate the first and the third quartiles (i.e., values such that they divide the sample into subsamples counting at least $\frac{1}{4}$ and $\frac{3}{4}$ of the observations), we will know in what range the middle 50% of the observations are to be found. In order to calculate the quartiles, we will use the same method as for calculating the median; the only difference is that we will be looking for observations which are ranked not (approximately) $\frac{n}{2}$ out of $n$, but (approximately) $\frac{n}{4}$ and $\frac{3n}{4}$ out of $n$.

In particular, in cases where raw data are available, the quartiles will be calculated using the general formula for quantiles of rank $p$, applied to $p = \frac{1}{4}$ and $p = \frac{3}{4}$. This general formula states that
$$Q_p = \begin{cases} X_{\lfloor np \rfloor + 1:n} & \text{if } np \notin \mathbb{Z}, \\ \frac{1}{2} \left( X_{np:n} + X_{np+1:n} \right) & \text{if } np \in \mathbb{Z}. \end{cases}$$
For our apartment surface area example, $\frac{n}{4}$ and $\frac{3n}{4}$ are integer values, so we would take the average of the observations numbered 25 and 26 as $Q_1$, and the average of the observations ranked 75 and 76 as $Q_3$, i.e. $Q_1 = \frac{1}{2} \left( X_{25:100} + X_{26:100} \right)$ and $Q_3 = \frac{1}{2} \left( X_{75:100} + X_{76:100} \right)$.
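The general quantile formula for raw data and the interpolation formula for the mode can be sketched as below. The function names `sample_quantile` and `grouped_mode` are our own. Two choices are assumptions of this sketch rather than part of the lecture: the rank `p` is passed as a `Fraction`, so that the test "is np an integer" is not spoiled by floating-point rounding, and a missing neighbour of the modal class is treated as a class with count 0.

```python
import math
from fractions import Fraction


def sample_quantile(sample, p):
    """Quantile of rank p (a Fraction with 0 < p < 1) from order statistics.

    Returns X_(floor(np)+1) if np is not an integer, and the average of
    X_(np) and X_(np+1) otherwise (1-based order statistics).
    """
    xs = sorted(sample)
    k = len(xs) * Fraction(p)
    if k.denominator == 1:  # np is an integer
        i = int(k)
        return (xs[i - 1] + xs[i]) / 2
    return xs[math.floor(k)]  # 0-based index floor(np) is the (floor(np)+1)-th value


def grouped_mode(lower_bounds, width, counts):
    """Interpolated mode for equal-width classes with a single modal class."""
    m = max(range(len(counts)), key=counts.__getitem__)
    # Assumption of this sketch: a neighbour that does not exist has count 0.
    left = counts[m - 1] if m > 0 else 0
    right = counts[m + 1] if m + 1 < len(counts) else 0
    d_left, d_right = counts[m] - left, counts[m] - right
    return lower_bounds[m] + width * d_left / (d_left + d_right)


# Median of the ten grades: n*p = 5 is an integer, so two values are averaged.
grades = [2, 2, 2, 3, 3, 3, 3.5, 4, 4.5, 5]
print(sample_quantile(grades, Fraction(1, 2)))  # 3.0

# Mode of the apartment areas from the class-interval table.
lower_bounds = [30, 40, 50, 60, 70, 80, 90, 100, 110]
class_counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]
print(round(grouped_mode(lower_bounds, 10, class_counts), 2))  # 53.23
```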

For grouped class interval data, we will use the same mechanism of determining the approximate value from the appropriate interval that we used for the median, albeit with adjusted counts, i.e.:
$$Q_1 \approx c_L + b \cdot \frac{\frac{n}{4} - \sum_{i=1}^{M-1} n_i}{n_M} \qquad \text{and} \qquad Q_3 \approx c_L + b \cdot \frac{\frac{3n}{4} - \sum_{i=1}^{M-1} n_i}{n_M},$$
where the values of $M$, $c_L$, $b$ and $n_i$ are defined analogously as in the case of the median (but for the first and third quartiles, respectively). For example, if we wanted to calculate the first and third quartiles of the distribution of apartment surface areas, we would search for the intervals in which the cumulative frequency reaches 0.25 and 0.75, respectively; we would therefore have that the first quartile is located in the interval (40,50], while the third quartile is located in the interval (60,70], and we would have:
$$Q_1 \approx 40 + 10 \cdot \frac{25 - 11}{23} \approx 46.09 \qquad \text{and} \qquad Q_3 \approx 60 + 10 \cdot \frac{75 - 67}{12} \approx 66.67.$$
The two values calculated on the basis of grouped data are, again, only approximations of the true values (calculated from raw data).

1.2.3. Measures of variability

Once we know where the values of the variable under study are located (more or less), we may wish to determine whether they are concentrated around the center of the distribution, or dispersed. In order to do so, we will use measures of variability. These, too, can be calculated based on moments of the empirical distribution, or on order statistics. We will start with the latter group.

The simplest measure of the variability of a random variable is the range, i.e. the difference between the largest and the smallest values observed in the data. In the case of grouped class interval data, we take the difference between the upper bound of the highest interval and the lower bound of the lowest interval. This measure, although simple, has many drawbacks; the most important one is that it is very susceptible to outliers (atypical observations). Therefore, in many cases, instead of this range, we look at the spread between the first and the third quartiles:
$$IQR = Q_3 - Q_1,$$
i.e. the interquartile range (also called the midspread or middle fifty). This measure is much more robust, as it covers the middle 50% of the observations only.
Based on this measure, which depends on the scale of the variable under study, we can calculate coefficients of variation:
$$V_Q = \frac{Q}{\mathrm{Med}}, \qquad V_{Q_1 Q_3} = \frac{IQR}{Q_3 + Q_1}$$
(where $Q = IQR/2$ is the quartile deviation). These coefficients allow us to compare the dispersion of different variables.

Examples: In our student grade example, calculating the range does not tell us much: it is equal to $5 - 2 = 3$ and actually does not depend on the distribution of the grades (i.e., on whether the subject is easy or hard to pass). In the surface area example, the range calculated for raw data is equal to 86.45, while for grouped class interval data it is equal to $120 - 30 = 90$; the latter is obviously always biased upwards (the wider the intervals, the more so).
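The interpolated quartiles and the positional measures of spread defined above can be computed from the class-interval table with a small sketch. The function name `grouped_quantile` is our own, and the code assumes equal-width classes given by their lower bounds; the counts are those of the apartment table.

```python
def grouped_quantile(lower_bounds, width, counts, p):
    """Interpolated quantile of rank p (0 < p <= 1) from class-interval data."""
    n = sum(counts)
    target = p * n  # rank of the observation we are looking for
    before = 0      # number of observations in the classes already passed
    for lower, n_j in zip(lower_bounds, counts):
        # First class in which the cumulative count reaches the target rank.
        if n_j > 0 and before + n_j >= target:
            return lower + width * (target - before) / n_j
        before += n_j
    raise ValueError("p must lie in (0, 1]")


lower_bounds = [30, 40, 50, 60, 70, 80, 90, 100, 110]
class_counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]

q1 = grouped_quantile(lower_bounds, 10, class_counts, 0.25)
med = grouped_quantile(lower_bounds, 10, class_counts, 0.5)
q3 = grouped_quantile(lower_bounds, 10, class_counts, 0.75)

iqr = q3 - q1                   # interquartile range
quartile_deviation = iqr / 2    # Q in the notation of the text
v_q = quartile_deviation / med  # V_Q
v_q1q3 = iqr / (q3 + q1)        # V_{Q1 Q3}

print(round(q1, 2), round(q3, 2), round(iqr, 2))  # 46.09 66.67 20.58
```

Both coefficients are unit-free ratios, which is what makes them comparable across variables measured on different scales.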

On the other hand, if we calculate the interquartile range (here from the quartiles approximated above on the basis of grouped data), it is equal to $66.67 - 46.09 = 20.58$. This value is telling, in that it shows that the middle 50% of the observations are quite concentrated, i.e. half of the surface areas of apartments are relatively close to the median value.

Turning to the classical measures of dispersion, we will start with the variance, which, for raw data, is calculated as
$$\hat{S}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - (\bar{X})^2,$$
for grouped data as
$$\hat{S}^2 = \frac{1}{n} \sum_{i=1}^{k} n_i (X_i - \bar{X})^2 = \frac{1}{n} \sum_{i=1}^{k} n_i X_i^2 - (\bar{X})^2,$$
and for grouped class interval data is approximated as:
$$\hat{S}^2 \approx \frac{1}{n} \sum_{i=1}^{k} n_i (\bar{c}_i - \bar{X})^2 = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{c}_i^2 - (\bar{X})^2.$$

The last formula gives unbiased results if the variable is distributed uniformly, i.e., if we expect that the observations classified in intervals are distributed symmetrically around the centers of these intervals. If this assumption is not true (as it is, for example, if the data come from a normal distribution, where values further from the center of the distribution are less common, and we therefore expect values in intervals to be located more to one side), the approximate formula for the variance is going to systematically overestimate the variability in the data. In the case of normal distributions, the following formula for a correction (the so-called Sheppard's correction) has been proposed:
$$S^2 = \hat{S}^2 - \frac{1}{12} \cdot \frac{1}{n} \sum_{i=1}^{k} n_i (c_i - c_{i-1})^2,$$
where the $c_i$ denote the bounds of the class intervals. If all the intervals are of equal length, the value of the correction reduces to $\frac{c^2}{12}$, where $c$ is the length of the interval.

Now, if we calculated the variance for the surface area example based on raw data, using the value of the mean also calculated for raw data, we would obtain the true value of the variance of the sample.
If, on the other hand, we were to use the approximate formula for grouped class interval data, and assuming that the mean was also calculated based on grouped data (and equal to 58.7), we would have that the approximation of the variance is equal to
$$\hat{S}^2 \approx \frac{1}{100} \Big( (35 - 58.7)^2 \cdot 11 + (45 - 58.7)^2 \cdot 23 + (55 - 58.7)^2 \cdot 33 + (65 - 58.7)^2 \cdot 12 + (75 - 58.7)^2 \cdot 6 + (85 - 58.7)^2 \cdot 8 + (95 - 58.7)^2 \cdot 3 + (105 - 58.7)^2 \cdot 2 + (115 - 58.7)^2 \cdot 2 \Big) \approx 331.31.$$
Please note that this approximation is already smaller than the true value of the variance, and therefore subtracting Sheppard's correction (equal to $\frac{10^2}{12} \approx 8.33$) would just introduce additional error. This is because although the distribution of surface areas is not uniform, it is not normal, either; also, the sample size may be too small to use the correction (the errors resulting from the small sample size may be larger than the errors arising from class grouping).

Please also note that the variance is a measure expressed in squares of the units of the variable under study. If we wished to have a measure of variability expressed in the same units, we would take the square root of the variance and calculate the standard deviation:
$$\hat{S} = \sqrt{\hat{S}^2}, \quad \text{or} \quad S = \sqrt{S^2}.$$
In the surface area example, the grouped-data approximation of the variance gives $\hat{S} \approx \sqrt{331.31} \approx 18.20$.
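The variance approximation for class-interval data, Sheppard's correction for equal-width classes and the standard deviation can be reproduced with a short sketch. The helper name `grouped_variance` is our own; the class marks and counts come from the apartment table, and the raw-data variance is not reproduced here.

```python
import math


def grouped_variance(values, counts):
    """Variance of grouped data: (1/n) * sum(n_i * (value_i - mean)^2)."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    return sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / n


class_marks = [35, 45, 55, 65, 75, 85, 95, 105, 115]
class_counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]
width = 10

var_hat = grouped_variance(class_marks, class_counts)
sheppard = width ** 2 / 12  # correction term c^2 / 12 for equal-width classes

print(round(var_hat, 2))             # 331.31
print(round(var_hat - sheppard, 2))  # 322.98
print(round(math.sqrt(var_hat), 2))  # 18.2
```

The second printed value shows what the corrected variance would be; as discussed above, applying the correction is not advisable for this particular data set.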

Now, if we were interested in comparing the dispersion of different variables for the same population, or of the same variable for different populations, we would need a measure of variability which would be invariant to the scaling (and units) of the variables. We can construct such a measure, called the coefficient of variation, by taking the ratio of the standard deviation and the mean of the variable under study:
$$V_S = \frac{\hat{S}}{\bar{X}}.$$

1.2.4. Measures of asymmetry


Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

µ and π p i.e. Point Estimation x And, more generally, the population proportion is approximately equal to a sample proportion

µ and π p i.e. Point Estimation x And, more generally, the population proportion is approximately equal to a sample proportion Poit Estimatio Poit estimatio is the rather simplistic (ad obvious) process of usig the kow value of a sample statistic as a approximatio to the ukow value of a populatio parameter. So we could for example

More information

GG313 GEOLOGICAL DATA ANALYSIS

GG313 GEOLOGICAL DATA ANALYSIS GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Measures of Variation

Measures of Variation Chapter : Measures of Variatio from Statistical Aalysis i the Behavioral Scieces by James Raymodo Secod Editio 97814669676 01 Copyright Property of Kedall Hut Publishig CHAPTER Measures of Variatio Key

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Lecture 24 Floods and flood frequency

Lecture 24 Floods and flood frequency Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically,

More information

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals Chapter 6 Studet Lecture Notes 6-1 Busiess Statistics: A Decisio-Makig Approach 6 th Editio Chapter 6 Itroductio to Samplig Distributios Chap 6-1 Chapter Goals After completig this chapter, you should

More information

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS 8.1 Radom Samplig The basic idea of the statistical iferece is that we are allowed to draw ifereces or coclusios about a populatio based

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE Part 3: Summary of CI for µ Cofidece Iterval for a Populatio Proportio p Sectio 8-4 Summary for creatig a 100(1-α)% CI for µ: Whe σ 2 is kow ad paret

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234 STA 291 Lecture 19 Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Locatio CB 234 STA 291 - Lecture 19 1 Exam II Covers Chapter 9 10.1; 10.2; 10.3; 10.4; 10.6

More information

Lecture 18: Sampling distributions

Lecture 18: Sampling distributions Lecture 18: Samplig distributios I may applicatios, the populatio is oe or several ormal distributios (or approximately). We ow study properties of some importat statistics based o a radom sample from

More information

(6) Fundamental Sampling Distribution and Data Discription

(6) Fundamental Sampling Distribution and Data Discription 34 Stat Lecture Notes (6) Fudametal Samplig Distributio ad Data Discriptio ( Book*: Chapter 8,pg5) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye 8.1 Radom Samplig: Populatio:

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Probability and statistics: basic terms

Probability and statistics: basic terms Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample

More information

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008 Chapter 6 Part 5 Cofidece Itervals t distributio chi square distributio October 23, 2008 The will be o help sessio o Moday, October 27. Goal: To clearly uderstad the lik betwee probability ad cofidece

More information

ANALYSIS OF EXPERIMENTAL ERRORS

ANALYSIS OF EXPERIMENTAL ERRORS ANALYSIS OF EXPERIMENTAL ERRORS All physical measuremets ecoutered i the verificatio of physics theories ad cocepts are subject to ucertaities that deped o the measurig istrumets used ad the coditios uder

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

2: Describing Data with Numerical Measures

2: Describing Data with Numerical Measures : Describig Data with Numerical Measures. a The dotplot show below plots the five measuremets alog the horizotal axis. Sice there are two s, the correspodig dots are placed oe above the other. The approximate

More information

Lecture 5. Random variable and distribution of probability

Lecture 5. Random variable and distribution of probability Itroductio to theory of probability ad statistics Lecture 5. Radom variable ad distributio of probability prof. dr hab.iż. Katarzya Zarzewsa Katedra Eletroii, AGH e-mail: za@agh.edu.pl http://home.agh.edu.pl/~za

More information

Homework 5 Solutions

Homework 5 Solutions Homework 5 Solutios p329 # 12 No. To estimate the chace you eed the expected value ad stadard error. To do get the expected value you eed the average of the box ad to get the stadard error you eed the

More information

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual

More information

Analysis of Experimental Data

Analysis of Experimental Data Aalysis of Experimetal Data 6544597.0479 ± 0.000005 g Quatitative Ucertaity Accuracy vs. Precisio Whe we make a measuremet i the laboratory, we eed to kow how good it is. We wat our measuremets to be both

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

Math 10A final exam, December 16, 2016

Math 10A final exam, December 16, 2016 Please put away all books, calculators, cell phoes ad other devices. You may cosult a sigle two-sided sheet of otes. Please write carefully ad clearly, USING WORDS (ot just symbols). Remember that the

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n, CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe

More information

Chapter 18 Summary Sampling Distribution Models

Chapter 18 Summary Sampling Distribution Models Uit 5 Itroductio to Iferece Chapter 18 Summary Samplig Distributio Models What have we leared? Sample proportios ad meas will vary from sample to sample that s samplig error (samplig variability). Samplig

More information

Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech Fial Review Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milto Stewart School of Idustrial Systems & Egieerig Georgia Tech 1 Radom samplig model radom samples populatio radom samples: x 1,..., x

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

Discrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions

Discrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions CS 70 Discrete Mathematics for CS Sprig 2005 Clacy/Wager Notes 21 Some Importat Distributios Questio: A biased coi with Heads probability p is tossed repeatedly util the first Head appears. What is the

More information

a. For each block, draw a free body diagram. Identify the source of each force in each free body diagram.

a. For each block, draw a free body diagram. Identify the source of each force in each free body diagram. Pre-Lab 4 Tesio & Newto s Third Law Refereces This lab cocers the properties of forces eerted by strigs or cables, called tesio forces, ad the use of Newto s third law to aalyze forces. Physics 2: Tipler

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 15

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 15 CS 70 Discrete Mathematics ad Probability Theory Summer 2014 James Cook Note 15 Some Importat Distributios I this ote we will itroduce three importat probability distributios that are widely used to model

More information

HUMBEHV 3HB3 Measures of Central Tendency & Variability Week 2

HUMBEHV 3HB3 Measures of Central Tendency & Variability Week 2 Describig Data Distributios HUMBEHV 3HB3 Measures of Cetral Tedecy & Variability Week 2 Prof. Patrick Beett Ofte we wish to summarize distributios of data, rather tha showig histograms Two basic descriptios

More information

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1 PH 425 Quatum Measuremet ad Spi Witer 23 SPIS Lab Measure the spi projectio S z alog the z-axis This is the experimet that is ready to go whe you start the program, as show below Each atom is measured

More information

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still

More information

Activity 3: Length Measurements with the Four-Sided Meter Stick

Activity 3: Length Measurements with the Four-Sided Meter Stick Activity 3: Legth Measuremets with the Four-Sided Meter Stick OBJECTIVE: The purpose of this experimet is to study errors ad the propagatio of errors whe experimetal data derived usig a four-sided meter

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Computing Confidence Intervals for Sample Data

Computing Confidence Intervals for Sample Data Computig Cofidece Itervals for Sample Data Topics Use of Statistics Sources of errors Accuracy, precisio, resolutio A mathematical model of errors Cofidece itervals For meas For variaces For proportios

More information

Measures of Spread: Variance and Standard Deviation

Measures of Spread: Variance and Standard Deviation Lesso 1-6 Measures of Spread: Variace ad Stadard Deviatio BIG IDEA Variace ad stadard deviatio deped o the mea of a set of umbers. Calculatig these measures of spread depeds o whether the set is a sample

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Chapter 23: Inferences About Means

Chapter 23: Inferences About Means Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For

More information

Lecture 8: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 8: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 8: No-parametric Compariso of Locatio GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review What do we mea by oparametric? What is a desirable locatio statistic for ordial data? What

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Eco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State

Eco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State Eco411 Lab: Cetral Limit Theorem, Normal Distributio, ad Jourey to Girl State 1. Some studets may woder why the magic umber 1.96 or 2 (called critical values) is so importat i statistics. Where do they

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

BIOSTATISTICS. Lecture 5 Interval Estimations for Mean and Proportion. dr. Petr Nazarov

BIOSTATISTICS. Lecture 5 Interval Estimations for Mean and Proportion. dr. Petr Nazarov Microarray Ceter BIOSTATISTICS Lecture 5 Iterval Estimatios for Mea ad Proportio dr. Petr Nazarov 15-03-013 petr.azarov@crp-sate.lu Lecture 5. Iterval estimatio for mea ad proportio OUTLINE Iterval estimatios

More information

Lecture 7: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 7: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 7: No-parametric Compariso of Locatio GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review How ca we set a cofidece iterval o a proportio? 2 Review How ca we set a cofidece iterval

More information

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}. 1 (*) If a lot of the data is far from the mea, the may of the (x j x) 2 terms will be quite large, so the mea of these terms will be large ad the SD of the data will be large. (*) I particular, outliers

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

Stat 225 Lecture Notes Week 7, Chapter 8 and 11

Stat 225 Lecture Notes Week 7, Chapter 8 and 11 Normal Distributio Stat 5 Lecture Notes Week 7, Chapter 8 ad Please also prit out the ormal radom variable table from the Stat 5 homepage. The ormal distributio is by far the most importat distributio

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Chapter 6. Sampling and Estimation

Chapter 6. Sampling and Estimation Samplig ad Estimatio - 34 Chapter 6. Samplig ad Estimatio 6.. Itroductio Frequetly the egieer is uable to completely characterize the etire populatio. She/he must be satisfied with examiig some subset

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

2 Definition of Variance and the obvious guess

2 Definition of Variance and the obvious guess 1 Estimatig Variace Statistics - Math 410, 11/7/011 Oe of the mai themes of this course is to estimate the mea µ of some variable X of a populatio. We typically do this by collectig a sample of idividuals

More information

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate

More information