MATH 1015: Life Science Statistics. Lecture Pack for Chapter 1 Weeks 1-3. Lecturer: Jennifer Chan Room: Carslaw Room 817 Telephone:


Text: Phipps, M. and Quine, M. (2001) A Primer of Statistics (4th Ed.)

Lecture 1 Introduction and plots

1 Introduction

1.1 Statistics

Statistics is the scientific study of numerical data based on natural phenomena. It is also the science of collecting, organising, interpreting and reporting data.

There are four phases of an experiment, survey or study.

1. Planning: Advise on the best way to collect data, what readings should be taken, and how bias can be reduced or eliminated. This stage is guided by the questions the study wants to address, time and costs.
2. Data Analysis: Numerical and graphical summaries of the data to get an impression of the variability and shape of the data.
3. Model Building: Develop a mathematical model based on probability theory to explain the patterns observed.
4. Inference: Use the model to make inferences about the population from which the sample was drawn.

SydU MATH1015 (2007) First semester Dr. J. Chan

1.2 Typical Statistical Problems

1. Quality control problem. In quality control, output is regularly sampled and tested for defects. Assume the process is deemed acceptable if the overall proportion of defective articles is at most 5%. The company inspects a sample of 60 articles and finds 4 defective articles. Should they reject the whole batch? 5% of 60 is 3. If the true defect rate were 5%, could the observed 4 be explained by chance? We draw inferences about the population on the basis of a sample.
2. Polling data. ACNielsen Poll (SMH 1/03/03). Why choose a sample of size 1007? What is the margin of error and how is it calculated?
3. In vitro fertilization (IVF) data. 6 of the first 7 births on the IVF program in Australia were female. Does this provide evidence of sex bias in IVF babies?
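The "explained by chance" questions in problems 1 and 3 can be explored numerically. The sketch below assumes a binomial model for both (an assumption made here for illustration; the lecture only reaches model building later) and uses base R's `pbinom` and `dbinom`. The variable names are illustrative.

```r
# Quality control: X ~ Binomial(60, 0.05) if the true defect rate is 5%.
# P(X >= 4) = 1 - P(X <= 3)
p_qc <- 1 - pbinom(3, size = 60, prob = 0.05)

# IVF: X ~ Binomial(7, 0.5) if girls and boys are equally likely.
# P(X >= 6) = P(X = 6) + P(X = 7) = 7/128 + 1/128
p_ivf <- sum(dbinom(6:7, size = 7, prob = 0.5))

p_qc
p_ivf   # 8/128 = 0.0625
```

Under these assumptions, neither observation is very unusual: finding 4 or more defectives happens in roughly a third of samples, and 6 or more girls out of 7 has probability 0.0625.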

Joke

A Physicist, a Biologist, and a Statistician see two people enter a house, and then after some time, they see three people leave the house. The Physicist concludes, "My initial observation must have been incorrect." The Biologist concludes, "Clearly, the two reproduced." The Statistician concludes, "Well, if one more person enters the house, then there will be no-one in the house!"

1.3 Data analysis

Discrete data: Stem and Leaf diagrams; Frequency Tables.

R code:
> stem(x)

Examples

1. No. of persons killed in road accidents in Local Govt areas of Sydney in …

Stem | Leaf
0 | 1,2,2,2,2,3,3,3,3,4,6,6,7,7,7,7,7,7,8,9,9,9,9
1 | 0,1,2,3,4,4,5,7,7,8
2 | 1,1,2,3,4
3 | 1

Double stem version:

Stem | Leaf
0 | 1,2,2,2,2,3,3,3,3,4
0 | 6,6,7,7,7,7,7,7,8,9,9,9,9
1 | 0,1,2,3,4,4
1 | 5,7,7,8
2 | 1,1,2,3,4
3 | 1

A stem and leaf diagram gives an easy way to present ordered data. It gives an idea of the shape of the distribution of the data.

2. Yeast cell counts. (Data set 1, book P. 131.) Frequency table of counts i with frequencies f_i and their total.

Solution:
> x=c(2,2,4,... )
> tab=table(x)
> tab
> stem(x)

R output (stem plot): The decimal point is at the |

(Student's weight) Weight in pounds from 92 students, listed separately for males and females.

Construct a stem and leaf plot for this data set.

Solution: (stem and leaf plot with columns Stem and Leaf)

Continuous data: Histograms and Line plots

Unlike counts, measurements are often rounded. Be aware of the rounding when summarising the data set. Age is generally rounded down whereas lengths are rounded to the nearest value.

A small discrete or continuous data set can be summarised visually using a line plot.

Frequency tables give the frequencies for various intervals.
- Usually use intervals.
- Take care in determining the true class boundaries.

Relative frequency is the interval frequency divided by the total number of observations.

Histograms
- Column sides are placed at the interval boundaries.
- The area of the column reflects the frequency for the interval.
- Unequal class intervals are used to emphasise important structure in the data.
- R code:
> hist(x)

Example: (Student's weight) Construct a frequency table, a histogram and a line plot for this data set.

Solution: The min and max values are 95 and 215 respectively. The range is 215 - 95 = 120. For 8 classes, the class width is 120/8 = 15.

Frequency table columns: CLASS INTERVAL, MIDPOINT, FREQUENCY, RELATIVE FREQUENCY, with a TOTAL row.

Figure: Histogram and line plot (vertical axes: frequency and relative frequency).
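The table construction in this solution can be sketched in R with `cut()` and `table()`. The weights below are hypothetical stand-ins (the 92 original values are not reproduced here); only the minimum 95, the maximum 215 and the class width 15 come from the worked solution.

```r
# Hypothetical weights in pounds (NOT the 92 values from the lecture).
w <- c(95, 102, 110, 118, 125, 125, 131, 140, 145, 150,
       155, 160, 168, 175, 190, 215)

# 8 classes of width 15 covering [95, 215], as in the worked solution.
breaks <- seq(95, 215, by = 15)

# right = FALSE gives classes [95,110), [110,125), ...;
# include.lowest = TRUE closes the last class so that 215 is counted.
classes <- cut(w, breaks = breaks, right = FALSE, include.lowest = TRUE)

freq <- table(classes)
rel_freq <- freq / length(w)
midpoint <- head(breaks, -1) + 15 / 2

data.frame(midpoint, frequency = as.vector(freq),
           relative_frequency = as.vector(rel_freq),
           row.names = names(freq))

# The same breaks can be passed to hist() to draw the histogram.
hist(w, breaks = breaks, right = FALSE)
```

Choosing `right = FALSE` is one convention for the class boundaries; with rounded measurements the true boundaries may need to be shifted by half a unit.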

Histogram with unequal class intervals.

Example: (Cervical cancer data) Data set 4, book P.132. The data give the frequency of cases in each age group.

Before drawing a histogram, calculate the frequency of cancer per year of age in each interval (Freq./Length). This gives the height of the rectangles.

Age boundaries | Length (years) | Freq./year
[20,30) | 10 | Freq./10
[30,35) | 5 | Freq./5
[35,40) | 5 | Freq./5
[40,55) | 15 | Freq./15
[55,60) | 5 | Freq./5
[60,70) | 10 | Freq./10
[70,90) | 20 | Freq./20 = 0.65

Figure: Histogram of Freq./year against age.
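A minimal sketch of the frequency-per-year calculation follows. The age boundaries are those of the example; the frequencies are hypothetical placeholders, with the last one chosen only so that it reproduces the 0.65 shown for [70,90).

```r
# Age boundaries from the lecture's table.
breaks <- c(20, 30, 35, 40, 55, 60, 70, 90)
width  <- diff(breaks)           # 10 5 5 15 5 10 20

# HYPOTHETICAL frequencies (the lecture's counts are not reproduced here);
# the last one is chosen so that 13/20 = 0.65.
freq <- c(20, 30, 45, 60, 25, 30, 13)

freq_per_year <- freq / width    # heights of the rectangles

# Draw rectangles whose AREA equals the interval frequency.
plot(range(breaks), c(0, max(freq_per_year)), type = "n",
     xlab = "Age", ylab = "Freq./year", main = "Histogram")
rect(head(breaks, -1), 0, tail(breaks, -1), freq_per_year)
```

Plotting raw frequencies as heights would exaggerate the wide intervals; dividing by the interval length keeps area proportional to frequency.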

Lecture 2 5-nos. summaries & boxplot

2 5-number summaries and boxplots

2.1 5-number summaries

Measures of centre: median, mean, mode. The median, range and IQR are easily calculated from an ordered list of the data.

Median: The median, $\tilde{x}$, is a value such that at least half the observations are less than or equal to $\tilde{x}$ and at least half the observations are greater than or equal to $\tilde{x}$.

To find the median, we arrange the data in ascending order. If the number of data points is ODD, the median is the middle data point. If the number of data points is EVEN, we average the 2 values on either side of the middle space.

Measures of spread: range, interquartile range (IQR), standard deviation. Two data sets with the same measures of centre may have completely different spreads. The measures of centre and spread taken together give a better picture of the shape of a data set.
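The odd and even cases of the median rule can be checked with a small sketch; both data sets are hypothetical.

```r
odd  <- c(7, 3, 9, 1, 5)        # hypothetical data, n = 5
even <- c(7, 3, 9, 1, 5, 11)    # hypothetical data, n = 6

sort(odd)      # 1 3 5 7 9     -> the middle value is 5
median(odd)

sort(even)     # 1 3 5 7 9 11  -> average of 5 and 7
median(even)
```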

Range: The range = Maximum value - Minimum value.

Quartiles: The lower quartile, $Q_1$, is a value such that at least 25% of the observations are less than or equal to $Q_1$ and at least 75% of the observations are greater than or equal to $Q_1$. Similarly the upper quartile, $Q_3$, separates off the upper 25% of the observations in an ordered list.

Interquartile Range: $\mathrm{IQR} = Q_3 - Q_1$. The middle 50% of the observations lie in $[Q_1, Q_3]$.

5-number summary: (Minimum, $Q_1$, $\tilde{x}$, $Q_3$, Maximum).

Figure: the quartiles $Q_1$, $Q_2$, $Q_3$ split the ordered data into four groups of 25% each; the IQR spans $Q_1$ to $Q_3$.

2.2 Boxplot

A boxplot is a quick way of presenting the 5-number summary graphically. $Q_1$ and $Q_3$ determine the ends of the box. Outliers are points more than one IQR beyond the ends of the box. To determine the outliers we calculate the upper and lower thresholds.

$\mathrm{LT} = Q_1 - \mathrm{IQR}$, $\mathrm{UT} = Q_3 + \mathrm{IQR}$.

Figure: boxplot showing an example with the minimum within LT and the maximum (an outlier) beyond UT.

Shapes of Data Sets

Boxplots give an easy graphical means of getting an impression of the shape of the data set. The shape is used to suggest a mathematical model for the situation of interest.

1. Symmetric.
2. Right skewed (positive skewness): the boxplot is stretched to the right.
3. Left skewed (negative skewness): the boxplot is stretched to the left.

Transformation

We can often find a simple transformation that will make the data more symmetric. For right skewed data, log or square root transformations often work.

R code:
> s=summary(x)
> s
> IQR=s[5]-s[2]
> IQR
> UT=s[5]+IQR
> UT
> LT=s[2]-IQR
> LT
> range(x)
> boxplot(x)

Example: (Student weight) Find the min, max, $Q_1$, $Q_3$, IQR, LT and UT. Draw the boxplot.

Solution:

Min = 95, Max = 215, range = 215 - 95 = 120

Median $= X_{0.5(92+1)} = X_{46.5} = \frac{X_{46} + X_{47}}{2}$

$Q_1 = X_{0.25(92+1)} = X_{23.25} = \frac{X_{23} + X_{24}}{2} = 125$

$Q_3 = X_{0.75(92+1)} = X_{69.75} = \frac{X_{69} + X_{70}}{2} = 156$

$\mathrm{IQR} = 156 - 125 = 31$

$\mathrm{LT} = Q_1 - \mathrm{IQR} = 125 - 31 = 94$

$\mathrm{UT} = Q_3 + \mathrm{IQR} = 156 + 31 = 187$

Figure: boxplot with the box from 125 to 156; the minimum 95 lies within LT and the maximum 215 is an outlier beyond UT.

Example: (Road Accident data) Number of persons killed in road accidents in Local Govt areas of Sydney in …

Stem | Leaf
0 | 1,2,2,2,2,3,3,3,3,4,6,6,7,7,7,7,7,7,8,9,9,9,9
1 | 0,1,2,3,4,4,5,7,7,8
2 | 1,1,2,3,4
3 | 1

n = 39. Range = 31 - 1 = 30.

$\tilde{x} = X_{(39+1)/2} = X_{20} = 9$

$Q_1 = X_{(39+1)/4} = X_{10} = 4$

$Q_3 = X_{3(39+1)/4} = X_{30} = 15$

$\mathrm{IQR} = 15 - 4 = 11$.

$\mathrm{UT} = 15 + 11 = 26 < 31$ and $\mathrm{LT} = 4 - 11 = -7 < 1$. The maximum, 31, is an outlier.

> x=c(1,2,2,2,2,3,3,3,3,4,6,6,7,7,7,7,7,7,8,9,9,9,9,10,11,12,13,14,14,15,
    17,17,18,21,21,22,23,24,31)
> sqrtx=sqrt(x)
> par(mfrow=c(2,2))
> boxplot(x)
> title("x")
> boxplot(sqrtx)
> title("square root of x")

R output: boxplots of x and of the square root of x.

Are the data roughly symmetrical about the median, left or right skewed? They are right skewed and become roughly symmetric after the square root transformation.
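The hand calculation for the road accident data can be reproduced in R. This is a sketch under one assumption not stated in the lecture: `quantile(..., type = 6)` is used because it places the p-th quantile at position (n + 1)p in the ordered data, which matches the hand rule used here.

```r
x <- c(1,2,2,2,2,3,3,3,3,4,6,6,7,7,7,7,7,7,8,9,9,9,9,
       10,11,12,13,14,14,15,17,17,18,21,21,22,23,24,31)

# type = 6 places the p-th quantile at position (n + 1) * p in the ordered data,
# matching the X_{(n+1)/4}, X_{(n+1)/2}, X_{3(n+1)/4} rule used by hand.
q <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 6)
q1 <- unname(q[1]); med <- unname(q[2]); q3 <- unname(q[3])

iqr <- q3 - q1
lt  <- q1 - iqr        # lower threshold, one IQR below the box
ut  <- q3 + iqr        # upper threshold, one IQR above the box

outliers <- x[x < lt | x > ut]   # observations flagged as outliers
outliers

# range = 1 asks boxplot() to draw whiskers out to one IQR from the box
# (its default is 1.5); the box itself uses boxplot's own hinge definition.
boxplot(x, range = 1, horizontal = TRUE)
```

Note that `summary(x)` and the default `quantile(x)` use a different interpolation rule (type 7). For these data they give 5 and 14.5 for the quartiles rather than 4 and 15, so thresholds computed from `summary(x)` can differ slightly from the hand-calculated ones.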

2.3 Estimating Quartiles

Example: (Diameters) Data set 3, book P.132.

Frequency table columns: Intervals, Frequency, Boundaries, Cum. Freq.

n = 200. Class width = 0.05.

We can approximate the quartiles from the cumulative frequency diagram.

$\tilde{x} = X_{(200+1)/2} = X_{100.5}$

$Q_1 = X_{(200+1)/4} = X_{50.25}$

$Q_3 = X_{3(200+1)/4} = X_{150.75}$

To construct the cumulative frequency diagram, plot the cumulative frequency against the UPPER boundary of each class interval.
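Reading a quartile off the cumulative frequency diagram amounts to linear interpolation between class boundaries. The sketch below illustrates this with base R's `approx()` on a small hypothetical grouped table (not Data set 3); the names `boundary`, `cum_freq` and `pos` are illustrative.

```r
# HYPOTHETICAL grouped data: class boundaries and frequencies.
boundary <- c(0, 10, 20, 30, 40)
freq     <- c(10, 20, 50, 20)
n        <- sum(freq)                      # 100

# Cumulative frequency at each boundary; 0 at the lowest boundary.
cum_freq <- c(0, cumsum(freq))             # 0 10 30 80 100

# Read the diagram "backwards": find the x value at which the cumulative
# frequency polygon reaches the positions (n+1)/4, (n+1)/2 and 3(n+1)/4.
pos <- c(Q1 = (n + 1) / 4, median = (n + 1) / 2, Q3 = 3 * (n + 1) / 4)
est <- approx(x = cum_freq, y = boundary, xout = pos)$y
names(est) <- names(pos)
est

plot(boundary, cum_freq, type = "b",
     xlab = "Upper class boundary", ylab = "Cumulative frequency")
```

For instance, the median position 50.5 falls in the class [20,30), which holds cumulative frequencies 30 to 80, so the estimate is 20 + 10 × (50.5 - 30)/50 = 24.1.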

Figure: cumulative frequency (f) plotted against x; the unknown quartile is found by linear interpolation using the similar triangles ABC and ADE.

Lecture 3 Sample mean and variance

3 Sample mean and variance

3.1 Review of summation notation

For the values $x_1 = 3$, $x_2 = 4$, $x_3 = 5$, $x_4 = 3$, evaluate the following summation expressions:

$\sum_{i=1}^{4} x_i = 3 + 4 + 5 + 3 = 15$

$\sum_{i=1}^{4} x_i^2 = 9 + 16 + 25 + 9 = 59$

$\sum_{i=2}^{3} x_i = 4 + 5 = 9$

$\sum_{i=1}^{4} (2x_i + 3) = 2(15) + 4(3) = 42$

3.2 Sample Mean

The sample mean is the simple average of the observations. For observations $x_1, x_2, \ldots, x_n$,

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i.$

If $e_i = c x_i + f$ then $\bar{e} = c\bar{x} + f$, since $\sum_{i=1}^{n} e_i = c\sum_{i=1}^{n} x_i + nf$.
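These sums, and the rule for the mean of a linearly transformed variable, can be checked directly in R. The constants `c0` and `f0` are arbitrary illustrative choices.

```r
x <- c(3, 4, 5, 3)

sum(x)            # 15
sum(x^2)          # 59
sum(x[2:3])       # 9
sum(2 * x + 3)    # 42

# Linear transformation e_i = c * x_i + f; the constants are arbitrary choices.
c0 <- 2; f0 <- 3
e <- c0 * x + f0
mean(e)               # 10.5
c0 * mean(x) + f0     # 10.5
```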

Joke

Did you hear about the statistician who had his head in an oven and his feet in a bucket of ice? When asked how he felt, he replied, "On the average I feel just fine."

When she told me I was average, she was just being mean.

Grouped frequency table

If we only have the information provided by a grouped frequency table (for example, we only have access to the published report and not the original data set), then we can approximate the sample mean by

$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i u_i,$

where the interval centres are $u_1, u_2, \ldots, u_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$.

Example: (Diameters) Data set 3, book P.132.

Frequency table columns: Intervals, Frequency ($f_j$), Interval centre ($u_j$).

There are k = 12 intervals and n = 200 observations.

$\sum_{i=1}^{12} f_i u_i = 2(13.12) + 1(13.17) + 8(13.22) + \cdots + f_{12}(13.67)$

$\bar{x} = \frac{1}{n}\sum_{i=1}^{12} f_i u_i$

The mean of the raw data was …
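The grouped-mean formula can be sketched in R as follows. The interval centres and the first three frequencies (2, 1, 8) are those visible in the sum above; the remaining nine frequencies are hypothetical, chosen only so that they total n = 200, so the resulting mean is illustrative and not the lecture's answer.

```r
# Interval centres from the example: 13.12, 13.17, ..., 13.67 (k = 12).
u <- seq(13.12, 13.67, by = 0.05)

# Frequencies: the first three (2, 1, 8) appear in the lecture's sum;
# the remaining nine are HYPOTHETICAL, chosen only so that n = 200.
f <- c(2, 1, 8, 20, 35, 45, 40, 25, 14, 6, 3, 1)
n <- sum(f)

xbar_grouped <- sum(f * u) / n
xbar_grouped
```

The same value is returned by `weighted.mean(u, f)`, which treats the frequencies as weights.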

3.3 Mean vs Median

1. The mean is easier to calculate and easier to handle theoretically.
2. If the data are roughly symmetric then the mean, median and mode are close and they lie at the centre of the distribution.
3. If the data are skewed then the mean is pulled toward the long tail. We have mode ≤ median ≤ mean if the data are right skewed.
4. The median is robust against outliers and incorrect readings whereas the mean is not.

Example: (Heat of sublimation of platinum) Data set 14, book P.136.

Stem and Leaf Display (stem and leaf plot of the data)

n = 26.

Median: $\tilde{x} = X_{(26+1)/2} = X_{13.5} = \frac{1}{2}(X_{13} + X_{14}) = \frac{1}{2}(0 + 2) = 1$.

In Data set 14, if … is changed to 34.1, the median does not change but the sample mean changes.
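The robustness of the median can be seen in a small sketch. The readings are hypothetical (not Data set 14); one of them is deliberately mis-recorded.

```r
x <- c(10, 12, 13, 15, 20)     # hypothetical readings
y <- replace(x, 5, 200)        # the largest reading mis-recorded as 200

c(mean = mean(x), median = median(x))   # mean 14, median 13
c(mean = mean(y), median = median(y))   # mean 50, median 13
```

The single bad reading moves the mean from 14 to 50, while the median stays at 13.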

3.4 Sample variance and standard deviation

An alternative to the IQR as a measure of spread is the sample standard deviation, s. For data $x_1, x_2, \ldots, x_n$:

The sample variance is

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

and the sample standard deviation is

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.$

Calculation formula: $s^2 = \frac{1}{n-1} S_{xx}$, where

$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2.$

If we use working origin a and working units h with $d_i = \frac{x_i - a}{h}$ then $s_x = h s_d$.

For data from frequency tables we use $s^2 = \frac{1}{n-1}\sum_{j=1}^{k} f_j (u_j - \bar{u})^2$.
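The definition and the calculation formula give the same $S_{xx}$, and the working-origin rule $s_x = h s_d$ can be verified numerically. The data and the constants `a` and `h` below are hypothetical.

```r
x <- c(4, 7, 7, 9, 13)     # hypothetical data
n <- length(x)

Sxx_def  <- sum((x - mean(x))^2)        # definition
Sxx_calc <- sum(x^2) - sum(x)^2 / n     # calculation formula

s2 <- Sxx_calc / (n - 1)
c(Sxx_def, Sxx_calc, s2, var(x))        # 44 44 11 11

# Working origin a and working unit h (arbitrary choices): s_x = h * s_d.
a <- 7; h <- 2
d <- (x - a) / h
c(sd(x), h * sd(d))
```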

Example:

Solution: n = 12. First calculate $\sum x_i = 689$ and $\sum x_i^2$.

Mean: $\bar{x} = \frac{1}{n}\sum x_i = \frac{689}{12} \approx 57.42$

Variance: $s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right] = \frac{1}{11}\left[\sum x_i^2 - \frac{(689)^2}{12}\right]$

Standard deviation: $s = \sqrt{s^2}$

Example: (Interorbital width) A random sample of 12 measurements of interorbital width of domestic pigeons is obtained as follows: …

Find the mean, median, mode, variance, standard deviation, range and quartiles. Construct a boxplot for these data.

Solution: Arrange the data in ascending order.

mean $= \frac{1}{n}\sum_{i=1}^{n} x_i$

median = 11.8

mode = 11.8 and 12.2 (not unique)

$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$, $s = \sqrt{s^2}$

range = 2.6

$Q_1 = 11.05$, $Q_3 = 12.2$

$\mathrm{IQR} = 12.2 - 11.05 = 1.15$

$(\mathrm{LT}, \mathrm{UT}) = (11.05 - 1.5 \times 1.15,\ 12.2 + 1.5 \times 1.15) = (9.325, 13.925)$ (no outliers; here the thresholds are taken 1.5 × IQR beyond the quartiles)

Lecture 4 Scatter plot and correlation

4 Scatter plot and correlation

Procedures considered so far only involve observations on a single feature. Often we take several readings on each subject or experimental unit. For example,

x | y
patient's age | blood pressure
temperature | reaction time
alcohol consumption | cholesterol level

4.1 Scatterplot

The first step is to construct a scatterplot of the observed pairs.

Example: (Weight & height) Height is frequently named as a good predictor for weight among people of the same gender. Give a scatterplot of the following heights (in cm) and weights (in kg) from 14 males between the ages of 19 and 26 years.

(Data table: Weight, Height.)

Solution: In R,
> weight=c(83.9,99,63.8,...)
> height=c(185,180,173,...)
> par(mfrow=c(1,1))
> plot(weight,height)
> title("weight against height")

R output: (scatter plot)

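Figure: scatter plot titled "Weight against height" (horizontal axis: weight; vertical axis: height).

4.2 Correlation Coefficient

The correlation coefficient is a numerical index that measures the degree of linear association between x and y.

$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}},$

where

$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$, $S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2$, $S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$.

Note that if we rescale the x or the y values by a positive factor we do not change r. If we replace $x_i$ by $w_i = cx_i + a$ for all $i = 1, 2, \ldots, n$ then $\bar{w} = c\bar{x} + a$ and so $(w_i - \bar{w}) = c(x_i - \bar{x})$. The factor c then cancels in r when c > 0; a negative c reverses the sign of r.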
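The formula for r and its behaviour under rescaling can be checked against R's built-in `cor()`. The paired data below are hypothetical, not the lecture's 14 males.

```r
# Hypothetical paired data (not the lecture's 14 males).
x <- c(170, 173, 175, 180, 185, 188)   # height, cm
y <- c(64, 70, 68, 77, 84, 90)         # weight, kg

Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

r <- Sxy / sqrt(Sxx * Syy)
c(r, cor(x, y))                 # the two values agree

# Rescaling: w = c*x + a with c > 0 leaves r unchanged (cm -> inches here);
# a negative c flips the sign.
cor(x / 2.54, y)
cor(-x, y)
```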

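Consider

$\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sqrt{S_{xx}}} - \frac{y_i - \bar{y}}{\sqrt{S_{yy}}}\right)^2 = \sum_{i=1}^{n}\frac{(x_i - \bar{x})^2}{S_{xx}} + \sum_{i=1}^{n}\frac{(y_i - \bar{y})^2}{S_{yy}} - 2\sum_{i=1}^{n}\frac{(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{S_{xx} S_{yy}}}$

$= \frac{S_{xx}}{S_{xx}} + \frac{S_{yy}}{S_{yy}} - 2\frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 2 - 2r.$

Since a sum of squares cannot be negative, $2 - 2r \ge 0$, so $r \le 1$. If $r = 1$ then

$\frac{x_i - \bar{x}}{\sqrt{S_{xx}}} = \frac{y_i - \bar{y}}{\sqrt{S_{yy}}}$ for all i.

Thus the points $(x_i, y_i)$ all lie on the straight line $y = mx + d$, where m is the slope and d is the y-intercept.

Similarly,

$\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sqrt{S_{xx}}} + \frac{y_i - \bar{y}}{\sqrt{S_{yy}}}\right)^2 = 2 + 2r,$

so $2 + 2r \ge 0$ and $r \ge -1$. Hence $-1 \le r \le 1$.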

If $r = -1$ then the observations $(x_i, y_i)$ all lie on a straight line with negative slope.

Characteristics of r

1. r is scale free.
2. $-1 \le r \le 1$.
3. If $r = \pm 1.0$, then all the observations fall on a straight line.
4. Note that x and y can have a very strong non-linear relationship and r = 0.
5. Remember that a high correlation coefficient does not necessarily imply any causal relationship between the two variables.

Figure: example scatter plots of Y against X with r positive, r negative and r zero; a perfect fit ($r = 1$, $\sigma^2 = 0$) and an imperfect fit ($r < 1$, $\sigma^2 > 0$).

Figure: scale of r from -1.00 (perfect negative correlation) through 0 (no correlation) to +1.00 (perfect positive correlation), with strong, moderate and weak negative and positive correlation in between.
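Characteristic 4 can be illustrated with a small constructed example: y is an exact function of x, yet the linear correlation vanishes because the relationship is symmetric about x = 0. The data are hypothetical.

```r
x <- -5:5
y <- x^2            # y is completely determined by x, but not linearly

cor(x, y)           # 0 (up to rounding)
plot(x, y)
```

This is why a scatterplot should always be inspected alongside r.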

Example: (Weight and height) Calculate the correlation coefficient.

Solution: In R:
> cor(weight,height)

Example: (Soil temp. and germination interval) Soil temperature ($x_i$) and germination interval ($y_i$, in days) (book P.37) were observed for plots of winter wheat in 10 localities.

(Data table: $x_i$, $y_i$.)

Solution: n = 10

$\sum x_i = 58.5$, $\sum y_i = 251$, $\sum y_i^2 = 6985$, so $\bar{x} = 5.85$ and $\bar{y} = 25.1$.

$\sum x_i y_i = 12.5(10) + 5(26) + \cdots$

$S_{xx} = \sum x_i^2 - n\bar{x}^2$

$S_{yy} = \sum y_i^2 - n\bar{y}^2 = 6985 - 10(25.1)^2 = 684.9$

$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = \sum x_i y_i - 10(5.850)(25.1)$

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx}(684.9)}}$

Lecture 5 Regression

5 Regression

Consider the weight and height data. How do we fit a trend line to a data set like this?

Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

5.1 Regression line

Using a regression line $y = a + bx$, the estimated value of y at $x_i$ is $\hat{y}_i = a + bx_i$. The observed value of y at $x_i$ is $y_i$. The residual error at $x_i$ is $e_i = y_i - \hat{y}_i$.

We minimise $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - (a + bx_i))^2$ to find a and b:

$b = S_{xy}/S_{xx}$, $a = \bar{y} - b\bar{x}$.

Note $b = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}\sqrt{\frac{S_{yy}}{S_{xx}}} = r\sqrt{\frac{S_{yy}}{S_{xx}}}$. The slope has the same sign as the correlation coefficient.
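The least-squares formulas for a and b can be compared with R's `lm()`, which minimises the same sum of squared residuals. The data are hypothetical; `lm()` is used here as a cross-check and is not the function used elsewhere in these notes.

```r
# Hypothetical data (not the lecture's weight and height values).
x <- c(170, 173, 175, 180, 185, 188)   # height, cm
y <- c(64, 70, 68, 77, 84, 90)         # weight, kg

Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

b <- Sxy / Sxx                 # slope
a <- mean(y) - b * mean(x)     # intercept
c(intercept = a, slope = b)

# lm() minimises the same sum of squared residuals.
coef(lm(y ~ x))

# b can also be written as r * sqrt(Syy / Sxx).
cor(x, y) * sqrt(Syy / Sxx)
```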

Example: (Weight & height) Fit a regression line $\hat{y} = a + bx$ to the data.

Solution: From the data (n = 14) we need the totals $\sum y_i$, $\sum y_i^2$, $\sum x_i = 2472$, $\sum x_i^2$ and $\sum x_i y_i$. Then

$\bar{y} = \frac{1}{n}\sum y_i$, $\bar{x} = \frac{1}{n}\sum x_i = \frac{2472}{14} \approx 176.57$,

$S_{yy} = \sum y_i^2 - n\bar{y}^2$,

$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$,

$S_{xx} = \sum x_i^2 - n\bar{x}^2$.

Hence $\hat{b} = \frac{S_{xy}}{S_{xx}}$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$.

The regression line is Weight = $\hat{a}$ + $\hat{b}$ × Height.

When x = 0, $\hat{y} = \hat{a}$ is an imaginary level of the predicted weight when the height is 0, which is impossible. For each 1 cm increase in height, the predicted weight increases by $\hat{b}$ kg.

In R,
> y=weight
> x=height
> c=lsfit(x,y)$coefficients
> c
> par(mfrow=c(1,1))
> plot(x,y,xlab="height",ylab="weight")
> abline(c[1],c[2])
> title("fitted line")

R output: the coefficients (Intercept and X) and the fitted line plot of weight against height.

5.2 Residual Plots

If the plot of residuals against x shows any strong structure then a more complex model is needed.
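To see what "strong structure" in a residual plot looks like, the sketch below fits a straight line to deliberately curved, hypothetical data. It uses `lm()`, `resid()` and `fitted()` rather than `lsfit()`; that is a choice made here for brevity.

```r
# Hypothetical data with a curved trend, to show what "strong structure" looks like.
x <- 1:10
y <- x^2

fit <- lm(y ~ x)            # straight-line fit
res <- resid(fit)
fitted_values <- fitted(fit)

plot(fitted_values, res, xlab = "fitted", ylab = "residual")
abline(h = 0)

# Residuals are positive at both ends and negative in the middle:
round(res, 1)
```

The U-shaped pattern of residuals signals that a straight line is the wrong functional form for these data, even though the fitted line has a high correlation with y.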

Figure: residual plots of e against $\hat{y}$, each shown with the corresponding fitted line plot of y against x.
- A random scatter indicates the model assumptions are OK.
- Nonconstant variance: $\sigma_e^2$ increases with x.
- The functional form f(x) may be wrong. It should be $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$.

Example: (Weight & height) What will be the average weight of those males who are 175 cm tall? Check the regression model using a residual plot.

Solution: In R,
> predict=c[1]+c[2]*175
> predict
> res=lsfit(x,y)$residuals
> fitted=y-res
> par(mfrow=c(2,2))
> plot(fitted,res)
> abline(h=0)
> title("residual plot")
> boxplot(res)
> title("boxplot of residuals")

R output: residual plot (res against fitted) and boxplot of residuals.

The predicted weight is … kg. Apart from an outlier, the residual plot shows that the errors are random and symmetric.

Example: (Dose & urine concentration) Dose x (in gms) and concentration in the urine y (in mg/gm):

(Data table: x, y.)

n = 12

$\sum x_i = 507$, $\bar{x} = \frac{507}{12} = 42.25$

$\sum y_i = 144$, $\bar{y} = \frac{144}{12} = 12$

$\sum x_i^2 = $ …

$\sum y_i^2 = 1802$

$\sum x_i y_i = 6314$

$S_{xx} = \sum x_i^2 - n\bar{x}^2$

$S_{yy} = \sum y_i^2 - n\bar{y}^2 = 1802 - 12(12)^2 = 74$

$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 6314 - 12(42.25)(12) = 230$

Correlation coefficient: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{230}{\sqrt{S_{xx}(74)}}$

Regression: $b = \frac{S_{xy}}{S_{xx}} = \frac{230}{S_{xx}}$, $a = \bar{y} - b\bar{x} = 12 - 42.25\,b$

Fitted line: Concentration = a + b × dose.
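The totals quoted in this example are enough to reproduce $S_{yy} = 74$ and $S_{xy} = 230$. The sketch below does so, and wraps the remaining steps in a helper, `fit_from_totals` (a hypothetical name), that takes $\sum x_i^2$ as an argument because that total is not shown here.

```r
# Totals quoted in the example.
n      <- 12
sum_x  <- 507
sum_y  <- 144
sum_y2 <- 1802
sum_xy <- 6314

xbar <- sum_x / n                 # 42.25
ybar <- sum_y / n                 # 12

Syy <- sum_y2 - n * ybar^2        # 74
Sxy <- sum_xy - n * xbar * ybar   # 230

# Helper (hypothetical name): slope, intercept and r once sum(x^2) is supplied.
fit_from_totals <- function(sum_x2) {
  Sxx <- sum_x2 - n * xbar^2
  b <- Sxy / Sxx
  c(Sxx = Sxx, r = Sxy / sqrt(Sxx * Syy), b = b, a = ybar - b * xbar)
}
```

Calling `fit_from_totals()` with the value of $\sum x_i^2$ computed from the data table returns $S_{xx}$, r, b and a in one step.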

5.3 Regression Effect

1. For large data sets, split the x-axis into narrow strips and find the average of the y values in each strip of the scatterplot.
2. The means often scatter around a straight line called the regression line. The regression line reflects how the average y values vary with x.
3. In Galton's data the scatterplot is roughly elliptical.

Average father's height $\bar{x}$ = 68 inches

Average son's height $\bar{y}$ = 69 inches

r = …

Sons of tall fathers tend to be shorter than their fathers, whereas sons of short fathers tend to be taller than their fathers, on average. Galton noted this regression effect, sometimes called regression to the mean.
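The regression effect follows from the slope formula $b = r\sqrt{S_{yy}/S_{xx}}$: when the two spreads are similar and r < 1, predictions are pulled toward the mean. The sketch below uses the two averages from the example; the value of r and the standard deviations are assumed for illustration only and are not taken from Galton's data.

```r
xbar <- 68      # average father's height (inches), from the example
ybar <- 69      # average son's height (inches), from the example
r    <- 0.5     # ASSUMED value, for illustration only
sd_x <- 2.7     # ASSUMED standard deviations, taken equal
sd_y <- 2.7

# Regression line written in terms of r: slope b = r * sd_y / sd_x.
predict_son <- function(father) ybar + r * (sd_y / sd_x) * (father - xbar)

predict_son(c(64, 68, 72))    # 67 69 71
```

Under these assumptions a 72-inch father has sons averaging 71 inches (shorter than him), while a 64-inch father has sons averaging 67 inches (taller than him).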


Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information

Lecture 8 CORRELATION AND LINEAR REGRESSION

Lecture 8 CORRELATION AND LINEAR REGRESSION Announcements CBA5 open in exam mode - deadline midnight Friday! Question 2 on this week s exercises is a prize question. The first good attempt handed in to me by 12 midday this Friday will merit a prize...

More information

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. Chapter 3 Numerically Summarizing Data Chapter 3.1 Measures of Central Tendency Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode. A1. Mean The

More information

CHAPTER 2: Describing Distributions with Numbers

CHAPTER 2: Describing Distributions with Numbers CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring

More information

A is one of the categories into which qualitative data can be classified.

A is one of the categories into which qualitative data can be classified. Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Describing Distributions with Numbers Using graphs, we could determine the center, spread, and shape of the distribution of a quantitative variable. We can also use numbers (called summary statistics)

More information

UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys

UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys.. 1. (i) The treatment being compared is: (ii). (5) 3. (3) 4. (4) Study 1: the number

More information

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,

More information

Whitby Community College Your account expires on: 8 Nov, 2015

Whitby Community College Your account expires on: 8 Nov, 2015 To print higher resolution math symbols, click the Hi Res Fonts for Printing button on the jsmath control panel. If the math symbols print as black boxes, turn off image alpha channels using the Options

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

CIVL 7012/8012. Collection and Analysis of Information

CIVL 7012/8012. Collection and Analysis of Information CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real

More information

1. Descriptive stats methods for organizing and summarizing information

1. Descriptive stats methods for organizing and summarizing information Two basic types of statistics: 1. Descriptive stats methods for organizing and summarizing information Stats in sports are a great example Usually we use graphs, charts, and tables showing averages and

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Boxplots and standard deviations Suhasini Subba Rao Review of previous lecture In the previous lecture

More information

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE

More information

CHAPTER 5 LINEAR REGRESSION AND CORRELATION

CHAPTER 5 LINEAR REGRESSION AND CORRELATION CHAPTER 5 LINEAR REGRESSION AND CORRELATION Expected Outcomes Able to use simple and multiple linear regression analysis, and correlation. Able to conduct hypothesis testing for simple and multiple linear

More information

Lecture 1: Description of Data. Readings: Sections 1.2,

Lecture 1: Description of Data. Readings: Sections 1.2, Lecture 1: Description of Data Readings: Sections 1.,.1-.3 1 Variable Example 1 a. Write two complete and grammatically correct sentences, explaining your primary reason for taking this course and then

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Chapter 01 : What is Statistics?

Chapter 01 : What is Statistics? Chapter 01 : What is Statistics? Feras Awad Data: The information coming from observations, counts, measurements, and responses. Statistics: The science of collecting, organizing, analyzing, and interpreting

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures

More information

Math 1710 Class 20. V2u. Last Time. Graphs and Association. Correlation. Regression. Association, Correlation, Regression Dr. Back. Oct.

Math 1710 Class 20. V2u. Last Time. Graphs and Association. Correlation. Regression. Association, Correlation, Regression Dr. Back. Oct. ,, Dr. Back Oct. 14, 2009 Son s Heights from Their Fathers Galton s Original 1886 Data If you know a father s height, what can you say about his son s? Son s Heights from Their Fathers Galton s Original

More information

Start with review, some new definitions, and pictures on the white board. Assumptions in the Normal Linear Regression Model

Start with review, some new definitions, and pictures on the white board. Assumptions in the Normal Linear Regression Model Start with review, some new definitions, and pictures on the white board. Assumptions in the Normal Linear Regression Model A1: There is a linear relationship between X and Y. A2: The error terms (and

More information

Oct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope

Oct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope Oct 2017 1 / 28 Minimum MSE Y is the response variable, X the predictor variable, E(X) = E(Y) = 0. BLUP of Y minimizes average discrepancy var (Y ux) = C YY 2u C XY + u 2 C XX This is minimized when u

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Honors Algebra 1 - Fall Final Review

Honors Algebra 1 - Fall Final Review Name: Period Date: Honors Algebra 1 - Fall Final Review This review packet is due at the beginning of your final exam. In addition to this packet, you should study each of your unit reviews and your notes.

More information

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System PREP Course #10: Introduction to Exploratory Data Analysis and Data Transformations (Part 1) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

More information

IB Questionbank Mathematical Studies 3rd edition. Grouped discrete. 184 min 183 marks

IB Questionbank Mathematical Studies 3rd edition. Grouped discrete. 184 min 183 marks IB Questionbank Mathematical Studies 3rd edition Grouped discrete 184 min 183 marks 1. The weights in kg, of 80 adult males, were collected and are summarized in the box and whisker plot shown below. Write

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics CHAPTER OUTLINE 6-1 Numerical Summaries of Data 6- Stem-and-Leaf Diagrams 6-3 Frequency Distributions and Histograms 6-4 Box Plots 6-5 Time Sequence Plots 6-6 Probability Plots Chapter

More information

Math 3339 Homework 2 (Chapter 2, 9.1 & 9.2)

Math 3339 Homework 2 (Chapter 2, 9.1 & 9.2) Math 3339 Homework 2 (Chapter 2, 9.1 & 9.2) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

PROBABILITY DENSITY FUNCTIONS

PROBABILITY DENSITY FUNCTIONS PROBABILITY DENSITY FUNCTIONS P.D.F. CALCULATIONS Question 1 (***) The lifetime of a certain brand of battery, in tens of hours, is modelled by the f x given by continuous random variable X with probability

More information

Stat 101: Lecture 6. Summer 2006

Stat 101: Lecture 6. Summer 2006 Stat 101: Lecture 6 Summer 2006 Outline Review and Questions Example for regression Transformations, Extrapolations, and Residual Review Mathematical model for regression Each point (X i, Y i ) in the

More information

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Sections 2.3 and 2.4

Sections 2.3 and 2.4 1 / 24 Sections 2.3 and 2.4 Note made by: Dr. Timothy Hanson Instructor: Peijie Hou Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences

More information

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart ST2001 2. Presenting & Summarising Data Descriptive Statistics Frequency Distribution, Histogram & Bar Chart Summary of Previous Lecture u A study often involves taking a sample from a population that

More information

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population Lecture 5 1 Lecture 3 The Population Variance The population variance, denoted σ 2, is the sum of the squared deviations about the population mean divided by the number of observations in the population,

More information

University of Jordan Fall 2009/2010 Department of Mathematics

University of Jordan Fall 2009/2010 Department of Mathematics handouts Part 1 (Chapter 1 - Chapter 5) University of Jordan Fall 009/010 Department of Mathematics Chapter 1 Introduction to Introduction; Some Basic Concepts Statistics is a science related to making

More information

Example 2. Given the data below, complete the chart:

Example 2. Given the data below, complete the chart: Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice Name Period AP Statistics Bivariate Data Analysis Test Review Multiple-Choice 1. The correlation coefficient measures: (a) Whether there is a relationship between two variables (b) The strength of the

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent

More information

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511 Topic 2 - Descriptive Statistics STAT 511 Professor Bruce Craig Types of Information Variables classified as Categorical (qualitative) - variable classifies individual into one of several groups or categories

More information

15.0 Linear Regression

15.0 Linear Regression 15.0 Linear Regression 1 Answer Questions Lines Correlation Regression 15.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

More information

Introduction to Probability and Statistics Slides 1 Chapter 1

Introduction to Probability and Statistics Slides 1 Chapter 1 1 Introduction to Probability and Statistics Slides 1 Chapter 1 Prof. Ammar M. Sarhan, asarhan@mathstat.dal.ca Department of Mathematics and Statistics, Dalhousie University Fall Semester 2010 Course outline

More information

THE ROYAL STATISTICAL SOCIETY 2002 EXAMINATIONS SOLUTIONS ORDINARY CERTIFICATE PAPER II

THE ROYAL STATISTICAL SOCIETY 2002 EXAMINATIONS SOLUTIONS ORDINARY CERTIFICATE PAPER II THE ROYAL STATISTICAL SOCIETY 2002 EXAMINATIONS SOLUTIONS ORDINARY CERTIFICATE PAPER II The Society provides these solutions to assist candidates preparing for the examinations in future years and for

More information

For use only in Badminton School November 2011 S1 Note. S1 Notes (Edexcel)

For use only in Badminton School November 2011 S1 Note. S1 Notes (Edexcel) For use only in Badminton School November 011 s (Edexcel) Copyright www.pgmaths.co.uk - For AS, A notes and IGCSE / GCSE worksheets 1 For use only in Badminton School November 011 Copyright www.pgmaths.co.uk

More information

y n 1 ( x i x )( y y i n 1 i y 2

y n 1 ( x i x )( y y i n 1 i y 2 STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered

More information

DSST Principles of Statistics

DSST Principles of Statistics DSST Principles of Statistics Time 10 Minutes 98 Questions Each incomplete statement is followed by four suggested completions. Select the one that is best in each case. 1. Which of the following variables

More information

3.1 Measure of Center

3.1 Measure of Center 3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects

More information

Lecture Notes for BUSINESS STATISTICS - BMGT 571. Chapters 1 through 6. Professor Ahmadi, Ph.D. Department of Management

Lecture Notes for BUSINESS STATISTICS - BMGT 571. Chapters 1 through 6. Professor Ahmadi, Ph.D. Department of Management Lecture Notes for BUSINESS STATISTICS - BMGT 571 Chapters 1 through 6 Professor Ahmadi, Ph.D. Department of Management Revised May 005 Glossary of Terms: Statistics Chapter 1 Data Data Set Elements Variable

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2.

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2. Chapter 3 Solutions 3.1 3.2 3.3 87% of the girls her daughter s age weigh the same or less than she does and 67% of girls her daughter s age are her height or shorter. According to the Los Angeles Times,

More information

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points 1-13 13 14 3 15 8 16 4 17 10 18 9 19 7 20 3 21 16 22 2 Total 75 1 Multiple choice questions (1 point each) 1. Look at

More information

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight AP Statistics Chapter 9 Re-Expressing data: Get it Straight Objectives: Re-expression of data Ladder of powers Straight to the Point We cannot use a linear model unless the relationship between the two

More information

Lecture 1 : Basic Statistical Measures

Lecture 1 : Basic Statistical Measures Lecture 1 : Basic Statistical Measures Jonathan Marchini October 11, 2004 In this lecture we will learn about different types of data encountered in practice different ways of plotting data to explore

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 4. Linear Regression 4.1 Introduction So far our data have consisted of observations on a single variable of interest.

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Algebra I. Mathematics Curriculum Framework. Revised 2004 Amended 2006

Algebra I. Mathematics Curriculum Framework. Revised 2004 Amended 2006 Algebra I Mathematics Curriculum Framework Revised 2004 Amended 2006 Course Title: Algebra I Course/Unit Credit: 1 Course Number: Teacher Licensure: Secondary Mathematics Grades: 9-12 Algebra I These are

More information