Summarizing data

Summary statistics for central location.

Sample mean ( 樣本平均 ): the average of the observations, often denoted by $\bar{X}$.

Sample median ( 樣本中位數 ): the middle number, or the average of the two middle numbers, of the sorted data. The sample median is less sensitive to extreme values in the data than the sample mean.

Example 1. Consider the sample (2, 7, 3). Find the sample mean and the sample median.

Sol. The sample mean is $\frac{1}{3}(2 + 7 + 3) = 4$. The sorted data are 2, 3, 7, so the middle number is 3, which is the sample median.

Software sol. To find the sample mean and median for the sample (2, 7, 3) using R, at the R prompt, enter

    x <- c(2,7,3); mean(x); median(x)

Then R returns the sample mean and sample median.

For nominal data, we use the mode to describe the central location instead of the sample mean or median.

Example 2. Students may get to school by various means. Below is a summary table of transportation tools for a class of 30 students, where the coding rule for different tools is as follows: 1 for bus, 2 for feet, 3 for motorcycle and 4 for other. The mode for the sample is 1.

    transportation tool    1    2    3    4
    count                 10    8    7    5

Summary statistics for dispersion. Let $(X_1, \ldots, X_n)$ be a sample with sample mean $\bar{X}$.

Mean deviation: $\frac{1}{n}\sum_{i=1}^{n} |X_i - \bar{X}|$.

Sample variance ( 樣本變異數 ): $\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$.
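The two expressions for the sample variance above can be checked numerically. The following is a minimal sketch in R, using the sample (2, 7, 3) from Example 1; the variable names (`mean_dev`, `var_def`, `var_short`) are illustrative and do not come from the notes.

```r
# Sample from Example 1
x <- c(2, 7, 3)
n <- length(x)
xbar <- mean(x)

# Mean deviation: average absolute distance from the sample mean
mean_dev <- sum(abs(x - xbar)) / n

# Sample variance: definition form and shortcut form
var_def <- sum((x - xbar)^2) / (n - 1)
var_short <- (sum(x^2) - n * xbar^2) / (n - 1)

# Both forms should agree with each other and with R's built-in var()
stopifnot(isTRUE(all.equal(var_def, var_short)),
          isTRUE(all.equal(var_def, var(x))))
```

The shortcut form needs only the sum of squares and the mean, which is convenient for hand calculation, although the definition form is usually less prone to rounding problems when the values are large relative to their spread.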

Sample standard deviation ( 樣本標準差 ): $\sqrt{\text{sample variance}}$.

Example 3. Consider the sample (2, 7, 3). Find the mean deviation, the sample variance and the sample standard deviation.

Sol. From Example 1, the sample mean for the sample (2, 7, 3) is 4, so the mean deviation is

$$\frac{1}{3}\left(|2 - 4| + |7 - 4| + |3 - 4|\right) = 2,$$

the sample variance is

$$\frac{1}{2}\left((2 - 4)^2 + (7 - 4)^2 + (3 - 4)^2\right) = \frac{1}{2}\left(2^2 + 7^2 + 3^2 - 3 \cdot 4^2\right) = 7,$$

and the sample standard deviation is $\sqrt{7} \approx 2.64575$.

To find the sample variance and sample standard deviation for the sample (2, 7, 3) using R, at the R prompt, enter

    x <- c(2,7,3); var(x); sd(x)

Then R returns the sample variance and sample standard deviation.

Chebyshev's Theorem. For a sample $(X_1, \ldots, X_n)$ with sample mean $\bar{X}$ and sample standard deviation $S$, and any $k > 1$,

$$\frac{\text{number of } X_i\text{'s such that } |X_i - \bar{X}| \le kS}{n} \ge 1 - \frac{1}{k^2}.$$

Example 4. Suppose that we have a sample of 1000 exam scores, where the sample mean and sample standard deviation are 75 and 2 respectively. At least what percent of the scores are between 70 and 80?

Sol. Note that $(80 - 75)/2 = 2.5$ and $(70 - 75)/2 = -2.5$, so the range $75 \pm (2.5)(2)$ is the range from 70 to 80. Take $k = 2.5$ and apply Chebyshev's Theorem: at least $1 - 1/(2.5)^2 = 84\%$ of the scores are within the range $75 \pm (2.5)(2)$, so at least 84% of the scores are between 70 and 80.

Example 5. Suppose that we have a sample of 1000 exam scores, where the sample mean and sample standard deviation are 75 and 2 respectively. Find a range that covers at least 80% of the scores.

Sol. Solving $1 - 1/k^2 = 0.8$ gives $k = \sqrt{5}$. By Chebyshev's Theorem, at least 80% of the scores are in the range from $75 - 2\sqrt{5} \approx 70.52786$ to $75 + 2\sqrt{5} \approx 79.47214$.

Histogram construction for a sample $(X_1, \ldots, X_n)$ based on Scott's rule.

1. Determine $k$: the number of classes. Choose $k$ to be the smallest integer such that

$$k \ge \frac{\max_i X_i - \min_i X_i}{(24\sqrt{\pi})^{1/3}\, S\, n^{-1/3}} \approx \frac{\max_i X_i - \min_i X_i}{3.5\, S\, n^{-1/3}},$$

where $S$ is the sample standard deviation.

2. Determine the class width (called class interval in the text). Let $I$ be the class width; then $I = (\max_i X_i - \min_i X_i)/k$, or $I$ can be the smallest number such that $I \ge (\max_i X_i - \min_i X_i)/k$ and $I$ is a multiple of $I_0$, where $I_0$ is chosen for convenience (usually 10 or 100).

3. Determine the class limits for each class.

Remarks.

Put approximately equal amounts of the excess in each of the two tails. Use convenient class limits; make the lower limit of the first class a multiple of the class width if possible.

In the textbook, $k$ is chosen so that $2^k \ge n$, which was suggested by Sturges (1926). Scott (1979) proposed to use the class width $(24\sqrt{\pi})^{1/3} \sigma / n^{1/3}$, where $\sigma$ can be estimated by the sample standard deviation $S$. The constant $(24\sqrt{\pi})^{1/3} \sigma$ in Scott's rule is chosen to minimize the integrated mean squared error for the normalized histogram as a density estimator when the sample is a random sample from a normal distribution (we will learn about density, random sample and normal distribution later).

Note that Steps 2 and 3 can be simplified by taking the class width $I = (\max_i X_i - \min_i X_i)/k$, but here we take the class width and class limits to be multiples of $I_0$ to make it easier to read the resulting frequency table.
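Step 2 and the first remark can be sketched in R as follows. This is only an illustration under stated assumptions: the function name `class_limits` and its arguments are hypothetical, the width is rounded up to a multiple of `I0`, and the excess is simply split equally between the two tails without rounding the lower limit to a convenient value.

```r
# Hypothetical helper (not from the notes): class width and class limits.
# x: numeric sample; k: number of classes; I0: convenient unit for the width.
class_limits <- function(x, k, I0 = 10) {
  rng <- max(x) - min(x)
  # Step 2: smallest multiple of I0 that is at least range / k
  I <- I0 * ceiling(rng / k / I0)
  # Amount by which the k classes together exceed the data range
  excess <- k * I - rng
  # Remark: put half of the excess below the minimum, half above the maximum
  lower <- min(x) - excess / 2
  list(width = I, limits = lower + I * (0:k))
}

# Only the minimum and maximum matter here; values as in Example 6, with k = 8
res <- class_limits(c(5546, 5925), k = 8, I0 = 10)
res$width
res$limits
```

With these inputs the width is 50 and the limits run from 5535.5 to 5935.5. In practice one would usually shift the limits slightly to rounder values, as the remark about convenient class limits suggests.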

Example 6. For a sample of size 999 with minimum 5546, maximum 5925 and sample standard deviation 143.289, determine the number of classes for drawing a histogram using Scott's rule.

Sol. Choosing the smallest integer $k$ such that

$$k \ge \frac{5925 - 5546}{(24\sqrt{\pi})^{1/3} \cdot 143.289 \cdot (999)^{-1/3}} \approx \frac{5925 - 5546}{3.5 \cdot 143.289 \cdot (999)^{-1/3}} \approx 7.6$$

gives $k = 8$.

Drawing a histogram using R. Suppose that the sample has been generated and stored in a vector x in R by running

    x <- qnorm(seq(0.001, 1-0.001, 0.001))*20000/316
    x <- x-min(x)+5546; x <- c(x[x<5925], 5925)

Below is the R code for drawing a histogram for x based on Scott's rule.

    c3.5 <- (24*sqrt(pi))^(1/3)               # constant in Scott's rule, about 3.5
    width <- c3.5*sd(x)*length(x)^(-1/3)      # Scott's class width
    range <- max(x)-min(x)
    k <- ceiling(range/width)                 # number of classes
    brks <- seq(min(x), by=range/k, length.out=k+1)
    hist(x, breaks=brks)

For a histogram that shows a shape with a unique peak (the mode), we can tell from the histogram

1. the central location of the data,
2. the range for most of the data (for example the range for the middle 50% of the data), and
3. whether the shape is symmetric about the peak.

If the shape of the histogram is essentially symmetric about the peak, then the mode and the median for the binned data are approximately the same, and it is natural to use the peak location as the central location of the data. If the histogram is essentially asymmetric, then the mode and the median are not the same. For a histogram that shows more than one peak, we can still tell where most of the data are located from the histogram.

Try to determine the central location(s) and the range for most of the data for each of the following histograms.

Left-upper histogram. Mode and Median: 0. At least 50% of the data are between $-1.5$ and $1.5$. All data are between $-4$ and $4$.

[Figure: four histograms arranged in a 2 × 2 grid (left-upper, right-upper, left-bottom, right-bottom).]

Right-upper histogram. Mode and Median: 0. At least 50% of the data are between $-0.0015$ and $0.0015$. All data are between $-0.004$ and $0.004$.

Left-bottom histogram. Mode: 0.286. Median > 0.286. At least 50% of the data are between 0.2 and 0.6. All data are between 0 and 1.

Right-bottom histogram. Most data are near $-2$ or $2$. At least 25% of the data are between $-3$ and $-1$ and at least another 25% of the data are between $1$ and $3$. All data are between $-6$ and $6$.

References

[1] D. W. Scott, On optimal and data-based histograms, Biometrika, 66 (1979), pp. 605-610.
[2] H. A. Sturges, The choice of a class interval, Journal of the American Statistical Association, 21 (1926), pp. 65-66.