Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we're interested in studying.



Lecture 1

Main Topics:
Definitions: Statistics, Population, Sample, Random Sample, Statistical Inference
Type of Data
Scales of Measurement
Describing Data with Numbers
Describing Data Graphically

1. Definitions.
Example (unemployment): Suppose we want to know the unemployment rate in the country. This is the number of unemployed people divided by the number of people in the labor force. This is estimated by randomly selecting and surveying approximately 6000 adults. The unemployment rate for these 6000 adults is used to estimate the unemployment rate for the whole country.
Statistics: A science of information.
Population: The population is the collection of all subjects we're interested in studying.
Sample: The sample is a subset of the population.
Random Sample: If a sample is selected randomly, that is, every subject in the population has the same chance to be chosen, then the sample is called a random sample.
Statistical inference: Drawing conclusions about the population based on the sample.

2. Type of Data.

(1) Why worry about the type of data?
Different methods of analysis are appropriate for different types of data.
(2) Types of data
There are two main types of data:
Qualitative (Categorical) data: It conveys a quality. Examples of qualitative data: occupation, gender, student's major, political affiliation.
Quantitative (Numerical) data: It conveys a quantity. Examples of quantitative data: income in dollars, number of employees in a company, commuting distance in miles.
(3) Caution
Qualitative data can consist of numbers. For example, we might code men as 1 and women as 0, or we may code quality of product as 2 for excellent, 1 for good, 0 for defective. But, of course, computing things like means doesn't make sense.

3. Scales of Measurement.
There are four generally used scales of measurement. From weakest to strongest, they are:
Nominal Scale:
Ordinal Scale:
Interval Scale:
Ratio Scale:

4. Numerical Summaries of Data
Summarizing a data set numerically and graphically is very important. Numerical summaries we'll learn about include: percentiles, mean, standard deviation.

(1). Percentiles
Objective: Percentiles are mainly used to describe the distribution of quantitative data.
What is a percentile? The P-th ($0 \le P \le 100$) percentile of a group of numbers is that value below which lie P% of the numbers in the group.
Algorithm to find the percentile. First, order the data from smallest to largest. Second, the position of the P-th percentile is $(n+1)P/100$, where $n$ is the number of observations in the data set. If the position is a whole number, then the P-th percentile is the number in that location; if the position is not a whole number, take the weighted average of the two numbers surrounding the position: let $f$ be the fractional part of the position and let $i$ be the greatest integer less than the position. Let $a$ be the number at position $i$ and let $b$ be the number at position $i+1$. Then the P-th percentile is $(1-f)\,a + f\,b$.
Example (Example 1-2): A large department store collects data on sales made by each of its salespeople. The data, number of sales made on a given day by each of 20 salespeople, are as follows:
9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
Find the 50th, 80th, and 90th percentiles of this data set.
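The position rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `percentile`, the data set `scores`, and the choice to return the smallest or largest observation when the position falls outside the data are assumptions, not part of the lecture.

```python
def percentile(data, p):
    """P-th percentile using the (n + 1) * P / 100 position rule."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position in the ordered data
    i = int(position)              # whole part of the position
    f = position - i               # fractional part of the position
    if i < 1:                      # assumption: clamp to the smallest value
        return ordered[0]
    if i >= n:                     # assumption: clamp to the largest value
        return ordered[-1]
    if f == 0:                     # whole-number position: take that value
        return ordered[i - 1]
    a, b = ordered[i - 1], ordered[i]
    return (1 - f) * a + f * b     # weighted average of the two neighbours


# Hypothetical data set of n = 9 observations (made up for illustration).
scores = [15, 20, 35, 40, 50, 55, 60, 70, 85]
for p in (25, 50, 80):
    print(f"{p}th percentile: {percentile(scores, p)}")
```

With nine observations the 25th percentile sits at position 2.5, so it is the midpoint of the second and third ordered values.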

Comments:
There are many different algorithms for computing percentiles. The algorithm that we're using is arguably not the best. The differences among the algorithms fade as $n$ gets large, so we'll stick with the text algorithm.
The 50th percentile is also called the median.
The 25th percentile is also called the first quartile.
The 75th percentile is also called the third quartile.

(2). Mean
Objective: Measure the central tendency of the data set.
Let $x_1, x_2, \ldots, x_n$ be the observations in the data set. The mean of this data set is their average. More specifically,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example: Calculate the mean of the observations of Example 1-2.
Comments:
The mean is not resistant to outliers; the median is resistant to outliers.
To fully describe the data set, the mean is not enough.
Example: Two statistics classes take an exam. The first class has scores of 73, 74, 75, 76, 77; the second class has scores of 50, 60, 75, 90, 100. Both classes have a mean score of 75. But there is a big difference (the second class's scores are more variable) that is not reflected in the means.

(3). Range, interquartile range, variance and standard deviation.
Objective: Measure the variability of the data set.
Measures of variability.
Range: Range = Maximum - Minimum;

Interquartile Range: IQR = Third Quartile - First Quartile;
Variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2\right]$$
Standard Deviation: $s = \sqrt{s^2}$

Empirical Rule: For symmetric and bell-shaped data.
1. About 68% of data within one stdev. of mean.
2. About 95% of data within two stdevs. of mean.
3. About 99.7% of data within three stdevs. of mean.

Chebyshev's Rule: For any data set.
1. At least 3/4 (75%) of data within two stdevs. of mean.
2. At least 8/9 (89%) of data within three stdevs. of mean.
3. In general, at least $1 - 1/k^2$ of the data within $k$ stdevs. of mean.
4. Doesn't say anything about one stdev.
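As a rough check of these measures, the sketch below computes the sample variance in both forms (deviations from the mean, and the sum-of-squares shortcut) for the two exam classes from the mean example, and then compares the share of scores within two standard deviations with Chebyshev's bound. The function names are illustrative, and the divisor $n-1$ is assumed for the sample variance.

```python
import math
from statistics import mean


def sample_variance(xs):
    """Sample variance from squared deviations about the mean (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Same quantity via sum of squares minus (sum)^2 / n."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def fraction_within(xs, k):
    """Share of observations within k sample standard deviations of the mean."""
    xbar = sum(xs) / len(xs)
    s = math.sqrt(sample_variance(xs))
    return sum(abs(x - xbar) <= k * s for x in xs) / len(xs)


class_one = [73, 74, 75, 76, 77]
class_two = [50, 60, 75, 90, 100]

for name, scores in (("first class", class_one), ("second class", class_two)):
    s2 = sample_variance(scores)
    print(f"{name}: mean = {mean(scores)}, range = {max(scores) - min(scores)}, "
          f"variance = {s2}, stdev = {math.sqrt(s2):.2f}")

# Chebyshev's rule with k = 2 promises at least 1 - 1/4 = 75% of the data.
print("within 2 stdevs:", fraction_within(class_two, 2), ">= 0.75")
```

Both classes have the same mean, but the range and variance separate them clearly, which is the point of reporting a measure of spread alongside the mean.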

5. Describing Data Graphically

Stem and Leaf Plots (Applied to small numerical data sets).
Example: Here are Babe Ruth's home run totals for the 15 years he played for the Yankees.
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
Here is a stem and leaf plot of these data.
2 | 2 5
3 | 4 5
4 | 1 1 6 6 6 7 9
5 | 4 4 9
6 | 0
Sometimes a back-to-back stem and leaf plot allows us to quickly compare two data sets. Here is a back-to-back stem and leaf plot of Babe Ruth's home run totals and Mickey Mantle's home run totals.

Histogram (Applied to numerical data sets of any size).
How to create a histogram?
1. Find a lower bound, a, and an upper bound, b, of the data set.
2. Divide the interval [a, b] into n small subintervals (classes), where n here is the number of classes. The length of each subinterval is (b-a)/n.
3. Count how many observations fall into each subinterval. (The count is called the frequency.)
4. Calculate the relative frequency in each subinterval.

5. Construct an x-y coordinate system. Put the subintervals on the x-axis; the y-axis represents the frequency, relative frequency, or density. Over each subinterval draw a bar with height equal to the frequency, relative frequency, or density, where density is defined by:
Density = Relative Frequency / Length of the subinterval.

Example (Mercury in lakes): Data were collected on mercury concentrations (parts per million) in 5 Florida lakes. Some of the data are .3, 7.00, 6.00, 0.44. Here are the data divided into classes of width 2 ppm: 0 to 2, 2 to 4, 4 to 6, 6 to 8, 8 to 10, 10 to 12, 12 to 14, 14 to 16.
[Table: Classes and Number of Lakes in each class]
Here is a histogram of the mercury data.
[Figure: Histogram of Mercury Level; x-axis: Mercury Level (0 to 16), y-axis: Density (0.00 to 0.20)]
What do we learn from the above histogram?
Non-symmetric shape.
Many lakes with low mercury level, many lakes with high level, few in the middle.
Levels are all between 0 and 16 ppm.
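The five histogram steps can be sketched as a small function that returns the frequency, relative frequency, and density of each class. Everything named here is an assumption for illustration: the function `histogram_table`, the made-up `levels` data, and the convention that each class includes its lower endpoint but not its upper one (with the upper bound b placed in the last class).

```python
def histogram_table(data, a, b, num_classes):
    """Frequency, relative frequency, and density for equal-width classes on [a, b]."""
    width = (b - a) / num_classes
    counts = [0] * num_classes
    for x in data:
        # Assumed convention: class k covers [a + k*width, a + (k+1)*width);
        # a value equal to b is placed in the last class.
        k = min(int((x - a) / width), num_classes - 1)
        counts[k] += 1
    n = len(data)
    relative = [c / n for c in counts]
    density = [r / width for r in relative]
    return counts, relative, density


# Hypothetical concentrations in ppm (made up; not the lecture's data).
levels = [0.4, 1.2, 1.9, 2.5, 6.0, 7.0, 9.1, 13.5, 14.2, 15.8]
counts, relative, density = histogram_table(levels, a=0, b=16, num_classes=8)

for k, (c, r, d) in enumerate(zip(counts, relative, density)):
    print(f"{2 * k:>2} to {2 * k + 2:<2}  freq={c}  rel={r:.2f}  density={d:.3f}")
```

On the density scale the bar areas (density times class width) add up to 1, which is why density is convenient when comparing histograms with different class widths.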

Comments:
Different choices for the number of subintervals lead to different looking histograms.
Endpoints.
- Q: Should 6.00 go into the class 6 to 8 or the class 4 to 6?
- A: Just be consistent; if it goes into 6 to 8, then 10.00 should go into 10 to 12.
This histogram is drawn using a density scale on the y-axis. Sometimes, a frequency or a relative frequency scale is used. When the classes have equal width, the shape is the same no matter what the scale.
What we should look for in a histogram: symmetric; symmetric and bell-shaped; skewed to the right; skewed to the left; short tailed; long tailed; unimodal or multimodal; etc.
Effect of shape on mean and median. The mean gets pulled in the direction of the skewness. For right-skewed data, the mean is greater than the median. For left-skewed data, the mean is less than the median.

Box plots. How to draw a box plot?
The box extends from the first quartile to the third quartile.
A line is drawn at the median.
Whiskers extend from the upper quartile and lower quartile to the largest and smallest observations within a distance of 1.5*IQR of the quartiles.
Points outside this range are called outliers and are plotted separately.
Example. Here are data on standardized reading scores of 5th graders.
48 67 73 8 83 86 9 93 94 94 94 95 97 98 98 99 00 0 0 0 03 05 06 07 08 5 7 3 34 49
Draw a box plot for this data set.
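The numbers needed to draw a box plot can be computed as in the sketch below. It reuses the (n+1)P/100 position rule from the percentile section for the quartiles, which is one choice among several. The data set `reading_scores` is made up for illustration and is not the data of the example above; the function names are likewise hypothetical.

```python
def percentile(ordered, p):
    """P-th percentile of already-sorted data, (n + 1) * P / 100 position rule."""
    n = len(ordered)
    position = (n + 1) * p / 100
    i = int(position)
    f = position - i
    if i < 1:                      # assumption: clamp to the smallest value
        return ordered[0]
    if i >= n:                     # assumption: clamp to the largest value
        return ordered[-1]
    return (1 - f) * ordered[i - 1] + f * ordered[i]


def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    ordered = sorted(data)
    q1, med, q3 = (percentile(ordered, p) for p in (25, 50, 75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whiskers": (inside[0], inside[-1]),   # extreme observations inside the fences
        "outliers": outliers,                  # plotted separately
    }


# Hypothetical scores (made up for illustration).
reading_scores = [48, 67, 73, 83, 86, 93, 94, 95, 97, 98, 99, 103, 105, 134, 149]
summary = box_plot_summary(reading_scores)
print(summary)
```

Note that the whiskers stop at actual observations, not at the fences themselves; the fences only decide which points count as outliers.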

Bar Charts and Pie Charts
Example: The following is the frequency table of the racial composition of Ingham County, according to the 2000 census. Note that the relative frequency of a category is just the proportion of the data that are in that category.

Race                             Frequency   Relative Frequency
White                               221935   0.795
Black or Afr. Am.                    30340   0.109
Am. Indian or Alaska Native           1528   0.005
Asian                                10273   0.037
Native Hawaiian etc.                   143   0.001
Other                                 6746   0.024
Two or more races                     8355   0.030
Total                               279320   1.0

Following are a pie chart and a bar chart of the data.
[Figure: Pie Chart of Race]
[Figure: Chart of Race (bar chart); x-axis: Race, y-axis: Count]
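A frequency table of this kind is simple to build from raw categorical data. The sketch below counts categories and divides by the total to get relative frequencies; the list `majors` is a made-up example, not data from the lecture.

```python
from collections import Counter

# Hypothetical qualitative data: the major of each of eight students.
majors = ["Math", "Biology", "Math", "History", "Biology", "Math", "Physics", "Math"]

counts = Counter(majors)                 # frequency of each category
total = sum(counts.values())
relative = {category: count / total for category, count in counts.items()}

print(f"{'Major':<10}{'Frequency':>10}{'Relative Frequency':>20}")
for category, count in counts.most_common():
    print(f"{category:<10}{count:>10}{relative[category]:>20.3f}")
print(f"{'Total':<10}{total:>10}{sum(relative.values()):>20.3f}")
```

The frequencies are what a bar chart plots as bar heights, and the relative frequencies are the slice proportions of a pie chart.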