Statistics I Chapter 1: Introduction


Contents
- What is Statistics? Definition
- Key words: population, parameter, sample, statistic, population size, sample size, individuals, objects
- Types of variables: categorical (ordinal, nominal) and numerical (discrete, continuous)
- Why sample?
- Definition of a simple random sample
- Frequencies and frequency distribution (table): absolute, absolute cumulative, relative, relative cumulative. Properties.

Recommended reading
- Peña, D., Romo, J., Introducción a la Estadística para las Ciencias Sociales, Chapters 1, 2, 3
- Newbold, P., Estadística para los Negocios y la Economía (2009), Chapter 1 and Sections 2.1, 2.4, 2.7
- Huff, D., How to Lie with Statistics

Definition of Statistics
Def. Statistics is a science that deals with:
- collecting, organizing, summarizing and presenting data (Descriptive Statistics);
- interpreting and processing data to transform them into information: predictions, forecasts, estimation (Inferential Statistics).

On what occasions have you heard or seen the word "statistics"?
- a football or tennis match summary
- unemployment rates, the number of people injured in car accidents
There is much more to statistics than percentages and counts!

Key words
- A population is the complete collection of all items (individuals, objects, subjects) of interest or under investigation. N represents the population size.
- A sample is an observed subset of the population, typically chosen to investigate the properties of the parent population. n represents the sample size.
- A parameter is a specific characteristic of a population (it is fixed).
- A statistic is a specific characteristic of a sample (it varies from sample to sample).
- A variable is a characteristic of an individual.

Examples
- Population: all students at UC3M. Variable: height, with values in $(0, \infty)$. Parameter: average height of all students. Statistic: average height of the sampled students.
- Population: all fish in a sea. Variable: size, {L, M, S}. Parameter: number of small fish in the entire sea. Statistic: number of small fish caught.
- Population: all patients of Getafe Hospital. Variable: blood type, {A, B, AB, O}. Parameter: percentage of all patients with blood type AB. Statistic: percentage of sampled patients with blood type AB.
- Population: all Philips light bulbs. Variable: life expectancy in days, {0, 1, 2, ...}. Parameter: variation in life expectancy of all light bulbs. Statistic: variation in life expectancy of the sampled light bulbs.

Types of data

Data (variable)
- Categorical (qualitative)
  - Ordinal: the classes can be ranked. Example: clothes size, L > M > S
  - Nominal: no natural order. Example: blood type, A, B, AB, O
- Numerical (quantitative)
  - Discrete: integer values. Example: number of children, 0, 1, 2, ...
  - Continuous: non-integer values are possible. Example: height, 1.55 m, 1.71 m

Notation: the letters X, Y, Z are typically used.
- Upper-case letters in the definition of a variable. Example: X = height in m.
- Lower-case letters for specific values. Example: x = 1.55.
- Subscripts are added if there is more than one value: $x_1 = 1.55$, $x_2 = 1.71$.

Why sample? In practice we do not study the whole population because:
- we may destroy the population (e.g. when measuring the life expectancy of a light bulb);
- the population may exist as a concept but not in reality (e.g. the population of defective items);
- it is impractical (e.g. the population of all fish in a sea);
- it is too expensive;
- it is too time-consuming.

Definition of a simple random sample (SRS)
Def. A simple random sample is obtained in such a way that:
- each member of the population is chosen strictly by chance,
- each member of the population is equally likely to be chosen, and
- every possible sample of n objects is equally likely to be chosen.

Notation: a sample of size n from a variable X means that:
- we have n individuals selected at random from a population;
- for each of the individuals we record the value of the variable X.
If X is categorical or discrete, it is convenient to write the different values that X takes in the sample as $x_1, x_2, \ldots, x_k$, with $k \le n$ (ranked from the smallest to the largest, unless X is nominal).

Frequencies and frequency distribution
Def. A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which the data fall within each class or category.
Frequencies:
- absolute: the number of times the value appeared in the sample;
- relative: the proportion of times the value appeared in the sample.
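
The SRS definition can be illustrated with a minimal Python sketch, assuming the population can be listed as a Python sequence. The helper name `simple_random_sample` and the ten labelled individuals are hypothetical. The standard-library method `random.Random.sample` draws n distinct members without replacement, and every possible sample of size n is equally likely, which is what the definition requires.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return a simple random sample of size n drawn without replacement.

    Hypothetical helper: random.Random.sample picks n distinct members,
    and every possible sample of n members is equally likely.
    """
    if not 0 <= n <= len(population):
        raise ValueError("the sample size n must satisfy 0 <= n <= N")
    rng = random.Random(seed)  # a fixed seed only makes the run reproducible
    return rng.sample(population, n)


# Hypothetical population of N = 10 labelled individuals.
population = [f"individual_{i}" for i in range(1, 11)]
sample = simple_random_sample(population, n=4, seed=1)
print(len(population), len(sample), sample)
```

Changing the seed (or omitting it) gives a different sample each time, which is why a statistic computed from the sample varies from sample to sample.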

Why use frequency distributions?
- A frequency distribution is a way to summarize data.
- The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data.

Grouping by classes: categorical and discrete data

Class   Absolute    Relative       Cumulative absolute   Cumulative relative
x_i     freq. n_i   freq. f_i      freq. N_i             freq. F_i
x_1     n_1         f_1 = n_1/n    N_1 = n_1             F_1 = f_1
x_2     n_2         f_2 = n_2/n    N_2 = N_1 + n_2       F_2 = F_1 + f_2
...     ...         ...            ...                   ...
x_k     n_k         f_k = n_k/n    N_k = n               F_k = 1
Total   n           1

Note:
- $n_i$ = number of times $x_i$ appears in the sample, $f_i = n_i / n$
- $N_i = N_{i-1} + n_i$, $F_i = F_{i-1} + f_i$
- $0 \le f_i \le 1$ and $0 \le F_i \le 1$
- $F_i$ and $N_i$ do not make sense for categorical nominal variables
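
The columns of this table can be computed with a short Python sketch. The helper name `frequency_table` and the six toy observations are hypothetical; classes are sorted from smallest to largest unless an explicit order is passed (as an ordinal variable would need). Only standard-library tools are used: `collections.Counter` for the absolute frequencies and `itertools.accumulate` for the cumulative sums.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(data, order=None):
    """Return rows (x_i, n_i, f_i, N_i, F_i) for categorical or discrete data.

    `order` optionally fixes the class order (e.g. for an ordinal variable);
    by default the distinct values are sorted from smallest to largest.
    The cumulative columns are meaningless for nominal variables.
    """
    counts = Counter(data)
    n = len(data)
    classes = list(order) if order is not None else sorted(counts)
    abs_freq = [counts[x] for x in classes]   # n_i
    rel_freq = [c / n for c in abs_freq]      # f_i = n_i / n
    cum_abs = list(accumulate(abs_freq))      # N_i = N_{i-1} + n_i
    cum_rel = [c / n for c in cum_abs]        # F_i = N_i / n, i.e. F_{i-1} + f_i
    return list(zip(classes, abs_freq, rel_freq, cum_abs, cum_rel))


# Toy discrete sample of size n = 6 (hypothetical values).
for row in frequency_table([2, 0, 1, 2, 2, 1]):
    print(row)
```

Computing $F_i$ as $N_i / n$ rather than by summing rounded $f_i$ values guarantees that the last cumulative relative frequency is exactly 1.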

Grouping by classes
Example 1: The data below show the blood types reported for a sample of 40 individuals.
AB, A, B, O, A, A, A, B, O, AB, B, O, B, B, B, A, A, A, AB, B, O, A, A, A, AB, AB, O, B, B, AB, O, B, O, O, A, A, O, B, AB, AB
- What kind of variable is blood type?
- Find a frequency distribution of the data.
- What percentage of the sampled people have blood type A?
- What percentage of the individuals have a blood type other than O?

Example 1 (cont.): Blood type is a categorical, nominal variable with 4 different classes. The frequency distribution is:

Class   Absolute frequency   Relative frequency
A       12                   0.300
B       11                   0.275
AB      8                    0.200
O       9                    0.225
Total   40                   1

- Blood type A: 30%.
- Blood type other than O: 100% − 22.5% = 77.5%.
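
The counts in Example 1 can be checked with a few lines of Python. This is only a sketch using the standard-library `collections.Counter`, with the 40 observations copied from the data of Example 1.

```python
from collections import Counter

# The 40 blood types from Example 1, in the order listed.
blood = ("AB A B O A A A B O AB B O B B B A A A AB B O "
         "A A A AB AB O B B AB O B O O A A O B AB AB").split()

counts = Counter(blood)
n = len(blood)

# Nominal variable: the class order is arbitrary, so no cumulative columns.
for group in ["A", "B", "AB", "O"]:
    print(group, counts[group], counts[group] / n)

pct_A = 100 * counts["A"] / n                # percentage with type A
pct_not_O = 100 * (n - counts["O"]) / n      # complement of type O
print(pct_A, pct_not_O)                      # 30.0 77.5
```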

Grouping by classes
Example 2: The table below shows different levels of satisfaction (VU = very unsatisfied, U = unsatisfied, S = satisfied, VS = very satisfied) for 901 employees.

Class   Absolute frequency
VU      62
U       108
S       319
VS      412
Total   901

- What type of variable is being studied?
- Find a frequency distribution of the data.
- What percentage of the sampled people are satisfied?
- How many individuals are unsatisfied or worse? In %?
- How many individuals are at least satisfied? In %?

Example 2 (cont.): Satisfaction is a categorical, ordinal variable with 4 different classes. The frequency distribution is (n_i = absolute, f_i = relative, N_i = cumulative absolute, F_i = cumulative relative frequency; relative frequencies rounded to two decimals):

Class   n_i   f_i    N_i   F_i
VU      62    0.07   62    0.07
U       108   0.12   170   0.19
S       319   0.35   489   0.54
VS      412   0.46   901   1
Total   901   1

- Satisfied: 35%.
- Unsatisfied or worse: 170 individuals, i.e. 19%.
- At least satisfied: 319 + 412 = 731 individuals (or 901 − 170 = 731), i.e. 35% + 46% = 81% (or 100% − 19% = 81%).
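
Because the satisfaction levels are ordinal, the cumulative columns can be built by summing the absolute frequencies from the worst class to the best. A minimal Python sketch using the frequencies of Example 2, with relative frequencies rounded to two decimals as in the table:

```python
from itertools import accumulate

# Ordinal classes from worst to best, with their absolute frequencies.
classes = ["VU", "U", "S", "VS"]
n_i = [62, 108, 319, 412]
n = sum(n_i)                          # 901 employees

N_i = list(accumulate(n_i))           # cumulative absolute: 62, 170, 489, 901
f_i = [round(c / n, 2) for c in n_i]  # relative, rounded as in the table
F_i = [round(c / n, 2) for c in N_i]  # cumulative relative, rounded

for row in zip(classes, n_i, f_i, N_i, F_i):
    print(row)

unsatisfied_or_worse = N_i[classes.index("U")]  # cumulative count up to U
at_least_satisfied = n - unsatisfied_or_worse   # complement
print(unsatisfied_or_worse, at_least_satisfied, round(at_least_satisfied / n, 2))
```

Using the complement (901 − 170) avoids adding up the upper classes one by one, which is the same shortcut used in the answers.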

Grouping by classes
Example 3: To evaluate the performance of a new pesticide, a sample of 50 plants treated with the new pesticide was selected. The number of leaves attacked by a pest was counted for each of the sampled plants. The results are shown below.

x_i     Absolute frequency
0       6
1       10
2       12
3       8
4       5
5       4
6       3
8       1
10      1
Total   50

Example 3 (cont.):
- What can you say about the variable in the study?
- Find its frequency distribution.
- What percentage of the sampled plants had exactly 3 leaves attacked?
- How many plants had no more than 3 leaves attacked?
- How many plants had at least 6 leaves attacked?
- What percentage of plants had between 3 and 5 leaves attacked (inclusive)?
- What percentage of plants had at least 8 leaves attacked?
- What percentage of plants had at most 2 leaves attacked?

Example 3 (cont.): The number of attacked leaves is a numerical, discrete variable with 9 different values. The frequency distribution is (n_i = absolute, f_i = relative, N_i = cumulative absolute, F_i = cumulative relative frequency):

x_i     n_i   f_i    N_i   F_i
0       6     0.12   6     0.12
1       10    0.20   16    0.32
2       12    0.24   28    0.56
3       8     0.16   36    0.72
4       5     0.10   41    0.82
5       4     0.08   45    0.90
6       3     0.06   48    0.96
8       1     0.02   49    0.98
10      1     0.02   50    1
Total   50    1

- Exactly 3 leaves: 16%.
- No more than 3 leaves: 36 plants.
- At least 6 leaves: 3 + 1 + 1 = 5 plants (or 50 − 45 = 5).
- Between 3 and 5 leaves: 16% + 10% + 8% = 34% (or (8 + 5 + 4)/50 = 34%).
- At least 8 leaves: 2% + 2% = 4% (or 100% − 96% = 4%).
- At most 2 leaves: 56%.
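
Each question in Example 3 asks for the number (or proportion) of plants whose count lies in some range, so one helper covers all of them. A Python sketch, where the hypothetical helper `count_between` counts values in an inclusive range and an infinite upper limit expresses "at least":

```python
# Number of attacked leaves (x_i) and absolute frequencies (n_i) from Example 3.
freq = {0: 6, 1: 10, 2: 12, 3: 8, 4: 5, 5: 4, 6: 3, 8: 1, 10: 1}
n = sum(freq.values())  # 50 plants


def count_between(lo, hi):
    """Number of plants with lo <= x <= hi attacked leaves (both ends included)."""
    return sum(c for x, c in freq.items() if lo <= x <= hi)


inf = float("inf")
print(count_between(3, 3) / n)    # exactly 3 leaves: 0.16
print(count_between(0, 3))        # no more than 3: 36
print(count_between(6, inf))      # at least 6: 5
print(count_between(3, 5) / n)    # between 3 and 5: 0.34
print(count_between(8, inf) / n)  # at least 8: 0.04
print(count_between(0, 2) / n)    # at most 2: 0.56
```

"No more than 3" and "at most 2" are read directly from the cumulative column $N_i$, which is why the cumulative frequencies are useful for discrete data.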

Grouping by class intervals: continuous (and discrete) data

Class interval    Midpoint                   n_i   f_i   N_i   F_i
[l_0, l_1)        x_1 = (l_0 + l_1)/2        n_1   f_1   N_1   F_1
[l_1, l_2)        x_2 = (l_1 + l_2)/2        n_2   f_2   N_2   F_2
...               ...                        ...   ...   ...   ...
[l_{k-1}, l_k]    x_k = (l_{k-1} + l_k)/2    n_k   f_k   n     1
Total                                        n     1

Note:
- The midpoint of the class $[l_{i-1}, l_i)$ is $x_i = (l_{i-1} + l_i)/2$.
- The left end-point is included, but the right end-point is excluded (typical convention); the last interval also includes its right end-point.
- The reverse end-point convention can also be applied; check your software for its definition.
- Class intervals are also useful for tabulating discrete data if X takes many values.

Grouping by class intervals: continuous (and discrete) data
- Very often class intervals have the same width.
- Determine the width w of each interval by $w = \dfrac{\text{largest value} - \text{smallest value}}{\text{number of desired intervals}}$.
- How many intervals? Roughly between 5 and 20. More specifically: $k \approx \sqrt{n}$ if n is small, and $k \approx 1 + 3.322 \log_{10}(n)$ if n is large (Sturges' rule).
- Intervals never overlap.
- Round up the interval width to get convenient interval end-points.
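
The rules of thumb for the number of classes, $k \approx \sqrt{n}$ and Sturges' rule $k \approx 1 + 3.322 \log_{10}(n)$ (equivalently $1 + \log_2 n$), together with the width formula, can be written as a small Python sketch. The function names are hypothetical, and rounding the width up to a multiple of `unit` is one way (an assumption, not prescribed in the notes) to obtain convenient end-points. Since no cut-off between "small" and "large" n is given, both values of k are returned.

```python
import math


def suggested_classes(n):
    """Return (square-root rule, Sturges' rule) suggestions for the number of classes k."""
    sqrt_rule = math.sqrt(n)              # k ≈ √n, for small n
    sturges = 1 + 3.322 * math.log10(n)   # k ≈ 1 + 3.322 log10(n), for large n
    return sqrt_rule, sturges


def class_width(data, k, unit=1):
    """Equal width w = (largest - smallest) / k, rounded up to a multiple of `unit`."""
    raw = (max(data) - min(data)) / k
    return math.ceil(raw / unit) * unit


print(suggested_classes(20))       # roughly (4.47, 5.32)
print(class_width([12, 58], k=5))  # 46 / 5 = 9.2, rounded up to 10
```

Both suggestions are non-integers, so in practice k is rounded to a nearby whole number and then adjusted so that the end-points are convenient.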

Grouping by class intervals: continuous (and discrete) data
Example 4: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature (in °F):
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Find the frequency distribution of the data.
1. Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find the range: 58 − 12 = 46.
3. Select the number of classes: say k = 5.
4. Compute the interval width: 46/5 = 9.2, rounded up to 10.
5. Determine the end-points: 10 but less than 20, 20 but less than 30, etc.
6. Count the observations and assign them to classes.

Example 4 (cont.):

Class interval   Midpoint   n_i   f_i    N_i   F_i
[10, 20)         15         3     0.15   3     0.15
[20, 30)         25         6     0.30   9     0.45
[30, 40)         35         5     0.25   14    0.70
[40, 50)         45         4     0.20   18    0.90
[50, 60]         55         2     0.10   20    1
Total                       20    1

- On how many days was the temperature below 30°F? In %? 3 + 6 = 9 days, which is 45%.
- On approximately how many days was the temperature at least 45°F? In %? Assuming the 4 observations in [40, 50) are spread uniformly over that class, about $4 \times \frac{50 - 45}{50 - 40} = 2$ of them are at least 45°F, so the answer is 2 + 2 = 4 days, which is 20%.
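
A Python sketch of the tabulation in Example 4, assuming the left-closed convention with a closed last class. The helper name `bin_counts` is hypothetical; the standard-library function `bisect.bisect_right` locates the class of each value. The last lines reproduce the count below 30°F and the approximate count at or above 45°F, assuming the values in [40, 50) are spread uniformly over that class.

```python
import bisect

# Daily high temperatures (°F) from Example 4 and the class limits l_0, ..., l_5.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
edges = [10, 20, 30, 40, 50, 60]


def bin_counts(data, edges):
    """Counts per class [l_{i-1}, l_i); the last class [l_{k-1}, l_k] is closed."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the class limits")
        # bisect_right returns the index of the first edge strictly greater than x
        i = bisect.bisect_right(edges, x) - 1
        counts[min(i, len(counts) - 1)] += 1  # x == l_k goes to the last class
    return counts


n_i = bin_counts(temps, edges)
print(n_i)  # [3, 6, 5, 4, 2]

below_30 = sum(n_i[:2])  # classes [10, 20) and [20, 30)
# Uniform-spread assumption: the share of [40, 50) at or above 45 is (50 - 45)/(50 - 40).
at_least_45 = n_i[4] + n_i[3] * (50 - 45) / (50 - 40)
print(below_30, below_30 / len(temps), at_least_45)
```

The interpolated value 4 is only an approximation: the raw data contain three days at or above 45°F (46, 53, 58), which illustrates the information lost when data are grouped into intervals.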