Introduction to Statistics

Introduction to Statistics By A.V. Vedpuriswar October 2, 2016

Introduction
The word Statistics is derived from the Italian word stato, which means state. Statista refers to a person involved with the affairs of state. Therefore, statistics originally meant the collection of facts useful to the Statista.

Nominal Scale, Ordinal Scale
Nominal Scale: In the nominal scale of measurement, numbers are used simply as labels for groups or classes.
Ordinal Scale: In the ordinal scale of measurement, data elements may be ordered according to their relative size or quality. We do not know how much better one element is than another, only that it is better.

Interval Scale, Ratio Scale
Interval Scale: In the interval scale of measurement, we can assign a meaning to distances between any two observations. The distances between elements can be measured in units.
Ratio Scale: The ratio scale is the most sophisticated scale of measurement. Here not only do distances between paired observations have a meaning, but so do the ratios of the observations themselves, because the scale has a true zero.

Samples and Populations
The population consists of the set of all measurements in which we are interested. The population is also called the universe. A sample is a subset of measurements selected from the population. If sampling is done randomly, such that every possible sample of n elements has an equal chance of being selected, it is called a simple random sample, or just a random sample.
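The definition of a simple random sample can be sketched in a few lines of Python. The population values, the function name `simple_random_sample`, and the seed are assumptions made for illustration only.

```python
import random

# Hypothetical population of 100 measurements (values are for illustration only).
population = list(range(1, 101))

def simple_random_sample(values, n, seed=None):
    """Draw n elements without replacement so every size-n subset is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(values, n)

sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Sampling without replacement is what makes every subset of size n equally likely; drawing with replacement would allow the same measurement to appear twice.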

Statistical Inference
A conclusion drawn about a population based on the information in a sample from the population is called a statistical inference.

Percentiles and Quartiles
The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. Quartiles are the percentiles that break the data set into quarters: the first, second, third, and fourth quarter.
The first quartile is the 25th percentile, below which lie 25% of the data.
The median is the 50th percentile, below which lie half the data.
The third quartile is the 75th percentile, below which lie 75% of the data.
The interquartile range is the difference between the third and first quartiles.
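As a rough illustration, the quartiles and the interquartile range can be computed with Python's standard `statistics` module (Python 3.8 or later). The data values are made up. Textbooks and software use several slightly different interpolation conventions for percentiles, so other tools may give somewhat different quartiles for the same data; this sketch uses the module's default method.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]  # illustrative values

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(data, n=4)

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print(q1, median, q3, iqr)
```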

Frequency and Histogram
The number of times a data point occurs in a data set is called its frequency. Relative frequency is the frequency divided by the total frequency. Data points are often classified into class intervals. The number of data points lying within each class interval is the frequency of the interval. A histogram is a plot of the frequencies of the class intervals.
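The following sketch shows one way to tabulate frequencies, relative frequencies, and class-interval counts, and to print a crude text histogram. The data, the helper name `class_frequencies`, and the choice of half-open intervals of width 5 are assumptions for illustration.

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 4, 7, 8, 9]  # illustrative values

# Frequency of each distinct value, and its share of the total.
frequency = Counter(data)
relative_frequency = {value: count / len(data) for value, count in frequency.items()}

def class_frequencies(values, lower, width, n_classes):
    """Count values in n_classes half-open intervals [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

counts = class_frequencies(data, lower=0, width=5, n_classes=2)

# Text histogram: one row per class interval, one star per data point.
for k, c in enumerate(counts):
    print(f"[{k * 5}, {k * 5 + 5}): {'*' * c}")
```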

Frequency Polygons and Ogives
A frequency polygon is similar to a histogram except that there are no rectangles, only a point in the middle of each interval at a height proportional to the frequency of the category of the interval. By adding up the frequencies, we get the cumulative frequency. An ogive is a cumulative-frequency (or cumulative relative-frequency) graph. An ogive starts at 0 and goes to 1.00 (for a relative-frequency ogive) or to the maximum cumulative frequency.
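The heights of an ogive are running totals of the class frequencies. A minimal sketch, assuming five hypothetical class counts:

```python
from itertools import accumulate

class_counts = [2, 5, 8, 4, 1]  # hypothetical frequencies of five class intervals

# Running totals give the cumulative frequency at the upper end of each class.
cumulative = list(accumulate(class_counts))
total = cumulative[-1]

# Dividing by the total gives the relative-frequency ogive; it starts at 0
# at the lower boundary of the first class and ends at 1.0.
ogive_points = [0.0] + [c / total for c in cumulative]

print(cumulative)
print(ogive_points)
```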

Box Plots
A box plot displays a set of five summary measures of the distribution of the data: the median, the lower quartile, the upper quartile, the smallest observation, and the largest observation.
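The five measures behind a box plot can be collected in one small function. The name `five_number_summary` and the data are illustrative assumptions, and the quartiles follow the default convention of `statistics.quantiles`, which is only one of several in use.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, lower quartile, median, upper quartile, largest)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

data = [3, 7, 8, 5, 12, 14, 21]  # illustrative values
summary = five_number_summary(data)
print(summary)
```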

Measures of Central Tendency
The median lies at the center of the data. Half the data lie below it and half above it. The median is thus a measure of centrality. The mode of the data set is the value that occurs most frequently. The mean of a set of observations is their average. It is equal to the sum of all observations divided by the number of observations in the set.
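A short sketch of the three measures using Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7]  # illustrative values

mean = sum(data) / len(data)     # sum of observations divided by their number
median = statistics.median(data) # middle value of the sorted data
mode = statistics.mode(data)     # most frequent value

# A data set can have several modes; multimode returns all of them.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean, median, mode, modes)
```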

More about the Mean
The mean is the most commonly used measure of central tendency. The mean summarizes all of the information in the data. The mean is the balance point of the observations: it is the centre of mass of the data.

Mean vs Median
The mean is based on information contained in all the observations in the data set, rather than being an observation lying in the middle of the set. The mean also has some desirable mathematical properties that make it useful in statistical inference. In cases where we want to guard against the influence of a few outlying observations (called outliers), however, we may prefer to use the median. The median is resistant to extreme observations.
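The resistance of the median can be seen by adding one extreme value to a small data set. The numbers below are invented for illustration.

```python
import statistics

data = [10, 12, 13, 15, 20]     # illustrative values
with_outlier = data + [500]     # same data plus one extreme observation

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is pulled far toward the outlier; the median barely moves.
print(mean_before, median_before)
print(mean_after, median_after)
```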

Mean vs Mode
The mode is less useful than the mean or even the median. There may be several modes in a data set. If a data set or population is symmetric and if the distribution of the observations has only one mode, then the mode, the median, and the mean are all equal.

Range
The range of a set of observations is the difference between the largest observation and the smallest observation. The range may get distorted due to outliers. The interquartile range is more resistant to extreme observations.
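To illustrate the difference in resistance, the sketch below replaces the largest observation with an extreme value and compares the range with the interquartile range. The data and the helper names are assumptions; the quartiles use the default convention of `statistics.quantiles`.

```python
import statistics

def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]       # illustrative values
distorted = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 180]  # largest value replaced by an outlier

print(value_range(data), interquartile_range(data))
print(value_range(distorted), interquartile_range(distorted))
```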

Variance and standard deviation
Variance = $\frac{1}{N}\sum_{i=1}^{N}(X_i - m)^2$, where $X_i$ is the value of the variable, $m$ is the mean, and $N$ is the number of observations. The standard deviation is the square root of the variance.
The variance and the standard deviation are more useful than the range and the interquartile range. Like the mean, they use the information contained in all the observations in the data set or population. We square the deviations to ensure that the positive and negative deviations do not cancel each other. We work a lot with variance because it has an additive property: the variance of a sum of independent variables is the sum of their variances. We need the standard deviation because it has the same unit as the variable.
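The population variance (dividing by N) can be worked out step by step and checked against the standard library. The data values are illustrative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

n = len(data)
m = sum(data) / n  # mean

# Average of the squared deviations from the mean (population form, 1/N).
variance = sum((x - m) ** 2 for x in data) / n

# The standard deviation is the square root, so it has the unit of the data.
std_dev = math.sqrt(variance)

print(m, variance, std_dev)
print(statistics.pvariance(data), statistics.pstdev(data))
```

Note that `statistics.variance` and `statistics.stdev` divide by N - 1 instead, which is the usual choice when the data are a sample rather than the whole population.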

Skewness
Skewness is a measure of the degree of asymmetry of a frequency distribution.
Skewness = $\frac{1}{N s^3}\sum_{i=1}^{N}(X_i - m)^3$, where $X_i$ is the value of the variable, $s$ is the standard deviation, and $m$ is the mean.
A distribution which stretches to the right more than it does to the left is right-skewed. Similarly, a left-skewed distribution is one that stretches asymmetrically to the left. Generally, for a right-skewed distribution, the mean is to the right of the median, which in turn lies to the right of the mode (assuming a single mode). The opposite is true for left-skewed distributions.
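A minimal sketch of a skewness calculation, assuming the moment form in which the cubed deviations are averaged over the N observations and divided by the cube of the population standard deviation. The function name and the data sets are illustrative.

```python
import math

def skewness(values):
    """Average cubed deviation from the mean, divided by the cubed standard deviation."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / n)  # population standard deviation
    return sum((x - m) ** 3 for x in values) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]   # long tail to the right
left_skewed = [-10, -3, -2, -2, -1]  # mirror image, long tail to the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

A value near zero suggests symmetry, a positive value a right skew, and a negative value a left skew.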

Kurtosis
Kurtosis is a measure of the flatness (versus peakedness) of a frequency distribution.
Kurtosis = $\frac{1}{N s^4}\sum_{i=1}^{N}(X_i - m)^4$, where $X_i$ is the value of the variable, $s$ is the standard deviation, and $m$ is the mean.
Flat distributions are called platykurtic. Peaked distributions are called leptokurtic. Neutral distributions, neither too flat nor too peaked, are called mesokurtic.
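A sketch of the kurtosis calculation, assuming the moment form in which the fourth powers of the deviations are averaged over the N observations and divided by the fourth power of the population standard deviation. On this scale a normal distribution has kurtosis 3, which serves as the mesokurtic reference. The function name and data are illustrative.

```python
def kurtosis(values):
    """Average fourth-power deviation from the mean, divided by the squared variance."""
    n = len(values)
    m = sum(values) / n
    variance = sum((x - m) ** 2 for x in values) / n  # s squared, population form
    fourth_moment = sum((x - m) ** 4 for x in values) / n
    return fourth_moment / variance ** 2  # variance squared equals s to the fourth

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]     # most values at the centre, two far out

print(kurtosis(flat))    # below 3: platykurtic
print(kurtosis(peaked))  # above 3: leptokurtic
```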

Chebyshev's Theorem
A mathematical theorem attributed to Chebyshev has established the following rules:
(1) At least 3/4 of the observations in a data set will lie within 2 standard deviations of the mean.
(2) At least 8/9 of the observations in a set will lie within 3 standard deviations of the mean.
(3) In general, for any k > 1, at least $1 - 1/k^2$ of the observations will lie within k standard deviations of the mean.
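The bound can be checked numerically against any data set, however skewed. In this sketch the function names and the data are assumptions for illustration, and the standard deviation is the population form (dividing by N).

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values strictly within k population standard deviations of the mean."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    return sum(abs(x - m) < k * s for x in values) / n

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]  # deliberately skewed illustrative values

for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

The observed fractions are usually well above the bound, because the theorem makes no assumption about the shape of the distribution.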

Useful Empirical Rules
If the distribution of the data is mound-shaped (that is, if the histogram of the data is more or less symmetric with a single mode or high point), then the following rules will apply.
(1) Approximately 68% of the observations will be within 1 standard deviation of the mean.
(2) Approximately 95% of the observations will be within 2 standard deviations of the mean.
(3) A vast majority of the observations (all of them, or almost all of them) will be within 3 standard deviations of the mean.
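The empirical rules can be checked informally by simulation. The sketch below draws mound-shaped data from a normal distribution; the mean of 50, standard deviation of 10, sample size, and seed are arbitrary assumptions, and the exact fractions will vary slightly with the seed.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible
data = [rng.gauss(50, 10) for _ in range(10_000)]  # simulated mound-shaped data

m = statistics.fmean(data)
s = statistics.pstdev(data)

def fraction_within(values, centre, sd, k):
    """Fraction of values within k standard deviations of the centre."""
    return sum(abs(x - centre) <= k * sd for x in values) / len(values)

fractions = {k: fraction_within(data, m, s, k) for k in (1, 2, 3)}

for k, f in fractions.items():
    print(f"within {k} standard deviation(s): {f:.3f}")
```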