Overview. INFOWO Statistics lecture S1: Descriptive statistics. Detailed Overview of the Statistics track. Definition

Transcription:

INFOWO Statistics lecture S1: Descriptive statistics
Peter de Waal
Department of Information and Computing Sciences, Faculty of Science, Universiteit Utrecht

Overview
  Introduction to statistics
  Descriptive statistics

Detailed overview of the Statistics track
  S1 Descriptive statistics
  S2 Scores and probability distributions
  S3 Hypothesis testing and t-test
  S4 More t-tests
  S5 Correlation and prediction
  M5 Homogeneity and reliability
  S6 Analysis of variance
  S7 Chi²-test
  Q&A lecture

Definition
  Statistics: The study of the collection, organization, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments. (from Wikipedia)

Statistics are everywhere
  For: information, argumentation, infotainment, commercial purposes.
  [Figures: "Use of equipment for mobile internet"; "Frequent e-shoppers by gender and age"]

The usefulness of statistics
  To contribute to the accuracy and reliability of the evidence with which we argue for our ideas.
  Summarise and systematise data.
  Interpret research findings on the basis of numbers: Is there a systematic factor behind observed differences? Are heavy Facebook users more assertive/aggressive/autistic?
  Bridge the gap between sample and population (statistical inference): Can we generalise our findings from this group to all students?

The bad reputation of statistics
  Complicated and difficult
  Biased predictions
  Varying definitions
  Distorted images
  False conclusions
  ...
  But statistics can be fun too!
  Advice: Keep up with the course. Test yourself.

Distorting images: UU Jaarbeeld 2012
  [Figure: "When well-placed, students flourish". Core figures for 2011 versus 2012, grouped under Research, Staff, Teaching, Grants, and Finances.]

What is measured?
  Objects: things. Concrete things: people, students, companies, books, cars, countries, ...
  Properties: characteristics of objects.
    Physical properties: weight, height, posture
    Psychological properties: attitude, intelligence, opinion
    Social properties: status, number of friends, peer-group pressure, ...
  Measurements: indicants of properties (of objects)

Measurement scales for variables
  Nominal
  Ordinal
  Interval
  Ratio

Nominal scale
  Comparison possible for: (in)equality
  Values are exhaustive and mutually exclusive.
  Example: Gender

Ordinal scale
  Comparison possible for: (in)equality, order
  Example: Highest attained education: 1 primary school, 2 high school, 3 university

Interval scale
  Comparison possible for: (in)equality, order, distance/difference (equality of differences)
  No natural zero value!
  Example: Temperature in °C

Ratio scale
  Comparison possible for: (in)equality, order, distance/difference, proportion (equality of ratios)
  Has a natural zero value, and no negative values!
  Example: Weight

Measurement scale?
  [Figure: Apple growing areas by variety]

Measurement scale?
  Rank of students on final grade of INFOWO: 1 Jansen, 2 Pietersen, 3 Jones, 4 ..., 76 Zijlstra

Measurement scale?
  Age (years): Indicate your age (tick one box!):
    1: 15-24   2: 25-34   3: 35-44   4: 45-54   5: 55-64   6: 65 or older
  Questions: What is the measurement scale? Why would you want to measure age like this?

Measurement scale?
  Caracal course evaluation. Question: I learned a lot during the lecture (so far):
    Totally disagree  1  2  3  4  5  Totally agree

Summarizing data
  Descriptive measures:
    Frequency measurements
    Measure of location/central tendency
    Measure of dispersion
    Measures of shape

Frequency measurements (Frequency table)
  Indicates how often different values occur in measurements.
  Example: Consumer choice of smartphone type
    Absolute frequencies: 13 (out of 42)
    Relative frequencies: 26.5% (0.265). Also called: proportion.
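As a minimal sketch of how a frequency table with absolute and relative frequencies can be built, the Python below counts values in a list. The survey answers and the helper name `frequency_table` are made up for illustration; they are not the data behind the smartphone chart on the slide.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to (absolute frequency, relative frequency)."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, count / total) for value, count in counts.items()}


# Hypothetical survey answers, invented for this example.
choices = ["Android", "iPhone", "Android", "Other",
           "iPhone", "Android", "Android", "Other"]

table = frequency_table(choices)
for value, (absolute, relative) in sorted(table.items()):
    # Relative frequency printed as a percentage (the proportion times 100).
    print(f"{value:<8} {absolute:>3} {relative:6.1%}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick sanity check on any frequency table.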

Frequency measurements (Pie chart)
  Relative frequencies: percentages
  Example: Consumer choice of smartphone type [pie chart]

Frequency measurements (Frequency graph)
  Example: Consumer choice of smartphone type [frequency graph]

Frequency Tables in SPSS
  How-to: Menu Analyze > Descriptive Statistics > Frequencies

Frequency Bar Graph in SPSS
  How-to: Menu Analyze > Descriptive Statistics > Frequencies

Frequencies: Histogram in SPSS
  How-to: Menu Analyze > Descriptive Statistics > Frequencies

Percentiles
  Percentile: The score of the n-th percentile (P_n) is the score at which n% of the distribution is lower and (100 − n)% is higher.
  Example: P_90 = 189 means that 90% of the scores have a value ≤ 189 and 10% have a value ≥ 189.
  Frequently used percentiles are:
    P_50: Second quartile (also Median)
    P_25: First quartile
    P_75: Third quartile

Percentiles: example
  Age   Frequency   Cumulative   Percentile
  23    1           1             12.5
  24    3           4             50.0
  25    2           6             75.0
  26    0           6             75.0
  27    2           8            100.0
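The percentile column in the example table can be reproduced from the frequencies alone. The sketch below uses the age frequencies from the table; the function names are illustrative, and `percentile_score` assumes one particular convention (the smallest score whose cumulative percentage reaches n), which is only one of several definitions in use.

```python
from itertools import accumulate


def cumulative_percentiles(freq):
    """Return rows of (score, frequency, cumulative frequency, cumulative %)."""
    scores = sorted(freq)
    counts = [freq[s] for s in scores]
    cumulative = list(accumulate(counts))
    total = cumulative[-1]
    return [(s, f, c, 100 * c / total)
            for s, f, c in zip(scores, counts, cumulative)]


def percentile_score(rows, n):
    """Smallest score whose cumulative percentage is at least n (assumed convention)."""
    for score, _, _, pct in rows:
        if pct >= n:
            return score
    raise ValueError("n must be between 0 and 100")


# Age frequencies taken from the example table above.
ages = {23: 1, 24: 3, 25: 2, 26: 0, 27: 2}
rows = cumulative_percentiles(ages)
for score, f, c, pct in rows:
    print(f"{score}  {f}  {c}  {pct:5.1f}")

median_age = percentile_score(rows, 50)  # P_50 under the assumed convention
```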

Summarizing data
  Descriptive measures:
    Frequency measurements
    Measure of location/central tendency
    Measure of dispersion
    Measures of shape

Frequency graph versus histogram
  [Figure]

Measures of location / central tendency
  Purpose:
    Identify the center of the distribution
    Identify the best representative score
  Mode: Most frequently occurring value. Bimodal/multimodal: more than one value is most frequent.
  Median: Midpoint of the distribution. Insensitive to outliers (contrary to the mean).
  Mean: Equilibrium or balance point of the distribution.

Median: Midpoint of the distribution
  The median represents the midpoint of the scores in a distribution when they are listed in order from smallest to largest.
  The median equals the 50th percentile (P_50).
  The median divides the scores into two groups of equal size.

Mean: Balance point of distribution
  Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
  Sample: $\bar{X} = M = \frac{\sum_{i=1}^{n} X_i}{n}$

Population versus sample
  Why are there two formulas for the mean?
  Population: Set of all the individuals of interest in a particular study. The size of the population is usually denoted as N. The mean $\mu$ is a parameter of the population, and usually unknown.
  Sample: Selection of individuals from a population, usually to represent the population in a particular study. The size of the sample is usually denoted as n. The mean $\bar{X}$ is a statistic, a value obtained from the sample, which is used as an estimate for the unknown population parameter.

Mean versus median
  Example: Sample 1 2 2 3 5 6 7 8 11. Mean: 5, Median: 5
  Example: Sample 1 2 2 3 5 6 7 8 20. Mean: 6, Median: 5

Which measure for which scale?
  Nominal: Mode
  Ordinal: Mode, Median
  Interval: Mode, Median, Mean
  Ratio: Mode, Median, Mean
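The two samples from the "Mean versus median" slide can be checked with Python's standard `statistics` module. This is only a quick illustration of the point that a single extreme score moves the mean but leaves the median where it was; the variable names are arbitrary.

```python
from statistics import mean, median, multimode

# The two samples from the "Mean versus median" slide.
sample_a = [1, 2, 2, 3, 5, 6, 7, 8, 11]
sample_b = [1, 2, 2, 3, 5, 6, 7, 8, 20]  # same scores, largest one replaced by 20

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(f"Sample {name}: mean={mean(sample)}, "
          f"median={median(sample)}, mode(s)={multimode(sample)}")
```

Replacing 11 by 20 raises the mean from 5 to 6, while the median (and the mode) stays the same.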

Measures of spread / dispersion / variability
  Only for interval or ratio scales!
  Range: Difference between largest and smallest score of distribution.
  Interquartile range (IQR): Difference between first and third quartiles of distribution.
  Variance: A weighted sum of the squared deviations from the mean.
  Standard deviation: Square root of the variance.

Range: Example 1
  What is the range for this frequency distribution? And the IQR?

  Age in years   Frequency   Cumul. Percent
  18             1             5.0
  20             1            10.0
  22             1            15.0
  28             2            25.0
  32             2            35.0
  41             2            45.0
  48             1            50.0
  53             3            65.0
  57             2            75.0
  62             1            80.0
  66             2            90.0
  70             2           100.0

Range: Example 2A
  [Figure: Age in years]

Range: Example 2B
  [Figure: Age in years]
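One way to work through Example 1 is sketched below, using the frequency table from that slide. It assumes that a quartile is the smallest score whose cumulative percentage reaches 25% or 75%, which matches how the cumulative-percent column reads; other quartile conventions (for instance, ones that interpolate between scores) can give different values. The function name `quartile_from_freq` is illustrative.

```python
def quartile_from_freq(freq, p):
    """Smallest score whose cumulative percentage is at least p (assumed convention)."""
    total = sum(freq.values())
    running = 0
    for score in sorted(freq):
        running += freq[score]
        if 100 * running / total >= p:
            return score
    raise ValueError("p must be between 0 and 100")


# Age frequencies from the Example 1 table (20 respondents in total).
age_freq = {18: 1, 20: 1, 22: 1, 28: 2, 32: 2, 41: 2,
            48: 1, 53: 3, 57: 2, 62: 1, 66: 2, 70: 2}

value_range = max(age_freq) - min(age_freq)  # largest minus smallest score
q1 = quartile_from_freq(age_freq, 25)
q3 = quartile_from_freq(age_freq, 75)
iqr = q3 - q1

print(f"range={value_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
```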

Variance and standard deviation
  Variance:
    Population (parameter): $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
    Sample (statistic): $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
  Notice the differences in the formulas!!

Sum of squares
  Population and sample variance use the same sum of squared deviations, or Sum of Squares (SS) for short:
    $SS = \sum_{i=1}^{N} (X_i - \mu)^2$ (Population)
    or
    $SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$ (Sample)
  This term will re-appear in later chapters.

Degrees of freedom
  Population variance: Mean is known. Deviations are computed from a known mean.
  Sample variance as estimate of population: Population mean is unknown. Using the sample mean restricts variability.
  Degrees of freedom: Number of scores in the sample that are independent and free to vary.
  Degrees of freedom: df = n − 1.

Variance and standard deviation
  Variance ($\sigma^2$, $s^2$): Average squared distance from the mean.
  Standard deviation:
    Population (parameter): $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
    Sample (statistic): $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
  Measured in the same dimension as the mean.
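The difference between dividing by N and dividing by n − 1 is easy to see numerically. The sketch below reuses the first sample from the "Mean versus median" slide as illustrative data and computes the sum of squares once, then divides it both ways; the helper `sum_of_squares` is a name chosen for this example, and the result is cross-checked against the standard `statistics` module.

```python
import math
from statistics import pvariance, variance


def sum_of_squares(xs):
    """Sum of squared deviations from the mean of xs (SS)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


# Illustrative data: the first sample from the "Mean versus median" slide.
scores = [1, 2, 2, 3, 5, 6, 7, 8, 11]

ss = sum_of_squares(scores)
pop_var = ss / len(scores)           # treat the scores as a whole population: SS / N
sample_var = ss / (len(scores) - 1)  # treat them as a sample: SS / (n - 1)
sample_sd = math.sqrt(sample_var)

print(f"SS={ss}, population variance={pop_var:.3f}, "
      f"sample variance={sample_var:.3f}, sample sd={sample_sd:.3f}")

# Cross-check against the standard library.
assert math.isclose(pop_var, pvariance(scores))
assert math.isclose(sample_var, variance(scores))
```

The same SS feeds both formulas; only the divisor changes, so the sample variance is always the larger of the two.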

Measure of shape
  Skewness (sk): Measures the distribution's deviation from symmetry.
    $sk = \frac{\frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^3}{\left( \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2 \right)^{3/2}}$
  Symmetric: sk = 0.
  Tilted towards left: sk > 0 ("Positive skew")
  Tilted towards right: sk < 0 ("Negative skew")

Skewness example
  Statement: In a distribution with negative skew, the mode is larger than the mean. (True or False?)
  Answer: True

Lessons learnt
  Why you want to learn all about statistics
  What descriptive statistics is
  The four different types of data
  The main descriptive measures for data

What's next
  Now: Research practicum meeting
  Thursday: Methods lecture 2, Exercise class
  Saturday: submit Deliverable P1a
  Do not forget to fill in the INFOWO questionnaire! (see website)
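For readers who want to check the skewness formula from the "Measure of shape" slide numerically, here is a minimal sketch. It implements the moment-based formula as given there (third central moment divided by the second central moment to the power 3/2); the three small data sets are invented for illustration, and the formula is undefined when all scores are equal.

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5, with both moments divided by N."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


# Invented data sets for illustration.
symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 2, 2, 3, 20]      # most scores on the left
long_left_tail = [-20, -3, -2, -2, -1]  # mirror image of the previous set

for name, data in (("symmetric", symmetric),
                   ("long right tail", long_right_tail),
                   ("long left tail", long_left_tail)):
    print(f"{name}: sk = {skewness(data):.3f}")
```

A set with most scores on the left and a long tail to the right gives sk > 0, and its mirror image gives sk < 0, in line with the sign convention on the slide.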