CS 147: Computer Systems Performance Analysis

Similar documents
SUMMARIZING MEASURED DATA. Gaia Maselli

Summarizing Measured Data

Summarizing Measured Data

Summarizing Measured Data

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Unit 2. Describing Data: Numerical

Descriptive Data Summarization

1 Measures of the Center of a Distribution

1. Exploratory Data Analysis

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 03

Elementary Statistics

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

P8130: Biostatistical Methods I

TOPIC: Descriptive Statistics Single Variable

MAT Mathematics in Today's World

Expectation, Variance and Standard Deviation for Continuous Random Variables Class 6, Jeremy Orloff and Jonathan Bloom

CIVL 7012/8012. Collection and Analysis of Information

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

Chapter 3. Data Description

Chapter2 Description of samples and populations. 2.1 Introduction.

MATH4427 Notebook 4 Fall Semester 2017/2018

Statistics and parameters

2.1 Measures of Location (P.9-11)

Quantitative Tools for Research

Chapter 4. Displaying and Summarizing. Quantitative Data

Introduction to statistics

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Lecture 2 and Lecture 3

Section 3. Measures of Variation

Descriptive Statistics-I. Dr Mahmoud Alhussami

Week 1: Intro to R and EDA

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Essentials of Statistics and Probability

STATISTICS 1 REVISION NOTES

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

Midrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Calculus with Algebra and Trigonometry II Lecture 21 Probability applications

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Introduction. ECN 102: Analysis of Economic Data Winter, J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, / 51

1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

CS 5014: Research Methods in Computer Science. Statistics: The Basic Idea. Statistics Questions (1) Statistics Questions (2) Clifford A.

200 participants [EUR] ( =60) 200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR

additionalmathematicsstatisticsadditi onalmathematicsstatisticsadditionalm athematicsstatisticsadditionalmathem aticsstatisticsadditionalmathematicsst

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Confidence Intervals

Descriptive Statistics (And a little bit on rounding and significant digits)

Statistical Data Analysis

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Continuous distributions

MA 1125 Lecture 15 - The Standard Normal Distribution. Friday, October 6, Objectives: Introduce the standard normal distribution and table.

a table or a graph or an equation.

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Describing Distributions with Numbers

BNG 495 Capstone Design. Descriptive Statistics

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

EXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS

Chapter 2: Tools for Exploring Univariate Data

Σ x i. Sigma Notation

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Chapter 1. Looking at Data

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Units. Exploratory Data Analysis. Variables. Student Data

Unit 4 Probability. Dr Mahmoud Alhussami

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

STATISTICS. 1. Measures of Central Tendency

Statistics, continued

AP Final Review II Exploring Data (20% 30%)

ECLT 5810 Data Preprocessing. Prof. Wai Lam

in the company. Hence, we need to collect a sample which is representative of the entire population. In order for the sample to faithfully represent t

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 1 Descriptive Statistics

Week 11 Sample Means, CLT, Correlation

The Union and Intersection for Different Configurations of Two Events Mutually Exclusive vs Independency of Events

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

Introduction to Statistics

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Numerical Measures of Central Tendency

Chapter 5. Understanding and Comparing. Distributions

Continuous distributions

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

STAT/SOC/CSSS 221 Statistical Concepts and Methods for the Social Sciences. Random Variables

OCR Maths S1. Topic Questions from Papers. Representation of Data

Learning Objectives for Stat 225

Introduction to Statistics for Traffic Crash Reconstruction

Final Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above

Nonparametric Density Estimation. October 1, 2018

Chapter 11. Output Analysis for a Single Model Prof. Dr. Mesut Güneş Ch. 11 Output Analysis for a Single Model

Exploratory data analysis: numerical summaries

Nonparametric Inference via Bootstrapping the Debiased Estimator

MATH 117 Statistical Methods for Management I Chapter Three

The AAFCO Proficiency Testing Program Statistics and Reporting

BAYESIAN DECISION THEORY

Transcription:

CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1 / 49

Overview Introduction Indices of Dispersion Range Variance, Standard Deviation, C.V. Quantiles Miscellaneous Measures Choosing a Measure Overview Overview Introduction Indices of Dispersion Range Variance, Standard Deviation, C.V. Quantiles Miscellaneous Measures Choosing a Measure Histograms Quantile-Quantile Plots Statistics of Samples Meaning of a Sample Guessing the True Value Histograms Quantile-Quantile Plots Statistics of Samples Meaning of a Sample Guessing the True Value 2 / 49

Introduction Summarizing Variability Introduction Summarizing Variability Summarizing Variability A single number rarely tells entire story of a data set Usually, you need to know how much the rest of the data set varies from that index of central tendency A single number rarely tells entire story of a data set Usually, you need to know how much the rest of the data set varies from that index of central tendency 3 / 49

4 / 49 Introduction Why Is Variability Important? Introduction Why Is Variability Important? Why Is Variability Important? Consider two Web servers: Server A services all requests in 1 second Server B services 9% of all requests in. seconds But 1% in seconds Both have mean service times of 1 second But which would you prefer to use? Consider two Web servers: Server A services all requests in 1 second Server B services 9% of all requests in. seconds But 1% in seconds Both have mean service times of 1 second But which would you prefer to use?

/ 49 Introduction Indices of Dispersion Introduction Indices of Dispersion Indices of Dispersion Measures of how much a data set varies Range Variance and standard deviation Percentiles Semi-interquartile range Mean absolute deviation Measures of how much a data set varies Range Variance and standard deviation Percentiles Semi-interquartile range Mean absolute deviation

Indices of Dispersion Range Range Indices of Dispersion Range Range Range Minimum & maximum values in data set Can be tracked as data values arrive Variability characterized by difference between minimum and maximum Often not useful, due to outliers Minimum tends to go to zero Maximum tends to increase over time Not useful for unbounded variables Minimum & maximum values in data set Can be tracked as data values arrive Variability characterized by difference between minimum and maximum Often not useful, due to outliers Minimum tends to go to zero Maximum tends to increase over time Not useful for unbounded variables 6 / 49

7 / 49 Indices of Dispersion Range Example of Range Indices of Dispersion Range Example of Range Example of Range For data set 2,.4, -17, 6, 44, -4.8, 84.3, 92, 27, -1 Maximum is 6 Minimum is -17 Range is 273 While arithmetic mean is 268 For data set 2,.4, -17, 6, 44, -4.8, 84.3, 92, 27, -1 Maximum is 6 Minimum is -17 Range is 273 While arithmetic mean is 268

8 / 49 Indices of Dispersion Variance, Standard Deviation, C.V. Variance (and Its Cousins) Indices of Dispersion Variance, Standard Deviation, C.V. Variance (and Its Cousins) Variance (and Its Cousins) Sample variance is s 2 = 1 n (xi x) 2 n 1 i=1 Expressed in units of the measured quantity, squared Which isn t always easy to understand Standard deviation and coefficient of variation are derived from variance Sample variance is s 2 = 1 n 1 n (x i x) 2 Expressed in units of the measured quantity, squared Which isn t always easy to understand Standard deviation and coefficient of variation are derived from variance i=1

9 / 49 Indices of Dispersion Variance, Standard Deviation, C.V. Variance Example Indices of Dispersion Variance, Standard Deviation, C.V. Variance Example Variance Example For data set 2,.4, -17, 6, 44, -4.8, 84.3, 92, 27, -1 Variance is 413746.6 You can see the problem with variance: Given a mean of 268, what does that variance indicate? For data set 2,.4, -17, 6, 44, -4.8, 84.3, 92, 27, -1 Variance is 413746.6 You can see the problem with variance: Given a mean of 268, what does that variance indicate?

Indices of Dispersion Variance, Standard Deviation, C.V. Standard Deviation Indices of Dispersion Variance, Standard Deviation, C.V. Standard Deviation Standard Deviation Square root of the variance In same units as units of metric So easier to compare to metric Square root of the variance In same units as units of metric So easier to compare to metric 1 / 49

Indices of Dispersion Variance, Standard Deviation, C.V. Standard Deviation Example Indices of Dispersion Variance, Standard Deviation, C.V. Standard Deviation Example Standard Deviation Example For sample set we ve been using, standard deviation is 643 Given mean of 268, standard deviation clearly shows lots of variability from mean For sample set we ve been using, standard deviation is 643 Given mean of 268, standard deviation clearly shows lots of variability from mean 11 / 49

Indices of Dispersion Variance, Standard Deviation, C.V. Coefficient of Variation Indices of Dispersion Variance, Standard Deviation, C.V. Coefficient of Variation Coefficient of Variation Ratio of standard deviation to mean Normalizes units of these quantities into ratio or percentage Often abbreviated C.O.V. or C.V. Ratio of standard deviation to mean Normalizes units of these quantities into ratio or percentage Often abbreviated C.O.V. or C.V. 12 / 49

Indices of Dispersion Variance, Standard Deviation, C.V. Coefficient of Variation Example Indices of Dispersion Variance, Standard Deviation, C.V. Coefficient of Variation Example Coefficient of Variation Example For sample set we ve been using, standard deviation is 643 Mean is 268 So C.O.V. is 643/268 2.4 For sample set we ve been using, standard deviation is 643 Mean is 268 So C.O.V. is 643/268 2.4 13 / 49

14 / 49 Indices of Dispersion Quantiles Percentiles Indices of Dispersion Quantiles Percentiles Percentiles Specification of how observations fall into buckets E.g., -percentile is observation that is at the lower % of the set While 9-percentile is observation at the 9% boundary Useful even for unbounded variables Specification of how observations fall into buckets E.g., -percentile is observation that is at the lower % of the set While 9-percentile is observation at the 9% boundary Useful even for unbounded variables

/ 49 Indices of Dispersion Quantiles Relatives of Percentiles Indices of Dispersion Quantiles Relatives of Percentiles Relatives of Percentiles Quantiles - fraction between and 1 Instead of percentage Also called fractiles Deciles percentiles at 1% boundaries First is 1-percentile, second is 2-percentile, etc. Quartiles divide data set into four parts % of sample below first quartile, etc. Second quartile is also median Quantiles - fraction between and 1 Instead of percentage Also called fractiles Deciles percentiles at 1% boundaries First is 1-percentile, second is 2-percentile, etc. Quartiles divide data set into four parts % of sample below first quartile, etc. Second quartile is also median

16 / 49 Indices of Dispersion Quantiles Calculating Quantiles Indices of Dispersion Quantiles Calculating Quantiles Calculating Quantiles To estimate α-quantile: First sort the set Then take [(n 1)α + 1] th element 1-indexed Round to nearest integer index Exception: for small sets, may be better to choose intermediate value as is done for median To estimate α-quantile: First sort the set Then take [(n 1)α + 1] th element 1-indexed Round to nearest integer index Exception: for small sets, may be better to choose intermediate value as is done for median

Indices of Dispersion Quantiles Quartile Example Indices of Dispersion Quantiles Quartile Example Quartile Example For data set 2,.4, -17, 6, 44, -4.8, 84.3, 92, 27, -1 (1 observations) Sort it: -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 First quartile, Q1, is -4.8 Third quartile, Q3, is 92 For data set 2,.4, -17, 6, 44, -4.8, 84.3, 92, 27, -1 (1 observations) Sort it: -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 First quartile, Q1, is -4.8 Third quartile, Q3, is 92 17 / 49

18 / 49 Indices of Dispersion Quantiles Interquartile Range Indices of Dispersion Quantiles Interquartile Range Interquartile Range Yet another measure of dispersion The difference between Q3 and Q1 Semi-interquartile range is half that: Q3 Q1 SIQR = 2 Often interesting measure of what s going on in middle of range Basically indicates distance of quartiles from median Yet another measure of dispersion The difference between Q3 and Q1 Semi-interquartile range is half that: SIQR = Q 3 Q 1 2 Often interesting measure of what s going on in middle of range Basically indicates distance of quartiles from median

19 / 49 Indices of Dispersion Quantiles Semi-Interquartile Range Example Indices of Dispersion Quantiles Semi-Interquartile Range Example Semi-Interquartile Range Example For data set -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 Q3 is 92 Q1 is -4.8 Q3 Q1 92 ( 4.8) SIQR = = = 48 2 2 Compare to standard deviation of 643 Suggests that much of variability is caused by outliers For data set -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 Q3 is 92 Q1 is -4.8 SIQR = Q 3 Q 1 2 = 92 ( 4.8) 2 = 48 Compare to standard deviation of 643 Suggests that much of variability is caused by outliers

2 / 49 Indices of Dispersion Miscellaneous Measures Mean Absolute Deviation Indices of Dispersion Miscellaneous Measures Mean Absolute Deviation Mean Absolute Deviation Yet another measure of variability Mean absolute deviation = 1 n xi x n i=1 Good for hand calculation (doesn t require multiplication or square roots) Yet another measure of variability Mean absolute deviation = 1 n x i x n Good for hand calculation (doesn t require multiplication or square roots) i=1

Indices of Dispersion Miscellaneous Measures Mean Absolute Deviation Example Indices of Dispersion Miscellaneous Measures Mean Absolute Deviation Example Mean Absolute Deviation Example For data set -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 Mean absolute deviation is 1 1 1 i=1 xi 268 = 393 For data set -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 Mean absolute deviation is 1 1 1 i=1 x i 268 = 393 21 / 49

Indices of Dispersion Choosing a Measure Sensitivity To Outliers Indices of Dispersion Choosing a Measure Sensitivity To Outliers Sensitivity To Outliers From most to least, Range Variance Mean absolute deviation Semi-interquartile range From most to least, Range Variance Mean absolute deviation Semi-interquartile range 22 / 49

Bounded? Range Yes No Unimodal Symmetrical? C.O.V. Yes No Percentiles or SIQR Indices of Dispersion Choosing a Measure So, Which Index of Dispersion Should I Use? Indices of Dispersion Choosing a Measure So, Which Index of Dispersion Should I Use? So, Which Index of Dispersion Should I Use? But always remember what you re looking for Bounded? No Unimodal Symmetrical? No Percentiles or SIQR Yes Yes Range C.O.V. But always remember what you re looking for 23 / 49

Finding a Distribution for Datasets Finding a Distribution for Datasets Finding a Distribution for Datasets If a data set has a common distribution, that s the best way to summarize it Saying a data set is uniformly distributed is more informative than just giving mean and standard deviation So how do you determine if your data set fits a distribution? If a data set has a common distribution, that s the best way to summarize it Saying a data set is uniformly distributed is more informative than just giving mean and standard deviation So how do you determine if your data set fits a distribution? 24 / 49

Methods of Determining a Distribution Methods of Determining a Distribution Methods of Determining a Distribution Plot a histogram Kernel density estimation Quantile-quantile plot Statistical methods (not covered in this class) Plot a histogram Kernel density estimation Quantile-quantile plot Statistical methods (not covered in this class) / 49

Histograms Plotting a Histogram Histograms Plotting a Histogram Plotting a Histogram Suitable if you have relatively large number of data points Procedure: 1. Determine range of observations 2. Divide range into buckets 3. Count number of observations in each bucket 4. Divide by total number of observations and plot as column chart Suitable if you have relatively large number of data points Procedure: 1. Determine range of observations 2. Divide range into buckets 3. Count number of observations in each bucket 4. Divide by total number of observations and plot as column chart 26 / 49

27 / 49 Histograms Problems With Histogram Approach Histograms Problems With Histogram Approach Problems With Histogram Approach Determining cell size If too small, too few observations per cell If too large, no useful details in plot If fewer than five observations in a cell, cell size is too small Determining cell size If too small, too few observations per cell If too large, no useful details in plot If fewer than five observations in a cell, cell size is too small

28 / 49 Basic idea: any observation represents probability of high near near that observation Example: Seeing 7 means pdf is high all around 7 Seeing 6. also means pdf is high near 7 Average out observations to get smooth histogram Basic idea: any observation represents probability of high near near that observation Example: Seeing 7 means pdf is high all around 7 Seeing 6. also means pdf is high near 7 Average out observations to get smooth histogram

29 / 49 KDE Equations Want to estimate continuous p(x): KDE Equations KDE Equations Want to estimate continuous p(x): ˆp(x) = 1 n ( ) x xi K nh h i=1 Where K (x) is kernel function Must integrate to unity: K (x) dx = 1 Purpose is to select nearby samples h is bandwidth parameter Controls how many nearby samples selected Large bandwidth more smoothing, less detail ˆp(x) = 1 nh n ( ) x xi K h i=1 Where K (x) is kernel function Must integrate to unity: K (x) dx = 1 Purpose is to select nearby samples h is bandwidth parameter Controls how many nearby samples selected Large bandwidth more smoothing, less detail

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

3 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

31 / 49 2 1 1 2 2 1 1 2

.3.2.1. - 1 KDE Example Sample data set: -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 One observation per sample KDE Example KDE Example Sample data set: -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 One observation per sample KDE with Gaussian window (RHS dropped): p(x) KDE with Gaussian window (RHS dropped):.3.2 p(x).1. - 1 32 / 49

.3.2.1. - 1 KDE Example #2 Same data set Narrower Gaussian window (Again, RHS dropped): KDE Example #2 KDE Example #2 Same data set Narrower Gaussian window (Again, RHS dropped): p(x).3.2 p(x).1. - 1 33 / 49

34 / 49 Quantile-Quantile Plots Quantile-Quantile Plots Quantile-Quantile Plots Quantile-Quantile Plots Quantile-Quantile Plots More suitable than KDE for small data sets Basically, guess a distribution Plot where quantiles of data should fall in that distribution Against where they actually fall If plot is close to linear, data closely matches guessed distribution More suitable than KDE for small data sets Basically, guess a distribution Plot where quantiles of data should fall in that distribution Against where they actually fall If plot is close to linear, data closely matches guessed distribution

3 / 49 Quantile-Quantile Plots Obtaining Theoretical Quantiles Quantile-Quantile Plots Obtaining Theoretical Quantiles Obtaining Theoretical Quantiles Need to determine where quantiles should fall for a particular distribution Requires inverting CDF for that distribution Then determining quantiles for observed points Then plugging quantiles into inverted CDF Need to determine where quantiles should fall for a particular distribution Requires inverting CDF for that distribution Then determining quantiles for observed points Then plugging quantiles into inverted CDF

Quantile-Quantile Plots Inverting a Distribution Quantile-Quantile Plots Inverting a Distribution Inverting a Distribution Many common distributions have already been inverted (how convenient...) For others that are hard to invert, tables and approximations often available (nearly as convenient) Many common distributions have already been inverted (how convenient...) For others that are hard to invert, tables and approximations often available (nearly as convenient) 36 / 49

37 / 49 Quantile-Quantile Plots Is Our Sample Data Set Normally Distributed? Quantile-Quantile Plots Is Our Sample Data Set Normally Distributed? Is Our Sample Data Set Normally Distributed? Our data set was -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 Does this match normal distribution? Normal distribution doesn t invert nicely But there is an approximation: ( xi = 4.91 qi.14 (1 qi).14) Or invert numerically Our data set was -17, -1, -4.8, 2,.4, 27, 84.3, 92, 44, 6 Does this match normal distribution? Normal distribution doesn t invert nicely But there is an approximation: ( x i = 4.91 qi.14 (1 q i ).14) Or invert numerically

38 / 49 Quantile-Quantile Plots Data For Example Normal Quantile-Quantile Plot i q i y i x i 1. -17. -1.64684 2. -1. -1.3481 3. -4.8 -.67234 4.3 2. -.3837.4.4 -.11 6. 27..11 7.6 84.3.3837 8.7 92..67234 9.8 44. 1.3481 1.9 6. 1.64684 Quantile-Quantile Plots Data For Example Normal Quantile-Quantile Plot Data For Example Normal Quantile-Quantile Plot i qi yi xi 1. -17. -1.64684 2. -1. -1.3481 3. -4.8 -.67234 4.3 2. -.3837.4.4 -.11 6. 27..11 7.6 84.3.3837 8.7 92..67234 9.8 44. 1.3481 1.9 6. 1.64684

39 / 49 Quantile-Quantile Plots Example Normal Quantile-Quantile Plot Quantile-Quantile Plots Example Normal Quantile-Quantile Plot Example Normal Quantile-Quantile Plot 2 1-1 1-2 1 - -1 1

4 / 49 Quantile-Quantile Plots Analysis Quantile-Quantile Plots Analysis Analysis Definitely not normal Because it isn t linear Tail at high end is too long for normal But perhaps the lower part of graph is normal? Definitely not normal Because it isn t linear Tail at high end is too long for normal But perhaps the lower part of graph is normal?

41 / 49 Quantile-Quantile Plots Quantile-Quantile Plot of Partial Data Quantile-Quantile Plots Quantile-Quantile Plot of Partial Data Quantile-Quantile Plot of Partial Data 1-1 1-1

Quantile-Quantile Plots Analysis of Partial Data Plot Quantile-Quantile Plots Analysis of Partial Data Plot Analysis of Partial Data Plot Again, at highest points it doesn t fit normal distribution But at lower points it fits somewhat well So, again, this distribution looks like normal with longer tail to right Again, at highest points it doesn t fit normal distribution But at lower points it fits somewhat well So, again, this distribution looks like normal with longer tail to right 42 / 49

Quantile-Quantile Plots Analysis of Partial Data Plot Quantile-Quantile Plots Analysis of Partial Data Plot Analysis of Partial Data Plot Again, at highest points it doesn t fit normal distribution But at lower points it fits somewhat well So, again, this distribution looks like normal with longer tail to right (Really need more data points) Again, at highest points it doesn t fit normal distribution But at lower points it fits somewhat well So, again, this distribution looks like normal with longer tail to right (Really need more data points) 42 / 49

Quantile-Quantile Plots Analysis of Partial Data Plot Quantile-Quantile Plots Analysis of Partial Data Plot Analysis of Partial Data Plot Again, at highest points it doesn t fit normal distribution But at lower points it fits somewhat well So, again, this distribution looks like normal with longer tail to right (Really need more data points) You can keep this up for a good, long time Again, at highest points it doesn t fit normal distribution But at lower points it fits somewhat well So, again, this distribution looks like normal with longer tail to right (Really need more data points) You can keep this up for a good, long time 42 / 49

Quantile-Quantile Plots Interpreting Quantile-Quantile Plots Quantile-Quantile Plots Interpreting Quantile-Quantile Plots Interpreting Quantile-Quantile Plots Mnemonic: Q-Q plot shaped like S has Short tails; opposite has long ones. Mnemonic: Q-Q plot shaped like S has Short tails; opposite has long ones. 43 / 49

44 / 49 Statistics of Samples Meaning of a Sample What is a Sample? Statistics of Samples Meaning of a Sample What is a Sample? What is a Sample? How tall is a human? Could measure every person in the world Or could measure everyone in this room Population has parameters Real and meaningful Sample has statistics Drawn from population Inherently erroneous How tall is a human? Could measure every person in the world Or could measure everyone in this room Population has parameters Real and meaningful Sample has statistics Drawn from population Inherently erroneous

4 / 49 Statistics of Samples Meaning of a Sample Sample Statistics Statistics of Samples Meaning of a Sample Sample Statistics Sample Statistics How tall is a human? People in B126 have a mean height People in Edwards have a different mean Sample mean is itself a random variable Has own distribution How tall is a human? People in B126 have a mean height People in Edwards have a different mean Sample mean is itself a random variable Has own distribution

46 / 49 Statistics of Samples Meaning of a Sample Estimating Population from Samples Statistics of Samples Meaning of a Sample Estimating Population from Samples Estimating Population from Samples How tall is a human? Measure everybody in this room Calculate sample mean x Assume population mean µ equals x What is the error in our estimate? How tall is a human? Measure everybody in this room Calculate sample mean x Assume population mean µ equals x What is the error in our estimate?

47 / 49 Statistics of Samples Meaning of a Sample Estimating Error Statistics of Samples Meaning of a Sample Estimating Error Estimating Error Sample mean is a random variable Mean has some distribution Multiple sample means have mean of means Knowing distribution of means, we can estimate error Sample mean is a random variable Mean has some distribution Multiple sample means have mean of means Knowing distribution of means, we can estimate error

48 / 49 Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Estimating the Value of a Random Variable How tall is Fred? How tall is Fred?

48 / 49 Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Estimating the Value of a Random Variable How tall is Fred? Suppose average human height is 17 cm How tall is Fred? Suppose average human height is 17 cm

48 / 49 Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Estimating the Value of a Random Variable How tall is Fred? Suppose average human height is 17 cm Fred is 17 cm tall How tall is Fred? Suppose average human height is 17 cm Fred is 17 cm tall

48 / 49 Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Estimating the Value of a Random Variable How tall is Fred? Suppose average human height is 17 cm Fred is 17 cm tall Yeah, right How tall is Fred? Suppose average human height is 17 cm Fred is 17 cm tall Yeah, right

48 / 49 Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Statistics of Samples Guessing the True Value Estimating the Value of a Random Variable Estimating the Value of a Random Variable How tall is Fred? Suppose average human height is 17 cm Fred is 17 cm tall Yeah, right Safer to assume a range How tall is Fred? Suppose average human height is 17 cm Fred is 17 cm tall Yeah, right Safer to assume a range

49 / 49 Statistics of Samples Guessing the True Value Confidence Intervals Statistics of Samples Guessing the True Value Confidence Intervals Confidence Intervals How tall is Fred? How tall is Fred?

49 / 49 Statistics of Samples Guessing the True Value Confidence Intervals Statistics of Samples Guessing the True Value Confidence Intervals Confidence Intervals How tall is Fred? Suppose 9% of humans are between and 19 cm How tall is Fred? Suppose 9% of humans are between and 19 cm

49 / 49 Statistics of Samples Guessing the True Value Confidence Intervals Statistics of Samples Guessing the True Value Confidence Intervals Confidence Intervals How tall is Fred? Suppose 9% of humans are between and 19 cm Fred is between and 19 cm How tall is Fred? Suppose 9% of humans are between and 19 cm Fred is between and 19 cm

49 / 49 Statistics of Samples Guessing the True Value Confidence Intervals Statistics of Samples Guessing the True Value Confidence Intervals Confidence Intervals How tall is Fred? Suppose 9% of humans are between and 19 cm Fred is between and 19 cm We are 9% confident that Fred is between and 19 cm How tall is Fred? Suppose 9% of humans are between and 19 cm Fred is between and 19 cm We are 9% confident that Fred is between and 19 cm