Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1


Summary statistics
1. Visualize data
2. Mean, median, mode and percentiles, variance, standard deviation
3. Frequency distribution. Skewness
4. Covariance and correlation
5. Autocorrelation

Overview
In this teaching note we cover summary statistics applied to data on events that have already happened; the data are therefore known.
This set of notes also contains, for your reference, a few Excel functions and charting facilities for calculating summary statistics and visualizing data.

1. Visualize data
Whenever possible, we should visualize the data that we want to analyze.
- This will help us to understand what we want to do.
- It will also help us to communicate our ideas in a clear and understandable way.

Some definitions
Some preliminary vocabulary that we will need hereafter:
- Population: the total collection of observations that could be of interest to the statistician.
- Sample: a subset (fraction) of the population that the statistician uses for his/her analysis.
Of course, the sample should be representative of the population. Two popular ways of creating a sample are:
- Random sampling: each member of the population has an equal chance of being picked.
- Nonrandom sampling: the statistician decides which elements of the population to pick.

Visualizing a time series
The straightforward case of a single time series is a good starting point for introducing a first set of summary statistics and graphical representations.
Let us therefore consider the daily changes in the $/EUR exchange rate. The example is clearly relevant for business in our global world.
We shall consider 144 daily changes (from the beginning of 2006 to July 27 included). The data are available in stats1.xls; the following slide reproduces the first 9 rows.

Exhibit 1: $/EUR FX rate and its trading-day variations, stats1.xls. The shaded headings {dollar, percent} are the Excel names assigned to the two arrays. Read Excel help on how to name arrays.
[Table columns: Date, $/EUR, $ Change (array name: dollar), % Change (array name: percent). Rows: 30-Dec, 3-Jan, 4-Jan, 5-Jan, 6-Jan, 9-Jan, 10-Jan, 11-Jan, 12-Jan.]

Exhibit 2: The trading-day data of the $/EUR FX rate display a high level of variability.
[Line chart; time axis from Dec-05 to 1-Jul-06.]

Graphs are a convenient way of representing data; moreover, they can be used to draw some preliminary conclusions on the behaviour of the data.
In the previous slide, the data showed great variability and, moreover, did not seem to fluctuate around a constant, clearly identifiable value.
Such a feature is typical of variables such as exchange rates, (asset) prices and macroeconomic aggregates, and is sometimes referred to as non-stationarity:
- it makes it difficult to predict the future behaviour of the series.
Thus, it could be more convenient to study the percentage changes (or growth rates, or returns).
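A minimal sketch of the move from levels to percentage changes, in Python. The rate values are made up for illustration (they are not the stats1.xls data), and `pct_changes` is a hypothetical helper name.

```python
def pct_changes(levels):
    """Return the percentage change between consecutive observations."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(levels, levels[1:])]

# Hypothetical exchange-rate levels (illustrative only, not from stats1.xls)
rates = [1.20, 1.26, 1.197]
changes = pct_changes(rates)

# Up 5% on the first day, down 5% on the second
print([round(c, 2) for c in changes])
```

Each change is measured relative to the previous observation, so a series of n levels yields n - 1 changes.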

Exhibit 3: Daily % variation in the $/EUR FX rate (trading days). QM stats1.xls workbook.
[Line chart; vertical axis from -2% to 2%, time axis from 30-Dec-05 to 1-Jul-06.]

Here one can quite easily identify a clear behaviour: the series is indeed fluctuating around a constant, stable value as time elapses.
This phenomenon is also known as mean reversion (as the data keep flipping around the mean), and is a typical feature of variables such as growth rates and returns:
- it makes it possible (not necessarily easy) to draw some conclusions on the future behaviour of the series.
However, this analysis is not enough:
- we do not know the value around which the series is fluctuating;
- we cannot say anything about the variability of the series.

2. Average, median, mode, percentiles, variance, standard deviation
Descriptive statistics are the foundation of the concepts used in probability, sampling and regression. We shall use these quantities to draw some further, more precise conclusions about the behaviour of a series. In particular, there are two different features that are worth investigating:
1) The value around which the data are fluctuating: measures of location;
2) The degree of dispersion around such value: measures of dispersion.

Measures of location
- Average
- Median
- Mode

Exhibit 4: Average, median, variance (Var), standard deviation (Std). These descriptive statistics are calculated with reference to the $/EUR FX rate time series (and its daily variations) from the beginning of 2006 to July 27 included.
[Table columns: $ change, % change, Excel function. Rows: Average (AVERAGE), Median (MEDIAN), Var (VARP), Std (STDEVP), Std/Mean.]

Average
The first summary statistic is the arithmetic average. In statistics, the average is often referred to as a measure of location.
The average will be indicated with the Greek letter μ.
Indicating with X the observations and with (n) their number, the equation to calculate the average is:
\[ \mu = \frac{1}{n}\left(X_1 + \dots + X_n\right) = \frac{1}{n}\sum_{j=1}^{n} X_j \]

Some comments on the average:
- It is the most frequently employed measure of location, if only because of its computational convenience.
- It is customarily interpreted as the equilibrium point of the sample.
- It is computed using all the values of the sample, and therefore it uses all the information in the sample (which is good), but it can be affected by extreme, anomalous values (which is a problem). E.g. {1,2,0,1,2,0,1,2,0,10000}, average = 1000.9.
- Note that the average is merely the average, and it may not belong to the sample. E.g.: if the sample is {1,3}, the average is 2, which is not in the sample.

Median
Consider a set of (n) numbers, sorted in descending order (the direction is not of pivotal importance here: one could equally sort the data in ascending order).
Roughly speaking, the median is the number in the middle of the set; that is, half the numbers have values that are greater than the median, and half have values that are less.
- If n is odd, the median is the middle number of the set.
- If n is even, the median is the arithmetic average of the two middle numbers.
An example is on the next slide.

Exhibit 5: Median of daily variations of the $/EUR FX time series. We have 144 observations.
[Table: for $ change and for % change, columns Value, No, %; final row: Median.]

Some comments on the median
The median is another frequently employed measure of location.
The median depends only on the middle value of the sorted sample (or on the two middle values when n is even), not on all the data:
- it doesn't use all the information in the distribution (which is not good);
- but it is not sensitive to extreme, anomalous values (which is good).

Example:
- {1,2,3,4,5}: median = 3 (same as the average, by the way);
- {1,2,3,4,10000}: median = 3 (same as before!); the average, instead, is strongly affected.
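The example can be checked with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are arbitrary.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10000]

# The outlier moves the average from 3 to 2002, while the median stays at 3.
for sample in (clean, with_outlier):
    print(sample, "average:", mean(sample), "median:", median(sample))

# The sample used in the comments on the average: one extreme value
# pulls the average up to 1000.9, while the median is 1.
noisy = [1, 2, 0, 1, 2, 0, 1, 2, 0, 10000]
print("average:", mean(noisy), "median:", median(noisy))
```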

Comparing the average and the median could be an acid test to check whether the data are symmetric or not around their average:
- when the distribution of the values is perfectly symmetrical, then average = median. E.g.: given the sample {1,2,3,4,5}, average and median are both equal to 3.

- however, when the distribution of the values is not perfectly symmetrical, some ambiguities arise:
  - Consider the sample {1,2,3,4,10}. Here the median is 3 and the average is 4; note also that the sample is skewed to the right.
  - Consider the sample {1,7,8,9,10}. Here the average is 7 and the median is 8; note also that the sample is skewed to the left.
  - Consider the sample {0,0,3,4,8}. Here average and median are both equal to 3, but the sample is not symmetric.

To sum up:
- median > average means that the distribution of the data is NOT symmetric and is skewed to the left;
- median < average means that the distribution of the data is NOT symmetric and is skewed to the right;
- median = average is inconclusive: it cannot be inferred that the distribution of the data is symmetric.

Mode
The mode is yet another indicator of location. It is defined as the most frequently occurring value in a data set.
Example: {1,2,2,2,3}; the mode is 2.
If no value occurs more than once, the mode is practically useless. The mode may also fail to be unique: consider the sample {1,1,2,3,3}. Here there are two modes, namely 1 and 3; such a case is termed bimodal.
Computing the mode is straightforward, but when there is not a unique mode the interpretation becomes difficult.
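A short Python check of the three situations, using `statistics.multimode` (available from Python 3.8), which returns every value that attains the highest frequency. The third sample is an added illustration, not one from the notes.

```python
from statistics import multimode

unimodal = multimode([1, 2, 2, 2, 3])   # a single mode
bimodal = multimode([1, 1, 2, 3, 3])    # two modes
no_repeat = multimode([1, 2, 3])        # every value ties, so the mode says nothing

print(unimodal, bimodal, no_repeat)
```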

Measures of dispersion
- Percentiles (and variations)
- Variance and standard deviation

Percentiles
Percentiles are a possible way of measuring the dispersion of the data around the mean.
The percentile of order X% (where X can be any number between 0 and 100, say e.g. 5) is defined/computed as follows:
- sort the data in ascending order (here the order does matter);
- chop the dataset into two chunks, where the first one contains the first X% of the data, and the second one the remaining (100-X)% of the data;
- the data point where the chopping occurs is the percentile of order X%.

Deciles, quartiles and quantiles
These definitions are merely special cases of percentiles.
Deciles (percentiles whose order is a multiple of 10%) and quartiles (percentiles whose order is a multiple of 25%) are, however, frequently used.
Customarily, one may find definitions such as "the 3rd decile" or "the 2nd quartile"; these correspond, respectively, to the percentile of order 30% and the percentile of order 50%. Note that the median corresponds to the 50th percentile, to the 5th decile and to the 2nd quartile.
Quantiles is a generic name for the above metrics (percentile, decile, etc.).
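The chopping procedure can be sketched in Python with the nearest-rank rule. This is only one of several percentile conventions, chosen here as an assumption; functions that interpolate between neighbouring data points (as Excel's PERCENTILE does) can return slightly different values. `percentile_nearest_rank` is a hypothetical helper name and the sample is made up.

```python
import math

def percentile_nearest_rank(data, x):
    """Percentile of order x (0 < x <= 100) using the nearest-rank rule."""
    ordered = sorted(data)                        # ascending order matters here
    rank = math.ceil(x * len(ordered) / 100)      # 1-based position of the cut
    return ordered[max(rank, 1) - 1]

sample = [50, 15, 40, 20, 35]                     # illustrative values

p30 = percentile_nearest_rank(sample, 30)         # 3rd decile
p50 = percentile_nearest_rank(sample, 50)         # 2nd quartile = 5th decile = median
p100 = percentile_nearest_rank(sample, 100)       # largest observation

print(p30, p50, p100)
```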

Exhibit 6: Results of quantile calculations for our $/EUR FX time series (Excel, using functions PERCENTILE and QUARTILE).
[Table columns: $ change, % change. Rows: Median, 2nd Quartile, 50th Percentile.]

Variance
The variance is an extremely popular measure of dispersion around the average (or, in the jargon of statistics, a measure of scale), and is customarily denoted as σ².
It is calculated as the average of the squared deviations of the observations from the average:
\[ \sigma^2 = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \mu\right)^2 = \frac{\left(X_1 - \mu\right)^2 + \dots + \left(X_n - \mu\right)^2}{n} \]

Some comments
The building block in the formula is the term (X_j - μ)^2:
- the presence of μ is needed since we are considering deviations from the mean;
- the square serves two purposes:
  - it eliminates the sign of the deviation (any even power would do the job);
  - it weights each deviation by itself: large deviations, irrespective of whether they are positive or negative, have a larger impact. E.g.: a deviation of magnitude 0.1 becomes 0.01 (almost negligible), whilst a deviation of magnitude 10 becomes 100.

An alternative equation for variance
Variance can be expressed with a different equation:
\[ \sigma^2 = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \mu\right)^2 = \frac{1}{n}\sum_{j=1}^{n} X_j^2 - \mu^2 \]
This is NOT the same as another popular expression for the variance, where the denominator is replaced by n-1.
We should be aware of this alternative formulation because it is often used in QM to simplify a number of equations and proofs.

Problems with variance
- First (an interpretation issue): it is expressed in the square of the units. In our case the variance is in $² (which is meaningless).
- Second (a scale problem): it changes with the square of the observations' values, therefore the ratio between variance and mean becomes useless.
To eliminate these problems, we use the standard deviation, i.e. the square root of the population variance (denoted as σ). Thus, it is expressed in the same units of measurement as the observations ($ in our case):
\[ \sigma = \sqrt{\sigma^2} \]
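As a numerical check, the Python sketch below computes the population variance with both equations and then takes the square root. The data are made up for illustration; `statistics.pvariance` divides by n, while `statistics.variance` divides by n-1 and therefore gives a different number.

```python
from math import sqrt
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative observations
n = len(data)
mu = sum(data) / n                    # average = 5.0

var_def = sum((x - mu) ** 2 for x in data) / n      # average squared deviation
var_alt = sum(x * x for x in data) / n - mu ** 2    # mean of squares minus squared mean
std = sqrt(var_def)                                 # back in the units of the data

print(var_def, var_alt, std)                 # 4.0 4.0 2.0
print(pvariance(data), variance(data))       # n versus n-1 in the denominator
```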

3. Frequency distribution. Skewness and Kurtosis

In most cases it is helpful to calculate and visualize the frequency distribution of a variable.
By frequency distribution we mean how the observations are distributed across size buckets.
For example, we may want to know how many % FX variations fall in each 0.10% interval, such as {-1.25% to -1.15%; -1.15% to -1.05%; ...; +1.95% to +2.05%}.
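A minimal Python sketch of bucketing. To avoid floating-point edge effects, the changes are expressed as whole basis points (hundredths of a percent), so a 0.10% bucket is 10 units wide. The data, the helper name `frequency_distribution` and the convention that each bucket includes its lower bound are all assumptions made for illustration.

```python
from collections import Counter

def frequency_distribution(values, width):
    """Count the values in each bucket [low, low + width), keyed by low."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical daily changes in basis points (illustrative only)
changes_bp = [-12, -3, 4, 7, 15, 18, 22, -8]

freq = frequency_distribution(changes_bp, 10)
rel_freq = {low: count / len(changes_bp) for low, count in freq.items()}

print(freq)       # counts per bucket
print(rel_freq)   # shares of the sample, summing to 1
```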

Exhibit 7: Frequency distributions are often shown as bar charts, such as this plot of the frequency of % FX variations (see stats2.xls).
[Bar chart; horizontal axis: % FX variation, labelled up to 2.0%.]

Exhibit 8: Frequency distributions can also be shown as a regular chart. In this case the line connecting the dots is only there for better visualization (see stats2.xls).
[Line chart; vertical axis: frequency, 0% to 12%; horizontal axis: % FX variation, -1.5% to 2.5%.]

Skewness
If we consider exhibits 7 & 8 we note that they are not symmetrical around their means. In fact, the two charts are visibly right-skewed (they have a right tail which is longer than the left tail).
An indicator of asymmetry is the skewness, whose formula is
\[ \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \mu\right)^3 \]

Note that this indicator is similar to the variance, with one very important difference: the building block in the formula is now the term (X_j - μ)^3:
- the presence of μ is needed since we are considering deviations from the mean;
- the cube serves two purposes:
  - it keeps the sign of the deviation: negative and positive deviations are treated in a different way;
  - it weights each deviation by the square of itself.
Skewness is therefore the balance between negative and positive weighted deviations.

The criterion to assess the presence of symmetry is thus:
- Skewness < 0: left skewed
- Skewness = 0: symmetrical
- Skewness > 0: right skewed
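The criterion can be tried on the small samples used earlier to compare average and median. The Python sketch below implements the formula exactly as stated in these notes (the average cubed deviation, not divided by σ³); Excel's SKEW function reports a standardized, sample-adjusted measure, so its values are on a different scale. `third_central_moment` is a hypothetical helper name.

```python
def third_central_moment(data):
    """Average of the cubed deviations from the mean."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 3 for x in data) / n

symmetric = third_central_moment([1, 2, 3, 4, 5])     # 0: symmetrical
right = third_central_moment([1, 2, 3, 4, 10])        # > 0: right skewed
left = third_central_moment([1, 7, 8, 9, 10])         # < 0: left skewed
ambiguous = third_central_moment([0, 0, 3, 4, 8])     # average = median, yet > 0

print(symmetric, right, left, ambiguous)
```

The last sample shows why average = median is inconclusive: the two coincide, yet the third moment is positive.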

Kurtosis
Kurtosis is referred to as an indicator of the weight of the tails of a distribution, i.e. the parts that are far away from the mean and that should represent extreme events.
The equation is
\[ \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \mu\right)^4 \]
This indicator is, roughly speaking, similar to the variance, save for the fact that it attenuates small deviations and boosts large deviations much more.
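To see how the fourth power singles out the tails, the Python sketch below compares two made-up samples with the same mean and the same variance. `central_moment` is a hypothetical helper that implements the formulas of these notes without any standardization.

```python
def central_moment(data, k):
    """Average of the k-th power of the deviations from the mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** k for x in data) / len(data)

# Both samples have mean 0 and variance 4 (illustrative values)
compact = [-2, -2, -2, -2, 2, 2, 2, 2]
heavy_tails = [-4, 0, 0, 0, 0, 0, 0, 4]

var_compact, kurt_compact = central_moment(compact, 2), central_moment(compact, 4)
var_heavy, kurt_heavy = central_moment(heavy_tails, 2), central_moment(heavy_tails, 4)

print(var_compact, kurt_compact)   # 4.0 16.0
print(var_heavy, kurt_heavy)       # 4.0 64.0
```

The variance cannot tell the two samples apart, while the fourth moment is four times larger for the sample with rare, extreme values.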

Cumulative frequency distribution
The cumulative frequency distribution (cfd) is calculated by adding up the frequencies.
Each point in the cfd represents the proportion of data points that are smaller than or equal to the corresponding value: e.g. if the cfd returns 0.4 for the number 10, it means that 40% of the data in the sample are equal to 10 or smaller than 10.
The cumulative % frequency distribution will add up to 100% for the highest bin.
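A small Python illustration of the definition, with a made-up sample chosen so that the cfd at 10 equals 0.4, as in the example above. `cfd` is a hypothetical helper name.

```python
def cfd(data, value):
    """Share of the observations that are smaller than or equal to value."""
    return sum(1 for x in data if x <= value) / len(data)

sample = [2, 5, 8, 10, 12, 15, 18, 20, 25, 30]   # illustrative values

at_10 = cfd(sample, 10)     # 4 of the 10 observations are <= 10
at_max = cfd(sample, 30)    # the highest value always reaches 100%

print(at_10, at_max)        # 0.4 1.0
```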

Exhibit 9: Cumulative % frequency distribution of daily $/EUR FX changes.
[Line chart; vertical axis: cumulative frequency, 0% to 100%; horizontal axis: daily change, -1.2% to 2.0%.]

4. Covariance and correlation
Covariance and correlation are widely used as measures of the strength of association between two series of values. As such they are at the core of many statistical techniques, such as least squares regressions.
In order to build on the past section we will continue to use the $/EUR FX values, introducing as a second series the $/100 exchange rate from the beginning of 2006 to July 27, 2006.

Plotting the time series
The second time series and its business-day variations would be plotted in exactly the same way in which we plotted the $/EUR data.
What is new here is that we may want to compare the evolution of the two time series.
The following exhibit shows the two series rebased.

Exhibit 10: The two exchange-rate series, rebased (legend: $/100, $/EUR).
[Line chart; time axis from Dec-05 to 1-Jul-06.]

Exhibit 11: Scatter of the % daily changes of the $/100 versus the $/EUR FX rates. A degree of positive linear association is clearly visible.
[Scatter plot; one axis labelled $/EUR, from -1.5% to 2.5%; the other axis from -2.0% to 2.0%.]

Covariance and correlation equations I
It is important to know the formula for covariance, as this appears very often in econometrics.
In the following equations, X and Y represent two data series (which must have the same number of terms, j = 1, 2, ..., n).
We continue to indicate the population averages with μ.
\[ \mathrm{cov}(X, Y) = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \mu_X\right)\left(Y_j - \mu_Y\right) \]
\[ \rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\,\sigma(Y)} \]

Covariance
Covariance is a measure of association between two series (time series in this case).
Covariance is a linear measure of association: it indicates the direction of the straight line around which a scatter plot of the two series tends to cluster.
- Covariance > 0: the line will have a positive slope (positive association).
- Covariance < 0: the line will have a negative slope (negative association).
- Covariance = 0: the line could have any slope (no linear association).

Covariance and correlation
Covariance has similar problems to variance:
(a) it is expressed in the product of the units of measurement of the two series (squared units when both series share the same unit);
(b) it is a power-of-2 measure (therefore it could be a large number simply because the variances are high).
Therefore we mostly use correlation (indicated with the Greek letter ρ), which is covariance divided by the product of the standard deviations of the two sets of data.
Correlation is a pure number, an index, that measures the linear association between two variables.
It can be proved that the value of correlation lies between -1 and +1; if ρ = 0.7, we usually say, loosely speaking, that X and Y have 70% of their behaviour in common.
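The scale problem of covariance, and the way correlation removes it, can be illustrated with a short Python sketch. The data are made up, and `covariance` and `correlation` are hypothetical helpers that follow the population formulas (denominator n) used in these notes.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (sqrt(covariance(xs, xs)) * sqrt(covariance(ys, ys)))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                   # y = 2x: perfect positive linear association
y_rescaled = [v * 100 for v in y]      # same series in units 100 times smaller

# Covariance scales with the units; correlation does not.
print(covariance(x, y), covariance(x, y_rescaled))
print(round(correlation(x, y), 6), round(correlation(x, y_rescaled), 6))
```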

Some more comments
The problem with correlation is that it is a linear measure of association: thus, non-linear dependence is not captured by correlation.
An interesting example is the case where Y = X²: Y is fully dependent on X, but it can be proved that the two variables have zero correlation when the values of X are distributed symmetrically around zero.
In some cases this problem can be solved by some transformation of variables.
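The Y = X² case can be verified numerically. In the Python sketch below the X values are made up and symmetric around zero, which is the assumption under which the covariance vanishes; `covariance` is a hypothetical helper using the population formula.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

x = [-2, -1, 0, 1, 2]            # symmetric around zero
y = [v ** 2 for v in x]          # Y is an exact function of X

cov_xy = covariance(x, y)        # zero, hence zero correlation

# A transformation of variables recovers the link: X squared against Y
x_squared = [v ** 2 for v in x]
cov_transformed = covariance(x_squared, y)

print(cov_xy, cov_transformed)
```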

Covariance and correlation equations II
From the equations just examined, it appears clearly that the covariance of a series of data with itself equals its variance (and the correlation equals one):
\[ \mathrm{cov}(X, X) = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \mu_X\right)\left(X_j - \mu_X\right) = \mathrm{var}(X) \]
\[ \rho(X, X) = \frac{\mathrm{var}(X)}{\sigma(X)\,\sigma(X)} = 1 \]

5. Autocorrelation
Autocorrelation is the correlation of the values of a time series with past values of the series itself.
This is an important topic in econometrics and forecasting because empirical work, pioneered by Box and Jenkins, showed that forecasts based only on the past values of the time series itself are often more accurate than forecasts based on complex structural models.
Autocorrelation is merely an application of covariance and correlation, but it plays a pivotal role in econometrics.

Autocorrelation in trading days FX changes for $/100
Exhibit 12 shows the scatter plot for the changes of day (t) and day (t-1).
The low correlation is consistent with the so-called weak form of the efficient markets hypothesis, which posits the impossibility of reaping extra returns using forecasts based on past returns.
The formula to calculate autocorrelation is just an application of the more general formula for correlation; note that the population mean is indicated with μ, and that now the sample size is referred to as T:
\[ \rho_1 = \frac{\sum_{t=1}^{T}\left(X_t - \mu\right)\left(X_{t-1} - \mu\right)}{\sum_{t=1}^{T}\left(X_t - \mu\right)^2} \]
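A minimal Python sketch of the lag-1 calculation. The series are made up, `autocorrelation` is a hypothetical helper, and one assumption is made about the edges: with zero-based Python lists, the numerator runs over every t for which the lagged observation exists, while the denominator uses all observations.

```python
def autocorrelation(series, lag):
    """Sum of lagged cross-deviations divided by the sum of squared deviations."""
    mu = sum(series) / len(series)
    numerator = sum((series[t] - mu) * (series[t - lag] - mu)
                    for t in range(lag, len(series)))
    denominator = sum((x - mu) ** 2 for x in series)
    return numerator / denominator

alternating = [1, -1, 1, -1, 1, -1]   # each value is followed by its opposite
trending = [1, 2, 3, 4, 5]            # each value is close to the previous one

rho_alternating = autocorrelation(alternating, 1)   # strongly negative
rho_trending = autocorrelation(trending, 1)         # positive

print(round(rho_alternating, 4), round(rho_trending, 4))
```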

Exhibit 12: Scatter of $/100 returns for time (t) against the return for time (t-1); ρ = [value missing]%.
[Scatter plot; horizontal axis: % changes day (t), -2% to 2%; vertical axis: % changes day (t+1), -2% to 1%.]

Exhibit 13: Toys R Us U.S. sales, in $ millions; quarters {4,8,12,16} include the Xmas period.
[Chart; vertical axis: Revenues, 0 to 5,000; horizontal axis: Quarters.]

Autocorrelation in quarterly sales - seasonality
Exhibit 13 shows the time series of U.S. sales for the retail chain Toys R Us.
Clearly, the series shows autocorrelation, which could well be used as a tool for forecasting quarterly sales.
However, we must take into account the seasonal pattern: sales in quarter t should be correlated with sales in quarter t-4. In this case, the formula employed is
\[ \rho_4 = \frac{\sum_{t=4}^{T}\left(X_t - \mu\right)\left(X_{t-4} - \mu\right)}{\sum_{t=1}^{T}\left(X_t - \mu\right)^2} \]
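The effect of seasonality can be sketched in Python with a made-up quarterly series in which every fourth quarter is a peak; these are not the Toys R Us figures. `autocorrelation` is a hypothetical helper, and its numerator is assumed to run over every t for which the lagged observation exists (zero-based indices).

```python
def autocorrelation(series, lag):
    """Sum of lagged cross-deviations divided by the sum of squared deviations."""
    mu = sum(series) / len(series)
    numerator = sum((series[t] - mu) * (series[t - lag] - mu)
                    for t in range(lag, len(series)))
    denominator = sum((x - mu) ** 2 for x in series)
    return numerator / denominator

# Three hypothetical years of quarterly sales with a peak in the last quarter
sales = [10, 10, 10, 30] * 3

rho_1 = autocorrelation(sales, 1)   # adjacent quarters: weak, negative
rho_4 = autocorrelation(sales, 4)   # same quarter one year earlier: strong, positive

print(round(rho_1, 4), round(rho_4, 4))
```

Comparing each quarter with the same quarter of the previous year picks up the seasonal pattern that the lag-1 measure misses.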

To sum up
Graphs are important, but seldom conclusive: they contain too much information.
Thus, we use descriptive statistics:
- Average
- Variance
- Skewness
- Kurtosis
We often deal with more than one series at a time, and we would like to find the degree of association between them:
- Correlation
But even when we have only one series, it could be a good idea to check whether the current behaviour is related to (and can therefore be predicted from) the past behaviour:
- Autocorrelation

6. APPENDIX
List of Excel functions linked to this TN:
- AVERAGE
- CONCATENATE
- CORREL
- COVAR
- FREQUENCY
- MAX
- MEDIAN
- MIN
- MODE
- PERCENTILE
- QUARTILE
- SKEW
- SQRT
- STDEV (sample STD)
- STDEVP (population STD)
- VAR (sample variance)
- VARP (population variance)


More information

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008 DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 3 Spring 2008 Measures of central tendency for ungrouped data 2 Graphs are very helpful to describe

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to

More information

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Spring 2015: Lembo GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Descriptive statistics concise and easily understood summary of data set characteristics

More information

Sampling, Frequency Distributions, and Graphs (12.1)

Sampling, Frequency Distributions, and Graphs (12.1) 1 Sampling, Frequency Distributions, and Graphs (1.1) Design: Plan how to obtain the data. What are typical Statistical Methods? Collect the data, which is then subjected to statistical analysis, which

More information

STT 315 This lecture is based on Chapter 2 of the textbook.

STT 315 This lecture is based on Chapter 2 of the textbook. STT 315 This lecture is based on Chapter 2 of the textbook. Acknowledgement: Author is thankful to Dr. Ashok Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit some of their

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

P8130: Biostatistical Methods I

P8130: Biostatistical Methods I P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Descriptive statistics Techniques to visualize

More information

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Mean 26.86667 Standard Error 2.816392 Median 25 Mode 20 Standard Deviation 10.90784 Sample Variance 118.981 Kurtosis -0.61717 Skewness

More information

Unit 2: Numerical Descriptive Measures

Unit 2: Numerical Descriptive Measures Unit 2: Numerical Descriptive Measures Summation Notation Measures of Central Tendency Measures of Dispersion Chebyshev's Rule Empirical Rule Measures of Relative Standing Box Plots z scores Jan 28 10:48

More information

1 Measures of the Center of a Distribution

1 Measures of the Center of a Distribution 1 Measures of the Center of a Distribution Qualitative descriptions of the shape of a distribution are important and useful. But we will often desire the precision of numerical summaries as well. Two aspects

More information

Expectations and Variance

Expectations and Variance 4. Model parameters and their estimates 4.1 Expected Value and Conditional Expected Value 4. The Variance 4.3 Population vs Sample Quantities 4.4 Mean and Variance of a Linear Combination 4.5 The Covariance

More information

MgtOp 215 Chapter 3 Dr. Ahn

MgtOp 215 Chapter 3 Dr. Ahn MgtOp 215 Chapter 3 Dr. Ahn Measures of central tendency (center, location): measures the middle point of a distribution or data; these include mean and median. Measures of dispersion (variability, spread):

More information

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations: Measures of center The mean The mean of a distribution is the arithmetic average of the observations: x = x 1 + + x n n n = 1 x i n i=1 The median The median is the midpoint of a distribution: the number

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value

More information

ALGEBRA I CURRICULUM OUTLINE

ALGEBRA I CURRICULUM OUTLINE ALGEBRA I CURRICULUM OUTLINE 2013-2014 OVERVIEW: 1. Operations with Real Numbers 2. Equation Solving 3. Word Problems 4. Inequalities 5. Graphs of Functions 6. Linear Functions 7. Scatterplots and Lines

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Overview 3-2 Measures

More information

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................

More information

Identifying SVARs with Sign Restrictions and Heteroskedasticity

Identifying SVARs with Sign Restrictions and Heteroskedasticity Identifying SVARs with Sign Restrictions and Heteroskedasticity Srečko Zimic VERY PRELIMINARY AND INCOMPLETE NOT FOR DISTRIBUTION February 13, 217 Abstract This paper introduces a new method to identify

More information

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table 2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

More information

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67 Chapter 6 The Standard Deviation as a Ruler and the Normal Model 1 /67 Homework Read Chpt 6 Complete Reading Notes Do P129 1, 3, 5, 7, 15, 17, 23, 27, 29, 31, 37, 39, 43 2 /67 Objective Students calculate

More information

Higher Secondary - First year STATISTICS Practical Book

Higher Secondary - First year STATISTICS Practical Book Higher Secondary - First year STATISTICS Practical Book th_statistics_practicals.indd 07-09-08 8:00:9 Introduction Statistical tools are important for us in daily life. They are used in the analysis of

More information

SESSION 5 Descriptive Statistics

SESSION 5 Descriptive Statistics SESSION 5 Descriptive Statistics Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple

More information

Sets and Set notation. Algebra 2 Unit 8 Notes

Sets and Set notation. Algebra 2 Unit 8 Notes Sets and Set notation Section 11-2 Probability Experimental Probability experimental probability of an event: Theoretical Probability number of time the event occurs P(event) = number of trials Sample

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Expectations and Variance

Expectations and Variance 4. Model parameters and their estimates 4.1 Expected Value and Conditional Expected Value 4. The Variance 4.3 Population vs Sample Quantities 4.4 Mean and Variance of a Linear Combination 4.5 The Covariance

More information

Chapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Chapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc. Chapter 3 Numerically Summarizing Data Section 3.1 Measures of Central Tendency Objectives 1. Determine the arithmetic mean of a variable from raw data 2. Determine the median of a variable from raw data

More information

Chapter 1:Descriptive statistics

Chapter 1:Descriptive statistics Slide 1.1 Chapter 1:Descriptive statistics Descriptive statistics summarises a mass of information. We may use graphical and/or numerical methods Examples of the former are the bar chart and XY chart,

More information

Chapter 12 - Part I: Correlation Analysis

Chapter 12 - Part I: Correlation Analysis ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,

More information

1 Correlation and Inference from Regression

1 Correlation and Inference from Regression 1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is

More information

MATH 117 Statistical Methods for Management I Chapter Three

MATH 117 Statistical Methods for Management I Chapter Three Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.

More information

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511 Topic 2 - Descriptive Statistics STAT 511 Professor Bruce Craig Types of Information Variables classified as Categorical (qualitative) - variable classifies individual into one of several groups or categories

More information

Summarizing Measured Data

Summarizing Measured Data Performance Evaluation: Summarizing Measured Data Hongwei Zhang http://www.cs.wayne.edu/~hzhang The object of statistics is to discover methods of condensing information concerning large groups of allied

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

Time series and Forecasting

Time series and Forecasting Chapter 2 Time series and Forecasting 2.1 Introduction Data are frequently recorded at regular time intervals, for instance, daily stock market indices, the monthly rate of inflation or annual profit figures.

More information

in the company. Hence, we need to collect a sample which is representative of the entire population. In order for the sample to faithfully represent t

in the company. Hence, we need to collect a sample which is representative of the entire population. In order for the sample to faithfully represent t 10.001: Data Visualization and Elementary Statistical Analysis R. Sureshkumar January 15, 1997 Statistics deals with the collection and the analysis of data in the presence of variability. Variability

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur Lecture No. #13 Probability Distribution of Continuous RVs (Contd

More information

= observed volume on day l for bin j = base volume in jth bin, and = residual error, assumed independent with mean zero.

= observed volume on day l for bin j = base volume in jth bin, and = residual error, assumed independent with mean zero. QB research September 4, 06 Page -Minute Bin Volume Forecast Model Overview In response to strong client demand, Quantitative Brokers (QB) has developed a new algorithm called Closer that specifically

More information

Introduction. ECN 102: Analysis of Economic Data Winter, J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, / 51

Introduction. ECN 102: Analysis of Economic Data Winter, J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, / 51 Introduction ECN 102: Analysis of Economic Data Winter, 2011 J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 4, 2011 1 / 51 Contact Information Instructor: John Parman Email: jmparman@ucdavis.edu

More information

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Summarizing Measured Data

Summarizing Measured Data Summarizing Measured Data Dr. John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu COMP 528 Lecture 7 3 February 2005 Goals for Today Finish discussion of Normal Distribution

More information

2.1 Measures of Location (P.9-11)

2.1 Measures of Location (P.9-11) MATH1015 Biostatistics Week.1 Measures of Location (P.9-11).1.1 Summation Notation Suppose that we observe n values from an experiment. This collection (or set) of n values is called a sample. Let x 1

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Marquette University MATH 1700 Class 5 Copyright 2017 by D.B. Rowe

Marquette University MATH 1700 Class 5 Copyright 2017 by D.B. Rowe Class 5 Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science Copyright 2017 by D.B. Rowe 1 Agenda: Recap Chapter 3.2-3.3 Lecture Chapter 4.1-4.2 Review Chapter 1 3.1 (Exam

More information

BNG 495 Capstone Design. Descriptive Statistics

BNG 495 Capstone Design. Descriptive Statistics BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus

More information

Essentials of Statistics and Probability

Essentials of Statistics and Probability May 22, 2007 Department of Statistics, NC State University dbsharma@ncsu.edu SAMSI Undergrad Workshop Overview Practical Statistical Thinking Introduction Data and Distributions Variables and Distributions

More information

Continuous random variables

Continuous random variables Continuous random variables A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The total area under a density

More information

Diploma Part 2. Quantitative Methods. Examiner s Suggested Answers

Diploma Part 2. Quantitative Methods. Examiner s Suggested Answers Diploma Part Quantitative Methods Examiner s Suggested Answers Question 1 (a) The standard normal distribution has a symmetrical and bell-shaped graph with a mean of zero and a standard deviation equal

More information

Chapter 4.notebook. August 30, 2017

Chapter 4.notebook. August 30, 2017 Sep 1 7:53 AM Sep 1 8:21 AM Sep 1 8:21 AM 1 Sep 1 8:23 AM Sep 1 8:23 AM Sep 1 8:23 AM SOCS When describing a distribution, make sure to always tell about three things: shape, outliers, center, and spread

More information

Introduction to Statistics for Traffic Crash Reconstruction

Introduction to Statistics for Traffic Crash Reconstruction Introduction to Statistics for Traffic Crash Reconstruction Jeremy Daily Jackson Hole Scientific Investigations, Inc. c 2003 www.jhscientific.com Why Use and Learn Statistics? 1. We already do when ranging

More information

Lecture 2. Descriptive Statistics: Measures of Center

Lecture 2. Descriptive Statistics: Measures of Center Lecture 2. Descriptive Statistics: Measures of Center Descriptive Statistics summarize or describe the important characteristics of a known set of data Inferential Statistics use sample data to make inferences

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

Week 1: Intro to R and EDA

Week 1: Intro to R and EDA Statistical Methods APPM 4570/5570, STAT 4000/5000 Populations and Samples 1 Week 1: Intro to R and EDA Introduction to EDA Objective: study of a characteristic (measurable quantity, random variable) for

More information

This note introduces some key concepts in time series econometrics. First, we

This note introduces some key concepts in time series econometrics. First, we INTRODUCTION TO TIME SERIES Econometrics 2 Heino Bohn Nielsen September, 2005 This note introduces some key concepts in time series econometrics. First, we present by means of examples some characteristic

More information

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots

More information