Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 03
|
|
- Egbert Jennings
- 6 years ago
- Views:
Transcription
1 Meelis Kull Autumn
2 Demo: Data science mini-project CRISP-DM: cross-industrial standard process for data mining Data understanding: Types of data Data understanding: First look at attributes Types of attributes First look at a nominal attribute First look at a ordinal attribute First look at a numeric attribute 2
3 Demo: Data science mini-project CRISP-DM: cross-industrial standard process for data mining Data understanding: Types of data Data understanding: First look at attributes Types of attributes First look at a nominal attribute First look at a ordinal attribute First look at a numeric attribute 3
4 4
5 All the same applies as for nominal attributes Need to make sure that the order is retained in histograms Additional ways to look at the attribute: Calculate the min, max, median of the attribute 5
6 All the same applies as for nominal attributes Need to make sure that the order is retained in histograms Additional ways to look at the attribute: Calculate the min, max, median of the attribute More generally, calculate quantiles 6
7 There are 3 quartiles: lower quartile, median, upper quartile They divide a sorted data set into 4 equal parts Percentiles divide into 100 equal parts Lower quartile is the 25 th percentile Median is the 50 th percentile Deciles divide into 10 equal parts Quantiles generalise to any location over sorted data: Lower quartile = 25 th percentile = quantile at th decile = quantile at 0.4 7
8 A. 1 st and 2 nd decile B. 2 nd and 3 rd decile C. 3 rd and 4 th decile D. 4 th and 5 th decile E. 5 th and 6 th decile F. 6 th and 7 th decile G. 7 th and 8 th decile H. 8 th and 9 th decile 8
9 A. Nominal attributes B. Ordinal attributes C. Both D. Neither E. Not sure 9
10 Same aspects as in nominal attributes Additionally: Is the order specified in the documentation? (meta-data) If all possible values specified in the meta-data: Check if all values present in the data If not, make sure that the empty bars in histograms are in the right location corresponding to the order 10
11 Demo: Data science mini-project CRISP-DM: cross-industrial standard process for data mining Data understanding: Types of data Data understanding: First look at attributes Types of attributes First look at a nominal attribute First look at a ordinal attribute First look at a numeric attribute 11
12 12
13 Is it really numeric? E.g. if contains only values 0.0 and 1.0 then perhaps nominal is more appropriate? Is it discrete (only integers)? E.g. if contains integers 0 to 100 then can have a first look as if it were an ordinal attribute Calculate quantiles as for ordinal attributes Calculate the mean (arithmetic average): 13
14 Is it really numeric? Is it discrete (only integers)? Calculate quantiles as for ordinal attributes Calculate the mean (arithmetic average) Plot a histogram (with default binning) 14
15 Check if some value occurs many times With real numbers often no number occurs more than once If a value is frequent then: Possibly can denote missingness (e.g. value 0), should be treated as N/A (not available, missing) Perhaps can denote an extremal value (more extremal values represented as this value) Check for rounding E.g., if all numbers rounded to 2 nd digit and one has more digits then it might be a typing error 15
16 A. Nominal attributes B. Ordinal attributes C. Numeric attributes D. All of the above E. None of the above F. Not sure 16
17 Demo: Data science mini-project CRISP-DM: cross-industrial standard process for data mining Data understanding: Types of data Data understanding: First look at attributes Types of attributes First look at a nominal attribute First look at a ordinal attribute First look at a numeric attribute 17
18 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 18
19 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 19
20 20
21 We have now looked at each attribute A key question we can now answer: How do the items in this dataset look like? Now we have at least 3 answers to this: 21
22 How do the items in this dataset look like? Let us just pick up one as an example: Age = 22 Workclass = Private Education = 11 th Occupation = Other-service Capital.gain =
23 How do the items in this dataset look like? Let us just provide the ranges of attributes Age: {17,18,19,,90} Workclass: {Federal-gov, Local-gov, Private, } Education: {1 st -4 th,5 th -6 th,, Doctorate) Occupation: {Adm-clerical, Exec-managerial, } Capital.gain: [0,99999]... 23
24 How do the items in this dataset look like? Let us just provide the histograms Age: <histogram> Workclass: <histogram> Education: <histogram> Occupation: <histogram> Capital.gain: <histogram>... 24
25 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 25
26 26
27 Histograms on discrete data Nominal Ordinal Numeric with few different values E.g. small number of different integers Histograms on continuous data Numeric with many different values 27
28 Frequency histogram Frequency = count of items with each value ggplot(data) + geom_histogram(aes(x=age),stat="count") 28
29 Frequency histogram Frequency = count of items with each value ggplot(data) + geom_histogram(aes(x=workclass),stat="count") 29
30 Relative frequency histogram Relative frequency = proportion (0..1) or percentage (0..100%) of items with each value Heights of bars sum up to 1 ggplot(data) + geom_bar(aes(x=workclass,..count../sum(..count..)),stat="count") 30
31 Depends on the goal Frequency histogram Gives actual counts in the data Relative frequency histogram Gives proportions in the data Interpretable as probability distribution of a randomly chosen item 31
32 Continuous attribute: Usually value is different in each item Need to introduce bins (a.k.a. intervals, ranges) Histogram not informative without bins: ggplot(data) + geom_histogram(aes(x=factor(salaries)),stat="count") 32
33 Frequency histogram of binned data Frequency = count of items in each bin ggplot(data) + geom_histogram(aes(x=salaries),stat="bin",boundary=0,binwidth=10000) 33
34 Relative frequency histogram of binned data Relative frequency = proportion of items in bins Heights of bars add up to 1 ggplot(data) + geom_histogram(aes(x=salaries,..count../sum(..count..)),stat="bin",boundary=0,binwidth=10000) 34
35 Density histogram of binned data Density = Y-axis such that areas of bars in the histogram add up to 1 Density scale is invariant to the sizes of bins ggplot(data) + geom_histogram(aes(x=salaries,..density..),stat="bin",boundary=0,binwidth=10000) 35
36 Density histogram of binned data Density = Y-axis such that areas of bars in the histogram add up to 1 Density scale is invariant to the sizes of bins + geom_histogram(aes(x=salaries,..density..),stat="bin",boundary=0,binwidth=1000,alpha=0,colour="red") 36
37 Density histogram of binned data Density = Y-axis such that areas of bars in the histogram add up to 1 Density scale is invariant to the sizes of bins + geom_histogram(aes(x=salaries),stat="density",colour="blue") 37
38 Represented by the probability density function (pdf) Area under the curve is equal to 1 Areas represent probabilities Area = P(a<X<b) = probability that X is between a and b 38
39 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 39
40 40
41 Statistic measure calculated from all values Mode of the distribution: The most probable value (or values) The most frequent value if discrete Value with highest density if continuous Median and other quantiles We have defined earlier Mean of the distribution: Average value of the attribute Arithmetic average if discrete Expected value if continuous Centre of mass of the distribution 41
42 Consider attribute with values: 1,2,2,2,3,3,4,7 Mode: 2 because occurs three times Median 2.5 because 2 & 3 are in the middle, (2+3)/2=2.5 Mean: 3.0 because ( )/8 = 24/8 =
43 A. 0 B. 0.5 C. 1 D. 1.5 E. None of the above 43
44 A. 0 B. 0.5 C. 1 D. 1.5 E. None of the above 44
45 A. 0 B. 0.5 C. 1 D. 1.5 E. None of the above 45
46 Variance of a distribution is: average squared deviation from the mean Standard deviation is square root of variance Quadratic average deviation from the mean Example: Values: 1,2,2,2,3,3,4,7 Mean: 3.0 Variance: ((1-3)^2+(2-3)^2+ +(7-3)^2) / 8 = 3.0 Standard deviation: sqrt(3.0)=
47 Symmetric Skewed / right-skewed / left-skewed Heavy-tailed Bimodal Multi-modal Capped / right-capped / left-capped 47
48 Simple definitions, not fully correct: Unimodal 1 mode Bimodal 2 modes Multimodal multiple modes Actually: Bimodal usually means that the density (pdf) has two local maxima: Multimodal means pdf has multiple maxima (2 or more) 48
49 Symmetric distribution: Symmetric around a vertical axis of symmetry Left and right side are mirror-images of each other Left- (or right-)skewed distribution: Non-symmetric and mean below (above) mode Usually used only for unimodal distributions sym negati positiv m vely ely et ske ske ri wed we c d 49
50 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 50
51 51
52 Statisticians have names for many different families of probability distributions Why need to know some of them? In practice these are used to communicate the distribution without having to visualise it We will talk about: Uniform distributions Normal distributions Power law distributions 52
53 All options are equally probable 53
54 All values in [a,b] are equally probable: The pdf is constant between a and b, and 0 elsewhere 54
55 Represent data dispersion, spread Represent central tendency 55
56 N(mean,variance) The most common non-uniform continuous distribution Why? Sum of many independent and identically distributed (i.i.d.) random variables is approximately normally distributed Standard normal distribution: N(0,1) 56
57 Heavy-tailed (or long-tailed) distribution Very high or low values are likely More likely than in case of normal distribution Technical definition: Tails are not exponentially bounded 57
58 Alternative names: scale-free / scale-independent Distributions on positive real values The probability of M times bigger value is K times smaller (power law; M, K - parameters) Linearly descending pdf when drawn in loglog-scale (both x and y logarithmic) Examples: Number of connections in many real-world graphs Frequencies of words in languages Sizes of craters on the moon 58
59 A. Uniform B. Unimodal symmetric C. Unimodal right-skewed D. Unimodal left-skewed E. Bimodal symmetric F. Bimodal asymmetric G. Multi-modal symmetric H. Multi-modal asymmetric I. Normally distributed J. Power-law distributed 59
60 A. Uniform B. Unimodal symmetric C. Unimodal right-skewed D. Unimodal left-skewed E. Bimodal symmetric F. Bimodal asymmetric G. Multi-modal symmetric H. Multi-modal asymmetric I. Normally distributed J. Power-law distributed 60
61 A. Uniform B. Unimodal symmetric C. Unimodal right-skewed D. Unimodal left-skewed E. Bimodal symmetric F. Bimodal asymmetric G. Multi-modal symmetric H. Multi-modal asymmetric I. Normally distributed J. Power-law distributed 61
62 A. Uniform B. Unimodal symmetric C. Unimodal right-skewed D. Unimodal left-skewed E. Bimodal symmetric F. Bimodal asymmetric G. Multi-modal symmetric H. Multi-modal asymmetric I. Normally distributed J. Power-law distributed 62
63 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 63
64 64
65 Compactly visualise many distributions Density plots rotated by 90º and mirrored Distribution of salaries for each education level ggplot(data) + geom_violin(aes(x=education,y=salaries)) 65
66 Compact visualise many distributions marked median, upper and lower quartile, and outliers (R ggplot outlier = more than 1.5x interquartile range from quartile) ggplot(data) + geom_boxplot(aes(x=education,y=salaries)) 66
67 Violin plots and box plots can also be shown together: ggplot(data) + geom_violin(aes(x=education_factor,y=salaries)) + geom_boxplot(aes(x=education_factor,y=salaries),alpha=0.3) 67
68 Often too crowded, but sometimes provide extra insights compared to box plots and violin plots Over-crowded with too many points ggplot(data) + geom_point(aes(x=education_factor,y=salaries)) 68
69 Often too crowded, but sometimes provide extra insights compared to box plots and violin plots More useful, here only 100 points ggplot(data) + geom_point(aes(x=education_factor,y=salaries)) 69
70 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 70
71 71
72 2 categorical attributes: Cross-table (workclass & education) table(data$education,data$workclass) 72
73 2 categorical attributes: Cross-table (workclass & education) Heatmap (workclass & education) ggplot(data %>% group_by(education,workclass) %>% summarise(count=length(education))) + geom_tile(aes(y=education,x=workclass,fill=count)) + scale_fill_gradient(low='white',high='black',trans="log",breaks=c(1,10,100,1000)) + theme_bw() 73
74 2 categorical attributes: Cross-table (workclass & education) Heatmap (workclass & education) Cross-table and heatmap combined + geom_text(aes(x=workclass,y=education,label=count),color="red") 74
75 Scatter plot, box plot, violin plot 75
76 Scatter plot ggplot(data) + geom_point(aes(x=age,y=salaries)) 76
77 Scatter plot ggplot(data) + geom_point(aes(x=capital.gain,y=salaries)) 77
78 Scatter plot Discretise one (or both) of the attributes Discretise = make into categorical For instance, introduce bins library(arules) data$capital.gain.discretised = discretize(data$capital.gain,method="fixed",categories=c(0,5,10,20,50,100)*1000) ggplot(data) + geom_boxplot(aes(x=capital.gain.discretised,y=salaries)) 78
79 Make all pairwise visualisations and organise in a cross-table Perform dimensionality reduction Introduced later in the course Projects the data into a 2-dimensional space Then visualise Use colours, shapes, etc. 79
80 Four attributes visualised together ggplot(data) + geom_point(aes(x=capital.gain,y=salaries,color=workclass,shape=gender)) + scale_colour_brewer(type="qual") 80
81 Four attributes visualised together Often hard to read when many attributes visualised this way ggplot(data) + geom_point(aes(x=capital.gain,y=salaries,color=workclass,shape=gender)) + scale_colour_brewer(type="qual") 81
82 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 82
83 83
84 There are many statistical tests about this, we will cover some in the next lecture Visual inspection can reveal some relations For example, university graduates have higher median salaries than others 84
85 There are many statistical tests about this, we will cover some in the next lecture Visual inspection can reveal some relations For example, university graduates have higher median salaries than others Correlation is a statistic on 2 numeric attributes, quantifying linear relationships 85
86 Mean (actually sample mean as explained in next lecture) x 1 n n i 1 x i Correlation (Pearson correlation coefficient) 86
87 Ranges from -1 to +1: R=-1: perfectly anti-correlated R=+1: perfectly correlated R=0: absolutely uncorrelated Positively correlated Negatively correlated 87
88 Uncorrelated Uncorrelated Uncorrelated 88
89 89
90 Several sets of (x, y) points, with the Pearson correlation coefficient of x and y for each set. Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). N.B.: the figure in the center has a slope of 0 but in that case the correlation coefficient is undefined because the variance of Y is zero. 90
91 91
92 A. All equal B. All different C. Some equal, some different 92
93 93
94 94
95 95
96 96
97 97
98 405 - Feb CO2 Noise Pressure 98
99 Feb 2017 Changed the scale of noise CO2 Pressure Noise 99
100 80 70 CO2 vs Noise in 405 y = x R² = Noise CO
101 80 70 Not the same as correlation! CO2 vs Noise in 405 y = x R² = Noise CO
102 r Pearson correlation coefficient R 2 coefficient of determination Proportion of variance in the target attribute that is predictable from the source attribute Basically, measures how well the data follow the regression line In simple linear regression R 2 is equal to the square of r 102
103 Source: 103
104 Source: 104
105 ????????? Source: 105
106 Source: 106
107 ????????? 107
108 108
109 Relation Correlation Causation 109
110 Relation Correlation Causation Spurious correlations! Multiple testing problem (next lecture) 110
111 Data understanding: distribution of attributes Types of histograms How to describe probability distributions? Some standard probability distributions More ways to visualise distributions Visualising relations of attributes Are the attributes related? 111
112 112
113 The most merciful thing in the world... is the inability of the human mind to correlate all its contents. H. P. Lovecraft The correlation of quality of life and cost of energy is huge Sam Altman Visualization is daydreaming with a purpose. Bo Bennett Source: 113
P8130: Biostatistical Methods I
P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data
More informationChapter 4. Displaying and Summarizing. Quantitative Data
STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range
More informationMidrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used
Measures of Central Tendency Mode: most frequent score. best average for nominal data sometimes none or multiple modes in a sample bimodal or multimodal distributions indicate several groups included in
More informationLast Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics
Last Lecture Distinguish Populations from Samples Importance of identifying a population and well chosen sample Knowing different Sampling Techniques Distinguish Parameters from Statistics Knowing different
More informationΣ x i. Sigma Notation
Sigma Notation The mathematical notation that is used most often in the formulation of statistics is the summation notation The uppercase Greek letter Σ (sigma) is used as shorthand, as a way to indicate
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationCS 147: Computer Systems Performance Analysis
CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1
More informationDescriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning
More information200 participants [EUR] ( =60) 200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR
Ana Jerončić 200 participants [EUR] about half (71+37=108) 200 = 54% of the bills are small, i.e. less than 30 EUR (18+28+14=60) 200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR
More informationTastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?
Tastitsticsss? What s that? Statistics describes random mass phanomenons. Principles of Biostatistics and Informatics nd Lecture: Descriptive Statistics 3 th September Dániel VERES Data Collecting (Sampling)
More information1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.
1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions
More information1. Exploratory Data Analysis
1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be
More informationØ Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.
Statistical Tools in Evaluation HPS 41 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific number
More informationSTAT 200 Chapter 1 Looking at Data - Distributions
STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the
More informationHistograms allow a visual interpretation
Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called
More informationADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes
We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures
More informationHistograms, Central Tendency, and Variability
The Economist, September 6, 214 1 Histograms, Central Tendency, and Variability Lecture 2 Reading: Sections 5 5.6 Includes ALL margin notes and boxes: For Example, Guided Example, Notation Alert, Just
More informationWeek 1: Intro to R and EDA
Statistical Methods APPM 4570/5570, STAT 4000/5000 Populations and Samples 1 Week 1: Intro to R and EDA Introduction to EDA Objective: study of a characteristic (measurable quantity, random variable) for
More informationStat 101 Exam 1 Important Formulas and Concepts 1
1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative
More informationPreliminary Statistics course. Lecture 1: Descriptive Statistics
Preliminary Statistics course Lecture 1: Descriptive Statistics Rory Macqueen (rm43@soas.ac.uk), September 2015 Organisational Sessions: 16-21 Sep. 10.00-13.00, V111 22-23 Sep. 15.00-18.00, V111 24 Sep.
More informationSTP 420 INTRODUCTION TO APPLIED STATISTICS NOTES
INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make
More informationMATH 10 INTRODUCTORY STATISTICS
MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:
More informationIAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES
IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES VARIABLE Studying the behavior of random variables, and more importantly functions of random variables is essential for both the
More informationAP Final Review II Exploring Data (20% 30%)
AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure
More informationChapter 4.notebook. August 30, 2017
Sep 1 7:53 AM Sep 1 8:21 AM Sep 1 8:21 AM 1 Sep 1 8:23 AM Sep 1 8:23 AM Sep 1 8:23 AM SOCS When describing a distribution, make sure to always tell about three things: shape, outliers, center, and spread
More informationSummary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1
Summary statistics 1. Visualize data 2. Mean, median, mode and percentiles, variance, standard deviation 3. Frequency distribution. Skewness 4. Covariance and correlation 5. Autocorrelation MSc Induction
More informationQUANTITATIVE DATA. UNIVARIATE DATA data for one variable
QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationDescriptive Statistics-I. Dr Mahmoud Alhussami
Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.
More information3.1 Measure of Center
3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects
More informationModule 1. Identify parts of an expression using vocabulary such as term, equation, inequality
Common Core Standards Major Topic Key Skills Chapters Key Vocabulary Essential Questions Module 1 Pre- Requisites Skills: Students need to know how to add, subtract, multiply and divide. Students need
More informationØ Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.
Statistical Tools in Evaluation HPS 41 Fall 213 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific
More informationLecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Overview 3-2 Measures
More informationUnit 2. Describing Data: Numerical
Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient
More informationIntroduction to Basic Statistics Version 2
Introduction to Basic Statistics Version 2 Pat Hammett, Ph.D. University of Michigan 2014 Instructor Comments: This document contains a brief overview of basic statistics and core terminology/concepts
More informationChapter 2 Descriptive Statistics
Chapter 2 Descriptive Statistics Lecture 1: Measures of Central Tendency and Dispersion Donald E. Mercante, PhD Biostatistics May 2010 Biostatistics (LSUHSC) Chapter 2 05/10 1 / 34 Lecture 1: Descriptive
More informationClass 11 Maths Chapter 15. Statistics
1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class
More informationSUMMARIZING MEASURED DATA. Gaia Maselli
SUMMARIZING MEASURED DATA Gaia Maselli maselli@di.uniroma1.it Computer Network Performance 2 Overview Basic concepts Summarizing measured data Summarizing data by a single number Summarizing variability
More informationChapter 2: Tools for Exploring Univariate Data
Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is
More informationIndex I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474
Index A Absolute value explanation of, 40, 81 82 of slope of lines, 453 addition applications involving, 43 associative law for, 506 508, 570 commutative law for, 238, 505 509, 570 English phrases for,
More informationSociology 6Z03 Review I
Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing
More informationStat 20: Intro to Probability and Statistics
Stat 20: Intro to Probability and Statistics Lecture 5: Summary Statistics Tessa L. Childers-Day UC Berkeley 30 June 2014 By the end of this lecture... You will be able to: Describe a data set by its:
More informationMIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability
STA301- Statistics and Probability Solved MCQS From Midterm Papers March 19,2012 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability
More informationStatistical Methods. by Robert W. Lindeman WPI, Dept. of Computer Science
Statistical Methods by Robert W. Lindeman WPI, Dept. of Computer Science gogo@wpi.edu Descriptive Methods Frequency distributions How many people were similar in the sense that according to the dependent
More informationUnit Two Descriptive Biostatistics. Dr Mahmoud Alhussami
Unit Two Descriptive Biostatistics Dr Mahmoud Alhussami Descriptive Biostatistics The best way to work with data is to summarize and organize them. Numbers that have not been summarized and organized are
More information2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS
Spring 2015: Lembo GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Descriptive statistics concise and easily understood summary of data set characteristics
More informationChapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation
Chapter Four Numerical Descriptive Techniques 4.1 Numerical Descriptive Techniques Measures of Central Location Mean, Median, Mode Measures of Variability Range, Standard Deviation, Variance, Coefficient
More information2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table
2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations
More informationTOPIC: Descriptive Statistics Single Variable
TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency
More informationDescriptive Statistics
Descriptive Statistics CHAPTER OUTLINE 6-1 Numerical Summaries of Data 6- Stem-and-Leaf Diagrams 6-3 Frequency Distributions and Histograms 6-4 Box Plots 6-5 Time Sequence Plots 6-6 Probability Plots Chapter
More informationANÁLISE DOS DADOS. Daniela Barreiro Claro
ANÁLISE DOS DADOS Daniela Barreiro Claro Outline Data types Graphical Analysis Proimity measures Prof. Daniela Barreiro Claro Types of Data Sets Record Ordered Relational records Video data: sequence of
More informationMathematics Standards for High School Algebra I
Mathematics Standards for High School Algebra I Algebra I is a course required for graduation and course is aligned with the College and Career Ready Standards for Mathematics in High School. Throughout
More informationDescriptive Univariate Statistics and Bivariate Correlation
ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to
More informationLecture Notes 2: Variables and graphics
Highlights: Lecture Notes 2: Variables and graphics Quantitative vs. qualitative variables Continuous vs. discrete and ordinal vs. nominal variables Frequency distributions Pie charts Bar charts Histograms
More informationLecture 2 and Lecture 3
Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.
More informationElementary Statistics
Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:
More informationKeystone Exams: Algebra
KeystoneExams:Algebra TheKeystoneGlossaryincludestermsanddefinitionsassociatedwiththeKeystoneAssessmentAnchorsand Eligible Content. The terms and definitions included in the glossary are intended to assist
More information3rd Quartile. 1st Quartile) Minimum
EXST7034 - Regression Techniques Page 1 Regression diagnostics dependent variable Y3 There are a number of graphic representations which will help with problem detection and which can be used to obtain
More informationChapter I: Introduction & Foundations
Chapter I: Introduction & Foundations } 1.1 Introduction 1.1.1 Definitions & Motivations 1.1.2 Data to be Mined 1.1.3 Knowledge to be discovered 1.1.4 Techniques Utilized 1.1.5 Applications Adapted 1.1.6
More informationWhat is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.
What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,
More information1 Measures of the Center of a Distribution
1 Measures of the Center of a Distribution Qualitative descriptions of the shape of a distribution are important and useful. But we will often desire the precision of numerical summaries as well. Two aspects
More informationExploring, summarizing and presenting data. Berghold, IMI, MUG
Exploring, summarizing and presenting data Example Patient Nr Gender Age Weight Height PAVK-Grade W alking Distance Physical Functioning Scale Total Cholesterol Triglycerides 01 m 65 90 185 II b 200 70
More informationSummarizing and Displaying Measurement Data/Understanding and Comparing Distributions
Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to
More informationSummarizing Measured Data
Summarizing Measured Data 12-1 Overview Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution Summarizing Data by a Single Number: Mean, Median, and Mode, Arithmetic,
More informationDescriptive Statistics
Descriptive Statistics DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Descriptive statistics Techniques to visualize
More informationUnits. Exploratory Data Analysis. Variables. Student Data
Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as
More information3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population
. Measures of Central Tendency: Mode, Median and Mean Average a single number that is used to describe the entire sample or population. Mode a. Easiest to compute, but not too stable i. Changing just one
More informationIntroduction to Statistics
Introduction to Statistics By A.V. Vedpuriswar October 2, 2016 Introduction The word Statistics is derived from the Italian word stato, which means state. Statista refers to a person involved with the
More informationChapter2 Description of samples and populations. 2.1 Introduction.
Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that
More informationChapter 3. Data Description
Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.
More informationFLORIDA STANDARDS TO BOOK CORRELATION
FLORIDA STANDARDS TO BOOK CORRELATION Florida Standards (MAFS.912) Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in
More informationObservations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra
September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing
More informationQuantitative Tools for Research
Quantitative Tools for Research KASHIF QADRI Descriptive Analysis Lecture Week 4 1 Overview Measurement of Central Tendency / Location Mean, Median & Mode Quantiles (Quartiles, Deciles, Percentiles) Measurement
More informationExample 2. Given the data below, complete the chart:
Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is
More informationChapter 1. Looking at Data
Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,
More informationStat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago
Stat 22000 Lecture Slides Exploring Numerical Data Yibi Huang Department of Statistics University of Chicago Outline In this slide, we cover mostly Section 1.2 & 1.6 in the text. Data and Types of Variables
More informationMATH 117 Statistical Methods for Management I Chapter Three
Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.
More informationTypes of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511
Topic 2 - Descriptive Statistics STAT 511 Professor Bruce Craig Types of Information Variables classified as Categorical (qualitative) - variable classifies individual into one of several groups or categories
More informationStatistics and parameters
Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize
More informationLecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)
Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning
More informationare the objects described by a set of data. They may be people, animals or things.
( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms
More informationLesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table
Lesson Plan Answer Questions Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 1 2. Summary Statistics Given a collection of data, one needs to find representations
More information1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11.
Chapter 4 Statistics 45 CHAPTER 4 BASIC QUALITY CONCEPTS 1.0 Continuous Distributions.0 Measures of Central Tendency 3.0 Measures of Spread or Dispersion 4.0 Histograms and Frequency Distributions 5.0
More informationAlgebra I Number and Quantity The Real Number System (N-RN)
Number and Quantity The Real Number System (N-RN) Use properties of rational and irrational numbers N-RN.3 Explain why the sum or product of two rational numbers is rational; that the sum of a rational
More informationBNG 495 Capstone Design. Descriptive Statistics
BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus
More informationChapter 2 Class Notes Sample & Population Descriptions Classifying variables
Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is
More informationCurriculum Summary 8 th Grade Algebra I
Curriculum Summary 8 th Grade Algebra I Students should know and be able to demonstrate mastery in the following skills by the end of Eighth Grade: The Number System Extend the properties of exponents
More information3 GRAPHICAL DISPLAYS OF DATA
some without indicating nonnormality. If a sample of 30 observations contains 4 outliers, two of which are extreme, would it be reasonable to assume the population from which the data were collected has
More informationAlgebra 1 Mathematics: to Hoover City Schools
Jump to Scope and Sequence Map Units of Study Correlation of Standards Special Notes Scope and Sequence Map Conceptual Categories, Domains, Content Clusters, & Standard Numbers NUMBER AND QUANTITY (N)
More informationFoundations of Algebra/Algebra/Math I Curriculum Map
*Standards N-Q.1, N-Q.2, N-Q.3 are not listed. These standards represent number sense and should be integrated throughout the units. *For each specific unit, learning targets are coded as F for Foundations
More informationLecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)
Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.
More informationDublin City Schools Mathematics Graded Course of Study Algebra I Philosophy
Philosophy The Dublin City Schools Mathematics Program is designed to set clear and consistent expectations in order to help support children with the development of mathematical understanding. We believe
More informationStatistics in medicine
Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease
More informationCOMMON CORE STATE STANDARDS TO BOOK CORRELATION
COMMON CORE STATE STANDARDS TO BOOK CORRELATION Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in subsequent activities,
More informationBiostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015
Biostatistics for biomedical profession BIMM34 Karin Källen & Linda Hartman November-December 2015 12015-11-02 Who needs a course in biostatistics? - Anyone who uses quntitative methods to interpret biological
More informationProbabilities and Statistics Probabilities and Statistics Probabilities and Statistics
- Lecture 8 Olariu E. Florentin April, 2018 Table of contents 1 Introduction Vocabulary 2 Descriptive Variables Graphical representations Measures of the Central Tendency The Mean The Median The Mode Comparing
More informationST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart
ST2001 2. Presenting & Summarising Data Descriptive Statistics Frequency Distribution, Histogram & Bar Chart Summary of Previous Lecture u A study often involves taking a sample from a population that
More informationDescribing distributions with numbers
Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central
More informationMath 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency
Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency The word average: is very ambiguous and can actually refer to the mean, median, mode or midrange. Notation:
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More information