Sources and Methods for the Analysis of International Data
- Andrea McKenzie
- 5 years ago
1 UNIVERSITÀ DEGLI STUDI DI NAPOLI FEDERICO II Master's Degree in International Relations Sources and Methods for the Analysis of International Data F. Di Iorio
2 Course Outline Population and statistical unit. Statistical variables. Data collection. Data sources. Official statistics sources and online databases: ISTAT, Eurostat, World Bank, UN, WHO, etc. Exploratory data analysis: representing and synthesizing a distribution. Graphical representations. Measures of location and their properties (mean, median, mode, etc.). Measures of variability and their properties (variance, standard deviation, interquartile range, etc.). Measures of skewness and the boxplot. Concentration. Analysis of relationships between variables: covariance, correlation and simple regression. Chi-square test for independence. Basics of the analysis of variance. Data manipulation and data analysis with Excel.
3 What is statistics? Statistics consists of a body of methods for collecting and analyzing data (Agresti & Finlay, 1997). Statistics is the methodology that scientists and mathematicians have developed for interpreting and drawing conclusions from collected data. Statistical methods can be used to answer questions such as: What kind of data, and how much, need to be collected? How should we organize and summarize the data? How can we analyse the data and draw conclusions from them? How can we assess the strength of the conclusions and evaluate their uncertainty?
4 What is statistics? Statistics provides methods for: Design: Planning and carrying out research studies. Description: Summarizing and exploring data. Inference: Making predictions and generalizing about phenomena represented by the data.
5 Population and statistical unit Population and sample are two basic concepts of statistics. A population is the set of individual persons or objects in which an investigator is primarily interested. A population is the collection of all individuals or items under consideration in a statistical study (Weiss, 1999). A (statistical) population is the set of measurements corresponding to the entire collection of units for which inferences are to be made (Johnson & Bhattacharyya, 1992). A sample is that part of the population from which information is collected (Weiss, 1999). A sample from a statistical population is the set of measurements that are actually collected in the course of an investigation (Johnson & Bhattacharyya, 1992).
6 Population and statistical unit The source of each measurement is called a sampling unit. A statistical unit is the unit of observation or measurement for which data are collected or derived. An important feature of the statistical unit is that it concerns the "results" side of the statistical process; it is the elementary building block for the calculation of statistical aggregates. The statistical unit may be distinct from the collection unit: for example, it is possible to collect information about the "salaried employees" statistical unit by selecting a sample of establishments and obtaining the required information about all or part of the salaried employees working in these establishments.
7 Statistical variables A characteristic that varies from one person or thing to another is called a variable, i.e., a variable is any characteristic that varies from one individual member of the population to another. Examples of variables: height, weight, number of children in a family, sex, marital status, and eye color. The first three of these variables yield numerical information and are examples of quantitative variables (continuous or discrete). The last three yield non-numerical information and are examples of qualitative (or categorical) variables.
8 Scales for Qualitative Variables Qualitative variables can be described according to the scale on which they are defined. The categories into which a qualitative variable falls may or may not have a natural ordering. E.g. occupational status (employed, not employed) has no natural ordering, while education (primary, high school, university) has a natural ordering. Qualitative variables with unordered categories are defined on a nominal scale (the categories are merely names). Qualitative variables with ordered categories are defined on an ordinal scale. Depending on the scale on which a qualitative variable is defined, the variable is called a nominal variable or an ordinal variable. Examples of ordinal variables are education (classified e.g. as low or high), "strength of opinion" on some proposal (classified according to whether the individual favors the proposal, is indifferent towards it, or opposes it), and position at the end of a race (first, second, etc.).
9 The W's of a Data Set Who: the observations (the population is the set of all objects for which you are interested in obtaining the value of some parameter; since we usually can't observe all objects, we take a sample, i.e. a subset of the overall population of objects, to observe). What: the variables. Why: why the data were collected. How: how the data were collected. When/Where: more information that could be relevant.
10 Organization of the data Observing the values of the variables for one or more people or things yields data. Each individual piece of data is called an observation, and the collection of all observations for particular variables is called a data set or data matrix. A data set consists of the values of variables recorded for a set of sampling units. For manipulating (recording and sorting) the values of a qualitative variable, they are often coded by assigning numbers to the different categories, thus converting the categorical data to numerical data in a trivial sense. For example, marital status might be coded by letting 1, 2, 3 and 4 denote a person's being single, married, widowed or divorced, but the coded data still continue to be nominal data. Coded numerical data do not share any of the properties of the numbers we deal with in ordinary arithmetic. With regard to the codes for marital status, we cannot write 3 > 1 or 2 < 4, and we cannot write 2 - 1 = 4 - 3 or 1 + 3 = 4. This illustrates how important it is always to check whether the mathematical treatment of statistical data is really legitimate.
11 Organization of the data Data are presented in matrix form (data matrix). All the values of a particular variable are placed in the same column: the values of a variable form a column of the data matrix. Each observation, i.e. the measurements collected from one sampling unit, forms a row of the data matrix. E.g. with k variables and n observations (sample size n), the data matrix has n rows and k columns.
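To make the data-matrix layout concrete, here is a minimal Python sketch with a small hypothetical data set (n = 4 observations, k = 3 variables); the variable names and values are invented for illustration, and the marital-status codes follow the 1-4 scheme described above. It shows that a nominal column can legitimately be counted (e.g. to find the most frequent category), even though arithmetic on its codes would be meaningless.

```python
from collections import Counter

# Codes for marital status, as in the example above (a nominal variable).
MARITAL_CODES = {1: "single", 2: "married", 3: "widowed", 4: "divorced"}

# Hypothetical data matrix: one row per observation (sampling unit),
# one column per variable: (height_cm, n_children, marital_status_code).
data_matrix = [
    (172, 0, 1),
    (165, 2, 2),
    (180, 1, 2),
    (158, 3, 4),
]

n = len(data_matrix)      # number of rows = sample size
k = len(data_matrix[0])   # number of columns = number of variables

# A column of the matrix holds all the values of one variable.
marital_column = [row[2] for row in data_matrix]

# Legitimate for nominal data: count how often each category occurs.
counts = Counter(MARITAL_CODES[code] for code in marital_column)
modal_category, modal_count = counts.most_common(1)[0]

print(f"n = {n}, k = {k}")          # n = 4, k = 3
print(modal_category, modal_count)  # married 2
```

Averaging `marital_column` would run without error but produce a number with no meaning, which is exactly the trap described in the text.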
12 Data Collection Who: the observations (the population is the set of all objects for which you are interested in obtaining the value of some parameter; since we usually can't observe all objects, we take a sample, i.e. a subset of the overall population of objects, to observe). Note: there is NO such thing as a "population sample" or a "sample population". What: the variables. Why: why the data were collected. How: how the data were collected (related to design/sampling in chapters 12-13). When/Where: more information that could be relevant.
13 Data sources Official statistics sources, corporate data, sample surveys.
14 Official statistics sources and online databases Official statistics are statistics published by government agencies or other public bodies, such as international organizations. They provide quantitative or qualitative data on all major aspects of citizens' lives: economic and social situations, living conditions, health, education, and so on. Examples: National Statistical Offices (ISTAT, INSEE, INE, etc.); EUROSTAT, NBER; statistical departments of ministries (e.g. Ministero della Giustizia - Direzione Generale di Statistica e Analisi Organizzativa); OCSE-OECD, IMF, ONU.
15 Official statistics sources and online databases
16 Official statistics sources and online databases
17 Official statistics sources and online databases
18 Official statistics sources and online databases
22 Exploratory data analysis Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics: synthesizing a distribution by frequency tables and by measures of location (mean, median, mode, etc.) and of variability, together with their properties. A number of graphical tools are useful; typical graphical techniques are the histogram, the box plot and the pie chart.
23 Frequency table Suppose 30 students in a statistics class took a test and made the following scores: It is natural to group the scores on the standard ten-point scale, and count the number of scores in each group
24 Frequency table Then we have a table with the columns Class, Absolute frequency, Relative frequency and Percentage, plus a Total row.
25 Histogram
26 Frequency and histogram The same procedure can be applied to any collection of numerical data. Observations are grouped into several classes, and the frequency (the number of observations) of each class is noted. The classes are arranged in order on the horizontal axis (called the x-axis), and for each class a vertical bar, whose length is the number of observations in that class, is drawn. The relative frequency table or histogram is exactly the same as the frequency table or histogram, except that the vertical axis shows the relative frequency, that is, the ratio $f_i = n_i / n$ between the frequency $n_i$ of class $i$ and the total number of observations $n$.
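The grouping procedure can be sketched in Python as follows. The slide's 30 scores are not reproduced here, so the ten scores below are hypothetical, and the class labels ("50-59", ..., "90-100") are an assumed ten-point grouping. The sketch computes absolute frequencies $n_i$, relative frequencies $f_i = n_i / n$ and percentages, plus the total row.

```python
from collections import Counter

# Hypothetical test scores (not the slide's original data).
scores = [55, 62, 68, 71, 74, 77, 79, 83, 88, 95]

def class_label(score: int) -> str:
    """Group a score on a ten-point scale, e.g. 74 -> '70-79' (100 goes to '90-100')."""
    lower = min(score // 10 * 10, 90)
    upper = 100 if lower == 90 else lower + 9
    return f"{lower}-{upper}"

def frequency_table(values):
    """Return rows (class, absolute freq n_i, relative freq f_i = n_i / n, percentage)."""
    n = len(values)
    counts = Counter(class_label(v) for v in values)
    rows = []
    # Order classes by their lower bound, as on the x-axis of a histogram.
    for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
        n_i = counts[label]
        f_i = n_i / n
        rows.append((label, n_i, f_i, 100 * f_i))
    return rows

table = frequency_table(scores)
for label, n_i, f_i, pct in table:
    print(f"{label:>7} {n_i:3d} {f_i:5.2f} {pct:6.1f}%")
print(f"{'Total':>7} {sum(r[1] for r in table):3d} {sum(r[2] for r in table):5.2f}")
```

The relative frequencies always sum to 1 (up to rounding), which is why the relative-frequency histogram has the same shape as the absolute one.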
27 Frequency Tables
28 Frequency Tables
29 How to build a table: self-explanatory title; unit of measurement and time period; categories; total
30 How to build a table: self-explanatory title; unit of measurement and time period; time series; categories; "of which" breakdowns
31 Grouping variable
32 Combo! Totals by row; time series; categories; age groups; totals by column
33 Pie chart
34 Bar Chart
35 Pictogram
36 Histogram
37 Time series
38 Time series
39 Time series
40 Infographics (good for a website, not for a report)
42 NO!!!!! (impressive for a website, totally illegible)
43 NO!!!!! (impressive for a website, totally illegible)
44 Statistical Indices Location: mean, median, mode, quantiles. Variability: variance, standard deviation, range, interquartile range. Shape: asymmetry, kurtosis.
45 Mean, Median, Mode Mean (average): the mean is found by adding up all of the given data and dividing by the number of data. Example: a grade-10 math class recently had a mathematics test; the six scores add up to 464, so the mean is 464 / 6 = 77.3 (rounded). Hence, 77.3 is the mean (average) of the class.
46 Mean, Median, Mode Median: the median is the middle number. First arrange the numbers in order from lowest to highest, then find the middle number by crossing off numbers from both ends until you reach the middle. Simple example: if the data are 1, 2, 3, 4, 5, the median is 3. Example: consider a data set with an even number of values. After arranging the numbers in order, there are two numbers in the middle and no single middle number. Solution: take the two middle numbers and find their average (or mean): their sum / 2 = 76.5. Hence, the median is 76.5.
47 Mean, Median, Mode Mode: this is the number that occurs most often. Example: in the given data, the mode is 78.
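Python's standard `statistics` module implements all three measures of location. In this sketch only `[1, 2, 3, 4, 5]` comes from the slides; the other two lists are hypothetical, chosen to show the even-size median rule and a clear mode.

```python
import statistics

odd_data = [1, 2, 3, 4, 5]        # the slide's simple example
even_data = [72, 75, 80, 85]      # hypothetical: even number of values
mode_data = [65, 78, 82, 78, 90]  # hypothetical: 78 occurs most often

mean_odd = statistics.mean(odd_data)        # (1+2+3+4+5) / 5
median_odd = statistics.median(odd_data)    # middle value of the sorted data
median_even = statistics.median(even_data)  # average of the two middle values
mode_value = statistics.mode(mode_data)     # most frequent value

print(mean_odd, median_odd, median_even, mode_value)  # 3 3 77.5 78
```

`statistics.median` sorts the data itself, so the "arrange in order first" step is handled internally.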
48 Mean, Median, Mode
49 Properties of Arithmetic Mean: 1. The sum of the deviations of the items from the arithmetic mean is always zero, i.e. $\sum (X - m) = 0$. 2. The sum of the squared deviations of the items from the arithmetic mean is minimal: it is less than the sum of the squared deviations of the items from any other value. 3. If each item in the series is replaced by the mean, then the sum of these substitutions is equal to the sum of the individual items. 4. The arithmetic mean lies between the minimum and maximum values of the sample.
50 Demerits of Arithmetic Mean It is affected by extreme items, i.e., very small and very large items (variance is everything!). E.g. the mean of 1, 2, 3, 4, 5 is 3, while the mean of 1, 2, 3, 4, 50 is 12. In some cases the A.M. does not correspond to any actual item: for example, the average number of patients admitted to a hospital may be 10.7 per day. The A.M. is not suitable for extremely asymmetrical distributions.
51 Variability Consider the marks of the following three students: A={ } B={ } C={ } The 3 students have the same mean, 25, but does this value have the same meaning in all cases? C's marks have no variability; B's marks are only 22 and 28 (and one 25). VARIANCE IS EVERYTHING
52 Variance Variability is a measure of the spread between numbers in a data set. The spread can be measured with respect to different reference points. The VARIANCE measures how far each number in the set is from the MEAN: it is defined as the average of the squared deviations of the numbers from the arithmetic mean (recall property 2 of the mean), $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$. In the previous example: Var(A)=4.9, Var(B)=8.2, Var(C)=0.
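The four properties can be checked numerically. This is a minimal sketch on a hypothetical sample; for property 2 it only compares the mean against a handful of other candidate values (including the median), which illustrates the minimum rather than proving it.

```python
import statistics

x = [4, 8, 15, 16, 23, 42]  # hypothetical sample, sum = 108
m = statistics.mean(x)      # arithmetic mean = 18

# Property 1: the deviations from the mean sum to zero.
sum_dev = sum(xi - m for xi in x)

def ssd(c):
    """Sum of squared deviations of x from an arbitrary value c."""
    return sum((xi - c) ** 2 for xi in x)

# Property 2: among these candidates, the mean gives the smallest sum of squares.
candidates = [m - 1, m - 0.5, m, m + 0.5, m + 1, statistics.median(x)]
best = min(candidates, key=ssd)

# Property 3: replacing every item by the mean leaves the total unchanged.
same_total = len(x) * m == sum(x)

# Property 4: the mean lies between the minimum and the maximum.
within_range = min(x) <= m <= max(x)

print(m, sum_dev, best, same_total, within_range)  # 18 0 18 True True
```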
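The sensitivity of the mean to an extreme item, compared with the median, can be seen directly on the slide's two series: replacing 5 with 50 moves the mean from 3 to 60 / 5 = 12, while the median stays at 3.

```python
import statistics

a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 50]  # same series with one extreme value

for data in (a, b):
    # Print mean and median side by side.
    print(statistics.mean(data), statistics.median(data))
# 3 3
# 12 3
```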
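The variance definition translates directly into code. Because the actual marks of students A, B and C are not reproduced in the slides, the three lists below are hypothetical, chosen so that all have mean 25 (C constant, as on the slide). The helper divides by $n$, matching the definition above; note that `statistics.variance` divides by $n - 1$ instead, so `statistics.pvariance` is the matching library function.

```python
import math
import statistics

def variance(x):
    """Average of the squared deviations from the arithmetic mean (divides by n)."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

# Hypothetical marks, all with mean 25 (not the slide's original data).
marks = {
    "A": [21, 24, 25, 27, 28],
    "B": [22, 22, 25, 28, 28],
    "C": [25, 25, 25, 25, 25],
}

for student, x in marks.items():
    var = variance(x)
    sd = math.sqrt(var)  # standard deviation = square root of the variance
    # statistics.pvariance uses the same n-denominator definition.
    assert math.isclose(var, statistics.pvariance(x))
    print(student, statistics.mean(x), round(var, 2), round(sd, 2))
```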
53 Mean, variance and standard deviation
54 Mean, variance and standard deviation for grouped data
55 Mean, Median, Mode
56 Excel formulas: =C5*D5, =SUM(C4:C11), =SUM(D4:D11), =D13/C13
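The Excel formulas above follow a product / sum / ratio pattern for grouped data; the exact cell layout of the original sheet is not recoverable, so here is a minimal Python sketch of the same kind of computation under that assumption. The data (number of children per family and how many families have each count) are hypothetical; for class intervals, $x_i$ would be the class midpoints.

```python
import math

# Hypothetical grouped data: x_i = number of children per family,
# n_i = number of families with that many children.
values = [0, 1, 2, 3]
freqs = [4, 6, 7, 3]

def grouped_mean(x, n):
    """Mean for grouped data: sum(x_i * n_i) / sum(n_i)."""
    products = [xi * ni for xi, ni in zip(x, n)]  # one product per row
    return sum(products) / sum(n)

def grouped_variance(x, n):
    """Variance for grouped data: sum(n_i * (x_i - mean)**2) / sum(n_i)."""
    m = grouped_mean(x, n)
    return sum(ni * (xi - m) ** 2 for xi, ni in zip(x, n)) / sum(n)

mean = grouped_mean(values, freqs)
var = grouped_variance(values, freqs)
sd = math.sqrt(var)
print(mean, round(var, 4), round(sd, 4))
```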
58 Variance
59 Quantiles In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations of a sample (frequency distribution) into groups of the same size. The 2-quantile is the median; the 4-quantiles are called quartiles (denoted Q); the difference between the upper and lower quartiles is the interquartile range, IQR = Q3 - Q1. Example: 0, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 11, 13, 14, 15, 17, 17, 17, 20 (Q1, Me, Q3)
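Python's `statistics.quantiles` can compute the quartiles of the example above. Several quantile conventions exist: with the default `method="exclusive"` this data gives Q1 = 5, Me = 8, Q3 = 15, while `method="inclusive"` gives Q1 = 5.5 and Q3 = 14.5 (Excel likewise offers both QUARTILE.EXC and QUARTILE.INC). Which rule the slide used is not stated, so treat this as one reasonable choice.

```python
import statistics

data = [0, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 11, 13, 14, 15, 17, 17, 17, 20]

# n=4 asks for the three cut points that split the data into quarters;
# the default method="exclusive" places the p-th quantile at position p*(n+1).
q1, me, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, me, q3, iqr)  # 5.0 8.0 15.0 10.0
```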
60 Quartiles Q1 and Q3
61 Interquartile range (IQR) The interquartile range (IQR) is a measure of the spread of the distribution of a single quantitative variable. The IQR is a simple calculation: it is merely the difference (hence "range") between the upper quartile (Q3) and the lower quartile (Q1) (hence "inter" and "quartile"). Unlike the total range, the interquartile range has a breakdown point of 25%, and it is thus often preferred to the total range, since it gives a measure of dispersion that is robust to extreme values. It is also used as a tool to identify possible outliers. For a symmetric distribution (where the median equals the mean and the average of the first and third quartiles), half the IQR equals the median absolute deviation (MAD).
62 Box plot A box plot (or boxplot) is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers, defined as observations that fall below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, may be plotted as individual points.
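The outlier rule of the box plot can be sketched as a small function. It assumes quartiles from `statistics.quantiles` with its default method (other quartile rules shift the fences slightly); `tukey_outliers` is a hypothetical helper name, and the extra value 45 in the second call is invented to show a flagged point.

```python
import statistics

def tukey_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) with fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

data = [0, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 11, 13, 14, 15, 17, 17, 17, 20]
print(tukey_outliers(data))         # (-10.0, 30.0, [])
print(tukey_outliers(data + [45]))  # 45 lies above the upper fence
```

Points inside the fences are covered by the whiskers; points outside are the ones drawn individually.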
63 Box plot
64 Box plot: compare distributions
65 Symmetric distribution For symmetric distributions, the mean is approximately equal to the median. The tail is the part of the distribution where the counts in the histogram become smaller. For a symmetric distribution, the left and right tails are equally balanced, meaning that they have about the same length.
66 Asymmetry Right-skewed distribution: the mean is greater than the median. The tail of the distribution on the right-hand (positive) side is longer than on the left-hand side. From the box-and-whisker diagram we see that the median is closer to Q1 than to Q3.
67 Asymmetry
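The slides do not give a formula for asymmetry; one common measure is the moment coefficient of skewness, $g_1 = m_3 / m_2^{3/2}$, where $m_r$ is the average of $(x_i - \bar{x})^r$. This sketch (with hypothetical data) shows that a long right tail gives $g_1 > 0$ together with mean > median, while a mirror-symmetric sample gives $g_1 = 0$.

```python
import statistics

def moment_skewness(x):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, where m_r is the
    r-th central moment (average of (x_i - mean)**r, dividing by n)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical data with a long right tail
symmetric = [1, 2, 3, 3, 4, 5]            # hypothetical data, mirror-symmetric around 3

for name, data in [("right_skewed", right_skewed), ("symmetric", symmetric)]:
    print(name,
          statistics.mean(data), statistics.median(data),
          round(moment_skewness(data), 3))
```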
68 Kurtosis
69 Gini Index and Concentration curve The Gini index (or coefficient) measures the inequality among the values of a frequency distribution; notable examples are levels of income or wealth. The coefficient lies in [0, 1]: zero represents perfect equality (all values in the distribution are the same: everyone has the same income) and 1 represents maximal inequality among values (e.g., only one person has all the income or consumption, and all others have none). The index is often used in economic reports (see IMF, OECD, etc.).
71 Gini Index and Concentration curve The index is half of the relative mean absolute difference, that is, half of the average absolute difference over all pairs of values in the population divided by the mean: $G = \frac{1}{2\bar{x}} \cdot \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}|x_i - x_j|$. Its graphical representation is the concentration curve, or Lorenz curve.
72 Concentration curve The Lorenz curve is a function where the cumulative proportion of individuals, ordered from the lowest to the highest value, is on the horizontal axis and the corresponding cumulative proportion of the variable (wealth or income) is on the vertical axis.
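The mean-absolute-difference definition of the Gini index can be sketched directly as a double sum over all pairs (quadratic in n, which is fine for small examples). With this definition the maximum for n units is (n - 1) / n rather than exactly 1; some sources rescale by n / (n - 1) to reach 1. The income values in the last call are hypothetical.

```python
def gini(x):
    """Gini index as half the relative mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean)."""
    n = len(x)
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("Gini index undefined when the mean is zero")
    total_abs_diff = sum(abs(xi - xj) for xi in x for xj in x)
    return total_abs_diff / (2 * n * n * mean)

print(gini([10, 10, 10, 10]))  # 0.0  (perfect equality)
print(gini([0, 0, 0, 40]))     # 0.75 (one unit has everything: (n-1)/n for n = 4)
print(gini([5, 10, 20, 65]))   # hypothetical incomes
```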
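The Lorenz curve points can be computed by sorting the values and accumulating their shares. As a cross-check, this sketch also estimates the Gini index as 1 - 2 × (area under the curve), using the trapezoidal rule; for individual (ungrouped) data this matches the mean-absolute-difference definition. `lorenz_points` and `gini_from_lorenz` are hypothetical helper names.

```python
from itertools import accumulate

def lorenz_points(x):
    """Points (p_i, L_i) of the Lorenz curve: p_i = share of units (smallest first),
    L_i = cumulative share of the total held by them. Starts at (0, 0)."""
    values = sorted(x)  # order units from smallest to largest value
    n, total = len(values), sum(values)
    cum = [0] + list(accumulate(values))
    return [(i / n, c / total) for i, c in enumerate(cum)]

def gini_from_lorenz(points):
    """G = 1 - 2 * (area under the Lorenz curve), area by the trapezoidal rule."""
    area = sum((p1 - p0) * (l1 + l0) / 2
               for (p0, l0), (p1, l1) in zip(points, points[1:]))
    return 1 - 2 * area

pts = lorenz_points([40, 0, 0, 0])
print(pts)                    # [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (0.75, 0.0), (1.0, 1.0)]
print(gini_from_lorenz(pts))  # 0.75
```

Under perfect equality the points lie on the diagonal (the line of equality), and the area between that diagonal and the curve is what the Gini index measures.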
73 Gini Index and Concentration curve
74 Composite indicators
75 Indicators A statistical indicator is the representation of statistical data for a specified time, place or any other relevant characteristic, corrected for at least one dimension (usually size) so as to allow for meaningful comparisons. It is a summary measure related to a key issue or phenomenon and derived from a series of observed facts. Indicators can be used to reveal relative positions or show positive or negative change. By themselves, indicators do not necessarily capture all aspects of development or change, but they contribute greatly to explaining them. They allow comparisons over time and between, for instance, countries and regions, and in this way help gather evidence for decision making.
76 Indicators Accident at work, age-specific fertility rate, agricultural area (AA), agricultural income, divorce rate, marriage rate, mortality rate, death rate, fertility rate, birth rate, CPI, death rate of enterprises, deficit, deflator of sales, degree of defoliation, disposable income, GDP, concentration index, average labour cost per hour, average monthly labour cost, employment rate, gross electricity consumption, etc.
77 Indicators A single indicator describes a single aspect, a single phenomenon, but our world can be more complex. E.g. GDP per capita describes the state of an economy, but it doesn't take into account other aspects such as the status of the labour market, dependence on foreign raw materials (energy in particular), environmental conditions and so on. Moreover, comparing States, institutions or systems in general becomes a difficult task when considering a set of single indicators (which indicators are the most relevant? How should a ranking be defined?). From a set of indicators to a COMPOSITE INDICATOR.
78 Composite indicators Composite indicators are tools for assessing and ranking countries and institutions in terms of environmental performance, sustainability, and other complex concepts that are not directly measurable. A composite indicator is formed when individual indicators are compiled into a single index, on the basis of an underlying model of the multi-dimensional concept that is being measured.
79 Composite indicators A well-designed composite indicator provides a comprehensive view of a multidimensional phenomenon, allows national benchmarks to be set and further international comparisons to be made, and is a starting point for analysis and discussion.
80 Composite indicators The number of CIs in existence around the world is growing year after year (more than 160 composite indicators). Such composite indicators provide simple comparisons of countries that can be used to illustrate complex and sometimes elusive issues in wide-ranging fields, e.g., environment, economy, society or technological development. It is easier to interpret CIs than to identify common trends across many separate indicators, and they are useful in benchmarking country performance. However, CIs can send misleading policy messages if they are poorly constructed or misinterpreted.
81 Composite indicators
82 Composite indicators
83 How to build a CI 3. Impute missing data
84 How to build a CI
85 How to build a CI
86 How to build a CI
87 Then: Have a clear idea of what you want to measure. Choose the elementary indicators that best fit the research question. Identify the best way to synthesize the indicators (simple average? weighted average? If weighted, which weights?). Find the strengths and weaknesses of each selected indicator. Normalize: if possible, obtain a CI in the range [0, 1].
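The normalize-then-aggregate steps can be sketched in a few lines. This is a minimal illustration, not a full CI methodology: the three indicators and countries X, Y, Z are hypothetical, all indicators are assumed to be "higher = better" (an indicator where lower is better would need to be inverted first), normalization is min-max, and aggregation is a weighted average with equal weights by default, so the CI stays in [0, 1].

```python
# Hypothetical elementary indicators for three countries (higher = better).
indicators = {
    "gdp_per_capita": {"X": 30000, "Y": 45000, "Z": 15000},
    "employment_rate": {"X": 0.60, "Y": 0.70, "Z": 0.65},
    "renewables_share": {"X": 0.20, "Y": 0.10, "Z": 0.40},
}

def min_max(values):
    """Rescale {unit: value} to [0, 1] via (x - min) / (max - min); assumes max > min."""
    lo, hi = min(values.values()), max(values.values())
    return {unit: (v - lo) / (hi - lo) for unit, v in values.items()}

def composite_indicator(indicators, weights=None):
    """Weighted average of min-max normalised indicators (equal weights by default)."""
    if weights is None:
        weights = {name: 1 / len(indicators) for name in indicators}
    normalised = {name: min_max(vals) for name, vals in indicators.items()}
    units = next(iter(indicators.values())).keys()
    total_weight = sum(weights.values())
    return {
        unit: sum(weights[name] * normalised[name][unit] for name in indicators) / total_weight
        for unit in units
    }

ci = composite_indicator(indicators)
ranking = sorted(ci, key=ci.get, reverse=True)  # descending rank order
for unit in ranking:
    print(unit, round(ci[unit], 3))
```

Passing a different `weights` dictionary to `composite_indicator` is all it takes to try an alternative weighting scheme.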
90 Weighting and aggregation Weights are essentially value judgements. Weights can be based on statistical methods, or they can be chosen to reward (or punish) components that are deemed more (or less) influential, depending on expert opinion, to better reflect policy priorities or theoretical factors.
91 Weighting and aggregation Most composite indicators rely on equal weighting (EW), i.e. a simple average. Weights can be based on statistical models (principal components analysis, data envelopment analysis, regression analysis) or on public/expert opinion.
92 Robustness and sensitivity Uncertainty analysis focuses on how uncertainty in the input factors propagates through the structure of the composite indicator and affects the composite indicator values. Sensitivity analysis assesses the contribution of each individual source of uncertainty to the output variance. Possible actions: 1. Inclusion and exclusion of individual indicators. 2. Using alternative data normalisation schemes, such as min-max, standardisation, or rankings. 3. Using different weighting schemes.
93 Presentation and dissemination The way composite indicators are presented is not a trivial issue. Tables provide complete information, but they can sometimes be obscure or long to read; graphical representations can help. A tabular format is the simplest presentation, in which the composite indicator is presented, for example, for each country as a table of values. Usually countries are displayed in descending rank order. Rankings can be used to track changes in country performance over time. Composite indicators can also be expressed via a simple bar chart, e.g. with countries on the vertical axis and the values of the composite on the horizontal axis. The top bar can indicate the average performance of all countries.
94 Some CI examples Environmental Sustainability Index (WEF); Air Quality Index (WEF); Human Development Index (United Nations); Health System Achievement Index (WHO); Corruption Perceptions Index (Transparency International); World Income Inequality Database: Gini index (United Nations); Economic Sentiment Indicator (EC); Composite Leading Indicators (OECD); Innovative Capacity Index (Porter and Stern); Investment/Performance in the knowledge-based economy (EC); World Competitiveness Index (IMD)
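A very simple form of sensitivity analysis is to recompute the ranking under alternative normalisation and weighting choices and see whether it changes. This sketch uses hypothetical data for countries X, Y, Z, compares min-max with z-score standardisation, and compares equal weights with an arbitrary alternative that gives renewables a weight of 0.6. It covers actions 2 and 3 of the list above but is not a full variance-based sensitivity analysis.

```python
import statistics

# Hypothetical elementary indicators (higher = better) for three countries.
indicators = {
    "gdp_per_capita": {"X": 30000, "Y": 45000, "Z": 15000},
    "employment_rate": {"X": 0.60, "Y": 0.70, "Z": 0.65},
    "renewables_share": {"X": 0.20, "Y": 0.10, "Z": 0.40},
}

def min_max(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values.values()), max(values.values())
    return {u: (v - lo) / (hi - lo) for u, v in values.items()}

def z_score(values):
    """Standardise: (x - mean) / population standard deviation."""
    data = list(values.values())
    mu, sd = statistics.mean(data), statistics.pstdev(data)
    return {u: (v - mu) / sd for u, v in values.items()}

def ranking(indicators, normalise, weights):
    """Rank units by the weighted sum of normalised indicators (weights sum to 1)."""
    normalised = {name: normalise(vals) for name, vals in indicators.items()}
    units = next(iter(indicators.values())).keys()
    scores = {u: sum(weights[name] * normalised[name][u] for name in indicators)
              for u in units}
    return sorted(scores, key=scores.get, reverse=True)

equal = {name: 1 / 3 for name in indicators}
green = {"gdp_per_capita": 0.2, "employment_rate": 0.2, "renewables_share": 0.6}

scenarios = {
    "min-max, equal weights": (min_max, equal),
    "z-score, equal weights": (z_score, equal),
    "min-max, renewables weighted 0.6": (min_max, green),
}
rankings = {label: ranking(indicators, f, w) for label, (f, w) in scenarios.items()}
for label, order in rankings.items():
    print(f"{label:35} {order}")
```

Here the two normalisations agree, but the change of weights moves Z to the top: the ranking is robust to normalisation and sensitive to weighting.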
More informationST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart
ST2001 2. Presenting & Summarising Data Descriptive Statistics Frequency Distribution, Histogram & Bar Chart Summary of Previous Lecture u A study often involves taking a sample from a population that
More informationDescriptive Univariate Statistics and Bivariate Correlation
ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to
More informationMATH 1150 Chapter 2 Notation and Terminology
MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the
More informationNotation Measures of Location Measures of Dispersion Standardization Proportions for Categorical Variables Measures of Association Outliers
Notation Measures of Location Measures of Dispersion Standardization Proportions for Categorical Variables Measures of Association Outliers Population - all items of interest for a particular decision
More informationAfter completing this chapter, you should be able to:
Chapter 2 Descriptive Statistics Chapter Goals After completing this chapter, you should be able to: Compute and interpret the mean, median, and mode for a set of data Find the range, variance, standard
More informationClass 11 Maths Chapter 15. Statistics
1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class
More informationGlossary for the Triola Statistics Series
Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling
More informationTopic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!
Topic 3: Introduction to Statistics Collecting Data We collect data through observation, surveys and experiments. We can collect two different types of data: Categorical Quantitative Algebra 1 Table of
More informationPractical Statistics for the Analytical Scientist Table of Contents
Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning
More information2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS
Spring 2015: Lembo GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS Descriptive statistics concise and easily understood summary of data set characteristics
More informationStatistics lecture 3. Bell-Shaped Curves and Other Shapes
Statistics lecture 3 Bell-Shaped Curves and Other Shapes Goals for lecture 3 Realize many measurements in nature follow a bell-shaped ( normal ) curve Understand and learn to compute a standardized score
More information1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11.
Chapter 4 Statistics 45 CHAPTER 4 BASIC QUALITY CONCEPTS 1.0 Continuous Distributions.0 Measures of Central Tendency 3.0 Measures of Spread or Dispersion 4.0 Histograms and Frequency Distributions 5.0
More informationTOPIC: Descriptive Statistics Single Variable
TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency
More information2011 Pearson Education, Inc
Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value
More informationMATH 117 Statistical Methods for Management I Chapter Three
Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.
More information3.1 Measure of Center
3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects
More informationA SHORT INTRODUCTION TO PROBABILITY
A Lecture for B.Sc. 2 nd Semester, Statistics (General) A SHORT INTRODUCTION TO PROBABILITY By Dr. Ajit Goswami Dept. of Statistics MDKG College, Dibrugarh 19-Apr-18 1 Terminology The possible outcomes
More informationUnits. Exploratory Data Analysis. Variables. Student Data
Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as
More informationBIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke
BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart
More informationScales of Measuement Dr. Sudip Chaudhuri
Scales of Measuement Dr. Sudip Chaudhuri M. Sc., M. Tech., Ph.D., M. Ed. Assistant Professor, G.C.B.T. College, Habra, India, Honorary Researcher, Saha Institute of Nuclear Physics, Life Member, Indian
More informationShape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays
Histograms: Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Sep 9 1:13 PM Shape: Skewed left Bell shaped Symmetric Bi modal Symmetric Skewed
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationChapter 1. Looking at Data
Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,
More informationPreliminary Statistics course. Lecture 1: Descriptive Statistics
Preliminary Statistics course Lecture 1: Descriptive Statistics Rory Macqueen (rm43@soas.ac.uk), September 2015 Organisational Sessions: 16-21 Sep. 10.00-13.00, V111 22-23 Sep. 15.00-18.00, V111 24 Sep.
More informationCHAPTER 1. Introduction
CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing
More informationTypes of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511
Topic 2 - Descriptive Statistics STAT 511 Professor Bruce Craig Types of Information Variables classified as Categorical (qualitative) - variable classifies individual into one of several groups or categories
More informationFinal Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above
King Abdul Aziz University Faculty of Sciences Statistics Department Final Exam STAT 0 First Term 49-430 A 40 Name No ID: Section: You have 40 questions in 9 pages. You have 90 minutes to solve the exam.
More informationReview for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data
Review for Exam #1 1 Chapter 1 Population the complete collection of elements (scores, people, measurements, etc.) to be studied Sample a subcollection of elements drawn from a population 11 The Nature
More informationPractice problems from chapters 2 and 3
Practice problems from chapters and 3 Question-1. For each of the following variables, indicate whether it is quantitative or qualitative and specify which of the four levels of measurement (nominal, ordinal,
More informationA is one of the categories into which qualitative data can be classified.
Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative
More informationStatistics 1. Edexcel Notes S1. Mathematical Model. A mathematical model is a simplification of a real world problem.
Statistics 1 Mathematical Model A mathematical model is a simplification of a real world problem. 1. A real world problem is observed. 2. A mathematical model is thought up. 3. The model is used to make
More informationChapter 2 Solutions Page 15 of 28
Chapter Solutions Page 15 of 8.50 a. The median is 55. The mean is about 105. b. The median is a more representative average" than the median here. Notice in the stem-and-leaf plot on p.3 of the text that
More informationChapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing
Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity
More informationSection 3. Measures of Variation
Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The
More informationLecture 1: Description of Data. Readings: Sections 1.2,
Lecture 1: Description of Data Readings: Sections 1.,.1-.3 1 Variable Example 1 a. Write two complete and grammatically correct sentences, explaining your primary reason for taking this course and then
More informationStatistics in medicine
Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease
More informationChapter 2: Descriptive Analysis and Presentation of Single- Variable Data
Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Mean 26.86667 Standard Error 2.816392 Median 25 Mode 20 Standard Deviation 10.90784 Sample Variance 118.981 Kurtosis -0.61717 Skewness
More informationWhat is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected
What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types
More informationF78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives
F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested
More informationMidrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used
Measures of Central Tendency Mode: most frequent score. best average for nominal data sometimes none or multiple modes in a sample bimodal or multimodal distributions indicate several groups included in
More informationTastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?
Tastitsticsss? What s that? Statistics describes random mass phanomenons. Principles of Biostatistics and Informatics nd Lecture: Descriptive Statistics 3 th September Dániel VERES Data Collecting (Sampling)
More informationFREQUENCY DISTRIBUTIONS AND PERCENTILES
FREQUENCY DISTRIBUTIONS AND PERCENTILES New Statistical Notation Frequency (f): the number of times a score occurs N: sample size Simple Frequency Distributions Raw Scores The scores that we have directly
More informationAnalytical Graphing. lets start with the best graph ever made
Analytical Graphing lets start with the best graph ever made Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian
More informationDescriptive Statistics C H A P T E R 5 P P
Descriptive Statistics C H A P T E R 5 P P 1 1 0-130 Graphing data Frequency distributions Bar graphs Qualitative variable (categories) Bars don t touch Histograms Frequency polygons Quantitative variable
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1
More informationLet's Do It! What Type of Variable?
Ch Online homework list: Describing Data Sets Graphical Representation of Data Summary statistics: Measures of Center Box Plots, Outliers, and Standard Deviation Ch Online quizzes list: Quiz 1: Introduction
More informationExample 2. Given the data below, complete the chart:
Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is
More informationMeasures of center. The mean The mean of a distribution is the arithmetic average of the observations:
Measures of center The mean The mean of a distribution is the arithmetic average of the observations: x = x 1 + + x n n n = 1 x i n i=1 The median The median is the midpoint of a distribution: the number
More informationLet's Do It! What Type of Variable?
1 2.1-2.3: Organizing Data DEFINITIONS: Qualitative Data are those which classify the units into categories. The categories may or may not have a natural ordering to them. Qualitative variables are also
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationUnit 2: Numerical Descriptive Measures
Unit 2: Numerical Descriptive Measures Summation Notation Measures of Central Tendency Measures of Dispersion Chebyshev's Rule Empirical Rule Measures of Relative Standing Box Plots z scores Jan 28 10:48
More informationHistograms allow a visual interpretation
Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called
More informationProbability Distributions
Probability Distributions Probability This is not a math class, or an applied math class, or a statistics class; but it is a computer science course! Still, probability, which is a math-y concept underlies
More informationStatistics. Industry Business Education Physics Chemistry Economics Biology Agriculture Psychology Astronomy, etc. GFP - Sohar University
Statistics اإلحصاء تعاريف 3-1 Definitions Statistics is a branch of Mathematics that deals collecting, analyzing, summarizing, and presenting data to help in the decision-making process. Statistics is
More information2 Descriptive Statistics
2 Descriptive Statistics Reading: SW Chapter 2, Sections 1-6 A natural first step towards answering a research question is for the experimenter to design a study or experiment to collect data from the
More informationMath 140 Introductory Statistics
Math 140 Introductory Statistics Professor Silvia Fernández Chapter 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Visualizing Distributions Recall the definition: The
More informationMath 140 Introductory Statistics
Visualizing Distributions Math 140 Introductory Statistics Professor Silvia Fernández Chapter Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Recall the definition: The
More informationOverview. INFOWO Statistics lecture S1: Descriptive statistics. Detailed Overview of the Statistics track. Definition
Overview INFOWO Statistics lecture S1: Descriptive statistics Peter de Waal Introduction to statistics Descriptive statistics Department of Information and Computing Sciences Faculty of Science, Universiteit
More information