ASSESSING VARIATION: A UNIFYING APPROACH FOR ALL SCALES OF MEASUREMENT JSM Tamar Gadrich Emil Bashkansky


1 ASSESSING VARIATION: A UNIFYING APPROACH FOR ALL SCALES OF MEASUREMENT
Tamar Gadrich, Emil Bashkansky (ORT Braude College of Engineering, Israel)
Ričardas Zitikis (University of Western Ontario, Canada)
JSM 2014

2 MOTIVATION For various reasons, we often wish or need to measure the variability of populations or samples, which can be quantitative, qualitative, or (quite often) mixed. Examples include social inequality and mobility, political consensus, the homogeneity of a material, the uncertainty of a prediction, the diversity or similarity of species, and the degree of synchronization of biological rhythms. These tasks are complex for a number of reasons, not least the inherent heterogeneity of populations, which usually consist of various groups and categories that often require different scales of measurement.

3 Status quo A number of variability measures have been developed to accommodate the various scales of measurement, of which there are four: nominal, ordinal, interval, and ratio. The variety of scales, together with the corresponding restrictions on permissible arithmetical operations and order relationships, poses serious challenges for researchers and decision makers. The most popular measures are designed for numerical (interval and ratio) data. Among them are the range, the IQR, the variance, and the SD, as well as measures based on the mean absolute deviation or difference, and entropy-based measures. Variability measures are needed that use only the arithmetical operations legitimate for categorical variables and that decompose into intra- and inter-components in a single, unifying way.

4 THE CLASSICAL VARIANCE AND THE GINI MEAN DIFFERENCE
Classical variance: $\mathrm{VAR} = \sum_k (x_k - \mu)^2 p_k$, with $\mu = \sum_k x_k p_k$ (1)
Gini formula for variance: $\mathrm{VAR} = \sum_{j<i} (x_i - x_j)^2 \, p_i p_j$ (2)
Gini mean difference (GMD): $\mathrm{GMD} = \sum_{j<i} |x_i - x_j| \, p_i p_j$ (3)
For both definitions: $\sum_{j<i} L(x_i, x_j) \, p_i p_j$ (4)
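The following Python sketch is a quick numerical check of (1) to (4). It is our own illustration, not taken from the slides, and uses exact fractions. With the squared-difference loss, the pairwise sum over $j < i$ reproduces the classical variance. With the absolute-difference loss, it gives the GMD in the half-sum convention of (3), which is half of the usual mean absolute difference $E|X - X'|$. The function names and the toy distribution are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def classical_variance(xs, ps):
    """Eq. (1): sum_k (x_k - mu)^2 p_k with mu = sum_k x_k p_k."""
    mu = sum(x * p for x, p in zip(xs, ps))
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

def pairwise_form(xs, ps, loss):
    """Eq. (4): sum over pairs j < i of L(x_i, x_j) p_i p_j."""
    return sum(loss(xs[i], xs[j]) * ps[i] * ps[j]
               for j, i in combinations(range(len(xs)), 2))

def gini_variance(xs, ps):
    """Eq. (2): squared-difference loss."""
    return pairwise_form(xs, ps, lambda a, b: (a - b) ** 2)

def gmd(xs, ps):
    """Eq. (3): absolute-difference loss (half of E|X - X'|)."""
    return pairwise_form(xs, ps, lambda a, b: abs(a - b))

# Hypothetical toy distribution
xs = [1, 2, 4]
ps = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
```

For this toy distribution, both variance formulas give 19/16 and the GMD is 9/16.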

5 FOUR SCALES FOR QUALITY DATA
Scale    | Type        | Permissible operations  | Examples
Ratio    | Numerical   | =, ≠, <, >, +, −, ×, /  | quality cost, delivery time, number of defects, MTTF
Interval | Numerical   | =, ≠, <, >, +, −        | temperature, calendar time, lake level
Ordinal  | Categorical | =, ≠, <, >              | customer satisfaction, status, FMECA ranking, belt rank (Y, G, B, MB), quality level, ...
Nominal  | Categorical | =, ≠                    | vendor, failure mode, record type, quality requirement

6 THE LAYOUT OF CATEGORICAL DATA

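7 DESCRIPTION OF CATEGORICAL DATA
A set of $n$ ordinal/nominal data on a scale with $K$ ordered/unordered categories coded by the integers $k = 1, 2, \dots, K$; the category counts are $n_1, n_2, \dots, n_K$.
Proportion of data belonging to the $k$-th category: $\hat p_k = n_k / n$.
For ordinal data, the cumulative frequency of data belonging up to the $k$-th category: $\hat F_k = \sum_{j \le k} \hat p_j$.
Example: $n = 500$ apples graded Low = 1, Medium = 2, High = 3, with counts $(n_1, n_2, n_3) = (\dots)$; then $\hat p_k = n_k / n$, and $\hat F_1 = \hat p_1$, $\hat F_2 = \hat p_1 + \hat p_2$, $\hat F_3 = 1$.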
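Here is a minimal sketch of the two descriptive quantities. It assumes hypothetical apple counts (100, 250, 150), which are not the slide's data. Cumulative frequencies are only meaningful when the categories are ordered.

```python
from fractions import Fraction
from itertools import accumulate

def proportions(counts):
    """p_k = n_k / n for category counts n_1, ..., n_K."""
    n = sum(counts)
    return [Fraction(c, n) for c in counts]

def cumulative(ps):
    """F_k = p_1 + ... + p_k (meaningful for ordered categories only)."""
    return list(accumulate(ps))

counts = [100, 250, 150]        # hypothetical: Low, Medium, High (n = 500)
p_hat = proportions(counts)
F_hat = cumulative(p_hat)
```

With these counts, `p_hat` is [1/5, 1/2, 3/10] and `F_hat` is [1/5, 7/10, 1]. The last cumulative frequency is always 1.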

8 MEASURING VARIABILITY: CATEGORICAL
Let $c_k$, $k = 1, \dots, K$, be category codes and $p_k$ their corresponding probabilities, which we also call frequencies. Let $L(c_i, c_j)$ be a two-argument function, defined on the codes, that is non-negative, symmetric, and such that $L(c_k, c_k) = 0$ for all $k$. We call it the loss-of-similarity function.
The population total variation is defined by: $V_T = \sum_{j<i} L(c_i, c_j) \, p_i p_j$ (5)
The sample total variation is defined by: $\hat V_T = \sum_{j<i} L(c_i, c_j) \, \hat p_i \hat p_j$ (6)
The $m$-th group variation: $\hat V_m = \sum_{j<i} L(c_i, c_j) \, \hat p_{im} \hat p_{jm}$, where $\hat p_{km} = n_{km} / n_m$ (7)
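Definitions (5) to (7) translate directly into code. In this hypothetical sketch, the category codes are 0-based indices $k$. Two example loss-of-similarity functions are supplied: the 0/1 loss for nominal data and $|i - j|$ for ordinal data, which are the choices used later in the nominal and ordinal special cases. Any non-negative, symmetric loss with $L(c_k, c_k) = 0$ can be passed in instead.

```python
from fractions import Fraction
from itertools import combinations

def total_variation(ps, loss):
    """Eqs. (5)/(6): sum over pairs j < i of L(c_i, c_j) p_i p_j, with codes c_k = k."""
    return sum(loss(i, j) * ps[i] * ps[j]
               for j, i in combinations(range(len(ps)), 2))

def group_variation(counts_m, loss):
    """Eq. (7): the same sum, using the m-th group's frequencies n_km / n_m."""
    n_m = sum(counts_m)
    return total_variation([Fraction(c, n_m) for c in counts_m], loss)

def nominal(i, j):
    return 0 if i == j else 1   # unordered categories: only equality matters

def ordinal(i, j):
    return abs(i - j)           # ordered categories: distance in ranks
```

For example, for frequencies (1/5, 1/2, 3/10), the nominal loss gives $V_T = 31/100$ and the ordinal loss gives $V_T = 37/100$.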

9 WEIGHTED AVERAGE OF THE WITHIN-GROUP VARIATIONS
Let $\hat\pi_m = n_m / N$ denote the proportion of data in the $m$-th group. Obviously, $\sum_{m=1}^{M} \hat\pi_m = 1$.
$\hat V_W = \sum_{m=1}^{M} \hat\pi_m \hat V_m$ (8)

10 THE BETWEEN-GROUPS COVARIATION AND ITS CHARACTERISTIC KERNEL
$\hat C_B = \sum_{j<i} L(c_i, c_j) \, \hat\kappa_{ij}$ (9)
$\hat\kappa_{ij} = -\sum_{m=1}^{M} \hat\pi_m (\hat p_{im} - \hat p_i)(\hat p_{jm} - \hat p_j)$ (10)
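The following hypothetical sketch implements (8) to (10) for a groups-by-categories count table, using exact fractions. The kernel is implemented with a leading minus sign, which is the sign convention under which the decomposition $V_T = V_W + C_B$ holds. The function names are ours.

```python
from fractions import Fraction
from itertools import combinations

def pair_sum(w, loss):
    """Sum of L(c_i, c_j) * w[i][j] over category pairs j < i (codes c_k = k)."""
    return sum(loss(i, j) * w[i][j] for j, i in combinations(range(len(w)), 2))

def within_between(table, loss):
    """table[m][k]: count of category k in group m. Returns (V_W, C_B), Eqs. (8)-(10)."""
    M, K = len(table), len(table[0])
    N = sum(map(sum, table))
    pi = [Fraction(sum(row), N) for row in table]                  # pi_m = n_m / N
    pm = [[Fraction(c, sum(row)) for c in row] for row in table]   # p_km
    p = [sum(pi[m] * pm[m][k] for m in range(M)) for k in range(K)]  # pooled p_k
    # Eq. (8): weighted average of the group variations V_m
    V_W = sum(pi[m] * pair_sum([[pm[m][i] * pm[m][j] for j in range(K)]
                                for i in range(K)], loss)
              for m in range(M))
    # Eq. (10): characteristic kernel kappa_ij (note the leading minus)
    kappa = [[-sum(pi[m] * (pm[m][i] - p[i]) * (pm[m][j] - p[j]) for m in range(M))
              for j in range(K)] for i in range(K)]
    return V_W, pair_sum(kappa, loss)   # Eq. (9)
```

Consider two groups that each sit entirely in a different category. With the 0/1 loss, this gives $V_W = 0$ and $C_B = 1/4$, so all of the variation is between groups.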

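11 MEASURING VARIABILITY: CONTINUOUS
$V_T = \iint_{x<x'} L(x, x') \, dF(x) \, dF(x')$
$V_m = \iint_{x<x'} L(x, x') \, dF_m(x) \, dF_m(x')$
$C_B = \iint_{x<x'} L(x, x') \, d\kappa(x, x')$, with $\kappa(x, x') = -\sum_{m=1}^{M} \pi_m \big(F_m(x) - F(x)\big)\big(F_m(x') - F(x')\big)$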

12 TOTAL-VARIATION DECOMPOSITION THEOREM
The total variation can be split into the sum of the within-variation and the between-covariation:
$V_T = V_W + C_B$ (11)
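The theorem can be checked numerically. The sketch below uses exact fractions, hypothetical random count tables, and 0-based category codes. It computes $V_T$, $V_W$, and $C_B$ independently and asserts that (11) holds for three loss functions: 0/1 (nominal), $|i - j|$ (ordinal), and $|x_i - x_j|$ for hypothetical numeric values $x$ (interval).

```python
import random
from fractions import Fraction
from itertools import combinations

def V(ps, loss):
    """sum_{j<i} L(c_i, c_j) p_i p_j with 0-based codes c_k = k."""
    return sum(loss(i, j) * ps[i] * ps[j] for j, i in combinations(range(len(ps)), 2))

def decompose(table, loss):
    """Return (V_T, V_W, C_B) for a groups-by-categories count table."""
    M, K = len(table), len(table[0])
    N = sum(map(sum, table))
    pi = [Fraction(sum(r), N) for r in table]
    pm = [[Fraction(c, sum(r)) for c in r] for r in table]
    p = [sum(pi[m] * pm[m][k] for m in range(M)) for k in range(K)]
    V_T = V(p, loss)
    V_W = sum(pi[m] * V(pm[m], loss) for m in range(M))
    C_B = sum(loss(i, j) * -sum(pi[m] * (pm[m][i] - p[i]) * (pm[m][j] - p[j])
                                for m in range(M))
              for j, i in combinations(range(K), 2))
    return V_T, V_W, C_B

x = [0, 3, 4, 10]   # hypothetical numeric values attached to the 4 categories
losses = {
    "nominal": lambda i, j: int(i != j),
    "ordinal": lambda i, j: abs(i - j),
    "interval": lambda i, j: abs(x[i] - x[j]),
}

rng = random.Random(0)
for _ in range(200):
    table = [[rng.randint(1, 20) for _ in range(4)] for _ in range(3)]
    for name, loss in losses.items():
        V_T, V_W, C_B = decompose(table, loss)
        assert V_T == V_W + C_B, name   # Eq. (11) holds exactly
```

Because every quantity is a `Fraction`, the identity is checked with `==` rather than up to a floating-point tolerance.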

13 INDEX PVE
$\mathrm{PVE} = \hat C_B / \hat V_T$ (12)
Note the following properties:
1. PVE = 0 when there is no association, that is, when there is no group effect on the category distribution. In mathematical terms, the total variation is then a pure (i.e., interaction-free) aggregate of the individual group variations: $\hat V_T = \hat V_W$.
2. PVE = 1 when the data within every group fall into a single category (not necessarily the same one for all samples), that is, when there is perfect predictability.
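Here is a minimal sketch of PVE, computed as $1 - V_W / V_T$. This equals $C_B / V_T$ by the decomposition $V_T = V_W + C_B$. The two hypothetical toy tables reproduce the two extreme cases listed above: identical group distributions give PVE = 0, and single-category groups give PVE = 1.

```python
from fractions import Fraction
from itertools import combinations

def V(ps, loss):
    """sum_{j<i} L(c_i, c_j) p_i p_j with 0-based codes c_k = k."""
    return sum(loss(i, j) * ps[i] * ps[j] for j, i in combinations(range(len(ps)), 2))

def pve(table, loss):
    """PVE = C_B / V_T, computed as 1 - V_W / V_T (valid because V_T = V_W + C_B)."""
    N = sum(map(sum, table))
    K = len(table[0])
    p = [Fraction(sum(r[k] for r in table), N) for k in range(K)]   # pooled frequencies
    V_W = sum(Fraction(sum(r), N) * V([Fraction(c, sum(r)) for c in r], loss)
              for r in table)
    return 1 - V_W / V(p, loss)

ordinal = lambda i, j: abs(i - j)
no_effect = [[2, 3, 5], [4, 6, 10]]            # same distribution in both groups
perfect = [[7, 0, 0], [0, 0, 3], [0, 5, 0]]    # each group in a single category
```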

14 INDEX OF SEGREGATION POWER (SP) AMONG GROUPS
Rule of thumb: if SP > 3, the homogeneity hypothesis H0 must be rejected; if SP < …, it is not rejected; the interval […, 3] is the region of doubt, i.e., more data are required.

15 EXAMPLE: MONTE CARLO SIMULATION

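16 SPECIAL CASE 1: NOMINAL VARIABLES (CATANOVA OF LIGHT AND MARGOLIN, 1971)
$L(c_i, c_j) = 0$ when $i = j$, and $L(c_i, c_j) = 1$ when $i \ne j$.
Total variation: $\hat V_T = \sum_{j<i} \hat p_i \hat p_j = \tfrac{1}{2}\big(1 - \sum_k \hat p_k^2\big)$.
Normalizing the total variation by its maximal value, we obtain $\mathrm{IQV} = \frac{K}{K-1}\big(1 - \sum_k \hat p_k^2\big)$.
The between-covariation: $\hat C_B = \tfrac{1}{2} \sum_{m=1}^{M} \hat\pi_m \sum_k (\hat p_{km} - \hat p_k)^2$.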
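The following hypothetical sketch implements the nominal special case. It checks the closed form for $V_T$ against the pairwise definition under the 0/1 loss. The IQV normalization uses the maximum $(1 - 1/K)/2$, which is attained by the uniform distribution. The between-covariation includes the factor 1/2 that comes from summing only over pairs $j < i$.

```python
from fractions import Fraction
from itertools import combinations

def nominal_V_T(p):
    """V_T = sum_{j<i} p_i p_j = (1 - sum_k p_k^2) / 2."""
    return (1 - sum(q * q for q in p)) / 2

def iqv(p):
    """V_T divided by its maximum (1 - 1/K)/2."""
    K = len(p)
    return Fraction(K, K - 1) * (1 - sum(q * q for q in p))

def nominal_C_B(table):
    """C_B = (1/2) sum_m pi_m sum_k (p_km - p_k)^2."""
    N = sum(map(sum, table))
    K = len(table[0])
    p = [Fraction(sum(r[k] for r in table), N) for k in range(K)]
    return sum(Fraction(sum(r), N) *
               sum((Fraction(r[k], sum(r)) - p[k]) ** 2 for k in range(K))
               for r in table) / 2

p = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
pairwise = sum(p[i] * p[j] for j, i in combinations(range(len(p)), 2))
assert pairwise == nominal_V_T(p)   # closed form agrees with Eq. (6) under 0/1 loss
```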

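17 SPECIAL CASE 2: ORDINAL VARIABLES (ORDANOVA OF GADRICH AND BASHKANSKY, 2013)
$L(c_i, c_j) = |i - j|$ (not $|c_i - c_j|$!)
Total variation: $\hat V_T = \sum_{k=1}^{K-1} \hat F_k (1 - \hat F_k)$.
Normalizing the total variation by its maximal value, we obtain $\hat h = \frac{4}{K-1} \sum_{k=1}^{K-1} \hat F_k (1 - \hat F_k)$ (Berry & Mielke, 1992; Blair & Lacy, 1996).
The between-covariation: $\hat C_B = \sum_{m=1}^{M} \hat\pi_m \sum_{k=1}^{K-1} (\hat F_{km} - \hat F_k)^2$.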
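Here is a matching sketch for ordinal data; the function names are hypothetical. The check confirms that, with $L = |i - j|$, the pairwise sum equals the cumulative-frequency form $\sum_k \hat F_k (1 - \hat F_k)$. The normalizing maximum $(K-1)/4$ is reached when half of the data sit in each extreme category.

```python
from fractions import Fraction
from itertools import accumulate, combinations

def ordinal_V_T(p):
    """V_T = sum_k F_k (1 - F_k); the last term (F_K = 1) contributes 0."""
    return sum(F * (1 - F) for F in accumulate(p))

def h_index(p):
    """V_T divided by its maximum (K - 1)/4."""
    return Fraction(4, len(p) - 1) * ordinal_V_T(p)

def ordinal_C_B(table):
    """C_B = sum_m pi_m sum_k (F_km - F_k)^2."""
    N = sum(map(sum, table))
    K = len(table[0])
    F = list(accumulate(Fraction(sum(r[k] for r in table), N) for k in range(K)))
    total = Fraction(0)
    for r in table:
        F_m = accumulate(Fraction(c, sum(r)) for c in r)
        total += Fraction(sum(r), N) * sum((a - b) ** 2 for a, b in zip(F_m, F))
    return total

p = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
pairwise = sum(abs(i - j) * p[i] * p[j] for j, i in combinations(range(len(p)), 2))
assert pairwise == ordinal_V_T(p)   # |i - j| loss gives the cumulative form
```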

18 SPECIAL CASE 3: INTERVAL VARIABLES (GMD)
$L(x_i, x_j) = |x_i - x_j|$
Total variation: $\hat V_T = \int \hat F(x) \big(1 - \hat F(x)\big) \, dx$; the right-hand side is the Gini mean difference (GMD).
The between-covariation: $\hat C_B = \sum_{m=1}^{M} \hat\pi_m \int \big(\hat F_m(x) - \hat F(x)\big)^2 \, dx$.
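For an empirical distribution, $\hat F$ is a step function, so the integral reduces to a finite sum over the gaps between consecutive support points. The hypothetical sketch below checks that this sum coincides with the pairwise GMD $\sum_{j<i} |x_i - x_j| p_i p_j$.

```python
from fractions import Fraction
from itertools import accumulate, combinations

def gmd_pairs(xs, ps):
    """Pairwise form: sum over j < i of |x_i - x_j| p_i p_j."""
    return sum(abs(xs[i] - xs[j]) * ps[i] * ps[j]
               for j, i in combinations(range(len(xs)), 2))

def integral_F_one_minus_F(xs, ps):
    """Exact integral of F(1 - F) for a discrete distribution:
    F is constant between consecutive sorted support points."""
    pts = sorted(zip(xs, ps))
    F = list(accumulate(p for _, p in pts))
    return sum(F[k] * (1 - F[k]) * (pts[k + 1][0] - pts[k][0])
               for k in range(len(pts) - 1))
```

For values (1, 2, 4) with probabilities (1/4, 1/2, 1/4), both functions return 9/16, whatever order the values are given in.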

19 SPECIAL CASE 4: RATIO SCALE The loss-of-similarity function $L(x, x') = |\log x - \log x'|$ is well suited for the ratio scale. By adopting this function, we effectively replace our considerations on the ratio scale by those on the interval scale. We thus work with the loss-of-similarity function $L(y, y') = |y - y'|$, where instead of the original $x$'s we now deal with their logarithms $y = \log x$. Hence, all our earlier results pertaining to the interval scale carry over directly to establish analogous results on the ratio scale. Of course, the choice of the logarithmic transformation involves an element of arbitrariness, and there are indeed many alternatives. Nevertheless, our experience suggests that the underlying problems, and the philosophies for tackling them, usually restrict the class of loss-of-similarity functions (as well as of transformations) to just a few reasonable ones, and certain axiomatic approaches may even produce unique choices.
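Under the logarithmic choice, the ratio-scale variation is simply the interval-scale GMD of $y = \log x$. The hypothetical sketch below also illustrates a natural property of this choice: rescaling all values by a positive constant (for example, a change of currency or units) leaves $V_T$ unchanged.

```python
import math
from itertools import combinations

def ratio_V_T(xs, ps):
    """V_T with L(x, x') = |log x - log x'| (all x > 0): the GMD of y = log x."""
    ys = [math.log(x) for x in xs]
    return sum(abs(ys[i] - ys[j]) * ps[i] * ps[j]
               for j, i in combinations(range(len(ys)), 2))

xs, ps = [1, 2, 4], [0.25, 0.5, 0.25]
scaled = [1000 * x for x in xs]          # same data in different units
assert math.isclose(ratio_V_T(xs, ps), ratio_V_T(scaled, ps))
```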

20 SUMMARY We have presented a unifying approach for assessing variation in populations and data sets that accommodates every scale of measurement: nominal, ordinal, interval, and ratio. In particular, we have put forward a general decomposition of the total variation into within (intra) and between (inter) components. This has enabled us to introduce two indices: PVE, the proportion of variation explained, and SP, the segregation power. Our results extend and generalize the ORDANOVA method developed by Gadrich and Bashkansky (2012) for categorical ordinal variables.

21 THANK YOU FOR YOUR ATTENTION!

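22 VARIATION DEFINITION
Nominal (GMD-based): $\mathrm{IQV} = \frac{K}{K-1}\big(1 - \sum_k \hat p_k^2\big)$
Ordinal: $\hat h = \frac{4}{K-1} \sum_{k=1}^{K-1} \hat F_k (1 - \hat F_k)$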

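23 CATANOVA (CATEGORICAL DATA ANALYSIS OF VARIATION) DECOMPOSITION
M samples.
Within: $\mathrm{IQV}_W = \frac{1}{M} \sum_{m=1}^{M} \mathrm{IQV}_{\mathrm{WITHIN},m}$, where $\mathrm{IQV}_{\mathrm{WITHIN},m} = \frac{K}{K-1}\big[1 - \sum_k \hat p_{km}^2\big]$.
Between: $\mathrm{IQV}_B = \frac{1}{M} \sum_{m=1}^{M} \mathrm{IQV}_{\mathrm{BETWEEN},m}$, where $\mathrm{IQV}_{\mathrm{BETWEEN},m} = \frac{K}{K-1} \sum_k (\hat p_{km} - \hat p_k)^2$.
$\hat p_{km}$: the frequency of data belonging to the $k$-th category in the $m$-th sample.
Total variation: $\mathrm{IQV}_T = \frac{K}{K-1}\big(1 - \sum_k \hat p_k^2\big)$, where $\hat p_k$ is the total frequency of items belonging to the $k$-th category.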
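Here is a sketch of the CATANOVA decomposition on the normalized IQV scale. It assumes samples of equal size, so that the $1/M$ weights coincide with the group proportions and the identity $\mathrm{IQV}_T = \mathrm{IQV}_W + \mathrm{IQV}_B$ holds exactly. The function name and the counts are hypothetical.

```python
from fractions import Fraction

def catanova(table):
    """table[m][k]: counts for sample m, category k (equal sample sizes assumed).
    Returns (IQV_T, IQV_W, IQV_B) on the normalized IQV scale."""
    M, K = len(table), len(table[0])
    c = Fraction(K, K - 1)                                  # normalizing constant
    p_m = [[Fraction(x, sum(r)) for x in r] for r in table]  # p_km
    p = [sum(p_m[m][k] for m in range(M)) / M for k in range(K)]  # pooled p_k
    iqv_t = c * (1 - sum(q * q for q in p))
    iqv_w = sum(c * (1 - sum(q * q for q in row)) for row in p_m) / M
    iqv_b = sum(c * sum((p_m[m][k] - p[k]) ** 2 for k in range(K))
                for m in range(M)) / M
    return iqv_t, iqv_w, iqv_b

# Hypothetical counts: 3 samples of 100 items in 3 nominal categories
t, w, b = catanova([[30, 50, 20], [10, 60, 30], [45, 40, 15]])
```

With unequal sample sizes, the weights $\hat\pi_m = n_m / N$ of the general decomposition should replace $1/M$.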

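24 ORDANOVA (ORDINAL DATA ANALYSIS OF VARIATION) DECOMPOSITION
M ordinal samples of the same size n.
Within: $\hat h_W = \frac{1}{M} \sum_{m=1}^{M} \hat h_{mW}$, where $\hat h_{mW} = \frac{4}{K-1} \sum_{k=1}^{K-1} \hat F_{km} (1 - \hat F_{km})$ is the dispersion within the $m$-th sample.
Between: $\hat S_B = \frac{4}{K-1} \sum_{k=1}^{K-1} \frac{1}{M} \sum_{m=1}^{M} (\hat F_{km} - \hat F_{k\cdot})^2$, the variation between samples for every $k$-th category, summed over $k$.
$\hat F_{km}$: the cumulative frequency of data belonging up to the $k$-th category in the $m$-th sample.
Total variation: $\hat h_T = \frac{4}{K-1} \sum_{k=1}^{K-1} \hat F_{k\cdot} (1 - \hat F_{k\cdot})$, where $\hat F_{k\cdot} = \frac{1}{M} \sum_{m=1}^{M} \hat F_{km}$ is the total cumulative frequency of items belonging up to the $k$-th category.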
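The ORDANOVA decomposition can be written in code as follows. This hypothetical sketch assumes M samples of the same size n, as stated on the slide. Exact fractions make the identity $\hat h_T = \hat h_W + \hat S_B$ checkable with `==`.

```python
from fractions import Fraction
from itertools import accumulate

def ordanova(table):
    """table[m][k]: counts for sample m in ordered category k (equal sample sizes).
    Returns (h_T, h_W, S_B) with h_T == h_W + S_B."""
    M, K = len(table), len(table[0])
    c = Fraction(4, K - 1)
    F_m = [list(accumulate(Fraction(x, sum(r)) for x in r)) for r in table]  # F_km
    F = [sum(F_m[m][k] for m in range(M)) / M for k in range(K)]              # F_k.
    h_T = c * sum(f * (1 - f) for f in F)
    h_W = sum(c * sum(f * (1 - f) for f in row) for row in F_m) / M
    S_B = c * sum(sum((F_m[m][k] - F[k]) ** 2 for m in range(M)) / M
                  for k in range(K))
    return h_T, h_W, S_B
```

For two samples concentrated in opposite extreme categories, `ordanova([[0, 10], [10, 0]])` gives (1, 0, 1): maximal total dispersion, none of it within samples.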

25 ORDANOVA DECOMPOSITION EXAMPLE (1)
Given M = 3 samples, each of size n = 200 (N = 600 items in total), classified into K = 4 categories.
[Table: sample data, counts by sample (1 to 3) and category (1 to 4), with totals]

26 ORDANOVA DECOMPOSITION EXAMPLE (2)
Cumulative frequency up to the k-th category within the m-th sample, $\hat F_{km}$ (k = 1, 2, 3, 4; m = 1, 2, 3). Last column: the total cumulative frequency $\hat F_{k\cdot}$ of items belonging up to the k-th category.
[Table: $\hat F_{km}$ by category and sample in units of 1/200, with $\hat F_{k\cdot}$ in units of 1/600]

27 ORDANOVA DECOMPOSITION EXAMPLE (3)
[Table: $\hat F_{km}$ and $\hat F_{k\cdot}$, as in the previous slide]
Total dispersion: $\hat h_T = \frac{4}{3} \sum_{k=1}^{3} \hat F_{k\cdot} (1 - \hat F_{k\cdot})$

28 ORDANOVA DECOMPOSITION EXAMPLE (4)
[Table: $\hat F_{km}$ and $\hat F_{k\cdot}$, as in the previous slide]
Dispersion within the m-th sample: $\hat h_{mW} = \frac{4}{3} \sum_{k=1}^{3} \hat F_{km} (1 - \hat F_{km})$, and $\hat h_W = \frac{1}{3} (\hat h_{1W} + \hat h_{2W} + \hat h_{3W})$.

29 ORDANOVA DECOMPOSITION EXAMPLE (5)
[Table: $\hat F_{km}$ and $\hat F_{k\cdot}$, as in the previous slide]
Classic variation between the samples for the k-th category: $\hat S_{kB} = \frac{1}{3} \sum_{m=1}^{3} (\hat F_{km} - \hat F_{k\cdot})^2$, and $\hat S_B = \frac{4}{3} (\hat S_{1B} + \hat S_{2B} + \hat S_{3B})$.

30 ORDANOVA DECOMPOSITION EXAMPLE (6)
[Table: $\hat F_{km}$ and $\hat F_{k\cdot}$, as in the previous slide]
Check: $\hat h_T = \hat h_W + \hat S_B$.
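The following script works through the same steps as slides 25 to 30, using hypothetical counts rather than the presentation's data: M = 3 samples of n = 200 items in K = 4 ordered categories. It follows the slide sequence: cumulative frequencies, total dispersion, within-sample dispersions, between-sample variation, and the final check.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical counts (not the slides' data): 3 samples x 4 ordered categories, n = 200 each
counts = [
    [80, 60, 40, 20],
    [20, 40, 60, 80],
    [50, 50, 50, 50],
]
M, K = len(counts), len(counts[0])
c = Fraction(4, K - 1)                                    # 4/3 for K = 4

# Cumulative frequency up to category k within sample m
F_km = [list(accumulate(Fraction(x, sum(row)) for x in row)) for row in counts]
# Total cumulative frequency F_k. (average over equal-size samples)
F_k = [sum(F_km[m][k] for m in range(M)) / M for k in range(K)]

h_T = c * sum(f * (1 - f) for f in F_k)                     # total dispersion
h_mW = [c * sum(f * (1 - f) for f in row) for row in F_km]  # within each sample
h_W = sum(h_mW) / M
S_kB = [sum((F_km[m][k] - F_k[k]) ** 2 for m in range(M)) / M for k in range(K)]
S_B = c * sum(S_kB)                                         # between samples

print("F_k. =", [str(f) for f in F_k])
print("h_T =", h_T, " h_W =", h_W, " S_B =", S_B)
assert h_T == h_W + S_B
```

With these counts, $\hat F_{k\cdot} = (1/4, 1/2, 3/4, 1)$, $\hat h_T = 5/6$, $\hat h_W = 341/450$, and $\hat S_B = 17/225$. About 9% of the total dispersion therefore lies between the samples.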

31 DISTINGUISHING STATISTIC FOR ORDINAL DATA
Items are measured on a scale with K categories, and M samples of equal size n are drawn. Were all samples drawn from the same population, characterized by $p_1, p_2, \dots, p_K$, or not?
Under $H_0$ (multinomial distribution):
$\frac{E(\text{between variation})}{df_B} = \frac{E(\text{within variation})}{df_W} = \frac{E(\text{total variation})}{df_T}$, where $df_T = N - 1$, $df_W = M(n - 1)$, $df_B = M - 1$;
in other words, $E(\mathrm{MS}_B) = E(\mathrm{MS}_W) = E(\mathrm{MS}_T)$.

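32 DISTINGUISHING STATISTIC
$\mathrm{SP} = \mathrm{MS}_B / \mathrm{MS}_T$
SP can be asymptotically approximated by a chi-square variable divided by its degrees of freedom, with $df = (M - 1)(\dots)$ degrees of freedom. The quantiles (95%, for example) of this distribution may be used for hypothesis testing, with critical value $I_{cr} = \chi^2_{df,\,0.95} / df$.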
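Here is a hypothetical sketch of SP for ordinal data. It takes $\mathrm{MS}_B = SS_B/(M-1)$ and $\mathrm{MS}_T = SS_T/(N-1)$, with the degrees of freedom from the previous slide. It also assumes, as in classical ANOVA, that each sum of squares is N times the corresponding variation component. Under that assumption, $\mathrm{SP} = \frac{N-1}{M-1} \cdot \mathrm{PVE}$. The simulation draws all samples from one hypothetical distribution (so H0 is true), and the average SP should therefore come out close to 1, in line with $E(\mathrm{MS}_B) = E(\mathrm{MS}_T)$.

```python
import random
from fractions import Fraction
from itertools import accumulate

def sp_ordinal(table):
    """SP = MS_B / MS_T, assuming SS_B = N*C_B and SS_T = N*V_T,
    so SP = (N - 1) * C_B / ((M - 1) * V_T)."""
    M, K = len(table), len(table[0])
    N = sum(map(sum, table))
    F = list(accumulate(Fraction(sum(r[k] for r in table), N) for k in range(K)))
    V_T = sum(f * (1 - f) for f in F)
    C_B = Fraction(0)
    for r in table:
        F_m = accumulate(Fraction(x, sum(r)) for x in r)
        C_B += Fraction(sum(r), N) * sum((a - b) ** 2 for a, b in zip(F_m, F))
    return (N - 1) * C_B / ((M - 1) * V_T)

def simulate_sp(p, M, n, reps, seed=0):
    """Average SP over tables whose M samples all come from distribution p (H0 true)."""
    rng = random.Random(seed)
    K = len(p)
    total = 0.0
    for _ in range(reps):
        table = []
        for _ in range(M):
            draws = rng.choices(range(K), weights=p, k=n)
            table.append([draws.count(k) for k in range(K)])
        total += float(sp_ordinal(table))
    return total / reps
```

For the strongly differing hypothetical table `[[80, 60, 40, 20], [20, 40, 60, 80], [50, 50, 50, 50]]`, SP is about 27, far above the rule-of-thumb threshold of 3.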

33 DISTINGUISHING FACTOR IDENTIFICATION Data can be divided (segregated) according to various types of factors. For each segregation, calculate the indicator I. The best segregating factor is the one for which the indicator is the largest.

34 DISTRIBUTION OF ACADEMIC DEGREE HOLDERS BY ORDINAL DEGREE LEVEL (1ST DEGREE, 2ND DEGREE, 3RD DEGREE)
First case: according to age.
[Table: degree holders by age group and degree level (undergraduate, 1st degree; graduate, 2nd degree; doctoral, 3rd degree), with totals and the resulting SP]

35 DISTRIBUTION OF ACADEMIC DEGREE HOLDERS BY ORDINAL DEGREE LEVEL (1ST DEGREE, 2ND DEGREE, 3RD DEGREE)
Second case: according to religion and origin/ethnic group.
[Table: degree holders by group (Jews born in Israel, Jews born abroad, Moslems, Christians, Druze, Others) and degree level (undergraduate, 1st degree; graduate, 2nd degree; doctoral, 3rd degree), with totals and the resulting SP]
Age is a much more significant distinguishing (segregating) factor than religion and origin/ethnic group.

36 EXAMPLE: IDENTIFY THE DISTINGUISHING FACTOR
Distribution of faculty by ordinal academic rank (lecturer, senior lecturer, associate professor, full professor) in five different types of higher-education institutions. Indicator ratio = …08!
[Table: number of positions by rank for institution Types 1 to 5]

37 EXAMPLE: IDENTIFY THE DISTINGUISHING FACTOR
To find the outlier, use the jackknife procedure.
[Table: option no. vs. SP]
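The following is a sketch of one possible reading of the jackknife step, under explicit assumptions. Each "option" leaves one group out, SP is recomputed on the remaining groups, and the group whose removal lowers SP the most is flagged as the outlier. SP is computed as $\frac{N-1}{M-1} \cdot C_B / V_T$, which assumes that the sums of squares are N times the variation components. The function name `jackknife_sp` and the counts for the five institution types are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def sp_ordinal(table):
    """SP = (N - 1) * C_B / ((M - 1) * V_T) for an ordinal count table."""
    M, K, N = len(table), len(table[0]), sum(map(sum, table))
    F = list(accumulate(Fraction(sum(r[k] for r in table), N) for k in range(K)))
    V_T = sum(f * (1 - f) for f in F)
    C_B = sum(Fraction(sum(r), N) *
              sum((a - b) ** 2 for a, b in zip(accumulate(Fraction(x, sum(r)) for x in r), F))
              for r in table)
    return (N - 1) * C_B / ((M - 1) * V_T)

def jackknife_sp(table):
    """SP recomputed with each group left out in turn (option m omits group m)."""
    return [sp_ordinal(table[:m] + table[m + 1:]) for m in range(len(table))]

# Hypothetical positions: lecturer, senior lecturer, associate prof., full prof.
types = [
    [30, 40, 20, 10],
    [28, 42, 19, 11],
    [32, 38, 21, 9],
    [29, 41, 20, 10],
    [5, 15, 30, 50],     # a deliberately different institution type
]
sp_without = jackknife_sp(types)
outlier = min(range(len(types)), key=lambda m: sp_without[m])
```

Here, dropping the fifth type leaves four nearly identical groups, so its SP falls well below 3. Every other option still contains the deviating type and keeps SP large.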


More information

STAT Section 2.1: Basic Inference. Basic Definitions

STAT Section 2.1: Basic Inference. Basic Definitions STAT 518 --- Section 2.1: Basic Inference Basic Definitions Population: The collection of all the individuals of interest. This collection may be or even. Sample: A collection of elements of the population.

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3 Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

More information

The Measurement of Inequality, Concentration, and Diversification.

The Measurement of Inequality, Concentration, and Diversification. San Jose State University From the SelectedWorks of Fred E. Foldvary 2001 The Measurement of Inequality, Concentration, and Diversification. Fred E Foldvary, Santa Clara University Available at: https://works.bepress.com/fred_foldvary/29/

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1 4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Lecture 25. STAT 225 Introduction to Probability Models April 16, Whitney Huang Purdue University. Agenda. Notes. Notes.

Lecture 25. STAT 225 Introduction to Probability Models April 16, Whitney Huang Purdue University. Agenda. Notes. Notes. Lecture 25 STAT 225 Introduction to Probability Models April 16, 2104 Whitney Huang Purdue University 25.1 Agenda 1 2 3 25.2 Probability vs. Statistics Figure : Taken from JHU Statistical Computing by

More information

Workshop Research Methods and Statistical Analysis

Workshop Research Methods and Statistical Analysis Workshop Research Methods and Statistical Analysis Session 2 Data Analysis Sandra Poeschl 08.04.2013 Page 1 Research process Research Question State of Research / Theoretical Background Design Data Collection

More information

EE290H F05. Spanos. Lecture 5: Comparison of Treatments and ANOVA

EE290H F05. Spanos. Lecture 5: Comparison of Treatments and ANOVA 1 Design of Experiments in Semiconductor Manufacturing Comparison of Treatments which recipe works the best? Simple Factorial Experiments to explore impact of few variables Fractional Factorial Experiments

More information

Stochastic Dominance in Polarization Work in progress. Please do not quote

Stochastic Dominance in Polarization Work in progress. Please do not quote Stochastic Dominance in Polarization Work in progress. Please do not quote Andre-Marie TAPTUE 17 juillet 2013 Departement D Économique and CIRPÉE, Université Laval, Canada. email: andre-marie.taptue.1@ulaval.ca

More information

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables. Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

Inferences About the Difference Between Two Means

Inferences About the Difference Between Two Means 7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

More information

Hypothesis Testing One Sample Tests

Hypothesis Testing One Sample Tests STATISTICS Lecture no. 13 Department of Econometrics FEM UO Brno office 69a, tel. 973 442029 email:jiri.neubauer@unob.cz 12. 1. 2010 Tests on Mean of a Normal distribution Tests on Variance of a Normal

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Clustering Lecture 1: Basics. Jing Gao SUNY Buffalo

Clustering Lecture 1: Basics. Jing Gao SUNY Buffalo Clustering Lecture 1: Basics Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics Clustering

More information

POLI 443 Applied Political Research

POLI 443 Applied Political Research POLI 443 Applied Political Research Session 6: Tests of Hypotheses Contingency Analysis Lecturer: Prof. A. Essuman-Johnson, Dept. of Political Science Contact Information: aessuman-johnson@ug.edu.gh College

More information

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Correlation. A statistics method to measure the relationship between two variables. Three characteristics Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction

More information

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics Last Lecture Distinguish Populations from Samples Importance of identifying a population and well chosen sample Knowing different Sampling Techniques Distinguish Parameters from Statistics Knowing different

More information

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F. Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

11-2 Multinomial Experiment

11-2 Multinomial Experiment Chapter 11 Multinomial Experiments and Contingency Tables 1 Chapter 11 Multinomial Experiments and Contingency Tables 11-11 Overview 11-2 Multinomial Experiments: Goodness-of-fitfit 11-3 Contingency Tables:

More information

Chapter 5 Confidence Intervals

Chapter 5 Confidence Intervals Chapter 5 Confidence Intervals Confidence Intervals about a Population Mean, σ, Known Abbas Motamedi Tennessee Tech University A point estimate: a single number, calculated from a set of data, that is

More information

10: Crosstabs & Independent Proportions

10: Crosstabs & Independent Proportions 10: Crosstabs & Independent Proportions p. 10.1 P Background < Two independent groups < Binary outcome < Compare binomial proportions P Illustrative example ( oswege.sav ) < Food poisoning following church

More information

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Lecture 8: Summary Measures

Lecture 8: Summary Measures Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:

More information

MONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS

MONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS MONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS Gregory GUREVICH PhD, Industrial Engineering and Management Department, SCE - Shamoon College Engineering, Beer-Sheva, Israel E-mail: gregoryg@sce.ac.il

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

A SHORT INTRODUCTION TO PROBABILITY

A SHORT INTRODUCTION TO PROBABILITY A Lecture for B.Sc. 2 nd Semester, Statistics (General) A SHORT INTRODUCTION TO PROBABILITY By Dr. Ajit Goswami Dept. of Statistics MDKG College, Dibrugarh 19-Apr-18 1 Terminology The possible outcomes

More information

Notes. AS Examinations are in blue A Level Examinations are in red Other examinations are in green

Notes. AS Examinations are in blue A Level Examinations are in red Other examinations are in green Notes AS Examinations are in blue A Level Examinations are in red Other examinations are in green This is the first version of your External Examination Timetable for next summer. Once entries have been

More information

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence

More information

WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin

WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin Quantitative methods II WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin The first factor analysis C. Spearman (1904). General intelligence, objectively determined and measured. The American Journal

More information