Data Collection and Statistical Inference


1 MWO Lecture 2 Artificial Intelligence Laboratory Vrije Universiteit Brussel katrien kevin@arti.vub.ac.be October 8, 2010

2 The Research Process Empirical Data

3 Data gathering
Section outline: Data Sampling, Generating Hypotheses, Variables, Distributions
Data gathering is already a source of many mistakes (conscious or unconscious).
Examples:
- A railway company investigating the punctuality of its trains
- The government investigating the happiness of the people
- A questionnaire about hygiene
In all of these cases, the results are likely to be biased. Be aware of possible biases, and report them together with your statistics!

4 Methods of data gathering
- Random sampling: each case has an equal chance of becoming part of the sample. Needed: a well-defined population, a list of all cases and a random number generator.
- Systematic sampling: the first case is picked randomly, the rest according to a specific procedure (e.g. start at a random roll number, then increment by 10). This possibly introduces a bias. (Example?)
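The two sampling schemes can be sketched in a few lines of Python. This is only an illustration: the population (a numbered list of 100 cases), the function names and the seed are assumptions made for the example, not part of the lecture.

```python
import random

def random_sample(population, k, seed=None):
    """Simple random sampling: every case has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=None):
    """Systematic sampling: random first case, then every `step`-th case."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random starting position among the first `step` cases
    return population[start::step]

# Hypothetical population: a list of 100 numbered cases.
population = list(range(1, 101))

print(random_sample(population, 10, seed=42))
print(systematic_sample(population, 10, seed=42))
```

One way a bias can creep into the systematic variant: if the list itself has a periodic structure whose period matches the step size, every sampled case falls at the same point of that cycle.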

5 How much data is required?
- The more, the better! In practice, research is of course limited by money, time and space.
- The amount of data required sometimes depends on the distribution of data points (e.g. some may be very rare but still have a major influence). This could require iterated sampling.
- Always report the number of samples and how they were obtained. Otherwise, you could just as well be showing that the probability of throwing a 6 when rolling a die is 100%.

6 The Significance of Research

7 Importance of Hypotheses
Science and engineering proceed by the formulation of hypotheses and the provision of supporting (or refuting) evidence for them. Informatics should be no exception, but the provision of explicit hypotheses in Informatics is rare! This causes lots of problems:
- There are usually many possible hypotheses.
- Ambiguity is a major cause of referee/reader misunderstanding.
- Vagueness is a major cause of poor methodology (inconclusive evidence, unfocussed research direction).

8 Evaluation begins with claims
Hypotheses in Informatics can be:
- Claims about a task, system, technique or parameter, e.g.:
  - System X performs better than System Y on dimension Z
  - Technique X has property Y
  - X is the optimal setting of parameter Y
- Properties and relations along scientific, engineering or cognitive science dimensions.

9 Scientific Hypotheses
For the first claim, relevant hypotheses would be:
- Experimental Hypothesis (H1): The mean of the ratings for the new system is higher than the mean of the ratings for the baseline system.
- Null Hypothesis (H0): There is no difference between the mean of the ratings for the new system and the mean of the ratings for the baseline system.

10 Variables
The data of an experiment is a set of observations characterised by one or more properties that are extracted as variables:
- Independent variable: A variable that indicates something you manipulate in an experiment, or some supposedly causal factor that you can't manipulate, such as Corpus and System in the sentence compression experiment.
- Dependent variable: A variable that indicates to a greater or lesser degree the causal effects of the factors represented by the independent variables. Examples for sentence compression are compression rate (percentage of words removed) and sentence ratings (1-5).

11 Levels of Measurement
Variables can be split into categorical and continuous, and within these types there are different levels of measurement:
- Categorical (entities are divided into distinct categories)
  - Binary variable: There are only two categories
  - Nominal variable: There are more than two categories
  - Ordinal variable: The same as a nominal variable, but the categories have a logical order
- Continuous (entities get a distinct score)
  - Interval variable: Equal intervals on the variable represent equal differences in the property being measured
  - Ratio variable: The same as an interval variable, but the ratios of scores on the scale must also make sense

12 Data as Distributions
A distribution pictures the frequency of each value of a measured variable.
[Figure: tally table (tally and frequency columns; frequencies 5, 4, 7 and 4) and bar chart "Distribution of fruit preferences"; x-axis: type of fruit (Apple, Banana, Orange, Pear); y-axis: numbers of choosers = frequency]
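A frequency distribution of this kind can be tallied directly from raw answers. The sketch below uses made-up answers purely for illustration; they are not the data behind the slide's figure.

```python
from collections import Counter

# Hypothetical answers to "What is your favourite fruit?" (illustrative data only).
answers = ["apple", "pear", "orange", "apple", "banana", "orange", "orange", "apple"]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(answers)

# Print a small tally table: value, tally marks, frequency.
for fruit, count in sorted(frequency.items()):
    print(f"{fruit:<8}{'/' * count:<6}{count}")
```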

13 Probability Distributions
- Statistics is used to analyse experimental results.
- Probability theory is a mathematical abstraction. To use it (i.e. apply it) you need an interpretation of how real-world concepts relate to the mathematics.
- Probability as a degree of belief: P = 1 is certainty; P = 0 is impossibility.
- We write P(X = x) for the probability (mass or density) that random variable X takes value x.

14 What do we need to describe about a distribution?
Section outline: Central Tendency, Dispersion, Symmetry
- Where is it on the scale axis? = central tendency
- What kind of shape does it have? Does it spread out or bunch up? = dispersion
- Is it symmetrical? = symmetry

15 Representative values: Useful Terms
- Mean: The average value, $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$.
- Median: The value which splits a sorted distribution in half. The 50th quantile of the distribution.
- Quantile: A "cut point" q that divides the distribution into pieces of size q/100 and 1 - (q/100). Examples: the 50th quantile cuts the distribution in half; the 25th quantile cuts off the lower quartile; the 75th quantile cuts off the upper quartile.
- Mode: Most frequent value.
[Figure: example distributions annotated with their mean and median]
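As a small illustration, these representative values can be computed with Python's standard `statistics` module. The scores are hypothetical, and different software may use slightly different interpolation rules for the quartiles.

```python
import statistics

# Hypothetical scores (illustrative data only).
scores = [2, 3, 3, 4, 5, 7, 8, 8, 8, 12]

mean = statistics.mean(scores)      # sum of the scores divided by N
median = statistics.median(scores)  # middle of the sorted scores (50th quantile)
mode = statistics.mode(scores)      # most frequent value

# n=4 yields the three cut points at the 25th, 50th and 75th quantiles.
q25, q50, q75 = statistics.quantiles(scores, n=4)

print(mean, median, mode)
print(q25, q50, q75)
```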

16 Reporting a statistic

17 Variables and their CT measures
Scale             Consistent coding labels   Order   Intervals   0-point
Category/nominal  Y                          N       N           N
Ordinal           Y                          Y       N           N
Interval          Y                          Y       Y           N
Ratio             Y                          Y       Y           Y
Permitted operations:
- Category/nominal: counting frequencies
- Ordinal: counting frequencies, ranking
- Interval: counting frequencies, ranking, +, -
- Ratio: counting frequencies, ranking, +, -, ×, ÷, etc.
Examples:
- Category/nominal: favourite fruit, part of speech, native language, ok v *
- Ordinal: degree class, letter grade, beg-interm-adv, ok v * v **
- Interval: skirt length from knee, shoe size
- Ratio: skirt length from waist, % correct, height in in/cm
Permitted measures of central tendency:
- Category/nominal: mode
- Ordinal: mode, median
- Interval: mode, median, mean
- Ratio: mode, median, mean

18 Measures of Dispersion
Some measure of dispersion should always accompany representative values. A good measure of dispersion should:
- take into account all data points;
- describe the average deviation of data points with respect to the mean;
- increase when data heterogeneity increases.
Examples:
- Range = max(x) - min(x)
- Deviation = difference of a score from the sample mean: $(x_i - \bar{x})$
- Variance = average of squared deviations from the mean: $V = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$ (used when scale is no issue)
- Standard deviation: $S = \sqrt{V} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$
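The dispersion measures can be written out almost literally in Python. This sketch follows the 1/N form of the variance used on the slide; the function name and the data are assumptions for illustration. Many packages default to the N - 1 (sample) form instead, so it is worth checking which one a tool reports.

```python
import math

def dispersion(xs):
    """Range, variance (1/N form) and standard deviation of a list of scores."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]             # (x_i - mean) for every score
    variance = sum(d * d for d in deviations) / n   # average squared deviation
    return {
        "range": max(xs) - min(xs),
        "variance": variance,
        "sd": math.sqrt(variance),
    }

# Hypothetical scores (illustrative data only); their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
result = dispersion(data)
print(result)
```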

19 Symmetry versus Skew
Describing symmetry requires an inherent order (ordinal scale or better).
- Symmetrical distributions have the same volume on either side of their point of balance.
- Skewed distributions are asymmetrical:
  - Negative skew: pulled out towards low values
  - Positive skew: pulled out towards high values
[Figure: measures of shape, panels (a), (b), (c); a red line marks the point of balance]
Relationship to measures of central tendency: where mean and median are appropriate, positively skewed distributions often have mean > median.

20 Relationship to measures of central tendency
- Where mean and median are appropriate, positively skewed distributions often have mean > median.
- Where mean and median are appropriate, negatively skewed distributions often have mean < median.
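A quick check of this relationship on two small, made-up data sets (the values are assumptions chosen only to produce a long tail on one side):

```python
import statistics

# Hypothetical data with a long tail towards high values (positive skew).
positively_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirroring the values gives a long tail towards low values (negative skew).
negatively_skewed = [-x for x in positively_skewed]

for name, data in [("positive skew", positively_skewed),
                   ("negative skew", negatively_skewed)]:
    # The mean is pulled towards the tail, the median much less so.
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The single extreme value moves the mean a long way but leaves the median untouched, which is why reporting both is informative for skewed data.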

21 Symptoms of bad methodology
- There is a minimum or maximum score and the distribution is pushed up against it:
  - Ceiling effect (negative skew), e.g. topmost score
  - Floor effect (positive skew), e.g. fastest possible reading time
- There are a few outliers, i.e. cases separated from the bulk of cases and from the central tendency.
If you don't examine the distribution of results in such studies, you may be drawing incorrect conclusions from your results. For example, an outlier affects the mean disproportionately, as with the college with the highest mean salary for its graduates. (Use standard deviation!)

22 Using statistics to test hypotheses
Section outline: Using statistics to test hypotheses, Significance testing, Examples
- Hypothesis: A die is crooked. I roll it twice, and 6 shows up both times.
- Hypothesis: Using Microsoft Windows makes people angry. A friend of mine is using Windows and he's always complaining to me about how unstable his computer is.
- Hypothesis: Using Microsoft Windows makes people angry. I ask 312 VUB students to fill out a questionnaire after using the computer lab, stating which operating system they used and whether they felt happy or angry when leaving the lab. Operating system usage is roughly the same, but while only 12% of Linux and MacOS users felt angry, 37% of Windows users did.

23 Using statistics to test hypotheses
Hypothesis: Using Microsoft Windows makes people angry.
I run a large-scale evaluation with participants who have to perform standardised tasks on different operating systems. The participants are evenly distributed across all ages; half of them are male and half of them female. Right before and right after working at the computer for 30 minutes they undergo a standardised psychological test to evaluate their aggressiveness before and after the task. I end up with ordinal results for each participant stating whether they became less (<) or more aggressive (>), or whether their aggressiveness level stayed approximately the same (=). The percentage of Windows users with a > result seems disproportionately higher than in the other operating system groups.

24 The plural of anecdote isn't data
The more data, the better.
Hypothesis: A coin is biased towards heads.
[Table: N, Heads, Tails; values not preserved in the transcription]
- Higher N is closer in size to an infinite population.
- Remember: we want to make a general claim.
- But: we want to have a measure of how much data we need.

25 Significance testing
We cannot simply compare descriptive parameters; we have to compare entire distributions of data.
Null Hypothesis Significance Testing:
- H0 comes from the sampling distribution: we are seeing only variations we would expect to find by chance when sampling the population.
- H1 is defined against that distribution: the result is very unlikely to belong to the distribution of chance outcomes according to H0.
In order to have a standard case to compare against (H0), we need a model of the data:
- For the coin: p(heads) = p(tails) = 0.5
- For the die: p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6
- To test the effectiveness of a medical treatment: the mortality rate of untreated patients

26 Significance level
- Statistical significance tests give a probability p.
- p = the probability of obtaining data at least as extreme as the given data set if it is simply sampled from the H0 distribution, i.e. if any variation from H0 can be accounted for by the deviation which we would expect for the sample size N.
- If we want to support our H1, we want this p to be low.
- Typical significance levels: α = .05, .01, .001, .0001 (* ** ***)
- A low p does not mean that H1 is proven, only that H0 doesn't account well for the observed data.
- Say we run and publish 20 experiments where the result is p < .05. What does this mean?

27 Types of error in statistical hypothesis testing
- Type I error (false positive): reject the null hypothesis when it is actually true
- Type II error (false negative): accept the null hypothesis when it is actually false
Since we want to be conservative with the claims we make based on our experiments, we want to keep the Type I error rate α low. The lower we set our mandatory significance level α, the higher our Type II error rate β gets: we increase the chance that we reject our H1 when it is actually true.

28 The simplest case with two outcomes: Binomial test
- The math is easy, so we can use the Binomial distribution to get an exact result.
- For the coin example, use B(N, 1/2) to run a one-tailed test.
- How likely is it that we got H or even more heads, assuming the coin is not biased?
[Table: N, Heads, Tails, p(B(N, 1/2) ≥ H); values not preserved in the transcription]
Remember: None of this means that H1 is proven!
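The exact one-tailed binomial test for the coin can be sketched by summing the upper tail of B(N, 1/2). The function name and the example counts are assumptions for illustration; they are not the values from the slide's table.

```python
from math import comb

def binomial_tail(n, heads, p=0.5):
    """P(B(n, p) >= heads): chance of `heads` or more heads in n tosses under H0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(heads, n + 1))

# Hypothetical outcomes (N tosses, number of heads).
for n, heads in [(10, 8), (100, 60)]:
    print(f"N={n}, heads={heads}: p = {binomial_tail(n, heads):.4f}")

# For N=10 and 8 heads the tail is (45 + 10 + 1) / 1024 = 56/1024, about 0.0547,
# so this result alone would not be significant at alpha = .05.
```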

29 Multiple nominal outcomes: χ²-test
For more than two possible outcomes and large sample sizes we can't afford to run an exact test but have to use an approximation (e.g. Pearson's χ²-test):
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
- $O_i$ = observed frequency
- $E_i$ = expected frequency
- $n$ = number of possible outcomes (bins)
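A minimal sketch of the χ² statistic for a die, assuming made-up counts from 60 rolls and a fair-die H0 (expected frequency 10 per face). The critical value 11.070 is the standard table value for 5 degrees of freedom at α = .05; a statistics package would report the p-value directly.

```python
def chi_square(observed, expected):
    """Pearson's chi-square statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1..6 over 60 rolls (illustrative data only).
observed = [5, 8, 9, 8, 10, 20]
# H0: the die is fair, so every face is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square(observed, expected)

# Table value for df = 6 - 1 = 5 at alpha = .05.
CRITICAL_05_DF5 = 11.070
print(f"chi2 = {statistic:.2f}, reject H0 at .05: {statistic > CRITICAL_05_DF5}")
```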

30 Pearson's χ²-test
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
Additional conditions for Pearson's χ²-test:
- Unrelated design: different individuals in different bins - otherwise you would have to use a multinomial test - why?
- Bins are mutually exclusive.
- If the sample size or the expected frequencies for each bin are too low, the approximation of Pearson's χ²-test to a real χ² distribution is not reliable. In these cases Fisher's exact test can be used instead.
The test can also be used to check whether 2 observed sample sets are likely to come from the same distribution!

31 Directionality of hypotheses
A hypothesis can be:
- directed/one-tailed: there is a bias in one specific direction - we are only interested in the probability of that one tail of possible outcomes
- undirected/two-tailed: there is some bias - unlikely high as well as unlikely low outcomes confirm our hypothesis
Some tests (like χ²) can only be used for two-tailed tests. Why? (Hint: only when n = 2 can you inquire about the directionality of the bias)

32 Other univariate significance tests
- For ordinal data: Wilcoxon, Mann-Whitney, Friedman, Kruskal-Wallis, Cohen's Kappa, ...
- For normally distributed interval data: z-test (simply the continuous version of the Binomial distribution/test)
- For normally distributed interval data of which the underlying distribution of H0 is not known: t-test
- For experimental designs involving more than 2 conditions: ANOVA (Analysis of Variance)

33 Multivariate statistics
So far we have only looked at univariate statistics: only a single dependent variable. What about correlations between multiple dependent variables measured in the same experiment?
- Non-parametric (ordinal): Spearman Rank Order Correlation
- Parametric (interval/ratio): Pearson Product-Moment Correlation
- Joint probability distributions - covariance
Correlation will not tell you the direction of a potential causal relationship. Correlation ≠ causation! There is a strong negative correlation between the number of mules and the number of PhDs among American states.
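Both correlation coefficients can be sketched in plain Python. The helper names and the data are assumptions for illustration, and the ranking helper assumes there are no tied values (tie handling is left out to keep the sketch short).

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank of each value, 1 = smallest. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical measurements: y grows with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```

Because y rises monotonically with x, the rank-based coefficient is 1 (up to rounding), while the Pearson coefficient stays below 1 since the relationship is not a straight line.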

34 Linear regression
- Simple linear regression: treating Y as a function of X
- Multiple linear regression: treating Y as a function of any number of Xs
- Different from multivariate statistics: we are investigating the conditional rather than the joint probability distributions
Outcome:
- Linear model: $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$
- Quantitative measure of the strength (significance) of the relationship between Y and every $X_i$
Just like with correlation coefficients, only linear relationships can be detected.
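For the simple case with a single X, the least-squares fit has a closed form. The sketch below is an illustration under assumptions: the function name and data are made up, and it adds an intercept term, which the linear model as written on the slide does not show.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance of x and y divided by the variance of x
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx  # the fitted line passes through the point of means
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```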

35 Statistical significance tests in practice
- Widespread in psychology and medicine
- Controlled experiment setups (often geared towards suiting a particular test...)
- Software: SPSS, MatLab, R, ...
- Sometimes tests are run on data that they aren't actually made for...
- Interval and ratio data can be binned in order to run ordinal and nominal tests on them (e.g. χ²)...
- The conditions for many tests aren't strictly adhered to, and there are a number of established corrections that have been proven to work in practice (e.g. Yates' correction for χ² with low expected frequencies, ...)
- Many experimental setups (e.g. complex interactions in multi-agent systems) are hard to capture with statistical tests

36 Final thoughts on statistical significance tests
- Not all correlations are interesting, relevant or important.
- You shouldn't run random or exhaustive tests.
- Testing should be motivated by your theory and hypotheses.
- Results should be analysed and interpreted in terms of your theory (and beyond - keep thinking!).
- Statistical parameters don't capture everything.
- Eyeball your data closely before running tests; it can give you important clues on what you actually want to look for.
- A picture is worth a thousand words.

37 Exercise
Running and reporting some simple significance tests using R
- You can download it from [URL not preserved]
- A manual can be found here: co.uk/education/lectures/r/basics.htm
- Data files available from [URL not preserved]
- Easily loadable into R via scan() and read.csv()
- Identify appropriate tests and run them
- Report the reasons for your choice of tests, the commands you ran and the results in a report (max. 1 page)


Ch. 16: Correlation and Regression Ch. 1: Correlation and Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely to

More information

CS 5014: Research Methods in Computer Science. Statistics: The Basic Idea. Statistics Questions (1) Statistics Questions (2) Clifford A.

CS 5014: Research Methods in Computer Science. Statistics: The Basic Idea. Statistics Questions (1) Statistics Questions (2) Clifford A. Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright c 2015 by Clifford A. Shaffer Computer Science Title page Computer Science Clifford A. Shaffer Fall 2015 Clifford A. Shaffer

More information

Table of Contents. Advanced Statistics. Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen

Table of Contents. Advanced Statistics. Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Advanced Statistics Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Table of Contents 1. Statistical inference... 2 1.1 Population and sampling... 2 2. Data organization... 4 2.1 Variable s

More information

Introduction to Basic Statistics Version 2

Introduction to Basic Statistics Version 2 Introduction to Basic Statistics Version 2 Pat Hammett, Ph.D. University of Michigan 2014 Instructor Comments: This document contains a brief overview of basic statistics and core terminology/concepts

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Univariate Statistics and Bivariate Correlation ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to

More information

CS 124 Math Review Section January 29, 2018

CS 124 Math Review Section January 29, 2018 CS 124 Math Review Section CS 124 is more math intensive than most of the introductory courses in the department. You re going to need to be able to do two things: 1. Perform some clever calculations to

More information

Dynamics in Social Networks and Causality

Dynamics in Social Networks and Causality Web Science & Technologies University of Koblenz Landau, Germany Dynamics in Social Networks and Causality JProf. Dr. University Koblenz Landau GESIS Leibniz Institute for the Social Sciences Last Time:

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

Unit 14: Nonparametric Statistical Methods

Unit 14: Nonparametric Statistical Methods Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 3 Numerical Descriptive Measures 3-1 Learning Objectives In this chapter, you learn: To describe the properties of central tendency, variation,

More information

Math 221, REVIEW, Instructor: Susan Sun Nunamaker

Math 221, REVIEW, Instructor: Susan Sun Nunamaker Math 221, REVIEW, Instructor: Susan Sun Nunamaker Good Luck & Contact me through through e-mail if you have any questions. 1. Bar graphs can only be vertical. a. true b. false 2.

More information

Math 10 - Compilation of Sample Exam Questions + Answers

Math 10 - Compilation of Sample Exam Questions + Answers Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the

More information

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career. Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

More information

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and

More information

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

More information

Test Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research

Test Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research Test Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research HOW IT WORKS For the M.Sc. Early Childhood Research, sufficient knowledge in methods and statistics is one

More information

Chapter 3. Measuring data

Chapter 3. Measuring data Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring

More information

Physics 509: Non-Parametric Statistics and Correlation Testing

Physics 509: Non-Parametric Statistics and Correlation Testing Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests

More information

THE SAMPLING DISTRIBUTION OF THE MEAN

THE SAMPLING DISTRIBUTION OF THE MEAN THE SAMPLING DISTRIBUTION OF THE MEAN COGS 14B JANUARY 26, 2017 TODAY Sampling Distributions Sampling Distribution of the Mean Central Limit Theorem INFERENTIAL STATISTICS Inferential statistics: allows

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67 Chapter 6 The Standard Deviation as a Ruler and the Normal Model 1 /67 Homework Read Chpt 6 Complete Reading Notes Do P129 1, 3, 5, 7, 15, 17, 23, 27, 29, 31, 37, 39, 43 2 /67 Objective Students calculate

More information

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric

More information

Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I.

Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I. Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/46 Overview 1 Basics on t-tests 2 Independent Sample t-tests 3 Single-Sample

More information

Sampling Distributions

Sampling Distributions Sampling Distributions Sampling Distribution of the Mean & Hypothesis Testing Remember sampling? Sampling Part 1 of definition Selecting a subset of the population to create a sample Generally random sampling

More information

STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis

STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis Rebecca Barter April 6, 2015 Multiple Testing Multiple Testing Recall that when we were doing two sample t-tests, we were testing the equality

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

KDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION. Unit : I - V

KDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION. Unit : I - V KDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION Unit : I - V Unit I: Syllabus Probability and its types Theorems on Probability Law Decision Theory Decision Environment Decision Process Decision tree

More information

Statistics Handbook. All statistical tables were computed by the author.

Statistics Handbook. All statistical tables were computed by the author. Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. It is Time for Homework! ( ω `) First homework + data will be posted on the website, under the homework tab. And

More information

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity

More information

Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.

Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D. Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D. The number of cells in various stages of mitosis in your treatment and control onions are your raw

More information

MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:

More information

Exam 2 Practice Questions, 18.05, Spring 2014

Exam 2 Practice Questions, 18.05, Spring 2014 Exam 2 Practice Questions, 18.05, Spring 2014 Note: This is a set of practice problems for exam 2. The actual exam will be much shorter. Within each section we ve arranged the problems roughly in order

More information

Psych 230. Psychological Measurement and Statistics

Psych 230. Psychological Measurement and Statistics Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State

More information

Do students sleep the recommended 8 hours a night on average?

Do students sleep the recommended 8 hours a night on average? BIEB100. Professor Rifkin. Notes on Section 2.2, lecture of 27 January 2014. Do students sleep the recommended 8 hours a night on average? We first set up our null and alternative hypotheses: H0: μ= 8

More information

CIVL 7012/8012. Collection and Analysis of Information

CIVL 7012/8012. Collection and Analysis of Information CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real

More information

Readings Howitt & Cramer (2014) Overview

Readings Howitt & Cramer (2014) Overview Readings Howitt & Cramer (4) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch : Statistical significance

More information

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that? Tastitsticsss? What s that? Statistics describes random mass phanomenons. Principles of Biostatistics and Informatics nd Lecture: Descriptive Statistics 3 th September Dániel VERES Data Collecting (Sampling)

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Dealing with the assumption of independence between samples - introducing the paired design.

Dealing with the assumption of independence between samples - introducing the paired design. Dealing with the assumption of independence between samples - introducing the paired design. a) Suppose you deliberately collect one sample and measure something. Then you collect another sample in such

More information

Readings Howitt & Cramer (2014)

Readings Howitt & Cramer (2014) Readings Howitt & Cramer (014) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch 11: Statistical significance

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

QUANTITATIVE TECHNIQUES

QUANTITATIVE TECHNIQUES UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION (For B Com. IV Semester & BBA III Semester) COMPLEMENTARY COURSE QUANTITATIVE TECHNIQUES QUESTION BANK 1. The techniques which provide the decision maker

More information

Unit 4 Probability. Dr Mahmoud Alhussami

Unit 4 Probability. Dr Mahmoud Alhussami Unit 4 Probability Dr Mahmoud Alhussami Probability Probability theory developed from the study of games of chance like dice and cards. A process like flipping a coin, rolling a die or drawing a card from

More information

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score Interpret Standard Deviation Outlier Rule Linear Transformations Describe the Distribution OR Compare the Distributions SOCS Using Normalcdf and Invnorm (Calculator Tips) Interpret a z score What is an

More information

Non-parametric (Distribution-free) approaches p188 CN

Non-parametric (Distribution-free) approaches p188 CN Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman s rank correlation; the Bootstrap. (ch14

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations Basics of Experimental Design Review of Statistics And Experimental Design Scientists study relation between variables In the context of experiments these variables are called independent and dependent

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Section 5.4. Ken Ueda

Section 5.4. Ken Ueda Section 5.4 Ken Ueda Students seem to think that being graded on a curve is a positive thing. I took lasers 101 at Cornell and got a 92 on the exam. The average was a 93. I ended up with a C on the test.

More information

Time: 1 hour 30 minutes

Time: 1 hour 30 minutes Paper Reference(s) 6684/0 Edexcel GCE Statistics S Silver Level S Time: hour 30 minutes Materials required for examination papers Mathematical Formulae (Green) Items included with question Nil Candidates

More information

Last two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals

Last two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals Past weeks: Measures of central tendency (mean, mode, median) Measures of dispersion (standard deviation, variance, range, etc). Working with the normal curve Last two weeks: Sample, population and sampling

More information