Data Collection and Statistical Inference
- Cameron Ryan
- 6 years ago
Transcription
1 MWO Lecture 2 Artificial Intelligence Laboratory Vrije Universiteit Brussel katrien kevin@arti.vub.ac.be October 8, 2010
2 The Research Process Empirical Data
3 Data gathering Empirical Data Data Sampling Generating Hypotheses Variables Distributions
Data gathering is already a source of many mistakes (conscious or unconscious).
Examples:
- A railway company investigating the punctuality of its trains
- The government investigating the happiness of the people
- A questionnaire about hygiene
In all of these cases, the results are likely to be biased. Be aware of possible biases, and report them together with your statistics!
4 Methods of data gathering
- Random sampling: Each case has an equal chance of becoming part of the sample. Needed: a well-defined population, a list of all cases and a random number generator.
- Systematic sampling: The first case is picked randomly, the rest according to a specific procedure (e.g. start at a random roll number, then increment by 10). This possibly introduces a bias. (Example?)
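The two sampling schemes can be sketched in a few lines of Python. This is only an illustration: the population of 100 numbered cases, the seed and the function names are assumptions, not part of the lecture.

```python
import random

def simple_random_sample(population, k, rng):
    """Random sampling: every case has the same chance of being selected."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Systematic sampling: random start within the first `step` cases,
    then every `step`-th case after that."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(42)            # arbitrary seed, for reproducibility only
population = list(range(1, 101))   # hypothetical list of 100 case IDs

random_sample = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
print("random:    ", sorted(random_sample))
print("systematic:", systematic)
```

One way the systematic scheme could introduce a bias: if the list itself has a period of 10 (say, every tenth case is of a special kind), the sample contains either only such cases or none of them.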
5 How much data is required?
- The more, the better! In practice, research is of course limited by money, time and space.
- The amount of data needed sometimes depends on the distribution of the data points (e.g. some may be very rare but still have a major influence). This could require iterated sampling.
- Always report the number of samples and how they were obtained. Otherwise you could just as well be showing that the probability of throwing a 6 when rolling a die is 100%.
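The remark about the die can be made concrete with a small simulation. This is a sketch under assumed settings (a simulated fair die, an arbitrary seed, hypothetical function names): with a single roll the estimated probability of a 6 can only be 0% or 100%, while a large sample settles close to 1/6.

```python
import random

def estimate_p_six(n_rolls, rng):
    """Estimate P(6) as the fraction of sixes in n_rolls simulated fair rolls."""
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls

rng = random.Random(2010)  # arbitrary seed
estimate_one_roll = estimate_p_six(1, rng)
estimate_many_rolls = estimate_p_six(100_000, rng)
print(f"N=1:      {estimate_one_roll:.3f}")
print(f"N=100000: {estimate_many_rolls:.3f} (true value {1/6:.3f})")
```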
6 The Significance of Research
7 Importance of Hypotheses
- Science and engineering proceed by the formulation of hypotheses and the provision of supporting (or refuting) evidence for them. Informatics should be no exception.
- But the provision of explicit hypotheses in Informatics is rare! This causes lots of problems:
  - Usually there are many possible hypotheses
  - Ambiguity is a major cause of referee/reader misunderstanding
  - Vagueness is a major cause of poor methodology (inconclusive evidence, unfocussed research direction)
8 Evaluation begins with claims
Hypotheses in Informatics can be:
- Claims about a task, system, technique or parameter, e.g.:
  - System X performs better than System Y on dimension Z
  - Technique X has property Y
  - X is the optimal setting of parameter Y
- Properties and relations along scientific, engineering or cognitive science dimensions.
9 Scientific Hypotheses
For the first claim, relevant hypotheses would be:
- Experimental Hypothesis (H1): The mean of the ratings for the new system is higher than the mean of the ratings for the baseline system.
- Null Hypothesis (H0): There is no difference between the mean of the ratings for the new system and the mean of the ratings for the baseline system.
10 Variables
The data of an experiment is a set of observations characterised by one or more properties that are extracted as variables:
- Independent variable: A variable that indicates something you manipulate in an experiment, or some supposedly causal factor that you can't manipulate, such as Corpus and System in the sentence compression experiment.
- Dependent variable: A variable that indicates, to a greater or lesser degree, the causal effects of the factors represented by the independent variables. Examples for sentence compression are compression rate (percentage of words removed) and sentence ratings (1-5).
11 Levels of Measurement
Variables can be split into categorical and continuous, and within these types there are different levels of measurement:
- Categorical (entities are divided into distinct categories)
  - Binary variable: There are only two categories
  - Nominal variable: There are more than two categories
  - Ordinal variable: The same as a nominal variable, but the categories have a logical order
- Continuous (entities get a distinct score)
  - Interval variable: Equal intervals on the variable represent equal differences in the property being measured
  - Ratio variable: The same as an interval variable, but the ratios of scores on the scale must also make sense
12 Data as Distributions
A distribution depicts the frequency of each value of a measured variable.
[Figure: tally table with frequencies 5, 4, 7, 4 and bar chart "Distribution of fruit preferences"; x-axis: type of fruit (Pear, Orange, Banana, Apple); y-axis: numbers of choosers = frequency]
13 Probability Distributions
- Statistics is used to analyse experimental results.
- Probability Theory is a mathematical abstraction. To use it (i.e. apply it) you need an interpretation of how real-world concepts relate to mathematics.
- Probability as a degree of belief: P = 1 is certainty; P = 0 is impossibility.
- We write P(X = x) for the probability (density or mass) that random variable X takes value x.
14 Central Tendency Dispersion Symmetry
What do we need to describe about a distribution?
- Where is it on the scale axis? = central tendency
- What kind of shape does it have? Does it spread out or bunch up? = dispersion
- Is it symmetrical? = symmetry
15 Representative values
Useful terms:
- Mean: The average value, $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$.
- Median: The value which splits a sorted distribution in half. The 50th quantile of the distribution.
- Quantile: A "cut point" q that divides the distribution into pieces of size q/100 and 1 - (q/100). Examples: the 50th quantile cuts the distribution in half; the 25th quantile cuts off the lower quartile; the 75th quantile cuts off the upper quartile.
- Mode: The most frequent value.
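The representative values can be computed with Python's standard `statistics` module. The scores below are made-up illustration data, and the second list replaces the largest score by an extreme one to show how differently the mean and the median react.

```python
import statistics

scores = [1, 2, 2, 3, 4, 7, 9]          # hypothetical sample
with_outlier = [1, 2, 2, 3, 4, 7, 90]   # same sample, largest score replaced by an outlier

mean_value = statistics.mean(scores)
median_value = statistics.median(scores)
mode_value = statistics.mode(scores)
# Cut points for the lower quartile, the median and the upper quartile
quartiles = statistics.quantiles(scores, n=4)

print("mean:", mean_value, "median:", median_value, "mode:", mode_value)
print("quartile cut points:", quartiles)
print("with outlier -> mean:", round(statistics.mean(with_outlier), 2),
      "median:", statistics.median(with_outlier))
```

Note that `statistics.quantiles` interpolates between data points, and other tools may use slightly different conventions, so quartile values can differ marginally between packages.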
16 Reporting a statistic
17 Variables and their CT measures
Scale | Consistent coding labels | Order | Intervals | 0-point | Permitted operations | Examples | Permitted measures of central tendency
Category/nominal | Y | N | N | N | counting frequencies | favourite fruit, part of speech, native language | mode
Ordinal | Y | Y | N | N | counting frequencies, ranking | degree class, letter grade, beg-interm-adv | mode, median
Interval | Y | Y | Y | N | counting frequencies, ranking, +, - | skirt length from knee, shoe size | mode, median, mean
Ratio | Y | Y | Y | Y | counting frequencies, ranking, +, -, ×, ÷, etc. | skirt length from waist, % correct, height in in/cm | mode, median, mean
18 Measures of Dispersion
Some measure of dispersion should always accompany representative values. A good measure of dispersion should:
- take into account all data points;
- describe the average deviation of data points with respect to the mean;
- increase when data heterogeneity increases.
Examples:
- Range = max(x) - min(x)
- Deviation = difference of a score from the sample mean: $(x_i - \bar{x})$
- Variance = average of squared deviations from the mean: $V = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$ (used when scale is no issue)
- Standard deviation: $S = \sqrt{V} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$
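As a minimal sketch, the dispersion measures can be computed directly from their definitions. The data are invented for illustration. The variance here divides by N, as on the slide; Python's `statistics.pvariance` uses the same convention, whereas `statistics.variance` divides by N - 1.

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

n = len(x)
x_bar = sum(x) / n
value_range = max(x) - min(x)
deviations = [xi - x_bar for xi in x]
variance = sum(d ** 2 for d in deviations) / n   # V = (1/N) * sum of squared deviations
std_dev = math.sqrt(variance)                    # S = sqrt(V)

print("range:", value_range)
print("variance:", variance, "standard deviation:", std_dev)
# Cross-check against the standard library's population variance
print("statistics.pvariance:", statistics.pvariance(x))
```

The standard deviation is expressed in the same units as the data, which is why it is usually reported next to the mean.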
19 Symmetry versus Skew
Inherent order is required (ordinal scale or better).
- Symmetrical distributions have the same volume either side of their point of balance.
- Skewed distributions are asymmetrical:
  - Negative skew: pulled out towards low values
  - Positive skew: pulled out towards high values
[Figure: three example distributions (a), (b), (c); a red line marks the point of balance]
- Relationship to measures of central tendency: where mean and median are appropriate, positively skewed distributions often have mean > median.
20 Relationship to measures of central tendency
- Where mean and median are appropriate, positively skewed distributions often have mean > median.
- Where mean and median are appropriate, negatively skewed distributions often have mean < median.
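A small check of this relationship on invented data. Skewness is computed here as the third standardised moment, which is one common definition and an assumption of this sketch; the function name is hypothetical.

```python
from statistics import mean, median, pstdev

def skewness(data):
    """Third standardised moment: > 0 for positive skew, < 0 for negative skew."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

positive = [1, 2, 2, 3, 3, 4, 20]        # hypothetical scores with a long tail towards high values
negative = [21 - x for x in positive]    # mirror image: long tail towards low values

for name, data in (("positive skew", positive), ("negative skew", negative)):
    print(name, "| mean:", mean(data), "| median:", median(data),
          "| skewness:", round(skewness(data), 2))
```

The relationship is a tendency rather than a rule, which is why the slide says "often".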
21 Symptoms of bad methodology
- Where there is a minimum or maximum score and the distribution is pushed up against it:
  - Ceiling effect (negative skew) (e.g. topmost score)
  - Floor effect (positive skew) (e.g. fastest possible reading time)
- Where there are a few outliers = cases separated from the bulk of cases and from the central tendency.
If you don't examine the distribution of results in such studies, you may be drawing incorrect conclusions from your results. E.g. an outlier affects the mean disproportionately, for example for the college with the highest mean salary for its graduates. (Use standard deviation!)
22 Using statistics to test hypotheses Significance testing Examples
Using statistics to test hypotheses
- Hypothesis: A die is crooked. I roll it twice, and 6 shows up both times.
- Hypothesis: Using Microsoft Windows makes people angry. A friend of mine is using Windows and he's always complaining to me about how unstable his computer is.
- Hypothesis: Using Microsoft Windows makes people angry. I ask 312 VUB students to fill out a questionnaire after using the computer lab, stating which operating system they used and whether they felt happy or angry when leaving the lab. Operating system usage is roughly the same, but while only 12% of Linux and MacOS users felt angry, 37% of Windows users did.
23 Using statistics to test hypotheses
- Hypothesis: Using Microsoft Windows makes people angry. I run a large-scale evaluation with participants who have to perform standardised tasks on different operating systems. The participants are evenly distributed across all ages; half of them are male and half of them female. Right before and right after working at the computer for 30 minutes they undergo a standardised psychological test to evaluate their aggressiveness before and after the task. I end up with ordinal results for each participant stating whether they became less (<) or more aggressive (>), or whether their aggressiveness level stayed approximately the same (=). The percentage of Windows users with a > result seems disproportionately higher than in the other operating system groups.
24 The plural of anecdote isn't data
- The more data, the better
- Hypothesis: A coin is biased towards heads
[Table: columns N, Heads, Tails]
- Higher N is closer in size to an infinite population
- Remember: we want to make a general claim
- But: we want to have a measure of how much data we need
25 Significance testing
- We cannot simply compare descriptive parameters; we have to compare entire distributions of data
- Null Hypothesis Significance Testing:
  - H0 comes from the sampling distribution: we are seeing only variations we would expect to find by chance when sampling the population
  - H1 is defined against that distribution: the result is very unlikely to belong to the distribution of chance outcomes according to H0
- In order to have a standard case to compare against (H0), we need a model of the data:
  - For the coin: p(heads) = p(tails) = 0.5
  - For the die: $p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}$
  - To test the effectiveness of a medical treatment: the mortality rate of untreated patients
26 Significance level
- Statistical significance tests give a probability p
- p = the probability of obtaining data at least as extreme as the given data set if it were simply sampled from the H0 distribution, i.e. if any variation from H0 could be accounted for by the deviation we would expect for the sample size N
- If we want to support our H1, we want this p to be low
- Typical significance levels: α = .05, .01, .001, .0001 (* ** ***)
- A low p does not mean that H1 is proven, only that H0 doesn't account well for the observed data
- Say we run and publish 20 experiments where the result is p < .05. What does this mean?
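One possible reading of the closing question, sketched under the assumption that the 20 tests are independent and that H0 is actually true in every one of them: at α = .05 we should expect about one "significant" result purely by chance, and the chance of seeing at least one is well above one half.

```python
ALPHA = 0.05
N_TESTS = 20   # assumed: independent tests, H0 true in all of them

expected_false_positives = ALPHA * N_TESTS
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS

print("expected number of false positives:", expected_false_positives)
print("P(at least one false positive):", round(p_at_least_one, 3))
```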
27 Types of error in statistical hypothesis testing
- Type I error (false positive): reject the null hypothesis when it is actually true
- Type II error (false negative): fail to reject the null hypothesis when it is actually false
- Since we want to be conservative with the claims we make based on our experiments, we want to keep the Type I error rate α low.
- The lower we set our mandatory significance level α, the higher our Type II error rate β gets: we increase the chance that we reject our H1 when it is actually true
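The trade-off between α and β can be computed exactly for the coin example. This is a sketch under assumed numbers: 20 tosses and a coin whose true probability of heads is 0.7 when H1 holds. Both values and the function names are hypothetical.

```python
from math import comb

def upper_tail(n, h, p):
    """P(X >= h) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(h, n + 1))

def critical_heads(n, alpha):
    """Smallest head count h with P(X >= h | fair coin) <= alpha."""
    return next(h for h in range(n + 2) if upper_tail(n, h, 0.5) <= alpha)

N = 20         # hypothetical number of tosses
TRUE_P = 0.7   # hypothetical bias of the coin when H1 is true

results = {}
for alpha in (0.05, 0.01):
    h = critical_heads(N, alpha)
    beta = 1 - upper_tail(N, h, TRUE_P)   # P(fail to reject H0 | H1 true)
    results[alpha] = (h, beta)
    print(f"alpha={alpha}: reject H0 at >= {h} heads, beta={beta:.3f}")
```

Lowering α from .05 to .01 raises the number of heads required to reject H0, so a genuinely biased coin is more likely to go undetected.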
28 The simplest case with two outcomes: Binomial test
- The math is easy, so we can use the Binomial distribution to get an exact result
- For the coin example, use $B(N, \frac{1}{2})$ to run a one-tailed test
- How likely is it that we got H or even more heads, assuming the coin is not biased?
[Table: columns N, Heads, Tails, $p(B(N, \frac{1}{2}) \geq H)$]
Remember: None of this means that H1 is proven!
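The one-tailed binomial test can be written out directly from the Binomial distribution. The toss counts below are hypothetical illustration values, not the ones from the slide's table, and the function name is an assumption.

```python
from math import comb

def binomial_upper_tail(n, h, p=0.5):
    """Exact one-tailed p-value: P(X >= h) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(h, n + 1))

# Hypothetical outcomes with the same proportion of heads (80%)
p_small = binomial_upper_tail(10, 8)    # 8 heads in 10 tosses
p_large = binomial_upper_tail(20, 16)   # 16 heads in 20 tosses

print(f"N=10, H=8:  p = {p_small:.4f}")
print(f"N=20, H=16: p = {p_large:.4f}")
```

The same proportion of heads yields a smaller p with more tosses, which is the sense in which more data gives stronger evidence against H0.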
29 Multiple nominal outcomes: χ²-test
For more than two possible outcomes and large sample sizes we can't afford to run an exact test but have to use an approximation (e.g. Pearson's χ²-test):
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
- $O_i$ = observed frequency
- $E_i$ = expected frequency
- n = number of possible outcomes (bins)
30 Pearson's χ²-test
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
Additional conditions for Pearson's χ²-test:
- Unrelated design: different individuals in different bins. Otherwise you would have to use a multinomial test. Why?
- Bins must be mutually exclusive
- If the sample size or the expected frequencies for each bin are too low, the approximation of Pearson's χ²-test to a real χ² distribution is not reliable. In these cases Fisher's exact test can be used instead.
Can also be used to check whether 2 observed sample sets are likely to come from the same distribution!
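A minimal sketch of the χ² statistic for the die example, using invented counts from 60 rolls. The comparison value 11.07 is the standard tabulated critical value of the χ² distribution with 5 degrees of freedom at α = .05. Computing an exact p-value would need a statistics package and is left out here.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [5, 8, 9, 8, 10, 20]   # hypothetical counts for faces 1..6 (60 rolls)
expected = [10] * 6               # H0: fair die, 60 * 1/6 per face

chi2 = chi_square_statistic(observed, expected)
CRITICAL_05_DF5 = 11.07           # tabulated critical value, df = n - 1 = 5, alpha = .05

print("chi-square:", round(chi2, 2))
print("reject H0 at alpha = .05:", chi2 > CRITICAL_05_DF5)
```

All expected frequencies here are 10, comfortably above the low counts for which the approximation is considered unreliable.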
31 Directionality of hypotheses
A hypothesis can be
- directed/one-tailed: there is a bias in one specific direction, so we are only interested in the probability of that one tail of possible outcomes
- undirected/two-tailed: there is some bias, so unlikely high as well as unlikely low outcomes confirm our hypothesis
Some tests (like χ²) can only be used for two-tailed tests. Why? (Hint: only when n = 2 can you inquire about the directionality of the bias)
32 Other univariate significance tests
- For ordinal data: Wilcoxon, Mann-Whitney, Friedman, Kruskal-Wallis, Cohen's Kappa, ...
- For normally distributed interval data: z-test (simply the continuous version of the Binomial distribution/test)
- For normally distributed interval data where the underlying distribution under H0 is not known: t-test
- For experimental designs involving more than 2 conditions: ANOVA (Analysis of Variance)
33 Multivariate statistics
- So far we have only looked at univariate statistics: only a single dependent variable
- What about correlations between multiple dependent variables measured in the same experiment?
  - Non-parametric (ordinal): Spearman Rank Order Correlation
  - Parametric (interval/ratio): Pearson Product-Moment Correlation
- Joint probability distributions: covariance
- Correlation will not tell you the direction of a potential causal relationship
- Correlation ≠ causation! There is a strong negative correlation between the number of mules and the number of PhDs among American states
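Both correlation coefficients can be sketched by hand on invented data. The rank helper below assumes there are no tied values, and all names are hypothetical. Spearman's coefficient is simply Pearson's coefficient applied to the ranks.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def ranks(values):
    """Rank positions 1..N (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # hypothetical data: monotone but not linear in x

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print("Pearson: ", round(r_pearson, 3))
print("Spearman:", round(r_spearman, 3))
```

Because y increases monotonically with x, the rank correlation is perfect, while the Pearson coefficient stays below 1 since the relationship is not linear.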
34 Linear regression
- Simple linear regression: treating Y as a function of X
- Multiple linear regression: treating Y as a function of any number of Xs
- Different from multivariate statistics: we are investigating the conditional rather than the joint probability distributions
- Outcome:
  - Linear model: $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$
  - Quantitative measure of the strength (significance) of the relationship between Y and every $X_i$
- Just like with correlation coefficients, only linear relationships can be detected
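For the simple case with one X, the least-squares fit has a closed form. The sketch below uses invented data and includes an intercept term, which the linear model on the slide does not write out explicitly; the function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares slope and intercept for the model Y = intercept + slope * X."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2.9, 5.1, 7.0, 9.1, 10.9]   # hypothetical noisy measurements

slope, intercept = fit_simple_linear_regression(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("residuals:", [round(r, 2) for r in residuals])
```

Inspecting the residuals is a quick way to notice a relationship that is not linear: they would show a systematic pattern instead of scattering around zero.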
35 Statistical significance tests in practice
- Widespread in psychology and medicine
  - Controlled experiment setups (often geared towards suiting a particular test...)
  - Software: SPSS, MatLab, R, ...
- Sometimes tests are run on data that they aren't actually made for...
  - Interval and ratio data can be binned to run ordinal and nominal tests on (e.g. χ²)...
  - The conditions for many tests aren't strictly adhered to, and there are a number of established corrections that have been proven to work in practice (e.g. Yates' correction for χ² with low expected frequencies, ...)
- Many experimental setups (e.g. complex interactions in multi-agent systems) are hard to capture with statistical tests
36 Final thoughts on statistical significance tests
- Not all correlations are interesting, relevant or important
  - You shouldn't run random or exhaustive tests
  - Testing should be motivated by your theory and hypotheses
  - Results should be analysed and interpreted in terms of your theory (and beyond: keep thinking!)
- Statistical parameters don't capture everything
  - Eyeball your data closely before running tests; it can give you important clues on what you actually want to look for
  - A picture is worth a thousand words
37 Exercise
Running and reporting some simple significance tests using R
- You can download it from
- A manual can be found here: co.uk/education/lectures/r/basics.htm
- Data files available from
- Easily loadable into R via scan() and read.csv()
- Identify appropriate tests and run them
- Report the reasons for your choice of tests, the commands you ran and the results in a report (max. 1 page)
More informationMath 221, REVIEW, Instructor: Susan Sun Nunamaker
Math 221, REVIEW, Instructor: Susan Sun Nunamaker Good Luck & Contact me through through e-mail if you have any questions. 1. Bar graphs can only be vertical. a. true b. false 2.
More informationMath 10 - Compilation of Sample Exam Questions + Answers
Math 10 - Compilation of Sample Exam Questions + Sample Exam Question 1 We have a population of size N. Let p be the independent probability of a person in the population developing a disease. Answer the
More information9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.
Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences
More informationHYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă
HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and
More informationTransition Passage to Descriptive Statistics 28
viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of
More informationTest Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research
Test Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research HOW IT WORKS For the M.Sc. Early Childhood Research, sufficient knowledge in methods and statistics is one
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationPhysics 509: Non-Parametric Statistics and Correlation Testing
Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests
More informationTHE SAMPLING DISTRIBUTION OF THE MEAN
THE SAMPLING DISTRIBUTION OF THE MEAN COGS 14B JANUARY 26, 2017 TODAY Sampling Distributions Sampling Distribution of the Mean Central Limit Theorem INFERENTIAL STATISTICS Inferential statistics: allows
More informationappstats27.notebook April 06, 2017
Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves
More informationElementary Statistics
Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:
More informationChapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67
Chapter 6 The Standard Deviation as a Ruler and the Normal Model 1 /67 Homework Read Chpt 6 Complete Reading Notes Do P129 1, 3, 5, 7, 15, 17, 23, 27, 29, 31, 37, 39, 43 2 /67 Objective Students calculate
More informationHypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal
Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric
More informationBasics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I.
Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/46 Overview 1 Basics on t-tests 2 Independent Sample t-tests 3 Single-Sample
More informationSampling Distributions
Sampling Distributions Sampling Distribution of the Mean & Hypothesis Testing Remember sampling? Sampling Part 1 of definition Selecting a subset of the population to create a sample Generally random sampling
More informationSTAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis
STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis Rebecca Barter April 6, 2015 Multiple Testing Multiple Testing Recall that when we were doing two sample t-tests, we were testing the equality
More informationChapter 2: Tools for Exploring Univariate Data
Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is
More informationKDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION. Unit : I - V
KDF2C QUANTITATIVE TECHNIQUES FOR BUSINESSDECISION Unit : I - V Unit I: Syllabus Probability and its types Theorems on Probability Law Decision Theory Decision Environment Decision Process Decision tree
More informationStatistics Handbook. All statistical tables were computed by the author.
Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance
More informationChapter 27 Summary Inferences for Regression
Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012
More informationMATH 10 INTRODUCTORY STATISTICS
MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. It is Time for Homework! ( ω `) First homework + data will be posted on the website, under the homework tab. And
More informationChapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing
Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity
More informationMitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.
Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D. The number of cells in various stages of mitosis in your treatment and control onions are your raw
More informationMATH 10 INTRODUCTORY STATISTICS
MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:
More informationExam 2 Practice Questions, 18.05, Spring 2014
Exam 2 Practice Questions, 18.05, Spring 2014 Note: This is a set of practice problems for exam 2. The actual exam will be much shorter. Within each section we ve arranged the problems roughly in order
More informationPsych 230. Psychological Measurement and Statistics
Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State
More informationDo students sleep the recommended 8 hours a night on average?
BIEB100. Professor Rifkin. Notes on Section 2.2, lecture of 27 January 2014. Do students sleep the recommended 8 hours a night on average? We first set up our null and alternative hypotheses: H0: μ= 8
More informationCIVL 7012/8012. Collection and Analysis of Information
CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real
More informationReadings Howitt & Cramer (2014) Overview
Readings Howitt & Cramer (4) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch : Statistical significance
More informationTastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?
Tastitsticsss? What s that? Statistics describes random mass phanomenons. Principles of Biostatistics and Informatics nd Lecture: Descriptive Statistics 3 th September Dániel VERES Data Collecting (Sampling)
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationDealing with the assumption of independence between samples - introducing the paired design.
Dealing with the assumption of independence between samples - introducing the paired design. a) Suppose you deliberately collect one sample and measure something. Then you collect another sample in such
More informationReadings Howitt & Cramer (2014)
Readings Howitt & Cramer (014) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch 11: Statistical significance
More informationMATH 1150 Chapter 2 Notation and Terminology
MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the
More informationQUANTITATIVE TECHNIQUES
UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION (For B Com. IV Semester & BBA III Semester) COMPLEMENTARY COURSE QUANTITATIVE TECHNIQUES QUESTION BANK 1. The techniques which provide the decision maker
More informationUnit 4 Probability. Dr Mahmoud Alhussami
Unit 4 Probability Dr Mahmoud Alhussami Probability Probability theory developed from the study of games of chance like dice and cards. A process like flipping a coin, rolling a die or drawing a card from
More informationInterpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score
Interpret Standard Deviation Outlier Rule Linear Transformations Describe the Distribution OR Compare the Distributions SOCS Using Normalcdf and Invnorm (Calculator Tips) Interpret a z score What is an
More informationNon-parametric (Distribution-free) approaches p188 CN
Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman s rank correlation; the Bootstrap. (ch14
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More informationBasics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations
Basics of Experimental Design Review of Statistics And Experimental Design Scientists study relation between variables In the context of experiments these variables are called independent and dependent
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationSection 5.4. Ken Ueda
Section 5.4 Ken Ueda Students seem to think that being graded on a curve is a positive thing. I took lasers 101 at Cornell and got a 92 on the exam. The average was a 93. I ended up with a C on the test.
More informationTime: 1 hour 30 minutes
Paper Reference(s) 6684/0 Edexcel GCE Statistics S Silver Level S Time: hour 30 minutes Materials required for examination papers Mathematical Formulae (Green) Items included with question Nil Candidates
More informationLast two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals
Past weeks: Measures of central tendency (mean, mode, median) Measures of dispersion (standard deviation, variance, range, etc). Working with the normal curve Last two weeks: Sample, population and sampling
More information