Data Screening and Adjustments. Data Screening for Errors


Purpose of data screening and adjustments:
- Detect and correct data errors
- Detect and treat missing data
- Detect and handle insufficiently sampled variables (e.g., rare species)
- Conduct transformations and standardizations
- Detect and handle outliers

Data Screening for Errors
- Examine summary statistics (e.g., n, mean, min, max) and check for irregularities, such as missing records ("Where did all the data go?") or unrealistic values.
- Action: correct errors in the raw data.
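The summary-statistics check above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the canopy cover values are hypothetical, and missing records are assumed to be coded as `None` or NaN.

```python
import math

def summarize(values):
    """Return n, mean, min, and max for one variable, skipping missing values (None or NaN)."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    n = len(present)
    if n == 0:
        return {"n": 0, "mean": None, "min": None, "max": None}
    return {"n": n, "mean": sum(present) / n, "min": min(present), "max": max(present)}

# Hypothetical canopy cover (%) column: one missing record and one implausible value.
cover = [35.0, 42.0, None, 50.0, 420.0]
stats = summarize(cover)
print(stats)
```

Here n is smaller than the number of records (a missing value) and the maximum exceeds 100% (an unrealistic value), which are the two kinds of irregularity the slide asks about.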

Data Screening for Missing Data
- Evaluate the amount and pattern of missing data and take corrective action if needed (e.g., median replacement).
- Action: replace with prior knowledge; insert means or medians; use regression to estimate values.

Data Screening for Sufficiency
- Check for and drop insufficient variables (e.g., rare species in community datasets).
- Sufficiency is the extent to which each variable (e.g., each species' ecological character) is accurately and meaningfully described by the data. For example, species with very few records are not likely to be accurately placed in ecological space.
- You must decide at what level of frequency of occurrence you want to accept the message, and eliminate species below this level.
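As a minimal sketch of the median replacement mentioned under missing data, assuming missing values are coded as `None` (the function name `replace_missing_with_median` and the data are hypothetical):

```python
import statistics

def replace_missing_with_median(column):
    """Fill missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in column]

# Hypothetical variable with two missing records.
filled = replace_missing_with_median([2.0, None, 7.0, 3.0, None])
print(filled)
```

Median replacement leaves the column median unchanged but shrinks its variance, which is one reason the amount and pattern of missing data should be evaluated first.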

Data Screening for Sufficiency: Other Issues
- Influence of abundant generalists in community datasets: abundant generalists define strong dimensions of the data cloud that have no meaningful pattern on them. They can overwhelm the message of rarer species in some types of analysis. You must decide whether to include or exclude these dominant species.
- Variables with too little variation (i.e., no signature): variables with too little variation have no meaningful pattern (or influence) and are therefore unnecessary.

[Figure: typical community dataset, showing % occurrence by species, with dominant species, rare species, and the median occurrence marked.]

Data Screening for Sufficiency: Some Rules of Thumb
- Drop insufficient variables (species) and conduct a sensitivity analysis:
  - Rare species (too few occurrences: % occurrence below a chosen threshold)
  - Too little variation (coefficient of variation, CV, below a chosen threshold)
- Drop abundant generalist species and conduct a sensitivity analysis:
  - Dominant species (too ubiquitous: % occurrence above a chosen threshold)
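The occurrence-based rules of thumb can be sketched as follows. The thresholds `min_pct` and `max_pct`, the function names, and the abundance matrix are all hypothetical choices for illustration; the slides do not fix particular cutoffs here, and whatever cutoffs are used should be checked with a sensitivity analysis.

```python
def percent_occurrence(matrix):
    """Percent of sample units (rows) in which each species (column) is present."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [100.0 * sum(1 for row in matrix if row[j] > 0) / n_rows
            for j in range(n_cols)]

def keep_columns(matrix, min_pct, max_pct):
    """Keep species whose % occurrence lies within [min_pct, max_pct]."""
    occ = percent_occurrence(matrix)
    keep = [j for j, p in enumerate(occ) if min_pct <= p <= max_pct]
    return keep, [[row[j] for j in keep] for row in matrix]

# Hypothetical abundance matrix: rows are sample units, columns are species.
abund = [
    [5, 0, 1, 0],
    [3, 0, 0, 2],
    [8, 1, 0, 4],
    [2, 0, 0, 6],
]
occ = percent_occurrence(abund)
kept, reduced = keep_columns(abund, min_pct=30.0, max_pct=90.0)
print(occ, kept)
```

With these illustrative cutoffs, the first species is dropped as too ubiquitous and the second and third as too rare, leaving only the fourth.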

Data Transformations & Standardizations
Purpose:
- Statistical
  - Improve assumptions of normality, linearity, homogeneity of variance, etc.
  - Make units of variables comparable when measured on different scales.
- Ecological
  - Make ecological distance measures work better.
  - Reduce the effect of total quantity in sample units, to put the focus on relative quantities.
  - Equalize (or otherwise alter) the relative importance of variables (e.g., common and rare species).
  - Emphasize informative variables (species) at the expense of uninformative variables (species).

Transformations versus standardizations:
- Transformations are applied to each element of the data matrix, independent of the other elements. Example, log transformation: $b_{ij} = \log(x_{ij} + 1)$
- Standardizations adjust matrix elements by a row or column standard (e.g., max, sum, etc.). Example, column z-score standardization: $b_{ij} = (x_{ij} - \bar{x}_j)/s_j$

Monotonic Transformations
When to transform?
- To adjust for highly skewed variables
- To better meet the assumptions of a statistical test (e.g., normality, constant variance, etc.)
- To emphasize the presence/absence (nonquantitative) signature
Which transformation?
- Depends on the type of data
- Whichever works best

Monotonic Transformations: Binary (Presence/Absence) Transformation
$b_{ij} = x_{ij}^{0}$ (power transformation with exponent 0, taking $0^0$ as 0)
Acceptable domain of x: all. Range of f(x): 0 and 1 only.
- Converts quantitative data into nonquantitative data
- Applicable for species data
- Most useful when there is little quantitative information present
- Can be a severe transformation

Monotonic Transformations: Log Transformation
$b_{ij} = \log(x_{ij} + 1)$
Acceptable domain of x: $x > 0$. Range of f(x): all.
- Compresses high values and spreads low values by expressing values as orders of magnitude
- Useful when there is a high degree of variation, a large ratio of largest to smallest value, or highly positively skewed data

[Figure: effect of the log transformation, $b_{ij} = \log(x_{ij} + 1)$.]
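A small sketch of the log transformation follows. The slides do not state the base of the logarithm; base 10 is assumed here because the text speaks of orders of magnitude. The function name `log_transform` and the counts are hypothetical.

```python
import math

def log_transform(matrix):
    """Apply b_ij = log10(x_ij + 1) to every element; base 10 is an assumption."""
    return [[math.log10(x + 1) for x in row] for row in matrix]

# Hypothetical counts spanning several orders of magnitude.
counts = [[0, 9, 999], [99, 0, 9999]]
logged = log_transform(counts)
print(logged)
```

Adding 1 before taking the log keeps zeros at zero, which matters for species data with many absences.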

Monotonic Transformations: Square Root Transformation
$b_{ij} = x_{ij}^{1/2}$
Acceptable domain of x: $x \ge 0$. Range of f(x): $\ge 0$.
- Similar in effect to, but less dramatic than, the log transformation
- Often used with count (meristic) data, e.g., when the mean equals the variance (Poisson distribution)

[Figure: power transformations for several values of p.]

Monotonic Transformations: Power Family Transformation
$b_{ij} = x_{ij}^{1/p}$
Acceptable domain of x: $x \ge 0$. Range of f(x): $\ge 0$.
- Different exponents change the effect of the transformation; the smaller the exponent, the more compression is applied to high values
- Flexible transformation, useful for a wide variety of data

[Figure: effect of the power family transformation.]

Monotonic Transformations: Arcsine Square Root Transformation
$b_{ij} = (2/\pi)\,\sin^{-1}(x_{ij}^{1/2})$
Acceptable domain of x: 0 to 1. Range of f(x): 0 to 1.
- Spreads the ends of the scale while compressing the middle, for proportion data
- Useful for proportion data with positive skew (can use the arcsine transformation for negative skew)
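The arcsine square root formula can be checked numerically with a short sketch (the function name `arcsine_sqrt` is hypothetical). The factor $2/\pi$ rescales the result from radians to the range 0 to 1.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square root transformation of a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return (2.0 / math.pi) * math.asin(math.sqrt(p))

for p in (0.0, 0.01, 0.25, 0.5, 0.99, 1.0):
    print(p, round(arcsine_sqrt(p), 4))
```

The end points 0 and 1 and the midpoint 0.5 map to themselves, while values near the ends are pulled toward the middle of the scale, which is the spreading of the ends described above.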

[Figure: effect of the arcsine square root transformation.]

Monotonic Transformations: Some Rules of Thumb
- Use a log or square root transformation for highly skewed data or data ranging over several orders of magnitude
- Use the arcsine square root transformation for proportion data
- If applied to a related variable set (e.g., species), use the same transformation (e.g., log) so that all are scaled the same; otherwise, transform independently
- Consider a binary (presence/absence) transformation when:
  - the percentage of zeros is high
  - the number of distinct values is low
  - beta diversity is high

Standardizations
When to standardize?
- To place highly unequal sample units or variables (species) on equal footing
- To better represent the patterns of interest
Which standardization?
- Depends on the objective (sample or variable adjustment) and the statistical technique (ordination, cluster, etc.)
- Which standard (variance, totals, max, etc.) makes sense?

- Standardizations adjust matrix elements by a row or column standard (e.g., max, sum, etc.). Examples: column z-score, $b_{ij} = (x_{ij} - \bar{x}_j)/s_j$; row z-score, $b_{ij} = (x_{ij} - \bar{x}_i)/s_i$; row max, $b_{ij} = x_{ij} / \max(x_i)$
- All standardizations can be applied to either rows or columns (or both)

Column or Row Standardizations?
Column standardization
- When the principal concern is to adjust for differences (e.g., variances, total abundance, ubiquity) among variables (species) in order to place them on equal footing.
- When the focus is on the profile across sample units.
Row standardization
- When the principal concern is to adjust for differences (e.g., total abundance, diversity) among sample units in order to place them on equal footing.
- When the focus is on the profile within a sample unit.

Common Standardizations
- Total: divide by margin total
- Max: divide by margin maximum
- Range: standardize values to the range 0 to 1
- Frequency: divide by margin total and multiply by the number of non-zero items, so that the average of non-zero items is 1
- Hellinger: square root of method = total
- Normalization: make margin sums of squares equal to 1
- Standardize: scale to zero mean and unit variance (z-scores)
- Chi.square: divide by row sums and square root of column sums, and adjust for square root of matrix total

Standardizations: Column Z-score Standardization
$b_{ij} = (x_{ij} - \bar{x}_j)/s_j$
Acceptable domain of x: all. Range of f(x): all.
- Converts data to z-scores (mean = 0, variance = 1)
- Commonly used to place variables on equal footing
- Essential when variables have different scales or units of measurement

Standardizations: Column Total Standardization
$b_{ij} = x_{ij} / \sum_i x_{ij}$
Acceptable domain of x: $x \ge 0$. Range of f(x): 0 to 1.
- Commonly used with species data to adjust for unequal abundances among species
- Equalizes areas under curves of species response profiles
- Relative abundance profiles of samples depend on species' relative abundances across all sites
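Both column standardizations above can be sketched with the standard library. The function names and the data are hypothetical; the z-score uses the sample standard deviation (n - 1 denominator), which is an assumption about how $s_j$ is defined.

```python
import statistics

def column_zscores(matrix):
    """b_ij = (x_ij - mean_j) / s_j, with s_j the sample SD of column j."""
    cols = list(zip(*matrix))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # fails for a constant column (s_j = 0)
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in matrix]

def column_totals(matrix):
    """b_ij = x_ij / (sum of column j); assumes non-negative data and non-zero totals."""
    totals = [sum(c) for c in zip(*matrix)]
    return [[x / t for x, t in zip(row, totals)] for row in matrix]

# Hypothetical data: two variables measured on very different scales.
data = [[1.0, 10.0], [2.0, 30.0], [3.0, 60.0]]
z = column_zscores(data)
rel = column_totals(data)
print(z)
print(rel)
```

After the column total standardization every column sums to 1, so each variable carries the same total weight regardless of its original units.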

Standardizations: Column Max Standardization
$b_{ij} = x_{ij} / \max(x_j)$
Acceptable domain of x: $x \ge 0$. Range of f(x): 0 to 1.
- Similar to column total, except:
  - Equalizes heights of peaks of species response curves
  - Based on extreme values, which can introduce noise
  - Can exacerbate the importance of rare species

[Figure: frequency versus abundance (count) for two species. The column total standardization equalizes the area under each curve; the column max standardization equalizes the peaks of the curves.]

Standardizations: Row Total Standardization
$b_{ij} = x_{ij} / \sum_j x_{ij}$
Acceptable domain of x: $x \ge 0$. Range of f(x): 0 to 1.
- Commonly used with species data to adjust for unequal abundances among sample units
- Equalizes areas under curves of sample unit profiles
- Shifts emphasis to relative abundance within a sample unit
- Relative abundance profiles of samples are independent

Standardizations: Row Max Standardization
$b_{ij} = x_{ij} / \max(x_i)$
Acceptable domain of x: $x \ge 0$. Range of f(x): 0 to 1.
- Similar to row total, except:
  - Equalizes heights of peaks of sample unit profiles
  - Based on extreme values, which can introduce noise

Standardizations: Wisconsin Double Standardization
Step 1 (column max): $x'_{ij} = x_{ij} / \max(x_j)$
Step 2 (row total): $b_{ij} = x'_{ij} / \sum_j x'_{ij}$
Acceptable domain of x: $x \ge 0$. Range of f(x): 0 to 1.
- First standardize by species (column) maxima, then by row totals
- Equalizes emphasis among sample units and among species
- Appealing, but comes at the cost of diminishing the intuitive meaning of individual data values

Standardizations: Some Rules of Thumb
- The effect of a standardization on the analysis depends on the variability among rows and/or columns
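The two steps of the Wisconsin double standardization can be sketched as below. The function name `wisconsin` and the abundance matrix are hypothetical; empty columns and empty rows are mapped to zeros here, which is a choice made for this sketch rather than something the slides specify.

```python
def wisconsin(matrix):
    """Divide each column by its maximum, then each row by its total."""
    col_max = [max(c) for c in zip(*matrix)]
    by_max = [[x / m if m > 0 else 0.0 for x, m in zip(row, col_max)]
              for row in matrix]
    result = []
    for row in by_max:
        total = sum(row)
        result.append([x / total if total > 0 else 0.0 for x in row])
    return result

# Hypothetical abundances: rows are sample units, columns are species.
abund = [[10, 1], [5, 2], [0, 4]]
w = wisconsin(abund)
print(w)
```

In the first sample unit the raw abundances are 10 and 1, but after standardization the values are 0.8 and 0.2: the scarce second species counts for more than its raw abundance suggests, which illustrates the loss of intuitive meaning noted above.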

Standardizations: Some Rules of Thumb
- Consider row standardizations for species data sets (Legendre and Gallagher), commonly:
  - Row normalize (Euclidean distance on the normalized data = chord distance)
  - Row chi.square (Euclidean distance on the standardized data = chi-square distance)
  - Row total (Euclidean distance on the standardized data = species profile distance)
  - Row hellinger (Euclidean distance on the standardized data = Hellinger distance)
- Consider column standardizations to equalize variables measured in different units and scales, commonly:
  - Column standardize (z-scores = zero mean and unit variance)
  - Column normalize (uncentered, with unit sum of squares)
  - Column total (column sums = 1)
  - Column range (column range 0 to 1)
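The first row standardization in the list can be illustrated with a short sketch: after each row is normalized to unit length, the ordinary Euclidean distance between rows is the chord distance. The function names and the site data are hypothetical.

```python
import math

def row_normalize(matrix):
    """Scale each row to unit length (sum of squares = 1); rows must not be all zero."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row])
    return out

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical sites: 0 and 2 hold only species A, site 1 holds only species B.
sites = [[1, 0], [0, 5], [2, 0]]
normed = row_normalize(sites)
d01 = euclidean(normed[0], normed[1])
d02 = euclidean(normed[0], normed[2])
print(d01, d02)
```

Sites with no species in common sit at the maximum chord distance of $\sqrt{2}$, and sites with identical proportions sit at distance 0 regardless of total abundance.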

Standardizations: Some Rules of Thumb
- Standardizations may not matter, depending on the subsequent analysis, e.g.:
  - Principal components analysis of a correlation matrix has a built-in column standardization
  - Correspondence analysis of a species data set has essentially a built-in chi-square standardization
- There is no theoretical basis for selecting the best standardization; the choice should be justified on biological grounds, perhaps supported by a sensitivity analysis

Data Screening for Outliers
- What are outliers? Sample units with extreme values for individual variables (univariate outliers), or sample units with an unusual combination of values for more than one variable (multivariate outliers).
- Why worry about outliers? Outliers can have a large effect on the outcome of an analysis and can therefore lead to erroneous conclusions.

Data Screening for Outliers
- Univariate outliers: examine sample standard deviation scores on each variable separately. Scores beyond a chosen cutoff mark extreme observations.

[Table: standard deviation scores by sample unit and variable, with extreme observations flagged.]

- Multivariate outliers: examine deviations of the sample average distances to other samples. Standard deviation scores beyond a chosen cutoff mark extreme observations.
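The univariate screen can be sketched as follows. The cutoff of 2.5 standard deviations, the function names, and the data are hypothetical; the slides do not fix a cutoff in the surviving text.

```python
import statistics

def sd_scores(values):
    """Standard deviation (z) scores of one variable, using the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def flag_extremes(values, cutoff):
    """Indices of observations whose absolute SD score exceeds the cutoff."""
    return [i for i, z in enumerate(sd_scores(values)) if abs(z) > cutoff]

# Hypothetical variable with one suspicious record at the end.
x = [10, 12, 11, 9, 13, 10, 11, 12, 10, 60]
flagged = flag_extremes(x, cutoff=2.5)
print(flagged)
```

With small samples the largest attainable score is limited (it cannot exceed $(n-1)/\sqrt{n}$), so a cutoff that works for large datasets may never be reached with only a handful of sample units.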

Data Screening for Outliers
- Multivariate outliers: examine each sample's Mahalanobis distance to the group of remaining samples.
- Multivariate outliers: examine the results of subsequent analyses for extreme values (e.g., isolated points in ordination plots, single-member clusters in cluster analysis, etc.)
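A leave-one-out Mahalanobis screen can be sketched for the two-variable case, where the covariance matrix can be inverted by hand. This is a simplified illustration under that assumption: the function names and the data are hypothetical, and real datasets with more variables would need a general matrix inverse.

```python
import math
import statistics

def mahalanobis_2d(point, others):
    """Mahalanobis distance from a 2-variable point to the centroid of `others`."""
    xs = [p[0] for p in others]
    ys = [p[1] for p in others]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(others)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in others) / (n - 1)
    det = sxx * syy - sxy * sxy  # must be non-zero (points not collinear)
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d' S^-1 d, written out with the 2x2 inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def leave_one_out_distances(samples):
    """Distance of each sample to the group formed by all remaining samples."""
    return [mahalanobis_2d(s, samples[:i] + samples[i + 1:])
            for i, s in enumerate(samples)]

# Hypothetical samples: the last one has an unusual combination of values.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (2.5, 2.5), (10.0, -5.0)]
dists = leave_one_out_distances(samples)
print([round(d, 2) for d in dists])
```

Excluding each sample from the group it is compared against keeps an extreme sample from inflating the covariance used to judge it, which is why its own distance stands out so clearly here.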

Data Screening for Outliers: Some Rules of Thumb
- Examine data at all stages of analysis (i.e., input data, transformed/standardized data, ecological distance matrix, results of analysis) for extreme values
- Be aware of the potential impact of extreme values in the chosen analysis
- Delete extreme values only if justifiable on ecological grounds
- Conduct a sensitivity analysis


More information

Chapter 11 Canonical analysis

Chapter 11 Canonical analysis Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform

More information

Correlation Preserving Unsupervised Discretization. Outline

Correlation Preserving Unsupervised Discretization. Outline Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization

More information

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful

More information

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical

More information

Ordination & PCA. Ordination. Ordination

Ordination & PCA. Ordination. Ordination Ordination & PCA Introduction to Ordination Purpose & types Shepard diagrams Principal Components Analysis (PCA) Properties Computing eigenvalues Computing principal components Biplots Covariance vs. Correlation

More information

COMMON CORE STATE STANDARDS TO BOOK CORRELATION

COMMON CORE STATE STANDARDS TO BOOK CORRELATION COMMON CORE STATE STANDARDS TO BOOK CORRELATION Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in subsequent activities,

More information

In many situations, there is a non-parametric test that corresponds to the standard test, as described below:

In many situations, there is a non-parametric test that corresponds to the standard test, as described below: There are many standard tests like the t-tests and analyses of variance that are commonly used. They rest on assumptions like normality, which can be hard to assess: for example, if you have small samples,

More information

Pattern Structures 1

Pattern Structures 1 Pattern Structures 1 Pattern Structures Models describe whole or a large part of the data Pattern characterizes some local aspect of the data Pattern is a predicate that returns true for those objects

More information

Noise & Data Reduction

Noise & Data Reduction Noise & Data Reduction Paired Sample t Test Data Transformation - Overview From Covariance Matrix to PCA and Dimension Reduction Fourier Analysis - Spectrum Dimension Reduction 1 Remember: Central Limit

More information

Descriptive Statistics

Descriptive Statistics *following creates z scores for the ydacl statedp traitdp and rads vars. *specifically adding the /SAVE subcommand to descriptives will create z. *scores for whatever variables are in the command. DESCRIPTIVES

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

School District of Marshfield Course Syllabus

School District of Marshfield Course Syllabus School District of Marshfield Course Syllabus Course Name: Algebra I Length of Course: 1 Year Credit: 1 Program Goal(s): The School District of Marshfield Mathematics Program will prepare students for

More information

Tennessee s State Mathematics Standards - Algebra I

Tennessee s State Mathematics Standards - Algebra I Domain Cluster Standards Scope and Clarifications Number and Quantity Quantities The Real (N Q) Number System (N-RN) Use properties of rational and irrational numbers Reason quantitatively and use units

More information

Algebra Topic Alignment

Algebra Topic Alignment Preliminary Topics Absolute Value 9N2 Compare, order and determine equivalent forms for rational and irrational numbers. Factoring Numbers 9N4 Demonstrate fluency in computations using real numbers. Fractions

More information

Statistical Concepts. Constructing a Trend Plot

Statistical Concepts. Constructing a Trend Plot Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable

More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

SPECIES RESPONSE CURVES! Steven M. Holland" Department of Geology, University of Georgia, Athens, GA " !!!! June 2014!

SPECIES RESPONSE CURVES! Steven M. Holland Department of Geology, University of Georgia, Athens, GA  !!!! June 2014! SPECIES RESPONSE CURVES Steven M. Holland" Department of Geology, University of Georgia, Athens, GA 30602-2501" June 2014 Introduction Species live on environmental gradients, and we often would like to

More information

Pacing (based on a 45- minute class period) Days: 17 days

Pacing (based on a 45- minute class period) Days: 17 days Days: 17 days Math Algebra 1 SpringBoard Unit 1: Equations and Inequalities Essential Question: How can you represent patterns from everyday life by using tables, expressions, and graphs? How can you write

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

Resistant Measure - A statistic that is not affected very much by extreme observations.

Resistant Measure - A statistic that is not affected very much by extreme observations. Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Advanced Quantitative Data Analysis

Advanced Quantitative Data Analysis Chapter 24 Advanced Quantitative Data Analysis Daniel Muijs Doing Regression Analysis in SPSS When we want to do regression analysis in SPSS, we have to go through the following steps: 1 As usual, we choose

More information

DRAFT EAST POINSETT CO. SCHOOL DIST. - ALGEBRA I MATH

DRAFT EAST POINSETT CO. SCHOOL DIST. - ALGEBRA I MATH Module 1 - Math Test: 10/15/2015 Interpret the structure of expressions. AI.A.SSE.1 * Interpret expressions that represent a quantity in terms of its context. [Focus on linear, exponential, and quadratic

More information

Mathematics. Number and Quantity The Real Number System

Mathematics. Number and Quantity The Real Number System Number and Quantity The Real Number System Extend the properties of exponents to rational exponents. 1. Explain how the definition of the meaning of rational exponents follows from extending the properties

More information

3 GRAPHICAL DISPLAYS OF DATA

3 GRAPHICAL DISPLAYS OF DATA some without indicating nonnormality. If a sample of 30 observations contains 4 outliers, two of which are extreme, would it be reasonable to assume the population from which the data were collected has

More information

Guide Assessment Structure Algebra I

Guide Assessment Structure Algebra I Guide Assessment Structure Algebra I The Common Core State Standards for Mathematics are organized into Content Standards which define what students should understand and be able to do. Related standards

More information

MATRICES. a m,1 a m,n A =

MATRICES. a m,1 a m,n A = MATRICES Matrices are rectangular arrays of real or complex numbers With them, we define arithmetic operations that are generalizations of those for real and complex numbers The general form a matrix of

More information

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

More information

Mathematics Standards for High School Algebra I

Mathematics Standards for High School Algebra I Mathematics Standards for High School Algebra I Algebra I is a course required for graduation and course is aligned with the College and Career Ready Standards for Mathematics in High School. Throughout

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

UNIT 3 CONCEPT OF DISPERSION

UNIT 3 CONCEPT OF DISPERSION UNIT 3 CONCEPT OF DISPERSION Structure 3.0 Introduction 3.1 Objectives 3.2 Concept of Dispersion 3.2.1 Functions of Dispersion 3.2.2 Measures of Dispersion 3.2.3 Meaning of Dispersion 3.2.4 Absolute Dispersion

More information

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing

More information

FLORIDA STANDARDS TO BOOK CORRELATION

FLORIDA STANDARDS TO BOOK CORRELATION FLORIDA STANDARDS TO BOOK CORRELATION Florida Standards (MAFS.912) Conceptual Category: Number and Quantity Domain: The Real Number System After a standard is introduced, it is revisited many times in

More information

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

More information

Topic 1. Definitions

Topic 1. Definitions S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information